QA / SDET Interview Questions
200 scenario-based questions with detailed model answers, organized by skill and by tool. Filter by topic, level, or keyword and reveal the answer, then pressure-test yourself in a real mock.
You inherit a suite with 1,800 UI tests and almost no unit coverage; nightly runs take six hours and fail randomly. Walk me through how you'd rebalance the pyramid over two quarters without halting feature delivery.
Leadership wants "100% automation" written into your team's OKRs. How do you push back with a strategy that defines what should never be automated, and what measurable quality outcomes you'd offer them instead?
A new microservice ships next sprint with no test strategy at all, and the dev lead insists unit tests are enough. How do you decide which integration and E2E scenarios genuinely earn a place?
Your org has three QA teams each maintaining their own E2E suite against the same monolith, with roughly 60% overlapping coverage. How do you consolidate strategy, ownership, and tooling without political fallout?
Release-eve, a critical payment flow has zero automated coverage and one manual tester available for four hours. How do you scope a risk-based smoke strategy for tonight and a permanent fix for next sprint?
Your contract tests, API tests, and UI tests all cover the checkout flow, and finance is asking why testing costs keep rising. How do you audit redundancy across layers and decide what gets deleted?
Developers complain your E2E suite blocks merges for 45 minutes, while product complains escaped defects doubled last quarter. Reconcile these two signals and tell me where in the pyramid the actual gap sits.
You join a startup with zero tests and weekly production fires. The CTO gives you one quarter and one engineer. Sequence your first 90 days of test strategy investment and justify the order.
Half of your "unit tests" spin up the whole Spring context and hit a real database. How would you reclassify, restructure, and re-tier them without losing the coverage they currently provide?
A platform rewrite means your 2,000-test regression suite targets a UI that will not exist in six months. Decide what you preserve, what you port, and what you deliberately abandon — and how you'd sell that.
Your team reports 85% coverage, yet a one-line currency-rounding bug reached production and cost ₹40 lakh. Explain to the VP why the coverage number lied and which risk signals you would track instead.
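One way to make the "coverage lied" argument concrete is a toy reproduction. The sketch below is hypothetical (the function names and the rupees-to-paise conversion are assumptions, not the actual bug from the scenario): a single happy-path test executes every line of the buggy function, so line coverage reads 100% while the boundary value is never exercised.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_paise_float(rupees):
    """Buggy variant: binary floats cannot represent most decimal fractions exactly."""
    return round(rupees * 100)


def to_paise_decimal(rupees):
    """Safer variant: exact decimal arithmetic with an explicit rounding mode."""
    paise = Decimal(str(rupees)) * 100
    return int(paise.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# This single happy-path check runs every line of to_paise_float,
# so line coverage reports 100% while the boundary case stays untested.
assert to_paise_float(10.00) == 1000
```

With an input such as 1.005 the two functions disagree by one paisa, which is the kind of gap that boundary-value tests or mutation testing surface and a coverage percentage does not.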
Mobile, web, and API teams each want their own definition of done for testing. Draft the shared exit criteria you'd propose, and describe the per-platform exceptions you would be willing to allow.
Your company acquires a competitor whose product is tested manually with a six-week regression cycle, and you must merge release trains in 90 days. Lay out your integration test strategy and its biggest risk.
A Selenium suite passes locally but 30% of tests fail on the Linux Grid with ElementClickInterceptedException. Walk through your diagnosis — viewport, headless rendering, timing — and the fix you would standardize across the suite.
Your team sprinkled Thread.sleep across 400 Selenium tests to "stabilize" them, and the suite now takes three hours. Describe your wait-strategy overhaul and the guardrails that stop sleeps from creeping back in.
After a frontend migration to a new component library, 70% of your XPath selectors broke overnight. How do you triage the carnage tonight, and how do you redesign your locator strategy so this never repeats?
You run 600 Selenium tests on a self-hosted Grid that randomly drops sessions under load. Management asks whether to fix the Grid, move to a containerized setup, or buy a cloud provider. Frame your evaluation.
A test clicks a button that triggers a file download, but the file lands in a different directory on CI and assertions fail intermittently. How do you make download verification deterministic across local and CI environments?
StaleElementReferenceException appears in 15 different tests, always on a dashboard that re-renders via polling. What is actually happening in the DOM, and what reusable abstraction do you build to handle it cleanly?
Your Selenium suite cannot reach inside a payment iframe served by a third-party PSP, and compliance forbids stubbing it in staging. Design a test approach that gives real confidence without violating that constraint.
A new teammate writes tests that locate elements by full absolute XPath copied from DevTools. Before code review turns hostile, what locator guidelines and review checklist do you institute, and how do you justify them?
You run the same Selenium suite nightly against Chrome, Firefox, and Edge, but Firefox failures are 90% rendering-timing noise. Decide whether full cross-browser E2E is worth it and defend the cheaper alternative you'd propose.
Your legacy app opens actions in new windows, nests three iframes deep, and uses native OS file-upload dialogs. Outline the Selenium patterns and escape hatches — CDP, Robot, AutoIt — you would standardize and document.
Every test authenticates through the UI login page, adding 20 seconds per test and breaking whenever marketing redesigns the login screen. How do you re-architect authentication handling for the whole suite?
Leadership wants to migrate 1,200 Selenium tests to Playwright. Estimate the real cost, identify which tests should not be migrated at all, and describe the parallel-run safety period you would insist on.
A shadow-DOM-heavy design system breaks every traditional locator your team writes. What is your strategy — getShadowRoot plumbing, JavaScript execution, or pushing developers for test hooks — and what trade-offs come with each?
Your Playwright suite is green locally but times out on CI only for tests involving a date picker; the trace shows clicks landing before hydration completes. How do you fix the wait condition rather than the symptom?
The team migrated from Cypress to Playwright but kept writing chained commands and arbitrary waitForTimeout calls. Design the linting rules, fixtures, and review gates that enforce idiomatic Playwright across forty contributors.
A Cypress test intercepting /api/orders passes alone but fails in the full run because an earlier spec's intercept leaks into it. Explain the isolation failure and how you would structure intercepts to prevent it.
Product asks you to automate a multi-tab OAuth flow, and Cypress's single-tab limitation blocks you. Lay out your realistic options — programmatic auth, stubbing the popup, or switching tools — and commit to one with reasons.
Your Playwright suite runs 1,400 tests across 12 parallel workers, and a shared staging database makes 5% of runs collide. Redesign for worker-scoped data isolation without standing up twelve separate environments.
Designers ship visual tweaks weekly, and your Playwright screenshot tests generate 200 diffs every Monday morning. How do you make visual testing useful instead of noise — thresholds, masking, or component-level snapshots?
You must test a WebSocket-driven trading dashboard where prices update every 200 milliseconds, and deterministic assertions keep failing. Design the mocking and clock-control strategy in Playwright that makes this feature reliably testable.
A junior keeps storing element handles in variables and reusing them after navigation, causing intermittent failures. Explain locator auto-waiting versus stale handles to them, and describe the team conventions you would codify afterward.
Your org wants one E2E framework for three squads: one loves Cypress's debugging, one needs Safari coverage, one needs multi-tab support. Drive the decision with explicit criteria and describe how you'd handle the losing camp.
Playwright trace files have ballooned your CI artifact storage to 40GB a week and triggered a finance review. Design a retention policy — traces on retry only, screenshot rules, sharding — that preserves debuggability cheaply.
A checkout test fails only in headed mode during stakeholder demos but passes headless in CI every time. What rendering, focus, and animation differences would you investigate, and how do you make the suite mode-agnostic?
After adopting Playwright component testing, teams now duplicate coverage between the component and E2E layers. Define the boundary — what belongs in each layer — and the review heuristic that keeps them from drifting together again.
Your Cypress dashboard shows the same 12 tests retried-to-green for three months, and nobody investigates them anymore. Argue for or against automatic retries, and describe the quarantine workflow you would implement instead.
Your Postman collection has 300 requests with hardcoded staging URLs and tokens pasted by hand, and an expired token killed yesterday's regression mid-run. Re-architect environments, auth handling, and secrets before the next cycle.
A partner integration fails in production although your RestAssured suite is fully green; you discover staging returns mocked 200s for the partner API. Redesign the test boundary so a green build actually means working.
An endpoint returns 200 with an empty array both when a user has no orders and when the user ID does not exist at all. Developers say it's fine. What assertions and contract conversation follow?
You own API tests for 40 microservices, and upstream schema changes silently break consumers every month. Build a layered defense — schema validation, version checks, smoke probes — and state exactly where each control runs.
A RestAssured test fails on a 502 roughly once in fifty runs, and the team wants a blanket retry. Argue when retries are legitimate for API tests and how you'd implement them without masking real outages.
Your POST tests have created thousands of orphan records in staging, and the environment team is furious. Design self-cleaning API tests — teardown hooks, dedicated tenants, TTL data — and weigh the trade-offs of each approach.
Release-eve, you must validate a breaking change to the /payments API that three internal consumers depend on, but only one consumer has any tests. How do you assess and contain the blast radius in four hours?
The same JSON response asserts fine in Postman, but your RestAssured deserialization test fails on an unexpected null field. Walk through how strict you would make response validation across the suite, and why.
Your API suite of 900 tests takes 70 minutes because every test re-authenticates and re-seeds its own data. Get it under ten minutes — describe token caching, parallelization, and the data strategy concretely.
An idempotency bug double-charged customers during a retry storm, and no test caught it. Design the API test cases for idempotency keys, concurrent retries, and partial-failure semantics that you would now mandate for every payment endpoint.
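A possible shape for the concurrent-retry case in this question, sketched against a hypothetical in-memory stand-in (`FakePaymentService` and `fire_concurrent_retries` are illustrative names, not a real API). In a real suite the fake would be replaced by calls to the payment endpoint; the assertion stays the same: many retries with one key must produce exactly one charge.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakePaymentService:
    """Hypothetical in-memory stand-in that honors idempotency keys."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_key = {}
        self.charges = []

    def charge(self, idempotency_key, amount):
        with self._lock:
            if idempotency_key in self._by_key:
                # Replay of a known key: return the original result, no new charge.
                return self._by_key[idempotency_key]
            charge_id = len(self.charges) + 1
            self.charges.append((charge_id, amount))
            self._by_key[idempotency_key] = charge_id
            return charge_id


def fire_concurrent_retries(service, key, amount, retries=20):
    """Send the same request many times in parallel, as a retry storm would."""
    with ThreadPoolExecutor(max_workers=retries) as pool:
        return list(pool.map(lambda _: service.charge(key, amount), range(retries)))
```

A test would then assert that all returned charge IDs are identical and that only one charge was recorded. Partial-failure semantics (a timeout after the charge was committed) need a separate case that this sketch does not cover.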
QA writes API tests a full sprint after each endpoint ships, and developers say specs change too fast to test earlier. How do you move API testing left without becoming the team's bottleneck?
You're asked to certify a GraphQL gateway where clients compose arbitrary queries, so exhaustive testing is impossible. Define the risk-based test matrix — query depth limits, resolver-level auth, N+1 behavior — you would actually build.
A rate-limited public API allows 100 requests per minute, but your CI runs hammer it and got the company IP banned last week. Restructure the negative, pagination, and load-adjacent suites around this hard constraint.
Your Appium suite passes on the Pixel emulator, but 40% of it fails on real Samsung devices with different keyboards, popups, and timing. Triage the failures and define your real-device versus emulator policy going forward.
The company supports Android back to API 26 and iOS 15+, but you can afford eight devices in the lab. Build the device matrix from analytics data and defend what you deliberately will not test.
A login test fails because a system permission dialog appears on first launch, but only on certain OS versions. How do you handle OS interrupts — autoGrantPermissions, dialog watchers — so the suite behaves deterministically?
Each Appium test takes eleven minutes because every test reinstalls the app and walks through onboarding. Redesign session and state management — deep links, backdoor APIs, noReset — to cut runtime by 80% safely.
Your iOS tests cannot find elements that are clearly visible, and the accessibility inspector shows missing identifiers everywhere. How do you drive an accessibility-ID convention with developers and enforce it in their pull requests?
Release-eve, a hybrid app's WebView checkout fails on Android 14 only, and switching context to WEBVIEW returns an empty list. Walk through your chromedriver checks, debugging sequence, and escalation decision tonight.
Gesture-heavy features — swipe-to-delete, pinch-zoom on maps — keep flaking across different screen sizes. How do you write resolution-independent gestures with W3C actions instead of the hardcoded coordinates your team currently uses?
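The core of a resolution-independent gesture is expressing positions as fractions of the screen and converting them at run time. The sketch below covers only that coordinate math; `swipe_points` is a hypothetical helper, and feeding the resulting points into a W3C pointer action sequence is left to whichever driver client the suite uses.

```python
def swipe_points(width, height, start=(0.85, 0.5), end=(0.15, 0.5)):
    """Turn fractional (x, y) positions into pixel coordinates for one screen size."""
    def to_px(fraction):
        x, y = fraction
        return round(x * width), round(y * height)

    return to_px(start), to_px(end)
```

The width and height would come from the device's reported window size (or, better, from the bounds of the element being swiped), so the same test runs unchanged on a small phone and a tablet.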
Your team is debating cloud device farms versus an in-house rack: farm tests add four seconds of latency per command and flake more often. Frame the cost, stability, and security trade-offs and make the call.
A test must verify OTP autofill via SMS on both Android and iOS, and real SMS in CI is impossible. Design the seams — a test OTP backdoor, mocked SMS retriever — you would negotiate with developers.
After every Appium or OS upgrade, roughly 20% of your suite breaks for a week. Build the upgrade playbook — pinned versions, a canary lane, capability audits — that makes upgrades boring and predictable.
Push-notification flows are completely untested because nobody on the team knows how to assert on them. Outline how you would verify delivery, tap-through deep links, and badge state across both Android and iOS.
Your app ships in twelve languages, and Arabic RTL layouts visually break almost weekly. Design a localization test strategy combining Appium screenshots, pseudo-localization, and selective manual passes that actually scales with releases.
Battery drain, network-drop, and low-memory scenarios caused your last three production crashes, and none reproduce in the suite. Decide which non-functional mobile tests you automate versus run as exploratory charters on real devices.
Your JMeter test reported 800ms p95 in staging, but production p95 hit four seconds during the Diwali sale. List the modeling errors — think time, payload realism, cache warmth — you would audit first.
The load test "passed," but it was hitting a CDN-cached endpoint while production traffic is 60% cache-miss. Redesign the workload model and cache-busting strategy so your numbers stop lying to stakeholders.
k6 shows throughput plateauing at 400 RPS while the target service's CPU stays low. Walk through how you'd determine whether the bottleneck is the system under test or your own load generator.
Leadership wants a single "can we handle 10x?" answer before a marketing campaign. Design the capacity test — workload mix, ramp profile, abort criteria — and name the caveats you will refuse to drop from the report.
Your nightly k6 run flags a 15% latency regression, but the service team blames noisy neighbors in the shared cluster. How do you make performance results trustworthy enough to gate releases on?
A two-hour soak test passes, but production degrades after six hours as memory creeps until GC thrashes. Rework your endurance test design and the telemetry you would correlate with the k6 metrics.
The login endpoint needs load testing, but at scale it triggers real OTP SMS costs. Design the test seams and stubbing approach that keeps the scenario realistic without burning ₹2 lakh in SMS charges.
Your JMeter scripts model uniform request arrival, but real traffic spikes twenty-fold within two minutes at 10am IST. Argue why spike profiles change the verdict entirely, and explain how you would implement them.
Developers want performance checks in every PR pipeline, but full load tests take 40 minutes. Design a tiered approach — PR-level k6 smoke thresholds versus scheduled full runs — with concrete pass criteria for each tier.
After your load test, staging's database was left with nine million junk rows and the next team's test cycle collapsed. Build the data hygiene and environment-booking protocol for shared performance environments.
Your p99 looks healthy, but users complain checkout freezes; you discover the dashboard averages across all endpoints. How do you restructure k6 metrics — per-flow thresholds, trend tags — to surface the real pain?
A third-party payment sandbox throttles at 50 RPS while production allows 2,000, yet you must load test checkout end-to-end. Design the hybrid approach using service virtualization for the PSP leg, and state what it cannot prove.
Your team treats performance testing as a release-eve ritual that always "passes," and two outages later the CTO asks you to make performance engineering continuous. Lay out your first quarter, milestone by milestone.
Your nightly suite fails 8% of runs, always with different tests, and the team has started ignoring red builds entirely. What does your first week of triage look like to restore signal credibility?
You're given a flaky budget: any test failing more than 2% of runs gets quarantined automatically. Design the detection pipeline, the quarantine workflow, and the SLA that prevents quarantine from becoming a graveyard.
A test fails only when it runs after one specific other test. Walk through how you would prove test-order dependency — bisection, randomized ordering — and the shared-state culprits you would investigate first.
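The bisection step named in this question can be sketched as follows. It assumes a single polluting test and a hypothetical callback, `victim_fails_after`, which runs the given tests followed by the victim in a clean process and reports whether the victim failed.

```python
def find_polluter(candidates, victim_fails_after):
    """Bisect the tests that ran before the victim down to a single polluter.

    Assumes exactly one polluter; returns None if the predecessors
    do not reproduce the failure at all.
    """
    if not victim_fails_after(candidates):
        return None
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # Keep whichever half still makes the victim fail.
        candidates = first if victim_fails_after(first) else second
    return candidates[0]
```

With N predecessors this needs roughly log2(N) reruns instead of N. If two tests must run together to cause the failure, the single-polluter assumption breaks and a delta-debugging style search is needed instead.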
Retries-on-failure got your pipeline green, but two production incidents traced back to bugs that retried tests had intermittently caught. Make the case to leadership for removing retries, and explain what replaces them.
Failures cluster between 2am and 3am IST, and nobody can reproduce them during the day. Which environmental suspects — cron jobs, backup windows, certificate rotations, data refreshes — do you investigate, and how?
Your CI agents are heterogeneous — some have two cores, some eight — and timing-sensitive tests flake only on the slow ones. Decide between fixing the tests, homogenizing infrastructure, or load-aware scheduling, with costs for each.
A teammate "fixes" flaky tests by widening timeouts from five seconds to sixty; runtime doubled and the flakes persist. Explain why this approach fails and the deterministic-wait alternatives you would enforce instead.
Two thousand tests share one staging environment with three other teams deploying to it randomly. Build the isolation roadmap — ephemeral environments, contract seams, data namespacing — and sequence it under real budget pressure.
The same Docker-based pipeline produces different font rendering, locale, and timezone behavior across runs. Standardize the container contract for test reproducibility — what exactly goes into the image specification, and what gets pinned?
Release-eve, the pipeline is red with fourteen failures — mostly suspected flakes, possibly one real defect — and you have two hours to ship or slip. Describe your triage protocol and who makes the final call.
Your flaky-test dashboard lists sixty unstable tests, but developers dispute which failures are environment issues, test bugs, or product bugs. Define the classification taxonomy and the evidence standard you would impose for each label.
Test runtime grew from twelve to fifty-five minutes in a year, and engineers now batch merges to avoid CI. Design the recovery plan — sharding, diff-based selective runs, test impact analysis — with its rollout order.
A datetime bug makes the suite fail every month-end, and the team jokes about "calendar flakes." Eliminate the entire class — clock injection, frozen time, timezone matrices — across 1,500 tests without a big-bang rewrite.
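Clock injection, the first technique listed in this question, means the code under test receives "today" as a parameter instead of reading the system clock. A minimal sketch, with `next_billing_date` as a hypothetical example of month-end logic:

```python
from datetime import date, timedelta


def next_billing_date(today_provider=date.today):
    """Return the first day of next month; the clock is injected, not read globally."""
    today = today_provider()
    first_of_this_month = today.replace(day=1)
    # Jumping 32 days from the 1st always lands somewhere in the following month.
    return (first_of_this_month + timedelta(days=32)).replace(day=1)
```

Production callers keep the default, while tests pass a lambda that returns a fixed date such as 31 January or 31 December. Because the parameter has a default, existing call sites keep working, which is what allows an incremental rollout rather than a big-bang rewrite.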
Every Monday your suite fails because the weekend data-refresh job replaced the staging records your tests depend on. Redesign test-data ownership and seeding so the refresh process and the test suite stop fighting.
Compliance bans production data copies after a DPDP audit, but your regression suite secretly depends on a 2019 prod snapshot. Plan the migration to synthetic data without losing the suite's defect-finding power.
All tests share a single "testuser01" account, so parallel runs corrupt its cart state and orders collide. Design per-test data isolation — factories, unique namespacing — for a suite that cannot get new environments.
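A factory with unique namespacing might look like the sketch below. `UserRecord` and `make_user` are hypothetical names; in a real suite the factory would also create the account through a seeding API, which is not shown here.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    username: str
    email: str
    cart: list = field(default_factory=list)


def make_user(prefix="qa"):
    """Build a user whose identifiers are unique per call, so parallel tests never share state."""
    token = uuid.uuid4().hex[:12]
    return UserRecord(
        username=f"{prefix}-{token}",
        email=f"{prefix}-{token}@example.test",
    )
```

Keeping a recognizable prefix also gives the cleanup job something to match on when it sweeps stale test accounts out of the shared environment.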
Your synthetic data generator produces perfectly clean records, but production bugs come from messy reality: emoji names, 200-character addresses, null middle names. Build that realism into generation without ever copying real users.
An E2E test needs a user with a 90-day-old subscription sitting in grace-period state, and creating it through the UI takes forty steps. Which seams — seeding APIs, time travel, DB fixtures — do you negotiate, and why?
Three teams seed the same staging database with conflicting reference data, so "works in staging" means nothing. Architect a shared seeding contract — versioned baseline, namespaced deltas, reset policy — and how you'd enforce it.
Your fixtures have drifted from the real schema — four columns were added this year — so tests pass against tables production no longer resembles. How do you detect and prevent fixture drift continuously?
A masking job anonymizes production data for testing, but QA discovers the masking on PAN and phone numbers is reversible. Outline the immediate containment, the masking redesign, and the validation tests you'd write for masking itself.
Tests that create orders silently depend on inventory seeded by a different test earlier in the run. Untangle the hidden data coupling and describe the rules that keep every test data-independent going forward.
You need realistic volumes — fifty million rows — to catch index regressions, but full datasets make environment spin-up take six hours. Design tiered datasets and decide precisely which tiers run at which pipeline stage.
Payment tests rely on sandbox magic card numbers that trigger specific PSP responses like insufficient funds and 3DS challenges; the numbers changed without notice and broke thirty tests. How do you harden this external dependency?
Auditors ask you to prove no real customer PII exists anywhere in your test environments, including old database backups and CI artifacts. Describe the discovery sweep and the permanent controls you would institutionalize afterward.
Time-dependent data — expiring offers, ageing invoices — makes 10% of your suite valid only on certain calendar days. Eliminate calendar coupling across both the tests and the seeding layer, and prove the fix holds.
Your E2E suite catches integration breaks three days after merge, so you propose Pact, and developers ask "why not just more API tests?" Make the case using a concrete failure scenario from a microservices system.
A provider team refuses consumer-driven verification, calling it "consumers dictating our API." Design the rollout — broker, can-i-deploy gating, pending pacts — that wins them over both politically and technically.
Your first Pact contract asserts exact values copied from a staging response, and provider verification now fails on every data refresh. Explain matcher-based contracts and how you would refactor toward flexible, type-level matching.
With 25 services and 60 consumer-provider pairs, your Pact broker is full of red verifications nobody owns. Define contract ownership, the breaking-change protocol, and exactly where can-i-deploy sits in the release train.
A provider added a required request field, yet consumer pacts stayed green because no pact covered that endpoint variant. What does this reveal about your contract coverage, and how do you close the gap systematically?
Release-eve, can-i-deploy blocks the payment service because a long-deprecated consumer's pact still references a removed field. Walk through how you resolve it tonight, and the deprecation hygiene you would add afterward.
Your consumer tests mock the provider so loosely that pacts verify trivially, and a real 400-on-empty-list response still broke production. How do you make pact interactions reflect realistic provider behavior and error states?
An external vendor's API cannot run your Pact verifications. Choose between bi-directional contract testing, schema validation against their OpenAPI spec, or scheduled probe tests — and quantify the residual risk each option leaves you carrying.
Teams write pacts but never use provider states, so every verification runs against one happy-path dataset and error contracts go untested. Redesign the provider-state strategy for auth failures, empty results, and 5xx semantics.
Pact verification runs only in a nightly job, so consumers learn about breaking provider changes twelve hours after merge. Re-wire the broker webhooks and pipeline triggers to give commit-time feedback to both sides.
Pact adoption stalled at four of thirty services because writing contracts feels like duplicate effort beside the existing RestAssured suites. Build the incremental adoption strategy and identify which API tests each new pact retires.
A GraphQL federation layer sits between consumers and providers, and field-level deprecations keep breaking mobile clients. Adapt contract-testing principles to GraphQL — persisted queries, schema checks — and be explicit about what Pact cannot do here.
Your full regression runs only nightly, so a morning merge can break main for twenty hours before anyone notices. Redesign which test tiers run at PR, merge, and deploy stages, with explicit time budgets.
The pipeline has fourteen sequential stages and ninety-minute feedback, so teams bypass it with hotfix branches. Re-architect for parallelism and selective test execution, and define precisely what is allowed to bypass what.
Tests pass in CI, but the deployed artifact fails smoke checks in staging because CI tested a different build than what actually deployed. Find the pipeline integrity gaps — artifact promotion, image digests — and close them.
You're asked to gate production deploys on automated tests alone — no manual sign-off — for a payments product. Define the quality gates, canary verification, and rollback automation you would require before agreeing.
Secrets for the test environments live in plaintext pipeline variables, and a contractor with pipeline access just left the company. Walk through the immediate rotation and the secrets architecture you would migrate the pipeline to.
Three squads share one Jenkins instance, and one squad's two-gigabyte artifact uploads starve everyone's builds during release week. Decide between governance, queue priorities, or migrating to per-team runners — and sketch the migration plan.
Your E2E tests need a deployed environment, but environment creation takes thirty minutes, making PR feedback glacial. Evaluate ephemeral preview environments against a pooled-environment checkout model for your stack, and pick one.
An intermittent CI infrastructure failure — DNS resolution inside the runner network — burned three days being misdiagnosed as test flakiness. Build the observability that automatically distinguishes infra failures from genuine test failures.
Developers merge with "skip-tests" labels during crunch, and the label was used forty-seven times last sprint. What pipeline policy, audit trail, and cultural fix do you propose without becoming the process police?
Your monorepo pipeline runs all 8,000 tests on every commit. Implement test-impact analysis — diff-based selection, dependency graphs — and define the safety nets that catch the cases where selection guesses wrong.
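Diff-based selection over a dependency graph can be sketched as a reverse-dependency walk. The inputs here are assumptions: `depends_on` is a prebuilt module import graph and `tests_for` maps each module to the tests that exercise it directly. The sketch includes one safety net from the question: any changed path the graph does not know about triggers the full suite.

```python
from collections import defaultdict, deque


def select_tests(changed, depends_on, tests_for):
    """Pick tests reachable from the changed modules via reverse dependencies.

    changed:    modules touched by the diff
    depends_on: module -> set of modules it imports (hypothetical prebuilt graph)
    tests_for:  module -> set of test ids that exercise it directly
    """
    known = set(depends_on) | set(tests_for)
    for imports in depends_on.values():
        known |= imports
    all_tests = set().union(*tests_for.values()) if tests_for else set()

    # Safety net: a change the graph cannot explain triggers the full suite.
    if any(module not in known for module in changed):
        return all_tests

    dependents = defaultdict(set)
    for module, imports in depends_on.items():
        for imported in imports:
            dependents[imported].add(module)

    affected, queue = set(changed), deque(changed)
    while queue:
        for parent in dependents[queue.popleft()]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)

    return set().union(*(tests_for.get(m, set()) for m in affected))
```

Other safety nets worth discussing in an answer, such as a scheduled full run and a sampled comparison of selected versus full results, sit outside this function.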
A deploy went out while the nightly suite was mid-run against the same environment, and both failed in confusing ways. Design the environment locking and pipeline orchestration that stops deploys and test runs from interleaving.
Auditors require evidence that every production release passed its defined tests, traceable for two years. Retrofit your GitHub Actions pipeline with signed attestations and immutable test reports without slowing the release cadence.
Your Java test framework swallows exceptions in a catch-all block and logs "test failed" with no stack trace, so debugging takes hours. Refactor the error-handling and reporting layers — what specifically changes, and where?
The framework's God-class TestUtils has 4,000 lines, every test imports it, and merge conflicts are constant. Plan the decomposition — package boundaries, dependency direction — without freezing ongoing test development for the team.
A teammate's pytest fixtures perform network calls at import time, so collecting tests takes ninety seconds even when running a single test. Diagnose the anti-pattern and restructure the fixtures with proper scoping and laziness.
Your Java suite runs tests in parallel but shares a static WebDriver field, so parallel runs interleave browser commands across tests. Explain the failure mode precisely and redesign it with ThreadLocal or dependency injection done properly.
Flaky assertions compare floating-point numbers for exact equality and full JSON strings byte-for-byte. Establish the assertion guidelines — tolerances, partial matching, canonical ordering — and describe the shared helper layer you would build to enforce them.
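The three guidelines named in this question map onto three small helpers. This is a sketch of what a shared helper layer could contain; the function names and default tolerances are assumptions to be tuned per domain (money, for instance, is usually better served by decimals than by float tolerances).

```python
import json
import math


def assert_close(actual, expected, rel_tol=1e-9, abs_tol=1e-6):
    """Compare floats within a tolerance instead of exact equality."""
    assert math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol), (
        f"{actual!r} not within tolerance of {expected!r}"
    )


def canonical_json(text):
    """Re-serialize JSON with sorted keys so key order and whitespace do not matter.

    Array order is preserved, because it is often semantically meaningful.
    """
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))


def assert_json_contains(actual, expected_subset):
    """Partial match: every key in expected_subset must match; extra keys are allowed."""
    for key, expected in expected_subset.items():
        assert key in actual, f"missing key {key!r}"
        if isinstance(expected, dict):
            assert_json_contains(actual[key], expected)
        else:
            assert actual[key] == expected, f"{key!r}: {actual[key]!r} != {expected!r}"
```

Tests would then call these helpers instead of using `==` on floats or raw response strings, and a lint rule or review checklist can flag direct comparisons that bypass them.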
Your Python automation gets reviewed only on "does it pass" — no typing, no docstrings, dead code everywhere. Institute engineering standards for test code — mypy, ruff, coverage of helpers — without grinding delivery to a halt.
A retry decorator copied from Stack Overflow retries on every exception, including assertion failures, and hid real bugs for weeks. Rewrite the retry semantics — what is legitimately retryable, and what must never be?
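A rewritten decorator along the lines this question asks for might look like the sketch below. The allow-list `RETRYABLE` is an assumption about what the team treats as transient; the key property is that anything not listed, including `AssertionError`, propagates on the first attempt.

```python
import functools
import time

# Assumption: these are the only failure types the team treats as transient.
RETRYABLE = (ConnectionError, TimeoutError)


def retry_transient(attempts=3, delay_seconds=0.0, retry_on=RETRYABLE):
    """Retry only on explicitly listed transient errors; everything else propagates."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == attempts:
                        raise  # budget exhausted: surface the real error
                    time.sleep(delay_seconds)
        return wrapper
    return decorator
```

An explicit allow-list is the design choice that matters here: a deny-list ("retry everything except assertions") still hides unexpected exception types, whereas an allow-list fails loudly by default. Logging each retry would keep the intermittent failures visible rather than silently absorbed.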
Your automation depends on five-year-old library versions; a CVE forces upgrades, and 200 tests break on RestAssured's new defaults. Run the upgrade — isolation strategy, characterization tests, sequencing — while the suite keeps gating releases.
Test code duplicates the application's business logic to compute expected values, so bugs pass silently when both share the same flawed formula. How do you choose independent oracles for your assertions instead?
You need one framework serving Selenium UI tests, API tests, and Kafka event assertions, but the current one hardcodes WebDriver into the base test class. Redesign the layering so channels become pluggable, not inherited.
Memory profiling shows your JVM test runner leaking — listeners accumulate per test class — and the 6,000-test run dies with OutOfMemoryError around test 4,800. Walk through the diagnosis and the lifecycle fixes you would apply.
Half the team writes Java, half writes Python, and you maintain two parallel frameworks with diverging behavior. Define consolidation criteria, the migration approach, what you'd preserve from each — or defend deliberately keeping both.
Your page objects return raw WebElements, and tests perform clicks and waits directly, so every UI change touches dozens of tests. Refactor toward action-returning methods and explain the boundary you would enforce in review.
The framework's BasePage exposes sixty inherited methods across four levels of inheritance, and writing a new test requires archaeology. Re-architect with composition — components, widgets — and plan the migration path for 800 existing tests.
Two squads model the same checkout page with separate page objects, and fixes never propagate between them. Establish shared component ownership and the packaging strategy that lets both squads consume one implementation safely.
Your POM framework hides all waits inside page objects, so a serious performance regression became invisible because pages "helpfully" waited thirty seconds. Redesign wait ownership so tests can still assert on responsiveness.
Assertions have leaked into page-object methods like verifyLoginSuccessful, making the pages unusable for negative testing. Articulate the page-versus-test responsibility split and describe how you would refactor the worst offenders first.
A new design system gives every component — dropdowns, modals, tables — consistent DOM patterns across all pages. Exploit this: design the component-object layer that replaces per-page duplication, and define its versioning with the frontend team.
Locators live as string literals scattered across page objects, tests, and helpers, and a header redesign required edits to ninety files. Centralize the locator strategy and define exactly where locators may legally live.
Your framework must support web, mobile-web, and the Appium native app, with roughly 70% shared user flows. Design the abstraction — shared screen interfaces, platform-specific implementations — and admit where it will inevitably leak.
Onboarding a new SDET to your framework takes six weeks, and writing one test requires understanding eleven base classes. You get one sprint to improve developer experience — what do you change first, and why?
Tests instantiate page objects directly with the new operator everywhere, so switching browser contexts or capabilities means editing constructor calls across the whole suite. Introduce a creational or dependency-injection pattern appropriate for a test framework, and acknowledge its limits.
Your fluent page-object chains look elegant in examples but produce unreadable stack traces and impossible debugging when a mid-chain step fails. Critique fluent design in test frameworks and define the alternative you would adopt.
Leadership wants a "codeless" keyword layer on top of your POM framework so manual testers can contribute. Assess what a DSL layer really costs to maintain, and state the contract you would demand before building it.
You receive the release scope on Thursday for a Friday release: fourteen changes and no time for full regression. Walk me through how you would rank what gets tested, and which risk signals drive the ranking.
Your regression suite is green, and you have exactly one day of exploratory budget before a major release. Design the charter set targeting where automation is blind, and explain how you'd staff and debrief the sessions.
A payments module hasn't produced a production defect in two years, while a low-traffic admin tool generates them monthly, yet the test plan spends equal effort on both. Rebalance it and justify the cuts.
Product, support, and engineering each hand you conflicting risk rankings for the release. Build the risk model — impact, likelihood, detectability, recency of change — that merges their signals into a defensible test priority order.
During an exploratory session you found a serious defect, but you cannot reproduce it and your notes are thin. What changes to your session discipline — recording, logging, charter notes — prevent this loss next time?
Your team automated everything and stopped exploring, and the last four production incidents were all valid-but-absurd user behaviors no test modeled. Reintroduce exploratory practice into a sprint cadence that actively resists unscripted work.
A feature spec is three bullet points, the developer is on leave, and QA is blocked "waiting for requirements." How do you test productively against missing specs, and what artifacts do you produce along the way?
Release-eve triage: a sev-2 defect in an edge-case flow surfaces at 8pm; fixing it risks regression, shipping it risks customer impact. Walk through the risk assessment you would present and the recommendation you'd make.
Your testing concentrates where bugs were found historically, but those modules are now over-hardened while new code ships lightly tested. When does defect clustering mislead you, and how do you rebalance the effort?
The organization wants every test case mapped to a requirement for audit purposes, but your best defects come from unscripted exploration that maps to nothing. Reconcile the traceability demand with the value of exploratory work.
You're asked to quantify how much risk remains if the release ships today, and coverage percentage means nothing to the VP. Construct the risk-burndown narrative — tested and untested areas weighted by impact — you would present.
After a sev-1 escape, leadership demands you "test everything from now on." Use the incident's own data to show why that is the wrong lesson, and propose the targeted alternative you'd commit to instead.
During API testing you notice order IDs are sequential integers, and changing the ID in a GET returns another customer's order. Write up the IDOR finding — severity, reproduction, and the regression test you would add.
Your company has no AppSec team, and the CTO asks QA to "cover security." Define the realistic scope — OWASP-top-10 smoke checks, dependency scanning, auth abuse tests — and what you would refuse without specialists.
A search box reflects user input into results without encoding, and your script-tag payload executes in the browser. Beyond reporting the XSS, how do you build output-encoding checks into your automated regression suite?
Pen-testers found thirty issues six months ago, the report is now a stale PDF, and half the issues have silently regressed. Convert pen-test findings into automated security regression tests, and identify which genuinely cannot be automated.
Your login tests confirm the happy path, but you've never tested account lockout, password-reset token reuse, or session invalidation after logout. Build the authentication abuse-case checklist you would fold into the regression suite.
A teammate wired OWASP ZAP into the pipeline; it adds thirty-five minutes and produces four hundred findings, of which roughly three hundred ninety are noise. Tune the integration — baseline scans, context, alert filters — so it gates usefully.
An API returns full user objects including password hashes and internal flags, and developers argue the frontend "only displays some fields." Make the over-exposure case and describe the response-shape tests you would write.
Your e-commerce promo-code endpoint can be brute-forced: no rate limiting, predictable code formats. Design the abuse-case test suite for business-logic security holes that automated scanners will never find on their own.
You notice one internal service accepts a JWT signed with alg=none while every other service rejects it. Walk through verifying the misconfiguration safely in staging and the regression tests you would add afterward.
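As a starting point for the verification step, here is a minimal sketch (not a model answer) of building an unsigned probe token with only the Python standard library. The function names and the `qa-probe` subject are hypothetical; the token layout (base64url header, base64url payload, empty signature) follows the JWT compact format.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT compact format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def forge_unsigned_jwt(claims: dict) -> str:
    """Build a token whose header declares alg=none and whose signature is empty."""
    header = {"alg": "none", "typ": "JWT"}
    head = b64url(json.dumps(header).encode("utf-8"))
    body = b64url(json.dumps(claims).encode("utf-8"))
    return f"{head}.{body}."


def declared_alg(token: str) -> str:
    """Read the alg field from a token header without verifying anything."""
    header_b64 = token.split(".")[0]
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))["alg"]


# Hypothetical probe identity; use a dedicated, low-privilege test subject.
probe = forge_unsigned_jwt({"sub": "qa-probe"})
```

A regression test would send `probe` to each service in staging and assert a rejection (typically HTTP 401), so the one service that accepts it fails the check until the misconfiguration is fixed.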
Compliance demands "security testing evidence" every release, and the team currently responds with a screenshot of a green scan. Define a security test strategy with meaningful gates across SAST, DAST, dependency checks, and manual layers.
A file-upload feature accepts resumes, and you suspect path traversal and content-type spoofing risks. Design the malicious-upload test matrix and describe the sandboxed environment you would insist on running it in.
Secrets keep appearing in test code — API keys were committed twice this quarter despite written policy. Implement the technical controls — pre-commit scanning, vault integration in test config, honeytokens — and design the incident drill.
The UI shows an order as delivered, but you suspect the status is derived in the API layer rather than stored. How do you validate UI-versus-database truth, and which discrepancies actually matter to report?
A nightly ETL moves forty million rows from OLTP to the warehouse, and finance reports drifted by ₹3 crore last quarter. Design the reconciliation test suite — row counts, checksums, sampled field-level diffs — and its alerting.
Your test asserts on a row immediately after the API call, but replication lag makes it intermittently absent on the read replica. Restructure the validation to handle eventual consistency honestly rather than padding sleeps.
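One possible shape for the restructured check, as a minimal sketch under stated assumptions: poll the replica against an explicit consistency budget and fail loudly when the budget is exceeded. `wait_until` and `FakeReplica` are hypothetical names used only for illustration; a real suite would probe the actual read replica.

```python
import time


def wait_until(probe, timeout_s=5.0, interval_s=0.1):
    """Call probe() until it returns a truthy value or the deadline passes.

    Returns the truthy value. Raises AssertionError on timeout so the test
    reports a breached consistency budget instead of a vague missing row.
    """
    deadline = time.monotonic() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise AssertionError(
                f"row not visible after {timeout_s}s ({attempts} attempts)"
            )
        time.sleep(interval_s)


class FakeReplica:
    """Hypothetical lagging replica: the row becomes visible on the Nth read."""

    def __init__(self, visible_after):
        self.reads = 0
        self.visible_after = visible_after

    def find_order(self, order_id):
        self.reads += 1
        return {"id": order_id} if self.reads >= self.visible_after else None
```

The timeout then doubles as a documented replication-lag budget: a test that passes only near the deadline is itself a signal worth tracking.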
A schema migration adds NOT NULL to a column containing legacy nulls; staging migrated cleanly because its data is synthetic. Build the migration-testing practice — production-shaped data, rollback rehearsal — that catches this entire class of failure.
Developers ask you to verify a new soft-delete feature end to end. Which SQL checks would you run — flag consistency, cascade behavior, unique-constraint collisions with deleted rows, restore paths — and in what order?
Your suite runs raw SQL assertions against table internals, so every schema refactor breaks hundreds of tests despite correct behavior. Redefine the database validation boundary — what should tests legally observe, and through which interfaces?
An order total in the database does not match the sum of its line items for 0.3% of rows. Walk through how you would characterize the corruption — when introduced, by which write path — using SQL alone.
A double-booking bug shipped: two users reserved the same slot under concurrent load. Design the database-level tests for race conditions — isolation levels, constraint verification, a concurrent test harness — you would now require before release.
Your team validates a financial report by re-running the very SQL the report itself uses, so the test can never fail. Construct independent SQL oracles for report validation and explain the underlying principle to a junior.
Timestamps are stored inconsistently — some UTC, some IST — after years of mixed code paths. Design the audit queries and the ongoing validation suite that establish datetime integrity across sixty tables without halting feature work.
A privacy deletion request must remove a user across fourteen tables, message queues, and a search index. Build the verification suite that proves deletion completeness, including the storage locations engineers always forget.
Read-heavy queries got ten times slower after an ORM upgrade, yet every functional test stayed green. Add database-level regression checks — query plans, index usage, N+1 detection — into the pipeline without drowning in noise.
Your product added an LLM-powered chat assistant, and the same question yields different answers on every run, making exact-match assertions useless. Design your first non-deterministic test strategy — what exactly do you assert on?
Product wants a quality gate for the AI feature before each release, but "the answers feel worse" is the only signal today. Build the evaluation pipeline — golden datasets, rubric scoring, LLM-as-judge — and name its failure modes.
The recommendation engine's output changed after a model retrain, and forty snapshot tests broke even though the recommendations arguably improved. Rethink what regression testing means when the model is supposed to drift over time.
Your AI summarizer occasionally fabricates plausible-looking figures, and one invented number reached a customer report. Design the hallucination test harness — grounding checks, source attribution validation — and be honest about what it cannot catch.
A fraud-scoring model gates checkout, and your tests only cover score thresholds with clean synthetic profiles. Which boundary, drift, and bias scenarios are missing, and how would you source realistic cases to cover them?
The team wants to gate releases on an LLM-as-judge evaluation, but you've caught the judge scoring identical outputs differently across runs. Quantify and control the judge's instability before you allow anyone to trust the gate.
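One way to start quantifying the instability, as a rough sketch rather than a complete method: re-score identical outputs several times and measure the per-item spread. The function names, the score scale, and the threshold are all assumptions for illustration.

```python
from statistics import pstdev


def judge_instability(scores_by_item):
    """Map each item to the population standard deviation of its judge scores.

    scores_by_item: {item_id: [score from run 1, score from run 2, ...]},
    where every run scored the exact same output.
    """
    return {item: pstdev(scores) for item, scores in scores_by_item.items()}


def unstable_items(scores_by_item, max_stdev):
    """Return the item ids whose score spread exceeds the allowed threshold."""
    spread = judge_instability(scores_by_item)
    return sorted(item for item, value in spread.items() if value > max_stdev)
```

Items flagged here can be excluded from the gate, re-judged with majority voting, or used to tighten the rubric before anyone relies on the scores.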
Prompts live hardcoded in application code, and every prompt tweak ships untested. Establish prompt change management — versioning, eval-before-merge, canary cohorts — sized appropriately for a small team without a dedicated ML platform.
Your chatbot passed all evaluations, then a production user extracted the system prompt and made it promise refunds. Design adversarial test coverage — injection corpora, jailbreak regression packs — and the process that keeps them current.
Latency for the AI feature swings from 800 milliseconds to fifteen seconds depending on token count, and your performance thresholds keep flapping. Restructure performance testing for token-dependent, streaming responses with meaningful pass criteria.
A vendor model upgrade — forced, with thirty days' notice — changed tone and refusal behavior across your product. Build the model-migration test playbook you will reuse every time a provider deprecates a version you depend on.
Your search ranking now blends ML scores with business rules, and relevance regressions surface only through support tickets. Construct the offline relevance evaluation — judgment lists, NDCG budgets — that catches degradation before release, not after.
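For reference, a minimal sketch of the NDCG computation such an evaluation rests on. It uses the linear-gain variant (graded relevance divided by log2 of rank plus one); the function names and the sample judgment list are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with linear gains; rank 1 has discount log2(2)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg_at_k(ranked_ids, judgments, k):
    """NDCG@k for one query.

    ranked_ids: document ids in the order the ranker returned them.
    judgments: {doc_id: graded relevance}; unjudged documents count as 0.
    """
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def within_budget(candidate_ndcg, baseline_ndcg, max_drop):
    """Gate: the candidate may not fall more than max_drop below the baseline."""
    return baseline_ndcg - candidate_ndcg <= max_drop
```

In practice the score would be averaged over the whole judgment list per release candidate, with the allowed drop agreed with product in advance.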
Leadership asks "is the AI feature tested?" expecting a yes or no. Frame the honest answer: the statistical confidence model, sampled evaluation coverage, and the residual-risk statement you would put in the release notes.