What is Flaky Test?

A flaky test is an automated test that produces inconsistent pass/fail results across multiple runs without any changes to the source code or test itself.

Start a Free Mock Interview

Free to start · 7-day trial on paid plans

IN DEPTH

In depth.

Flaky tests are one of the most insidious problems in test automation. They erode team confidence in the test suite because engineers stop trusting failures as genuine signals. When a test passes on one run and fails on the next with identical code, the usual culprits are race conditions, shared mutable state between tests, dependency on external services, non-deterministic data ordering, or timing-sensitive assertions that do not account for asynchronous behavior.

The real cost of flakiness is not the debugging time for a single failure but the cultural damage. Teams start reflexively re-running pipelines instead of investigating failures, which masks real regressions. In large organizations, flaky tests can waste hundreds of compute hours per week and significantly slow release velocity.

Effective strategies include quarantining known flaky tests into a separate suite, adding retry annotations with investigation tickets, using deterministic waits instead of arbitrary sleeps, isolating test data per run, and tracking flakiness rates over time with tools like a test analytics dashboard.

Root causes fall into a handful of repeatable categories. Race conditions and async timing issues top the list: the test asserts before a network call, animation, or background job has completed, so the outcome depends on machine speed. Shared mutable state comes next: tests that pass alone but fail in the suite are usually leaking data through a shared database, static variable, or filesystem. Test-order dependence is the same disease in another form, where test B silently relies on state test A created, so any reordering or parallelization breaks it. Finally, dependence on external services (real third-party APIs, sandboxes, DNS, system clocks) imports someone else's instability into your suite.

The proven operational playbook is quarantine plus retry-with-tracking. When a test is confirmed flaky, move it to a quarantined suite that still runs but does not block the pipeline, open a ticket with the captured traces, and set a deadline to fix it or delete it. In the main suite, allow a bounded automatic retry (one or two attempts) but record every retried-then-passed run as a flake event in a dashboard so per-test flakiness rates are visible and trending downward. The key discipline is treating a retried pass as data, not as a clean pass.

Frameworks give you concrete levers. In JUnit 5, @RepeatedTest hammers a suspect test locally to reproduce the flake, libraries like rerunner-jupiter or Maven Surefire's rerunFailingTestsCount add CI retries, and running methods in random order with a logged seed deliberately flushes out order dependence. Playwright has first-class support: retries: 2 in playwright.config.ts, retried-then-passed tests marked "flaky" in the HTML report, and trace: 'on-first-retry' so the trace viewer replays the exact failing run with DOM snapshots and network logs. The caution is that blind retries without tracking hide real bugs: a genuine race condition in production code will also pass on retry, quietly converting a customer-facing defect into green CI.

WHY IT MATTERS

Why interviewers ask about this.

Interviewers ask about flaky tests to gauge whether you have real-world automation experience. Anyone can write a happy-path test; handling flakiness demonstrates you understand the operational side of test infrastructure.

EXAMPLE

Example scenario.

A Selenium test checks that a toast notification appears after form submission. It passes locally but fails in CI 30% of the time because the CI environment is slower and the toast animation has not completed before the assertion fires. The fix is replacing a hardcoded sleep with an explicit wait that polls for the toast element to be visible.

TIP

Interview tip.

When discussing flaky tests, describe a specific flaky test you triaged. Walk through the symptoms, your debugging process, and the root cause. Interviewers value the investigation story more than the fix itself.

FAQ

Frequently asked questions.

What is a flaky test in simple terms?

A flaky test is one that sometimes passes and sometimes fails with no change to the application code or the test itself. The same commit can produce green and red runs, which means the failure signal cannot be trusted. Flakiness is a property of the test plus its environment (timing, state, dependencies), not of the feature under test.

What causes flaky tests?

The most common root causes are race conditions and async timing (asserting before the app finishes work), hardcoded sleeps that are too short on slow CI machines, shared state between tests, test-order dependence, non-deterministic data such as dates and random values, and reliance on external services or networks. Most flakes trace back to timing or isolation.

What is the best way to handle flaky tests?

Quarantine confirmed flaky tests into a non-blocking suite with a ticket and a fix-or-delete deadline, allow at most one or two tracked retries in the main pipeline, and fix root causes with deterministic waits, per-test data isolation, and hermetic dependencies. Track per-test flakiness rates over time so the problem is measured, not anecdotal.

How do you handle flaky tests in JUnit?

Reproduce the flake locally with @RepeatedTest to run it dozens of times, add bounded CI retries with Maven Surefire's rerunFailingTestsCount or the rerunner-jupiter library, and expose hidden order dependence by running methods in random order with a logged seed. Then fix the root cause: Awaitility for async assertions and Testcontainers for isolated infrastructure solve most JUnit flakiness.

Should you automatically retry flaky tests?

Bounded retries are acceptable only when every retried pass is recorded and reviewed, the way Playwright's built-in retries mark tests as flaky in the report. Blind retries are dangerous because a real race condition in production code also passes on retry, so the retry policy can hide a customer-facing bug behind green CI.