What Is Chaos Engineering?
Chaos engineering is the discipline of deliberately injecting controlled failures into a system in production or pre-production to proactively discover weaknesses before they cause unplanned outages.
In Depth
Pioneered by Netflix with Chaos Monkey (which randomly terminates production instances), chaos engineering has evolved into a rigorous practice. The process follows the scientific method: form a hypothesis about how the system should behave under stress ("if we kill one database replica, the app should fail over to another with no user-visible errors"), design an experiment that injects the failure, run the experiment while monitoring impact, and analyze results to identify weaknesses.
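The four steps above can be sketched as a small experiment harness. This is a minimal illustration, not the API of any particular chaos tool: `run_experiment`, `ExperimentResult`, and the toy `replicas` system are all hypothetical names introduced here, and the steady-state check is reduced to a single boolean function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExperimentResult:
    hypothesis: str
    steady_before: bool
    steady_during: bool

    @property
    def weakness_found(self) -> bool:
        # A failed check only counts if the system was healthy to begin with.
        return self.steady_before and not self.steady_during


def run_experiment(
    hypothesis: str,
    steady_state: Callable[[], bool],
    inject: Callable[[], None],
    rollback: Callable[[], None],
) -> ExperimentResult:
    """Check the baseline, inject the failure, observe, and always roll back."""
    if not steady_state():
        # Abort: do not inject failure into a system that is already unhealthy.
        return ExperimentResult(hypothesis, False, False)
    inject()
    try:
        steady_during = steady_state()
    finally:
        rollback()  # Restore the system even if the observation step raises.
    return ExperimentResult(hypothesis, True, steady_during)


# Toy system (hypothetical): the app is healthy while at least one replica is up.
replicas = {"db-1": True, "db-2": True}

result = run_experiment(
    hypothesis="Killing one database replica causes no user-visible errors",
    steady_state=lambda: any(replicas.values()),
    inject=lambda: replicas.update({"db-1": False}),
    rollback=lambda: replicas.update({"db-1": True}),
)
print(result.weakness_found)
```

In a real experiment the steady-state function would query production metrics such as error rate or latency percentiles, and the injection step would be delegated to a fault-injection tool rather than a lambda.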
Common chaos experiments include terminating server instances, injecting network latency or packet loss, exhausting CPU or memory, forcing database failovers, and breaking DNS resolution. Tools like Gremlin, Chaos Mesh (for Kubernetes), and AWS Fault Injection Service (formerly AWS Fault Injection Simulator) provide controlled, safe ways to run these experiments.
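To make one of these experiments concrete, here is a rough application-level sketch of latency injection. Tools such as Gremlin or Chaos Mesh typically inject faults at the network or infrastructure layer; the decorator below only approximates the effect inside a single process, and `with_latency` and `fetch_price` are hypothetical names used for illustration.

```python
import functools
import random
import time
from typing import Callable


def with_latency(
    delay_s: float,
    probability: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Decorator that delays a call by delay_s seconds with the given probability."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # random.random() is in [0.0, 1.0), so probability=1.0 always delays
            # and probability=0.0 never does.
            if random.random() < probability:
                sleep(delay_s)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@with_latency(delay_s=0.05)
def fetch_price(item_id: str) -> float:
    # Stand-in for a call to a downstream service.
    return 9.99


price = fetch_price("sku-1")
```

The `sleep` parameter is injectable so that tests can record the requested delay instead of actually waiting.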
QA engineers contribute to chaos engineering by defining steady-state hypotheses (what "normal" looks like in metrics), designing experiment scenarios based on production failure patterns, monitoring dashboards during experiments, and verifying that graceful degradation and error handling work as designed. Chaos engineering extends QA from "does the feature work?" to "does the system survive?"
Why Interviewers Ask About This
Chaos engineering is an advanced topic that signals senior-level thinking. Interviewers ask about it to see whether you think about reliability and resilience, not just functional correctness.
Example Scenario
During a game day, the team injects 500 ms of latency into calls to the payment service. The hypothesis is that the checkout flow will show a "processing" spinner and still complete within 10 seconds. Instead, the frontend times out at 3 seconds and shows a blank error page. To fix this, the team adds a retry mechanism with exponential backoff and replaces the blank page with a user-friendly timeout message.
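The retry fix from this scenario might look roughly like the sketch below. It assumes the payment call raises `TimeoutError` when it is too slow and that the call is idempotent, so repeating it cannot charge the customer twice. `call_with_backoff` and `flaky_payment` are hypothetical names, and the default delays are illustrative rather than taken from the scenario.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on TimeoutError with delays of base, 2*base, 4*base, ..."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return func()
        except TimeoutError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # Out of retries: let the caller show a timeout message.
            sleep(base_delay_s * 2 ** (attempt - 1))


# Simulated payment call that times out twice, then succeeds.
calls = {"count": 0}


def flaky_payment() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("payment service too slow")
    return "paid"


delays: list[float] = []
status = call_with_backoff(flaky_payment, sleep=delays.append)
print(status, delays)
```

Production implementations usually add random jitter to each delay so that many clients do not retry in lockstep, and cap the total wait so it stays within the user-facing time budget.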
Interview Tip
Describe the scientific method behind chaos engineering: hypothesize, experiment, observe, learn. Mention a specific tool and explain how QA fits into the process. Avoid implying that chaos testing is random destruction.