Smoke Testing vs Sanity Testing: What's the Difference and When to Use Each

Smoke testing checks whether a build is stable enough to test. Sanity testing checks whether a specific fix actually worked. This guide covers the real difference, with a concrete mobile example showing both in sequence.
Posted on: May 15, 2026
Read time: 10 minutes

These two get confused constantly. Here is the answer.

Smoke testing asks: "Is this build stable enough to test at all?" It's a broad, shallow check of the app's core functions right after a new build drops. Does the app launch? Can you log in? Does the home screen load? If any of these fail, you stop. There's no point running deeper tests on a build that can't even start.

Sanity testing asks: "Did the specific fix or change we just made actually work?" It's a narrow, focused check that happens after smoke testing passes. Say a developer fixed a checkout bug. Sanity testing runs the checkout flow to confirm the fix works and didn't break anything adjacent. It doesn't retest the entire app. It retests the area that changed.

Smoke testing is the gate. Sanity testing is the targeted follow-up.

The same build, both tests: a concrete example

The difference is clearest when you see both applied to the same build in sequence.

The situation: Your team ships a new build of a food delivery app. The build includes a fix for a bug where promo codes weren't applying at checkout. It also includes a minor redesign of the order tracking screen.

Smoke testing (first). A tester (or an automated suite) runs through the core flows: launch the app, sign up with a new account, log in with an existing account, browse restaurants, add an item to the cart, go to checkout, enter payment, complete the order, view order history. This takes 10-15 minutes. The goal isn't to test deeply. The goal is to confirm that the build is stable and the main paths work.
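For a sense of scale, here's what that smoke pass can look like in code. This is a minimal sketch using the Appium Python client and pytest, not a definitive implementation: the app package, activity, and every accessibility id below are hypothetical stand-ins for the food delivery app in this example.

```python
# Minimal smoke-suite sketch (Appium Python client + pytest).
# The app package, activity, and all accessibility ids ("home_screen",
# "login_button", etc.) are hypothetical stand-ins for the example app.
import pytest
from appium import webdriver
from appium.options.android import UiAutomator2Options
from appium.webdriver.common.appiumby import AppiumBy

@pytest.fixture(scope="module")
def driver():
    options = UiAutomator2Options()
    options.app_package = "com.example.fooddelivery"  # hypothetical package
    options.app_activity = ".MainActivity"            # hypothetical activity
    drv = webdriver.Remote("http://localhost:4723", options=options)
    drv.implicitly_wait(10)
    yield drv
    drv.quit()

def test_launch(driver):
    # Shallow check: does the home screen render at all?
    assert driver.find_element(AppiumBy.ACCESSIBILITY_ID, "home_screen")

def test_login(driver):
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "login_button").click()
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "email_field").send_keys("qa@example.com")
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "password_field").send_keys("not-a-real-password")
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "submit_login").click()
    assert driver.find_element(AppiumBy.ACCESSIBILITY_ID, "home_screen")

def test_order_flow(driver):
    # Browse, add to cart, check out: the step that crashed in the first build.
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "first_restaurant").click()
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "add_to_cart").click()
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "checkout").click()
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "place_order").click()
    assert driver.find_element(AppiumBy.ACCESSIBILITY_ID, "order_confirmation")
```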

Result: the app launches fine, login works, browsing works, adding to cart works, but the checkout screen crashes when the tester taps "Place Order." The build fails smoke testing. It goes back to development. Nobody runs sanity testing, regression testing, or any other testing, because the build is too broken for further work.

New build drops (bug fixed). The developer finds the crash (a null pointer in the payment module), fixes it, and ships a new build.

Smoke testing (again). Same core flows. This time, everything passes. The app launches, login works, checkout completes, order history loads. The build is stable. Smoke testing passes.

Sanity testing (now). The build had two changes: the promo code fix and the tracking screen redesign. Sanity testing focuses on those two areas.

For the promo code fix: the tester applies a code at checkout, confirms the discount shows, completes the order, and verifies the discounted total on the confirmation screen. Then they try an expired code, an invalid code, and a code with a minimum order requirement that isn't met. The fix works. Edge cases are handled.
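As a sketch, that sanity check can be expressed as a parametrized test reusing the hypothetical Appium setup above (e.g., with the driver fixture shared via a conftest.py). The promo codes and banner messages here are illustrative, not real values.

```python
# Sanity-check sketch for the promo code fix. Accessibility ids, codes, and
# expected banner messages are illustrative; the driver fixture is assumed to
# come from the smoke suite above via a shared conftest.py.
import pytest
from appium.webdriver.common.appiumby import AppiumBy

def apply_promo(driver, code):
    # Assumes the flow is already at the checkout screen.
    field = driver.find_element(AppiumBy.ACCESSIBILITY_ID, "promo_code_field")
    field.clear()
    field.send_keys(code)
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "apply_promo").click()
    return driver.find_element(AppiumBy.ACCESSIBILITY_ID, "promo_banner").text

@pytest.mark.parametrize("code,expected_message", [
    ("FIRST50", "Discount applied"),          # the fixed happy path
    ("EXPIRED10", "This code has expired"),   # edge case: expired code
    ("NOTACODE", "Invalid promo code"),       # edge case: invalid code
    ("BIGORDER20", "Minimum order not met"),  # edge case: minimum not reached
])
def test_promo_codes(driver, code, expected_message):
    # Narrow but deep: only the changed checkout area, with its edge cases.
    assert expected_message in apply_promo(driver, code)
```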

For the tracking screen redesign: the tester places an order, navigates to the tracking screen, confirms the map loads, the order status updates, and the ETA displays correctly. The redesign looks right.
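The companion check for the redesign is just as short. Same caveats: these ids are hypothetical.

```python
# Companion sanity check for the tracking screen redesign (same hypothetical
# ids and shared driver fixture as above).
from appium.webdriver.common.appiumby import AppiumBy

def test_tracking_screen(driver):
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "order_history").click()
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "latest_order").click()
    # The three things the example verifies: map, live status, ETA.
    assert driver.find_element(AppiumBy.ACCESSIBILITY_ID, "tracking_map").is_displayed()
    assert driver.find_element(AppiumBy.ACCESSIBILITY_ID, "order_status").text
    assert driver.find_element(AppiumBy.ACCESSIBILITY_ID, "eta_label").text
```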

Sanity testing passes. Now the build moves to full regression testing, performance testing, and the rest of the QA cycle.

That's the sequence: smoke first (is the build stable?), sanity second (do the specific changes work?), deeper testing third (does everything else still work?).

The difference, plainly

| | Smoke testing | Sanity testing |
|---|---|---|
| When | Right after a new build | After smoke passes, focused on recent changes |
| Scope | Broad (whole app, core flows) | Narrow (only the changed area) |
| Depth | Shallow (does it work at all?) | Deeper (does the fix hold up, including edge cases?) |
| What it answers | "Is this build worth testing?" | "Did this specific change work?" |
| If it fails | Build goes back to dev immediately | The specific fix goes back; the rest of testing continues |
| Who runs it | Usually automated in CI/CD | Can be manual or automated |
| Analogy | Turning the car key to see if the engine starts | Test-driving new brakes to confirm the repair worked |

Why this distinction matters more on mobile

On web, smoke testing is straightforward: load the page in a browser, click through the main flows, confirm nothing crashes. One browser, one environment, done in 5 minutes.

On mobile, smoke testing has to answer a harder question: does the build work on real devices, not just an emulator?

A build might pass smoke on a stock Android emulator and crash on a Samsung Galaxy A14 because Samsung's One UI handles a permission dialog differently. It might pass on an iPhone 15 simulator and fail on an iPhone SE because the smaller screen pushes a button off the visible area. If your smoke suite only runs on one emulator, it's answering the wrong question. It's telling you the build is stable on a device your users don't have.

Sanity testing on mobile has the same problem. A promo code fix might work on stock Android but break on Xiaomi because HyperOS's custom keyboard handles text input differently in certain fields. If your sanity check only runs on the emulator where the bug was reproduced, you've confirmed the fix works in one environment, but not in the environments where your users actually are.

This is why both smoke and sanity tests on mobile should run on real devices across at least 3-4 configurations: a Samsung (One UI), a Pixel (stock Android), a budget device (3-4 GB RAM), and an iPhone if you support iOS. That's the minimum coverage to answer "is the build stable?" and "does the fix work?" with any confidence.
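One way to make that minimum concrete is to define the matrix once and run every suite over it. A sketch; the device names, OS versions, and dictionary keys are illustrative.

```python
# Illustrative device matrix: the same smoke and sanity suites run once per
# entry. Device names, OS versions, and keys are examples, not requirements.
DEVICE_MATRIX = [
    {"device": "Samsung Galaxy A14", "os": "Android 14", "skin": "One UI",        "ram_gb": 4},
    {"device": "Google Pixel 8",     "os": "Android 15", "skin": "stock Android", "ram_gb": 8},
    {"device": "Budget Android",     "os": "Android 13", "skin": "varies",        "ram_gb": 3},
    {"device": "iPhone SE (3rd gen)", "os": "iOS 17",    "skin": "n/a",           "ram_gb": 4},
]

# Each entry would feed the driver options for one run, e.g.
#   options.device_name = entry["device"]
# so every build's smoke pass covers all four configurations.
```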

Which one to automate first

If you're starting with limited automation capacity, automate smoke testing first.

Smoke tests are small (10-20 test cases covering core flows), stable (core flows don't change much between sprints), and high-impact (a failing smoke test blocks the entire testing cycle). They run on every build. They're the perfect first automation target because the ROI is immediate: instead of a tester spending 15 minutes manually checking whether the build is stable, the CI pipeline does it in 2-3 minutes and rejects broken builds before anyone touches them.
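That rejection step can be a few lines of glue. A minimal sketch, assuming the smoke tests above live in a hypothetical smoke/ directory and run under pytest:

```python
# Minimal CI-gate sketch: run the smoke suite, reject the build on any failure.
# The smoke/ directory is a hypothetical location for the tests above.
import subprocess
import sys

result = subprocess.run(["pytest", "smoke/", "-x"])  # -x: stop at first failure
if result.returncode != 0:
    print("Smoke failed: build rejected before anyone spends time on it.")
    sys.exit(1)
print("Smoke passed: build is stable enough for sanity testing.")
```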

Sanity tests are a secondary automation target. They change with every sprint because they're tied to whatever was fixed or changed. Some teams keep a library of reusable sanity checks (checkout flow, login flow, search flow) and pick from it based on what changed. Others write quick throwaway scripts for each sanity cycle.

With Drizz, both are fast to automate. A smoke suite for a mobile app is a set of plain-English steps: "Launch app," "Tap Log in," "Type credentials," "Validate home screen." It runs on real devices across a device matrix. If it passes, the build moves to the next stage. If it fails, Drizz's step-by-step screenshots and failure reasoning show exactly which step broke and on which device.

Sanity tests are just as fast to write. "Apply promo code FIRST50," "Validate discount shows," "Complete checkout," "Validate total on confirmation." Because Drizz uses Vision AI instead of selectors, these tests don't break when the UI changes between builds. The self-healing adapts, and a popup agent handles OEM dialogs. Teams go from 15 tests per month to 200, with flakiness at ~5% compared to ~15% on Appium.

How smoke and sanity relate to regression testing

All three are related but different in scope and timing.

Smoke testing runs first, on a new build. It checks whether the build is testable. If smoke fails, nothing else runs.

Sanity testing runs second, only on changed areas. It checks whether the specific fix or feature works. If sanity fails, that fix goes back to development.

Regression testing runs third, across the full (or a selective) test suite. It checks whether the changes broke anything else in the app. If regression fails, there's a side-effect bug that needs investigation.

The three form a pipeline: smoke (is it alive?), then sanity (does the change work?), then regression (did the change break anything else?). Each one is a gate. Each one catches a different class of problem. Skipping any of the three means shipping a category of bugs you didn't check for.
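In script form, the pipeline is three sequential gates. A sketch, assuming each suite lives in its own hypothetical directory and runs under pytest:

```python
# Three-gate pipeline sketch: smoke, then sanity, then regression.
# Directory names are hypothetical; each gate stops the pipeline on failure.
import subprocess

def gate(suite_dir: str) -> bool:
    return subprocess.run(["pytest", suite_dir, "-x"]).returncode == 0

def qa_pipeline() -> str:
    if not gate("smoke/"):
        return "rejected at smoke: build is not testable"
    if not gate("sanity/"):
        return "rejected at sanity: the fix goes back to development"
    if not gate("regression/"):
        return "rejected at regression: a side-effect bug needs investigation"
    return "passed all three gates"

if __name__ == "__main__":
    print(qa_pipeline())
```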

FAQ

What is smoke testing?

A quick, broad check that a new build's core functions work (the app launches, login works, main flows complete). It determines whether the build is stable enough for further testing. Also called "build verification testing."

What is sanity testing?

A quick, focused check that a specific fix or change works correctly. It runs after smoke testing passes and targets only the areas affected by the recent change, including edge cases around the fix.

Can smoke and sanity testing be automated?

Yes. Smoke testing is the best first automation target because the tests are stable and reusable across builds. Sanity testing can also be automated, especially if you maintain a library of reusable flow-level tests.

Is sanity testing the same as regression testing?

No. Sanity testing checks whether a specific fix works. Regression testing checks whether that fix (or any other change) broke something else in the app. Sanity is narrow and targeted. Regression is broad and protective.

How many test cases should a smoke test have?

Typically 10-20, covering core user flows: app launch, login, main navigation, the primary feature (search, browse, add to cart), checkout/payment, and logout. Enough to confirm the build is functional, not enough to be slow.

Should smoke testing run on real devices or emulators?

Both, but real devices matter more on mobile. Emulators miss OEM-specific crashes, font scaling issues, and manufacturer permission dialogs. Run smoke on at least a Samsung, a Pixel, and a budget device for meaningful coverage.
