Regression testing is the practice of re-running existing tests after a code change to make sure nothing that previously worked is now broken.
You ship a bug fix. You add a feature. You upgrade a dependency. You merge a refactor. Then you run your regression suite to confirm the rest of the app still behaves the way it did before the change. As IBM puts it, it's "a software testing strategy used to check that code modifications aren't harming existing functionality or introducing new bugs."
The word "regression" means moving backward. In software, a regression is when something that used to work stops working because of a change somewhere else. The fix itself might be fine. The damage is in the ripple effect.
This sounds simple. Run old tests. Check if they pass. In practice, regression testing is where most QA teams spend the majority of their time, and where most automation efforts eventually collapse under their own weight. Katalon's 2025 State of Software Quality Report found that organizations using AI-driven regression testing see 24% lower operational costs and 32% higher customer satisfaction compared to those relying on manual or script-heavy approaches.
This guide covers what regression testing means, how it differs from retesting, every type with real failure scenarios, why it's a fundamentally different problem on mobile, and how to automate it without building a maintenance burden that's worse than the problem you started with.
Regression testing vs retesting
These two get confused constantly.
Retesting re-runs a specific test that previously failed. The purpose is to confirm that a bug fix actually worked. You found a defect, the developer patched it, and you run that exact test again to verify the patch.
Regression testing re-runs tests that previously passed. The purpose is to confirm that the fix (or any other change) didn't break something else.
Here's a concrete scenario. Your checkout flow has a bug: discount codes aren't applying correctly. The developer fixes the discount logic. Retesting runs the discount code test again to confirm it now works. Regression testing runs the entire checkout suite (cart calculation, shipping options, payment processing, order confirmation) to confirm that the discount fix didn't accidentally break tax calculation or change how free shipping thresholds work.
Retesting asks: "Did we fix the bug?" Regression testing asks: "Did fixing the bug break anything else?"
You always need both after a code change. They answer different questions.
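Here's the distinction in miniature, assuming a pytest-based suite. The paths, test names, and discount logic are illustrative stand-ins, not a specific implementation:

```python
# tests/checkout/test_discounts.py -- hypothetical suite illustrating
# the retesting vs. regression testing distinction.

def apply_discount(total: float, code: str) -> float:
    """Toy stand-in for the fixed discount logic."""
    return round(total * 0.9, 2) if code == "SAVE10" else total

def test_discount_applies():
    # Retesting: re-run exactly this test to confirm the bug fix worked.
    assert apply_discount(100.00, "SAVE10") == 90.00

def test_invalid_code_is_ignored():
    # Regression testing: this neighboring test previously passed and
    # must still pass after the fix.
    assert apply_discount(100.00, "BOGUS") == 100.00

# Retest just the fix:      pytest tests/checkout/test_discounts.py::test_discount_applies
# Regression on the suite:  pytest tests/checkout/
```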
When do you need regression testing?
Not every commit needs a full regression suite. But certain types of changes should always trigger it.
Code changes and bug fixes. Any time production code is modified, there's a risk that the change touched something it shouldn't have. A fix to the checkout flow might break how discount codes apply. A refactor of the authentication module might change session handling in ways that aren't obvious until a user gets logged out mid-session, three screens deep.
Dependency and library updates. You upgrade your payment gateway SDK from v3.1 to v3.2. Your app's code is identical. But the SDK now returns a different response format for declined transactions, and your error handling screen shows a blank message instead of "Card declined." Your code didn't change. The code it depends on did. Regression testing catches this.
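One cheap guard here is a contract-style regression test that pins the response shape your error handling depends on. A minimal sketch in Python, with a hypothetical SDK response format:

```python
# Hypothetical regression test pinning the payment SDK's declined-transaction
# response shape, so a dependency bump that changes the format fails loudly
# in CI instead of rendering a blank error screen.

def parse_decline(response: dict) -> str:
    """App-side error handling: extract a user-facing decline message."""
    return response.get("error", {}).get("message", "")

def test_declined_transaction_shows_message():
    # Shape returned by SDK v3.1 (assumed for illustration).
    sdk_response = {"status": "declined", "error": {"message": "Card declined"}}
    assert parse_decline(sdk_response) == "Card declined"
    # If v3.2 moves the message to a flat "decline_reason" key, this test
    # fails at upgrade time -- before a user ever sees the blank screen.
```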
Configuration changes. Feature flags, environment variables, API endpoint changes, database migrations: none of these modify source code, but they change how the app behaves at runtime. A feature flag that enables a new onboarding flow might inadvertently disable the old one before the new one is ready for all users.
OS and platform updates. On mobile, this is where regressions get sneaky. Android 15 changed how background processes are handled. iOS 18 modified permission dialogs. Your code is the same. The ground underneath it moved. A test that passed on Android 14 might fail on Android 15 because the OS now kills background services more aggressively, and your app's GPS tracking flow depends on one.
Merges and branch integrations. Two developers work on separate features in parallel branches. Each branch passes its own tests. When they merge, the combined code introduces a conflict that neither branch's tests could have predicted. Regression testing on the merged result catches interactions that isolated testing misses.
The rule of thumb: if anything that affects how the app runs has changed (code, dependencies, config, platform, or environment), run regression tests.
Types of regression testing (with real failure scenarios)
Most guides list the types as one-paragraph definitions. That's not useful when you're deciding which type to run after a specific change. Here's each type with a concrete scenario showing when you'd pick it and what it catches.
Corrective regression testing
What it is: Re-run existing test cases without modifying them. Nothing in the app's specification changed. You're confirming that current behavior still matches expectations.
When to use it: After a small bug fix, a performance optimization, or a code cleanup that shouldn't change any visible behavior.
Real scenario: A developer removes unused CSS classes and consolidates duplicate style rules to reduce the app's bundle size. No feature changed. No new UI was added. Corrective regression re-runs the existing UI test suite to confirm that the cleanup didn't accidentally delete a style rule a screen still depends on. On one team, exactly this happened: a shared .card-shadow class was removed because it looked unused, but a product detail screen inherited it. The shadow disappeared, and a visual regression test flagged the layout shift.
Selective regression testing
What it is: Pick a subset of your test suite based on which parts of the codebase the change touched. Run only the tests that cover the affected modules and their dependencies.
When to use it: After a change that's scoped to a specific feature area, when you know what was affected and what wasn't.
Real scenario: A developer modifies the search algorithm to include fuzzy matching. Selective regression runs the search tests, the product listing tests (because listings depend on search results), and the filter tests (because filters apply to search output). It skips the checkout suite, the account settings suite, and the onboarding flow. The change is contained. But the run still catches that fuzzy matching accidentally returns out-of-stock products that the old exact-match algorithm excluded, because the product listing tests validate inventory status.
Risk: If your impact analysis is wrong (if you misjudge what was affected), you'll miss a regression. Selective regression saves time but requires accurate dependency mapping.
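Here's what that dependency mapping can look like in its simplest form. A Python sketch with a hypothetical module-to-suite map; real tools derive this from import graphs or coverage data:

```python
# Minimal sketch of impact-based test selection. The module-to-suite map
# is hypothetical and would normally be generated, not hand-maintained.

IMPACT_MAP = {
    "search": ["tests/search/", "tests/product_listing/", "tests/filters/"],
    "checkout": ["tests/checkout/", "tests/payments/"],
    "profile": ["tests/profile/", "tests/settings/", "tests/chat/"],
}

def select_suites(changed_modules: set[str]) -> list[str]:
    """Return the deduplicated test paths affected by the change."""
    selected: list[str] = []
    for module in changed_modules:
        for suite in IMPACT_MAP.get(module, []):
            if suite not in selected:
                selected.append(suite)
    return selected

# A change to the search module runs search, listing, and filter suites
# and skips checkout entirely:
print(select_suites({"search"}))
```

If the map is missing an edge (say, recommendations also consume search output), the regression slips through. That's the risk in concrete terms.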
Progressive regression testing
What it is: Create new test cases alongside existing ones when new features or requirements are introduced. Run both new and old tests together.
When to use it: After shipping a new feature that changes or extends existing behavior.
Real scenario: An e-commerce app adds a "Buy Now" button that skips the cart and goes directly to checkout. Progressive regression creates new tests for the "Buy Now" flow, then runs them alongside the existing cart-based checkout tests. The new tests pass. But the old tests catch something: the "Buy Now" flow bypassed the step where the cart applies loyalty points, so users who buy directly lose their rewards credit. The old tests expected a loyalty calculation on the order confirmation screen. It wasn't there.
Partial regression testing
What it is: Test the affected module plus its immediate neighbors. A middle ground between selective (just the changed module) and complete (everything).
When to use it: After a medium-sized change where the blast radius is somewhat predictable but might extend beyond the directly modified code.
Real scenario: A developer updates the user profile module to support a new avatar upload feature. Partial regression tests the profile module, the settings screen (which links to the profile), the navigation header (which displays the avatar), and the chat interface (which shows avatars in message threads). It doesn't test checkout, onboarding, or analytics. The partial run catches that the new avatar upload changes the image format from JPEG to WebP, and the chat interface's image renderer doesn't support WebP, so avatars in chat threads show as broken images.
Complete regression testing
What it is: Re-run the entire test suite from top to bottom. Every test. Every flow. Every module.
When to use it: After a major release, a large refactor, a database migration, a framework upgrade, or any change where the blast radius could be the entire app.
Real scenario: The engineering team upgrades the app's database from PostgreSQL 14 to PostgreSQL 16. No application code changed. But Postgres 16 handles NULL sorting differently in certain edge cases. Complete regression runs every test, and the order history screen (which nobody expected to be affected) shows orders in a different sequence, because the query's ORDER BY clause relied on the old NULL sorting behavior. Selective regression wouldn't have caught this, because nobody mapped "database upgrade" to "order history display."
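One way to guard against this class of regression is to stop relying on engine defaults at all. A minimal sketch, assuming psycopg2 and a hypothetical orders table, that pins NULL placement explicitly and asserts the sequence the order history screen depends on:

```python
# Regression guard for sort order. Pinning NULL placement with NULLS LAST
# makes the ordering independent of any engine-version default.
import psycopg2

QUERY = """
    SELECT id, shipped_at
    FROM orders
    ORDER BY shipped_at DESC NULLS LAST, id
"""

def test_order_history_sequence():
    conn = psycopg2.connect("dbname=app_test")  # hypothetical test database
    try:
        with conn.cursor() as cur:
            cur.execute(QUERY)
            rows = cur.fetchall()
        shipped = [r[1] for r in rows if r[1] is not None]
        # Non-NULL timestamps must come back newest-first...
        assert shipped == sorted(shipped, reverse=True)
        # ...and every NULL shipped_at must sort to the end.
        assert all(r[1] is not None for r in rows[: len(shipped)])
    finally:
        conn.close()
```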
Trade-off: Complete regression is the most thorough and the most expensive. Teams typically reserve it for release candidates, nightly builds, or post-migration validation rather than running it on every commit.
Visual regression testing
What it is: Compare screenshots of the app's UI before and after a change. Functional tests might pass (the button still works), but visual regression catches that the button moved 20 pixels left, overlaps a label, or disappeared on a smaller screen.
When to use it: After any UI change, especially on mobile, where the same screen renders differently across hundreds of device/OS/screen-density combinations.
Real scenario: A developer updates the padding on the login form. Functional tests pass: the email field works, the password field works, the login button works. Visual regression compares screenshots across 8 devices. On a Samsung Galaxy A14 with a 720p screen, the updated padding pushes the "Forgot Password" link below the fold. Users on that device can't see it without scrolling. Functional tests had no way to catch this. The screenshot comparison flagged the layout shift in seconds.
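The core mechanic is simple enough to sketch. A minimal baseline-vs-current comparison using Pillow; real visual regression tools add perceptual thresholds, ignore regions, and per-device baselines:

```python
# Minimal visual regression check: compare a baseline screenshot against a
# fresh capture and flag any pixel-level difference beyond a tolerance.
from PIL import Image, ImageChops

def screens_match(baseline_path: str, current_path: str, tolerance: int = 0) -> bool:
    """Return True if two screenshots match within a per-channel tolerance."""
    baseline = Image.open(baseline_path).convert("RGB")
    current = Image.open(current_path).convert("RGB")
    if baseline.size != current.size:
        return False  # layout shift: different rendered dimensions
    diff = ImageChops.difference(baseline, current)
    if diff.getbbox() is None:
        return True  # pixel-identical
    # Otherwise check the largest channel delta against the tolerance.
    return max(diff.getextrema(), key=lambda ch: ch[1])[1] <= tolerance

# Hypothetical baseline and current captures for one device in the matrix:
# assert screens_match("login_galaxy_a14_base.png", "login_galaxy_a14_new.png")
```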
Which combination to use
Most teams don't pick one type. They layer them:
- Selective or partial regression on every pull request (fast feedback, scoped to change)
- Complete regression before a release or after a major change (thorough validation)
- Visual regression as a continuous check across the device matrix (catches presentation breaks that functional tests miss)
How regression testing works: 5 steps
1. Identify what changed and map the blast radius. Look at the code diff, dependency updates, and config changes. Don't just ask "what did we modify?" Ask "what depends on what we modified?" A change to the authentication module affects every screen that requires a logged-in user. A database schema change affects every query that touches the modified table.
2. Select test cases and pick the right type. Based on the blast radius, decide: selective, partial, or complete? Prioritize the tests that cover revenue-impacting flows first: login, search, checkout, payments, onboarding. If 2,000 tests exist and you're running selective regression, the 200 tests covering your top 10 revenue flows matter more than the 200 tests covering admin settings.
3. Execute. Run the selected tests, ideally in an automated pipeline on every build, or at least nightly. Manual regression works for small teams with infrequent releases, but it doesn't scale. A 10-person QA team manually running regression can spend 20-30% of every sprint on test execution and triage alone; that's 2-3 full-time engineers doing work a machine should handle.
4. Separate real failures from false positives. When tests fail, figure out whether the failure is a real regression (the app broke) or a false positive (the test broke). This step is where most teams hemorrhage time. Flaky tests, stale selectors, hardcoded waits, and environment differences all produce failures that look like regressions but aren't; a minimal triage loop is sketched after this list. According to Katalon's research, 48% of organizations now view QA as a competitive advantage, but only the teams that solved the false-positive problem got there.
5. Fix, retest, update. If it's a real regression, the developer fixes the code. You re-run the failing test (that's retesting). If the test itself was the problem (a stale selector, a hardcoded wait, an environment-specific assertion), update the test. The suite should be green before the change goes to production. Every unresolved red test erodes trust in the suite.
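For step 4, the cheapest triage signal is whether a failure reproduces in isolation. A minimal sketch using pytest node IDs; the test path is hypothetical, and real pipelines also compare failure signatures across runs:

```python
# Triage a failing test by re-running it in isolation. Passing on re-run
# suggests flakiness (timing, environment), not a regression.
import subprocess

def triage(node_id: str, reruns: int = 3) -> str:
    """Classify a failing test as 'regression' or 'flaky' via re-runs."""
    for _ in range(reruns):
        result = subprocess.run(["pytest", node_id, "-q"], capture_output=True)
        if result.returncode == 0:
            return "flaky"  # passed on re-run: suspect timing or environment
    return "regression"     # failed consistently: route to a developer

print(triage("tests/checkout/test_discounts.py::test_discount_applies"))
```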
Why regression testing is harder on mobile
Everything above applies to any software. But on mobile, regression testing has a layer of problems that web and backend teams don't face.
Device fragmentation compounds the test matrix. Android runs across thousands of device models from hundreds of manufacturers. A Samsung Galaxy with One UI renders screens differently than a Pixel running stock Android, which renders differently than a Xiaomi on MIUI. A regression that only appears on devices with notch displays, only on tablets in landscape mode, or only on foldables in multi-window mode is invisible if your regression suite runs on three emulators. One team working with Drizz found that 23% of their test failures came from device-specific rendering differences, not code bugs.
OS updates you didn't trigger. When Apple ships iOS 18 or Google ships Android 15, both introduce behavior changes that break existing apps. Your code is identical. The operating system isn't. And users update at different rates, so you're simultaneously supporting Android 13, 14, and 15 users on the same build. A regression test that passes on Android 14 might fail on Android 15 because the OS now handles background location permissions differently. Your location-based feature didn't change. The OS's enforcement of that feature did.
OEM skins alter behavior without touching your code. Samsung's One UI, Xiaomi's MIUI, Oppo's ColorOS: each adds its own permission dialogs, notification handling, battery optimization, and UI rendering on top of stock Android. A test that passes on a Pixel fails on a Samsung because Samsung's aggressive battery optimization kills your background sync service. Your code didn't change. The device manufacturer's skin changed how the OS treats your code.
Third-party SDKs ship independently. Your analytics SDK, payment library, push notification service, and ad network all release their own updates on their own schedules. A new version of your payment SDK might change how the checkout screen renders or how callback functions are named. Your regression suite needs to catch that, even though the change didn't come from your team.
Async UI behavior creates timing-based regressions. Mobile apps have more loading states, animations, transitions, and network-dependent rendering than web apps. A test that taps "Submit" and immediately checks for a success message might pass on a fast Pixel 9 and fail on a budget device where the API response takes 800ms longer. Static waits (sleep(3000)) make tests slow. Dynamic waits (wait until the element appears) require tooling that understands screen state in real time.
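Here's the difference in practice, sketched with Selenium's wait API; the Appium Python client mirrors it. The driver setup and element IDs are assumed for illustration:

```python
# Static vs. dynamic waits. The static wait burns time on fast devices and
# still flakes on slow ones; the dynamic wait polls until the UI is ready.
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def submit_and_verify(driver):
    driver.find_element(By.ID, "submit").click()

    # Static wait: always costs 3 seconds, and still fails on a device
    # where the response takes 3.2 seconds.
    # time.sleep(3)

    # Dynamic wait: polls until the element is visible, up to 10 seconds,
    # so fast devices stay fast and slow devices stop flaking.
    WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.ID, "success-message"))
    )
```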
These aren't theoretical problems. They're why mobile regression suites decay faster than web regression suites, and why mobile QA teams spend more time maintaining tests than writing new ones.
How to automate regression testing without creating a maintenance trap
Automating regression testing is supposed to save time. It does, until the suite starts breaking faster than the app does.
The pattern is predictable. You write automated tests using selectors (element IDs, XPaths, CSS classes) that identify UI elements. A developer renames a button ID from btn-login to login-submit. The selector breaks. The test fails even though the app works perfectly. Multiply this by hundreds of tests, dozens of devices, and weekly UI updates, and selector maintenance becomes its own full-time job.
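In code, the trap looks like this (Selenium syntax, hypothetical IDs):

```python
# The selector-coupling problem in miniature.
from selenium.webdriver.common.by import By

def tap_login_brittle(driver):
    # Coupled to an implementation detail: renaming btn-login to
    # login-submit breaks this test while the app keeps working.
    driver.find_element(By.ID, "btn-login").click()

def tap_login_resilient(driver):
    # Coupled to what the user sees instead. This survives an ID rename,
    # though a visual-first approach (described below) goes further.
    driver.find_element(By.XPATH, "//button[normalize-space()='Login']").click()
```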
Three things separate regression suites that last from ones that collapse:
Prioritize by revenue impact, not coverage percentage. Don't automate everything. Automate the flows that cost you money or customers when they break: login, search, checkout, payments, onboarding. A 200-test suite covering your top 10 revenue flows with 98% stability is worth more than a 2,000-test suite with 15% flakiness that nobody trusts.
Run on real devices, not just emulators. Emulators are fine for development. For regression, you need real hardware. GPU rendering differences, touch event behavior, OEM skins, memory pressure under actual system load: emulators don't replicate these. Drizz runs regression tests on real Android and iOS devices across OS versions, so what passes in your pipeline passes in your users' hands.
Remove the selector layer entirely. This is the structural change that eliminates the maintenance trap. Instead of identifying elements by ID or XPath, Vision AI reads the screen the way a human does: it sees a "Login" button as a login button, regardless of whether the underlying element ID changed. When the UI changes, the test doesn't break, because it was never coupled to a selector.
With Drizz, regression tests are written in plain English ("Tap on Login," "Type 'test@email.com' in email field," "Validate 'Welcome' is visible") and executed by Vision AI on real devices. The platform's self-healing adapts when the UI shifts, and a popup agent dismisses unexpected permission dialogs, cookie banners, and ad overlays that cause false failures. Teams using Drizz report going from 15 tests authored per month to 200, with flakiness dropping from ~15% to ~5%.
FAQ
What is regression testing in software testing?
It's the practice of re-running tests that previously passed after a code change to make sure the change didn't break existing features or introduce new bugs.
What's the difference between smoke testing and regression testing?
Smoke testing is a quick surface-level check that the build is stable enough to test further. Regression testing goes deeper, re-running specific tests to confirm features still work after a change.
How often should you run regression tests?
Run selective or partial regression on every pull request for fast feedback. Run complete regression before every release. Run visual regression across your device matrix weekly or nightly.
Can regression testing be fully automated?
Yes, if the automation is built to handle UI changes without breaking. Selector-dependent suites need constant maintenance. Vision AI-based tools avoid this by reading the screen visually instead.
What does regression testing mean?
It means running your existing test suite after any code, dependency, or environment change to confirm the app didn't move backward — that nothing previously working is now broken.