Page Object Model (POM) in Test Automation

The page object model is a design pattern that separates your test logic from your UI selectors. Instead of scattering element locators across every test script, you create one class per screen that holds all the selectors and interaction methods for that screen. Your tests call methods on the page object instead of touching selectors directly.

POM is one of the most widely used design patterns in test automation. It reduces code duplication and makes selector changes easier to manage. But on mobile, it solves only half the problem. POM organizes your selectors into one place. It doesn't make those selectors any more stable. When a developer changes the UI, you still have to update the page objects. The maintenance just happens in a tidier codebase.

What is a page object model?

The page object model is a design pattern where each screen (or page) in your application gets its own class. That class contains two things: element locators for that screen, and methods that interact with those elements.

Selenium's official docs define it this way: a page object is an object-oriented class that acts as an interface to a page of your application. Tests use the methods of this page object class whenever they need to interact with the UI of that page.

Martin Fowler's description adds a rule of thumb: a page object should allow a software client to do anything and see anything that a human can. It should provide an easy-to-program interface and hide the underlying widgetry.

In practice, what does the page object model do for your codebase? It creates a single source of truth for every UI element. If a button's selector changes, you update it in one place (the page object) instead of hunting through 50 test scripts.

How does the page object model structure work?

A POM-based test framework has three layers:

Layer	What it contains	Example
Page objects	Element locators and interaction methods	`LoginPage.tapLoginButton()`
Test classes	Test logic and assertions	`assert homePage.isWelcomeVisible()`
Base/config	Shared setup, driver initialization, utilities	`driver.setUp(device: "Pixel7")`

Here's what a page object model example looks like for a mobile login screen using Appium:

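import io.appium.java_client.AppiumDriver;
import io.appium.java_client.pagefactory.AppiumFieldDecorator;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.FindBy;
import org.openqa.selenium.support.PageFactory;

public class LoginPage {
    private final AppiumDriver driver;

    // Locators (WebElement replaces MobileElement, which Appium java-client 8 removed)
    @FindBy(id = "username_field")
    private WebElement usernameField;

    @FindBy(id = "password_field")
    private WebElement passwordField;

    @FindBy(id = "login_button")
    private WebElement loginButton;

    public LoginPage(AppiumDriver driver) {
        this.driver = driver;
        // Binds the @FindBy fields to lazily located elements
        PageFactory.initElements(new AppiumFieldDecorator(driver), this);
    }

    // Methods
    public void enterUsername(String username) {
        usernameField.sendKeys(username);
    }

    public void enterPassword(String password) {
        passwordField.sendKeys(password);
    }

    public HomePage tapLogin() {
        loginButton.click();
        // Returning the next page object lets tests follow screen transitions
        return new HomePage(driver);
    }
}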

And the test that uses it:

@Test
public void testValidLogin() {
    LoginPage loginPage = new LoginPage(driver);
    loginPage.enterUsername("testuser@email.com");
    loginPage.enterPassword("password123");
    HomePage home = loginPage.tapLogin();
    assertTrue(home.isWelcomeMessageVisible());
}

The test reads cleanly. The selectors are hidden inside the page object. If the login_button ID changes, you update it in LoginPage, not in every test that taps login.

What are the types of page object model?

There are several variations of the page object model design pattern, each suited to different levels of app complexity:

The basic POM creates one class per screen with locators and methods. This is what most teams start with. It works well for apps with distinct, stable screens.
The page factory POM (used in Selenium and Appium) uses annotations like @FindBy to initialize elements automatically. Page Factory reduces boilerplate, but the underlying pattern is the same. Page objects in Selenium Java projects commonly use Page Factory.
The component-based POM breaks screens into reusable components (a nav bar, a search widget, a product card) instead of treating each screen as monolithic. This works well for apps with shared UI components across screens.
The screenplay pattern is an alternative to POM that models tests as user tasks instead of page interactions. Instead of loginPage.enterUsername(), you write actor.attemptsTo(Login.withCredentials(...)). It's more flexible but has a steeper learning curve.
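The call actor.attemptsTo(Login.withCredentials(...)) mentioned above could be backed by something like the following minimal sketch. The Task, Actor, and Login types here are hypothetical and written for illustration only; they are not the API of any particular screenplay library, and the actor records steps in a list instead of driving a real device so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// A task describes something a user does, independent of any page class.
interface Task {
    void performAs(Actor actor);
}

// Hypothetical actor: logs low-level steps instead of driving a real device.
class Actor {
    final String name;
    final List<String> log = new ArrayList<>();

    Actor(String name) {
        this.name = name;
    }

    // Runs each task in order on behalf of this actor.
    void attemptsTo(Task... tasks) {
        for (Task task : tasks) {
            task.performAs(this);
        }
    }
}

// The login task, built through a static factory so tests read like a sentence.
class Login implements Task {
    private final String username;
    private final String password;

    private Login(String username, String password) {
        this.username = username;
        this.password = password;
    }

    static Login withCredentials(String username, String password) {
        return new Login(username, password);
    }

    @Override
    public void performAs(Actor actor) {
        // A real implementation would call a driver here; this sketch only logs.
        actor.log.add("type username " + username);
        actor.log.add("type password");
        actor.log.add("tap login");
    }
}

class ScreenplayDemo {
    public static void main(String[] args) {
        Actor actor = new Actor("tester");
        actor.attemptsTo(Login.withCredentials("testuser@email.com", "password123"));
        System.out.println(actor.log);
    }
}
```

The test names a user goal (log in) and the task decides which screens and elements are involved, which is where the extra flexibility and the extra learning curve both come from.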

For most mobile teams, basic POM or component-based POM covers what you need. The page object model structure stays the same regardless of which variation you use: locators in one place, methods as the interface, tests that call those methods.
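As a rough illustration of the component-based variation, the sketch below shares one NavBar component between two screens. Everything in it is an assumption made for the example: UiDriver is a hypothetical one-method interface standing in for a real Appium or Selenium driver, RecordingDriver is a test double, and the element IDs are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal driver interface; a real suite would wrap Appium or Selenium.
interface UiDriver {
    void tap(String locatorId);
}

// Test double that records taps so the sketch runs without a device.
class RecordingDriver implements UiDriver {
    final List<String> taps = new ArrayList<>();

    @Override
    public void tap(String locatorId) {
        taps.add(locatorId);
    }
}

// Reusable component: the nav bar's locators live here and nowhere else.
class NavBar {
    private static final String HOME_TAB = "nav_home"; // assumed ID
    private static final String CART_TAB = "nav_cart"; // assumed ID
    private final UiDriver driver;

    NavBar(UiDriver driver) {
        this.driver = driver;
    }

    void openHome() {
        driver.tap(HOME_TAB);
    }

    void openCart() {
        driver.tap(CART_TAB);
    }
}

// Screens compose the component instead of redeclaring its locators.
class ProductPage {
    private static final String ADD_TO_CART = "add_to_cart_button"; // assumed ID
    private final UiDriver driver;
    final NavBar navBar;

    ProductPage(UiDriver driver) {
        this.driver = driver;
        this.navBar = new NavBar(driver);
    }

    void addToCart() {
        driver.tap(ADD_TO_CART);
    }
}

class SearchPage {
    final NavBar navBar;

    SearchPage(UiDriver driver) {
        this.navBar = new NavBar(driver);
    }
}

class ComponentPomDemo {
    public static void main(String[] args) {
        RecordingDriver driver = new RecordingDriver();
        ProductPage product = new ProductPage(driver);
        product.addToCart();
        product.navBar.openCart();
        new SearchPage(driver).navBar.openHome();
        System.out.println(driver.taps); // prints [add_to_cart_button, nav_cart, nav_home]
    }
}
```

If the nav bar's IDs change, only NavBar needs an update, however many screens embed it.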

Why POM is the standard for selector-based testing

POM became the standard because it solves a real problem: selector duplication. Without POM, every test script hardcodes its own selectors. When a selector changes, you update it in 10, 20, or 50 different files. With POM, you update it in one place.

In test automation frameworks built on Selenium, Appium, Espresso, or XCUITest, POM is the right pattern. It's an industry best practice for a reason.

The benefits of POM for selector-based testing:

One source of truth for element locators (reduced duplication)
Test scripts read like user actions, not selector chains
Easier onboarding for new team members who can read tests without knowing locator syntax
Faster selector updates when the UI changes (update in one file, not fifty)

POM also works across frameworks. You can implement it in Selenium, Playwright, Cypress, or Python projects. The pattern is framework-agnostic. Whether you're testing with Playwright or Appium, the structure stays the same.

Where POM falls short on mobile

Here's what most page object model guides won't tell you: POM doesn't solve the core problem with mobile test automation. It organizes your selectors. It doesn't make them more stable.

On mobile, UI changes are frequent. Developers update layouts, rename components, change accessibility IDs, and restructure view hierarchies on a regular cadence. Every one of those changes breaks selectors, and POM means those broken selectors are neatly organized in page object files instead of scattered across test scripts. The maintenance is tidier, but it's still maintenance.

A team with 200 tests organized in POM and a team with 200 tests without POM both face the same problem: when the login button's resource ID changes, tests break. The POM team fixes it faster (one file instead of many), but the fix is still manual, still time-consuming, and still recurring.

The math on mobile looks like this:

The average mobile app has 30-60 screens
Each screen has 5-15 interactable elements
A typical test automation suite has 100-300 tests
Each sprint introduces 5-10 UI changes that affect selectors
Each broken selector requires finding, updating, and re-running the affected tests

POM reduces the fix time per broken selector. It doesn't reduce the number of broken selectors. On a mobile team shipping weekly, the volume of selector maintenance accumulates regardless of how well organized the code is.
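To make those ranges concrete, here is a back-of-the-envelope calculation using the figures listed above. The number of sprints per year (26, meaning two-week sprints) is an assumption added for the example and is not stated in the article, and the sketch counts one selector fix per UI change, which is a lower bound.

```java
// Back-of-the-envelope ranges derived from the figures in the list above.
class SelectorMaintenanceEstimate {
    // Multiplies two {low, high} ranges of non-negative values.
    static int[] multiply(int[] a, int[] b) {
        return new int[] { a[0] * b[0], a[1] * b[1] };
    }

    public static void main(String[] args) {
        int[] screens = {30, 60};
        int[] elementsPerScreen = {5, 15};
        int[] changesPerSprint = {5, 10};
        int sprintsPerYear = 26; // assumption: two-week sprints

        int[] locators = multiply(screens, elementsPerScreen);
        int[] fixesPerYear = multiply(changesPerSprint, new int[] {sprintsPerYear, sprintsPerYear});

        // prints "Locators to maintain: 150-900"
        System.out.printf("Locators to maintain: %d-%d%n", locators[0], locators[1]);
        // prints "Selector fixes per year: 130-260"
        System.out.printf("Selector fixes per year: %d-%d%n", fixesPerYear[0], fixesPerYear[1]);
    }
}
```

Under these assumptions a suite carries somewhere between 150 and 900 locators and absorbs at least 130 to 260 selector fixes a year. POM shortens each fix but leaves that count unchanged.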

When does vision-based testing make POM unnecessary?

Vision-based testing eliminates the selector layer entirely. Instead of finding elements by ID, XPath, or accessibility label, the test finds elements by how they look on screen, the same way a human does.

With Drizz, the same login test looks like this:

Tap on "Username" field
Type "testuser@email.com"
Tap on "Password" field
Type "password123"
Tap on "Log In"
Validate "Welcome" is visible

There are no selectors. There's no page object file. There's no locator to break. When a developer renames the login button's resource ID, the test still passes because Drizz sees "Log In" on screen visually.

This doesn't mean POM is wrong. POM is the best pattern for selector-based testing. But if your testing tool doesn't use selectors, you don't need POM. The design pattern exists to manage a problem that vision-based testing doesn't have.

What does a real before-and-after look like?

An e-commerce team had a mature Appium framework with 250 tests organized in POM across 45 page objects. The framework was clean, well-documented, and followed every POM best practice.

They still spent 30% of their SDET time maintaining page objects. Every sprint, frontend engineers changed 5 to 10 selectors. The SDETs updated the page objects, re-ran the affected tests, and verified the fixes. It took 2 to 3 days per sprint. The POM pattern made updates predictable, but the volume was constant.

After switching to Drizz, they rewrote their 50 highest-priority tests in plain English in one sprint. Those 50 tests now run nightly on 10 device configurations. Zero page objects to maintain. Zero selectors to update. The SDETs who used to maintain page objects now write new tests and expand coverage into flows that were never automated.

They kept their Appium POM suite for the remaining 200 tests and are migrating them gradually. The POM framework still works. It's just more expensive to maintain than the plain-English alternative.

When should you still use POM?

POM is the right choice when:

Your team uses Selenium, Playwright, Espresso, or XCUITest and isn't switching tools
Your app's UI changes infrequently (internal tools, admin panels, stable products)
Your team has dedicated SDETs who are comfortable maintaining page objects
You need fine-grained control over element interactions that vision-based tools don't support yet

POM is the wrong choice when:

Your UI changes every sprint and selector maintenance is eating your SDET bandwidth
Your team doesn't have dedicated SDETs and needs non-technical testers to contribute
You're starting from scratch and want to avoid building an object repository you'll have to maintain forever
Your priority is speed of test creation over framework architecture

For teams that want POM's readability without its maintenance burden, vision-based testing with tools like Drizz offers plain English tests that read like page object methods but don't depend on selectors.

FAQs

What is a page object model?

The page object model is a design pattern in test automation where each screen in your app gets its own class containing element locators and interaction methods. Tests call methods on the page object instead of touching selectors directly. The page object model definition comes from the Selenium community and has become a standard pattern across web and mobile automation frameworks.

Why use the page object model?

POM reduces selector duplication. Without POM, every test script hardcodes its own element locators. When a selector changes, you update it in dozens of files. With POM, you update it in one page object class. POM also makes test scripts more readable because tests call methods like loginPage.tapLogin() instead of writing raw selector chains.

What is the page object model structure?

A POM framework has three layers: page objects (locators and methods per screen), test classes (test logic and assertions), and a base/config layer (driver setup, utilities). Page objects act as the interface between your tests and the UI. Tests never touch selectors directly.

Does POM work for mobile testing?

Yes. POM works with Appium, Espresso, and XCUITest. The pattern is the same as on the web: one class per screen, with locators and methods inside. The limitation on mobile is that POM organizes selectors but doesn't make them more stable. UI changes still break locators, and page objects still need manual updates.

What is the page object model in Selenium vs. Playwright?

The pattern is the same in both. In Selenium, page objects often use @FindBy annotations with Page Factory. In Playwright, they use the built-in locator API. Both create page classes with locators and methods. The framework syntax differs, but the design pattern is identical.

When should you skip POM and use vision-based testing instead?

Skip POM when your UI changes frequently and selector maintenance is consuming your SDET bandwidth. Vision-based tools like Drizz find elements visually instead of by selector, so there's no object repository to maintain. Plain English tests read like page object methods but don't break when developers change the UI.

About the Author:

Asad Abrar

Co-founder & CEO, Drizz

Ex-Coinbase PM and IIT Kharagpur grad killing flaky mobile tests by day, and obsessing over F1 lap timings by night.