A Comprehensive Testing Strategy: Unit, End-to-End, and Load Testing
March 18, 2025 · 7 min read

How to build a practical testing pyramid with isolated unit tests, seeded end-to-end flows, and load testing that enforces performance thresholds before release.

  • testing
  • playwright
  • jest
  • performance

The Testing Pyramid

A mature application usually needs three distinct kinds of confidence:

  • logic confidence
  • integration confidence
  • performance confidence

That maps naturally to:

  • unit tests
  • end-to-end tests
  • load tests

Each catches a different failure class. Treating one tier as a substitute for the others usually creates blind spots.

Unit Tests with Real In-Memory Dependencies

For backend code, the best unit tests are often not fully mocked. If the application depends heavily on a document database, an in-memory database instance can provide much better signal than fake repository objects.

let db

beforeAll(async () => {
  // Start an in-memory database and point the app's connection at it
  db = await createInMemoryDatabase()
  await connect(db.uri)
})

afterAll(async () => {
  // Tear the instance down so nothing leaks between test runs
  await db.stop()
})

This keeps the tests fast while still exercising real query semantics. The goal is not to reproduce production perfectly. The goal is to catch logic and query-shape bugs without requiring a full external database process for every developer run.
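To make "query-shape bugs" concrete, here is the kind of function an in-memory database test exercises well. The function name and fields are hypothetical, not from a specific codebase:

```javascript
// Hypothetical query builder of the kind unit tests should pin down.
// A mocked repository would never catch a wrong operator or field here;
// an in-memory database running the query would.
function buildActiveUserQuery({ team, since } = {}) {
  const query = { status: 'active' }
  if (team) query.team = team
  if (since) query.createdAt = { $gte: since }
  return query
}
```

A Jest test can then both assert the shape directly and run the query against the in-memory instance to confirm it returns the seeded documents.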

End-to-End Tests Need Controlled State

End-to-end tests should verify complete user flows through the real UI and server behavior. To do that reliably, they need deterministic data.

That usually means:

  • explicit seeding before a scenario
  • explicit cleanup afterward
  • helper commands for common user actions
  • selectors that are stable across refactors and localization

Good E2E tests are not just "click around and hope." They are reproducible environment orchestration.
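As a sketch of what "deterministic data" means in practice, a seed helper can derive the same records from a scenario name on every run. The record fields below are assumptions for illustration:

```javascript
// Illustrative seed helper: given a scenario name, produce identical
// records every run so the E2E scenario always starts from known state.
// Field names and the example.test domain are assumptions.
function seedRecordsFor(scenario) {
  return [
    { id: `${scenario}-user-1`, role: 'customer', email: `${scenario}-user-1@example.test` },
    { id: `${scenario}-admin`, role: 'admin', email: `${scenario}-admin@example.test` },
  ]
}
```

A matching cleanup step can then delete everything whose id starts with the scenario prefix, which keeps seeding and teardown symmetric.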

When Teams Move from Older E2E Tools to Playwright

At some point, many frontend teams hit an uncomfortable truth: the problem is not just the tests. The problem is the testing tool's model of the browser.

This usually becomes obvious when the application starts depending on things like:

  • third-party payment flows inside iframes
  • embedded identity or challenge widgets
  • cross-origin interactions
  • stricter cookie and session behavior
  • multi-step flows that behave differently in CI than they do locally

That is often the moment teams start looking seriously at Playwright.

The appeal is not hype. It is practical.

Playwright tends to feel closer to how a real browser behaves in production, especially when the test needs to deal with boundaries that are awkward in older E2E tooling. Iframes are one of the most common breaking points. If a product relies on embedded checkout, embedded video, or third-party verification widgets, weak iframe support turns an end-to-end suite into a partial-confidence suite very quickly.

That matters because "we tested everything except the hardest part of the user flow" is not really the kind of confidence most teams think they have.

One of the clearest examples is payments. A lot of checkout systems render their most important inputs inside secure embedded frames. If your E2E runner cannot interact with those frames cleanly, the suite often ends up verifying that a modal opened, not that a payment flow actually worked. That is a major difference in confidence.
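A minimal sketch of what interacting with such a frame looks like, written against Playwright's `frameLocator` API. The iframe title and input name are assumptions; a real payment provider documents its own frame and field selectors:

```javascript
// Helper that fills a card number inside an embedded payment iframe.
// 'Secure card frame' and 'cardnumber' are hypothetical selectors.
async function fillCardNumber(page, cardNumber) {
  const frame = page.frameLocator('iframe[title="Secure card frame"]')
  await frame.locator('input[name="cardnumber"]').fill(cardNumber)
}
```

In a Playwright test this runs as `await fillCardNumber(page, '4242424242424242')` before asserting on the confirmation screen, which is the difference between verifying the payment flow and verifying that a modal opened.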

A Good Migration Signal: Too Many Workarounds

One of the clearest signs that it may be time to migrate test tooling is the number of special-case fixes the suite starts accumulating.

Examples of that smell include:

  • repeated CI-only retries and timeout increases
  • test logic that skips part of a third-party flow because the runner cannot handle it cleanly
  • selector hacks tied to visibility quirks rather than user behavior
  • custom login shortcuts that drift too far from real session behavior
  • constant patching around cross-origin or iframe boundaries
  • test comments that explicitly admit a critical third-party flow is being skipped or only partially checked

None of those workarounds are inherently wrong. Sometimes they are the only reasonable short-term option.

But when they become the dominant maintenance cost of the E2E suite, the team should step back and ask a harder question:

Is the suite fighting the product, or is the tool fighting the product?

That is often where Playwright starts to win the argument.

Why Playwright Is Attractive for Modern Full-Stack Apps

For developers new to testing, the simplest way to think about it is this:

  • unit tests check your logic
  • end-to-end tests check your product
  • the runner should behave enough like a real browser that the product can be tested honestly

Playwright is attractive in modern full-stack systems because it is strong in exactly the places where complex web apps tend to get messy:

  • iframe-heavy experiences
  • realistic browser context handling
  • session and cookie behavior
  • multi-page navigation flows
  • better consistency between local and CI execution

It also encourages a cleaner test architecture:

  • global setup can seed shared prerequisites before the suite runs
  • per-test fixtures can create and clean up user state predictably
  • helper utilities can interact with embedded frames directly instead of working around them
  • one base URL and one browser configuration can become the single source of truth for the suite
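That architecture can be sketched in a minimal `playwright.config.js`. The paths and URL below are assumptions, not a prescribed layout:

```javascript
// playwright.config.js — a minimal sketch, assuming a local dev server
// on port 3000 and a global setup script that seeds shared prerequisites.
const { defineConfig, devices } = require('@playwright/test')

module.exports = defineConfig({
  // Runs once before the whole suite: seed shared prerequisites here
  globalSetup: './tests/global-setup.js',
  use: {
    // Single source of truth for where the suite points
    baseURL: process.env.E2E_BASE_URL || 'http://localhost:3000',
  },
  projects: [
    // One browser configuration shared by every test
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
  ],
})
```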

That does not mean every team must migrate. It means the testing stack should match the complexity of the product.

If the application is mostly simple forms and single-origin pages, an older E2E tool may still be enough.

If the application increasingly depends on embedded third-party surfaces, stronger browser realism becomes more valuable.

Playwright Is Not Just About Features, It Is About Honesty

This is the real lesson I think beginners should understand.

The point of an end-to-end test suite is not to generate green checkmarks. It is to tell the truth about whether the product works.

If the most important flows in the application involve:

  • secure payment widgets
  • anti-bot challenges
  • federated identity screens
  • iframe-based media or embedded tools

then the browser test runner has to be capable of interacting with those surfaces in a way that still feels honest.

That is why Playwright adoption often shows up in more mature stacks. Not because teams want a trendier library, but because they want their end-to-end tests to cover the hard parts instead of stepping around them.

Stable Selectors Matter

UI tests should avoid fragile selectors tied to styling or human-readable copy. Class names change. Text changes. Responsive layouts change.

Stable test identifiers are a better contract between the application and the test suite.

That keeps tests focused on behavior instead of presentation details.
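As a small illustration of that contract, a helper can turn a test identifier into a selector. Playwright reads the `data-testid` attribute by default via `page.getByTestId()`; other runners may use a different attribute:

```javascript
// Builds a CSS selector from a stable test identifier.
// 'data-testid' is Playwright's default test-id attribute name.
function byTestId(id) {
  return `[data-testid="${id}"]`
}
```

Tests then write `page.locator(byTestId('checkout-submit'))` (or just `page.getByTestId('checkout-submit')` in Playwright), and the selector survives copy changes, restyling, and localization.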

Shared Infrastructure Requires Isolation

One of the hardest practical problems in CI is making sure E2E execution does not interfere with a shared non-production environment.

A reliable pattern is:

  • run test dependencies on alternate ports
  • isolate test databases from shared environments
  • keep test seed data separate from operational data
  • tear down aggressively after suite completion

This is less glamorous than writing assertions, but it is what makes E2E stable enough to trust.
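The pattern above can be made mechanical. As an illustrative sketch (base names and ports are assumptions), each CI worker can derive its own database name and port so parallel runs never collide:

```javascript
// Illustrative isolation helper: derive per-worker resources so parallel
// E2E runs on shared infrastructure cannot interfere with each other.
// The default database name and base port are assumptions.
function testResourcesFor(workerIndex, { baseDb = 'app_e2e', basePort = 5500 } = {}) {
  return {
    databaseName: `${baseDb}_w${workerIndex}`,
    port: basePort + workerIndex,
  }
}
```

Teardown then only needs to drop databases matching the `_w` suffix, which keeps cleanup aggressive without touching operational data.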

Load Testing Is a Contract

Load testing is useful only if it enforces an explicit standard.

export const options = {
  thresholds: {
    // 95th percentile latency must stay under 500 ms
    http_req_duration: ['p(95)<500'],
    // request failure rate must stay under 10%
    http_req_failed: ['rate<0.1'],
  },
}

That turns performance from a vague aspiration into a build-time contract. If latency or error rate crosses the agreed threshold, the result is not "something to keep an eye on." It is a failure.

The exact numbers are application-specific, but the discipline is universal.
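To show what "threshold as contract" means mechanically, here is an illustrative evaluator for expressions like `p(95)<500`. This is not k6's implementation, just the pass/fail decision made explicit:

```javascript
// Illustrative check of a threshold expression against an observed value.
// Supports the '<', '<=', '>', '>=' forms used in expressions such as
// 'p(95)<500' or 'rate<0.1'. Not k6's implementation.
function thresholdPassed(expression, observed) {
  const match = expression.match(/^.+?([<>]=?)\s*([\d.]+)$/)
  if (!match) throw new Error(`unrecognized threshold: ${expression}`)
  const [, op, limit] = match
  const bound = Number(limit)
  switch (op) {
    case '<': return observed < bound
    case '<=': return observed <= bound
    case '>': return observed > bound
    case '>=': return observed >= bound
  }
}
```

The important property is that the result is boolean: a p95 of 501 ms is a failed build, not a yellow dashboard tile.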

What Each Tier Answers

Unit tests

  • Does this function behave correctly in isolation?
  • Are edge cases handled?
  • Does a query builder produce the right shape?

End-to-end tests

  • Can a real user complete the flow?
  • Does the UI still agree with the backend contract?
  • Did a refactor break integration behavior?

Load tests

  • Does the system degrade acceptably under pressure?
  • Are latency and error budgets still respected?
  • Did a code change introduce a performance regression?

Design Lessons

  1. Unit tests should prefer realistic dependencies over brittle mocks when possible.
  2. End-to-end tests are only trustworthy if data seeding and cleanup are deliberate.
  3. A growing pile of E2E workarounds is often a sign that the tool no longer matches the product.
  4. Shared infrastructure requires test isolation by design, not by convention.
  5. Stable selectors are part of test architecture.
  6. Performance thresholds should fail builds, not just generate dashboards.