A Comprehensive Testing Strategy: Unit, End-to-End, and Load Testing
March 18, 2025 · 7 min read

How to build a practical testing pyramid with isolated unit tests, seeded end-to-end flows, and load testing that enforces performance thresholds before release.

  • testing
  • playwright
  • jest
  • performance

The Testing Pyramid

A mature application usually needs three distinct kinds of confidence:

  • logic confidence
  • integration confidence
  • performance confidence

That maps naturally to:

  • unit tests
  • end-to-end tests
  • load tests

Each catches a different failure class. Treating one tier as a substitute for the others usually creates blind spots.

Unit Tests with Real In-Memory Dependencies

For backend code, the best unit tests are often not fully mocked. If the application depends heavily on a document database, an in-memory database instance can provide much better signal than fake repository objects.

let db

beforeAll(async () => {
  // Start an in-memory database and point the app's connection at it
  db = await createInMemoryDatabase()
  await connect(db.uri)
})

afterAll(async () => {
  // Tear the instance down so nothing leaks between test runs
  await db.stop()
})

This keeps the tests fast while still exercising real query semantics. The goal is not to reproduce production perfectly. The goal is to catch logic and query-shape bugs without requiring a full external database process for every developer run.
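To make "query-shape bugs" concrete, here is the kind of function an in-memory database test exercises well. The function name and fields are hypothetical, not from a specific codebase:

```javascript
// Hypothetical query builder of the kind unit tests should pin down.
// A mocked repository would never catch a wrong operator or field here;
// an in-memory database running the query would.
function buildActiveUserQuery({ team, since } = {}) {
  const query = { status: 'active' }
  if (team) query.team = team
  if (since) query.createdAt = { $gte: since }
  return query
}
```

A Jest test can then both assert the shape directly and run the query against the in-memory instance to confirm it returns the seeded documents.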

End-to-End Tests Need Controlled State

End-to-end tests should verify complete user flows through the real UI and server behavior. To do that reliably, they need deterministic data.

That usually means:

  • explicit seeding before a scenario
  • explicit cleanup afterward
  • helper commands for common user actions
  • selectors that are stable across refactors and localization

Good E2E tests are not just "click around and hope." They are reproducible environment orchestration.
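As a sketch of what "deterministic data" means in practice, a seed helper can derive the same records from a scenario name on every run. The record fields below are assumptions for illustration:

```javascript
// Illustrative seed helper: given a scenario name, produce identical
// records every run so the E2E scenario always starts from known state.
// Field names and the example.test domain are assumptions.
function seedRecordsFor(scenario) {
  return [
    { id: `${scenario}-user-1`, role: 'customer', email: `${scenario}-user-1@example.test` },
    { id: `${scenario}-admin`, role: 'admin', email: `${scenario}-admin@example.test` },
  ]
}
```

A matching cleanup step can then delete everything whose id starts with the scenario prefix, which keeps seeding and teardown symmetric.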

When Teams Move from Older E2E Tools to Playwright

At some point, many frontend teams hit an uncomfortable truth: the problem is not just the tests. The problem is the testing tool's model of the browser.

This usually becomes obvious when the application starts depending on things like:

  • third-party payment flows inside iframes
  • embedded identity or challenge widgets
  • cross-origin interactions
  • stricter cookie and session behavior
  • multi-step flows that behave differently in CI than they do locally

That is often the moment teams start looking seriously at Playwright.

The appeal is not hype. It is practical.

Playwright tends to feel closer to how a real browser behaves in production, especially when the test needs to deal with boundaries that are awkward in older E2E tooling. Iframes are one of the most common breaking points. If a product relies on embedded checkout, embedded video, or third-party verification widgets, weak iframe support turns an end-to-end suite into a partial-confidence suite very quickly.

That matters because "we tested everything except the hardest part of the user flow" is not really the kind of confidence most teams think they have.

One of the clearest examples is payments. A lot of checkout systems render their most important inputs inside secure embedded frames. If your E2E runner cannot interact with those frames cleanly, the suite often ends up verifying that a modal opened, not that a payment flow actually worked. That is a major difference in confidence.
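A minimal sketch of what interacting with such a frame looks like, written against Playwright's `frameLocator` API. The iframe title and input name are assumptions; a real payment provider documents its own frame and field selectors:

```javascript
// Helper that fills a card number inside an embedded payment iframe.
// 'Secure card frame' and 'cardnumber' are hypothetical selectors.
async function fillCardNumber(page, cardNumber) {
  const frame = page.frameLocator('iframe[title="Secure card frame"]')
  await frame.locator('input[name="cardnumber"]').fill(cardNumber)
}
```

In a Playwright test this runs as `await fillCardNumber(page, '4242424242424242')` before asserting on the confirmation screen, which is the difference between verifying the payment flow and verifying that a modal opened.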

A Good Migration Signal: Too Many Workarounds

One of the clearest signs that it may be time to migrate test tooling is the number of special-case fixes the suite starts accumulating.

Examples of that smell include:

  • repeated CI-only retries and timeout increases
  • test logic that skips part of a third-party flow because the runner cannot handle it cleanly
  • selector hacks tied to visibility quirks rather than user behavior
  • custom login shortcuts that drift too far from real session behavior
  • constant patching around cross-origin or iframe boundaries
  • test comments that explicitly admit a critical third-party flow is being skipped or only partially checked

None of those workarounds are inherently wrong. Sometimes they are the only reasonable short-term option.

But when they become the dominant maintenance cost of the E2E suite, the team should step back and ask a harder question:

Is the suite fighting the product, or is the tool fighting the product?

That is often where Playwright starts to win the argument.

Why Playwright Is Attractive for Modern Full-Stack Apps

For developers new to testing, the simplest way to think about it is this:

  • unit tests check your logic
  • end-to-end tests check your product
  • the runner should behave enough like a real browser that the product can be tested honestly

Playwright is attractive in modern full-stack systems because it is strong in exactly the places where complex web apps tend to get messy:

  • iframe-heavy experiences
  • realistic browser context handling
  • session and cookie behavior
  • multi-page navigation flows
  • better consistency between local and CI execution

It also encourages a cleaner test architecture:

  • global setup can seed shared prerequisites before the suite runs
  • per-test fixtures can create and clean up user state predictably
  • helper utilities can interact with embedded frames directly instead of working around them
  • one base URL and one browser configuration can become the single source of truth for the suite
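That architecture can be sketched in a minimal `playwright.config.js`. The paths and URL below are assumptions, not a prescribed layout:

```javascript
// playwright.config.js — a minimal sketch, assuming a local dev server
// on port 3000 and a global setup script that seeds shared prerequisites.
const { defineConfig, devices } = require('@playwright/test')

module.exports = defineConfig({
  // Runs once before the whole suite: seed shared prerequisites here
  globalSetup: './tests/global-setup.js',
  use: {
    // Single source of truth for where the suite points
    baseURL: process.env.E2E_BASE_URL || 'http://localhost:3000',
  },
  projects: [
    // One browser configuration shared by every test
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
  ],
})
```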

That does not mean every team must migrate. It means the testing stack should match the complexity of the product.

If the application is mostly simple forms and single-origin pages, an older E2E tool may still be enough.

If the application increasingly depends on embedded third-party surfaces, stronger browser realism becomes more valuable.

Playwright Is Not Just About Features, It Is About Honesty

This is the real lesson I think beginners should understand.

The point of an end-to-end test suite is not to generate green checkmarks. It is to tell the truth about whether the product works.

If the most important flows in the application involve:

  • secure payment widgets
  • anti-bot challenges
  • federated identity screens
  • iframe-based media or embedded tools

then the browser test runner has to be capable of interacting with those surfaces in a way that still feels honest.

That is why Playwright adoption often shows up in more mature stacks. Not because teams want a trendier library, but because they want their end-to-end tests to cover the hard parts instead of stepping around them.

Stable Selectors Matter

UI tests should avoid fragile selectors tied to styling or human-readable copy. Class names change. Text changes. Responsive layouts change.

Stable test identifiers are a better contract between the application and the test suite.

That keeps tests focused on behavior instead of presentation details.
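As a small illustration of that contract, a helper can turn a test identifier into a selector. Playwright reads the `data-testid` attribute by default via `page.getByTestId()`; other runners may use a different attribute:

```javascript
// Builds a CSS selector from a stable test identifier.
// 'data-testid' is Playwright's default test-id attribute name.
function byTestId(id) {
  return `[data-testid="${id}"]`
}
```

Tests then write `page.locator(byTestId('checkout-submit'))` (or just `page.getByTestId('checkout-submit')` in Playwright), and the selector survives copy changes, restyling, and localization.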

Shared Infrastructure Requires Isolation

One of the hardest practical problems in CI is making sure E2E execution does not interfere with a shared non-production environment.

A reliable pattern is:

  • run test dependencies on alternate ports
  • isolate test databases from shared environments
  • keep test seed data separate from operational data
  • tear down aggressively after suite completion

This is less glamorous than writing assertions, but it is what makes E2E stable enough to trust.
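The pattern above can be made mechanical. As an illustrative sketch (base names and ports are assumptions), each CI worker can derive its own database name and port so parallel runs never collide:

```javascript
// Illustrative isolation helper: derive per-worker resources so parallel
// E2E runs on shared infrastructure cannot interfere with each other.
// The default database name and base port are assumptions.
function testResourcesFor(workerIndex, { baseDb = 'app_e2e', basePort = 5500 } = {}) {
  return {
    databaseName: `${baseDb}_w${workerIndex}`,
    port: basePort + workerIndex,
  }
}
```

Teardown then only needs to drop databases matching the `_w` suffix, which keeps cleanup aggressive without touching operational data.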

Load Testing Is a Contract

Load testing is useful only if it enforces an explicit standard.

export const options = {
  thresholds: {
    // 95th percentile latency must stay under 500 ms
    http_req_duration: ['p(95)<500'],
    // request failure rate must stay under 10%
    http_req_failed: ['rate<0.1'],
  },
}

That turns performance from a vague aspiration into a build-time contract. If latency or error rate crosses the agreed threshold, the result is not "something to keep an eye on." It is a failure.

The exact numbers are application-specific, but the discipline is universal.
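To show what "threshold as contract" means mechanically, here is an illustrative evaluator for expressions like `p(95)<500`. This is not k6's implementation, just the pass/fail decision made explicit:

```javascript
// Illustrative check of a threshold expression against an observed value.
// Supports the '<', '<=', '>', '>=' forms used in expressions such as
// 'p(95)<500' or 'rate<0.1'. Not k6's implementation.
function thresholdPassed(expression, observed) {
  const match = expression.match(/^.+?([<>]=?)\s*([\d.]+)$/)
  if (!match) throw new Error(`unrecognized threshold: ${expression}`)
  const [, op, limit] = match
  const bound = Number(limit)
  switch (op) {
    case '<': return observed < bound
    case '<=': return observed <= bound
    case '>': return observed > bound
    case '>=': return observed >= bound
  }
}
```

The important property is that the result is boolean: a p95 of 501 ms is a failed build, not a yellow dashboard tile.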

What Each Tier Answers

Unit tests

  • Does this function behave correctly in isolation?
  • Are edge cases handled?
  • Does a query builder produce the right shape?

End-to-end tests

  • Can a real user complete the flow?
  • Does the UI still agree with the backend contract?
  • Did a refactor break integration behavior?

Load tests

  • Does the system degrade acceptably under pressure?
  • Are latency and error budgets still respected?
  • Did a code change introduce a performance regression?

Design Lessons

  1. Unit tests should prefer realistic dependencies over brittle mocks when possible.
  2. End-to-end tests are only trustworthy if data seeding and cleanup are deliberate.
  3. A growing pile of E2E workarounds is often a sign that the tool no longer matches the product.
  4. Shared infrastructure requires test isolation by design, not by convention.
  5. Stable selectors are part of test architecture.
  6. Performance thresholds should fail builds, not just generate dashboards.