At an agency I worked at, one of the acceptance criteria was test coverage above 80%. We wrote tests to hit the number. 79% meant a red pipeline; 81% and deploys went through. After the green light, bugs kept showing up in production, because the tests we wrote measured which lines executed, not whether those lines behaved correctly.
The 80% figure has turned into folklore. Nobody even remembers where it came from. Probably a speculative aside in a Martin Fowler post or an old Google doc. Quoted without context, it does real harm.
The real question: what are we trying to measure?
- Line coverage: did the line execute?
- Branch coverage: was each side of every if/else tested?
- Function coverage: was every declared function called at least once?
- Path coverage: was every combination of branches walked?
- Mutation coverage: if we deliberately break the code, do the tests catch it?
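The gap between the first two shows up even in trivial code. A contrived Swift sketch (the function and names are mine, not from any real project):

```swift
// One test with isMember == true executes every line, so line coverage
// reports 100% — but the isMember == false path is never exercised,
// and branch coverage sits at 50%.
func discountedTotal(_ total: Double, isMember: Bool) -> Double {
    var rate = 0.0
    if isMember {
        rate = 0.1
    }
    return total * (1 - rate)
}
```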
These don’t carry equal weight. A project can hit 95% line coverage and 40% mutation coverage. The second number is the honest one; the first lies to you.
I saw this play out on an iOS project. A view model had 100% coverage. Every method had a test. But the tests just called the function and checked that it didn’t throw. The actual business logic was never asserted. A validation function was producing the wrong output and the test passed, because the test only confirmed the function was called. I only caught it during a refactor.
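Roughly what those tests looked like, reconstructed with a hypothetical validator standing in for the real view model:

```swift
import XCTest

// Hypothetical stand-in for the view model's validation logic.
struct SignupValidator {
    func isValidEmail(_ email: String) -> Bool {
        // The bug: any string containing "@" passes.
        email.contains("@")
    }
}

final class SignupValidatorTests: XCTestCase {
    // The kind of test the suite was full of: it executes the code
    // (100% line coverage) but asserts nothing about the result.
    func testIsValidEmailRuns() {
        let sut = SignupValidator()
        _ = sut.isValidEmail("user@")
    }

    // The test that was missing: an assertion on behavior,
    // which the buggy implementation fails.
    func testRejectsEmailWithoutDomain() {
        XCTAssertFalse(SignupValidator().isValidEmail("user@"))
    }
}
```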
On the flip side, I’ve shipped a project with 30% coverage that ran beautifully. I wasn’t the only author; a disciplined colleague had covered the critical paths with high-quality integration tests. Edge cases, concurrency, error handling. The rest of the code was UI wiring and display logic, things with genuinely low test value. The project ran in production for two years without a single critical-path bug.
Which parts of the codebase are worth testing?
- Anything touching money, security, or authorization
- Complex domain logic
- Places where hard-to-reproduce bugs have hit you before
- The public API surface
- The boundaries of third-party integrations
Where is testing marginal?
- Simple getters/setters
- Trivial wrappers around a third-party library
- Short-lived experiments
- Things the framework already tests (asserting that a SwiftUI View renders)
The policy I’ve run with a team: set a coverage floor, not a target. New code can’t drop coverage below the current level. If we’re at 55%, a new PR can’t bring it to 54%. But we don’t force it toward 56 either. That way we write tests where they matter and don’t trick ourselves into writing them where they don’t.
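A minimal sketch of that check as a Swift script, assuming the CI step already extracts the two percentages (the baseline from the main branch, and the current build's coverage) and passes them in. The file name and invocation are made up:

```swift
// ratchet.swift — run as: swift ratchet.swift <baseline> <current>
// e.g. swift ratchet.swift 55.0 54.3
import Foundation

let args = CommandLine.arguments
guard args.count == 3,
      let baseline = Double(args[1]),
      let current = Double(args[2]) else {
    print("usage: swift ratchet.swift <baseline> <current>")
    exit(2)
}

if current < baseline {
    // The floor: new code may not drop coverage below the current level.
    print("Coverage dropped from \(baseline)% to \(current)% — blocking the merge.")
    exit(1)
}

// Deliberately no failure for staying flat: the floor only moves up
// when coverage rises naturally, not because we chase a target.
print("Coverage \(current)% is at or above the floor of \(baseline)%.")
```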
For mutation testing, I’ve used Muter on iOS and Infection in PHP. The results are humbling. The module you’re bragging about at 90% coverage comes back with a 40% mutation score, and you finally see the real picture.
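To make the idea concrete, here is the kind of mutant such a tool generates; this is a concept sketch, not output from Muter or Infection, and the function is hypothetical:

```swift
// Original production code.
func canWithdraw(balance: Double, amount: Double) -> Bool {
    amount <= balance
}

// A typical mutant: the comparison is altered (`<=` becomes `<`)
// and the whole test suite is rerun against the mutated code.
func canWithdraw_mutant(balance: Double, amount: Double) -> Bool {
    amount < balance
}

// A test that only checks canWithdraw(balance: 100, amount: 50) passes
// against both versions, so the mutant survives and the mutation score drops.
// A boundary test — XCTAssertTrue(canWithdraw(balance: 100, amount: 100)) —
// fails against the mutant and kills it.
```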
Coverage is a signal, not a goal. Below 80%, ask why. Above it, don’t relax. The real question isn’t whether the lines executed; it’s whether the tests actually cover the scenarios that matter.