
Cache invalidation: two hard problems and the patterns that work

Phil Karlton's old joke: "Two hard problems in computer science: cache invalidation and naming things." Here are the patterns for the first.

“There are only two hard problems in computer science: cache invalidation, naming things, and off-by-one errors.” The extended version of Karlton’s quip has a serious core: cache invalidation really is hard.

In the last 10+ years I’ve stood up cache layers on dozens of projects. The good news: there are practical patterns. The bad news: every pattern has trade-offs. There is no silver bullet.

The two core problems

Cache invalidation has two fundamental issues:

1. Stale data: the cache holds old data and the user sees the wrong answer. “I dropped the product price and users are still seeing the old one.”

2. Cache stampede: the moment the cache expires, 1,000 requests hammer the origin. The origin goes down.

These usually need different strategies. Handling both cleanly is critical.

Strategies for stale data

Time-based (TTL)

Simplest option: give each cache entry a lifetime (60 seconds, 5 minutes, 1 hour). When the TTL runs out, the entry expires and the next read repopulates it from the origin.

Upside: dead simple to implement. Downside: you serve stale data for the whole TTL window.

When it works: change frequency is low and a minute or two of staleness is acceptable. Fine for most content sites.

When it doesn’t: financial data, stock levels, real-time metrics. You’re publishing wrong answers for the TTL window.
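
In code, a TTL cache is just a value plus an expiry timestamp. A minimal sketch (the Map-based store is an illustration, not a production cache):

const store = new Map(); // key -> { data, expires }

function set(key, data, ttlMs) {
    store.set(key, { data, expires: Date.now() + ttlMs });
}

function get(key) {
    const entry = store.get(key);
    if (!entry || entry.expires < Date.now()) {
        store.delete(key); // an expired entry counts as a miss
        return undefined;
    }
    return entry.data;
}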

Event-based invalidation

When data changes, explicitly clear the cache:

await updateProduct(productId, newData);
await cache.delete(`product:${productId}`); // invalidate the detail entry
await cache.delete("products:list");        // invalidate the list cache too

Upside: no stale-data problem. Downside: complexity balloons. On every update you have to think about which caches to invalidate.

When it works: critical data, zero tolerance for staleness. E-commerce pricing, stock.

To cut complexity you can version cache keys. When a product updates, product:123:v456 becomes product:123:v457. Old keys expire on their own TTL; new requests land on the new key.
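
A minimal sketch of the versioned-key pattern (the versions store, the db helpers, and the cache API are assumptions for illustration):

// Assumed: a small, cheap store for current versions (could itself live in Redis)
const versions = new Map(); // productId -> integer version

async function updateProduct(productId, newData) {
    await db.save(productId, newData);
    // Bump the version; old product:<id>:v<n> keys simply age out via TTL
    versions.set(productId, (versions.get(productId) ?? 0) + 1);
}

async function getProduct(productId) {
    const v = versions.get(productId) ?? 0;
    const key = `product:${productId}:v${v}`;
    let data = await cache.get(key);
    if (!data) {
        data = await db.load(productId);
        await cache.set(key, data, { ttl: 300 }); // TTL in seconds, assumed API
    }
    return data;
}

No explicit deletes anywhere: invalidation is just a version bump.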

Write-through cache

Writes go to both the DB and the cache:

await db.save(data);        // the DB is the source of truth
await cache.set(key, data); // keep the cache in step with the write

The cache stays in sync with the DB. Extra complexity: what if the cache write fails? What if the DB write fails?

Making this genuinely transactional is hard. It’s more of a “mostly consistent” approach.
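
One pragmatic ordering, not a real transaction: write the DB first, and if the cache write then fails, delete the key so the next read repopulates from the DB. A sketch, assuming the same db and cache helpers as above:

async function save(key, data) {
    await db.save(data); // if this fails, nothing was cached, nothing is stale

    try {
        await cache.set(key, data);
    } catch (err) {
        // A failed cache write must not leave an old entry behind:
        // delete it and let the next read repopulate (cache-aside fallback)
        await cache.delete(key).catch(() => {}); // best effort
    }
}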

Write-around cache

Writes skip the cache and go straight to the DB. Reads populate the cache.

Upside: simple, scalable. Downside: the first read is always a miss.

When it works: write-heavy, read-light workloads. Log data, analytics events.
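
A sketch of the pattern, with writes bypassing the cache and reads populating it (hash() is a hypothetical helper that builds a stable key from the query):

async function saveEvent(event) {
    await db.insert(event); // writes go straight to the DB, no cache involved
}

async function getEvents(query) {
    const key = `events:${hash(query)}`;
    let result = await cache.get(key);
    if (!result) {
        result = await db.query(query); // the first read misses and populates
        await cache.set(key, result, { ttl: 600 });
    }
    return result;
}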

The cache stampede problem

Scenario: a site with 100k users. /products/popular is cached with a 5-minute TTL. At minute 0 the cache populates, at minute 5 it expires. At minute 5 you have 10k concurrent requests. All miss, all hit the DB, the DB falls over.

Solutions:

1. Probabilistic early expiration

Let a few requests refresh early before the TTL actually expires:

function get(key) {
    const entry = cache.get(key);
    if (!entry) return refresh(key); // hard miss: must wait for the fetch

    // How close to expiry? (age and ttl in the same unit, e.g. ms)
    const ratio = entry.age / entry.ttl;
    if (ratio > 0.9 && Math.random() < 0.1) {
        // In the last 10% of the TTL, each request has a 10% chance
        // of triggering an early refresh
        refresh(key); // async; this request still returns the old data
    }
    return entry.data;
}

Now the misses don’t all land at the same instant: some requests refresh ahead of time, so the entry is usually renewed before it ever expires.

2. Lock-based refresh (single-flight)

On a miss, the first request goes to the DB and the rest wait on a “fetch in progress” lock. When the first finishes, they all get the same result:

const locks = new Map(); // key -> in-flight fetch promise

function get(key) {
    const cached = cache.get(key);
    if (cached) return cached;

    // Take the lock: only the first miss triggers a DB fetch
    if (!locks.has(key)) {
        const flight = fetchFromDB(key)
            .then(data => {
                cache.set(key, data);
                return data;
            })
            .finally(() => locks.delete(key)); // release even if the fetch fails
        locks.set(key, flight);
    }
    return locks.get(key); // everyone else awaits the same promise
}

In a distributed setup you can implement the same lock in Redis with SET ... NX (the classic SETNX pattern), so it works across multiple app instances too.
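
A sketch of the distributed variant with ioredis (key names, TTLs, and the retry delay are illustrative assumptions):

const Redis = require("ioredis");
const redis = new Redis();

async function getWithLock(key) {
    const cached = await redis.get(key);
    if (cached) return JSON.parse(cached);

    // SET ... NX PX: only one instance acquires the refresh lock
    const gotLock = await redis.set(`lock:${key}`, "1", "PX", 5000, "NX");
    if (gotLock) {
        const data = await fetchFromDB(key);
        await redis.set(key, JSON.stringify(data), "EX", 300);
        await redis.del(`lock:${key}`);
        return data;
    }
    // Lost the race: wait briefly and retry (or serve stale data if you have it)
    await new Promise(resolve => setTimeout(resolve, 100));
    return getWithLock(key);
}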

3. Stale-while-revalidate

Borrowed from HTTP cache-control. Even when the cache has expired, return the old data and refresh in the background:

function get(key) {
    const entry = cache.get(key);
    if (!entry) return fetchAndStore(key); // hard miss: must wait

    if (entry.age > entry.ttl) {
        // Stale, but serve it anyway and refresh in the background
        refresh(key); // async; the next request gets fresh data
    }
    return entry.data;
}

The user never waits, even with stale data. Fresh data lands in the background and the next request picks it up.

When to reach for it: staleness tolerance of seconds or minutes is fine. News feeds, recommendations, trending items.
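
At the HTTP layer the same behaviour is a single header, set here in an Express-style handler (the values are examples):

res.set("Cache-Control", "max-age=60, stale-while-revalidate=300");

Clients and CDNs that support RFC 5861 will serve the cached response for up to 5 minutes past its 60-second freshness window while revalidating in the background.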

A practical layered setup

My projects usually end up with these layers:

L1: application memory cache. In-process. Millisecond latency. TTL 10 to 60 seconds. Enough for single-instance apps.

L2: Redis cache. Distributed. 1 to 5ms latency. TTL minutes to hours. Shared across multiple app instances.

L3: CDN cache (HTTP level). Seconds to days of TTL. Static or semi-static content.

L4: DB materialized view. Especially for analytics and reporting. Refreshed overnight.

Different TTLs per layer, different invalidation strategies. L1: short TTL with auto-refresh. L2: event-based invalidation. L3: TTL plus purge API.
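
A read path across L1 and L2 might look like this sketch (the Map-based L1, the ioredis client, and the TTLs are illustrative):

const l1 = new Map(); // key -> { data, expires }

async function layeredGet(key) {
    const hit = l1.get(key);
    if (hit && hit.expires > Date.now()) return hit.data; // L1: in-process, ~µs

    const fromRedis = await redis.get(key); // L2: 1 to 5ms
    if (fromRedis) {
        const data = JSON.parse(fromRedis);
        l1.set(key, { data, expires: Date.now() + 30_000 }); // 30s L1 TTL
        return data;
    }

    const data = await fetchFromDB(key); // origin
    await redis.set(key, JSON.stringify(data), "EX", 600); // 10min L2 TTL
    l1.set(key, { data, expires: Date.now() + 30_000 });
    return data;
}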

Cache key design

Key design dictates invalidation:

Versioned keys: product:123:v5. Every update bumps the version, old keys die on their own.

Hierarchical keys: products:cat:electronics:page:1. When a category changes, clear every key with that prefix using Redis SCAN (see the sketch after this list).

User-scoped keys: user:456:cart. User-specific cache, clear everything when the user logs out.

Composite keys: product:123:user:456. Personalised data. Invalidation gets complicated.
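
For hierarchical keys, prefix invalidation should use SCAN rather than the blocking KEYS command. A sketch with ioredis (the prefix is an example):

async function invalidatePrefix(prefix) {
    const stream = redis.scanStream({ match: `${prefix}*`, count: 100 });
    for await (const keys of stream) {
        if (keys.length) await redis.del(...keys); // delete in small batches
    }
}

// e.g. when the electronics category changes:
// await invalidatePrefix("products:cat:electronics:");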

Monitoring

Metrics to track:

  • Hit rate: consistently below ~80% usually means the strategy or TTLs need rethinking
  • Miss rate per key pattern: which keys tend to miss?
  • Staleness duration: how long did users see stale data?
  • Invalidation frequency: invalidating too often makes the cache pointless
  • Stampede events: concurrent misses on the same key?
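
Hit rate needs no extra infrastructure to start with: two counters and a periodic log line will do. A sketch (the names and the 60-second window are arbitrary):

let hits = 0;
let misses = 0;

function instrumentedGet(key) {
    const value = cache.get(key);
    if (value !== undefined) hits++; else misses++;
    return value;
}

// Flush the window periodically, e.g. to logs or a /metrics endpoint
setInterval(() => {
    const total = hits + misses;
    if (total > 0) console.log(`cache hit rate: ${((hits / total) * 100).toFixed(1)}%`);
    hits = 0;
    misses = 0;
}, 60_000);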

Takeaway

Cache invalidation isn’t magic, it’s discipline. The right TTL, the right invalidation trigger, stampede protection, careful key design. Think about those four axes and you can keep both stale data and overload in check.

Adding a cache layer to an existing complex app is usually a one-to-two-week project. Worth it? In my experience database load drops 70%+ and latency improves dramatically. It’s worth the investment, but know your metrics before you start.
