API response caching looks simple from a distance, yet it is one of the most consistently botched parts of backend engineering. The HTTP spec has had the answers for years: ETag, Last-Modified, Cache-Control. Plenty of APIs still misuse them or skip them entirely.
I’ve invested real effort in API caching on my last three projects. Here are the patterns and the traps.
When caching is worth it
A cache trades CPU and latency for a bit of operational complexity. It’s worth it when:
- Read-heavy endpoint: product catalog, article listings, user profiles
- Expensive computation: aggregates, ML inference, full-text search
- Rate-limited upstream: third-party API calls
- Common response: every user sees the same data
Not worth it when:
- Write-heavy (every update invalidates the cache)
- Per-user unique response
- Real-time data (stock price, notification count)
- Small response with fast generation
Three cache layers
Response caching usually lives at three levels:
- Browser or client cache: on the client itself
- CDN or reverse proxy: at the edge (Cloudflare, Varnish, NGINX)
- Application cache: Redis, Memcached, in-memory
Each layer wants its own key scheme and its own invalidation story.
ETag: the honest way
An ETag (entity tag) is a hash or version identifier for a response body. When the client receives a response it stores the ETag, and on the next request it sends If-None-Match: <etag>.
If the ETag still matches, the server returns 304 Not Modified with no body. The client uses its cached copy.
# First request
GET /api/products/42
Response:
200 OK
ETag: "abc123"
Body: {...}
# Second request (cached)
GET /api/products/42
If-None-Match: "abc123"
Response:
304 Not Modified
(no body)

Strategies for generating the ETag:
- Content hash: SHA-1 or MD5 of the response body. Accurate, but you pay to compute it on every request.
- Version field: the version or updated_at column on the source object. Depends on a DB query.
- ID plus timestamp: "{id}-{updated_at.timestamp}". Fast and deterministic.
def generate_etag(product):
    return f'"{product.id}-{product.updated_at.timestamp()}"'
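Serving the conditional request is symmetric. A minimal server-side sketch, assuming Flask, a hypothetical load_product lookup, and an assumed to_dict() on the model:

import requests  # not needed server-side; shown client-side later

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/products/<int:product_id>")
def get_product(product_id):
    product = load_product(product_id)  # hypothetical DB lookup
    etag = generate_etag(product)
    # Client already has this version: reply 304 with no body
    if request.headers.get("If-None-Match") == etag:
        return "", 304, {"ETag": etag}
    response = jsonify(product.to_dict())  # to_dict() is assumed
    response.headers["ETag"] = etag
    return response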
Last-Modified: the alternative
Instead of (or alongside) ETag, you can use Last-Modified. The client compares with If-Modified-Since.
Response:
200 OK
Last-Modified: Wed, 20 Apr 2026 10:00:00 GMT
Subsequent request:
If-Modified-Since: Wed, 20 Apr 2026 10:00:00 GMT
Response:
304 Not Modified

Differences from ETag:
- Granularity: one second (an ETag can be as precise as you like)
- Comparison: timestamp ordering (ETag is an exact match)
- Clock skew sensitivity: server and client clocks have to agree
I prefer ETag; it’s more robust. Last-Modified earns its keep as legacy support.
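If you do serve Last-Modified, the comparison is easy to get subtly wrong. A minimal helper sketch (the names are mine, and it assumes last_modified is a timezone-aware UTC datetime):

from email.utils import parsedate_to_datetime

def not_modified_since(request_headers, last_modified):
    """True if the client's cached copy is still current."""
    ims = request_headers.get("If-Modified-Since")
    if not ims:
        return False
    try:
        client_time = parsedate_to_datetime(ims)
    except (TypeError, ValueError):
        return False  # malformed header: treat as a normal request
    # HTTP dates have one-second granularity; drop microseconds before comparing
    return last_modified.replace(microsecond=0) <= client_time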
Cache-Control: TTL-based caching
With a TTL-based cache, the response is considered fresh for a set duration and no revalidation happens until it expires.
Cache-Control: public, max-age=300, s-maxage=600

- public: any cache (browser, CDN) may store it
- private: browser cache only
- max-age=300: browser caches for five minutes
- s-maxage=600: CDN caches for ten minutes (shared cache)
max-age and ETag combine nicely: fresh until max-age elapses, then revalidate via ETag.
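A response carrying both might look like this (hypothetical values):

HTTP/1.1 200 OK
Cache-Control: public, max-age=300
ETag: "42-1713606000"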
stale-while-revalidate: the modern pattern
Cache-Control: max-age=60, stale-while-revalidate=3600

The response is fresh for 60 seconds. For the next hour it may be served stale while the cache revalidates in the background.
User experience: nobody waits on a cache miss. Every request is fast (possibly slightly stale), and the background refresh means the next one is fresh.
It handles the performance-versus-consistency trade-off gracefully. Modern CDNs (Cloudflare, Fastly) support it.
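The same idea works at the application layer. A minimal in-process sketch (the helper names and the module-level cache are my own, not a library API):

import threading
import time

MAX_AGE = 60          # seconds the value counts as fresh
STALE_WINDOW = 3600   # seconds we tolerate serving it stale

_cache = {}  # key -> (value, fetched_at)

def _refresh(key, compute):
    value = compute()
    _cache[key] = (value, time.time())
    return value

def get_with_swr(key, compute):
    entry = _cache.get(key)
    if entry:
        value, fetched_at = entry
        age = time.time() - fetched_at
        if age <= MAX_AGE:
            return value  # fresh: serve directly
        if age <= MAX_AGE + STALE_WINDOW:
            # Stale but tolerable: serve now, refresh in the background
            threading.Thread(target=_refresh, args=(key, compute), daemon=True).start()
            return value
    return _refresh(key, compute)  # miss or too stale: compute inline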
Cache invalidation: the hard problem
Cache invalidation is famously one of the two hard things. Strategies:
1. TTL-based. The simplest approach: the cache expires after a set interval and you accept eventual consistency.
2. Explicit purge. After a write, invalidate the relevant cache keys. Cloudflare API, Varnish ban, Redis DEL.
def update_product(product_id, data):
    db.update(product_id, data)
    cache.delete(f"product:{product_id}")
    cdn.purge(f"/api/products/{product_id}")

3. Versioning in the key. Include a version in the cache key. Bumping the version invalidates everything under it.
product:v3:42

A version bump equals a full invalidation. Granular invalidation is harder.
4. Event-driven invalidation. Emit a write event (Kafka, pub/sub), have cache workers subscribe, invalidate asynchronously.
It scales well in large systems; you accept eventual consistency.
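A minimal sketch with Redis pub/sub standing in for the message bus (in a bigger system this would be Kafka; the channel name is made up):

import redis

r = redis.Redis()

# Writer side: publish an invalidation event after the DB commit
def on_product_updated(product_id):
    r.publish("cache-invalidations", f"product:{product_id}")

# Worker side: subscribe and delete affected keys as events arrive
def run_invalidation_worker():
    pubsub = r.pubsub()
    pubsub.subscribe("cache-invalidations")
    for message in pubsub.listen():
        if message["type"] == "message":
            r.delete(message["data"])  # data is the cache key, as bytes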
The per-user response problem
Caching per-user responses is fiddly. The Vary header is the HTTP-level tool:
Cache-Control: private, max-age=60
Vary: Authorization, Accept-Language

Vary: Authorization means the auth token is part of the cache key, so every user gets their own entry. At the public cache layer this helps little (every authenticated user has a separate entry anyway).
Per-user caches are more effective when you keep them at the application layer: Redis with the user_id in the key.
cache.set(f"user:{user_id}:dashboard", response, ex=60)Conditional GET discipline
Clients often forget to send the ETag. If you ship an SDK or library, bake automatic ETag handling in:
import requests

class APIClient:
    def __init__(self, base_url):
        self.base_url = base_url
        self.etag_cache = {}      # path -> last ETag seen
        self.response_cache = {}  # path -> last parsed body

    def get(self, path):
        headers = {}
        if path in self.etag_cache:
            headers['If-None-Match'] = self.etag_cache[path]
        response = requests.get(self.base_url + path, headers=headers)
        if response.status_code == 304:
            # The server confirmed our copy is current: reuse it
            return self.response_cache[path]
        etag = response.headers.get('ETag')
        if etag:
            self.etag_cache[path] = etag
        self.response_cache[path] = response.json()
        return response.json()
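Usage is then transparent to the caller (the base URL is a placeholder):

client = APIClient("https://api.example.com")
client.get("/api/products/42")  # full response; ETag stored
client.get("/api/products/42")  # sends If-None-Match; a 304 reuses the cached body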
Monitoring: the cache hit rate
To measure whether your cache is actually pulling its weight:
- Cache hit rate: percentage of requests served from cache
- Cache miss latency: p95 latency of requests that missed
- Cache size: memory usage
- Eviction rate: how often the cache evicts entries when full
If the hit rate is below 80%, the cache isn’t doing much: the TTL may be too short, or the key strategy may be off.
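For an application-layer cache, Redis keeps the counters itself. A small sketch (note these are server-wide numbers, counted since the last restart):

import redis

r = redis.Redis()

def redis_hit_rate():
    stats = r.info("stats")
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else 0.0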
Custom cache: Redis direct
Sometimes standard HTTP caching doesn’t fit: complex query results, aggregated data. Go to Redis directly.
import json
import redis

r = redis.Redis()  # assumes a local Redis instance

def get_user_dashboard(user_id):
    cache_key = f"dashboard:{user_id}"
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    dashboard = compute_dashboard(user_id)  # expensive
    r.setex(cache_key, 300, json.dumps(dashboard))
    return dashboard

Invalidate with r.delete(f"dashboard:{user_id}") when the user’s data changes.
Anti-patterns
1. Race conditions in cache-aside. Two requests miss concurrently, both compute, both write. Wasted work, and at scale it becomes a thundering herd.
Mitigation: a lock (Redis SETNX pattern), or background refresh.
2. Permanent cache. TTL set to forever. Data changes but the cache never reflects it. Stale data is permanent.
3. No max size. Redis or Memcached fills up and you OOM. Every cache needs a maxmemory policy (LRU, LFU).
4. Cache stampede. A popular key expires, 1,000 concurrent requests all miss, and all hit the DB at once. The DB goes down.
Mitigation: probabilistic early expiration, random jitter, a lock.
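A lock-based sketch covering both the cache-aside race (1) and the stampede (4), using Redis SET NX; the helper names are mine:

import json
import time
import redis

r = redis.Redis()

def get_or_compute(key, compute, ttl=300, lock_ttl=10):
    while True:
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
        # SET NX: exactly one caller wins the lock and recomputes
        if r.set(f"lock:{key}", "1", nx=True, ex=lock_ttl):
            try:
                value = compute()
                r.setex(key, ttl, json.dumps(value))
                return value
            finally:
                r.delete(f"lock:{key}")
        time.sleep(0.05)  # lost the race: wait for the winner to fill the cache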
Real numbers
On an e-commerce API I reworked the cache strategy:
- Before: every request hit the DB, p95 of 280ms
- After: ETag plus CDN plus Redis, p95 of 45ms, 87% cache hit rate
Costs dropped (DB load halved, worker count halved), latency improved, the user experience got better.
Closing thought
Decide the caching strategy per endpoint before you ship to production. The “I’ll add caching later” approach is miserable six months in, because caching decisions leak into the API design.
Take the HTTP standard mechanisms (ETag, Cache-Control) seriously. Reach for a custom cache only when you have to. A minimum viable cache beats a clever custom one.