API response caching looks simple from a distance, yet it is one of the most consistently botched parts of backend engineering. The HTTP spec has had the answers for years: ETag, Last-Modified, Cache-Control. Plenty of APIs still misuse them or skip them entirely.
I’ve invested real effort in API caching on my last three projects. Here are the patterns and the traps.
When caching is worth it
A cache trades CPU and latency for a bit of operational complexity. It’s worth it when:
- Read-heavy endpoint: product catalog, article listings, user profiles
- Expensive computation: aggregates, ML inference, full-text search
- Rate-limited upstream: third-party API calls
- Common response: every user sees the same data
Not worth it when:
- Write-heavy (every update invalidates the cache)
- Per-user unique response
- Real-time data (stock price, notification count)
- Small response with fast generation
Three cache layers
Response caching usually lives at three levels:
- Browser or client cache: on the client itself
- CDN or reverse proxy: at the edge (Cloudflare, Varnish, NGINX)
- Application cache: Redis, Memcached, in-memory
Each layer wants its own key scheme and its own invalidation story.
ETag: the honest way
An ETag (entity tag) is a hash or version identifier for a response body. When the client receives a response it stores the ETag, and on the next request it sends If-None-Match: <etag>.
If the ETag still matches, the server returns 304 Not Modified with no body. The client uses its cached copy.
# First request
GET /api/products/42
Response:
200 OK
ETag: "abc123"
Body: {...}
# Second request (cached)
GET /api/products/42
If-None-Match: "abc123"
Response:
304 Not Modified
(no body)

Strategies for generating the ETag:
- Content hash: SHA-1 or MD5 of the response body. Accurate, but you pay to compute it on every request.
- Version field: the version or updated_at column on the source object. Depends on a DB query.
- ID plus timestamp: "{id}-{updated_at.timestamp}". Fast and deterministic.
def generate_etag(product):
    return f'"{product.id}-{product.updated_at.timestamp()}"'
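Serving the conditional request is symmetric. A minimal server-side sketch, assuming Flask, a hypothetical load_product lookup, and an assumed to_dict() on the model:

import requests  # not needed server-side; shown client-side later

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/products/<int:product_id>")
def get_product(product_id):
    product = load_product(product_id)  # hypothetical DB lookup
    etag = generate_etag(product)
    # Client already has this version: reply 304 with no body
    if request.headers.get("If-None-Match") == etag:
        return "", 304, {"ETag": etag}
    response = jsonify(product.to_dict())  # to_dict() is assumed
    response.headers["ETag"] = etag
    return response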
Last-Modified: the alternative
Instead of (or alongside) ETag, you can use Last-Modified. The client compares with If-Modified-Since.
Response:
200 OK
Last-Modified: Wed, 20 Apr 2026 10:00:00 GMT
Subsequent request:
If-Modified-Since: Wed, 20 Apr 2026 10:00:00 GMT
Response:
304 Not Modified

Differences from ETag:
- Granularity: one second (an ETag can be as precise as you like)
- Comparison: timestamp ordering (ETag is an exact match)
- Clock skew sensitivity: server and client clocks have to agree
I prefer ETag; it’s more robust. Last-Modified earns its keep as legacy support.
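If you do serve Last-Modified, the comparison is easy to get subtly wrong. A minimal helper sketch (the names are mine, and it assumes last_modified is a timezone-aware UTC datetime):

from email.utils import parsedate_to_datetime

def not_modified_since(request_headers, last_modified):
    """True if the client's cached copy is still current."""
    ims = request_headers.get("If-Modified-Since")
    if not ims:
        return False
    try:
        client_time = parsedate_to_datetime(ims)
    except (TypeError, ValueError):
        return False  # malformed header: treat as a normal request
    # HTTP dates have one-second granularity; drop microseconds before comparing
    return last_modified.replace(microsecond=0) <= client_time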
Cache-Control: TTL-based caching
With a TTL-based cache, the response is considered fresh for a set duration and no revalidation happens until it expires.
Cache-Control: public, max-age=300, s-maxage=600

- public: any cache (browser, CDN) may store it
- private: browser cache only
- max-age=300: browser caches for five minutes
- s-maxage=600: CDN caches for ten minutes (shared cache)
max-age and ETag combine nicely: fresh until max-age elapses, then revalidate via ETag.
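A response carrying both might look like this (hypothetical values):

HTTP/1.1 200 OK
Cache-Control: public, max-age=300
ETag: "42-1713606000"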
stale-while-revalidate: the modern pattern
Cache-Control: max-age=60, stale-while-revalidate=3600

The response is fresh for 60 seconds. For the next hour it may be served stale while the cache revalidates in the background.
User experience: nobody waits on a cache miss. Every request is fast (possibly slightly stale), and the background refresh means the next one is fresh.
It handles the performance-versus-consistency trade-off gracefully. Modern CDNs (Cloudflare, Fastly) support it.
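The same idea works at the application layer. A minimal in-process sketch (the helper names and the module-level cache are my own, not a library API):

import threading
import time

MAX_AGE = 60          # seconds the value counts as fresh
STALE_WINDOW = 3600   # seconds we tolerate serving it stale

_cache = {}  # key -> (value, fetched_at)

def _refresh(key, compute):
    value = compute()
    _cache[key] = (value, time.time())
    return value

def get_with_swr(key, compute):
    entry = _cache.get(key)
    if entry:
        value, fetched_at = entry
        age = time.time() - fetched_at
        if age <= MAX_AGE:
            return value  # fresh: serve directly
        if age <= MAX_AGE + STALE_WINDOW:
            # Stale but tolerable: serve now, refresh in the background
            threading.Thread(target=_refresh, args=(key, compute), daemon=True).start()
            return value
    return _refresh(key, compute)  # miss or too stale: compute inline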
Cache invalidation: the hard problem
Cache invalidation is famously one of the two hard things. Strategies:
1. TTL-based. The simplest approach: the cache expires after a set interval and you accept eventual consistency.
2. Explicit purge. After a write, invalidate the relevant cache keys. Cloudflare API, Varnish ban, Redis DEL.
def update_product(product_id, data):
    db.update(product_id, data)
    cache.delete(f"product:{product_id}")
    cdn.purge(f"/api/products/{product_id}")

3. Versioning in the key. Include a version in the cache key. Bumping the version invalidates everything under it.
product:v3:42

A version bump equals a full invalidation. Granular invalidation is harder.
4. Event-driven invalidation. Emit a write event (Kafka, pub/sub), have cache workers subscribe, invalidate asynchronously.
It scales well in large systems; you accept eventual consistency.
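A minimal sketch with Redis pub/sub standing in for the message bus (in a bigger system this would be Kafka; the channel name is made up):

import redis

r = redis.Redis()

# Writer side: publish an invalidation event after the DB commit
def on_product_updated(product_id):
    r.publish("cache-invalidations", f"product:{product_id}")

# Worker side: subscribe and delete affected keys as events arrive
def run_invalidation_worker():
    pubsub = r.pubsub()
    pubsub.subscribe("cache-invalidations")
    for message in pubsub.listen():
        if message["type"] == "message":
            r.delete(message["data"])  # data is the cache key, as bytes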
The per-user response problem
Caching per-user responses is fiddly. The Vary header is the HTTP-level tool:
Cache-Control: private, max-age=60
Vary: Authorization, Accept-Language

Vary: Authorization means the auth token is part of the cache key, so every user gets their own entry. At the public cache layer this helps little (every authenticated user has a separate entry anyway).
Per-user caches are more effective when you keep them at the application layer: Redis with the user_id in the key.
cache.set(f"user:{user_id}:dashboard", response, ex=60)Conditional GET discipline
Clients often forget to send the ETag. If you ship an SDK or library, bake automatic ETag handling in:
import requests

class APIClient:
    def __init__(self, base_url):
        self.base_url = base_url
        self.etag_cache = {}      # path -> last ETag seen
        self.response_cache = {}  # path -> last parsed body

    def get(self, path):
        headers = {}
        if path in self.etag_cache:
            headers['If-None-Match'] = self.etag_cache[path]
        response = requests.get(self.base_url + path, headers=headers)
        if response.status_code == 304:
            # The server confirmed our copy is current: reuse it
            return self.response_cache[path]
        etag = response.headers.get('ETag')
        if etag:
            self.etag_cache[path] = etag
        self.response_cache[path] = response.json()
        return response.json()
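Usage is then transparent to the caller (the base URL is a placeholder):

client = APIClient("https://api.example.com")
client.get("/api/products/42")  # full response; ETag stored
client.get("/api/products/42")  # sends If-None-Match; a 304 reuses the cached body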
Monitoring: the cache hit rate
To measure whether your cache is actually pulling its weight:
- Cache hit rate: percentage of requests served from cache
- Cache miss latency: p95 latency of requests that missed
- Cache size: memory usage
- Eviction rate: how often the cache evicts entries when full
If the hit rate is below 80%, the cache isn’t doing much: the TTL may be too short, or the key strategy may be off.
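For an application-layer cache, Redis keeps the counters itself. A small sketch (note these are server-wide numbers, counted since the last restart):

import redis

r = redis.Redis()

def redis_hit_rate():
    stats = r.info("stats")
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else 0.0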
Custom cache: Redis direct
Sometimes standard HTTP caching doesn’t fit: complex query results, aggregated data. Go to Redis directly.
import json
import redis

r = redis.Redis()  # assumes a local Redis instance

def get_user_dashboard(user_id):
    cache_key = f"dashboard:{user_id}"
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    dashboard = compute_dashboard(user_id)  # expensive
    r.setex(cache_key, 300, json.dumps(dashboard))
    return dashboard

Invalidate with r.delete(f"dashboard:{user_id}") when the user’s data changes.
Anti-patterns
1. Race conditions in cache-aside. Two requests miss concurrently, both compute, both write. Wasted work, and at scale it becomes a thundering herd.
Mitigation: a lock (Redis SETNX pattern), or background refresh.
2. Permanent cache. TTL set to forever. Data changes but the cache never reflects it. Stale data is permanent.
3. No max size. Redis or Memcached fills up and you OOM. Every cache needs a maxmemory policy (LRU, LFU).
4. Cache stampede. A popular key expires, 1,000 concurrent requests all miss, and all hit the DB at once. The DB goes down.
Mitigation: probabilistic early expiration, random jitter, a lock.
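A lock-based sketch covering both the cache-aside race (1) and the stampede (4), using Redis SET NX; the helper names are mine:

import json
import time
import redis

r = redis.Redis()

def get_or_compute(key, compute, ttl=300, lock_ttl=10):
    while True:
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
        # SET NX: exactly one caller wins the lock and recomputes
        if r.set(f"lock:{key}", "1", nx=True, ex=lock_ttl):
            try:
                value = compute()
                r.setex(key, ttl, json.dumps(value))
                return value
            finally:
                r.delete(f"lock:{key}")
        time.sleep(0.05)  # lost the race: wait for the winner to fill the cache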
Real numbers
On an e-commerce API I reworked the cache strategy:
- Before: every request hit the DB, p95 of 280ms
- After: ETag plus CDN plus Redis, p95 of 45ms, 87% cache hit rate
Costs dropped (DB load halved, worker count halved), latency improved, the user experience got better.
Closing thought
Decide the caching strategy per endpoint before you ship to production. The “I’ll add caching later” approach is miserable six months in, because caching decisions leak into the API design.
Take the HTTP standard mechanisms (ETag, Cache-Control) seriously. Reach for a custom cache only when you have to. A minimum viable cache beats a clever custom one.