When you design a distributed system that must stay available through network partitions, CAP says you give up strong consistency. You accept eventual consistency. But hiding that reality from the user is your job.
For the last two years I’ve been fighting eventual consistency in fintech and e-commerce projects. Users were opening support tickets saying “I just added a product, I can’t see it”, and we couldn’t answer with “the database will sync eventually”.
Here’s how I solved it.
Why eventual consistency is unavoidable
Modern systems use replication: read replicas, multi-region, cache layers, search indexes (Elasticsearch), event streams (Kafka). No two replicas see a write at the exact same moment. Sync delays run from milliseconds to seconds.
You can’t engineer this away. Replication isn’t zero-latency.
But users don’t care about the internals of eventual consistency. “I just added it, why isn’t it there?” doesn’t accept a technical answer.
The problem: read-your-writes inconsistency
The most common symptom: a user writes something, reads the same resource right after, and doesn’t see what they wrote.
Scenario: user updates their profile photo. The write goes to master. Before the 30ms replication finishes, the user page loads, reads from a replica, and shows the old photo. The user clicks twice and complains “why didn’t it change?”
Solution 1: read-your-writes consistency
For a user’s own writes, read from master instead of a replica. Implementation:
- Track the last write timestamp in the user session
- On reads, use a replica that’s caught up past the session’s last write (Lamport timestamp or LSN)
- If no replica has caught up, fall back to master
In PostgreSQL, pg_last_wal_replay_lsn() returns the WAL LSN the replica has replayed up to. If that is at or past the session's stored write LSN, the replica has the write and you can read from it. Otherwise, hit master.
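The comparison itself is simple once you can order LSNs. A minimal sketch: PostgreSQL LSNs are "X/Y" hex strings, so parse both halves into one integer and compare. The helper names (parseLsn, canReadFromReplica) are mine, not a driver API:

```javascript
// Parse a PostgreSQL LSN string like "0/16B3748" into a single BigInt
// so two LSNs can be compared numerically.
function parseLsn(lsn) {
  const [hi, lo] = lsn.split('/');
  return (BigInt('0x' + hi) << 32n) | BigInt('0x' + lo);
}

// The replica is safe to read when it has replayed at least up to the
// session's last write LSN.
function canReadFromReplica(sessionWriteLsn, replicaReplayLsn) {
  return parseLsn(replicaReplayLsn) >= parseLsn(sessionWriteLsn);
}

canReadFromReplica('0/16B3748', '0/16B3750'); // true  -> read from replica
canReadFromReplica('0/16B3748', '0/16B3740'); // false -> fall back to master
```

In practice you'd fetch the replica's value with `SELECT pg_last_wal_replay_lsn()` and store the session's write LSN at commit time; the routing decision is just this comparison.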
There’s some complexity, but the user experience holds.
Solution 2: session consistency
The entire user session is pinned to a single consistent view. When the session starts, it takes a baseline from master and serves all of that session's reads from the same view. Within the session, reads are guaranteed consistent with each other.
Managed databases (AWS RDS, Google Cloud Spanner, CockroachDB) ship session pinning or session-consistency features.
Useful for session-bound data like an e-commerce cart. Not for global state.
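A common building block behind this is sticky routing: every read in a session goes to the same replica, so the session sees one monotonically advancing view. A minimal sketch, assuming a static replica pool (pickReplicaForSession is a hypothetical helper; real setups use the vendor's pinning feature instead):

```javascript
// Sticky routing sketch: hash the session id to one replica and keep
// using it for the whole session, so reads never jump between replicas
// with different lag.
const replicas = ['replica-1', 'replica-2', 'replica-3']; // example pool

function pickReplicaForSession(sessionId) {
  // Cheap deterministic hash: the same session always lands on the same replica.
  let hash = 0;
  for (const ch of sessionId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return replicas[hash % replicas.length];
}

// The same session id always maps to the same replica:
pickReplicaForSession('sess-42') === pickReplicaForSession('sess-42'); // true
```

Note the caveat: stickiness gives you monotonic reads within the session, but read-your-writes still needs the LSN check from Solution 1 (or writes routed through the same node).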
Solution 3: optimistic UI
The pattern I reach for most: the UI updates local state immediately on user action, without waiting for backend confirmation. When the backend confirms, everything is already in sync.
User clicks “add to favorites”. The UI fills the heart icon instantly. The API call goes out in the background. Success, nothing to do. Failure, revert the icon and show an error.
```javascript
async function toggleFavorite(itemId) {
  setOptimisticState(itemId, 'favorited'); // UI updates instantly
  try {
    await api.favorite(itemId);
    // Real state matches optimistic state
  } catch (e) {
    setOptimisticState(itemId, 'not_favorited'); // Revert
    showError('Could not add to favorites');
  }
}
```

Upside:
- Instant feedback for the user
- Network latency and eventual consistency are hidden
- Perceived performance is dramatically better
Downside:
- You have to think through the failure path
- State rollback UX is delicate
- Not appropriate for read-only data
Solution 4: polling plus conflict resolution
Short-lived polling after a write. Write goes out, 200ms later re-read from a replica. Same value? You’re current. Different? Replication hasn’t caught up, wait another 500ms.
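The loop above can be sketched in a few lines. The read function and delays are injected so the helper stays testable; the names (waitUntilVisible, readFromReplica) are mine, not a library API, and the default delays follow the text (200ms first check, then 500ms):

```javascript
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Re-read from a replica after a write until it returns the value we wrote,
// backing off between attempts. Returns false if the replica is still stale
// after all attempts — at that point, surface a pending state to the user.
async function waitUntilVisible(expected, readFromReplica, delays = [200, 500]) {
  for (const ms of delays) {
    await sleep(ms);
    if ((await readFromReplica()) === expected) return true; // replica caught up
  }
  return false;
}
```

Usage: `await waitUntilVisible(newName, () => api.getName(userId))` after a rename, for example.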
Simple but hacky. Optimistic UI is better in most cases.
Solution 5: CRDTs (Conflict-free Replicated Data Types)
Data structures that let two distributed nodes write at the same time and resolve the conflict automatically.
Popular CRDTs:
- G-Counter (grow-only counter)
- LWW-Register (last-write-wins)
- OR-Set (observed-removed set)
The backbone of collaborative editors (Figma, Notion) and offline-first apps. Goal: no write is lost when two users write simultaneously, everything merges deterministically.
Setup is complex, and mature libraries exist (Automerge, Yjs). If this problem space is new to you, use a library rather than rolling your own.
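To make the merge idea concrete, here is a minimal G-Counter sketch — for intuition only, not production code. Each node increments only its own slot, and merge takes the per-node maximum, so concurrent increments never conflict:

```javascript
// G-Counter CRDT sketch: state is a map of nodeId -> count.
function increment(counter, nodeId, by = 1) {
  return { ...counter, [nodeId]: (counter[nodeId] || 0) + by };
}

// Merge is a per-node max: commutative, associative, idempotent —
// the properties that make the merge order-independent.
function merge(a, b) {
  const out = { ...a };
  for (const [node, count] of Object.entries(b)) {
    out[node] = Math.max(out[node] || 0, count);
  }
  return out;
}

// The counter's value is the sum over all nodes.
function value(counter) {
  return Object.values(counter).reduce((sum, n) => sum + n, 0);
}

// Two nodes increment concurrently, then merge — nothing is lost:
const nodeA = increment({}, 'A');    // { A: 1 }
const nodeB = increment({}, 'B', 2); // { B: 2 }
value(merge(nodeA, nodeB));          // 3
```

The merge being commutative and idempotent is the whole trick: replicas can exchange state in any order, any number of times, and still converge.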
The discipline of telling the user the truth
Sometimes hiding isn’t right. At a fintech, a balance transfer takes 2 to 5 seconds (bank partner integration). Showing it optimistically would have misled users.
Pattern: transparent pending state. “Transfer in progress, this may take 2 to 3 seconds”, a spinner, a pending badge in the UI. When done, green checkmark.
The user gets clear information, no false expectations.
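As a sketch of the pattern, the pending state can be an explicit entry in the UI's state table rather than an afterthought. The names here (transferStates, renderBadge) are hypothetical UI glue, not a library API:

```javascript
// Each state maps to an explicit, honest piece of UI — the pending state
// is designed up front, not bolted on.
const transferStates = {
  pending: { badge: 'Transfer in progress, this may take 2 to 3 seconds', spinner: true },
  done:    { badge: 'Transfer complete', spinner: false },
  failed:  { badge: 'Transfer failed', spinner: false },
};

function renderBadge(state) {
  const view = transferStates[state];
  if (!view) throw new Error(`unknown transfer state: ${state}`);
  return view;
}
```

The point is that "pending" is a first-class state with its own copy and visuals, decided at design time.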
Monitor read replica lag
The moment you accept eventual consistency, start measuring replica lag:
- p50 and p95 replica lag (in seconds)
- Lag spikes (replication hiccup signal)
- Alarm when lag crosses the threshold
Prometheus query for PostgreSQL:
pg_replication_lag_seconds{instance="replica-1"} > 5

Lag above 5 seconds breaks a lot of UX patterns. Alarm on it, investigate.
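Wired into an alerting rule, it might look like this sketch (the exact metric name depends on your exporter; pg_replication_lag_seconds is assumed here):

```yaml
groups:
  - name: replication
    rules:
      - alert: ReplicaLagHigh
        # Fire only if lag stays above 5s for 2 minutes, to skip brief blips.
        expr: pg_replication_lag_seconds{instance="replica-1"} > 5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Replica lag above 5s on {{ $labels.instance }}"
```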
Cache invalidation: the hidden enemy
Cache layers (Redis, Memcached, CDN) are the most invisible source of eventual consistency. The DB is updated but the cache still serves the old value, and the user reads stale data.
Cache invalidation strategies:
- TTL-based: cache entries expire after a set duration. Simple, but you’re accepting stale data.
- Write-through: update the cache on write. Consistent, but write latency goes up.
- Event-driven invalidation: DB writes publish events, the cache subscribes and invalidates. Needs pub/sub infrastructure.
- Versioning: the cache key includes a version number, a write bumps the version, new cache key.
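The versioning strategy is worth a sketch, since it sidesteps invalidation entirely: the cache key embeds a per-entity version, a write bumps the version, and stale entries simply become unreachable. Maps stand in for Redis here; all names are mine:

```javascript
const versions = new Map(); // entityId -> current version
const cache = new Map();    // versioned key -> cached value

function cacheKey(entityId) {
  return `${entityId}:v${versions.get(entityId) || 0}`;
}

function writeEntity(entityId, value) {
  versions.set(entityId, (versions.get(entityId) || 0) + 1); // bump version
  cache.set(cacheKey(entityId), value); // populate under the new key
}

function readEntity(entityId) {
  return cache.get(cacheKey(entityId)); // old versions are never read again
}

writeEntity('user:1', { name: 'Ada' });
writeEntity('user:1', { name: 'Ada L.' });
readEntity('user:1'); // { name: 'Ada L.' } — the v1 entry is unreachable
```

The trade-off: stale entries linger until TTL or eviction reclaims them, so you pay in memory rather than in invalidation complexity.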
They say cache invalidation is one of the two hard problems in computer science. Take it seriously.
The closing lesson
Eventual consistency is a technical fact, UX is a choice. The user doesn’t see eventual consistency, that awareness is yours, not theirs.
There’s no silver bullet. I use read-your-writes, optimistic UI, session consistency, transparent pending state in different situations. Picking the right pattern is context-dependent.
General advice: think about UX from the start, don’t leave it as “database is eventually consistent, UI will show it somehow”. At design time, nail down which pattern goes in which flow, what it looks like, which failure mode shows which message. Retrofitting this later is painful.