
Read replica strategy: when it earns its keep, when it’s just complexity

A read replica is the classic database scaling tool. On some projects it creates more problems than it solves. When to add one, when to hold off.

Every SaaS hits the database-scaling conversation eventually. “Queries are slow, DB load is high, let’s add a read replica.” The classic answer. But a read replica isn’t a free lunch; it brings its own complexity.

I’ve made the call either way on dozens of projects. Here are the practical decision criteria and the real problems I’ve run into.

How a read replica actually works

The primary database takes the writes. As soon as a transaction commits, the transaction log (the WAL in Postgres, the binlog in MySQL) streams out to the replicas. The replicas replay that log to rebuild the same state.

Reads go to a replica, writes go to the primary.

What you get:
– Read load spread off the primary onto the replicas
– Horizontal scaling for read capacity by adding more replicas
– Replicas still serve reads when the primary goes down (partial failover)
– A dedicated replica for analytics queries

Replication lag: the cost nobody talks about enough

This is the core problem. A replica is always “a bit behind” the primary. Network latency plus replay time. Usually milliseconds, sometimes seconds, occasionally minutes.

Where it bites:

1. Immediate write-then-read. A user updates their profile, refreshes the page, and the change isn’t there yet (the replica is still on the old value).

2. Sequential business logic. A user places an order, the next screen shows their orders list, and the new one is missing (replica hasn’t caught up).

3. Financial operations. A user pays, the balance should be updated. If you read from the replica you can show a stale balance.
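
Scenario 1 in code. This is illustrative pseudocode in the style of the snippets later in this post; db.primary and db.replica are assumed connection handles, not a specific library API:

def update_profile(db, user_id, new_name):
    # The write commits on the primary right away
    db.primary.execute("UPDATE users SET name = ? WHERE id = ?", new_name, user_id)

def show_profile(db, user_id):
    # This read can land on a replica that hasn't replayed that commit yet,
    # so it may return the old name for as long as the lag lasts
    return db.replica.query("SELECT name FROM users WHERE id = ?", user_id)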

At scale replication lag becomes a real issue. 99.9% of the time it’s 50ms, but 0.1% of the time it’s 30 seconds. Those edge cases turn into support tickets.

When to add a read replica

1. High read/write ratio (10:1 or more).

GET-heavy APIs. Product catalogs (every listing view is a read, updates are rare). Dashboard metrics (constantly queried). User feeds.
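
Not sure where you sit? Postgres keeps counters that give a rough estimate. A minimal sketch assuming Postgres and psycopg2; rows fetched by scans stand in for reads, tuples inserted, updated, or deleted stand in for writes:

import psycopg2

def read_write_ratio(dsn):
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            # Counters accumulate since the last pg_stat_reset()
            cur.execute("""
                SELECT sum(seq_tup_read + idx_tup_fetch),
                       sum(n_tup_ins + n_tup_upd + n_tup_del)
                FROM pg_stat_user_tables
            """)
            reads, writes = cur.fetchone()
            return float(reads) / float(writes) if writes else float("inf")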

2. Heavy read queries that aren’t critical.

Reporting, analytics, the detail-heavy queries in an admin panel. A bit of stale data (1 to 5 seconds) is fine. Pulling that workload off the primary is a clear win.

3. Geographic distribution.

Primary in the US, half your users in Europe. Put a replica in Europe and read latency drops from 150ms to 20ms.

4. The primary hardware is saturated.

CPU above 80%, memory full, IOPS close to the cap. Vertical scaling has hit its ceiling, horizontal (read replica) is the next move.

When not to add a read replica

1. Write-heavy workload.

Messaging, real-time collaboration, IoT sensor data. Every write has to stream out and be replayed, so a write-heavy stream keeps the replicas permanently behind, and the primary gets no relief because it still handles every write.

2. Strong consistency required.

Finance, medical, inventory. “The user deposited $100, their balance has to go up by $100, right now.” A replica read gives the wrong answer here.

3. Small scale.

The primary is sitting at 30 to 40% CPU. A read replica adds complexity for no real gain. Do query optimisation and indexing first.

4. Frequent, complex schema changes.

A replica is a mirror of the primary’s structure. During schema migrations replicas can drop out of sync temporarily. Lots of schema churn multiplies the operational pain.

What the application has to handle

Once you add a replica the application layer has to change:

1. Connection routing.

Write query → primary connection
Read query → replica connection

Most ORMs support this split (multiple-database support in Rails’ ActiveRecord, the DATABASE_ROUTERS setting in Django). You’ll still need manual routing in places.
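
In Django, for example, the split fits in a small database router; a minimal sketch assuming DATABASES defines a "primary" alias and two replica aliases (register it via DATABASE_ROUTERS):

import random

class PrimaryReplicaRouter:
    def db_for_read(self, model, **hints):
        # Spread reads across the replicas
        return random.choice(["replica_1", "replica_2"])

    def db_for_write(self, model, **hints):
        # All writes go to the primary
        return "primary"

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias points at the same physical schema
        return True

The manual escape hatch is still there: a consistency-critical queryset can force the primary with .using("primary").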

2. Session-level “read your writes”.

A user just updated their profile. The next read has to show their own update. Pin the session to the primary for a short window:

import random

class Database:
    def get_read_connection(self, session_id):
        # Pin reads to the primary while this session has a recent write
        if session_has_recent_write(session_id):
            return self.primary
        return random.choice(self.replicas)

Tune the “recent write” window to your replication lag (roughly 5 to 10 seconds).
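
The sketch leans on session_has_recent_write, which is where the real bookkeeping lives. One way to back it, with the caveat that in production the timestamps would sit somewhere shared like Redis rather than in process memory:

import time

RECENT_WRITE_WINDOW = 10  # seconds; tune to your observed replication lag

_last_write_at = {}  # session_id -> unix timestamp of that session's last write

def record_write(session_id):
    _last_write_at[session_id] = time.time()

def session_has_recent_write(session_id):
    return time.time() - _last_write_at.get(session_id, 0) < RECENT_WRITE_WINDOW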

3. Critical reads go to the primary.

Some reads are consistency-critical. On those paths read from the primary explicitly:

def get_user_balance(user_id):
    # Financial read: always hit the primary
    return db.primary.query("SELECT balance FROM users WHERE id = ?", user_id)

Replication lag monitoring

Metrics you have to watch:

  • Replication lag (seconds behind the primary): under 100ms is normal; alert at 1s or more
  • Replica CPU and IO: should track the primary
  • Replication thread health: is it running, or has it stalled?
  • Replica connection count: is it overloaded?

Skip any of these and one day a replica will be an hour behind and nobody will notice.
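
On Postgres the first metric is one query away. A sketch assuming streaming replication and psycopg2, run against the replica:

import psycopg2

def replication_lag_seconds(replica_dsn):
    with psycopg2.connect(replica_dsn) as conn:
        with conn.cursor() as cur:
            # Time since the last transaction the replica replayed was committed
            cur.execute(
                "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))"
            )
            (lag,) = cur.fetchone()
            return lag  # None on a primary; can overstate lag when the primary is idle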

Multiple replica strategies

Is one replica enough? More?

Single replica: offloads reads and doubles as a failover target, with a simple setup. A reasonable starting point.

Multiple replicas plus a load balancer: 2 to 3 replicas, traffic spread across them. For high read-load scenarios.

Dedicated analytics replica: a separate replica just for reporting. This one can lag further behind without it mattering. Keeps the operational replicas clean.

Multi-region replicas: geographically distributed. Cross-region replication lag is higher, but local reads are very fast.
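
Whichever layout you pick, the application has to know which replica serves which workload. One simple shape is named pools (the aliases here are illustrative):

import random

REPLICA_POOLS = {
    "default": ["replica-1", "replica-2"],  # operational reads, low lag expected
    "analytics": ["replica-analytics"],     # reporting; extra lag is acceptable
}

def pick_replica(pool="default"):
    return random.choice(REPLICA_POOLS[pool])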

Failover scenarios

What happens when the primary goes down?

Manual failover: promote a replica to primary, point the other replicas at the new primary. 5 to 15 minutes of downtime.

Automatic failover (AWS RDS, Aurora): the primary is detected down and a replica is promoted automatically. 30 seconds to 2 minutes of downtime.

Split-brain risk: two nodes each claim to be primary. Data corruption territory. Managed databases (RDS, CloudSQL) handle this for you, self-managed setups have to be configured carefully.
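
For a sense of scale: on self-managed Postgres 12+ the promotion step itself is a single call; the hard part is repointing clients and the remaining replicas safely. A sketch assuming psycopg2:

import psycopg2

def promote(replica_dsn):
    conn = psycopg2.connect(replica_dsn)
    conn.autocommit = True
    with conn.cursor() as cur:
        # Blocks (up to 60s by default) until promotion completes
        cur.execute("SELECT pg_promote()")
    conn.close()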

Cost

A read replica isn’t free:

  • Each replica adds compute plus storage cost
  • Cross-region replicas add network transfer cost
  • Monitoring and alerting overhead
  • Operational complexity (failover planning, disaster recovery)

If the primary on AWS RDS costs $200 a month, each replica is roughly $150 to $200 on top. Three replicas and you’ve added $450 to $600 a month.

Is that cost justified by the DB load relief? Sometimes yes (performance is critical), sometimes no (query optimisation would do more).

Query optimisation first

Before you reach for a replica:

  1. Analyse the slow query log. Optimise the ten slowest queries.
  2. Missing indexes. Run EXPLAIN on the critical queries.
  3. N+1 query problems. Fix with ORM eager loading (see the sketch after this list).
  4. Caching layer (Redis). Cache frequently accessed data.
  5. Query result caching. Push expensive queries into a materialized view or cache.
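
To make item 3 concrete, here is the classic N+1 shape and its fix in the Django ORM; Order is a stand-in model with a user foreign key:

# N+1: one query for the orders, then one extra query per order for its user
for order in Order.objects.all():
    print(order.user.email)

# Eager loading: a single JOINed query fetches orders and their users together
for order in Order.objects.select_related("user"):
    print(order.user.email)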

These five changes typically cut DB load by 50 to 70%. Sometimes you don’t need a replica at all.

Takeaway

A read replica is a strong scaling tool, but not for every project. It pays off in the right situations (read-heavy, analytics, geographic). It adds complexity in the wrong ones (write-heavy, strong consistency).

Make the call by asking “which problem am I solving?” Don’t add a replica because “I should probably scale”; add it to fix a specific performance problem. Set up monitoring, watch replication lag, and get the routing right in the application layer.

For most projects the order is query optimisation plus caching first, replica later. Complexity you add too early is an expensive lesson.
