
Database connection pool sizing: why the defaults are wrong

Every framework picks a different connection pool default, and most of them are wrong. Here's how I size pools, based on real production experience.

Even experienced developers will pick a connection pool size off the top of their head: "pool size 20, should be enough." It works fine under light load, but as soon as traffic grows, strange performance issues start to appear.

I’ve tuned connection pools on dozens of projects over the last decade. Here’s the practical formula I use and the common misconceptions I keep seeing.

The classic fallacy: “more is better”

The first reflex: “user traffic is going up, bump the pool to be safe.” You push from 20 to 200. Performance usually gets worse.

The reason: most production databases (PostgreSQL, MySQL) spin up a process or thread per connection. Each connection costs memory, CPU context switching, and lock contention. A PostgreSQL server juggling 200 active connections is usually slower than the same server handling 20.

Benchmarks show PostgreSQL on an 8 to 16 core server does best around 20 to 40 connections. Push past 100 and throughput drops.

The HikariCP formula

I found this in the documentation of HikariCP (the popular Java connection pool), and it has held up across every framework I've tried:

pool_size = ((core_count * 2) + effective_spindle_count)

Core count: the CPU core count of the database server. Effective spindle count: the number of spinning disks; on an SSD, 1 is a reasonable value.

For an 8-core server on SSD: (8*2) + 1 = 17. Round up to 20.

Take this as a starting point. You can pull it down to 15 or push up to 30 depending on the load pattern.
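
As a minimal sketch (the class and method names here are mine, not from the HikariCP docs), the starting point is just arithmetic:

class PoolSizing {
    // Starting-point pool size from the HikariCP formula.
    // coreCount: CPU cores on the database server (not the app server).
    // effectiveSpindleCount: number of spinning disks; use 1 on an SSD.
    // Example: startingPoolSize(8, 1) == 17 -> round up to 20 and tune from there.
    static int startingPoolSize(int coreCount, int effectiveSpindleCount) {
        return (coreCount * 2) + effectiveSpindleCount;
    }
}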

The real-world math

Your app server has three instances. Each pool is 20. The DB sees 60 connections.

The DB’s max_connections is 100. You’ve got 40 in reserve (admin sessions, maintenance tools, read replica sync, and so on).

Scale from 3 to 5 instances: 5 times 20 is 100, right at the DB limit. Connection allocation starts failing.

So size the pool relative to total instance count:

per_instance_pool = (db_max_connections * 0.7) / app_instance_count

Five instances, DB max 100: (100 * 0.7) / 5 = 14 per instance, leaving a 30% reserve.
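
A hypothetical companion helper (names are mine; the 0.7 reserve factor is the one from the formula above):

class FleetSizing {
    // Per-instance pool size that leaves roughly 30% of max_connections in
    // reserve for admin sessions, maintenance tools, replication, and so on.
    // Example: perInstancePoolSize(100, 5) == 14.
    static int perInstancePoolSize(int dbMaxConnections, int appInstanceCount) {
        int usableByApp = (int) (dbMaxConnections * 0.7);
        return usableByApp / appInstanceCount;
    }
}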

Connection lifetime and leak detection

Besides pool size, another big one: connection lifetime. The default is usually “infinite”, which hides bugs:

  • Connection leaks: code grabbed a connection and never returned it. The pool quietly runs dry.
  • Stale connections: network timeouts or firewall drops. The pool thinks it has a connection but it’s dead.
  • Database failover: primary changes but the pool is still clinging to old connections.

Fix:

max_connection_lifetime = 30 minutes
idle_timeout = 10 minutes
connection_validation_interval = 30 seconds

With these settings:
  • Every connection is recycled after at most 30 minutes
  • Idle connections are closed after 10 minutes
  • A health check (SELECT 1) runs every 30 seconds

We also watch for leaks: every connection checkout records a stack trace of who took it. Any connection not returned within 30 seconds triggers an alert.
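
With HikariCP specifically, these numbers map onto real configuration properties. A sketch, assuming HikariCP 4.x or newer (keepaliveTime doesn't exist in older versions) and a placeholder JDBC URL; note that HikariCP's keepalive validates connections via the JDBC driver's isValid() by default rather than a literal SELECT 1:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.util.concurrent.TimeUnit;

class PoolSetup {
    static HikariDataSource createPool() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db.internal:5432/app");      // placeholder URL
        config.setMaximumPoolSize(20);                                    // from the sizing formula above
        config.setMinimumIdle(5);                                         // let the pool shrink; idleTimeout only reaps connections above this floor
        config.setMaxLifetime(TimeUnit.MINUTES.toMillis(30));             // recycle every connection after 30 minutes
        config.setIdleTimeout(TimeUnit.MINUTES.toMillis(10));             // close idle connections after 10 minutes
        config.setKeepaliveTime(TimeUnit.SECONDS.toMillis(30));           // validate idle connections every 30 seconds
        config.setLeakDetectionThreshold(TimeUnit.SECONDS.toMillis(30));  // warn with a stack trace if not returned in 30 s
        return new HikariDataSource(config);
    }
}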

Read/write split

On one project the pool problem showed up differently: the workload was read-heavy.

If you have a read replica, route read queries through a separate pool:

  • Write pool: 20 connections to the primary
  • Read pool: 60 connections to the replica (per replica if you have more than one)

Read queries are typically short, so connections turn over quickly, and the replica can sustain a larger pool.
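
A minimal sketch of the split using two HikariCP data sources; the hostnames and pool names are placeholders, and the routing (deciding which queries use which pool) is left to the application or ORM:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

class ReadWritePools {
    static HikariDataSource pool(String jdbcUrl, String name, int maxSize) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(jdbcUrl);
        config.setPoolName(name);            // shows up in logs and metrics
        config.setMaximumPoolSize(maxSize);
        return new HikariDataSource(config);
    }

    // Writes go to the primary, reads to the replica.
    static final HikariDataSource WRITE_POOL =
            pool("jdbc:postgresql://primary.internal:5432/app", "write-pool", 20);
    static final HikariDataSource READ_POOL =
            pool("jdbc:postgresql://replica1.internal:5432/app", "read-pool", 60);
}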

Long-running queries need their own pool

Got analytics or reporting queries that run 30+ seconds? Give them their own pool:

  • Main pool: 20 connections, OLTP workload
  • Analytics pool: 5 connections, OLAP or reporting

Now a long report query doesn’t choke the main app’s pool. It has dedicated connections.

Sometimes pushing analytics to a separate replica entirely is cleaner. Reports go to the replica, OLTP stays on the primary.
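
In code, the only discipline that matters is that report queries borrow from the dedicated pool and always give the connection back. A sketch (the analytics pool is assumed to be a small HikariDataSource built like the ones above, and the query is a placeholder):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.DataSource;

class Reporting {
    static void runReport(DataSource analyticsPool) throws Exception {
        // try-with-resources returns the connection to the pool even if the report fails.
        try (Connection conn = analyticsPool.getConnection();
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1 /* placeholder for a long reporting query */")) {
            while (rs.next()) {
                // stream rows into the report
            }
        }
    }
}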

Connection pool metrics

What you need to monitor in production:

  1. Active connections: currently in use
  2. Idle connections: waiting
  3. Pool saturation: active divided by max
  4. Wait time: how long a request waits to get a connection
  5. Timeout count: requests that failed due to connection timeout
  6. Leak detection alerts: stack traces for connections never returned

HikariCP, PgBouncer, and similar tools export all of these. Point Datadog or Grafana at them.

When saturation approaches 90%, either grow the pool or cut DB load.
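
With HikariCP, the active, idle, and waiting counts can be read straight off the pool's MXBean; wait-time histograms and timeout counts come from its Micrometer or Dropwizard metrics integration rather than the MXBean. A sketch of a periodic saturation check, using the 90% threshold above:

import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;

class PoolMetrics {
    // Call this from a scheduled job and ship the numbers to your metrics system.
    static void checkSaturation(HikariDataSource dataSource) {
        HikariPoolMXBean pool = dataSource.getHikariPoolMXBean();
        int active = pool.getActiveConnections();            // connections currently in use
        int idle = pool.getIdleConnections();                 // connections waiting in the pool
        int waiting = pool.getThreadsAwaitingConnection();    // threads queued for a connection
        double saturation = (double) active / dataSource.getMaximumPoolSize();

        // 90% is the "grow the pool or cut DB load" threshold from this post.
        if (saturation >= 0.9) {
            System.err.printf("pool saturation %.0f%% (active=%d, idle=%d, waiting=%d)%n",
                    saturation * 100, active, idle, waiting);
        }
    }
}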

External pooler like PgBouncer

On PostgreSQL, PgBouncer adds an abstraction layer between the app and the database: app → PgBouncer → database. Benefits:

  • App-side connections go to PgBouncer (fast)
  • PgBouncer keeps far fewer connections open to the DB (connection multiplexing)
  • Less process and memory overhead on the database

A typical example: 10 app instances, each with a pool of 20. Without PgBouncer that's 200 DB connections. With PgBouncer, those same 200 connections land on PgBouncer, which opens only 20 to 30 connections to the database and multiplexes the traffic over them.

Downside: PgBouncer’s transaction pooling mode breaks some PostgreSQL features (prepared statement cache, session-level vars). Session mode avoids this but you lose some of the multiplex advantage.
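
For reference, a minimal pgbouncer.ini sketch matching the example above; the hostnames, ports, and auth setup are placeholders, and pool_mode = transaction is the mode with the caveats just mentioned:

[databases]
; clients connect to "app" on PgBouncer; PgBouncer connects onward to the primary
app = host=primary.internal port=5432 dbname=app

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
; multiplex many client connections over few server connections
pool_mode = transaction
; 10 app instances x pool of 20, plus headroom
max_client_conn = 300
; actual connections PgBouncer keeps open to PostgreSQL
default_pool_size = 25
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt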

Practical starting values

What do I start with on a new project?

  • Small app (1 to 2 instances, under 100 concurrent users): pool size 10
  • Medium (3 to 5 instances, 500 to 1,000 users): pool size 20
  • Large (10+ instances, 5,000+ users): 10 to 15 per instance, watch total DB connections, add PgBouncer if needed
  • Analytics workload: separate pool of 5 to 10

Then load test. Use k6 or similar. Look at saturation and wait time. Tune with real data.

Troubleshooting

When the pool misbehaves, the questions I ask:

  1. Is the pool saturated? If active connections sit above 80% of the max, grow it.
  2. Are there leaks? Long-running connections? Add stack trace logging.
  3. Is DB load high? Growing the pool is not the answer, the DB will fall over. Optimise queries first.
  4. Are timeouts rising even though the pool has headroom? Then the DB itself is slow. Time for DB profiling.
  5. Connection aging? If you see stale connections, set lifetime and idle timeout.

Takeaway

Pool size isn’t magic. It’s a value you can compute. Start with the HikariCP formula, respect the DB’s max_connections, and account for instance count. Read/write splits, analytics pools, and external poolers are extra tools for larger systems.

On a new project, start at 10 to 20, wire up monitoring, tune with real data. Don’t say “pool size 200 to be safe”, because it usually does the opposite.
