On a messaging app, a single WebSocket server handled up to 2000 concurrent users fine. At 3000 it started failing. The CPU wasn’t the bottleneck; the file descriptor limit was. The fix was horizontal scaling, but running WebSockets across multiple servers is nowhere near as simple as plain HTTP. Here’s what I learned.
What happens on a single-server setup
A WebSocket server keeps a TCP connection open per client. Linux defaults to a soft limit of 1024 open file descriptors per process. You can raise it with ulimit -n 65535, but the real ceiling lives in kernel parameters.
```shell
# raise the kernel-wide limits
sysctl -w fs.file-max=1000000
sysctl -w net.core.somaxconn=65535
sysctl -w net.ipv4.tcp_max_syn_backlog=65535
# raise the per-process descriptor limit
ulimit -n 65535
```

A single box can hold 50 to 70 thousand connections depending on your app logic. Memory grows with it, around 10 to 20 KB per idle connection.
The horizontal scale problem
In HTTP the load balancer spreads requests randomly and each request stands alone. With WebSockets the connection is persistent. Client A is pinned to server 1, client B to server 2. When A wants to message B, server 1 has to tell server 2. That’s where pub/sub comes in.
Option 1: Redis Pub/Sub
Every server subscribes to Redis. When a client sends a message, the server publishes it to Redis. All servers receive it and forward to their local clients.
```javascript
// assumes the ioredis client; any Redis client with pub/sub works the same way
const Redis = require('ioredis');

// two connections: a subscribed connection can't issue other commands
const redis = new Redis();
const subscriber = new Redis();

// message received from a local client: publish it for every server to see
function onMessageFromClient(message) {
  const envelope = { roomId: message.roomId, content: message.content };
  redis.publish('chat', JSON.stringify(envelope));
}

// every server forwards Redis messages to its own local clients
subscriber.subscribe('chat');
subscriber.on('message', (channel, data) => {
  const { roomId, content } = JSON.parse(data);
  getLocalClientsInRoom(roomId).forEach(c => c.send(content));
});
```

Easy to set up, but Redis has a throughput ceiling. Around 100k messages a second it becomes the bottleneck. At that point you need sharded Redis or a move to NATS.
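When one Redis instance tops out, a common escape hatch is sharding the channel by room, so each Redis instance only carries a slice of the traffic. A minimal sketch of the idea; the shard count and channel naming are my assumptions, not anything Redis prescribes:

```javascript
const SHARD_COUNT = 8; // assumed; tune to your Redis capacity

// cheap deterministic string hash (djb2 variant)
function hashString(s) {
  let h = 5381;
  for (let i = 0; i < s.length; i++) h = ((h * 33) ^ s.charCodeAt(i)) >>> 0;
  return h;
}

// every message for a room always lands on the same shard channel
function shardChannel(roomId) {
  return 'chat:' + (hashString(roomId) % SHARD_COUNT);
}
```

Publishers call `redis.publish(shardChannel(roomId), payload)`, and each server subscribes to all shard channels, which can then live on separate Redis instances.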
Option 2: Sticky sessions
The load balancer keeps routing a client to the same server, via cookie or IP hash. Clients in the same room tend to land on the same server.
Upside: less server-to-server chatter.
Downside: if a server goes down, every client on it drops. When they reconnect onto different servers, the “where does this room live” question reopens.
Sticky sessions don’t make much sense unless the routing is room-aware, and room-aware routing is complex: it needs custom load balancer logic.
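For reference, plain IP-hash stickiness (without room awareness) is a one-liner at the load balancer; a sketch in nginx, with made-up upstream names:

```nginx
upstream ws_backend {
    ip_hash;                      # same client IP -> same backend
    server ws1.internal:8080;
    server ws2.internal:8080;
}

server {
    location /ws {
        proxy_pass http://ws_backend;
        proxy_http_version 1.1;                  # required for WebSocket
        proxy_set_header Upgrade $http_upgrade;  # pass the upgrade handshake
        proxy_set_header Connection "upgrade";
    }
}
```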
Option 3: a messaging broker (NATS, RabbitMQ, Kafka)
The next step up from Redis. A dedicated broker with high throughput, persistence, and tuning knobs. NATS JetStream is my pick, lightweight and fast.
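A sketch of the same fan-out on NATS, assuming the official `nats` npm client; the envelope helpers are mine, and the server URL is a placeholder:

```javascript
// envelope helpers: plain JSON, same shape as the Redis version
function encodeEnvelope(roomId, content) {
  return JSON.stringify({ roomId, content });
}
function decodeEnvelope(data) {
  return JSON.parse(data);
}

// wiring sketch; needs `npm install nats` and a running NATS server
async function startFanout(getLocalClientsInRoom) {
  const { connect, StringCodec } = require('nats');
  const nc = await connect({ servers: 'nats://localhost:4222' }); // placeholder URL
  const sc = StringCodec();

  const sub = nc.subscribe('chat');
  for await (const m of sub) {
    const { roomId, content } = decodeEnvelope(sc.decode(m.data));
    getLocalClientsInRoom(roomId).forEach(c => c.send(content));
  }
}
```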
Connection state management
When a user connects, write a presence record of “online”. Redis: SET user:123:online "server-5". Give it a TTL so if the server dies the key expires on its own.
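The command behind that is `SET user:123:online "server-5" EX 60` on connect, refreshed periodically. A minimal in-process sketch of the expiry semantics, just to make the TTL behaviour concrete (the 60-second TTL is an assumption; match it to your heartbeat interval):

```javascript
// presence map: userId -> { server, expiresAt }
const presence = new Map();
const TTL_MS = 60_000; // assumed TTL

function setOnline(userId, serverId, now = Date.now()) {
  presence.set(userId, { server: serverId, expiresAt: now + TTL_MS });
}

// reads treat an expired record as offline, exactly like a lapsed Redis key
function whereIs(userId, now = Date.now()) {
  const rec = presence.get(userId);
  if (!rec || rec.expiresAt <= now) return null;
  return rec.server;
}
```

In production this lives in Redis, not process memory, so every server sees the same view.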
Close cleanly on disconnect. Client disconnects aren’t always clean, though. On a network hiccup the server sees the connection as open until TCP times out. Add a heartbeat:
```javascript
// mark connections alive on connect and whenever they answer a ping;
// without the initial flag, the first sweep would terminate everyone
wss.on('connection', ws => {
  ws.isAlive = true;
  ws.on('pong', () => (ws.isAlive = true));
});

setInterval(() => {
  wss.clients.forEach(ws => {
    if (!ws.isAlive) return ws.terminate(); // no pong since last sweep: dead
    ws.isAlive = false;
    ws.ping();
  });
}, 30000);
```

A ping every 30 seconds; kill anything that doesn’t pong back.
Authentication
Do auth during the handshake. Passing a token in the query string is common, but URLs leak into logs. Passing the JWT via the subprotocol (Sec-WebSocket-Protocol) header is safer.
```javascript
const ws = new WebSocket('wss://host', ['token.' + jwt]);
```

The server parses the subprotocol and validates the token. If it’s invalid, it rejects the handshake.
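On the server, with the `ws` library, the `handleProtocols` option is one place to do this; `verifyJwt` is a stand-in for whatever JWT validation you use:

```javascript
// pull the token out of a 'token.<jwt>' subprotocol entry
function extractToken(protocols) {
  for (const p of protocols) {
    if (p.startsWith('token.')) return p.slice('token.'.length);
  }
  return null;
}

// wiring sketch for the ws library (npm install ws);
// pass the result to `new WebSocketServer(...)`
function makeServerOptions(verifyJwt) {
  return {
    handleProtocols: (protocols, request) => {
      const token = extractToken(protocols);
      if (!token || !verifyJwt(token)) return false; // false rejects the handshake
      return 'token.' + token; // echo the accepted subprotocol back
    },
  };
}
```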
Rate limiting
Cap per-client message rate. Assume someone will spam. A token bucket is enough:
```javascript
const limiters = new Map(); // userId -> bucket; evict on disconnect or it grows forever

function checkLimit(userId) {
  if (!limiters.has(userId)) limiters.set(userId, { tokens: 10, last: Date.now() });
  const l = limiters.get(userId);
  const now = Date.now();
  // refill at 1 token per second, capped at a burst of 10
  l.tokens = Math.min(10, l.tokens + (now - l.last) / 1000);
  l.last = now;
  if (l.tokens < 1) return false; // out of tokens: drop the message
  l.tokens -= 1;
  return true;
}
```

One message token per second, up to a burst of 10. Anyone hammering gets throttled.
Graceful shutdown
When the server is coming down for a deploy, send clients a “disconnect soon” signal first. The client reconnects and the load balancer routes it to a new server.
```javascript
process.on('SIGTERM', () => {
  wss.clients.forEach(ws =>
    ws.send(JSON.stringify({ type: 'shutdown', reconnectIn: 5 }))
  );
  setTimeout(() => wss.close(() => process.exit(0)), 10000);
});
```

Wait 10 seconds, then shut down. Client reconnect logic finds a new server inside that window.
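The client side of that deserves a sketch too: reconnect with jittered exponential backoff so a restarting fleet isn’t hit by a thundering herd. The base delay and cap here are assumptions:

```javascript
// delay before the nth consecutive retry, with full jitter
function backoffDelay(attempt, baseMs = 1000, capMs = 30000) {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * exp;
}

// wiring sketch for a browser client
function connectWithRetry(url, attempt = 0) {
  const ws = new WebSocket(url);
  ws.onopen = () => { attempt = 0; }; // reset the backoff on success
  ws.onclose = () => {
    setTimeout(() => connectWithRetry(url, attempt + 1), backoffDelay(attempt));
  };
  return ws;
}
```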
Monitoring
- Concurrent connections per server.
- Message throughput.
- Reconnect rate.
- Memory per connection.
- Latency (ping-pong round trip).
Scrape these with Prometheus, graph them in Grafana. Anything anomalous pages you immediately.
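In-process these are just a few counters and gauges. A sketch of what you would track and then expose on a /metrics endpoint via prom-client or similar; the metric names are mine:

```javascript
// minimal metrics registry; swap for prom-client's Gauge/Counter in production
const metrics = {
  connections: 0,     // gauge: open sockets on this server right now
  messagesTotal: 0,   // counter: messages forwarded since start
  reconnectsTotal: 0, // counter: clients that came back after a drop
};

function onConnect() { metrics.connections += 1; }
function onDisconnect() { metrics.connections -= 1; }
function onMessage() { metrics.messagesTotal += 1; }

// Prometheus text exposition format for a /metrics endpoint
function renderMetrics() {
  return [
    `ws_connections ${metrics.connections}`,
    `ws_messages_total ${metrics.messagesTotal}`,
    `ws_reconnects_total ${metrics.reconnectsTotal}`,
  ].join('\n');
}
```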
Takeaway
WebSocket horizontal scaling is an architectural decision, not a default choice. Add a pub/sub layer, don’t skip heartbeats, auth at the handshake, prepare a graceful shutdown. Do that and you’ll hit 200k concurrent, not 2k.