
Batch vs streaming API endpoints: picking the right one for the client

Send one giant batch or stream the chunks? I've shipped both in production. Here are the factors that decide it.

A reporting system had one endpoint that produced a 50,000-row CSV. The first design was a single request and a single response. It worked, but for bigger customers it pushed past 30 seconds and the timeouts started. We moved to streaming and the problems shifted. Here’s when each one is right, based on actual scenarios.

Single batch response

The client sends one request and the server assembles everything into one response. JSON array, CSV, XML, it doesn’t matter.
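For reference, a minimal sketch of the batch shape in TypeScript with Express; fetchReportRows is a hypothetical stand-in for the real query:

```ts
import express from "express";

const app = express();

// Hypothetical data source: pulls the entire result set into memory at once.
async function fetchReportRows(): Promise<object[]> {
  return [{ id: 1, total: 42 }]; // stand-in for a real database query
}

app.get("/report", async (_req, res) => {
  // Nothing is sent until the whole array is assembled and serialized.
  const rows = await fetchReportRows();
  res.json(rows); // one body with a Content-Length: cacheable, retryable
});

app.listen(3000);
```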

Upsides:
– Simple, every HTTP client supports it.
– Easy to cache, CDN-friendly.
– Idempotent. Retry on failure, no drama.
– No client-side state management.

Downsides:
– Memory pressure grows with the response. Serializing a 500 MB response blows up server RAM.
– The whole computation has to finish before the first byte, so time to first byte (TTFB) is high.
– The client waits for the whole thing. The user stares at a blank screen for 25 seconds.
– Timeouts sneak up on you; many proxies default to 30 seconds.

Streaming

The server writes the response in pieces and the client reads and processes them as they arrive. HTTP chunked encoding, Server-Sent Events (SSE), WebSockets, and NDJSON are the common options.
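A minimal NDJSON-over-chunked-encoding sketch, same assumptions as above (Express, with a hypothetical streamReportRows generator):

```ts
import express from "express";

const app = express();

// Hypothetical async generator: yields rows as the query produces them.
async function* streamReportRows(): AsyncGenerator<object> {
  for (let i = 0; i < 50_000; i++) yield { id: i, total: i * 2 };
}

app.get("/report/stream", async (_req, res) => {
  res.setHeader("Content-Type", "application/x-ndjson");
  // No Content-Length is set, so Node falls back to Transfer-Encoding: chunked.
  for await (const row of streamReportRows()) {
    res.write(JSON.stringify(row) + "\n"); // one JSON object per line
  }
  res.end(); // backpressure handling omitted for brevity
});

app.listen(3000);
```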

Upsides:
– First result arrives quickly, perceived performance is good.
– Memory pressure stays bounded; results go out as they’re produced.
– Scales to large data sets.
– The user sees progress.

Downsides:
– Not every client supports every protocol. Old HTTP/1.0 proxies choke on chunked encoding.
– Error handling is tricky. What do you do when an error hits mid-response?
– Idempotency risk. A retry can produce duplicate rows.
– CDN caching is off the table.
– Harder to test.

My decision criteria

Data volume
– Small (under 1 MB): batch.
– Medium (1–50 MB): batch still works.
– Large (50 MB+): streaming or pagination.

Real-time-ness
– Data should arrive as it’s produced: streaming.
– Nothing is useful until the whole thing is ready: batch.

Examples: LLM responses should arrive token by token, so streaming is essential. But a report CSV is useless half-delivered; the user waits for the whole thing anyway, so batch is fine.
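On the client side, the consumption pattern is the same regardless of what’s in the chunks. A sketch using fetch and the body’s ReadableStream (Node 18+ or a browser; /generate is a hypothetical token-streaming endpoint):

```ts
// Reads a streamed response chunk by chunk instead of waiting for the end.
async function consumeStream(url: string): Promise<void> {
  const res = await fetch(url);
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Render each chunk as it lands: this is the perceived-performance win.
    console.log(decoder.decode(value, { stream: true }));
  }
}

consumeStream("/generate").catch(console.error);
```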

Infrastructure
– HTTP/1.1 behind nginx: buffer settings matter a lot for streaming.
– HTTP/2 makes streaming far easier.
– Behind a CDN, you have to bypass it for streaming.
– Serverless (AWS Lambda) has hard timeout limits, so batch fits better.

Error handling
– The operation has to be atomic: batch.
– Partial results have value: streaming.

In my reporting system, streaming broke the first time out because I forgot to set proxy_buffering off in nginx. Nginx was accumulating the whole response before forwarding it, so nothing actually streamed. Fixing the config opened up real streaming.
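If you can’t touch the nginx config, nginx also honors a per-response opt-out: an X-Accel-Buffering: no response header disables buffering for that response only. A sketch of the header variant:

```ts
import express from "express";

const app = express();

app.get("/report/stream", (_req, res) => {
  res.setHeader("Content-Type", "application/x-ndjson");
  // Per-response equivalent of `proxy_buffering off;`: nginx sees this
  // header and stops accumulating the response before forwarding it.
  res.setHeader("X-Accel-Buffering", "no");
  res.write(JSON.stringify({ status: "started" }) + "\n");
  res.end(); // real rows would follow, as in the streaming sketch above
});

app.listen(3000);
```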

There’s a practical middle ground: “chunked batch”. If the client asks for 10,000 rows, the server sends NDJSON in blocks of 1,000: each row is its own JSON object on its own line. The client reads the response as a stream and parses line by line. Because every line stands on its own, a single bad row doesn’t take the rest down, and error handling stays simple.
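A client-side sketch of that per-line independence: a malformed row is logged and skipped instead of aborting the whole read (/history/stream is a hypothetical endpoint name):

```ts
// Parses an NDJSON stream line by line; one bad row is not fatal.
async function readNdjson(url: string): Promise<object[]> {
  const res = await fetch(url);
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  const rows: object[] = [];
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop()!; // keep the trailing partial line for the next chunk
    for (const line of lines) {
      if (!line.trim()) continue;
      try {
        rows.push(JSON.parse(line)); // each line is an independent JSON object
      } catch {
        console.warn("skipping malformed row:", line);
      }
    }
  }
  return rows;
}
```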

My last example: a checkout history endpoint. The user wants five years of purchase history. Average is 500 rows, VIP customers are 5,000. I shipped both streaming and batch behind one endpoint, chosen via the Accept header: Accept: application/x-ndjson means streaming, Accept: application/json means batch. The client declares its preference. Mobile used batch (easier to parse); the reporting screen used streaming (to show progress).
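A sketch of that dual-mode endpoint in Express, branching on the Accept header; fetchHistory and streamHistory are hypothetical data-access helpers:

```ts
import express from "express";

const app = express();

// Hypothetical helpers: same query, two delivery shapes.
async function fetchHistory(userId: string): Promise<object[]> {
  return [{ userId, total: 42 }];
}
async function* streamHistory(userId: string): AsyncGenerator<object> {
  yield { userId, total: 42 };
}

app.get("/users/:id/history", async (req, res) => {
  // Batch is the default; NDJSON only when the client explicitly asks.
  const wantsNdjson =
    req.accepts(["application/json", "application/x-ndjson"]) ===
    "application/x-ndjson";
  if (wantsNdjson) {
    res.setHeader("Content-Type", "application/x-ndjson");
    for await (const row of streamHistory(req.params.id)) {
      res.write(JSON.stringify(row) + "\n");
    }
    res.end();
  } else {
    res.json(await fetchHistory(req.params.id)); // one JSON array
  }
});

app.listen(3000);
```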

Picking the protocol comes down to how the client will actually consume the data. That’s the core rule of API design.
