The first question whenever you wire up data sync between two systems: webhook or polling? The honest answer is usually both, depending on the case.
In the past few years I’ve answered this question on three projects: an e-commerce integration (Shopify plus ERP), a fintech notification system (bank webhook with a polling fallback), and an analytics platform (event ingestion). Same question every time, slightly different answer.
Here’s the framework I use.
Polling: plain, dumb, works
Polling: the client hits an endpoint on a schedule and checks for changes.
```
while true:
    response = GET /api/orders?since=last_sync_timestamp
    for order in response.orders:
        process(order)
    last_sync_timestamp = response.server_time
    sleep(60)
```
Upsides:
– The client pulls, so firewall and NAT aren’t your problem
– Auth is client-side, a standard Bearer token is enough
– Failure recovery is trivial: one call fails, the next catches up
– Debugging is easy: curl to test, every call shows up in logs
Downsides:
– Latency is bounded by your poll interval. Polling every 60 seconds means a 30-second average delay and up to 60 seconds in the worst case
– Waste: most polls return “no changes”, burning bandwidth and CPU
– Server-side load: N clients polling every 60 seconds is a steady baseline load
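To put rough numbers on the latency and load points, here is a back-of-the-envelope sketch. It assumes every client polls on a fixed interval and that events arrive at uniformly random moments between polls. The `polling_cost` helper and the 1,000-client figure are illustrative, not taken from a real system.

```python
def polling_cost(clients: int, interval_s: int) -> dict:
    """Rough polling cost, assuming fixed-interval polls and uniformly timed events."""
    return {
        # On average an event lands halfway between two polls.
        "avg_delay_s": interval_s / 2,
        # The unluckiest event arrives just after a poll and waits a full interval.
        "worst_delay_s": interval_s,
        # Each client polls 86400 // interval_s times a day, changes or not.
        "requests_per_day": clients * (86_400 // interval_s),
    }


print(polling_cost(clients=1_000, interval_s=60))
```

Under those assumptions, 1,000 clients on a 60-second interval generate 1.44 million requests a day, regardless of how many orders actually changed.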
Webhook: reactive, efficient, tricky
Webhook: the server POSTs to a URL the client provides whenever something happens. The client needs a publicly reachable endpoint.
Upsides:
– Latency measured in milliseconds. The notification arrives the moment the event does
– Efficient: no wasted polls, only real changes
– Scalable: millions of clients can each receive only their own events
Downsides:
– The client has to be publicly reachable (local dev needs a tunnel such as ngrok or Cloudflare Tunnel)
– Auth is harder: signature verification, secret rotation
– Delivery guarantees are fuzzy: the webhook can drop on the server side, the client can return 5xx, retry semantics vary
– Ordering isn’t guaranteed: events can arrive out of sequence
– Debugging is harder: did the POST arrive or not? Hard to inspect after the fact
Which one, when?
My decision matrix:
| Situation | Preference |
|---|---|
| Low-frequency data, sub-second latency required | Webhook |
| High-frequency data, seconds are fine | Polling |
| Client is firewalled or local dev | Polling |
| Event order is critical | Polling (with pagination) |
| Backend signal, unknown event frequency | Webhook |
| Both ends under your control | Webhook is ideal |
| Third-party consumer | Polling fallback is a must |
The mistake I see most: webhook-only
Webhooks are easy to love. Low latency, efficient, modern. I’ve met engineers who call polling “2010s technology”.
But a webhook-only system is fragile. If a webhook drops (server outage, network hiccup, endpoint returning 500), the event is just gone. There’s no mechanism for the consumer to notice an event never arrived.
A real incident: Shopify’s webhook delivery was delayed by 15 minutes. Shopify retried, but our endpoint hit a rate limit. About 99% of the orders got through and 1% were lost: 47 orders were missing from the ERP. A support crisis followed.
The hybrid: webhook plus reconciliation
The healthiest pattern:
- Webhook: primary for real-time events
- Polling-based reconciliation: every 1 to 6 hours, fetch “last 24 hours of events” and catch anything the webhook missed
Reconciliation job:
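```python
from datetime import datetime, timedelta, timezone

def reconcile_last_24h():
    # Compare the provider's view of the last 24 hours with what we stored
    since = datetime.now(timezone.utc) - timedelta(hours=24)
    remote_events = api.fetch_events(since=since)
    # Build a set of local IDs so the membership check compares IDs, not objects
    local_ids = {e.id for e in db.fetch_events(since=since)}
    missing = [e for e in remote_events if e.id not in local_ids]
    for event in missing:
        log.warning(f"Missed webhook: {event.id}")
        process_event(event)
```
Reconciliation is a single job that runs infrequently (hourly is plenty). It catches everything the webhook dropped.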
Webhook delivery guarantees
If you’re designing the webhook side, guarantee these:
At-least-once delivery. Retry policy is mandatory. If the consumer returns 5xx or times out, retry with exponential backoff.
Signature verification. Every webhook should be HMAC-signed. The consumer verifies with the secret and blocks forgery.
Idempotency key. Every event gets a unique ID. If the consumer sees the same ID twice, it skips.
Delivery log. The consumer logs “this event ID arrived at this time”. Non-negotiable for support debugging.
Delivery dashboard. On the producer side, show users “in the last hour, N webhooks sent, M failed, these are retrying”. Cuts support tickets.
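A minimal sketch of the signing and retry guarantees, using only the Python standard library. The `sha256=` signature prefix, the one-second base delay, the one-hour cap, and all function names are assumptions for illustration; each real provider defines its own header format and retry schedule.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Producer side: HMAC-SHA256 signature over the raw request body."""
    digest = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"  # prefix format is an assumption


def verify_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Consumer side: recompute the signature and compare in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode(), received.encode())


def backoff_delays(max_attempts: int, base_s: float = 1.0, cap_s: float = 3600.0) -> list[float]:
    """Seconds to wait before each retry: base, 2x, 4x, ... capped at cap_s."""
    return [min(cap_s, base_s * 2 ** attempt) for attempt in range(max_attempts)]


secret = b"hypothetical-shared-secret"
body = b'{"id": "evt_1", "type": "order.created"}'
signature = sign_payload(secret, body)

print(verify_signature(secret, body, signature))         # genuine payload
print(verify_signature(secret, body + b" ", signature))  # tampered payload
print(backoff_delays(5))
```

The signature is computed over the raw bytes, so the consumer has to verify before parsing or re-serializing the JSON. Retry schedules in practice usually add random jitter so that many failed deliveries do not all retry at the same instant.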
Consumer side: endpoint design
When you build the receiving endpoint:
Fast ack. Return 200 OK immediately and hand the processing work to an async queue. Webhook producers time out quickly (10 to 30 seconds); anything long-running needs to happen off the request thread.
Idempotency built-in. The consumer dedupes on event ID and timestamp.
Rate-limit tolerant. Burst traffic is a thing (catch-up after downtime). Queue to buffer, apply backpressure.
Logging. Log every webhook with its raw body. Otherwise debugging lost events is hopeless.
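These four rules can be sketched without a web framework. Below, `handle_webhook` stands in for the HTTP handler and returns the status code it would send. The in-memory queue and the `seen_ids` set are stand-ins for a real message queue and a database table with a unique key, and every name here is hypothetical. For brevity the sketch dedupes on event ID only and leaves out signature verification.

```python
import json
import logging
import queue

log = logging.getLogger("webhooks")

work_queue = queue.Queue(maxsize=1000)  # bounded buffer that absorbs bursts
seen_ids: set[str] = set()              # stand-in for a unique-keyed table
processed: list[str] = []               # stand-in for real side effects


def handle_webhook(raw_body: bytes) -> int:
    """Log the raw body, enqueue the event, and acknowledge immediately."""
    log.info("webhook received: %r", raw_body)
    try:
        event = json.loads(raw_body)
    except ValueError:
        return 400  # malformed payload; a retry will not help
    if not isinstance(event, dict) or "id" not in event:
        return 400
    try:
        work_queue.put_nowait(event)
    except queue.Full:
        return 503  # backpressure: the producer should retry later
    return 200


def drain_queue() -> None:
    """Worker side: process each event once, skipping duplicate IDs."""
    while not work_queue.empty():
        event = work_queue.get_nowait()
        if event["id"] in seen_ids:
            log.info("duplicate event skipped: %s", event["id"])
            continue
        seen_ids.add(event["id"])
        processed.append(event["id"])  # slow business logic would run here


for body in (b'{"id": "evt_1"}', b'{"id": "evt_2"}', b'{"id": "evt_1"}'):
    handle_webhook(body)
drain_queue()
print(processed)
```

The handler does nothing slower than a parse and an enqueue, so the acknowledgement goes out well inside the producer’s timeout. The second delivery of `evt_1` is accepted with a 200 and then dropped by the worker, which is the behaviour at-least-once delivery requires.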
Polling: efficient if you design it right
Polling isn’t always wasteful. Good patterns:
Incremental pull with a since cursor. Don’t refetch the whole dataset every time. Fetch records newer than the last timestamp.
Conditional requests. ETag and If-Modified-Since. If nothing changed, you get a 304 Not Modified and save bandwidth.
Adaptive polling interval. Poll often when there’s activity, slowly when there isn’t. Active session: every 10 seconds. Idle: every 10 minutes.
Long polling. The client holds the request open and the server responds when something changes. You get the feel of real-time with polling’s simplicity.
Long polling is a near-alternative to webhooks and, because it’s client-initiated, it bypasses NAT and firewall issues.
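The adaptive interval is the pattern that benefits most from a concrete rule. One possible sketch: drop back to the fast interval whenever a poll returns changes, and double the wait after each empty poll until it reaches the idle ceiling. The 10-second and 10-minute bounds come from the example above; the doubling rule and the `next_interval` name are assumptions.

```python
MIN_INTERVAL_S = 10    # active: every 10 seconds
MAX_INTERVAL_S = 600   # idle: every 10 minutes


def next_interval(current_s: int, had_changes: bool) -> int:
    """Pick the wait before the next poll based on the last poll's result."""
    if had_changes:
        return MIN_INTERVAL_S                   # activity: return to fast polling
    return min(MAX_INTERVAL_S, current_s * 2)   # quiet: back off gradually


interval = MIN_INTERVAL_S
history = []
for had_changes in [False, False, False, True, False]:
    interval = next_interval(interval, had_changes)
    history.append(interval)
print(history)
```

Three empty polls stretch the wait from 10 to 80 seconds, and a single poll with changes snaps it back to 10.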
GraphQL subscriptions, SSE, WebSocket
Other transports:
Server-Sent Events (SSE). One-way push from server to client over a long-lived HTTP connection. Native in browsers, no polyfill needed.
WebSocket. Bidirectional, low-latency. Chat, games, collaborative editors.
GraphQL subscriptions. Event streams defined by the GraphQL subscription operation, most commonly delivered over WebSocket.
These can beat webhook and polling when:
– The client can hold an open connection indefinitely
– You need bidirectional messaging
– The browser talks directly to the server
For server-to-server backend integrations, webhooks usually fit better.
Final advice
Don’t ask “webhook or polling?” Ask “how do I combine them?”
- Webhook for real-time
- Polling reconciliation for reliability
- Logging on both for debugging
- A delivery dashboard for monitoring
Running two systems in parallel seems complicated at first. During an outage it’s exactly what keeps you afloat.