90% of webhooks are wired wrong: the retry/ack logic that actually works

You probably receive webhooks from a payments provider, an email service, or a CRM. Test your handler against five minutes of downtime and it almost certainly breaks.

Webhooks sound simple: “when something happens, I’ll POST you.” But getting webhook handling right in production is harder than it looks, and most teams get it wrong.

The standard wrong implementation

A typical webhook endpoint looks like this: parse the JSON payload, switch on the event type, do something, return 200. That code works for the first week. Then the problems start:

  • Duplicate events (when the provider retries)
  • Out-of-order events (payment.succeeded, then payment.pending)
  • Forged payloads, because nothing verifies the signature
  • Timeouts, because slow work (sending email, writing to the DB) runs inside the handler
  • Unhandled failures: when the handler throws, does the provider retry, and are you ready when it does?

Let’s walk through each.

1. Signature verification is non-negotiable

If anyone learns your webhook URL (and it will leak, through logs or error reports), they can send fake payloads. Every major provider offers signed webhooks.

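For Stripe, the Stripe-Signature header carries a timestamp (t=) and an HMAC-SHA256 signature (v1=). Compute the HMAC of the timestamp, a period, and the raw request body using your endpoint secret, then compare the result against the v1 value. Use a timing-safe comparison (hash_equals in PHP); an ordinary string comparison is open to a timing attack. Always verify against the raw bytes you received, not a re-serialized copy of the JSON.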
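A minimal sketch of that check in Python, assuming Stripe's documented scheme in which the signed string is the timestamp, a period, and the raw body. The function name and the 300-second tolerance are illustrative choices, not part of any SDK; in a real project the provider's official library is usually the safer option.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_style_signature(raw_body: bytes, header: str, secret: str,
                                  tolerance_s: int = 300,
                                  now: Optional[float] = None) -> bool:
    """Return True if the header carries a valid v1 signature for raw_body."""
    # Header looks like "t=1700000000,v1=<hex>"; collect the parts we need.
    timestamp = None
    candidates = []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False

    # Reject old timestamps to limit replay of captured requests.
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance_s:
        return False

    # The signed string is "<timestamp>.<raw body>", not the body alone.
    signed_payload = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()

    # compare_digest is Python's timing-safe comparison (like PHP's hash_equals).
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)
```

If this returns False, respond with a 4xx and do nothing else; a forged or replayed request should never reach your business logic.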

2. Idempotency: the same event can arrive more than once

Webhook providers deliver “at least once”. The same event can land repeatedly:

  • Your endpoint timed out, the provider retried
  • The provider’s internal system sent a duplicate
  • You responded, but the network dropped the response

Every event has a unique ID (in Stripe, event.id, a string prefixed with evt_). Once you’ve seen an ID, mark it processed and skip any later delivery that carries the same ID. The processed-event store can be Redis (with a 7 to 30 day TTL) or a DB table (with a UNIQUE index on event_id).
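The DB-table variant can be sketched with SQLite as follows. The table name processed_events and the helpers open_store, claim_event, and handle_once are hypothetical names for illustration; the idea is simply to let the UNIQUE constraint reject the second insert.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the processed-events table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events ("
        " event_id TEXT NOT NULL UNIQUE,"
        " received_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def claim_event(conn: sqlite3.Connection, event_id: str) -> bool:
    """Return True the first time an ID is seen, False for a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the UNIQUE index rejected a repeat delivery


def handle_once(conn: sqlite3.Connection, event: dict, process) -> bool:
    """Run process(event) at most once per event ID."""
    if not claim_event(conn, event["id"]):
        return False
    try:
        process(event)
    except Exception:
        # Release the claim so a later retry is not mistaken for a duplicate.
        with conn:
            conn.execute(
                "DELETE FROM processed_events WHERE event_id = ?", (event["id"],)
            )
        raise
    return True
```

Where the real work is itself a database write, doing the claim and the work in one transaction is tidier than releasing the claim by hand.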

3. If the handler is slow, queue the work

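Your handler has to respond within the provider’s timeout, which is often only a few seconds; unless your provider documents otherwise, budget for 5 seconds at most. Miss that window and the provider times out and retries, so you process duplicates (a disaster without idempotency).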

Solution: validate the event in the handler and push it onto a queue. Do the real work in a background worker. The handler returns 200 in under 200 ms, and a worker backed by a Laravel queue (monitored with Horizon), RabbitMQ, or AWS SQS processes the event asynchronously.
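Here is a small sketch of that split, using Python's in-process queue.Queue as a stand-in for a real broker. The names handle_webhook and worker are illustrative, and signature_ok is assumed to come from a verification step that ran earlier in the request.

```python
import json
import queue
import threading

work_queue: "queue.Queue[dict]" = queue.Queue()


def handle_webhook(raw_body: bytes, signature_ok: bool) -> int:
    """Fast path: validate, enqueue, and return an HTTP status code."""
    if not signature_ok:
        return 400
    try:
        event = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and bad encodings
        return 400
    if not isinstance(event, dict) or "id" not in event or "type" not in event:
        return 400
    work_queue.put(event)  # no email sending or heavy DB work here
    return 200


def worker(process, stop: threading.Event) -> None:
    """Slow path: pull events off the queue and do the real work."""
    while not stop.is_set():
        try:
            event = work_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            process(event)
        finally:
            work_queue.task_done()
```

An in-memory queue loses its contents if the process dies, so in production the put should go to a durable queue before the handler returns 200.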

4. Out-of-order events

Events don’t always arrive in the order they happened. A Stripe example:

  1. User cancels a subscription → customer.subscription.updated
  2. System processes the cancel → customer.subscription.deleted

Network reality: customer.subscription.deleted can arrive first. You mark the subscription deleted. Then customer.subscription.updated arrives and you mark it active again. Wrong state.

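Fix: events carry a timestamp (created, in Stripe). Store the created value of the last event you applied to each record, and before mutating state, check that the incoming event is newer. If it is older, the event is stale: skip it. Timestamps with one-second resolution can tie, so when ordering really matters, fetch the current object from the provider’s API instead of trusting the payload.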
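The stale-event check might look like this. The Subscription record and its last_event_created field are hypothetical stand-ins for your own model; the payload shape (created, data.object.status) follows Stripe's event format.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    status: str = "active"
    last_event_created: int = 0  # "created" of the newest event applied so far


def apply_event(sub: Subscription, event: dict) -> bool:
    """Apply the event unless it is older than the state we already hold."""
    if event["created"] <= sub.last_event_created:
        return False  # stale (or a same-second tie): leave the state alone
    sub.status = event["data"]["object"]["status"]
    sub.last_event_created = event["created"]
    return True


# The deleted event arrives first, then the older updated event.
sub = Subscription()
deleted = {"type": "customer.subscription.deleted", "created": 101,
           "data": {"object": {"status": "canceled"}}}
updated = {"type": "customer.subscription.updated", "created": 100,
           "data": {"object": {"status": "active"}}}
apply_event(sub, deleted)   # applied
apply_event(sub, updated)   # skipped as stale
```

Treating ties as stale is a deliberate simplification here; re-fetching the object from the provider is the more robust way to settle a tie.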

5. Retry strategy for a failed handler

Your background job fails: DB is down, an external API is broken. What now?

  • Panic retry: retry immediately. You put more load on the failing system, and it fails again
  • Exponential backoff: wait 1s, 2s, 4s, 8s, 16s… between attempts. This is the best practice
  • Dead letter queue: after N failures, move the job to an errors queue for manual review

Also: log it. Every failed job’s cause needs to land somewhere. Datadog, Sentry, a custom dashboard, whatever, as long as it’s visible.
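Backoff, the dead letter queue, and logging fit together roughly as below. The names backoff_delays and run_with_retry are illustrative, dead_letters is a plain list standing in for a real errors queue, and the sleep parameter exists only so the waits can be swapped out in tests.

```python
import logging
import time

logger = logging.getLogger("webhook.worker")


def backoff_delays(max_attempts: int, base_s: float = 1.0, cap_s: float = 60.0):
    """Yield the wait before each retry: 1s, 2s, 4s, ... capped at cap_s."""
    for attempt in range(max_attempts - 1):
        yield min(cap_s, base_s * 2 ** attempt)


def run_with_retry(job, event: dict, dead_letters: list,
                   max_attempts: int = 5, sleep=time.sleep) -> bool:
    """Run job(event) with exponential backoff; dead-letter it after N failures."""
    delays = list(backoff_delays(max_attempts))
    for attempt in range(1, max_attempts + 1):
        try:
            job(event)
            return True
        except Exception as exc:
            # Every failure is logged with its cause so it stays visible.
            logger.warning("event %s failed on attempt %d/%d: %r",
                           event.get("id"), attempt, max_attempts, exc)
            if attempt == max_attempts:
                break
            sleep(delays[attempt - 1])
    dead_letters.append(event)  # park it for manual review
    return False
```

Adding a small random jitter to each delay is a common refinement, so that many failed jobs do not all retry at the same instant.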

6. Watch the provider’s dashboard

Stripe, Paddle, SendGrid and friends keep webhook delivery logs. Failed deliveries show up in the dashboard. Look once a week:

  • Average response time?
  • Failed delivery rate?
  • Which event type fails most?

That dashboard is your production health monitoring.

7. Test with ngrok + the provider’s webhook tester

For local development, forward webhooks with ngrok. The provider usually offers a webhook tester, and the Stripe CLI is one of the best: stripe listen --forward-to localhost:8080/webhooks forwards events to your local server, and stripe trigger payment_intent.succeeded fires a test event.

The CLI sends realistic event payloads, so you can exercise the handler against real data shapes.

Takeaway

Webhook handling is serious work in production. If you audit these 7 points against your codebase right now, at least two or three are probably broken. Start with idempotency and signature verification, which give you 80% of the wins.

The health of your webhooks is the health of your revenue. Treat it that way.
