Webhooks vs Polling: Why Push Won
There are two ways to learn that something happened in another system. Polling: you ask repeatedly — "any new orders?" every 30 seconds, all day, mostly hearing "no". Webhooks: you register a URL once, and the other system POSTs to it the moment an event occurs.
The trade-off is stark. Polling every 30 seconds means up to 2,880 requests per day per resource — nearly all wasted — with up to 30 seconds of latency on every event. A webhook is one request per actual event, delivered within seconds. For any event-driven integration (payments, CI, messaging, order fulfillment), push wins on latency, cost, and rate-limit budget simultaneously.
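The arithmetic behind those numbers is easy to check. The sketch below uses a hypothetical helper, pollingCost, and assumes events land uniformly at random between two polls, so the average detection delay is half the interval.

```javascript
// Hypothetical helper: cost of polling one resource at a fixed interval.
function pollingCost(intervalSeconds) {
  const secondsPerDay = 24 * 60 * 60; // 86,400
  return {
    requestsPerDay: secondsPerDay / intervalSeconds,
    worstCaseLatencySeconds: intervalSeconds,
    // Assumption: events arrive uniformly at random between two polls.
    averageLatencySeconds: intervalSeconds / 2,
  };
}

const cost = pollingCost(30);
console.log(cost.requestsPerDay);        // 2880
console.log(cost.averageLatencySeconds); // 15
```

Halving the interval halves the latency but doubles the request count, which is the trade-off a webhook avoids.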
A typical webhook delivery looks like this:
POST /webhooks/stripe HTTP/1.1
Host: api.yourapp.com
Content-Type: application/json
Stripe-Signature: t=1720080000,v1=5257a869e7ecebeda32affa62cdca3fa51cad7e77a0e56ff536d0ce8e108d8bd

{
  "id": "evt_1PXk2j2eZvKYlo2C",
  "type": "payment_intent.succeeded",
  "data": { "object": { "id": "pi_3PXk...", "amount": 4999, "currency": "usd" } }
}
But webhooks flip the client/server relationship, and that flip creates every problem in the rest of this guide: your endpoint is now a public URL that anyone on the internet can POST to, receiving events you did not request, possibly duplicated, possibly out of order, from a sender who will give up on you if you respond too slowly.
Security Rule #1: Verify Signatures — Correctly
An unverified webhook endpoint is an open door: anyone who discovers the URL can forge a "payment succeeded" event and get free product. Every serious provider signs its deliveries, almost universally with HMAC-SHA256: the provider computes a keyed hash of the raw request body using a shared secret and sends it in a header (Stripe-Signature, X-Hub-Signature-256 for GitHub, X-Shopify-Hmac-Sha256…). The details vary by provider: Stripe signs the timestamp joined to the body with a period, and Shopify base64-encodes the digest where GitHub uses hex. Your handler recomputes the HMAC the same way and compares.
A correct Node.js verification (GitHub-style):
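import crypto from 'crypto';

function verifySignature(rawBody, signatureHeader, secret) {
  // A missing header must fail closed rather than throw
  if (typeof signatureHeader !== 'string') return false;
  const expected = Buffer.from(
    'sha256=' + crypto.createHmac('sha256', secret).update(rawBody).digest('hex')
  );
  const received = Buffer.from(signatureHeader);
  // timingSafeEqual throws if byte lengths differ, so compare buffer lengths
  // (not string lengths) first; never compare the signatures themselves with ===
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}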
The four mistakes that appear in almost every broken implementation:
- Verifying a re-serialized body. HMAC is computed over the exact raw bytes. If your framework parses JSON and you verify against JSON.stringify(req.body), key reordering or whitespace differences break verification randomly. Capture the raw body before any middleware parses it.
- Using == or === to compare signatures. String comparison short-circuits at the first differing character, leaking timing information that lets attackers reconstruct a valid signature byte by byte. Always use a constant-time comparison (crypto.timingSafeEqual, hmac.compare_digest in Python).
- Ignoring the timestamp. Providers like Stripe include a timestamp in the signed payload precisely so you can reject old deliveries (commonly older than 5 minutes). Without that check, a captured request can be replayed later — a valid signature is not proof of freshness.
- Confusing HTTPS with authentication. TLS encrypts the transport; it says nothing about who sent the request. An HTTPS endpoint with no signature check is still an open door.
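The timestamp check from the list above can be sketched as follows. This is a minimal illustration of a Stripe-style scheme, where the header carries t=<unix seconds> and one or more v1=<hex HMAC> values and the signed string is the timestamp, a period, and the raw body. The names parseHeader, verifyTimestamped, and TOLERANCE_SECONDS are made up for this example; in production, the provider's official library is usually the safer choice.

```javascript
// CommonJS shown; in an ES module use: import crypto from 'node:crypto'
const crypto = require('node:crypto');

const TOLERANCE_SECONDS = 5 * 60; // reject deliveries older than ~5 minutes

// Splits "t=...,v1=...,v1=..." into the timestamp text and the v1 signatures.
function parseHeader(header) {
  const parsed = { timestampText: '', signatures: [] };
  for (const part of String(header).split(',')) {
    const idx = part.indexOf('=');
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === 't') parsed.timestampText = value;
    if (key === 'v1') parsed.signatures.push(value);
  }
  return parsed;
}

function verifyTimestamped(rawBody, header, secret,
                           nowSeconds = Math.floor(Date.now() / 1000)) {
  const { timestampText, signatures } = parseHeader(header);
  const timestamp = Number(timestampText);
  if (timestampText === '' || !Number.isInteger(timestamp)) return false;
  if (signatures.length === 0) return false;

  // Freshness: a valid signature on an old delivery is a possible replay.
  // Math.abs also rejects timestamps too far in the future (clock skew).
  if (Math.abs(nowSeconds - timestamp) > TOLERANCE_SECONDS) return false;

  // Sign exactly what the sender signed: "<timestamp>.<raw body bytes>".
  const expected = crypto.createHmac('sha256', secret)
    .update(`${timestampText}.`)
    .update(rawBody)
    .digest();

  // Accept if any v1 signature matches, using a constant-time comparison.
  return signatures.some((sig) => {
    const received = Buffer.from(sig, 'hex');
    return received.length === expected.length &&
      crypto.timingSafeEqual(received, expected);
  });
}
```

Because the timestamp is part of the signed string, an attacker cannot simply rewrite t to make a captured request look fresh: changing it invalidates the signature.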
Rotate webhook secrets periodically, store them like any credential (not in code), and if the provider offers it, subscribe each endpoint only to the event types you actually handle.
Delivery Semantics: Retries, Duplicates, and Ordering
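Most webhook providers promise at-least-once delivery, never exactly-once. When your endpoint times out or returns 5xx, the provider typically retries with exponential backoff (Stripe retries for up to 3 days; Shopify and others have their own schedules). Not every provider retries on its own: GitHub does not automatically redeliver failed deliveries, so recovery there means redelivering from the dashboard or the API. Three consequences follow, and each demands a pattern: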
1. You will receive duplicates → be idempotent
If your handler finishes the work but its response arrives after the provider's timeout, the provider retries and the same event arrives twice. If the handler ships an order per event, someone gets two packages. The fix: every event carries a unique ID, so record processed IDs and skip repeats:
-- atomic idempotency guard
INSERT INTO processed_events (event_id) VALUES ($1)
ON CONFLICT (event_id) DO NOTHING;
-- if no row was inserted, this event was already handled: return 200 and stop
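In application code, the guard wraps the handler. The sketch below is self-contained, so it swaps the SQL table for a hypothetical in-memory class, ProcessedEvents, whose insertIfAbsent method returns true only when the ID was new; that is the same signal you would get from checking the row count of the INSERT above. The names handleOnce and ship are also illustrative.

```javascript
// Stand-in for the processed_events table (hypothetical; not durable).
class ProcessedEvents {
  constructor() { this.ids = new Set(); }
  // Returns true if the ID was newly recorded, false if already present.
  insertIfAbsent(eventId) {
    if (this.ids.has(eventId)) return false;
    this.ids.add(eventId);
    return true;
  }
  remove(eventId) { this.ids.delete(eventId); }
}

// Runs `handler` at most once per event ID; duplicates are acknowledged and skipped.
function handleOnce(store, event, handler) {
  if (!store.insertIfAbsent(event.id)) {
    return { status: 200, duplicate: true };
  }
  try {
    handler(event);
  } catch (err) {
    // Undo the claim so a later retry can run the handler again.
    store.remove(event.id);
    throw err;
  }
  return { status: 200, duplicate: false };
}

const store = new ProcessedEvents();
let shipped = 0;
const ship = () => { shipped += 1; };
handleOnce(store, { id: 'evt_1' }, ship);
handleOnce(store, { id: 'evt_1' }, ship); // provider retry of the same event
console.log(shipped); // 1
```

With a real database, running the INSERT and the side effects in one transaction gives the same undo-on-failure behavior without the explicit remove call.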
2. Events arrive out of order → trust the source, not the sequence
Retries and parallel delivery mean "subscription.updated" can arrive before "subscription.created". Never build state by folding events in arrival order. The robust pattern: treat the webhook as a notification that something changed, then fetch the current state from the provider's API (or use the full object embedded in the event, comparing timestamps or version numbers before overwriting newer local state).
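The compare-before-overwrite variant can be sketched as below. It assumes each embedded object carries a monotonically increasing version field, which is a hypothetical name; a provider-supplied update timestamp would serve the same purpose. The Map stands in for your local database.

```javascript
// Local copy of provider objects, keyed by object ID (in-memory for illustration).
const localState = new Map();

// Stores the incoming object only if it is newer than the copy we already hold.
// Assumption: `version` increases with every change on the provider side.
function applyIfNewer(state, incoming) {
  const current = state.get(incoming.id);
  if (current && current.version >= incoming.version) {
    return false; // stale or duplicate: keep the newer local state
  }
  state.set(incoming.id, incoming);
  return true;
}

// "updated" (version 2) arrives before "created" (version 1).
applyIfNewer(localState, { id: 'sub_1', version: 2, status: 'active' });
applyIfNewer(localState, { id: 'sub_1', version: 1, status: 'incomplete' });
console.log(localState.get('sub_1').status); // active
```

The late "created" event is ignored rather than applied, so arrival order stops mattering.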
3. Slow handlers get dropped → acknowledge fast, process async
Providers typically time out after somewhere between 5 and 30 seconds (GitHub allows 10 seconds, Shopify 5), and consistently slow or failing endpoints can be disabled. The production pattern is fast-ack: verify the signature, persist the raw event to a queue or table, return 200 immediately, then process from the queue with your own retry policy. Your response time becomes milliseconds regardless of how heavy the real work is, and a bug in processing no longer causes redelivery storms.
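A minimal sketch of the fast-ack split, assuming an in-memory array in place of a durable queue and a caller that has already run signature verification. The names receive, drain, deadLetters, and MAX_ATTEMPTS are illustrative.

```javascript
const queue = [];        // stand-in for a durable queue or table
const deadLetters = [];  // events that exhausted their retries
const MAX_ATTEMPTS = 3;

// Request path: verify, persist, acknowledge. No business logic runs here.
// `signatureValid` is the result of the caller's signature check.
function receive(rawBody, signatureValid) {
  if (!signatureValid) return 400;
  queue.push({ rawBody, attempts: 0 });
  return 200;
}

// Worker path: runs outside the request cycle with its own retry policy.
// A real worker would back off between attempts instead of retrying at once.
function drain(processEvent) {
  while (queue.length > 0) {
    const job = queue.shift();
    try {
      processEvent(JSON.parse(job.rawBody));
    } catch (err) {
      job.attempts += 1;
      if (job.attempts < MAX_ATTEMPTS) queue.push(job);
      else deadLetters.push(job); // park it for inspection
    }
  }
}
```

The provider only ever sees the cost of receive, so a slow or failing processEvent affects your queue depth, not your delivery status.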
Debugging Webhooks Without Losing Your Mind
Webhooks are miserable to debug precisely because the client is someone else's server calling an endpoint that must be publicly reachable. The standard toolkit:
- Capture before you code. Point the provider at a request-capture endpoint first and look at real deliveries — exact headers, exact body, exact content type — before writing a line of handler code. Payloads routinely differ from documentation.
- Tunnel to localhost. Tools like ngrok, Cloudflare Tunnel, or localtunnel give your dev machine a public HTTPS URL so providers can reach your local handler while you set breakpoints.
- Replay from the provider dashboard. Stripe, GitHub, and Shopify all let you view delivery history — request, response, and status — and redeliver any event with one click. This is the single most useful debugging surface; check it before adding logging.
- Use provider CLIs. The Stripe CLI (stripe listen, stripe trigger) forwards live events to localhost and fires synthetic test events on demand, with no tunnel needed. GitHub offers a CLI extension (gh webhook forward) that forwards deliveries to a local URL in a similar way.
- Send test requests yourself. To test signature verification and error paths, send crafted POSTs with valid and deliberately invalid HMAC signatures at your endpoint and confirm it accepts the former and rejects the latter — a webhook tester that supports custom headers and HMAC signing does this in seconds.
- Log the failures, not just the successes. Persist every rejected delivery (bad signature, unknown event type, processing error) with the raw body. When a provider changes payload format — it happens — the rejects log is how you find out before customers do.
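For the "send test requests yourself" step, a small script can build both a correctly signed delivery and a forged one. The sketch below uses the GitHub-style header from earlier (sha256= followed by the hex HMAC of the body); the helper names and the secrets are placeholders, not part of any provider's tooling.

```javascript
// CommonJS shown; in an ES module use: import crypto from 'node:crypto'
const crypto = require('node:crypto');

// Builds a GitHub-style delivery signed with the given secret.
function buildDelivery(payload, secret) {
  const body = JSON.stringify(payload);
  const signature = 'sha256=' +
    crypto.createHmac('sha256', secret).update(body).digest('hex');
  return {
    body,
    headers: {
      'Content-Type': 'application/json',
      'X-Hub-Signature-256': signature,
    },
  };
}

// Same payload signed with the wrong secret: a correct handler must reject it.
function buildForgedDelivery(payload) {
  return buildDelivery(payload, 'not-the-real-secret');
}

const good = buildDelivery({ action: 'opened' }, 'test-secret');
const forged = buildForgedDelivery({ action: 'opened' });
// Send each one to your endpoint, for example with
// fetch(url, { method: 'POST', headers: good.headers, body: good.body }),
// and expect a 2xx for `good` and a 4xx for `forged`.
```

Sending the exact body string that was signed matters here for the same reason it matters in the handler: re-serializing the payload changes the bytes and breaks the signature.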
The Production Hardening Checklist
Everything above, condensed into the list worth pinning next to your handler code:
- ✅ Verify HMAC signatures on the raw body with a timing-safe comparison; reject anything unsigned or stale (timestamp older than ~5 minutes).
- ✅ Return 2xx fast (< 1s): verify, enqueue, acknowledge. Do the real work asynchronously with your own retries and a dead-letter queue.
- ✅ Be idempotent: dedupe on the event ID atomically; design every side effect to be safe to attempt twice.
- ✅ Tolerate disorder: never assume arrival order; fetch current state or compare object versions before overwriting.
- ✅ Validate before trusting: check the event type is one you expect, parse defensively, and treat payload fields as untrusted input (they are).
- ✅ Return the right codes: 2xx for handled (including duplicates and events you deliberately ignore — do not make the provider retry those), 4xx for permanently invalid, 5xx only when a retry might genuinely succeed.
- ✅ Monitor the endpoint: alert on signature-failure spikes (attack or secret rotation gone wrong), on delivery-failure rates from the provider dashboard, and on queue depth.
- ✅ Plan for outages: after downtime longer than the provider's retry window, reconcile by listing recent events/objects from the provider's API — webhooks are a latency optimization, polling is the backstop.
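The status-code item is the easiest one to get subtly wrong, so it can help to centralize the decision. The sketch below maps handler outcomes to response codes; the outcome labels are hypothetical names chosen for this example.

```javascript
// Maps a handler outcome to the HTTP status the provider should see.
function statusFor(outcome) {
  switch (outcome) {
    case 'processed':
    case 'duplicate':
    case 'ignored_event_type':
      return 200; // handled: do not make the provider retry
    case 'bad_signature':
    case 'malformed_payload':
      return 400; // permanently invalid: a retry cannot succeed
    case 'queue_unavailable':
      return 503; // transient: a retry might genuinely succeed
    default:
      return 500; // unknown failure: let the provider retry
  }
}

console.log(statusFor('duplicate'));     // 200
console.log(statusFor('bad_signature')); // 400
```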
Handlers built this way are boring — they shrug off duplicates, replay attacks, provider outages, and payload changes. In webhook engineering, boring is the whole goal.