System Design FAQ: Top Questions
34. How would you design a Webhook Delivery System?
A Webhook Delivery System allows external clients (subscribers) to receive asynchronous HTTP notifications (webhooks) triggered by internal events like user signup, payment, or deploy status.
📋 Functional Requirements
- Allow clients to register webhook endpoints for specific events
- Send HTTP POST payloads reliably with retry and exponential backoff
- Support signature verification for authenticity
- Dashboard to manage delivery attempts and failures
📦 Non-Functional Requirements
- At-least-once delivery semantics (exactly-once is generally impractical over HTTP, so receivers should handle duplicates idempotently)
- Low-latency triggering (within seconds)
- Scalable for millions of webhooks/day
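Because retries under at-least-once delivery can hand the same event to a subscriber more than once, receivers usually deduplicate. The sketch below shows one possible receiver-side approach. The `eventId` field, the `Deduplicator` class, and the TTL value are assumptions for illustration; the sample payload in this article does not define an event ID.

```javascript
// Receiver-side deduplication sketch. `eventId` is a hypothetical field.
class Deduplicator {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.seen = new Map(); // eventId -> expiry timestamp (ms)
  }

  // Returns true the first time an ID is seen within the TTL, false for repeats.
  firstTime(eventId, now = Date.now()) {
    const expiry = this.seen.get(eventId);
    if (expiry !== undefined && expiry > now) return false;
    this.seen.set(eventId, now + this.ttlMs);
    return true;
  }
}

const dedupe = new Deduplicator(1000); // 1s TTL, only to keep the example small

function handleWebhook(event, now) {
  if (!dedupe.firstTime(event.eventId, now)) return "duplicate-ignored";
  // ... apply side effects here; this branch runs once per event ID per TTL ...
  return "processed";
}
```

In production the seen-ID set would more likely live in a shared store with expiry rather than in process memory, so that all receiver instances agree on what has been processed.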
🏗️ Core Components
- Event Producer: Publishes business events (e.g. signup)
- Dispatcher: Queues and sends webhook payloads
- Retry Engine: Backs off and retries failed sends
- Delivery Store: Tracks attempt logs, status, and failures
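To make the flow between these components concrete, here is a minimal sketch of the Dispatcher fanning one event out into per-subscriber delivery jobs and recording them in a Delivery Store. The in-memory `subscriptions`, `queue`, and `deliveryStore` structures, the job shape, and the extra event names are all assumptions standing in for real storage.

```javascript
// Hypothetical in-memory stand-ins for the subscription table, queue, and Delivery Store.
const subscriptions = [
  { id: "sub_1", url: "https://a.example.com/hooks", events: ["user.signup"] },
  { id: "sub_2", url: "https://b.example.com/hooks", events: ["payment.succeeded", "user.signup"] },
  { id: "sub_3", url: "https://c.example.com/hooks", events: ["deploy.finished"] },
];
const queue = [];
const deliveryStore = new Map(); // jobId -> delivery record
let nextJobId = 1;

// Dispatcher entry point: create one delivery job per subscriber of the event type.
function dispatch(event) {
  const targets = subscriptions.filter((s) => s.events.includes(event.event));
  for (const sub of targets) {
    const job = {
      jobId: `job_${nextJobId++}`,
      subscriptionId: sub.id,
      url: sub.url,
      body: JSON.stringify(event), // serialize once so every attempt sends the same bytes
      attempt: 0,
    };
    deliveryStore.set(job.jobId, { status: "pending", attempts: [] });
    queue.push(job);
  }
  return targets.length; // number of jobs created
}
```

Creating one job per subscriber, rather than one per event, lets each endpoint succeed, fail, and retry independently of the others.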
📄 Sample Webhook Payload
{
  "event": "user.signup",
  "data": {
    "id": "u_123",
    "email": "test@example.com",
    "timestamp": "2025-06-11T11:00:00Z"
  },
  "signature": "sha256=abcdef..."
}
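On the sending side, the signature has to be computed over the exact bytes that go on the wire. A common pattern, assumed here, is to sign the serialized body without any signature field and send the result in an HTTP header, since a signature cannot cover a body that contains itself. The header name `X-Webhook-Signature` and the helper names are hypothetical.

```javascript
const crypto = require("crypto");

// HMAC-SHA256 over the raw body, in the "sha256=<hex>" format used in this article.
function signPayload(rawBody, secret) {
  const hash = crypto.createHmac("sha256", secret).update(rawBody).digest("hex");
  return `sha256=${hash}`;
}

// Build the outgoing request. The body is serialized once and that same
// string is both signed and sent, so the receiver hashes identical bytes.
function buildRequest(event, secret) {
  const rawBody = JSON.stringify(event);
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-Webhook-Signature": signPayload(rawBody, secret), // hypothetical header name
    },
    body: rawBody,
  };
}
```

Receivers should verify against the raw request body as received; re-serializing parsed JSON can change key order or whitespace and break the comparison.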
🔐 Signature Verification (Node.js)
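const crypto = require("crypto");
// Recompute the HMAC over the raw request body and compare in constant time.
function verifySignature(payload, signature, secret) {
  const hash = crypto
    .createHmac("sha256", secret)
    .update(payload)
    .digest("hex");
  const expected = Buffer.from(`sha256=${hash}`);
  const received = Buffer.from(String(signature));
  // timingSafeEqual throws on unequal lengths, so check the length first.
  return (
    expected.length === received.length &&
    crypto.timingSafeEqual(expected, received)
  );
}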
🔁 Retry Strategy
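- Initial delay: 5s, then increasing delays between retries (e.g., 15s, 30s, 1m, 5m...), ideally with random jitter so failed deliveries do not retry in lockstep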
- Limit to N retries (e.g., 5)
- Move events that exhaust their retries to a DLQ (Dead Letter Queue) so they can be inspected and replayed
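The retry rules above can be sketched as a small delay calculator. This version uses a plain doubling schedule starting from the 5s initial delay, with a cap and "full jitter"; the doubling factor, the 5-minute cap, and the function names are assumptions and do not reproduce the exact schedule listed above.

```javascript
const BASE_DELAY_MS = 5000;           // initial delay of 5s
const MAX_DELAY_MS = 5 * 60 * 1000;   // assumed cap of 5 minutes
const MAX_RETRIES = 5;                // N = 5

// Exponential backoff without jitter; `retry` is 1-based.
// baseDelay(1) is 5000, baseDelay(2) is 10000, baseDelay(3) is 20000, ...
function baseDelay(retry) {
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** (retry - 1));
}

// Full jitter: a random delay in [0, baseDelay), which spreads out retries
// from many jobs that failed at the same moment.
function delayWithJitter(retry, random = Math.random) {
  return Math.floor(random() * baseDelay(retry));
}

// Decide what to do before retry number `retry`.
function nextStep(retry) {
  if (retry > MAX_RETRIES) return { action: "dead-letter" };
  return { action: "retry", delayMs: delayWithJitter(retry) };
}
```

A worker would call `nextStep(job.attempt)` after a failed send, then either reschedule the job after `delayMs` or move it to the DLQ.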
📦 Redis Queue or Kafka for Dispatch Buffer
// enqueue webhook job
redis.lpush("webhook:queue", JSON.stringify(webhookJob));
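When the producer uses LPUSH, a worker would typically pop from the opposite end of the list (RPOP or its blocking variant BRPOP) to process jobs in arrival order. The sketch below illustrates that ordering with an in-memory stand-in so it runs without a Redis server; `FakeList`, `enqueue`, `drain`, and the synchronous `deliver` callback are hypothetical simplifications.

```javascript
// In-memory stand-in for a Redis list, mimicking LPUSH and RPOP semantics.
class FakeList {
  constructor() {
    this.items = [];
  }
  lpush(value) {
    this.items.unshift(value); // push at the head
    return this.items.length;  // LPUSH returns the new list length
  }
  rpop() {
    return this.items.length ? this.items.pop() : null; // pop from the tail
  }
}

const list = new FakeList();

function enqueue(webhookJob) {
  return list.lpush(JSON.stringify(webhookJob));
}

// Drain jobs oldest-first. `deliver` is a hypothetical callback returning
// true on success; in production it would perform the HTTP POST.
function drain(deliver) {
  const failed = [];
  let raw;
  while ((raw = list.rpop()) !== null) {
    const job = JSON.parse(raw);
    if (!deliver(job)) failed.push(job); // candidates for the retry engine
  }
  return failed;
}
```

Note that a bare pop removes the job before delivery is confirmed, so a worker crash can lose it; systems that need stronger guarantees often move the job to a separate processing list or use a log such as Kafka with committed offsets.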
📊 Admin UI Features
- List subscribers and their delivery URLs
- Filter by event type or response code
- Manual replay of failed attempts
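The filtering and replay features above could sit on top of the Delivery Store roughly as follows. The record shape, the sample data (including the `payment.succeeded` and `deploy.finished` event names), and the `enqueueJob` callback are assumptions for illustration.

```javascript
// Hypothetical delivery-attempt records, as a Delivery Store might expose them.
const attempts = [
  { id: "a1", event: "user.signup", url: "https://a.example.com/hooks", status: 200 },
  { id: "a2", event: "user.signup", url: "https://b.example.com/hooks", status: 500 },
  { id: "a3", event: "payment.succeeded", url: "https://b.example.com/hooks", status: 500 },
  { id: "a4", event: "deploy.finished", url: "https://c.example.com/hooks", status: 204 },
];

// Filter by event type and/or response code; omitted criteria match everything.
function filterAttempts(records, { event, status } = {}) {
  return records.filter(
    (r) =>
      (event === undefined || r.event === event) &&
      (status === undefined || r.status === status)
  );
}

// Manual replay: re-enqueue every non-2xx attempt as a fresh job.
function replayFailed(records, enqueueJob) {
  const failed = records.filter((r) => r.status < 200 || r.status >= 300);
  failed.forEach((r) => enqueueJob({ replayOf: r.id, event: r.event, url: r.url }));
  return failed.length;
}
```

Keeping a `replayOf` reference on the new job preserves the link between the replay and the original failed attempt in the delivery history.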
📈 Observability
- Delivery success/failure rate per endpoint
- Latency histogram of webhook delivery
- Retries per webhook + DLQ size
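As a rough illustration of the first two metrics, the sketch below computes a per-endpoint success rate and a cumulative latency histogram from a delivery log. The log shape, sample values, and bucket bounds are assumptions; a real deployment would more likely emit these through a metrics client library than compute them by hand.

```javascript
// Hypothetical delivery log entries: endpoint, outcome, and latency in ms.
const deliveries = [
  { endpoint: "https://a.example.com/hooks", ok: true, latencyMs: 80 },
  { endpoint: "https://a.example.com/hooks", ok: true, latencyMs: 450 },
  { endpoint: "https://a.example.com/hooks", ok: false, latencyMs: 1200 },
  { endpoint: "https://b.example.com/hooks", ok: true, latencyMs: 120 },
];

// Fraction of successful deliveries per endpoint.
function successRateByEndpoint(log) {
  const stats = {};
  for (const { endpoint, ok } of log) {
    if (!stats[endpoint]) stats[endpoint] = { ok: 0, total: 0 };
    stats[endpoint].total += 1;
    if (ok) stats[endpoint].ok += 1;
  }
  const rates = {};
  for (const [endpoint, s] of Object.entries(stats)) rates[endpoint] = s.ok / s.total;
  return rates;
}

// Cumulative histogram: each bucket counts observations <= its upper bound,
// which is the convention Prometheus histograms follow.
function latencyHistogram(log, bounds = [100, 500, 1000]) {
  const counts = bounds.map(() => 0);
  for (const { latencyMs } of log) {
    bounds.forEach((bound, i) => {
      if (latencyMs <= bound) counts[i] += 1;
    });
  }
  return { bounds, counts, total: log.length };
}
```

Tracking these per endpoint rather than globally makes it easier to spot a single misbehaving subscriber without it being averaged away.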
🧰 Tools/Infra Used
- Queue: Kafka / Redis
- Worker: Node.js/Go + an HTTP client (e.g., axios)
- Retry Mgmt: BullMQ / Sidekiq / Celery
- Observability: Prometheus + Grafana
📌 Final Insight
A robust webhook system handles scale, retries, and verification gracefully. Design with retry safety, DLQ persistence, and observability as top priorities to ensure trustworthy and debuggable delivery.
