System Design FAQ: Top Questions
51. How would you design a Webhook Delivery System?
A Webhook Delivery System sends outbound HTTP callbacks (webhooks) to third-party URLs when events occur, such as a successful payment or a completed file upload. Reliability, security, and observability are the critical design concerns.
📋 Functional Requirements
- Register webhook URLs for different events
- Deliver HTTP POSTs reliably with retries
- Support exponential backoff and dead letter queues
- Log attempts and enable auditing
📦 Non-Functional Requirements
- Low latency for first delivery
- At-least-once delivery guarantee (receivers may see duplicates and should deduplicate)
- Request signing so receivers can verify authenticity
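At-least-once delivery means the same event can arrive more than once, for example when the receiver processed a request but the sender's timeout fired before the response came back. A minimal receiver-side sketch, assuming each event carries a unique `id` field (hypothetical; the event shape is not defined above) and using an in-memory `Set` where a real receiver would use a durable store:

```javascript
// Hypothetical receiver-side deduplication for at-least-once webhooks.
// Assumes every event has a unique `id`. A production receiver would persist
// seen IDs (e.g., a table with a unique constraint) instead of keeping them in memory.
class IdempotentHandler {
  constructor(processEvent) {
    this.processEvent = processEvent; // business logic that must run once per event
    this.seen = new Set();            // IDs of events already processed
  }

  // Returns true if the event was processed now, false if it was a duplicate.
  handle(event) {
    if (this.seen.has(event.id)) {
      return false; // duplicate delivery: acknowledge without re-processing
    }
    this.processEvent(event);
    this.seen.add(event.id); // record only after processing succeeds, so a throw allows a retry
    return true;
  }
}

// Usage: the second delivery of evt_1 is ignored.
const processed = [];
const handler = new IdempotentHandler((e) => processed.push(e.id));
handler.handle({ id: 'evt_1', type: 'payment.succeeded' });
handler.handle({ id: 'evt_1', type: 'payment.succeeded' });
handler.handle({ id: 'evt_2', type: 'file.uploaded' });
console.log(processed); // [ 'evt_1', 'evt_2' ]
```

The receiver should still return a 2xx for a duplicate; otherwise the sender keeps retrying an event that was already handled.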
🏗️ Core Components
- Event Bus: Carries events from producing services to the delivery pipeline (e.g., Kafka, RabbitMQ)
- Delivery Worker: Pulls jobs and sends HTTP requests
- Retry Engine: Schedules exponential backoff retries
- DLQ: Stores webhooks that failed every retry attempt, for later analysis
🗄️ Webhook Registry Schema
CREATE TABLE webhook_endpoints (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  event_type TEXT NOT NULL,
  url TEXT NOT NULL,
  secret TEXT NOT NULL,                 -- per-endpoint HMAC signing key
  enabled BOOLEAN NOT NULL DEFAULT true
);
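When an event arrives, the system looks up every enabled endpoint subscribed to that event type and creates one delivery job per endpoint, so that one slow or failing receiver does not hold up deliveries to the others. A minimal in-memory sketch, assuming events have a `type` field and rows shaped like `webhook_endpoints`; the job fields `attempt` and `dueAt` are hypothetical. A real system would run the equivalent query (`WHERE event_type = $1 AND enabled`) and publish the jobs to the queue:

```javascript
// Hypothetical fan-out step: one delivery job per matching, enabled endpoint.
// `endpoints` mimics rows from the webhook_endpoints table.
function fanOut(event, endpoints, now = Date.now()) {
  return endpoints
    .filter((ep) => ep.enabled && ep.event_type === event.type)
    .map((ep) => ({
      endpointId: ep.id,
      event,
      attempt: 1, // first delivery attempt
      dueAt: now  // deliver immediately; retries push this into the future
    }));
}

// Usage with made-up rows (secrets omitted for brevity).
const endpoints = [
  { id: 'ep1', user_id: 'u1', event_type: 'payment.succeeded', url: 'https://a.example/hook', enabled: true },
  { id: 'ep2', user_id: 'u2', event_type: 'payment.succeeded', url: 'https://b.example/hook', enabled: false },
  { id: 'ep3', user_id: 'u3', event_type: 'file.uploaded', url: 'https://c.example/hook', enabled: true }
];
const jobs = fanOut({ id: 'evt_1', type: 'payment.succeeded' }, endpoints, 0);
console.log(jobs.map((j) => j.endpointId)); // [ 'ep1' ]
```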
📤 Delivery Worker (Node.js Example)
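const axios = require('axios');
const crypto = require('crypto');
// Deliver one event to one endpoint. Returns 'delivered' or 'retry'.
async function sendWebhook(event, endpoint) {
  // Serialize once and sign those exact bytes so the receiver can recompute the HMAC.
  const payload = JSON.stringify(event);
  const signature = crypto
    .createHmac('sha256', endpoint.secret)
    .update(payload)
    .digest('hex');
  try {
    await axios.post(endpoint.url, payload, {
      headers: {
        'X-Signature': signature,
        'Content-Type': 'application/json'
      },
      timeout: 3000 // ms; a receiver slower than this counts as a failed attempt
    });
    // By default axios resolves only for 2xx responses.
    return 'delivered';
  } catch (err) {
    // Non-2xx responses, timeouts, and network errors all go to the retry engine.
    return 'retry';
  }
}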
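The worker above treats every failure as retryable. One common refinement, shown here as a hypothetical policy rather than a standard, is to classify the outcome: retry timeouts, network errors, 5xx responses, 408 (Request Timeout), and 429 (Too Many Requests), but send other non-2xx responses straight to the DLQ, since an endpoint returning 404 or 410 is unlikely to recover within the retry window. The function takes the HTTP status (or `undefined` when no response arrived), so it can be tested without a network:

```javascript
// Hypothetical retry policy: map an HTTP outcome to the worker's next action.
// `status` is the response status code, or undefined for timeouts/network errors.
function classifyOutcome(status) {
  if (status === undefined) return 'retry';              // no response: likely transient
  if (status >= 200 && status < 300) return 'delivered';
  if (status === 408 || status === 429) return 'retry';  // timeout or rate limit: try later
  if (status >= 500) return 'retry';                     // receiver-side failure: likely transient
  return 'failed';                                       // other codes: treat as permanent
}

// In an axios catch block, the status would come from err.response?.status.
console.log(classifyOutcome(undefined)); // retry
console.log(classifyOutcome(503));       // retry
console.log(classifyOutcome(410));       // failed
```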
🔁 Retry Logic (Redis-based Backoff)
- Initial delay: 5s, then 30s, 5min, 30min...
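- Max attempts: 5; after the last failure, move the job to the DLQ
- Use a Redis sorted set scored by each job's next due time (Unix timestamp):
ZADD retry_queue <due_timestamp> <job>
- A scheduler fetches due jobs with ZRANGEBYSCORE retry_queue -inf <now> and claims each with ZREM; only the worker whose ZREM returns 1 re-enqueues the job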
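The retry schedule can be expressed as a pure function that the worker calls after each failed attempt. This sketch assumes "Max attempts: 5" counts the initial delivery, so the four listed delays (5 s, 30 s, 5 min, 30 min) cover every retry; after the fifth failure it returns `null`, meaning the job goes to the DLQ. The returned due time is what would be used as the score in `ZADD retry_queue`. Production systems often add random jitter so that many failing jobs do not retry in lockstep; it is omitted here to keep the output deterministic.

```javascript
// Retry delays in milliseconds: 5 s, 30 s, 5 min, 30 min.
const RETRY_DELAYS_MS = [5 * 1000, 30 * 1000, 5 * 60 * 1000, 30 * 60 * 1000];
const MAX_ATTEMPTS = 5; // assumption: includes the first delivery attempt

// attemptsMade: how many attempts have failed so far (1-based).
// Returns the Unix-ms time of the next attempt (the sorted-set score), or null for the DLQ.
function nextDueAt(attemptsMade, now) {
  if (attemptsMade >= MAX_ATTEMPTS) return null;
  return now + RETRY_DELAYS_MS[attemptsMade - 1];
}

// Show the delay that follows each failed attempt (now = 0 yields the delay itself).
for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
  const due = nextDueAt(attempt, 0);
  console.log(`attempt ${attempt} failed: ${due === null ? 'move to DLQ' : `retry in ${due / 1000}s`}`);
}
// attempt 1 failed: retry in 5s
// attempt 2 failed: retry in 30s
// attempt 3 failed: retry in 300s
// attempt 4 failed: retry in 1800s
// attempt 5 failed: move to DLQ
```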
📈 Metrics and Logging
- Success/failure rates
- Average delivery latency
- Endpoints with the most failures
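These metrics can be derived from per-attempt log records. A minimal sketch, assuming a hypothetical log shape `{ endpointId, ok, latencyMs }` with one record per attempt; in production the same numbers would more likely come from Prometheus counters and histograms or an ELK query than from application code:

```javascript
// Hypothetical aggregation over per-attempt delivery logs.
function summarize(attempts) {
  const total = attempts.length;
  const ok = attempts.filter((a) => a.ok);
  // Average latency over successful attempts only, so timed-out attempts do not skew it.
  const avgLatencyMs = ok.length === 0
    ? 0
    : ok.reduce((sum, a) => sum + a.latencyMs, 0) / ok.length;
  // Count failures per endpoint and sort descending to surface the worst offenders.
  const failures = new Map();
  for (const a of attempts) {
    if (!a.ok) failures.set(a.endpointId, (failures.get(a.endpointId) || 0) + 1);
  }
  const topFailing = [...failures.entries()].sort((x, y) => y[1] - x[1]);
  return {
    successRate: total === 0 ? 0 : ok.length / total,
    failureRate: total === 0 ? 0 : (total - ok.length) / total,
    avgLatencyMs,
    topFailing
  };
}

// Usage with made-up log records.
const summary = summarize([
  { endpointId: 'ep1', ok: true, latencyMs: 120 },
  { endpointId: 'ep1', ok: true, latencyMs: 80 },
  { endpointId: 'ep2', ok: false, latencyMs: 3000 },
  { endpointId: 'ep2', ok: false, latencyMs: 3000 },
  { endpointId: 'ep3', ok: false, latencyMs: 500 }
]);
console.log(summary.successRate);  // 0.4
console.log(summary.avgLatencyMs); // 100
console.log(summary.topFailing);   // [ [ 'ep2', 2 ], [ 'ep3', 1 ] ]
```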
🔐 Security Best Practices
- Sign requests using HMAC-SHA256
- Send from a stable, published set of IP addresses so receivers can allowlist them
- Use TLS (HTTPS-only)
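On the receiving side, verification recomputes the HMAC-SHA256 of the raw request body with the shared secret and compares it with the `X-Signature` header set by the Node.js worker. A minimal sketch using only Node's built-in `crypto` module: `crypto.timingSafeEqual` avoids leaking how many leading characters matched, and because it throws when the buffers differ in length, the lengths are checked first. It assumes the receiver has the raw body bytes; re-serializing a parsed object can change whitespace or key order and break the signature.

```javascript
const crypto = require('crypto');

// Recompute the hex HMAC-SHA256 of the raw body and compare it in constant time.
function verifySignature(rawBody, signatureHeader, secret) {
  if (typeof signatureHeader !== 'string') return false; // header missing
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected, 'utf8');
  const b = Buffer.from(signatureHeader, 'utf8');
  // timingSafeEqual throws on a length mismatch, so check lengths first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

// Usage: sign the way the sender does, then verify.
const secret = 'test-secret'; // made-up secret for illustration
const body = JSON.stringify({ id: 'evt_1', type: 'payment.succeeded' });
const sig = crypto.createHmac('sha256', secret).update(body).digest('hex');
console.log(verifySignature(body, sig, secret));         // true
console.log(verifySignature(body + ' ', sig, secret));   // false
console.log(verifySignature(body, sig, 'wrong-secret')); // false
```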
🧰 Tools/Infra Used
- Queue: Kafka, RabbitMQ, or Redis Streams
- Delivery: Axios, cURL, or Go HTTP client
- Observability: Prometheus + Grafana, ELK
📌 Final Insight
Webhooks require reliable delivery with backoff and verification. A scalable architecture separates event production from delivery logic and incorporates queueing, retries, and audit logs to ensure trustworthy integrations.