API Design: Scenario-Based Questions
67. How do you design and implement rate limiting for APIs?
Rate limiting prevents abuse, ensures fair usage, and protects backend services from overload. It's a fundamental part of API design for both public and internal interfaces.
🔑 Core Design Goals
- Fairness: Prevent one client from starving others.
- Security: Throttle brute-force or bot activity.
- Stability: Shield services from traffic spikes or DoS patterns.
⚙️ Algorithms
- Token Bucket: Allows bursty traffic; tokens refill at a fixed rate.
- Leaky Bucket: Smooth output rate; drops excess requests.
- Fixed Window: Simple counter per time window (e.g., 1000 req/min).
- Sliding Window: Tracks requests over a rolling window (or weights adjacent fixed windows), avoiding the burst that fixed windows allow at window boundaries.
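The token bucket is the most common of these in practice. A minimal sketch in Python (the class name and injectable clock are illustrative choices, not a standard API):

```python
import time


class TokenBucket:
    """Token bucket rate limiter: allows bursts up to `capacity`,
    refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start full
        self.clock = clock          # injectable for testing
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Because refill is computed lazily from elapsed time, no background timer thread is needed; the same structure ports directly to a Lua script on Redis for distributed enforcement.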
📏 Dimensions of Limiting
- Per IP address or user ID
- Per API key or client app
- Per endpoint (e.g., login stricter than GET /status)
- Region-aware throttling
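These dimensions usually combine into a single counter key plus a per-endpoint limits table. A sketch (the key scheme and the specific limit numbers are hypothetical, not from any particular gateway):

```python
# Hypothetical per-endpoint limits, requests per minute.
ENDPOINT_LIMITS = {
    "POST /login": 10,     # stricter: a brute-force target
    "GET /status": 1000,   # cheap, read-only
}


def rate_limit_key(api_key: str, endpoint: str, window_start: int) -> str:
    """One counter per (client, endpoint, time-window) combination,
    so a noisy client on one endpoint cannot exhaust another's quota."""
    return f"rl:{api_key}:{endpoint}:{window_start}"
```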
🧰 Implementation Options
- API Gateway: Native support in Kong, AWS API Gateway, Apigee, etc.
- Reverse Proxies: NGINX, Envoy, HAProxy with Lua or filters.
- Middleware: Express.js, Flask, Django middleware with Redis counters.
- Distributed: Redis, Memcached, or custom in-memory counter services.
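The middleware-with-Redis-counters option typically boils down to an INCR-per-window pattern. A fixed-window sketch, using a plain dict as a stand-in for Redis so it runs locally (in production the dict would be a shared Redis instance so every app node sees the same counts, with EXPIRE evicting old windows):

```python
import time


class FixedWindowLimiter:
    """Fixed-window counter. `self.store` stands in for Redis:
    the `allow` body maps to a single INCR on the window key."""

    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.store = {}  # key -> count; a shared Redis in production

    def allow(self, client_id: str) -> bool:
        window_start = int(self.clock()) // self.window
        key = f"rl:{client_id}:{window_start}"
        count = self.store.get(key, 0) + 1   # Redis: INCR key (atomic)
        self.store[key] = count
        return count <= self.limit
```

The single-node dict version is exactly the "breaks under scale" pitfall noted below; swapping the dict for Redis is what makes enforcement consistent across instances.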
✅ Best Practices
- Return 429 Too Many Requests with a Retry-After header.
- Expose usage headers (X-RateLimit-Remaining, etc.).
- Use global and per-service quotas.
- Rate limit at multiple layers (edge, app, DB).
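The rejection-response practice above can be sketched as a small helper that any framework (Flask, Django, Express-style handlers) could call; the function name is illustrative, and the X-RateLimit-* header names follow common convention rather than a finalized standard:

```python
def rate_limit_response(retry_after_seconds: int, limit: int, remaining: int):
    """Build a 429 rejection as a (status, headers) pair: tells the
    client when to retry and how much quota is left."""
    headers = {
        "Retry-After": str(retry_after_seconds),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
    }
    return 429, headers
```

Well-behaved clients use Retry-After to back off instead of hammering the API, which turns rejections into load shedding rather than retry storms.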
🚫 Common Pitfalls
- Single-node in-memory counters (break under scale).
- No visibility into rejected traffic patterns.
- Inconsistent enforcement across services.
🎯 Final Insight
Rate limiting is not just about protection; it's about control and predictability. Design with visibility, fairness, and user experience in mind.