API Design: Scenario-Based Questions
67. How do you design and implement rate limiting for APIs?
Rate limiting prevents abuse, ensures fair usage, and protects backend services from overload. It's a fundamental part of API design for both public and internal interfaces.
🔑 Core Design Goals
- Fairness: Prevent one client from starving others.
- Security: Throttle brute-force or bot activity.
- Stability: Shield services from traffic spikes or DoS patterns.
⚙️ Algorithms
- Token Bucket: Allows bursty traffic; tokens refill at a fixed rate.
- Leaky Bucket: Smooth output rate; drops excess requests.
- Fixed Window: Simple counter per time window (e.g., 1000 req/min).
- Sliding Window: Tracks requests over a rolling window (or weights adjacent fixed windows), avoiding the burst that fixed windows allow at window boundaries.
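The token bucket is the most common of these in practice. A minimal sketch in Python (the class name and injectable clock are illustrative choices, not a standard API):

```python
import time


class TokenBucket:
    """Token bucket rate limiter: allows bursts up to `capacity`,
    refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start full
        self.clock = clock          # injectable for testing
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Because refill is computed lazily from elapsed time, no background timer thread is needed; the same structure ports directly to a Lua script on Redis for distributed enforcement.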
📏 Dimensions of Limiting
- Per IP address or user ID
- Per API key or client app
- Per endpoint (e.g., login stricter than GET /status)
- Region-aware throttling
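These dimensions usually combine into a single counter key plus a per-endpoint limits table. A sketch (the key scheme and the specific limit numbers are hypothetical, not from any particular gateway):

```python
# Hypothetical per-endpoint limits, requests per minute.
ENDPOINT_LIMITS = {
    "POST /login": 10,     # stricter: a brute-force target
    "GET /status": 1000,   # cheap, read-only
}


def rate_limit_key(api_key: str, endpoint: str, window_start: int) -> str:
    """One counter per (client, endpoint, time-window) combination,
    so a noisy client on one endpoint cannot exhaust another's quota."""
    return f"rl:{api_key}:{endpoint}:{window_start}"
```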
🧰 Implementation Options
- API Gateway: Native support in Kong, AWS API Gateway, Apigee, etc.
- Reverse Proxies: NGINX, Envoy, HAProxy with Lua or filters.
- Middleware: Express.js, Flask, Django middleware with Redis counters.
- Distributed: Redis, Memcached, or custom in-memory counter services.
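The middleware-with-Redis-counters option typically boils down to an INCR-per-window pattern. A fixed-window sketch, using a plain dict as a stand-in for Redis so it runs locally (in production the dict would be a shared Redis instance so every app node sees the same counts, with EXPIRE evicting old windows):

```python
import time


class FixedWindowLimiter:
    """Fixed-window counter. `self.store` stands in for Redis:
    the `allow` body maps to a single INCR on the window key."""

    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.store = {}  # key -> count; a shared Redis in production

    def allow(self, client_id: str) -> bool:
        window_start = int(self.clock()) // self.window
        key = f"rl:{client_id}:{window_start}"
        count = self.store.get(key, 0) + 1   # Redis: INCR key (atomic)
        self.store[key] = count
        return count <= self.limit
```

The single-node dict version is exactly the "breaks under scale" pitfall noted below; swapping the dict for Redis is what makes enforcement consistent across instances.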
✅ Best Practices
- Return 429 Too Many Requests with a Retry-After header.
- Expose usage headers (X-RateLimit-Remaining, etc.).
- Use global and per-service quotas.
- Rate limit at multiple layers (edge, app, DB).
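The rejection-response practice above can be sketched as a small helper that any framework (Flask, Django, Express-style handlers) could call; the function name is illustrative, and the X-RateLimit-* header names follow common convention rather than a finalized standard:

```python
def rate_limit_response(retry_after_seconds: int, limit: int, remaining: int):
    """Build a 429 rejection as a (status, headers) pair: tells the
    client when to retry and how much quota is left."""
    headers = {
        "Retry-After": str(retry_after_seconds),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
    }
    return 429, headers
```

Well-behaved clients use Retry-After to back off instead of hammering the API, which turns rejections into load shedding rather than retry storms.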
🚫 Common Pitfalls
- Single-node in-memory counters (break under scale).
- No visibility into rejected traffic patterns.
- Inconsistent enforcement across services.
🎯 Final Insight
Rate limiting is not just about protection; it's about control and predictability. Design with visibility, fairness, and user experience in mind.