System Design FAQ: Top Questions
36. How would you design a Rate Limiter Service?
A Rate Limiter restricts how many requests a client can make to a server within a defined time window. It prevents abuse and overload, and it ensures fair usage across users and APIs.
Functional Requirements
- Define request limit per user, IP, or API key (e.g., 100 requests/min)
- Enforce limits with near-zero latency
- Support burst allowance and quota renewal
- Expose API to retrieve remaining quota
Non-Functional Requirements
- Low latency (sub-5ms checks)
- High availability and distributed consistency
- Global enforcement across edge and backend
Core Components
- Limiter Engine: Token Bucket or Sliding Window implementation
- KV Store: Redis with TTLs per client key
- API Gateway Middleware: Calls limiter on every request
- Dashboard: Quota management and analytics
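The Limiter Engine above names Sliding Window as an alternative to Token Bucket. A minimal in-memory sketch of a sliding-window log follows; the class name `SlidingWindowLimiter`, its parameters, and the injectable `clock` are illustrative assumptions, and a production version would keep the timestamps in the shared KV store rather than in process memory.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-memory sliding-window log: at most `limit` requests per `window_s` seconds."""

    def __init__(self, limit=100, window_s=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self.hits = defaultdict(deque)  # client key -> timestamps of accepted requests

    def allow(self, client_key):
        now = self.clock()
        q = self.hits[client_key]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window_s:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

Unlike a fixed window, this never admits more than `limit` requests in any `window_s`-second span, at the cost of storing one timestamp per accepted request.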
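Token Bucket Logic (Python)
import time

import redis

# Assumed local connection; decode_responses=True returns str keys and values.
r = redis.Redis(decode_responses=True)

CAPACITY = 100               # maximum tokens, i.e. the burst size
REFILL_RATE = CAPACITY / 60  # tokens per second (100 requests/min)

def allow_request(user_id):
    key = f"bucket:{user_id}"
    now = time.time()
    bucket = r.hgetall(key)  # empty dict for a client seen for the first time
    last_refill = float(bucket.get("last", now))
    tokens = float(bucket.get("tokens", CAPACITY))
    # Refill in proportion to elapsed seconds, capped at the bucket capacity.
    tokens = min(CAPACITY, tokens + (now - last_refill) * REFILL_RATE)
    if tokens < 1:
        return False
    # Not atomic: concurrent callers can race between the read and this write.
    r.hset(key, mapping={"tokens": tokens - 1, "last": now})
    return True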
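The same refill arithmetic can be exercised without Redis, which is handy for unit tests and for exposing the remaining quota. The sketch below is an in-memory variant under stated assumptions: the `TokenBucket` class, its `remaining()` method, and the injectable `clock` are hypothetical names, and the lock only protects callers inside one process.

```python
import threading
import time


class TokenBucket:
    """Single-process token bucket with a refill rate expressed in tokens per second."""

    def __init__(self, capacity=100, refill_per_s=100 / 60, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.clock = clock  # injectable so tests can advance time manually
        self.tokens = float(capacity)  # start full, which allows an initial burst
        self.last = clock()
        self.lock = threading.Lock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.last = now

    def allow(self):
        """Consume one token if available and report whether the request may proceed."""
        with self.lock:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def remaining(self):
        """Whole tokens currently available, for a quota-lookup endpoint."""
        with self.lock:
            self._refill()
            return int(self.tokens)
```

With `capacity=2` and `refill_per_s=1.0`, two calls succeed immediately, the third is denied, and one more token becomes available after one second of clock time.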
Redis TTL Key Setup
SET bucket:user123.tokens 50 EX 60 NX
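The command above seeds a counter only when the key is absent (`NX`) and lets it expire with the window (`EX 60`). One way middleware might consume such a counter is sketched below. It assumes a redis-py style client is passed in as `client`; the function name `allow_fixed_window` and its defaults are illustrative, not part of the original design.

```python
def allow_fixed_window(client, user_id, limit=50, window_s=60):
    """Fixed-window check backed by a Redis counter with a TTL.

    `client` is assumed to expose redis-py's set/decr/ttl/expire methods.
    """
    key = f"bucket:{user_id}.tokens"
    # Seed the counter only if it does not exist; it expires with the window.
    client.set(key, limit, ex=window_s, nx=True)
    remaining = client.decr(key)
    if client.ttl(key) == -1:
        # The key expired between SET and DECR, so DECR recreated it without
        # a TTL. Re-arm the expiry so the counter cannot get stuck forever.
        client.expire(key, window_s)
    return remaining >= 0
```

This costs up to three round trips per request; a pipeline or a Lua script would reduce that, at the price of a longer example.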
API Rate Limit Header Example
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 37
X-RateLimit-Reset: 1691878800
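Headers like these could be produced by a small helper in the gateway middleware. The sketch below is an assumption about how that might look: `rate_limit_headers` is a hypothetical function, the `X-RateLimit-*` names follow the widely used convention shown above, and a denied request is answered with HTTP 429 plus the standard `Retry-After` header.

```python
import math
import time


def rate_limit_headers(allowed, limit, remaining, reset_epoch, now=None):
    """Return (status_code, headers) describing the client's current quota."""
    now = time.time() if now is None else now
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),  # Unix time when quota renews
    }
    if allowed:
        return 200, headers
    # Tell the client how many whole seconds to wait before retrying.
    headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return 429, headers
```

The 200 status stands in for whatever the upstream handler actually returns; only the 429 path is decided by the limiter itself.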
Testing Strategy
- High-throughput script that expects requests to be rejected after the first N in a window
- Latency benchmarks under load
- Simulate clock drift / multi-region collisions
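The first item in the list above can be sketched as a tiny harness that hammers any limiter and reports when it first says no. Both names here are hypothetical: `first_rejection` is the harness, and `make_counter_limiter` is a stand-in limiter used only to demonstrate it.

```python
def first_rejection(allow, attempts):
    """Call `allow()` up to `attempts` times.

    Returns the 1-based index of the first denied call, or None if every
    call was allowed.
    """
    for i in range(1, attempts + 1):
        if not allow():
            return i
    return None


def make_counter_limiter(limit):
    """Stand-in limiter that permits exactly `limit` calls, then denies."""
    state = {"used": 0}

    def allow():
        if state["used"] >= limit:
            return False
        state["used"] += 1
        return True

    return allow
```

A test would then assert that a limiter configured for N requests is first rejected on call N + 1, for example `first_rejection(make_counter_limiter(5), 10) == 6`.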
Observability
- Throttle counts per endpoint and region
- Quota exhaustion alerts
- Redis operation error rate and latency
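Throttle counts per endpoint and region could be gathered with a simple in-process counter before being exported to a metrics backend. This is a minimal sketch; `ThrottleMetrics` and its methods are illustrative names, and a real deployment would more likely use the metrics client of its monitoring stack.

```python
from collections import Counter


class ThrottleMetrics:
    """Counts limiter decisions keyed by (endpoint, region, outcome)."""

    def __init__(self):
        self.decisions = Counter()

    def record(self, endpoint, region, allowed):
        outcome = "allowed" if allowed else "throttled"
        self.decisions[(endpoint, region, outcome)] += 1

    def throttle_ratio(self, endpoint, region):
        """Fraction of requests throttled for one endpoint and region."""
        throttled = self.decisions[(endpoint, region, "throttled")]
        total = throttled + self.decisions[(endpoint, region, "allowed")]
        return throttled / total if total else 0.0
```

A quota exhaustion alert could then fire when `throttle_ratio` for an endpoint stays above a chosen threshold.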
Tools/Infra Used
- Redis: Fast atomic operations with TTL support
- Go/Python: Middleware or service implementation
- API Gateway: Envoy, Kong, or NGINX with plugin
- Grafana: Metrics dashboard for quotas
Final Insight
Rate limiting is critical for API safety. Use a resilient store (like Redis), fair algorithms (Token Bucket), and good telemetry to protect infrastructure while preserving user experience.