System Design FAQ: Top Questions

36. How would you design a Rate Limiter Service?

A Rate Limiter restricts how often a client can make requests to a server within a defined time window. It prevents abuse and overload and ensures fair usage across users or APIs.

📋 Functional Requirements

Define request limit per user, IP, or API key (e.g., 100 requests/min)
Enforce limits with near-zero latency
Support burst allowance and quota renewal
Expose API to retrieve remaining quota
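A limit such as "100 requests/min" has to be turned into concrete limiter parameters. Here is a minimal sketch, assuming a hypothetical `RateLimitPolicy` dataclass and a `bucket:<scope>:<id>` key layout (both are illustrations, not part of the design above). The limit becomes the bucket capacity, so a full minute's quota can be spent in one burst; a smaller capacity would cap bursts instead. The refill rate is expressed per second because `time.time()` returns seconds. Keys are namespaced by scope so user, IP, and API-key limits stay independent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RateLimitPolicy:
    scope: str              # hypothetical: "user", "ip", or "api_key"
    limit: int              # requests allowed per window
    window_seconds: float   # window length, e.g. 60 for "per minute"

    @property
    def capacity(self) -> int:
        # Assumption: burst size equals the per-window limit
        return self.limit

    @property
    def refill_per_second(self) -> float:
        # 100 requests/min -> 100 / 60 ≈ 1.67 tokens per second
        return self.limit / self.window_seconds

    def key_for(self, client_id: str) -> str:
        # Separate buckets per scope so a user limit and an IP limit never collide
        return f"bucket:{self.scope}:{client_id}"


per_user = RateLimitPolicy(scope="user", limit=100, window_seconds=60)
print(per_user.key_for("user123"))           # bucket:user:user123
print(round(per_user.refill_per_second, 2))  # 1.67
```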

📦 Non-Functional Requirements

Low latency (sub-5ms checks)
High availability and distributed consistency
Global enforcement across edge and backend

🏗️ Core Components

Limiter Engine: Token Bucket or Sliding Window implementation
KV Store: Redis with TTLs per client key
API Gateway Middleware: Calls limiter on every request
Dashboard: Quota management and analytics
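The Sliding Window option listed for the Limiter Engine can be sketched as a per-client log of request timestamps. Entries older than the window are dropped, and a request is admitted only if fewer than `limit` remain. This in-memory version uses a hypothetical `SlidingWindowLimiter` class with an injectable clock and only illustrates the algorithm. A distributed deployment would keep the log in Redis instead of process memory, for example as a sorted set per client key. The `remaining` method is one way to back the "retrieve remaining quota" requirement.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Sliding-window log: at most `limit` requests in any `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock                # injectable so tests are deterministic
        self.logs = defaultdict(deque)    # client key -> admitted request timestamps

    def allow(self, key):
        now = self.clock()
        log = self.logs[key]
        # Evict timestamps that have slid out of the window
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) < self.limit:
            log.append(now)
            return True
        return False

    def remaining(self, key):
        # Quota left right now; does not record a request
        now = self.clock()
        recent = sum(1 for t in self.logs[key] if t > now - self.window)
        return max(0, self.limit - recent)


# Usage with a fake clock
t = [0.0]
limiter = SlidingWindowLimiter(limit=3, window=60, clock=lambda: t[0])
print([limiter.allow("user123") for _ in range(4)])  # [True, True, True, False]
t[0] = 60.0
print(limiter.allow("user123"))  # True: the first three timestamps have expired
```

Compared with a token bucket, the log gives an exact count over any rolling window but stores one entry per admitted request, so memory grows with the limit.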

🚦 Token Bucket Logic (Python)


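import time

import redis

# decode_responses=True makes hgetall() return str keys ("tokens") instead of bytes
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

CAPACITY = 100               # maximum tokens, i.e. the allowed burst
REFILL_RATE = CAPACITY / 60  # tokens per SECOND (100/min); time.time() is in seconds

def allow_request(user_id):
    key = f"bucket:{user_id}"
    now = time.time()

    bucket = r.hgetall(key)  # {} when the client has no bucket yet
    last_refill = float(bucket.get("last", now))
    tokens = float(bucket.get("tokens", CAPACITY))

    # Refill in proportion to elapsed seconds, never above capacity
    elapsed = now - last_refill
    tokens = min(CAPACITY, tokens + elapsed * REFILL_RATE)

    if tokens >= 1:
        # hset(mapping=...) replaces the deprecated hmset(). This read-modify-write
        # is not atomic; concurrent requests can race, so production code would
        # move it into a Lua script or a WATCH/MULTI transaction.
        r.hset(key, mapping={"tokens": tokens - 1, "last": now})
        r.expire(key, 60)  # an idle bucket is full again after 60 s, so drop it
        return True
    return False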

🔧 Redis TTL Key Setup


HSET bucket:user123 tokens 50 last <unix_ts>
EXPIRE bucket:user123 60

📊 API Rate Limit Header Example


HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 37
X-RateLimit-Reset: 1691878800
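One way to call the limiter on every request and emit these headers is a thin middleware. The sketch below uses plain WSGI. `RateLimitMiddleware`, the `check` callable, and keying on the client IP (`REMOTE_ADDR`) are assumptions for illustration; `check` stands in for any limiter that returns the decision, the remaining quota, and the reset time. Rejected requests get `429 Too Many Requests`, the status code RFC 6585 defines for this case, plus a `Retry-After` header. The `X-RateLimit-*` names are a widely used convention rather than a formal standard.

```python
import time

class RateLimitMiddleware:
    """WSGI middleware that consults a limiter before every request (hypothetical)."""

    def __init__(self, app, check, limit):
        self.app = app
        self.check = check    # check(key) -> (allowed, remaining, reset_epoch_seconds)
        self.limit = limit

    def __call__(self, environ, start_response):
        key = environ.get("REMOTE_ADDR", "unknown")  # assumption: limit per client IP
        allowed, remaining, reset_at = self.check(key)
        quota_headers = [
            ("X-RateLimit-Limit", str(self.limit)),
            ("X-RateLimit-Remaining", str(remaining)),
            ("X-RateLimit-Reset", str(reset_at)),
        ]
        if not allowed:
            # Reject before the wrapped app runs; Retry-After is in seconds
            retry_after = max(0, reset_at - int(time.time()))
            start_response("429 Too Many Requests", quota_headers + [
                ("Retry-After", str(retry_after)),
                ("Content-Type", "text/plain"),
            ])
            return [b"rate limit exceeded\n"]

        def start_with_quota(status, app_headers, exc_info=None):
            # Append the quota headers to whatever the wrapped app sends
            return start_response(status, app_headers + quota_headers, exc_info)

        return self.app(environ, start_with_quota)


# Usage: a trivial app behind a toy limiter that admits two calls per key
def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]

calls = {}
def toy_check(key):
    calls[key] = calls.get(key, 0) + 1
    return calls[key] <= 2, max(0, 2 - calls[key]), 1691878800

app = RateLimitMiddleware(hello_app, toy_check, limit=2)
captured = []
for _ in range(3):
    body = app({"REMOTE_ADDR": "203.0.113.7"},
               lambda status, headers, exc_info=None: captured.append((status, dict(headers))))
print([status for status, _ in captured])  # ['200 OK', '200 OK', '429 Too Many Requests']
```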

🧪 Testing Strategy

High-throughput script with expected failure after N requests
Latency benchmarks under load
Simulate clock drift / multi-region collisions
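The first and third test ideas are easiest to run against a limiter whose clock can be controlled. The sketch below is an in-memory re-implementation of the token-bucket arithmetic, not the Redis code itself, driven by a hypothetical `FakeClock`. It checks that exactly `capacity` back-to-back requests pass, that request N+1 fails, and that tokens return after time advances. It also encodes one possible policy for clock drift (an assumption, not taken from the design above): a clock that jumps backwards neither mints nor removes tokens, because refill only happens when time moves past the latest timestamp seen.

```python
class FakeClock:
    """Manually advanced clock so tests are deterministic (hypothetical helper)."""
    def __init__(self, start=0.0):
        self.now = start
    def __call__(self):
        return self.now
    def advance(self, seconds):
        self.now += seconds

class TokenBucket:
    """In-memory token bucket using the same refill math as allow_request()."""
    def __init__(self, capacity, refill_rate, clock):
        self.capacity = capacity
        self.refill_rate = refill_rate    # tokens per second
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        if now > self.last:               # ignore backward clock jumps
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.refill_rate)
            self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def test_burst_then_refill():
    clock = FakeClock()
    bucket = TokenBucket(capacity=100, refill_rate=100 / 60, clock=clock)
    results = [bucket.allow() for _ in range(101)]
    assert results[:100] == [True] * 100  # the full burst is admitted
    assert results[100] is False          # request N+1 is throttled
    clock.advance(1.0)                    # refills about 1.67 tokens
    assert bucket.allow() is True
    assert bucket.allow() is False        # only about 0.67 tokens left

def test_backward_clock_jump_mints_nothing():
    clock = FakeClock(start=1000.0)
    bucket = TokenBucket(capacity=2, refill_rate=1.0, clock=clock)
    assert bucket.allow() is True
    assert bucket.allow() is True
    clock.advance(-30)                    # simulated drift: clock jumps back
    assert bucket.allow() is False
    clock.advance(30)                     # back to where it was: still no refill
    assert bucket.allow() is False
    clock.advance(1)                      # real progress refills normally
    assert bucket.allow() is True

test_burst_then_refill()
test_backward_clock_jump_mints_nothing()
print("ok")
```

A load test against the deployed service can assert the same outcome (request N+1 gets a 429), but refill continues during the run, so assertions there need some tolerance.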

📈 Observability

Throttle counts per endpoint and region
Quota exhaustion alerts
Redis operation error rate and latency
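These signals can be collected at the limiter call site. In this sketch, a hypothetical in-memory `LimiterMetrics` class stands in for whatever metrics backend feeds the Grafana dashboard. It counts throttles per endpoint and region, counts how often a client's remaining quota reaches zero (a candidate input for exhaustion alerts), and wraps each Redis call to record its latency and failures. The endpoint and region names in the usage are made up.

```python
import time
from collections import Counter

class LimiterMetrics:
    """In-memory stand-in for a metrics backend (hypothetical)."""

    def __init__(self):
        self.throttled = Counter()   # (endpoint, region) -> rejected requests
        self.exhausted = Counter()   # client key -> times remaining quota hit zero
        self.redis_calls = 0
        self.redis_errors = 0
        self.redis_latency_ms = []

    def record_decision(self, endpoint, region, client, allowed, remaining):
        if not allowed:
            self.throttled[(endpoint, region)] += 1
        if remaining == 0:
            self.exhausted[client] += 1

    def timed_redis_call(self, fn, *args, **kwargs):
        """Run fn (e.g. a redis-py method) and record its latency and any failure."""
        self.redis_calls += 1
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception:
            self.redis_errors += 1
            raise
        finally:
            self.redis_latency_ms.append((time.perf_counter() - start) * 1000)

    def redis_error_rate(self):
        return self.redis_errors / self.redis_calls if self.redis_calls else 0.0


m = LimiterMetrics()
m.record_decision("/search", "eu-west", "user123", allowed=False, remaining=0)
m.timed_redis_call(lambda: "PONG")
try:
    m.timed_redis_call(lambda: 1 / 0)  # stand-in for a failing Redis call
except ZeroDivisionError:
    pass
print(m.throttled[("/search", "eu-west")], m.redis_error_rate())  # 1 0.5
```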

🧰 Tools/Infra Used

Redis: Fast atomic operations with TTL support
Go/Python: Middleware or service implementation
API Gateway: Envoy, Kong, or NGINX, using their rate-limiting filters, plugins, or modules
Grafana: Metrics dashboard for quotas

📌 Final Insight

Rate limiting is critical for API safety. Use a resilient store (like Redis), fair algorithms (Token Bucket), and good telemetry to protect infrastructure while preserving user experience.
