System Design FAQ: Top Questions
38. How would you design a Distributed Cache System like Memcached or Redis?
A Distributed Cache System speeds up data access by storing frequently read data in memory. It reduces backend load, improves latency, and is crucial in high-traffic applications.
Functional Requirements
- Read/write to cache with defined TTL (time-to-live)
- Auto-evict stale or least-used items (LRU)
- Consistent key placement across distributed nodes
- Support for pub/sub, atomic ops, and eviction policies
Non-Functional Requirements
- Low latency (<1 ms reads, <5 ms writes)
- High availability and data partitioning
- Horizontal scalability
Core Components
- Client SDK: Encodes keys, sends requests, applies consistent hashing
- Shard Map / Ring: Maps keys to node clusters
- Cache Nodes: Store K/V pairs in RAM, run eviction strategies (see the sketch after this list)
- Failover Logic: Detects down nodes and reroutes
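A minimal sketch of what a single cache node does, assuming a hypothetical in-process store with per-key TTL and LRU eviction (an illustration, not how Redis or Memcached is implemented internally):

import time
from collections import OrderedDict

class CacheNode:
    """Hypothetical single-node K/V store with TTL and LRU eviction."""
    def __init__(self, max_items=1024):
        self.store = OrderedDict()  # key -> (value, expires_at)
        self.max_items = max_items

    def set(self, key, value, ttl=None):
        expires_at = time.time() + ttl if ttl else None
        self.store[key] = (value, expires_at)
        self.store.move_to_end(key)           # mark as most recently used
        if len(self.store) > self.max_items:  # over budget: evict LRU entry
            self.store.popitem(last=False)

    def get(self, key):
        item = self.store.get(key)
        if item is None:
            return None                       # miss
        value, expires_at = item
        if expires_at and time.time() > expires_at:
            del self.store[key]               # lazily expire stale entry
            return None
        self.store.move_to_end(key)           # refresh recency on hit
        return value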
Consistent Hashing with Virtual Nodes (Python)
import hashlib

def get_node(key, nodes, vnodes=3):
    """Map a key to a node on a hash ring with virtual nodes."""
    # Each physical node appears `vnodes` times on the ring under a
    # distinct virtual label, which spreads keys more evenly.
    ring = sorted(
        (hashlib.md5(f"{n}#{v}".encode()).hexdigest(), n)
        for n in nodes
        for v in range(vnodes)
    )
    key_hash = hashlib.md5(key.encode()).hexdigest()
    # Walk clockwise: the first position at or past the key owns it.
    for h, n in ring:
        if key_hash <= h:
            return n
    # Wrapped past the end of the ring; fall back to the first position.
    return ring[0][1]
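For example, routing a few keys across three illustrative node names:

nodes = ["cache-a", "cache-b", "cache-c"]
for key in ["user:42", "page:/docs/home", "session:abc"]:
    print(key, "->", get_node(key, nodes))

In production the ring is built once and reused; rebuilding it on every lookup, as the function above does, is only acceptable for illustration.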
Redis Caching with TTL
SET page:/docs/home "html" EX 300
GET page:/docs/home
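The same calls from application code, sketched with the redis-py client (assumes a Redis server on localhost:6379):

import redis

r = redis.Redis(host="localhost", port=6379)
r.set("page:/docs/home", "html", ex=300)  # EX 300 -> expire after 5 minutes
html = r.get("page:/docs/home")           # b"html" until the TTL lapses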
Lazy vs Write-Through
- Lazy (cache-aside): a read miss fetches from the backend and populates the cache
- Write-through: every DB write updates the cache immediately (both sketched below)
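A minimal sketch of both patterns; cache, db, and their fetch_user/save_user methods are hypothetical interfaces standing in for your cache client and database layer:

def get_user_lazy(cache, db, user_id):
    # Lazy (cache-aside): only a read miss touches the backend.
    user = cache.get(f"user:{user_id}")
    if user is None:
        user = db.fetch_user(user_id)               # hypothetical DB call
        cache.set(f"user:{user_id}", user, ttl=300)
    return user

def save_user_write_through(cache, db, user):
    # Write-through: the write path updates the DB and cache together.
    db.save_user(user)                              # hypothetical DB call
    cache.set(f"user:{user['id']}", user, ttl=300)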
Cache Invalidation Patterns
- On write/update, evict relevant keys
- Use pub/sub to notify other nodes (example after this list)
- Include an etag or version tag in each cache entry
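Redis pub/sub can carry those notifications; a sketch with redis-py (the cache-invalidate channel name is an assumption, and the publisher and subscriber would run in separate processes):

import redis

r = redis.Redis()

# Publisher side: after a write, announce which key went stale.
r.publish("cache-invalidate", "page:/docs/home")

# Subscriber side: each app instance listens and evicts locally.
p = r.pubsub()
p.subscribe("cache-invalidate")
for message in p.listen():  # blocks, yielding messages as they arrive
    if message["type"] == "message":
        r.delete(message["data"].decode())  # or drop from a local cache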
Eviction Strategy
- LRU (Least Recently Used) or LFU (Least Frequently Used)
- Evict on TTL expiry or when a max-memory limit is reached (Redis settings below)
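Redis exposes these knobs directly; for example, capping memory and evicting with its approximate LRU across all keys:

CONFIG SET maxmemory 256mb
CONFIG SET maxmemory-policy allkeys-lru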
Observability
- Cache hit/miss ratio (a quick check follows this list)
- Eviction and memory usage metrics
- Node latency and failover events
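Redis tracks hits and misses in INFO stats; the hit ratio is keyspace_hits / (keyspace_hits + keyspace_misses). A quick check with redis-py:

import redis

stats = redis.Redis().info("stats")  # the "stats" section of INFO
hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
print("hit ratio:", hits / max(hits + misses, 1))  # guard divide-by-zero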
Tools/Infra Used
- Cache Nodes: Redis, Memcached
- Coordination: Consul, ZooKeeper
- Monitoring: Datadog / Prometheus
Final Insight
Caching is a backbone of scalable systems. To avoid stale data and hotspot nodes, design with deliberate key naming, explicit invalidation, and failover-aware partitioning.