System Design FAQ: Top Questions

57. How would you design a Distributed Lock Service?

A Distributed Lock Service lets multiple nodes in a distributed system coordinate exclusive access to a shared resource. It prevents race conditions in scenarios such as scheduled (cron) jobs that must run on only one node, leader election among microservice instances, and concurrent database writes.

📋 Functional Requirements

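Acquire and release locks on named resources
Support a lock TTL (time-to-live) so that abandoned locks expire
Retry failed acquisitions and recover from deadlocks (for example, a holder that crashes without releasing)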

📦 Non-Functional Requirements

High availability and fault tolerance
Low latency lock acquisition
Correctness under network partitions

🏗️ Architecture Options

Redis-based: Use SET with NX and PX to acquire, plus a Lua script for safe release
Etcd or ZooKeeper: Leases with revisioned keys (Etcd) or ephemeral sequential znodes (ZooKeeper)
DynamoDB: Conditional writes on a lock item
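
To illustrate the DynamoDB option above, here is a minimal Python sketch of acquiring a lock with a conditional write. It assumes a boto3-style low-level DynamoDB client is passed in; the table name `locks`, the attribute names `lock_id`, `owner_id`, and `expires_at`, and the function name `try_acquire_dynamo` are all hypothetical choices made for this example.

```python
import time


def try_acquire_dynamo(client, resource, owner, ttl_s=30, table="locks", now=None):
    """Try once to take the lock item; return True on success, False if held.

    `client` is assumed to behave like a boto3 DynamoDB client.
    The table and attribute names are illustrative assumptions.
    """
    now = int(time.time()) if now is None else now
    try:
        client.put_item(
            TableName=table,
            Item={
                "lock_id": {"S": resource},
                "owner_id": {"S": owner},
                "expires_at": {"N": str(now + ttl_s)},
            },
            # Write only if no lock item exists or the existing one has expired.
            ConditionExpression="attribute_not_exists(lock_id) OR expires_at < :now",
            ExpressionAttributeValues={":now": {"N": str(now)}},
        )
        return True
    except client.exceptions.ConditionalCheckFailedException:
        # Another owner holds an unexpired lock.
        return False
```

The condition is evaluated atomically by DynamoDB, so two nodes racing for the same item cannot both succeed.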

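🔐 Redis Locking (Basis of the Redlock Algorithm)

The command below acquires a lock on a single Redis instance. Redlock applies the same command to several independent Redis instances and treats the lock as held only when a majority of them accept it within the lock's validity time.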


SET resource_name unique_value NX PX 30000

NX: Only set if not exists
PX 30000: Expire after 30 seconds
unique_value: UUID to identify the lock owner
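
The acquire step above can be sketched in Python as follows. This is a minimal sketch, assuming a redis-py style client whose `set(name, value, nx=True, px=...)` call returns a truthy value on success and `None` when the key already exists; the function name `acquire_lock` is hypothetical.

```python
import uuid


def acquire_lock(client, resource, ttl_ms=30000):
    """Try once to take the lock; return the owner token, or None if held.

    `client` is assumed to expose a redis-py compatible `set` method.
    """
    token = str(uuid.uuid4())  # unique_value that identifies this owner
    # Equivalent to: SET <resource> <token> NX PX <ttl_ms>
    acquired = client.set(resource, token, nx=True, px=ttl_ms)
    return token if acquired else None
```

The caller keeps the returned token, since it is needed later to prove ownership when releasing the lock.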

🧪 Release Lock (Safe Lua Script)


if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
else
  return 0
end
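
A possible way to call this script from application code is sketched below. It assumes a redis-py style client whose `eval(script, numkeys, *keys_and_args)` method runs the Lua script on the server; `RELEASE_SCRIPT` and `release_lock` are hypothetical names.

```python
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
else
  return 0
end
"""


def release_lock(client, resource, token):
    """Delete the lock only if `token` still owns it; return True if deleted.

    `client` is assumed to expose a redis-py compatible `eval` method.
    """
    # One key (the lock name) followed by one argument (the owner token).
    deleted = client.eval(RELEASE_SCRIPT, 1, resource, token)
    return deleted == 1
```

Running the comparison and the delete inside one script keeps them atomic. A separate GET followed by DEL could delete a lock that expired and was re-acquired by another client in between.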

📄 Etcd Lease Lock Example


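# Grant a lease with a 15s TTL; etcdctl prints the new lease ID
etcdctl lease grant 15

# Attach the lock key to the lease (123456 stands in for the ID printed above).
# Note: a plain put overwrites an existing key; for mutual exclusion, use a
# transaction that checks the key is absent, or the built-in `etcdctl lock`.
etcdctl put lock/job1 owner123 --lease=123456

# Renew the lease while the work is in progress
etcdctl lease keep-alive 123456

# Revoke the lease to release the lock (this deletes the attached key)
etcdctl lease revoke 123456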

🔁 Retry Policy

Exponential backoff with jitter
Optional circuit breaker if retries spike
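
The backoff part of this policy can be sketched in Python as follows. This is one common variant ("full jitter"), not the only option; the names `backoff_delays` and `acquire_with_retry`, the default base and cap values, and the `try_acquire` callable are assumptions made for illustration.

```python
import random
import time


def backoff_delays(base_s=0.05, cap_s=2.0, attempts=5, rng=random):
    """Yield one "full jitter" delay per retry.

    Retry n draws uniformly from [0, min(cap_s, base_s * 2**n)].
    """
    for attempt in range(attempts):
        ceiling = min(cap_s, base_s * (2 ** attempt))
        yield rng.uniform(0, ceiling)


def acquire_with_retry(try_acquire, sleep=time.sleep, **backoff):
    """Call `try_acquire` once, then retry with jittered backoff until it
    returns a truthy value or the retries run out (then return None)."""
    result = try_acquire()
    for delay in backoff_delays(**backoff):
        if result:
            break
        sleep(delay)
        result = try_acquire()
    return result or None
```

The random jitter spreads retries out so that many clients waiting on the same lock do not all retry at the same instant.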

📈 Metrics to Monitor

Lock acquisition latency
TTL expiration vs manual release ratio
Failed vs successful lock attempts
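
As a rough illustration of how the metrics above relate to each other, the sketch below tracks them with plain in-process counters. The class `LockMetrics` and its field names are hypothetical; a real deployment would more likely export equivalent counters and histograms to a monitoring system.

```python
from dataclasses import dataclass, field


@dataclass
class LockMetrics:
    acquired: int = 0   # successful lock attempts
    failed: int = 0     # attempts that found the lock already held
    released: int = 0   # locks released explicitly by their owner
    expired: int = 0    # locks that lapsed via TTL before release
    latencies_ms: list = field(default_factory=list)

    def record_attempt(self, ok, latency_ms):
        """Count one acquisition attempt and remember its latency."""
        if ok:
            self.acquired += 1
        else:
            self.failed += 1
        self.latencies_ms.append(latency_ms)

    def expiry_ratio(self):
        """Share of finished locks that ended by TTL expiration."""
        ended = self.released + self.expired
        return self.expired / ended if ended else 0.0

    def failure_rate(self):
        """Share of acquisition attempts that failed."""
        attempts = self.acquired + self.failed
        return self.failed / attempts if attempts else 0.0
```

A rising expiry ratio can suggest that holders are crashing or that the TTL is too short for the work being done under the lock.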

🧰 Tools/Infra

Redis, Etcd, ZooKeeper
UUID generators, Lua scripting
Prometheus + Grafana for observability

📌 Final Insight

A distributed lock should be simple yet resilient. Redis + UUID + Lua is popular for short-lived, best-effort coordination. Etcd is preferred when stronger consistency is required, for example in leader election. Always give locks a TTL so that a crashed holder cannot cause a deadlock, and monitor them so that expirations and contention stay visible.
