System Design FAQ: Top Questions
57. How would you design a Distributed Lock Service?
A Distributed Lock Service allows multiple nodes in a distributed system to coordinate exclusive access to shared resources. It prevents race conditions in scenarios like cron jobs, microservices, leader election, or database writes.
Functional Requirements
- Acquire and release lock on named resources
- Support lock TTL (time-to-live)
- Handle retries and deadlocks
Non-Functional Requirements
- High availability and fault tolerance
- Low latency lock acquisition
- Correctness under network partitions
Architecture Options
- Redis-based: Use SET NX PX + Lua script for safe release
- Etcd or ZooKeeper: Lease and versioned key control
- DynamoDB: With conditional writes
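Redis Locking (Redlock Algorithm)
Each lock is taken with the command below. On a single Redis instance this is the whole acquisition step; Redlock proper issues the same command against several independent Redis nodes and treats the lock as held only if a majority of them accept it before the TTL runs out.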
SET resource_name unique_value NX PX 30000
NX: Only set if not exists
PX 30000: Expire after 30 seconds
unique_value: UUID to identify the lock owner
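The acquisition step above could be wrapped roughly as follows. This is a minimal sketch, assuming a client object whose set() method accepts nx and px keyword arguments in the style of redis-py; the function name acquire_lock and the key name are hypothetical.

```python
import uuid
from typing import Optional


def acquire_lock(client, resource: str, ttl_ms: int = 30000) -> Optional[str]:
    """Try once to take the lock; return the owner token, or None if it is held.

    `client` is assumed to expose set(name, value, nx=..., px=...) like redis-py.
    """
    token = str(uuid.uuid4())  # unique_value identifying this owner
    # Equivalent to: SET resource token NX PX ttl_ms
    acquired = client.set(resource, token, nx=True, px=ttl_ms)
    return token if acquired else None
```

The caller keeps the returned token: it is what later proves ownership when the lock is released.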
Release Lock (Safe Lua Script)
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("del", KEYS[1])
else
return 0
end
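One way to call the script from application code is sketched here, assuming a client with an eval(script, numkeys, *keys_and_args) method in the style of redis-py. The name release_lock is hypothetical. Running the compare and the delete inside one script keeps them atomic, so a client whose lock already expired cannot delete a lock that now belongs to someone else.

```python
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""


def release_lock(client, resource: str, token: str) -> bool:
    """Delete the key only if it still holds our token; True if it was released.

    `client` is assumed to expose eval(script, numkeys, *keys_and_args).
    """
    # One key (the lock name) followed by one argument (the owner token).
    return client.eval(RELEASE_SCRIPT, 1, resource, token) == 1
```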
Etcd Lease Lock Example
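# Grant a lease with a 15s TTL; the command prints the lease ID
etcdctl lease grant 15
# Put the lock key under that lease (123456 stands in for the printed ID)
etcdctl put lock/job1 owner123 --lease=123456
# Renew the lease for as long as the work is running
etcdctl lease keep-alive 123456
# Revoke the lease to release the lock; keys attached to it are deleted
etcdctl lease revoke 123456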
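To make the lease behaviour concrete, here is a toy in-memory model. It is not the etcd API: the class LeaseStore and its methods are hypothetical, and put_if_absent is an assumed stand-in for the versioned-key check a real lock would perform. It only illustrates that a key lives exactly as long as its lease keeps being renewed.

```python
import itertools
import time
from typing import Callable, Dict, Optional, Tuple


class LeaseStore:
    """Toy model of lease-bound keys (illustrative only, not etcd)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._ids = itertools.count(1)
        self._leases: Dict[int, Tuple[float, float]] = {}  # id -> (ttl, deadline)
        self._keys: Dict[str, Tuple[str, int]] = {}        # key -> (value, lease id)

    def grant(self, ttl: float) -> int:
        lease_id = next(self._ids)
        self._leases[lease_id] = (ttl, self._clock() + ttl)
        return lease_id

    def _alive(self, lease_id: int) -> bool:
        lease = self._leases.get(lease_id)
        return lease is not None and self._clock() < lease[1]

    def put_if_absent(self, key: str, value: str, lease_id: int) -> bool:
        """Attach key to the lease unless a live lease already owns it."""
        if not self._alive(lease_id):
            return False
        current = self._keys.get(key)
        if current is not None and self._alive(current[1]):
            return False
        self._keys[key] = (value, lease_id)
        return True

    def keep_alive(self, lease_id: int) -> bool:
        """Push the deadline out by one TTL; fails if the lease already expired."""
        if not self._alive(lease_id):
            return False
        ttl, _ = self._leases[lease_id]
        self._leases[lease_id] = (ttl, self._clock() + ttl)
        return True

    def revoke(self, lease_id: int) -> None:
        self._leases.pop(lease_id, None)

    def get(self, key: str) -> Optional[str]:
        current = self._keys.get(key)
        if current is None or not self._alive(current[1]):
            return None
        return current[0]
```

If the holder crashes and stops calling keep_alive, the lease expires and the key becomes free for the next owner without any explicit release.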
Retry Policy
- Exponential backoff with jitter
- Optional circuit breaker if retries spike
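A sketch of the backoff part of this policy, using the "full jitter" variant in which each delay is drawn uniformly below an exponentially growing ceiling. The function names, the default values, and the try_acquire callback are all assumptions for illustration; the circuit breaker is not shown.

```python
import random
import time
from typing import Callable, Iterator, Optional


def backoff_delays(base: float = 0.05, cap: float = 2.0, attempts: int = 6,
                   rng: Callable[[], float] = random.random) -> Iterator[float]:
    """Yield one delay per attempt: rng() * min(cap, base * 2**n)."""
    for n in range(attempts):
        yield rng() * min(cap, base * (2 ** n))


def acquire_with_retry(try_acquire: Callable[[], Optional[str]],
                       attempts: int = 6, base: float = 0.05, cap: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep,
                       rng: Callable[[], float] = random.random) -> Optional[str]:
    """Call try_acquire until it returns a token or the attempts run out."""
    delays = list(backoff_delays(base, cap, attempts, rng))
    for i, delay in enumerate(delays):
        token = try_acquire()
        if token is not None:
            return token
        if i < len(delays) - 1:  # no point sleeping after the final attempt
            sleep(delay)
    return None
```

The jitter matters because clients that failed at the same moment would otherwise all retry at the same moment too.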
Metrics to Monitor
- Lock acquisition latency
- TTL expiration vs manual release ratio
- Failed vs successful lock attempts
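As a rough illustration of how these three metrics relate, the sketch below keeps plain in-process counters. The class LockMetrics and its method names are hypothetical; a real deployment would more likely export equivalent counters and a latency histogram to a monitoring system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LockMetrics:
    acquired: int = 0
    failed: int = 0
    released: int = 0   # locks ended by an explicit release
    expired: int = 0    # locks ended because the TTL ran out
    latencies_ms: List[float] = field(default_factory=list)

    def record_attempt(self, ok: bool, latency_ms: float) -> None:
        self.latencies_ms.append(latency_ms)
        if ok:
            self.acquired += 1
        else:
            self.failed += 1

    def record_end(self, by_ttl: bool) -> None:
        if by_ttl:
            self.expired += 1
        else:
            self.released += 1

    def success_rate(self) -> float:
        total = self.acquired + self.failed
        return self.acquired / total if total else 0.0

    def expiry_ratio(self) -> float:
        """Share of finished locks that ended by TTL expiry, not release."""
        finished = self.released + self.expired
        return self.expired / finished if finished else 0.0
```

A rising expiry ratio suggests holders are crashing or running longer than the TTL allows.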
Tools/Infra
- Redis, Etcd, ZooKeeper
- UUID generators, Lua scripting
- Prometheus + Grafana for observability
Final Insight
A distributed lock should be simple yet resilient. Redis + UUID + Lua is popular for ephemeral coordination. Etcd is preferred for stronger consistency and leader election. Always give locks a TTL so that a crashed holder cannot block others indefinitely, and monitor them so that stuck or frequently expiring locks are noticed.