System Design FAQ: Top Questions
56. How would you design a Real-Time Chat System like WhatsApp or Slack?
A Real-Time Chat System allows users to send and receive messages instantly, with features like read receipts, typing indicators, presence status, media sharing, and conversation history.
📋 Functional Requirements
- 1:1 and group messaging
- Message ordering and delivery guarantees
- Online/offline presence, typing indicator
- Support for emojis, media, file uploads
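The ordering and delivery requirement above can be illustrated with a small receiver-side buffer. This is a minimal sketch, not a prescribed design: it assumes each conversation assigns a monotonically increasing `seq` number to its messages (a field that is not part of the schema in this answer), and `OrderedInbox` is a hypothetical name. Duplicates caused by at-least-once redelivery are dropped, and out-of-order messages are held until the gap closes.

```javascript
// Hypothetical helper: buffers out-of-order messages for one conversation
// and releases them in sequence order, dropping duplicates.
// Assumes every message carries a per-conversation `seq` starting at 1.
class OrderedInbox {
  constructor() {
    this.nextSeq = 1;         // next sequence number we expect to deliver
    this.pending = new Map(); // seq -> message, held until the gap closes
  }

  // Returns the messages that became deliverable, in order.
  receive(message) {
    const { seq } = message;
    if (seq < this.nextSeq || this.pending.has(seq)) {
      return []; // duplicate from a retry; ignore it
    }
    this.pending.set(seq, message);
    const ready = [];
    while (this.pending.has(this.nextSeq)) {
      ready.push(this.pending.get(this.nextSeq));
      this.pending.delete(this.nextSeq);
      this.nextSeq += 1;
    }
    return ready;
  }
}
```

A client would keep one such inbox per conversation and render only what `receive` returns.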
📦 Non-Functional Requirements
- Low latency & real-time delivery
- Scalability to millions of concurrent users
- Durable message storage with retries
🏗️ High-Level Architecture
- WebSocket Gateway: Maintains a persistent connection to each client
- Message Broker: Kafka, Redis Streams, or NATS
- Chat Service: Handles routing, storage, and acknowledgments
- Presence Tracker: Redis or ZooKeeper
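The Presence Tracker component is commonly built on heartbeats with expiry. The sketch below is an in-memory stand-in for a store with key expiry such as Redis. The class name, the 30-second TTL, and the injectable clock are all assumptions made for illustration.

```javascript
// In-memory stand-in for a presence store with key expiry.
// A user counts as online if a heartbeat arrived within the last ttlMs.
class PresenceTracker {
  constructor(ttlMs = 30000, now = () => Date.now()) {
    this.ttlMs = ttlMs;        // how long one heartbeat keeps a user online
    this.now = now;            // injectable clock, useful for testing
    this.lastSeen = new Map(); // userId -> timestamp of the last heartbeat
  }

  // Called whenever the client pings over its open connection.
  heartbeat(userId) {
    this.lastSeen.set(userId, this.now());
  }

  // Called on a clean socket close so the user goes offline immediately.
  disconnect(userId) {
    this.lastSeen.delete(userId);
  }

  isOnline(userId) {
    const seen = this.lastSeen.get(userId);
    return seen !== undefined && this.now() - seen < this.ttlMs;
  }
}
```

The TTL covers the case where a connection drops without a close event: the user simply ages out once heartbeats stop.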
📄 Message Schema (JSON)
{
  "msg_id": "uuid123",
  "sender_id": "user1",
  "receiver_id": "user2",
  "type": "text",
  "content": "Hello!",
  "ts": 1718132025,
  "status": "sent"
}
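Incoming payloads are usually validated against this schema before they reach the broker. The following is a minimal sketch of such a check. The allowed `type` and `status` values, and the reading of `ts` as epoch seconds, are assumptions inferred from the sample above. `validateMessage` is a hypothetical helper name.

```javascript
// Assumed value sets; the schema above only shows "text" and "sent".
const ALLOWED_TYPES = new Set(['text', 'image', 'file']);
const ALLOWED_STATUSES = new Set(['sent', 'delivered', 'read']);

// Returns a list of validation errors; an empty list means the message is valid.
function validateMessage(msg) {
  if (msg === null || typeof msg !== 'object') {
    return ['message must be an object'];
  }
  const errors = [];
  for (const field of ['msg_id', 'sender_id', 'receiver_id', 'content']) {
    if (typeof msg[field] !== 'string' || msg[field].length === 0) {
      errors.push(`${field} must be a non-empty string`);
    }
  }
  if (!ALLOWED_TYPES.has(msg.type)) {
    errors.push('type is not recognized');
  }
  if (!Number.isInteger(msg.ts) || msg.ts <= 0) {
    errors.push('ts must be a positive integer (assumed epoch seconds)');
  }
  if (!ALLOWED_STATUSES.has(msg.status)) {
    errors.push('status is not recognized');
  }
  return errors;
}
```

Returning every error at once, rather than failing on the first, lets the gateway send one complete rejection back to the client.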
📡 WebSocket Server (Node.js Snippet)
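const { WebSocketServer } = require('ws'); // npm package "ws"
const wss = new WebSocketServer({ port: 8080 }); // example port

wss.on('connection', (socket) => {
  socket.on('message', (msg) => {
    let parsed;
    try { parsed = JSON.parse(msg.toString()); } // ws delivers a Buffer
    catch (err) { return; } // ignore malformed frames rather than crash
    routeMessage(parsed); // Publish to broker (routeMessage is defined elsewhere)
  });
});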
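The snippet above leaves `routeMessage` undefined. One common approach is to publish every message of a conversation to the same broker partition, so the broker preserves their relative order. The sketch below illustrates that idea under stated assumptions: `makeRouter`, `conversationKey`, and `partitionFor` are hypothetical names, `publish` is an injected callback standing in for a real broker client, and the hash is an FNV-1a-style function over UTF-16 code units, not any particular broker's partitioner.

```javascript
// Stable key for a 1:1 conversation, independent of who is sending.
// Assumes user ids do not contain ':'.
function conversationKey(senderId, receiverId) {
  return [senderId, receiverId].sort().join(':');
}

// FNV-1a-style 32-bit hash; any stable hash works for this illustration.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Same conversation -> same partition, so per-conversation order is preserved.
function partitionFor(msg, partitionCount) {
  const key = conversationKey(msg.sender_id, msg.receiver_id);
  return fnv1a(key) % partitionCount;
}

// Builds a routeMessage(msg) function bound to a publish callback.
// publish(partition, msg) is a placeholder for the real broker call.
function makeRouter(publish, partitionCount = 8) {
  return function routeMessage(msg) {
    const partition = partitionFor(msg, partitionCount);
    publish(partition, msg);
    return partition;
  };
}
```

Group chats would need a different key, for example a group id, which this sketch does not cover.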
🗂️ Persistence Layer (PostgreSQL)
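CREATE TABLE messages (
    msg_id      UUID PRIMARY KEY,
    sender_id   TEXT NOT NULL,
    receiver_id TEXT NOT NULL,
    type        TEXT NOT NULL DEFAULT 'text',  -- mirrors "type" in the message schema
    content     TEXT,
    ts          TIMESTAMPTZ NOT NULL DEFAULT now(),
    status      TEXT NOT NULL DEFAULT 'sent'
);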
🔔 Notification & Retry Logic
- Push notification if receiver is offline
- Retry delivery on reconnect
- Acknowledgment receipts from the receiver drive status transitions (for example, sent to delivered to read)
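The three behaviors listed above can be combined in one small component. This is an in-memory sketch under assumptions, not a production design: `DeliveryManager` is a hypothetical name, `sendPush` is a placeholder for a real push notification call, and the `sent` to `delivered` to `read` lifecycle is assumed from the `status` field in the schema. A message stays in the pending queue until it is acknowledged, so it is resent on every reconnect.

```javascript
// Assumed status lifecycle: sent -> delivered -> read.
const NEXT_STATUS = { sent: 'delivered', delivered: 'read' };

class DeliveryManager {
  constructor(sendPush) {
    this.sendPush = sendPush; // placeholder callback: (userId, msg) => void
    this.online = new Map();  // userId -> send function for the live socket
    this.pending = new Map(); // userId -> messages not yet acknowledged
  }

  // Queue the message, then deliver over the socket or fall back to push.
  deliver(msg) {
    const queue = this.pending.get(msg.receiver_id) || [];
    queue.push(msg);
    this.pending.set(msg.receiver_id, queue);
    const send = this.online.get(msg.receiver_id);
    if (send) send(msg);
    else this.sendPush(msg.receiver_id, msg);
  }

  // On (re)connect, resend everything that is still unacknowledged.
  connect(userId, send) {
    this.online.set(userId, send);
    for (const msg of this.pending.get(userId) || []) send(msg);
  }

  disconnect(userId) {
    this.online.delete(userId);
  }

  // An ack removes the message from the queue and advances its status.
  // Returns the updated message, or null for an unknown or repeated ack.
  ack(userId, msgId) {
    const queue = this.pending.get(userId) || [];
    const index = queue.findIndex((m) => m.msg_id === msgId);
    if (index === -1) return null;
    const [msg] = queue.splice(index, 1);
    msg.status = NEXT_STATUS[msg.status] || msg.status;
    return msg;
  }
}
```

Because unacknowledged messages can be sent more than once, the receiver is expected to deduplicate by `msg_id`.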
📊 Observability
- Message delivery latency (p50/p95)
- WebSocket open/close counts
- Errors per user or region
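For the latency metric, p50 and p95 can be computed from raw samples with the nearest-rank method. This is a small sketch assuming latencies are collected in milliseconds. `percentile` and `summarizeLatency` are hypothetical helper names, and a real deployment would more likely use histogram buckets in a metrics system than sort raw samples.

```javascript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) {
    throw new Error('no samples');
  }
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort, input untouched
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Summarize delivery latency samples (assumed to be in milliseconds).
function summarizeLatency(samplesMs) {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
  };
}
```

Tracking p95 alongside p50 matters because a healthy median can hide a slow tail that only some users or regions experience.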
🧰 Tools and Infra
- WebSocket: NGINX + uWSGI, Socket.IO, Phoenix Channels
- Storage: PostgreSQL, Cassandra, ScyllaDB
- Queue: Redis Streams, Kafka, RabbitMQ
📌 Final Insight
Real-time messaging requires coordinating persistent connections, scalable message routing, and eventual consistency. Choose the queue and database based on message volume and latency sensitivity.