System Design FAQ: Top Questions
13. How would you design a Real-Time Analytics System?
A Real-Time Analytics System collects, processes, aggregates, and displays metrics and logs within seconds of data generation, powering live dashboards, alerting, and fraud detection.
Functional Requirements
- Ingest event data from multiple sources (web, mobile, backend)
- Stream processing and aggregation
- Queryable analytics dashboard
Non-Functional Requirements
- Sub-second or near real-time latency
- Horizontal scalability and backpressure handling
- Durable, fault-tolerant data pipeline
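A bounded buffer between producers and the ingestion layer is one simple way to realize backpressure. The sketch below is illustrative plain Python (the `ingest` helper and buffer size are assumptions, not a real client API): once the buffer fills, new events are rejected instead of overwhelming the pipeline.

```python
import queue

# Bounded buffer: when full, put() times out and the event is rejected,
# forcing upstream producers to slow down, retry, or spill to disk.
buffer = queue.Queue(maxsize=3)

def ingest(event, timeout=0.01):
    """Return True if the event was accepted, False if the pipeline is saturated."""
    try:
        buffer.put(event, timeout=timeout)
        return True
    except queue.Full:
        return False

accepted = [ingest({"id": i}) for i in range(5)]
print(accepted)  # first 3 fit in the buffer, the rest are rejected
```

In production, the "reject" branch typically maps to client-side batching, retry with jitter, or sampling under load.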
Architecture Components
- Producers: Client SDKs and backend services
- Ingestion: Kafka, Amazon Kinesis, or Pub/Sub
- Stream Processing: Apache Flink, Spark Streaming, Kafka Streams
- Data Store: Druid, ClickHouse, or BigQuery for OLAP-style queries
- Dashboard: Superset, Grafana, or custom UI
Example Event Schema (JSON)
{
  "event_type": "page_view",
  "timestamp": "2025-06-11T12:00:00Z",
  "user_id": "u1234",
  "page": "/pricing",
  "device": "mobile",
  "country": "US"
}
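Consumers of this schema typically validate required fields before processing. A minimal Python sketch (field names come from the schema above; treating `event_type`, `timestamp`, and `user_id` as the required set is an assumption):

```python
import json
from datetime import datetime

REQUIRED_FIELDS = {"event_type", "timestamp", "user_id"}

def validate_event(raw: str) -> dict:
    """Parse a JSON event, check required fields, and verify the timestamp is ISO 8601."""
    event = json.loads(raw)
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Normalize the trailing 'Z' so fromisoformat accepts it on older Pythons
    datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    return event

event = validate_event(
    '{"event_type": "page_view", "timestamp": "2025-06-11T12:00:00Z", '
    '"user_id": "u1234", "page": "/pricing"}'
)
print(event["event_type"])  # page_view
```

Invalid events are usually routed to a dead-letter topic rather than dropped, so they can be inspected later.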
Kafka Topic Config (Example)
# Topic: user-events
cleanup.policy=delete       # time-based retention; 'compact' keeps only the latest record per key and would silently drop events
retention.ms=86400000       # 24 hours
compression.type=snappy
# Partition count (e.g. 12) is set at topic creation via kafka-topics.sh --partitions;
# num.partitions is a broker-level default for auto-created topics.
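Partition count matters because Kafka guarantees ordering only within a partition. The sketch below shows key-based partitioning in simplified form (the real Java client hashes keys with murmur2; CRC32 here is a stand-in): keying by `user_id` sends all of one user's events to the same partition, preserving their order across the 12 partitions above.

```python
import zlib

NUM_PARTITIONS = 12  # matches the topic's partition count above

def pick_partition(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hash the record key to a partition index (simplified stand-in for
    Kafka's default partitioner, which uses murmur2)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions

p1 = pick_partition("u1234")
p2 = pick_partition("u1234")
assert p1 == p2  # same key -> same partition, so per-user ordering holds
```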
Flink SQL Query Example
SELECT
  TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
  COUNT(*) AS page_views
FROM page_events
GROUP BY TUMBLE(event_time, INTERVAL '1' MINUTE);
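The tumbling-window semantics of that query can be sketched in plain Python to make them concrete: each event's timestamp is floored to the start of its fixed, non-overlapping window, then counted. The timestamps and 60-second window below are illustrative.

```python
from collections import Counter
from datetime import datetime

def tumble_counts(timestamps, window_seconds=60):
    """Floor each event into a fixed (tumbling) window and count per window,
    mirroring TUMBLE(event_time, INTERVAL '1' MINUTE) in plain Python."""
    counts = Counter()
    for ts in timestamps:
        epoch = int(datetime.fromisoformat(ts).timestamp())
        window_start = epoch - (epoch % window_seconds)  # floor to window boundary
        counts[window_start] += 1
    return dict(counts)

events = [
    "2025-06-11T12:00:05",
    "2025-06-11T12:00:40",
    "2025-06-11T12:01:10",
]
result = tumble_counts(events)
print(sorted(result.values()))  # two windows: one with 2 events, one with 1
```

Unlike this batch sketch, Flink emits each window's count incrementally as the event-time watermark passes the window boundary.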
Druid Ingestion Spec (Partial)
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "realtime_views",
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "MINUTE"
    },
    "timestampSpec": {
      "column": "timestamp",
      "format": "iso"
    }
  }
}
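The effect of "queryGranularity": "MINUTE" is that Druid floors event timestamps to the minute at ingestion, so rows within the same minute roll up into one aggregated row. A small Python sketch of that truncation (the helper name is mine, not a Druid API):

```python
from datetime import datetime

def truncate_to_minute(ts: str) -> str:
    """Floor an ISO 8601 timestamp to the start of its minute, mimicking
    Druid's queryGranularity: MINUTE rollup boundary."""
    t = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return t.replace(second=0, microsecond=0).isoformat()

print(truncate_to_minute("2025-06-11T12:00:37Z"))  # 2025-06-11T12:00:00+00:00
```

Coarser queryGranularity trades away timestamp precision for smaller segments and faster scans; segmentGranularity (HOUR here) only controls how segments are chunked on disk.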
Visualization (Grafana)
- Data source: ClickHouse or Druid plugin
- Panels: Total events, active users, latency percentiles
- Alerts: Page view drops or spike detection
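A drop-or-spike alert can be as simple as comparing the current window to a recent baseline. The rule below is a naive illustration, not how Grafana's alerting engine actually evaluates rules, and the ratio thresholds are assumptions:

```python
def detect_anomaly(history, current, drop_ratio=0.5, spike_ratio=2.0):
    """Flag the current window if it falls below drop_ratio or rises above
    spike_ratio times the average of recent windows."""
    baseline = sum(history) / len(history)
    if current < baseline * drop_ratio:
        return "drop"
    if current > baseline * spike_ratio:
        return "spike"
    return None

recent = [100, 110, 90]            # page views in the last three windows
print(detect_anomaly(recent, 40))  # drop
print(detect_anomaly(recent, 250)) # spike
print(detect_anomaly(recent, 105)) # None
```

Real deployments usually add seasonality handling (e.g. compare against the same hour last week) so nightly traffic dips don't page anyone.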
Observability
- Event ingestion lag (Kafka lag metrics)
- Streaming job failures or throughput drops
- Query response latency spikes
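Consumer lag, the first metric above, is the gap between each partition's log-end offset and the consumer group's committed offset; total lag is the sum across partitions. A minimal sketch (the offset numbers are made up):

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition Kafka consumer lag: log-end offset minus the committed
    offset (a partition with no commit yet counts from offset 0)."""
    return {
        p: log_end_offsets[p] - committed_offsets.get(p, 0)
        for p in log_end_offsets
    }

lag = consumer_lag({0: 1500, 1: 1200}, {0: 1450, 1: 1200})
print(lag)  # {0: 50, 1: 0}
```

A lag that grows monotonically means the streaming job cannot keep up with ingest and needs more parallelism or a faster sink.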
Final Insight
Real-time analytics systems require a balance between throughput and latency. Durable event ingestion, scalable stream processing, and columnar stores like ClickHouse or Druid enable powerful, low-latency insights.
