System Design FAQ: Top Questions
22. How would you design a Time-Series Database (TSDB)?
A Time-Series Database (TSDB) efficiently stores and queries timestamped data such as metrics, logs, or sensor readings. TSDBs optimize for high insert rates, compression, and time-range queries.
📋 Functional Requirements
- Insert large volumes of timestamped data
- Efficient range scans and aggregation (e.g., avg, sum)
- Retention policy for old data
- Downsampling and compaction support
📦 Non-Functional Requirements
- High write throughput (millions of points/sec)
- Low-latency reads over ranges
- Compression and deduplication
🏗️ Core Components
- Write API: Accepts time-series points via HTTP/UDP
- Ingestion Buffer: Appends each write to an on-disk write-ahead log (WAL) for durability and holds recent points in an in-memory memtable
- Compactor: Merges and compresses old blocks
- Storage Engine: Appends time-ordered blocks to persistent storage (e.g., columnar files such as Parquet, or an LSM-tree-style engine)
- Query Layer: Supports PromQL/SQL for querying and visualization
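The write path through these components can be sketched as follows. This is a minimal illustration, not how any particular TSDB implements it: the `IngestBuffer` class, the `flush_threshold` knob, and the JSON-lines WAL format are all assumptions made for the example, and a real engine would also fsync the WAL and write blocks to disk.

```python
import json
import os
from collections import defaultdict


class IngestBuffer:
    """Toy write path: append to an on-disk WAL, then update an in-memory memtable."""

    def __init__(self, wal_path, flush_threshold=4):
        self.wal_path = wal_path
        self.flush_threshold = flush_threshold  # hypothetical knob: points held before a flush
        self.memtable = defaultdict(list)       # series key -> [(timestamp, value), ...]
        self.size = 0
        self.blocks = []                        # stands in for immutable on-disk blocks

    def write(self, series, timestamp, value):
        # 1. Durability first: the point is logged before it is buffered.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps([series, timestamp, value]) + "\n")
        # 2. Then make it queryable from memory.
        self.memtable[series].append((timestamp, value))
        self.size += 1
        if self.size >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort each series by time and freeze the memtable into a block.
        block = {s: sorted(points) for s, points in self.memtable.items()}
        self.blocks.append(block)
        self.memtable.clear()
        self.size = 0
        # The flushed points now live in a block, so the WAL can be truncated.
        open(self.wal_path, "w").close()

    def replay(self):
        # After a crash, rebuild the memtable from whatever the WAL still holds.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as wal:
            for line in wal:
                series, timestamp, value = json.loads(line)
                self.memtable[series].append((timestamp, value))
                self.size += 1
```

Out-of-order points are sorted only at flush time, which keeps the hot write path to one append and one list insert. The compactor would later merge the small blocks produced here into larger ones.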
🔧 InfluxDB Line Protocol Example
# Format: measurement,tag_set field_set timestamp (nanoseconds by default)
cpu,host=web01 usage=0.87 1718070000000000000
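To show how the Write API might split such a line into its parts, here is a deliberately simplified parser. It assumes exactly three space-separated sections, numeric float fields only, and no escaped commas, spaces, or quoted strings, all of which the real protocol allows. The function name `parse_line` and the returned dictionary layout are made up for this example.

```python
def parse_line(line):
    """Parse one simplified line-protocol record into its four parts."""
    head, field_part, ts_part = line.strip().split(" ")
    # The first comma-separated token is the measurement; the rest are tags.
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_part.split(","):
        key, raw = pair.split("=", 1)
        fields[key] = float(raw)  # assumption: every field is a float
    return {
        "measurement": measurement,
        "tags": tags,
        "fields": fields,
        "timestamp_ns": int(ts_part),
    }


point = parse_line("cpu,host=web01 usage=0.87 1718070000000000000")
```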
🧱 Data Model Example (Prometheus)
metric: http_requests_total
labels: {method="GET", handler="/api"}
value: 1468
timestamp: 1718070000000
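In this model a series is identified by its metric name plus its full label set, so a storage engine needs a canonical key for that combination. One possible way to build it is sketched below; the `series_key` helper is hypothetical and only illustrates why labels are sorted before use.

```python
def series_key(metric, labels):
    """Build a canonical identifier for a series from its name and labels."""
    # Sort labels so the same label set always yields the same key,
    # regardless of the order in which the client sent them.
    parts = ",".join(f'{name}="{value}"' for name, value in sorted(labels.items()))
    return f"{metric}{{{parts}}}"


key = series_key("http_requests_total", {"method": "GET", "handler": "/api"})
```

Every distinct label combination creates a new series, which is why high-cardinality labels (user IDs, request IDs) are expensive in this data model.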
📉 Query Example (PromQL)
rate(http_requests_total{handler="/api"}[5m])
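The query above asks for the per-second increase of a counter over a 5-minute window. A rough sketch of that calculation follows. It handles counter resets but, unlike Prometheus's actual `rate()`, does not extrapolate to the window boundaries, so treat it as an approximation. The `simple_rate` name and the `(timestamp_seconds, value)` sample layout are assumptions for the example.

```python
def simple_rate(samples):
    """Per-second increase of a counter across the given samples.

    samples: list of (timestamp_seconds, value) tuples sorted by time,
    already restricted to the query window (e.g., the last 5 minutes).
    """
    if len(samples) < 2:
        return None  # a rate needs at least two points
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so count the new value in full.
        increase += (cur - prev) if cur >= prev else cur
    return increase / elapsed
```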
📁 Storage Strategy
- Partition data by metric → time window → compressed block (e.g., InfluxDB's TSM files or Gorilla-style chunks)
- Use delta-of-delta encoding for timestamps and XOR compression for float values
- Files stored in object storage or SSD
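The delta and XOR encodings listed above work because neighboring points in a series tend to be regular in time and similar in value. The sketch below shows only the two transforms; a real Gorilla-style encoder would then pack the results into variable-length bit fields, which is omitted here. The function names are illustrative, and `delta_of_delta` assumes at least two timestamps.

```python
import struct


def delta_of_delta(timestamps):
    """Return the first timestamp, the first delta, and the second-order deltas."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return timestamps[0], deltas[0], dods


def float_bits(value):
    # Reinterpret a float64 as a 64-bit unsigned integer.
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def xor_stream(values):
    """XOR each value's bit pattern with the previous value's bit pattern."""
    bits = [float_bits(v) for v in values]
    return [bits[0]] + [b ^ a for a, b in zip(bits, bits[1:])]


# Samples scraped every 10 s with one late arrival: almost all entries become 0.
first, first_delta, dods = delta_of_delta([1000, 1010, 1020, 1030, 1041])
xored = xor_stream([0.87, 0.87, 0.87])
```

With a steady scrape interval the second-order deltas are mostly zero, and repeated values XOR to zero, so both streams can be stored in very few bits per point.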
🗑️ Retention and Downsampling
- Keep high-resolution (raw) data for a short window, e.g., 7 days
- Keep downsampled data (e.g., hourly averages) for longer, e.g., 6 months
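A policy like this comes down to two small operations: bucket raw points into coarser averages, and drop points older than a cutoff. The sketch below assumes points are `(timestamp_seconds, value)` tuples; the function names and the `RAW_RETENTION` constant are illustrative, with the 7-day figure taken from the policy above.

```python
from collections import defaultdict

HOUR = 3600
RAW_RETENTION = 7 * 24 * HOUR  # 7 days, in seconds


def downsample_hourly(points):
    """Average (timestamp_seconds, value) points into one point per hour bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % HOUR].append(value)  # key is the start of the hour
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


def apply_retention(points, now, max_age):
    """Keep only points whose timestamp is within max_age seconds of now."""
    cutoff = now - max_age
    return [(ts, value) for ts, value in points if ts >= cutoff]


hourly = downsample_hourly([(0, 1.0), (1800, 3.0), (3600, 5.0)])
```

In practice both steps usually run as background jobs over whole blocks, so expiring old data can be as cheap as deleting a file.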
📈 Use Cases
- Infrastructure monitoring (e.g., Grafana+Prometheus)
- IoT sensor telemetry
- Financial tick data ingestion
📌 Final Insight
A TSDB balances write-heavy ingestion with query efficiency. Key optimizations include in-memory write buffers, aggressive compaction, and compressed block-oriented storage. Retention and downsampling policies determine how storage cost grows as the system scales.
