System Design FAQ: Top Questions

48. How would you design a Time-Series Data Store like InfluxDB or Prometheus?

A Time-Series Database (TSDB) stores sequences of data points indexed by timestamp. It is optimized for high-rate appends and for aggregations over time windows. Common uses include monitoring, application metrics, and IoT telemetry.

📋 Functional Requirements

Store timestamped data with tags/labels
Support downsampling and time-windowed queries
Ingest millions of data points per second
Compact old data to reduce space

📦 Non-Functional Requirements

High write throughput
Efficient range scans by time
Retention and compression of historical data

🏗️ Core Components

Ingest Service: Parses and validates time-series points
Storage Engine: Append-only store with time-based partitions
Compactor: Merges small blocks, deduplicates points, and downsamples older data
Query Engine: Executes PromQL/InfluxQL queries, including rollups over time windows
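The Storage Engine's time-based partitioning, together with TTL retention, can be illustrated with a toy in-memory store. This is a minimal sketch under stated assumptions, not how InfluxDB or Prometheus implement it: `PartitionedStore`, `PARTITION_SECONDS`, and the series-key strings are hypothetical, timestamps are integer epoch seconds, and there is no disk, write-ahead log, or compression.

```python
from bisect import bisect_left, insort
from collections import defaultdict

PARTITION_SECONDS = 3600  # hypothetical: one partition per hour of data


class PartitionedStore:
    """Toy store: points grouped into fixed time partitions (hypothetical)."""

    def __init__(self, partition_seconds=PARTITION_SECONDS):
        self.partition_seconds = partition_seconds
        # partition start (epoch s) -> series key -> [(ts, value)] sorted by ts
        self.partitions = defaultdict(lambda: defaultdict(list))

    def _partition_start(self, ts):
        return ts - ts % self.partition_seconds

    def append(self, series, ts, value):
        points = self.partitions[self._partition_start(ts)][series]
        if not points or points[-1][0] < ts:
            points.append((ts, value))  # common case: in-order append
        else:
            insort(points, (ts, value))  # late arrival: keep the list sorted

    def range_scan(self, series, start, end):
        """Return points with start <= ts < end, visiting only overlapping partitions."""
        out = []
        p = self._partition_start(start)
        while p < end:
            points = self.partitions.get(p, {}).get(series, [])
            i = bisect_left(points, (start,))  # first point with ts >= start
            while i < len(points) and points[i][0] < end:
                out.append(points[i])
                i += 1
            p += self.partition_seconds
        return out

    def drop_before(self, cutoff):
        """TTL retention: delete whole partitions that end at or before cutoff."""
        expired = [p for p in self.partitions if p + self.partition_seconds <= cutoff]
        for p in expired:
            del self.partitions[p]
        return len(expired)
```

The design point is that retention becomes a metadata operation: expiring old data means dropping whole partitions rather than deleting rows one by one, and a range scan never touches partitions outside the requested window.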

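🗄️ Schema Example

```sql
-- Series identity is (metric, labels), so labels belong in the key;
-- otherwise cpu_usage from web-12 and web-13 at the same ts would collide.
CREATE TABLE metrics (
  ts     TIMESTAMPTZ      NOT NULL,
  metric TEXT             NOT NULL,
  value  DOUBLE PRECISION NOT NULL,
  labels JSONB            NOT NULL DEFAULT '{}',
  PRIMARY KEY (metric, labels, ts)
) PARTITION BY RANGE (ts);

-- Example monthly partition; dropping it later is a cheap way to expire data.
CREATE TABLE metrics_2025_06 PARTITION OF metrics
  FOR VALUES FROM ('2025-06-01') TO ('2025-07-01');
```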

🔁 Write Example with Labels

```json
{
  "metric": "cpu_usage",
  "value": 93.2,
  "ts": "2025-06-11T18:00:00Z",
  "labels": {
    "host": "web-12",
    "env": "prod",
    "region": "us-west"
  }
}
```
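The Ingest Service's parse-and-validate step for a payload like the one above might look like the following. It is a minimal sketch: `parse_point` and its validation rules are hypothetical, and the series key format `metric{k=v,...}` is only loosely modelled on Prometheus label notation (without value quoting).

```python
import json
import math
from datetime import datetime


def parse_point(raw):
    """Validate one JSON point; return (series_key, epoch_seconds, value).

    Hypothetical rules: metric is a non-empty string, value a finite number,
    ts an ISO-8601 string with a UTC offset, labels a string->string map.
    """
    doc = json.loads(raw)
    metric = doc.get("metric")
    if not isinstance(metric, str) or not metric:
        raise ValueError("metric must be a non-empty string")
    value = doc.get("value")
    if isinstance(value, bool) or not isinstance(value, (int, float)) or not math.isfinite(value):
        raise ValueError("value must be a finite number")
    ts_raw = doc.get("ts")
    if not isinstance(ts_raw, str):
        raise ValueError("ts must be an ISO-8601 string")
    # Older Pythons' fromisoformat() rejects a trailing "Z", so normalise it.
    ts = datetime.fromisoformat(ts_raw.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        raise ValueError("ts must include a UTC offset")
    labels = doc.get("labels", {})
    if not isinstance(labels, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in labels.items()
    ):
        raise ValueError("labels must map strings to strings")
    # Sort labels so payload ordering never creates a duplicate series.
    label_str = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    return f"{metric}{{{label_str}}}", ts.timestamp(), float(value)
```

Sorting the labels before building the key matters because the same host may send labels in a different order on every request; without normalisation each ordering would become a separate series.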

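⚙️ Downsampling Aggregation Query

```sql
-- time_bucket() is a TimescaleDB function; in plain PostgreSQL,
-- date_trunc('minute', ts) produces the same 1-minute buckets.
SELECT time_bucket('1 minute', ts) AS bucket,
       avg(value) AS avg_cpu
FROM metrics
WHERE metric = 'cpu_usage'
  AND ts >= now() - interval '1 hour'
GROUP BY bucket
ORDER BY bucket;
```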
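The bucketing logic of the downsampling query above can be sketched outside the database, for example inside a compactor that writes rollups. A minimal sketch, assuming points arrive as `(epoch_seconds, value)` tuples; `downsample_avg` is a hypothetical name and buckets are aligned to the Unix epoch.

```python
from collections import defaultdict


def downsample_avg(points, bucket_seconds):
    """Average (ts, value) points into fixed, epoch-aligned buckets.

    Mirrors bucket + avg(value) + GROUP BY bucket + ORDER BY bucket.
    """
    acc = defaultdict(lambda: [0.0, 0])  # bucket start -> [sum, count]
    for ts, value in points:
        bucket = ts - ts % bucket_seconds
        acc[bucket][0] += value
        acc[bucket][1] += 1
    return [(b, total / n) for b, (total, n) in sorted(acc.items())]
```

When a compactor persists rollups, it usually stores sum and count (often min and max too) rather than only the average: averaging per-minute averages into an hourly value is wrong whenever minutes hold different numbers of points, while summing sums and counts stays exact.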

🗃 Compression Strategy

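Use Gorilla-style encoding (delta-of-delta for timestamps, XOR for float values), then compress blocks with a general-purpose codec such as Snappy or ZSTD
Buffer recent data in memory (memtable), backed by a write-ahead log for durability; flush to disk in immutable chunks
Use a columnar layout (InfluxDB's TSM files, Parquet) so queries read only the columns they need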
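The two Gorilla ideas can be shown without the bit-level packing. Timestamps arriving at a near-fixed interval turn into mostly-zero delta-of-deltas, and each float value is XORed with the previous one so only the differing "meaningful" bits need storing. This is a simplified sketch: the function names are hypothetical and the variable-length bit encoding from the Gorilla paper is omitted.

```python
import struct


def delta_of_deltas(timestamps):
    """Encode timestamps as [first ts, first delta, delta-of-delta, ...]."""
    if len(timestamps) < 2:
        return list(timestamps)
    out = [timestamps[0], timestamps[1] - timestamps[0]]
    for i in range(2, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        prev_delta = timestamps[i - 1] - timestamps[i - 2]
        out.append(delta - prev_delta)  # 0 when the interval is steady
    return out


def undo_delta_of_deltas(encoded):
    """Inverse of delta_of_deltas."""
    if len(encoded) < 2:
        return list(encoded)
    ts = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        ts.append(ts[-1] + delta)
    return ts


def float_bits(x):
    """Raw IEEE-754 bits of a float64 as an unsigned 64-bit int."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def meaningful_xor_bits(values):
    """For each value after the first, count the bits left after XOR with the
    previous value and trimming leading/trailing zeros (0 means 'repeat')."""
    counts = []
    for prev, cur in zip(values, values[1:]):
        x = float_bits(prev) ^ float_bits(cur)
        if x == 0:
            counts.append(0)
            continue
        leading = 64 - x.bit_length()
        trailing = (x & -x).bit_length() - 1
        counts.append(64 - leading - trailing)
    return counts
```

In Gorilla, a delta-of-delta of 0 and an XOR of 0 are each written as a single bit, which is why regular scrape intervals and slowly changing gauges compress so well before any block-level codec is applied.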

📈 Observability

Ingest rate per metric
Disk space by retention bucket
Query latency for different time ranges
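Ingest rate per metric is typically derived from monotonically increasing counters sampled at intervals, rather than from per-point timestamps. A minimal sketch under that assumption: `IngestStats` and `per_metric_rate` are hypothetical helpers, and unlike Prometheus's `rate()` they do not handle counter resets or extrapolation.

```python
from collections import Counter


class IngestStats:
    """Per-metric counters of accepted points (hypothetical helper)."""

    def __init__(self):
        self.points_total = Counter()  # metric -> monotonically increasing count

    def record(self, metric, n=1):
        self.points_total[metric] += n

    def snapshot(self):
        return dict(self.points_total)


def per_metric_rate(before, after, elapsed_seconds):
    """Points per second for each metric between two counter snapshots."""
    return {m: (after[m] - before.get(m, 0)) / elapsed_seconds for m in after}
```

Counters keep the cost constant per metric regardless of write volume, which matters at millions of points per second.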

🧰 Tools/Infra Used

Storage: TimescaleDB, InfluxDB, Apache Druid
Collection and long-term querying: Prometheus + Thanos/Cortex (durable storage and a global query view across Prometheus servers)
Compression: Gorilla, ZSTD, Snappy

📌 Final Insight

A robust TSDB should optimize for write performance, time-partitioned scans, and compression. Downsampling strategies and TTL-based retention ensure long-term scale and performance.
