System Design FAQ: Top Questions
19. How would you design a Schema Registry System?
A Schema Registry is a centralized repository for storing, versioning, and validating data schemas (e.g., Avro, Protobuf, JSON Schema). It keeps producers and consumers compatible in distributed data systems by checking every schema change before it is accepted.
📋 Functional Requirements
- Register new schemas under a subject (e.g., one per Kafka topic or stream)
- Enforce schema compatibility (backward, forward, full)
- Support schema versioning and deletion
- Expose APIs for producer/consumer validation
📦 Non-Functional Requirements
- Low-latency schema lookups
- ACID guarantees for concurrent updates, so that simultaneous registrations never produce duplicate or conflicting versions
- Access control on schema modifications
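The concurrency requirement above can be sketched as follows. This is a minimal in-memory illustration, not a production design: the class name VersionedSchemaStore and its methods are hypothetical, and a real registry would more likely rely on a database transaction or a unique constraint on (subject, version) than on a single process-wide lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the registry's durable store.
final class VersionedSchemaStore {
    private final Map<String, List<String>> versionsBySubject = new HashMap<>();

    /**
     * Registers a schema and returns its 1-based version number.
     * The method is synchronized, so two concurrent callers can never be
     * handed the same version. Re-registering identical text is idempotent.
     */
    synchronized int register(String subject, String schemaText) {
        List<String> versions =
            versionsBySubject.computeIfAbsent(subject, key -> new ArrayList<>());
        int existing = versions.indexOf(schemaText);
        if (existing >= 0) {
            return existing + 1; // already registered: return the existing version
        }
        versions.add(schemaText);
        return versions.size(); // the new schema becomes the latest version
    }

    /** Returns how many versions exist for a subject (0 if the subject is unknown). */
    synchronized int versionCount(String subject) {
        List<String> versions = versionsBySubject.get(subject);
        return versions == null ? 0 : versions.size();
    }
}
```

Making registration idempotent means a producer that retries after a timeout gets the same version back instead of creating a duplicate.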
🏗️ Core Architecture
- Storage: PostgreSQL, etcd, or ZooKeeper for schema metadata
- API Layer: REST interface for schema registration and retrieval
- Compatibility Engine: Validation logic for new schema registration
- Cache: Local in-process cache (e.g., in the JVM) or a shared cache such as Redis for fast lookups
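A read-through cache in front of the storage layer might look like the sketch below. It assumes each registered schema is addressed by a numeric ID and never changes once stored, which is what makes caching without invalidation safe; the class name CachedSchemaLookup and the loader function are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical read-through cache keyed by schema ID.
final class CachedSchemaLookup {
    private final Map<Integer, String> cache = new ConcurrentHashMap<>();
    private final Function<Integer, String> loadFromStore; // stand-in for a database read

    CachedSchemaLookup(Function<Integer, String> loadFromStore) {
        this.loadFromStore = loadFromStore;
    }

    /** Returns the schema text for an ID, or null if the store has no such schema. */
    String getById(int schemaId) {
        // computeIfAbsent calls the loader only on a cache miss. A null result
        // is not cached, so an unknown ID is looked up again on the next call.
        return cache.computeIfAbsent(schemaId, loadFromStore);
    }
}
```

Only the first lookup for a given ID reaches the store; later lookups are served from memory.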
📝 Sample Schema (Avro)
{
  "type": "record",
  "name": "UserEvent",
  "fields": [
    { "name": "user_id", "type": "string" },
    { "name": "action", "type": "string" },
    { "name": "timestamp", "type": "long" }
  ]
}
🔐 Register Schema API Example (cURL)
curl -X POST http://schema-registry.local/subjects/user-events/versions -H "Content-Type: application/vnd.schemaregistry.v1+json" -d '{"schema": "{\"type\": \"record\", \"name\": \"UserEvent\", \"fields\": [...]}"}'
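The schema travels as a JSON string inside the request body, so its inner quotes must be escaped. The following sketch builds the same request in Java using only the standard library (java.net.http, Java 11+). The class and method names are hypothetical, and the hand-written escaper covers only the common cases; a JSON library would normally do this job.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds (but does not send) a registration request.
final class RegisterRequestBuilder {
    /** Escapes a string so it can sit inside a JSON string literal. */
    static String escapeJson(String raw) {
        StringBuilder out = new StringBuilder();
        for (char c : raw.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c)); // other control chars
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    /** Wraps the schema text in the {"schema": "..."} envelope used by the cURL example. */
    static String payload(String schemaJson) {
        return "{\"schema\": \"" + escapeJson(schemaJson) + "\"}";
    }

    /** Builds the POST request for /subjects/{subject}/versions. */
    static HttpRequest request(String baseUrl, String subject, String schemaJson) {
        return HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + "/subjects/" + subject + "/versions"))
            .header("Content-Type", "application/vnd.schemaregistry.v1+json")
            .POST(HttpRequest.BodyPublishers.ofString(payload(schemaJson)))
            .build();
    }
}
```

The request can then be sent with java.net.http.HttpClient; sending is left out so the sketch stays independent of any running registry.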
🧠 Compatibility Modes
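- BACKWARD: Consumers using the new schema can read data written with the previous schema
- FORWARD: Consumers using the previous schema can read data written with the new schema
- FULL: Both BACKWARD and FORWARD hold, so old and new schemas can read each other's data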
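To make the three modes concrete, here is a deliberately simplified sketch of a compatibility check. It models a schema as a map from field name to "has a default value" and applies one Avro-style rule: a reader can decode a writer's data if every reader field either exists in the writer's schema or has a default. Type changes, aliases, and nested records are ignored, and all names are hypothetical.

```java
import java.util.Map;

// Simplified model: field name -> whether the field declares a default value.
final class CompatibilityCheck {
    enum Mode { BACKWARD, FORWARD, FULL }

    /** True if every reader field exists in the writer schema or carries a default. */
    static boolean canRead(Map<String, Boolean> reader, Map<String, Boolean> writer) {
        return reader.entrySet().stream()
            .allMatch(field -> writer.containsKey(field.getKey()) || field.getValue());
    }

    /** Checks a proposed schema against the previous one under the given mode. */
    static boolean isCompatible(Mode mode,
                                Map<String, Boolean> oldSchema,
                                Map<String, Boolean> newSchema) {
        switch (mode) {
            case BACKWARD: // new schema reads old data
                return canRead(newSchema, oldSchema);
            case FORWARD:  // old schema reads new data
                return canRead(oldSchema, newSchema);
            default:       // FULL: both directions
                return canRead(newSchema, oldSchema) && canRead(oldSchema, newSchema);
        }
    }
}
```

Under this model, adding a field with a default passes FULL, while adding a field without a default passes FORWARD but fails BACKWARD, because the new reader has nothing to fill in when it decodes old records.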
🛂 Role-Based Access Control
- Producers: can register/update schemas for allowed topics
- Consumers: can fetch but not modify schemas
- Admins: can delete or force-reset schemas
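The role rules above could be enforced with a small policy check like the one sketched here. The role and action names are illustrative, and two details are assumptions that go beyond the list: producers are also allowed to read schemas, and admins are allowed every action.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical policy table mapping roles to the actions they may perform.
final class SchemaAccessPolicy {
    enum Role { PRODUCER, CONSUMER, ADMIN }
    enum Action { READ, REGISTER, DELETE }

    private static final Map<Role, Set<Action>> ALLOWED = Map.of(
        Role.PRODUCER, EnumSet.of(Action.READ, Action.REGISTER), // READ is an assumption
        Role.CONSUMER, EnumSet.of(Action.READ),
        Role.ADMIN, EnumSet.allOf(Action.class));                // assumption: admins can do everything

    /**
     * Returns true if the role may perform the action on the subject.
     * Producers may only register schemas for subjects explicitly granted to them.
     */
    static boolean isAllowed(Role role, Action action, String subject, Set<String> grantedSubjects) {
        if (!ALLOWED.get(role).contains(action)) {
            return false;
        }
        if (role == Role.PRODUCER && action == Action.REGISTER) {
            return grantedSubjects.contains(subject);
        }
        return true;
    }
}
```

Keeping the rules in one table makes them easy to audit and to test independently of the API layer.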
⚙️ Kafka Compatibility Integration
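// Kafka producer config snippet (broker settings such as bootstrap.servers omitted)
import java.util.Properties;

Properties props = new Properties();
// Confluent's Avro serializer looks up (and by default registers) the schema before sending
props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("schema.registry.url", "http://schema-registry.local");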
📊 Observability
- Schema registration error rate
- Fetch latency (p95, p99)
- Incompatible schema rejection count
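As an illustration of the latency metric, the sketch below computes p95 and p99 from raw samples using the nearest-rank method. The class name is hypothetical, and a production service would more likely export histograms to a metrics system than sort raw samples in memory.

```java
import java.util.Arrays;

// Hypothetical helper for computing latency percentiles from raw samples.
final class LatencyPercentiles {
    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];  // rank is 1-based
    }
}
```

For example, with samples of 10, 20, 30, and 40 ms, the p50 value is 20 ms and the p95 value is 40 ms.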
📌 Final Insight
A Schema Registry enforces discipline in event-driven systems: it reduces serialization errors and keeps producers and consumers compatible as schemas evolve. It is foundational for stream processing and contract-driven development.
