System Design FAQ: Top Questions
29. How would you design a Document Versioning System?
A Document Versioning System maintains and tracks versions of documents over time. It allows users to create, retrieve, compare, and restore past versions.
Functional Requirements
- Save full or incremental versions of documents
- Support rollback to previous versions
- Allow diff viewing (e.g., changes between versions)
- Maintain metadata (author, timestamp, message)
Non-Functional Requirements
- Efficient storage for large documents
- Fast, low-latency retrieval of any version
- High durability and eventual consistency
Core Components
- Document Store: Stores content and metadata
- Version Manager: Manages diffs and history
- Diff Engine: Computes and applies deltas
- API Gateway: Provides RESTful endpoints
Document Schema (MongoDB)
{
  "_id": "doc_1234",
  "latest_version": 2,
  "title": "System Design Notes",
  "versions": [
    {
      "version": 1,
      "author": "alice",
      "created_at": "2024-01-01T10:00:00Z",
      "content": "Initial draft...",
      "comment": "created doc"
    },
    {
      "version": 2,
      "author": "bob",
      "diff": "+ Added intro\n- Removed placeholder",
      "created_at": "2024-01-02T11:00:00Z",
      "comment": "cleanup"
    }
  ]
}
Delta Storage Strategy
- Full snapshot every N versions (e.g., 5)
- Diffs for the versions between snapshots, stored with their metadata
- Reverse diffs optional for backward reconstruction
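The strategy above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production design: `VersionStore`, `make_delta`, `apply_delta`, and `SNAPSHOT_EVERY` are hypothetical names, N = 5 is taken from the example above, and the delta format (line ranges derived from `difflib.SequenceMatcher` opcodes) is one assumed choice among many.

```python
from difflib import SequenceMatcher

SNAPSHOT_EVERY = 5  # assumed value of N; tune per workload


def make_delta(old_lines, new_lines):
    """Return a compact forward delta: only the changed line ranges."""
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Replace old_lines[i1:i2] with these lines (empty for a delete).
            delta.append((i1, i2, new_lines[j1:j2]))
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the newer version from the older one plus a forward delta."""
    result, cursor = [], 0
    for i1, i2, replacement in delta:
        result.extend(old_lines[cursor:i1])  # copy unchanged lines
        result.extend(replacement)
        cursor = i2
    result.extend(old_lines[cursor:])
    return result


class VersionStore:
    """In-memory stand-in for the versions array of one document."""

    def __init__(self):
        self.versions = []  # index 0 holds version 1

    def commit(self, text):
        lines = text.splitlines()
        number = len(self.versions) + 1
        if (number - 1) % SNAPSHOT_EVERY == 0:
            # Versions 1, 6, 11, ... are stored as full snapshots.
            entry = {"version": number, "content": lines}
        else:
            previous = self.get_lines(number - 1)
            entry = {"version": number, "delta": make_delta(previous, lines)}
        self.versions.append(entry)
        return number

    def get_lines(self, number):
        # Walk back to the nearest snapshot, then replay deltas forward.
        start = number
        while "content" not in self.versions[start - 1]:
            start -= 1
        lines = list(self.versions[start - 1]["content"])
        for n in range(start + 1, number + 1):
            lines = apply_delta(lines, self.versions[n - 1]["delta"])
        return lines

    def get(self, number):
        return "\n".join(self.get_lines(number))
```

With this layout, reading any version replays at most `SNAPSHOT_EVERY - 1` deltas. A rollback can be modeled as `store.commit(store.get(k))`, which records the restored content as a new version and leaves history intact.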
Python Example: Creating Version Diffs
import difflib


def generate_diff(old_text, new_text):
    """Return a unified diff between two versions of a document."""
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(), lineterm=''
    )
    return "\n".join(diff)


old = "System Design is hard."
new = "System Design is hard but rewarding."
print(generate_diff(old, new))
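A unified diff is convenient for display, but `difflib` does not ship a function for applying one. If the stored delta also has to rebuild content, one stdlib-only option is `difflib.ndiff` together with `difflib.restore`. The sketch below uses made-up sample lines and assumes line-based text; note that an `ndiff` delta carries every line of both versions, so it trades storage for simplicity.

```python
import difflib

old_lines = ["System Design is hard.", "Start small."]
new_lines = ["System Design is hard but rewarding.", "Start small."]

# ndiff keeps every line from both sides, so the delta is self-contained.
delta = list(difflib.ndiff(old_lines, new_lines))

# restore(delta, 1) yields the old side, restore(delta, 2) the new side,
# so one stored delta supports both forward and backward reconstruction.
restored_old = list(difflib.restore(delta, 1))
restored_new = list(difflib.restore(delta, 2))
```

For compact deltas, a patch-capable library such as google-diff-match-patch (listed under the tools below) may be a better fit.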
REST API Endpoints
- POST /docs: create new doc
- POST /docs/:id/versions: commit a new version
- GET /docs/:id/versions/:v: retrieve specific version
- GET /docs/:id/diff/:v1/:v2: compare versions
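To make the endpoint shapes concrete, here is a framework-free sketch of how the four routes might dispatch. Everything in it is an assumption for illustration: the in-memory `DOCS` dict, the handler names, the `handle` function, the request body's `content` field, and the status codes. Versions are stored as full text to keep the routing logic in focus; a real service would put FastAPI or Express.js in front of the version store.

```python
import difflib
import re

# Hypothetical in-memory store: doc id -> list of version texts (index 0 is v1).
DOCS = {}


def create_doc(body):
    doc_id = f"doc_{len(DOCS) + 1}"
    DOCS[doc_id] = [body.get("content", "")]
    return 201, {"id": doc_id, "version": 1}


def commit_version(doc_id, body):
    if doc_id not in DOCS:
        return 404, {"error": "document not found"}
    DOCS[doc_id].append(body.get("content", ""))
    return 201, {"id": doc_id, "version": len(DOCS[doc_id])}


def get_version(doc_id, v):
    versions = DOCS.get(doc_id, [])
    v = int(v)
    if not 1 <= v <= len(versions):
        return 404, {"error": "version not found"}
    return 200, {"id": doc_id, "version": v, "content": versions[v - 1]}


def diff_versions(doc_id, v1, v2):
    versions = DOCS.get(doc_id, [])
    v1, v2 = int(v1), int(v2)
    if not (1 <= v1 <= len(versions) and 1 <= v2 <= len(versions)):
        return 404, {"error": "version not found"}
    lines = difflib.unified_diff(
        versions[v1 - 1].splitlines(), versions[v2 - 1].splitlines(), lineterm=""
    )
    return 200, {"diff": "\n".join(lines)}


ROUTES = [
    ("POST", re.compile(r"^/docs$"), create_doc),
    ("POST", re.compile(r"^/docs/(?P<doc_id>[^/]+)/versions$"), commit_version),
    ("GET", re.compile(r"^/docs/(?P<doc_id>[^/]+)/versions/(?P<v>\d+)$"), get_version),
    ("GET", re.compile(r"^/docs/(?P<doc_id>[^/]+)/diff/(?P<v1>\d+)/(?P<v2>\d+)$"), diff_versions),
]


def handle(method, path, body=None):
    """Dispatch a request to the first matching route; return (status, payload)."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(path)
        if route_method == method and match:
            kwargs = match.groupdict()
            if method == "POST":
                kwargs["body"] = body or {}
            return handler(**kwargs)
    return 404, {"error": "not found"}
```

For example, `handle("POST", "/docs", {"content": "draft"})` creates a document at version 1, and `handle("GET", "/docs/doc_1/diff/1/2")` returns a unified diff once a second version has been committed.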
Tools/Infra Used
- Database: MongoDB / DynamoDB
- Diff Engine: difflib / google-diff-match-patch
- API Layer: FastAPI / Express.js
Observability
- Version creation frequency
- Document rollback counts
- Diff size and compute time
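As a rough illustration of how these signals could be captured at the point where versions are written, the sketch below keeps counters and timing samples in process memory. The names `counters`, `diff_samples`, `timed_diff`, and `record_commit` are hypothetical; a real deployment would export the values to whatever metrics backend it already uses.

```python
import difflib
import time
from collections import Counter

counters = Counter()  # versions_created, rollbacks
diff_samples = []     # (diff size in bytes, compute time in seconds)


def timed_diff(old_text, new_text):
    """Compute a unified diff and record its size and compute time."""
    start = time.perf_counter()
    diff = "\n".join(
        difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(), lineterm=""
        )
    )
    elapsed = time.perf_counter() - start
    diff_samples.append((len(diff.encode("utf-8")), elapsed))
    return diff


def record_commit(is_rollback=False):
    """Count every new version; count rollbacks separately."""
    counters["versions_created"] += 1
    if is_rollback:
        counters["rollbacks"] += 1
```

Dividing `versions_created` by the length of the observation window gives the creation frequency listed above.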
Final Insight
Document versioning enables collaboration and traceability. Storing full snapshots periodically, with deltas for the versions in between, balances retrieval speed against storage efficiency. APIs should be intuitive for editor UIs and restore flows.
