Database Operations: Scenario-Based Questions
59. How do you perform schema migrations with zero downtime in a production database?
Making changes to a live production database can introduce risk β from application crashes to corrupt data. Zero-downtime migrations ensure users never see errors while schemas evolve safely behind the scenes.
π οΈ Zero-Downtime Migration Principles
- Backward Compatibility: Old and new code should work with both schema versions.
- Three-Step Process: Add (non-breaking), Migrate, Remove (post-deploy cleanup).
- Avoid Locks: Never block writes β avoid long-lived DDL on large tables.
π Common Patterns
- Additive Changes: Add new columns or tables with defaults β donβt remove yet.
- Dual Writes: Temporarily write to both old and new schemas.
- Read Switch: Update app to read from new schema only once populated.
- Cleanup Phase: Remove old columns, triggers, fallback logic.
βοΈ Tools & Automation
- Online Migration Tools: pt-online-schema-change, gh-ost (for MySQL).
- Frameworks: Liquibase, Flyway, Alembic, Rails ActiveRecord migrations.
- Release Coordination: Use feature flags or staggered deploys to roll out read/write changes.
β Best Practices
- Test migrations on staging with production-like data sizes.
- Monitor latency and locking metrics during rollout.
- Communicate with downstream teams and SREs before impactful changes.
π« Common Pitfalls
- Dropping or renaming columns in use by older code.
- Running large ALTER TABLE on peak traffic without throttling.
- Not validating new data flows before removing the old one.
π Final Insight
Schema evolution is part of software growth. Treat migrations like product releases β staged, tested, and monitored β to achieve agility without user disruption.