Cloud Data Management: Scenario-Based Questions
57. How do you manage data lifecycle and retention policies in cloud-native systems?
Managing the lifecycle of cloud data ensures compliance, cost control, and performance. Effective policies dictate how long data is kept, how it's archived, and when it's deleted, all while meeting legal and business needs.
Lifecycle Stages
- Active: Frequently accessed data (e.g., operational DBs, logs for live dashboards).
- Warm: Occasionally accessed data, still online but on cheaper storage (e.g., S3 Standard-IA).
- Cold: Rarely accessed, long-term archive (e.g., S3 Glacier, Azure Blob Storage archive tier).
- Deleted: Permanently removed after TTL expiry or deletion request.
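The four stages above can be sketched as a small classifier. This is only an illustration: the function name `classify_tier` and the 30-day and 90-day thresholds are assumptions, not values from any provider, and a real system would usually let the storage service's lifecycle rules make this decision.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; tune them to your own access patterns.
WARM_AFTER_DAYS = 30
COLD_AFTER_DAYS = 90


def classify_tier(last_accessed, retention_days, created, now=None):
    """Return 'active', 'warm', 'cold', or 'deleted' for one object."""
    now = now or datetime.now(timezone.utc)
    # Retention (TTL) is measured from creation time.
    if now - created >= timedelta(days=retention_days):
        return "deleted"
    # Tiering is driven by how long the object has been idle.
    idle_days = (now - last_accessed).days
    if idle_days >= COLD_AFTER_DAYS:
        return "cold"
    if idle_days >= WARM_AFTER_DAYS:
        return "warm"
    return "active"
```

Checking the TTL first means an expired object is never re-tiered, which mirrors how expiration takes priority over transitions in the stage list.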
Policy Components
- Retention Rules: How long data is stored (by type, app, or compliance class).
- Transition Rules: Move between tiers based on age or access.
- Deletion Schedules: Final removal after legal/compliance TTLs.
- Overrides & Locks: Legal holds, GDPR erasure requests, WORM (write once, read many) policies.
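One way to see how these components interact is to model a policy and ask whether a given record may be deleted. The sketch below is a simplification under stated assumptions: `RetentionPolicy` and `deletion_decision` are hypothetical names, and the precedence shown (legal hold first, then the deletion schedule, then erasure requests) is one plausible ordering that your legal team would need to confirm.

```python
from dataclasses import dataclass


@dataclass
class RetentionPolicy:
    retention_days: int      # minimum time the record must be kept
    delete_after_days: int   # scheduled final removal
    worm: bool = False       # write once, read many: no early deletion

    def __post_init__(self):
        if self.delete_after_days < self.retention_days:
            raise ValueError("deletion scheduled before retention ends")


def deletion_decision(policy, age_days, legal_hold=False, erasure_requested=False):
    """Return 'delete', 'keep', or 'blocked' for a single record."""
    if legal_hold:
        return "blocked"  # assumed: a hold overrides every other rule
    if age_days >= policy.delete_after_days:
        return "delete"   # scheduled removal is due
    if erasure_requested:
        # Early deletion only if nothing obliges us to keep the record.
        if policy.worm or age_days < policy.retention_days:
            return "blocked"
        return "delete"
    return "keep"
```

Returning "blocked" rather than silently keeping the record makes conflicts between an erasure request and a retention obligation visible, so they can be escalated instead of lost.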
Cloud Tools
- AWS: S3 Lifecycle policies, DynamoDB TTL, CloudWatch Logs retention settings.
- GCP: Object Lifecycle Management, BigQuery table expiration.
- Azure: Blob Storage lifecycle management rules, retention policies for logs and backups.
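As a concrete illustration of one of these tools, here is a minimal sketch of an S3 lifecycle rule built in Python. The helper names `build_s3_lifecycle` and `apply_lifecycle`, the prefix, and the day counts are assumptions chosen for the example. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`, which requires boto3 and AWS credentials to actually run.

```python
def build_s3_lifecycle(prefix, warm_days=30, cold_days=90, expire_days=365):
    """Build a LifecycleConfiguration dict: warm tier, cold tier, then expiry."""
    if not warm_days < cold_days < expire_days:
        raise ValueError("expected warm_days < cold_days < expire_days")
    rule_suffix = prefix.strip("/").replace("/", "-") or "all"
    return {
        "Rules": [
            {
                "ID": f"tier-and-expire-{rule_suffix}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": warm_days, "StorageClass": "STANDARD_IA"},
                    {"Days": cold_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_days},
            }
        ]
    }


def apply_lifecycle(bucket, config):
    """Apply the configuration to a bucket (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder works without it

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

Note that `put_bucket_lifecycle_configuration` replaces the bucket's entire existing lifecycle configuration, so in practice you would merge new rules with the current ones before applying.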
Best Practices
- Classify data by access pattern and regulatory requirement.
- Automate tiering and deletion via lifecycle rules.
- Audit configurations regularly: TTL, encryption, access logs.
- Involve legal/data governance teams for retention SLAs.
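The audit step in particular lends itself to automation. The sketch below assumes you already have an inventory of resources and their settings as plain dictionaries. The setting keys (`ttl_days`, `encrypted`, `access_logging`) and the function name `audit_resources` are hypothetical, so in practice you would populate the inventory from your provider's APIs.

```python
# Hypothetical setting names; map them to whatever your inventory exports.
REQUIRED_SETTINGS = ("ttl_days", "encrypted", "access_logging")


def audit_resources(resources):
    """Return {resource_name: [settings that are missing or disabled]}."""
    findings = {}
    for name, settings in resources.items():
        # A setting counts as a finding if it is absent, 0, or False.
        missing = [key for key in REQUIRED_SETTINGS if not settings.get(key)]
        if missing:
            findings[name] = missing
    return findings
```

For example, an inventory entry such as `{"staging-dump": {"encrypted": True}}` would be flagged for lacking both a TTL and access logging.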
Common Pitfalls
- No TTL on logs or staging datasets, which leads to spiraling costs.
- Deleting data prematurely, violating SLAs or compliance obligations.
- Inconsistent policies across teams or regions.
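The last pitfall, inconsistent policies, can be detected with a simple comparison. This sketch assumes policies are available as `(data_class, region, retention_days)` tuples. That tuple shape and the name `find_inconsistent_policies` are illustrative assumptions, not part of any cloud API.

```python
from collections import defaultdict


def find_inconsistent_policies(policies):
    """Return data classes whose retention differs between regions.

    policies: iterable of (data_class, region, retention_days) tuples.
    """
    by_class = defaultdict(dict)
    for data_class, region, retention_days in policies:
        by_class[data_class][region] = retention_days
    # Keep only the classes that have more than one distinct retention value.
    return {
        data_class: regions
        for data_class, regions in by_class.items()
        if len(set(regions.values())) > 1
    }
```

Some regional differences are deliberate, for instance where local regulation requires a different retention period, so the output is best treated as a review list rather than a list of errors.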
Final Insight
Data lifecycle isn't just a storage problem; it's a product, compliance, and operations challenge. Proactive policy design keeps systems lean, legal, and performant.