Cloud Data Management: Scenario-Based Questions
57. How do you manage data lifecycle and retention policies in cloud-native systems?
Managing the lifecycle of cloud data supports compliance, cost control, and performance. Effective policies dictate how long data is kept, when it is archived, and when it is deleted, all while meeting legal and business needs.
📦 Lifecycle Stages
- Active: Frequently accessed data (e.g., operational DBs, logs for live dashboards).
- Warm: Occasionally accessed data that stays online on cheaper storage (e.g., S3 Standard-IA).
- Cold: Rarely accessed, long-term archive (e.g., S3 Glacier, Azure Blob Archive tier).
- Deleted: Permanently removed after TTL expiry or deletion request.
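The stages above can be sketched as a small classifier. This is a minimal illustration, not a prescribed method: the thresholds (`WARM_AFTER`, `COLD_AFTER`, `DELETE_AFTER`) and the function name `lifecycle_stage` are hypothetical, and real cutoffs depend on access patterns and retention obligations.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds, chosen only for illustration.
WARM_AFTER = timedelta(days=30)        # idle this long -> warm tier
COLD_AFTER = timedelta(days=180)       # idle this long -> cold archive
DELETE_AFTER = timedelta(days=365 * 7) # this old -> eligible for deletion


def lifecycle_stage(created_at, last_accessed_at, now=None):
    """Return 'active', 'warm', 'cold', or 'deleted' for one object."""
    now = now or datetime.now(timezone.utc)
    age = now - created_at          # drives deletion (TTL expiry)
    idle = now - last_accessed_at   # drives tiering
    if age >= DELETE_AFTER:
        return "deleted"
    if idle >= COLD_AFTER:
        return "cold"
    if idle >= WARM_AFTER:
        return "warm"
    return "active"
```

Deletion here is keyed on age while tiering is keyed on idle time, which mirrors the split between TTL expiry and access-based transitions.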
📃 Policy Components
- Retention Rules: How long data is stored (by type, app, or compliance class).
- Transition Rules: Move between tiers based on age or access.
- Deletion Schedules: Final removal after legal/compliance TTLs.
- Overrides & Locks: Legal holds, GDPR erasure requests, WORM (write once, read many) policies.
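One way to see how these components interact is a small decision function. The sketch below uses hypothetical names (`RetentionPolicy`, `Record`, `next_action`) and assumes that a legal hold takes precedence over an erasure request; that ordering is an assumption for illustration and should be confirmed with legal or data governance teams.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    data_class: str
    transition_after_days: int  # transition rule: move to a cheaper tier
    delete_after_days: int      # deletion schedule: final removal


@dataclass
class Record:
    created_on: date
    data_class: str
    legal_hold: bool = False         # override: blocks deletion
    erasure_requested: bool = False  # override: forces early deletion


def next_action(record, policy, today):
    """Return 'retain', 'transition', or 'delete' for one record."""
    # Assumption: a legal hold wins over every other rule.
    if record.legal_hold:
        return "retain"
    if record.erasure_requested:
        return "delete"
    age_days = (today - record.created_on).days
    if age_days >= policy.delete_after_days:
        return "delete"
    if age_days >= policy.transition_after_days:
        return "transition"
    return "retain"
```

Checking overrides before age-based rules keeps the exceptional cases from being silently overruled by a routine schedule.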
🧰 Cloud Tools
- AWS: S3 Lifecycle Policies, DynamoDB TTL, CloudWatch Logs retention settings.
- GCP: Cloud Storage Object Lifecycle Management, BigQuery table expiration.
- Azure: Blob Storage lifecycle management policies, retention policies for logs and backups.
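As a concrete example of one of these tools, the sketch below builds an S3 lifecycle configuration as a plain Python dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts. The helper name `build_s3_lifecycle_rule`, the `logs/` prefix, the bucket name, and the day counts are all assumptions made for illustration.

```python
def build_s3_lifecycle_rule(rule_id, prefix, ia_after_days,
                            glacier_after_days, expire_after_days):
    """Build one S3 lifecycle rule: Standard -> Standard-IA -> Glacier -> expire."""
    if not (ia_after_days < glacier_after_days < expire_after_days):
        raise ValueError("expected ia_after < glacier_after < expire_after")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical values: tier logs at 30 and 90 days, delete after one year.
lifecycle_configuration = {
    "Rules": [build_s3_lifecycle_rule("logs-tiering", "logs/", 30, 90, 365)]
}

# Applying it would look roughly like this (needs boto3 and AWS credentials):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket",  # hypothetical bucket name
#     LifecycleConfiguration=lifecycle_configuration,
# )
```

Keeping the rule as data makes it easy to review in version control and to reuse the same definition across buckets or regions.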
✅ Best Practices
- Classify data by access pattern and regulatory requirement.
- Automate tiering and deletion via lifecycle rules.
- Audit configurations regularly: TTLs, encryption, and access logs.
- Involve legal/data governance teams for retention SLAs.
🚫 Common Pitfalls
- No TTL on logs or staging datasets, which leads to spiraling costs.
- Deleting data prematurely and violating SLAs or compliance requirements.
- Inconsistent policies across teams or regions.
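The three pitfalls above lend themselves to an automated check. The following is a minimal sketch under stated assumptions: the inventory format (dictionaries with `name`, `data_class`, and `ttl_days`), the `minimums` mapping, and the function `audit_retention` are hypothetical and not tied to any cloud provider's API.

```python
from collections import defaultdict


def audit_retention(datasets, minimums=None):
    """Flag missing TTLs, TTLs below a required minimum, and inconsistent TTLs.

    datasets: list of dicts with 'name', 'data_class', 'ttl_days' (None = no TTL).
    minimums: optional dict mapping data_class -> minimum retention in days.
    """
    minimums = minimums or {}
    findings = []
    ttls_by_class = defaultdict(set)
    for ds in datasets:
        ttl = ds["ttl_days"]
        if ttl is None:
            findings.append(f"{ds['name']}: no TTL configured")
            continue
        ttls_by_class[ds["data_class"]].add(ttl)
        minimum = minimums.get(ds["data_class"])
        if minimum is not None and ttl < minimum:
            findings.append(
                f"{ds['name']}: TTL {ttl}d is below the {minimum}d minimum"
            )
    for data_class, ttls in sorted(ttls_by_class.items()):
        if len(ttls) > 1:
            findings.append(f"{data_class}: inconsistent TTLs {sorted(ttls)}")
    return findings


# Hypothetical inventory for illustration.
inventory = [
    {"name": "app-logs-us", "data_class": "logs", "ttl_days": 30},
    {"name": "app-logs-eu", "data_class": "logs", "ttl_days": 90},
    {"name": "staging-tmp", "data_class": "staging", "ttl_days": None},
    {"name": "invoices", "data_class": "financial", "ttl_days": 365},
]
for finding in audit_retention(inventory, minimums={"financial": 2555}):
    print(finding)
```

Running a check like this on a schedule turns the audit practice into something enforceable rather than a periodic manual review.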
📌 Final Insight
Data lifecycle isn’t just a storage problem; it’s a product, compliance, and operations challenge. Proactive policy design keeps systems lean, legally compliant, and performant.
