Infrastructure as Code: Scenario-Based Questions
10. How do you detect and manage infrastructure drift in cloud environments?
Infrastructure drift occurs when the actual cloud infrastructure deviates from its declared state in code. This leads to unpredictable behavior, security gaps, and failed deployments.
Drift Detection Techniques
- Terraform: Use `terraform plan` or `terraform plan -detailed-exitcode` to identify drift.
- Pulumi: Run `pulumi preview` to detect resource mismatches.
- AWS Config: Continuously monitors AWS resources and compares them to predefined rules or baselines.
- Third-party tools: driftctl for drift detection and Terrascan for static security analysis of IaC. Infracost, often grouped with them, estimates cost changes rather than detecting drift.
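The exit code of `terraform plan -detailed-exitcode` is what makes it usable in scripts: 0 means the plan succeeded with no changes, 1 means an error, and 2 means the plan succeeded with changes present. A minimal Python sketch of how a wrapper might interpret it follows. The names `PlanResult`, `classify_plan_exit`, and `check_drift` are illustrative, and `workdir` is assumed to be an already initialized Terraform directory.

```python
import subprocess
from enum import Enum


class PlanResult(Enum):
    NO_CHANGES = "no_changes"
    ERROR = "error"
    CHANGES_PRESENT = "changes_present"


def classify_plan_exit(code: int) -> PlanResult:
    """Map the exit code of `terraform plan -detailed-exitcode` to a result.

    Documented codes: 0 = succeeded with an empty diff, 1 = error,
    2 = succeeded with a non-empty diff.
    """
    if code == 0:
        return PlanResult.NO_CHANGES
    if code == 2:
        return PlanResult.CHANGES_PRESENT
    # Treat 1 and any unexpected code as an error.
    return PlanResult.ERROR


def check_drift(workdir: str) -> PlanResult:
    """Run the plan in `workdir` (hypothetical path) and classify the outcome."""
    proc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(proc.returncode)
```

A `CHANGES_PRESENT` result does not by itself prove drift: it also appears when the code has changed but has not been applied yet, so the check is most meaningful on a branch that matches what was last applied.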
Common Sources of Drift
- Manual changes via console (e.g., security group edits, scaling configs).
- CI/CD bypasses for urgent fixes or test experiments.
- Untracked resources provisioned outside IaC tooling.
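- Misconfigured `lifecycle` blocks in Terraform (e.g., an overly broad `ignore_changes`) or the equivalent resource options in Pulumi (e.g., `ignoreChanges`).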
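These sources produce different kinds of drift: a manual console edit changes an attribute of a managed resource, while an untracked resource has no counterpart in code at all. The sketch below separates the cases by comparing two plain dictionaries. It is a conceptual illustration only: the function name, the dictionary shape (resource ID mapped to attributes), and the sample resources are all assumptions, not the output format of any real tool.

```python
def classify_drift(declared: dict, actual: dict) -> dict:
    """Compare declared vs actual resources, both keyed by a resource ID.

    Each argument maps resource ID -> attribute dict (hypothetical shape).
    Only attributes that appear in the declaration are compared.
    """
    report = {"modified": {}, "unmanaged": [], "missing": []}
    for rid, want in declared.items():
        if rid not in actual:
            # Declared in code but absent from the cloud account.
            report["missing"].append(rid)
            continue
        have = actual[rid]
        changed = {
            key: {"declared": want[key], "actual": have.get(key)}
            for key in want
            if have.get(key) != want[key]
        }
        if changed:
            report["modified"][rid] = changed
    # Present in the cloud account but never declared in code.
    report["unmanaged"] = sorted(rid for rid in actual if rid not in declared)
    report["missing"].sort()
    return report


# Hypothetical sample data: a console edit opened port 22, and a bucket
# was created outside IaC tooling.
declared = {
    "sg-web": {"ingress_ports": [443]},
    "asg-api": {"desired_capacity": 2},
}
actual = {
    "sg-web": {"ingress_ports": [22, 443]},
    "asg-api": {"desired_capacity": 2},
    "bucket-tmp": {"acl": "private"},
}
report = classify_drift(declared, actual)
```

Here `report["modified"]` flags `sg-web` and `report["unmanaged"]` lists `bucket-tmp`, which mirrors the first and third bullets above.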
Managing and Remediating Drift
- Automation: Auto-run `plan` in CI pipelines and notify teams if changes are detected.
- Tagging: Use metadata to distinguish managed vs unmanaged resources.
- Audit Logs: Enable CloudTrail (AWS), Cloud Audit Logs (GCP), or the Azure Activity Log (via Azure Monitor) for change tracking.
- Guardrails: Prevent console access or enforce policies via SCPs, OPA, or Sentinel.
- Apply Fixes via Code: Always reconcile drift by updating infrastructure code, not by manual re-alignment.
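The automation step can be sketched as a small Python helper that turns a saved plan into a notification message. It assumes the plan was exported with `terraform show -json <planfile>`, whose output contains a `resource_changes` array where each entry carries an `address` and a `change.actions` list. The function names and the message wording are illustrative, and delivering the message (chat, email, ticket) is left out.

```python
import json
from typing import List, Optional


def summarize_plan(plan_json: str) -> List[str]:
    """Return one line per resource whose planned actions are not a no-op."""
    plan = json.loads(plan_json)
    lines = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions and actions != ["no-op"]:
            lines.append(f"{rc['address']}: {'/'.join(actions)}")
    return lines


def build_notification(plan_json: str) -> Optional[str]:
    """Build a message for the team, or None when nothing would change."""
    lines = summarize_plan(plan_json)
    if not lines:
        return None
    return "Drift or pending changes detected:\n" + "\n".join(lines)
```

Returning `None` for an empty plan lets the pipeline stay silent on clean runs, so a notification always signals something to reconcile in code.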
Tools for Enforcement
- `driftctl`: Detects unmanaged and drifted resources relative to Terraform state.
- `tflint`, `checkov`: Static checks to reduce drift-prone patterns.
- `terraform state show`: Inspect the recorded state of an individual resource. It reflects the last refresh, not necessarily the live resource.
Common Mistakes
- Ignoring drift warnings and applying Terraform blindly.
- Reverting changes in the console instead of reconciling them through IaC updates.
- Using `terraform taint` or `state rm` recklessly without backing up state.
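For the last mistake, one way to make the backup hard to skip is to wrap state-changing commands in a helper that saves a snapshot first. The sketch below uses `terraform state pull`, which prints the current state to standard output. The helper names, the backup file naming scheme, and the paths in the usage note are assumptions for illustration.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def write_state_backup(state_text: str, backup_dir: str) -> Path:
    """Write a state snapshot to a timestamped file and return its path."""
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = target_dir / f"terraform-{stamp}.tfstate.backup"
    path.write_text(state_text)
    return path


def backup_then_run(workdir: str, backup_dir: str, command: list) -> None:
    """Pull the current state, save it, then run a state-changing command."""
    pulled = subprocess.run(
        ["terraform", "state", "pull"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,  # abort before the risky command if the pull fails
    )
    write_state_backup(pulled.stdout, backup_dir)
    subprocess.run(command, cwd=workdir, check=True)
```

A call such as `backup_then_run("infra/prod", "state-backups", ["terraform", "state", "rm", "aws_instance.legacy"])` (hypothetical paths and resource address) would then leave a restorable snapshot behind. State files can contain secrets, so the backup directory needs the same protection as the state backend.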
Real-World Insight
Drift is inevitable in dynamic cloud environments. Organizations that invest in continuous detection and reconciliation workflows improve consistency, compliance, and operational safety at scale.