# Backend State Safety

This guide covers backend-specific state safety for Terraform and OpenTofu. Use it when configuring a backend, migrating state, handling locks, or reviewing access to state storage.

## When This Guide Applies

Load this guidance when the backend is `s3`, `azurerm`, `gcs`, `remote`, `cloud`, `pg`, `consul`, or `local`, or when work mentions backend migration, state storage, locking, force-unlock, state backup, or restore.

## Why This Matters

Terraform/OpenTofu state is the source of truth for live resource identity and often contains sensitive values. Backend mistakes can leak secrets, orphan resources, disable locking, or make a routine refactor look like a destructive replacement.

## Backend Baseline

- Use remote state for every shared, CI, or production environment
- Require locking on every apply path
- Encrypt state at rest and in transit
- Enable state versioning or point-in-time recovery where the backend supports it
- Keep backend storage and lock primitives in a bootstrap root with a separate lifecycle
- Never manage the backend bucket/container/table from the same root that uses it as its active backend
- Keep backend credentials out of checked-in backend config; prefer workload identity or CI-provided partial backend config
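The last baseline item can be spot-checked mechanically. The following is a minimal sketch, assuming backend settings live in a plain HCL file; the function name `find_backend_credentials` and the list of argument names are illustrative and not exhaustive.

```bash
# Sketch: flag credential-like assignments in a checked-in backend config file.
# The pattern list is an illustrative assumption, not a complete rule set.

# Prints offending lines (with line numbers) and returns 0 when any are found;
# returns 1 when the file looks clean.
find_backend_credentials() {
  local file="$1"
  grep -nE '^[[:space:]]*(access_key|secret_key|sas_token|client_secret|password|token)[[:space:]]*=' "$file"
}

# Example: fail a CI step when a hit is reported.
# if find_backend_credentials backend.tf; then
#   echo "backend credentials must not be checked in" >&2
#   exit 1
# fi
```

A clean result here does not prove the config is safe; connection strings and environment files can still carry secrets, so treat this as one signal among several.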

## Backend-Specific Checks

| Backend | Required Checks |
|---|---|
| `s3` | Bucket versioning, encryption, public access block, narrow IAM, lock mechanism configured, state key split by environment/root |
| `azurerm` | Storage account encryption, blob soft delete/versioning where available, lease-based locking, private/network restrictions, narrow data-plane RBAC |
| `gcs` | Bucket versioning, uniform bucket-level access, encryption policy, narrow IAM, prefix split by environment/root |
| `remote` / `cloud` | Workspace boundary matches blast radius, state sharing is restricted, sensitive variables are marked, applies use approved execution mode |
| `pg` | TLS, database backups, least-privilege user, lock behavior verified, connection secrets kept out of code |
| `consul` | TLS, ACLs, snapshots/backups, highly available quorum, lock/session behavior verified |
| `local` | Solo prototype only; do not use for shared, CI, or production environments |
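As an illustration of how two of the `s3` checks might be verified, here is a hedged sketch using the AWS CLI. It assumes `aws` is installed and authenticated; the function names are hypothetical, and the `True`/`False` text rendering of booleans is how the CLI's `--output text` mode typically prints them.

```bash
# Sketch: spot-check two of the `s3` checks with the AWS CLI.
# Assumes `aws` is installed and authenticated; bucket names are placeholders.

# Succeeds only when bucket versioning reports "Enabled".
s3_versioning_enabled() {
  local bucket="$1" status
  status="$(aws s3api get-bucket-versioning --bucket "$bucket" \
    --query 'Status' --output text)" || return 2
  [ "$status" = "Enabled" ]
}

# Succeeds only when all four public access block settings are true.
s3_public_access_blocked() {
  local bucket="$1" flags
  flags="$(aws s3api get-public-access-block --bucket "$bucket" \
    --query 'PublicAccessBlockConfiguration.[BlockPublicAcls,IgnorePublicAcls,BlockPublicPolicy,RestrictPublicBuckets]' \
    --output text)" || return 2
  # Text output is expected to be tab-separated True/False values.
  case "$flags" in
    *False*|*None*|"") return 1 ;;
    *) return 0 ;;
  esac
}
```

Both functions return 2 when the CLI call itself fails, so a missing permission is not mistaken for a passing check. The remaining rows (IAM scope, locking, key layout) need review rather than a single command.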

## Migration Guardrails

- Do not combine backend migration with unrelated resource changes
- Freeze applies for the affected state before migrating
- Pull and securely store a state backup before `init -migrate-state`; do not commit it
- Record current backend type, address/key, workspace, runtime version, and actor
- Migrate the lowest-risk environment first
- After migration, compare resource addresses before/after and run a no-op plan
- Keep the old backend retained and access-controlled until restore has been tested or the rollback window has passed

Use `init -migrate-state` when moving state between backends. Use `init -reconfigure` only when intentionally accepting the configured backend without migrating existing state.
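The "record" guardrail above could be captured as a small text file before the migration starts. This is a sketch only: `TF_BIN`, `record_migration_context`, and the key=value layout are assumptions introduced for illustration, and the backend type and state address are passed in by the operator rather than detected.

```bash
# Sketch: capture the pre-migration record as a small text file.
# TF_BIN, the function name, and the record layout are illustrative assumptions.
TF_BIN="${TF_BIN:-terraform}"   # set TF_BIN=tofu for OpenTofu

# Usage: record_migration_context <backend-type> <state-address> <output-file>
record_migration_context() {
  local backend_type="$1" state_address="$2" out="$3"
  {
    echo "backend_type=${backend_type}"
    echo "state_address=${state_address}"
    echo "workspace=$("$TF_BIN" workspace show)"
    # The first line of `version` output names the runtime and its version.
    echo "runtime=$("$TF_BIN" version | head -n 1)"
    echo "actor=$(id -un)"
    echo "recorded_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  } > "$out"
}
```

For example, `record_migration_context s3 "example-bucket/prod/network.tfstate" /secure/tmp/migration-record.txt` writes the record next to the state backup, outside the repository.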

## Lock Handling

- Treat a lock as a safety signal, not an inconvenience
- Before `force-unlock`, verify the lock holder, CI run, process, and timestamp
- Never recommend `force-unlock` while an apply may still be running
- Serialize applies for shared foundation, backend, identity, and network roots
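To support the verification step before any `force-unlock`, the holder details can be pulled out of a captured lock error. The sketch below assumes the error text contains a "Lock Info" block with `ID:`, `Who:`, and `Created:` lines, which is how the runtime typically prints it; the exact format may vary by version and backend, and `lock_field` is a hypothetical helper.

```bash
# Sketch: extract one field from captured lock error text read on stdin.
# Assumes lines such as "Who:  user@host" inside a "Lock Info" block.

# Usage: lock_field <field-name> < captured-error.txt
lock_field() {
  local field="$1"
  # Match the line whose first word is "<field>:", drop that word, print the rest.
  awk -v f="${field}:" '$1 == f { $1 = ""; sub(/^[ \t]+/, ""); print; exit }'
}
```

Compare the `Who` and `Created` values against the CI run history and any still-running process before deciding anything. The helper only reads text; it never unlocks.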

## Access and Secret Handling

- Treat state readers as secret readers
- Avoid storing plan/state artifacts in public or broad-access CI logs
- If a secret entered state, rotate the secret and use the secret remediation playbook; masking output is not enough
- Keep backend read/write permissions separate when the platform supports it
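Because anyone who can read a state or plan artifact can read the secrets in it, one cheap guard is to check that no such artifact is tracked in the repository. This is a sketch under assumptions: the filename patterns and the function name `tracked_state_artifacts` are illustrative and should be adapted to the names a given pipeline produces.

```bash
# Sketch: list state or plan artifacts that git tracks in the current repository.
# The filename patterns are assumptions; adjust them to your pipeline's outputs.

# Prints tracked matches and returns 0 when any exist; returns 1 when none are tracked.
tracked_state_artifacts() {
  local hits
  hits="$(git ls-files -- '*.tfstate' '*.tfstate.*' '*.tfplan' '*state-backup*.json')" || return 2
  [ -n "$hits" ] && printf '%s\n' "$hits"
}
```

A hit means every repository reader is effectively a state reader, so the rotation guidance above applies, not just removal of the file.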

## LLM Mistake Checklist

- Suggesting `local` backend for a team, CI, or production stack
- Creating backend storage inside the same root that uses it
- Omitting a lock strategy for a shared backend
- Treating encryption as protection from anyone who can read state
- Combining backend migration with broad resource refactors
- Recommending `force-unlock` without proving no apply is active
- Deleting old backend data immediately after migration
- Hard-coding backend credentials in HCL or checked-in config

## Validation Commands

Use the active runtime (`terraform` or `tofu`) consistently:

```bash
# Record the runtime version and active workspace
terraform version
terraform workspace show

# Back up state and capture resource addresses before migrating
terraform state pull > state-backup.json
terraform state list > state-before.txt

terraform init -migrate-state

# Confirm addresses are unchanged and the plan is a no-op (exit code 0)
terraform state list > state-after.txt
diff -u state-before.txt state-after.txt
terraform plan -detailed-exitcode
```

Store `state-backup.json` in a secure temporary location outside the repository and delete it only after rollback is no longer needed.
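The post-migration checks can also be folded into a single pass/fail step. The following is a sketch, not a definitive procedure: `TF_BIN` and `verify_migration` are hypothetical names, and the exit-code mapping follows the documented behavior of `plan -detailed-exitcode` (0 for no changes, 1 for an error, 2 for pending changes).

```bash
# Sketch: turn the post-migration checks into one pass/fail function.
TF_BIN="${TF_BIN:-terraform}"   # assumption: set TF_BIN=tofu for OpenTofu

# Usage: verify_migration <state-before.txt> <state-after.txt>
verify_migration() {
  local before="$1" after="$2" rc=0
  # Resource addresses must be identical before and after the move.
  diff -u "$before" "$after" || { echo "resource addresses changed" >&2; return 1; }
  # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
  "$TF_BIN" plan -detailed-exitcode >/dev/null || rc=$?
  case "$rc" in
    0) echo "migration verified: no-op plan" ;;
    2) echo "plan proposes changes; investigate before resuming applies" >&2; return 2 ;;
    *) echo "plan failed" >&2; return 1 ;;
  esac
}
```

Running `verify_migration state-before.txt state-after.txt` after `init -migrate-state` keeps the apply freeze in place until both checks pass.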