autojanet/skills/terrashark/docs/architecture/backend-state-safety.md
Zoë cfec11bb46
2026-05-30 15:44:44 -07:00

# Backend State Safety
This guide covers backend-specific state safety for Terraform and OpenTofu. Use it when configuring a backend, migrating state, handling locks, or reviewing access to state storage.
## When This Guide Applies
Load this guidance when the backend is `s3`, `azurerm`, `gcs`, `remote`, `cloud`, `pg`, `consul`, or `local`, or when work mentions backend migration, state storage, locking, force-unlock, state backup, or restore.
## Why This Matters
Terraform/OpenTofu state is the source of truth for live resource identity and often contains sensitive values. Backend mistakes can leak secrets, orphan resources, disable locking, or make a routine refactor look like a destructive replacement.
## Backend Baseline
- Use remote state for every shared, CI, or production environment
- Require locking on every apply path
- Encrypt state at rest and in transit
- Enable state versioning or point-in-time recovery where the backend supports it
- Keep backend storage and lock primitives in a bootstrap root with a separate lifecycle
- Never manage the backend bucket/container/table from the same root that uses it as its active backend
- Keep backend credentials out of checked-in backend config; prefer workload identity or CI-provided partial backend config
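
The last point can be sketched as a partial backend configuration: the checked-in `backend "s3" {}` block carries no environment-specific values, and CI supplies them at `init` time. This is a minimal sketch, not a definitive setup. The helper `backend_init_args` and the variables `TF_STATE_BUCKET`, `TF_STATE_KEY`, and `TF_STATE_REGION` are hypothetical names, and credentials are assumed to come from workload identity rather than from these arguments.

```bash
# Build `init` arguments for a partial backend configuration.
# TF_STATE_BUCKET, TF_STATE_KEY and TF_STATE_REGION are hypothetical
# CI-provided variables; no credentials are passed here.
backend_init_args() {
  : "${TF_STATE_BUCKET:?set TF_STATE_BUCKET}"
  : "${TF_STATE_KEY:?set TF_STATE_KEY}"
  : "${TF_STATE_REGION:?set TF_STATE_REGION}"
  printf '%s\n' \
    "-input=false" \
    "-backend-config=bucket=${TF_STATE_BUCKET}" \
    "-backend-config=key=${TF_STATE_KEY}" \
    "-backend-config=region=${TF_STATE_REGION}"
}

# Usage (assumes values contain no whitespace):
#   terraform init $(backend_init_args)
```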
## Backend-Specific Checks
| Backend | Required Checks |
|---|---|
| `s3` | Bucket versioning, encryption, public access block, narrow IAM, locking configured (native S3 lockfile or DynamoDB table, depending on runtime version), state key split by environment/root |
| `azurerm` | Storage account encryption, blob soft delete/versioning where available, lease-based locking, private/network restrictions, narrow data-plane RBAC |
| `gcs` | Bucket versioning, uniform bucket-level access, encryption policy, narrow IAM, prefix split by environment/root |
| `remote` / `cloud` | Workspace boundary matches blast radius, state sharing is restricted, sensitive variables are marked, applies use approved execution mode |
| `pg` | TLS, database backups, least-privilege user, lock behavior verified, connection secrets kept out of code |
| `consul` | TLS, ACLs, snapshots/backups, highly available quorum, lock/session behavior verified |
| `local` | Solo prototype only; do not use for shared, CI, or production environments |
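
One row of the table can be checked from the command line. The sketch below covers only the bucket versioning check for an `s3` backend, assuming the AWS CLI is installed and authenticated. The function name `s3_versioning_enabled` and the bucket name in the usage line are hypothetical.

```bash
# Check bucket versioning for an `s3` backend.
# Returns 0 when versioning is enabled, 1 when it is not,
# and 2 when the AWS CLI call itself fails.
s3_versioning_enabled() {
  local bucket="$1" status
  status="$(aws s3api get-bucket-versioning \
    --bucket "$bucket" --query 'Status' --output text)" || return 2
  [ "$status" = "Enabled" ]
}

# Usage (hypothetical bucket name):
#   s3_versioning_enabled example-tf-state || echo "versioning is not enabled" >&2
```

The other checks in the table (encryption, public access block, IAM scope) need their own queries and are not covered here.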
## Migration Guardrails
- Do not combine backend migration with unrelated resource changes
- Freeze applies for the affected state before migrating
- Pull and securely store a state backup before `init -migrate-state`; do not commit it
- Record current backend type, address/key, workspace, runtime version, and actor
- Migrate the lowest-risk environment first
- After migration, compare resource addresses before/after and run a no-op plan
- Retain the old backend, with access controls in place, until restore has been tested or the rollback window has passed
Use `init -migrate-state` when moving state between backends. Use `init -reconfigure` only when intentionally accepting the configured backend without migrating existing state.
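
The backup and record-keeping steps above might be prepared as follows. This is a sketch under stated assumptions: `prepare_migration_record` and the `TF_BIN` variable are hypothetical names, and the backend type and address/key are left to be added by hand because they depend on the backend in use.

```bash
# Create a private directory outside the repository for the state backup
# and write a short migration record into it.
# TF_BIN is a hypothetical variable selecting `terraform` or `tofu`.
prepare_migration_record() {
  local tf="${TF_BIN:-terraform}" dir
  # mktemp -d creates a directory readable only by the current user.
  dir="$(mktemp -d)" || return 1
  {
    echo "actor=$(id -un)"
    echo "recorded_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "workspace=$("$tf" workspace show)"
    echo "runtime=$("$tf" version | head -n 1)"
  } > "$dir/migration-record.txt"
  echo "$dir"
}

# Usage:
#   backup_dir="$(prepare_migration_record)"
#   terraform state pull > "$backup_dir/state-backup.json"
```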
## Lock Handling
- Treat a lock as a safety signal, not an inconvenience
- Before `force-unlock`, verify the lock holder, CI run, process, and timestamp
- Never recommend `force-unlock` while an apply may still be running
- Serialize applies for shared foundation, backend, identity, and network roots
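
The checks above can be turned into an explicit gate. The sketch below only prints the `force-unlock` command, and only after the operator confirms both conditions. The function name and the `yes` confirmation arguments are a hypothetical convention; verifying the lock holder, CI run, process, and timestamp remains a manual step.

```bash
# Print a `force-unlock` command only when both checks are confirmed.
# Arguments: LOCK_ID HOLDER_VERIFIED NO_APPLY_RUNNING (the last two must be "yes").
force_unlock_command() {
  local lock_id="$1" holder_verified="$2" no_apply_running="$3"
  if [ -z "$lock_id" ]; then
    echo "lock ID required" >&2
    return 2
  fi
  if [ "$holder_verified" != "yes" ] || [ "$no_apply_running" != "yes" ]; then
    echo "refusing: verify the lock holder and confirm no apply is running" >&2
    return 1
  fi
  echo "${TF_BIN:-terraform} force-unlock $lock_id"
}

# Usage:
#   force_unlock_command "$LOCK_ID" yes yes
```

Printing the command rather than running it keeps a human in the loop for the final step.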
## Access and Secret Handling
- Treat state readers as secret readers
- Avoid storing plan/state artifacts in public or broad-access CI logs
- If a secret entered state, rotate the secret and use the secret remediation playbook; masking output is not enough
- Keep backend read/write permissions separate when the platform supports it
## LLM Mistake Checklist
- Suggesting `local` backend for a team, CI, or production stack
- Creating backend storage inside the same root that uses it
- Omitting a lock strategy for a shared backend
- Treating encryption at rest as protection against anyone who already has read access to state
- Combining backend migration with broad resource refactors
- Recommending `force-unlock` without proving no apply is active
- Deleting old backend data immediately after migration
- Hard-coding backend credentials in HCL or checked-in config
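
The last item lends itself to an automated check. The sketch below greps checked-in configuration for literal credential assignments. The function name is hypothetical, and both the key list and the file extensions are illustrative rather than exhaustive.

```bash
# Flag likely hard-coded backend credentials in checked-in files.
# Matches keys assigned a quoted literal; variable references are not flagged.
# Exit status follows grep: 0 when a match is found, 1 when none is found.
find_backend_secrets() {
  local dir="${1:-.}"
  grep -rnE \
    --include='*.tf' --include='*.tfbackend' --include='*.hcl' \
    '^[[:space:]]*(access_key|secret_key|sas_token|client_secret|password)[[:space:]]*=[[:space:]]*"' \
    "$dir"
}

# Usage:
#   if find_backend_secrets .; then echo "hard-coded credentials found" >&2; fi
```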
## Validation Commands
Use the active runtime (`terraform` or `tofu`) consistently:
```bash
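terraform version                           # record the runtime version
terraform workspace show                    # confirm the active workspace
terraform state pull > state-backup.json    # back up state; keep this file out of the repository
terraform state list > state-before.txt     # resource addresses before migration
terraform init -migrate-state               # copy state to the newly configured backend
terraform state list > state-after.txt      # resource addresses after migration
diff -u state-before.txt state-after.txt    # expect no differences
terraform plan -detailed-exitcode           # exit 0 = no changes, 1 = error, 2 = changes present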
```
Store `state-backup.json` in a secure temporary location outside the repository and delete it only after rollback is no longer needed.
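
The exit status of `plan -detailed-exitcode` is what makes the final no-op check scriptable. A minimal sketch of interpreting it follows; the function name `plan_verdict` and the message wording are hypothetical.

```bash
# Map the exit status of `plan -detailed-exitcode` to a verdict:
# 0 = no changes, 1 = error, 2 = changes present.
plan_verdict() {
  case "$1" in
    0) echo "no-op: state matches configuration" ;;
    2) echo "changes: plan is not a no-op; investigate before applying" ;;
    *) echo "error: plan failed" ;;
  esac
}

# Usage:
#   terraform plan -detailed-exitcode; plan_verdict "$?"
```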