Backend State Safety

This guide covers backend-specific state safety for Terraform and OpenTofu. Use it when configuring a backend, migrating state, handling locks, or reviewing access to state storage.

When This Guide Applies

Load this guidance when the backend is s3, azurerm, gcs, remote, cloud, pg, consul, or local, or when work mentions backend migration, state storage, locking, force-unlock, state backup, or restore.

Why This Matters

Terraform/OpenTofu state is the source of truth for live resource identity and often contains sensitive values. Backend mistakes can leak secrets, orphan resources, disable locking, or make a routine refactor look like a destructive replacement.

Backend Baseline

  • Use remote state for every shared, CI, or production environment
  • Require locking on every apply path
  • Encrypt state at rest and in transit
  • Enable state versioning or point-in-time recovery where the backend supports it
  • Keep backend storage and lock primitives in a bootstrap root with a separate lifecycle
  • Never manage the backend bucket/container/table from the same root that uses it as its active backend
  • Keep backend credentials out of checked-in backend config; prefer workload identity or CI-provided partial backend config
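One way to follow the partial-backend-config rule, sketched for an S3 backend: the root declares only an empty `backend "s3" {}` block, and CI writes the non-secret settings at run time while credentials come from the runner's workload identity. The helper name and the `STATE_BUCKET` and `AWS_REGION` variables are hypothetical. `use_lockfile` assumes a runtime whose S3 backend supports native locking (Terraform and OpenTofu added it in their 1.10 releases); older versions need a DynamoDB lock table instead.

```sh
# Sketch: write a partial S3 backend config for one root/environment pair.
# STATE_BUCKET and AWS_REGION are hypothetical CI-provided variables.
# No credentials are written; workload identity supplies them at run time.
write_backend_config() {
  local root="${1:-}" env="${2:-}" out="${3:-}"
  if [ -z "$root" ] || [ -z "$env" ] || [ -z "$out" ]; then
    echo "usage: write_backend_config ROOT ENV OUTFILE" >&2
    return 2
  fi
  if [ -z "${STATE_BUCKET:-}" ] || [ -z "${AWS_REGION:-}" ]; then
    echo "STATE_BUCKET and AWS_REGION must be set by CI" >&2
    return 1
  fi
  # One state object per root and environment keeps the blast radius small.
  cat > "$out" <<EOF
bucket       = "${STATE_BUCKET}"
key          = "${root}/${env}/terraform.tfstate"
region       = "${AWS_REGION}"
encrypt      = true
use_lockfile = true
EOF
}
```

A CI step would then run something like `write_backend_config network prod backend.hcl` followed by `terraform init -backend-config=backend.hcl`, so the generated file never needs to be checked in.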

Backend-Specific Checks

| Backend | Required checks |
| --- | --- |
| s3 | Bucket versioning, encryption, public access block, narrow IAM, locking configured (use_lockfile or a DynamoDB lock table, depending on runtime version), state key split by environment/root |
| azurerm | Storage account encryption, blob soft delete/versioning where available, lease-based locking, private/network restrictions, narrow data-plane RBAC |
| gcs | Bucket versioning, uniform bucket-level access, encryption policy, narrow IAM, prefix split by environment/root |
| remote / cloud | Workspace boundary matches blast radius, state sharing is restricted, sensitive variables are marked, applies use approved execution mode |
| pg | TLS, database backups, least-privilege user, lock behavior verified, connection secrets kept out of code |
| consul | TLS, ACLs, snapshots/backups, highly available quorum, lock/session behavior verified |
| local | Solo prototype only; do not use for shared, CI, or production environments |
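For the s3 row, two of the checks can be scripted with the AWS CLI. This is a sketch, not a full audit: the function name is hypothetical, and it looks only at bucket versioning and the four bucket-level public access block settings. It does not evaluate account-level public access blocks, IAM policies, encryption, or locking.

```sh
# Sketch: check versioning and the bucket-level public access block on a
# state bucket. Returns 0 only if both checks pass.
check_s3_state_bucket() {
  local bucket="${1:-}" status pab flag count=0 pab_ok=1 failed=0
  if [ -z "$bucket" ]; then
    echo "usage: check_s3_state_bucket BUCKET" >&2
    return 2
  fi
  # A bucket that never had versioning enabled typically reports None.
  status="$(aws s3api get-bucket-versioning --bucket "$bucket" \
    --query Status --output text)" || return 1
  if [ "$status" != "Enabled" ]; then
    echo "FAIL: versioning on $bucket is '$status'" >&2
    failed=1
  fi
  # An API error (for example, no public access block configured) is a failure.
  pab="$(aws s3api get-public-access-block --bucket "$bucket" \
    --query 'PublicAccessBlockConfiguration.[BlockPublicAcls,IgnorePublicAcls,BlockPublicPolicy,RestrictPublicBuckets]' \
    --output text)" || return 1
  # Text output separates the four booleans with tabs; the unquoted
  # expansion splits on them.
  for flag in $pab; do
    count=$((count + 1))
    [ "$flag" = "True" ] || pab_ok=0
  done
  if [ "$pab_ok" -ne 1 ] || [ "$count" -ne 4 ]; then
    echo "FAIL: public access block on $bucket is not fully enabled: $pab" >&2
    failed=1
  fi
  return "$failed"
}
```

For example, `check_s3_state_bucket acme-tfstate` returns non-zero and prints the failing check if either condition is not met.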

Migration Guardrails

  • Do not combine backend migration with unrelated resource changes
  • Freeze applies for the affected state before migrating
  • Pull and securely store a state backup before init -migrate-state; do not commit it
  • Record current backend type, address/key, workspace, runtime version, and actor
  • Migrate the lowest-risk environment first
  • After migration, compare resource addresses before/after and run a no-op plan
  • Retain the old backend, access-controlled, until a restore has been tested or the rollback window has passed

Use init -migrate-state when moving state between backends. Use init -reconfigure only when you intentionally want to ignore the saved backend configuration and start from the configured backend without copying existing state.
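The pre-migration guardrails can be sketched as a single preflight function. Assumptions: `TF` names the active runtime (`terraform` or `tofu`), the operator supplies a backup directory outside the repository and a free-text backend description, and `CI_ACTOR` is a hypothetical variable naming whoever runs the migration. A pre-migration plan that reports changes (exit code 2) stops the run, so the migration is not mixed with pending resource changes; freezing applies still has to happen outside the script.

```sh
TF="${TF:-terraform}"   # active runtime: terraform or tofu

# Sketch: record context, back up state, and refuse to migrate if the
# current backend still has pending changes.
migration_preflight() {
  local backup_dir="${1:-}" backend_desc="${2:-}" rc=0
  if [ -z "$backup_dir" ] || [ -z "$backend_desc" ]; then
    echo "usage: migration_preflight BACKUP_DIR BACKEND_DESC" >&2
    return 2
  fi
  # Create the backup directory owner-only.
  ( umask 077 && mkdir -p "$backup_dir" ) || return 1
  # Record who migrated what, where, and with which runtime version.
  {
    echo "date:      $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "actor:     ${CI_ACTOR:-$(id -un)}"
    echo "backend:   $backend_desc"
    echo "workspace: $("$TF" workspace show)"
    echo "runtime:   $("$TF" version | head -n 1)"
  } > "$backup_dir/migration-record.txt" || return 1
  ( umask 077 && "$TF" state pull > "$backup_dir/state-backup.json" ) || return 1
  "$TF" state list > "$backup_dir/state-before.txt" || return 1
  # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
  # Plan output is discarded so it does not land in logs.
  "$TF" plan -detailed-exitcode -input=false > /dev/null || rc=$?
  case "$rc" in
    0) echo "preflight ok: no pending changes" ;;
    2) echo "refusing to migrate: plan shows pending changes" >&2; return 1 ;;
    *) echo "refusing to migrate: plan failed (exit $rc)" >&2; return 1 ;;
  esac
}
```

If it returns 0, the operator runs `"$TF" init -migrate-state` next and keeps the backup directory until the rollback window closes.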

Lock Handling

  • Treat a lock as a safety signal, not an inconvenience
  • Before force-unlock, verify the lock holder, CI run, process, and timestamp
  • Never recommend force-unlock while an apply may still be running
  • Serialize applies for shared foundation, backend, identity, and network roots
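A sketch of both lock rules, assuming `TF` names the active runtime. `apply_with_lock_wait` relies on the runtime's `-lock-timeout` option so a queued apply waits for the current holder instead of failing; the 15-minute default is an arbitrary assumption. `force_unlock_checked` is a hypothetical wrapper that cannot prove the holder is gone, but it makes the verification explicit: the operator names the lock ID and the holder's CI run, then retypes the run ID after confirming that run has ended, before `force-unlock -force` is called.

```sh
TF="${TF:-terraform}"   # active runtime: terraform or tofu

# Wait for the current lock holder instead of failing fast or unlocking.
apply_with_lock_wait() {
  "$TF" apply -lock=true -lock-timeout="${LOCK_TIMEOUT:-15m}" "$@"
}

# Hypothetical guard around force-unlock. The operator must name the run
# that held the lock and retype its ID after confirming it has ended.
force_unlock_checked() {
  local lock_id="${1:-}" holder_run="${2:-}" confirm
  if [ -z "$lock_id" ] || [ -z "$holder_run" ]; then
    echo "usage: force_unlock_checked LOCK_ID HOLDER_RUN_ID" >&2
    return 2
  fi
  echo "Lock $lock_id is held by run $holder_run." >&2
  echo "Check that run's status, process, and lock timestamp first." >&2
  printf 'Retype the run ID to confirm it is no longer running: ' >&2
  read -r confirm || return 1
  if [ "$confirm" != "$holder_run" ]; then
    echo "Confirmation did not match; lock left in place." >&2
    return 1
  fi
  "$TF" force-unlock -force "$lock_id"
}
```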

Access and Secret Handling

  • Treat state readers as secret readers
  • Avoid storing plan/state artifacts in public or broad-access CI logs
  • If a secret entered state, rotate the secret and use the secret remediation playbook; masking output is not enough
  • Keep backend read/write permissions separate when the platform supports it
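A sketch of keeping plan artifacts out of CI logs, assuming `TF` names the active runtime and the caller chooses a private directory. Saved plan files store values unredacted, and full plan text prints any secret that is not marked sensitive, so the function writes both owner-only and sends only the one-line summary to the log. The function name is hypothetical; uploading the directory as a restricted CI artifact is left to the pipeline.

```sh
TF="${TF:-terraform}"   # active runtime: terraform or tofu

# Sketch: run plan with artifacts kept in an owner-only directory and
# print only the "Plan:" or "No changes." summary line to the CI log.
plan_privately() {
  local dir="${1:-}" rc=0
  if [ -z "$dir" ]; then
    echo "usage: plan_privately PRIVATE_DIR" >&2
    return 2
  fi
  ( umask 077 && mkdir -p "$dir" ) || return 1
  chmod 700 "$dir" || return 1
  ( umask 077 && "$TF" plan -input=false -no-color -out="$dir/tfplan" \
      > "$dir/plan.txt" 2>&1 ) || rc=$?
  # Only the summary reaches the log; the full text stays in $dir.
  grep -E '^(Plan:|No changes\.)' "$dir/plan.txt" || true
  if [ "$rc" -ne 0 ]; then
    echo "plan failed (exit $rc); details in $dir/plan.txt" >&2
  fi
  return "$rc"
}
```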

LLM Mistake Checklist

  • Suggesting local backend for a team, CI, or production stack
  • Creating backend storage inside the same root that uses it
  • Omitting a lock strategy for a shared backend
  • Treating encryption at rest as protection against anyone who can already read state
  • Combining backend migration with broad resource refactors
  • Recommending force-unlock without proving no apply is active
  • Deleting old backend data immediately after migration
  • Hard-coding backend credentials in HCL or checked-in config

Validation Commands

Use the active runtime (terraform or tofu) consistently:

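```sh
# Use the same runtime (terraform or tofu) for every command.
terraform version
terraform workspace show

# mktemp -d creates an owner-only directory outside the repository.
backup_dir="$(mktemp -d)"
echo "state backup directory: $backup_dir"
terraform state pull > "$backup_dir/state-backup.json"
terraform state list > "$backup_dir/state-before.txt"

terraform init -migrate-state
terraform state list > "$backup_dir/state-after.txt"
diff -u "$backup_dir/state-before.txt" "$backup_dir/state-after.txt"

# -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
terraform plan -detailed-exitcode
```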

Store state-backup.json in a secure temporary location outside the repository and delete it only after rollback is no longer needed.
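The post-migration comparison and no-op plan can be wrapped so they fail loudly in CI. This sketch assumes `TF` names the active runtime and that a list of resource addresses was captured before migration; the function name is hypothetical. Sorting both lists makes the comparison independent of output order, and the plan step relies on `-detailed-exitcode` returning 0 for no changes, 2 for changes, and 1 for errors.

```sh
TF="${TF:-terraform}"   # active runtime: terraform or tofu

# Sketch: after init -migrate-state, confirm the same resource addresses
# exist and the plan is a no-op. Returns non-zero on any mismatch.
verify_migration() {
  local before="${1:-}" tmp rc=0 plan_rc=0
  if [ ! -f "$before" ]; then
    echo "usage: verify_migration STATE_BEFORE_FILE" >&2
    return 2
  fi
  tmp="$(mktemp -d)" || return 1
  sort "$before" > "$tmp/before"
  "$TF" state list > "$tmp/raw" || { echo "FAIL: state list failed" >&2; rc=1; }
  sort "$tmp/raw" > "$tmp/after"
  if [ "$rc" -eq 0 ] && ! diff -u "$tmp/before" "$tmp/after"; then
    echo "FAIL: resource addresses changed during migration" >&2
    rc=1
  fi
  if [ "$rc" -eq 0 ]; then
    "$TF" plan -detailed-exitcode -input=false > /dev/null || plan_rc=$?
    case "$plan_rc" in
      0) echo "OK: addresses match and plan is a no-op" ;;
      2) echo "FAIL: plan after migration shows changes" >&2; rc=1 ;;
      *) echo "FAIL: plan errored (exit $plan_rc)" >&2; rc=1 ;;
    esac
  fi
  rm -rf "$tmp"
  return "$rc"
}
```

For example, `verify_migration state-before.txt` run right after `init -migrate-state` prints any address differences and exits non-zero if the migration was not a clean move.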