# Terraform Quick Ops and Troubleshooting

Fast command recall and common failure handling for Terraform and OpenTofu operations.

## Core Command Sequence

### Terraform

```bash
terraform fmt -check
terraform init
terraform validate
terraform plan -out=plan.bin
terraform show -json plan.bin > plan.json
```

### OpenTofu

```bash
tofu fmt -check
tofu init
tofu validate
tofu plan -out=plan.bin
tofu show -json plan.bin > plan.json
```

## Common Failures and Fixes

### Passes Locally but Fails in CI Runner

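**Causes:**
- Terraform/OpenTofu or provider versions differ between the local machine and the CI runner
- `.terraform.lock.hcl` is not committed or is out of date
- Environment variables are set locally but missing in CI

**Fix:**
- Pin the runtime version and the provider versions
- Commit `.terraform.lock.hcl`
- Declare required environment variables explicitly in the pipeline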
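
Pinning could look like the sketch below. The version constraints and the choice of the AWS provider are illustrative assumptions, not values taken from this guide; substitute whatever your stack actually runs.

```hcl
terraform {
  # Assumed constraint: match the runtime version installed in the CI image.
  required_version = "~> 1.7.0"

  required_providers {
    # Assumed provider and constraint; the exact build that gets installed
    # is recorded in .terraform.lock.hcl, which should be committed.
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40"
    }
  }
}
```

Running `terraform init -lockfile=readonly` in CI is one way to make the run fail when the committed lockfile does not match, rather than letting the runner rewrite it silently.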

### Large Unexpected Replacements in Plan

**Causes:**
- Unstable iteration keys (for example, `count` indexes that shift when a list item is removed)
- Resource or module renamed without a `moved` block
- Data source drift feeding identity fields

**Fix:**
- Stabilize keys
- Add `moved` blocks
- Separate identity from mutable attributes
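
As a rough sketch of the first two fixes: `for_each` over a set of names gives each instance a key that survives reordering, and a `moved` block tells Terraform that a renamed resource is the same object. The variable name, the bucket naming scheme, and the old resource name `storage` are hypothetical.

```hcl
variable "bucket_names" {
  # Hypothetical input: each name is the stable key of one bucket.
  type = set(string)
}

resource "aws_s3_bucket" "data" {
  # Instances are addressed by name (aws_s3_bucket.data["logs"]), so
  # removing one name does not shift the addresses of the others.
  for_each = var.bucket_names
  bucket   = "example-${each.key}"
}

# Hypothetical rename: the resource was previously called "storage".
# Without this block the plan would destroy and recreate every bucket.
moved {
  from = aws_s3_bucket.storage
  to   = aws_s3_bucket.data
}
```

After the change has been applied everywhere the old address existed, the `moved` block can usually be removed.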

### AWS RDS Identifier Validation Errors

**Symptoms:** `InvalidParameterValue` for `identifier` or `final_snapshot_identifier` when the names contain dots or other disallowed characters.

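**Fix:** Normalize to lowercase letters, numbers, and hyphens. Terraform has no `regexreplace` function; `replace` treats the search string as a regular expression when it is wrapped in forward slashes:

```hcl
locals {
  # Collapse each run of disallowed characters into a single hyphen.
  # The result must still start with a letter and must not end with a hyphen.
  rds_base = replace(lower("${var.project}-prod"), "/[^a-z0-9]+/", "-")
}

resource "aws_db_instance" "main" {
  identifier                = local.rds_base
  final_snapshot_identifier = "${local.rds_base}-final"
  # ... other required arguments omitted
}
```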

### Apply Contention on Shared State

**Cause:** Concurrent pipelines targeting the same backend key.

**Fix:**
- Serialize applies for that stack
- Pass `-lock-timeout=<duration>` so a run waits for the lock instead of failing immediately, and add a per-stack concurrency guard in CI

### Tests Are Too Costly

**Fix:**
- Tag tests by risk (`fast`, `integration`, `destructive`)
- Run the full suite nightly and only the risk-appropriate tiers on PRs
- Auto-clean ephemeral infra with TTL tags
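
One possible way to apply the tier and TTL tags uniformly is the AWS provider's `default_tags` block, sketched below. The tag keys, the variables, and the region are assumptions; the cleanup job that reads `expires_at` and destroys expired resources is not shown and would have to be built separately.

```hcl
variable "test_tier" {
  # Hypothetical: one of "fast", "integration", "destructive".
  type = string
}

variable "expires_at" {
  # Hypothetical: RFC 3339 timestamp computed once by the CI job.
  # Passing it in, rather than calling timestamp() here, avoids a tag
  # diff on every plan.
  type = string
}

provider "aws" {
  region = "us-east-1" # assumed region

  # Applied to every taggable resource this provider creates.
  default_tags {
    tags = {
      test_tier  = var.test_tier
      expires_at = var.expires_at
    }
  }
}
```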

### State Lock Stuck

**Symptom:** `Error: Error acquiring the state lock`

**Fix:**

```bash
# Identify the lock holder from the error message (lock ID shown)
terraform force-unlock LOCK_ID
# OpenTofu equivalent:
tofu force-unlock LOCK_ID
```

Only force-unlock when you are certain no other apply is running. Check CI pipelines and team activity first.

### State Corruption or Lost State

**Fix:**
- Restore from a versioned state backend (S3 versioning, GCS versioning)
- If no backup exists, re-import resources using `import` blocks (Terraform 1.5+)
- Never edit state JSON by hand unless there is no alternative, and then only with peer review

```bash
# Pull current state for inspection
terraform state pull > state-backup.json
# List all tracked resources
terraform state list
```
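
If no backup exists, the re-import route might look like the sketch below. It assumes a matching `resource "aws_db_instance" "main"` block is already in the configuration; the instance identifier is hypothetical.

```hcl
# Adopt an existing RDS instance into state on the next apply.
import {
  to = aws_db_instance.main
  # Hypothetical value: for aws_db_instance the import ID is the
  # DB instance identifier.
  id = "myproject-prod"
}
```

The next plan should report the resource as an import rather than a create. If it also proposes changes, the configuration does not yet match the real resource.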

### Backend Migration

When changing state backends (e.g., local to S3, or S3 to a different bucket):

```bash
# Update backend config in code, then:
terraform init -migrate-state
```

- Always back up state before migration
- Verify that the resource count matches after migration
- Confirm that `terraform plan` reports no changes after migration
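
For a local-to-S3 move, the "update backend config in code" step could be as small as the block below. The bucket name, key, and region are hypothetical, and state locking settings are left out because they vary by Terraform version.

```hcl
terraform {
  backend "s3" {
    bucket  = "example-tfstate"                  # hypothetical bucket
    key     = "stacks/network/terraform.tfstate" # hypothetical state key
    region  = "us-east-1"                        # assumed region
    encrypt = true
  }
}
```

Enabling versioning on the bucket before migrating also provides the restore path described under State Corruption or Lost State.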

### Provider Authentication Failures in CI

**Symptom:** `Error: No valid credential sources found`

**Fix:**
- Verify that the credential environment variables are set in the CI runner
- Prefer workload identity federation over static keys
- Check credential expiry for short-lived tokens
- Ensure the CI runner's IAM role or service account has the required permissions
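
For AWS, workload identity federation can be wired into the provider roughly as follows. This is a sketch under assumptions: the role ARN and token file path are supplied by the pipeline through the hypothetical variables shown, and the role's trust policy already accepts the CI system's OIDC issuer.

```hcl
variable "ci_role_arn" {
  # Hypothetical: ARN of the role the pipeline is allowed to assume.
  type = string
}

variable "oidc_token_file" {
  # Hypothetical: path where the CI system writes its OIDC token.
  type = string
}

provider "aws" {
  region = "us-east-1" # assumed region

  # Exchange the CI-issued token for short-lived credentials,
  # so no static access keys are stored in the pipeline.
  assume_role_with_web_identity {
    role_arn                = var.ci_role_arn
    session_name            = "ci-terraform"
    web_identity_token_file = var.oidc_token_file
  }
}
```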

### `null_resource` vs `terraform_data`

Use the built-in `terraform_data` resource (Terraform 1.4+) instead of `null_resource`, which requires the separate `null` provider:

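```hcl
# Prefer this (no extra provider needed):
resource "terraform_data" "bootstrap" {
  # Re-run the provisioner whenever the config hash changes.
  triggers_replace = [var.config_hash]

  provisioner "local-exec" {
    # Assumes bootstrap.sh sits in the module directory and is executable.
    command = "${path.module}/bootstrap.sh"
  }
}
```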