# The 7-Step Terraform Skill Workflow

The core of TerraShark is a 7-step operational workflow defined in `SKILL.md`. Unlike traditional reference manuals that dump information and hope the AI uses it correctly, this workflow forces the AI to **diagnose before generating** — the single most important pattern for preventing Terraform hallucinations.

## Overview

```
Capture Context → Diagnose Failure Modes → Load References → Propose Fix → Generate Artifacts → Validate → Output Contract
```

## Step 1: Capture Execution Context

Before writing any code, the Terraform skill records:

- **Runtime**: `terraform` or `tofu` and exact version
- **Providers**: which cloud providers and their versions
- **Target platform**: AWS, Azure, GCP, etc.
- **State backend**: S3, GCS, Azure Blob, HCP Terraform, etc.
- **Execution path**: local CLI, CI, HCP Terraform/TFE, Atlantis
- **Environment criticality**: dev, shared, or production

If any of these are unknown, the skill states assumptions explicitly in the output contract.

**Why this matters**: A module targeting Terraform 1.1 cannot use `import` blocks (requires 1.5+). A CI pipeline needs different patterns than local CLI. Production requires approval gates that dev does not.

## Step 2: Diagnose Likely Failure Modes

The skill selects one or more failure modes based on the user's intent and risk level:

| Failure Mode | When Diagnosed |
|---|---|
| **Identity churn** | Refactors, collection changes, count/for_each decisions |
| **Secret exposure** | Credential handling, state access, CI artifact management |
| **Blast radius** | Stack design, environment isolation, state boundaries |
| **CI drift** | Pipeline setup, version management, plan/apply separation |
| **Compliance gate gaps** | Policy setup, framework requirements, approval workflows |

This step is what makes TerraShark fundamentally different from static reference skills. The AI must identify **what could go wrong** before it starts generating code.

## Step 3: Load Only Relevant References

Based on the diagnosed failure modes, the skill loads targeted reference files:

**Primary references** (one per failure mode):
- `references/identity-churn.md`
- `references/secret-exposure.md`
- `references/blast-radius.md`
- `references/ci-drift.md`
- `references/compliance-gates.md`

**Supplemental references** (loaded only when needed):
- Testing, CI delivery, module architecture, coding standards, migration playbooks, security governance, quick ops, examples, and more

This granularity means a query about secret handling never loads CI delivery patterns, and a query about module architecture never loads compliance gates. Only 1-2 small, focused files are loaded per query instead of one massive dump.

## Step 4: Propose Fix Path with Explicit Risk Controls

For each proposed fix, the skill includes:

- **Why this addresses the failure mode** — direct mapping from diagnosis to solution
- **What could still go wrong** — honest risk assessment
- **Guardrails** — tests, approvals, and rollback steps to mitigate remaining risk

This forces transparency. The AI cannot silently generate code that might cause damage without disclosing the risks.

## Step 5: Generate Implementation Artifacts

When applicable, the output includes:

- **HCL changes**: typed variables, stable keys, bounded version constraints
- **Migration blocks**: `moved` blocks, `import` strategy
- **CI pipeline updates**: plan/apply separation, artifact management, policy checks
- **Compliance controls**: approval gates, policy rules, evidence paths

## Step 6: Validate Before Finalize

The skill provides a command sequence tailored to the runtime and risk tier:

```bash
terraform fmt -check
terraform validate
terraform plan -out=plan.bin
terraform show -json plan.bin > plan.json
```

The skill **never recommends direct production apply** without a reviewed plan and approval gate.

## Step 7: Output Contract

Every response includes a structured contract:

| Section | Content |
|---|---|
| **Assumptions and version floor** | What was assumed about the environment |
| **Selected failure modes** | Which risks were diagnosed |
| **Chosen remediation and tradeoffs** | What was recommended and what was traded off |
| **Validation/test plan** | How to verify the output |
| **Rollback/recovery notes** | How to undo if something goes wrong |

This makes every output **auditable**. A reviewer can check assumptions, verify failure mode coverage, and validate the rollback path — all before applying anything.

## Why This Architecture Works

The 7-step workflow prevents the most common failure pattern in LLM-assisted infrastructure-as-code: **producing syntactically valid but operationally dangerous code**.

A static reference manual tells the AI what good Terraform looks like. The 7-step workflow tells the AI **how to think about Terraform problems**. This is the difference between giving someone a cookbook and giving them a diagnostic checklist.