fix: use library/ Harbor project, add skills, fix pipeline secrets
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- .woodpecker.yaml: image paths -> library/autojanet-{agent,dispatcher}
- .woodpecker.yaml: secret names RS_HARBOR_USER / RS_HARBOR_PASS (global)
- container/Dockerfile: restore COPY skills/, skills/ populated from opencode config
- skills/: 84 opencode skills bundled into image
- k8s/manifests: update image refs to library/
This commit is contained in:
parent
a3f25456e4
commit
cc74ad0bd0
232 changed files with 34556 additions and 19 deletions
|
|
@ -1,8 +1,8 @@
|
|||
---
|
||||
# AutoJanet CI Pipeline
|
||||
# Builds and pushes two images to Harbor:
|
||||
-# - registry.ctz.fyi/autojanet/agent:latest (+ git SHA tag)
-# - registry.ctz.fyi/autojanet/dispatcher:latest (+ git SHA tag)
+# - registry.ctz.fyi/library/autojanet-agent:latest (+ git SHA tag)
+# - registry.ctz.fyi/library/autojanet-dispatcher:latest (+ git SHA tag)
|
||||
# Triggered on push to mainline or semver tags.
|
||||
|
||||
when:
|
||||
|
|
@ -17,17 +17,16 @@ steps:
|
|||
image: woodpeckerci/plugin-docker-buildx
|
||||
settings:
|
||||
registry: registry.ctz.fyi
|
||||
-repo: registry.ctz.fyi/autojanet/agent
+repo: registry.ctz.fyi/library/autojanet-agent
|
||||
dockerfile: container/Dockerfile
|
||||
context: .
|
||||
username:
|
||||
-from_secret: harbor_user
+from_secret: RS_HARBOR_USER
|
||||
password:
|
||||
-from_secret: harbor_password
+from_secret: RS_HARBOR_PASS
|
||||
tags:
|
||||
- latest
|
||||
- "${CI_COMMIT_SHA:0:12}"
|
||||
-cache_from: registry.ctz.fyi/autojanet/agent:latest
|
||||
platforms: linux/amd64
|
||||
when:
|
||||
- event: push
|
||||
|
|
@ -39,17 +38,16 @@ steps:
|
|||
image: woodpeckerci/plugin-docker-buildx
|
||||
settings:
|
||||
registry: registry.ctz.fyi
|
||||
-repo: registry.ctz.fyi/autojanet/dispatcher
+repo: registry.ctz.fyi/library/autojanet-dispatcher
|
||||
dockerfile: container/Dockerfile.dispatcher
|
||||
context: .
|
||||
username:
|
||||
-from_secret: harbor_user
+from_secret: RS_HARBOR_USER
|
||||
password:
|
||||
-from_secret: harbor_password
+from_secret: RS_HARBOR_PASS
|
||||
tags:
|
||||
- latest
|
||||
- "${CI_COMMIT_SHA:0:12}"
|
||||
-cache_from: registry.ctz.fyi/autojanet/dispatcher:latest
|
||||
platforms: linux/amd64
|
||||
when:
|
||||
- event: push
|
||||
|
|
@ -62,12 +60,12 @@ steps:
|
|||
commands:
|
||||
- trivy image --exit-code 1 --severity HIGH,CRITICAL
|
||||
--ignore-unfixed
|
||||
-registry.ctz.fyi/autojanet/agent:${CI_COMMIT_SHA:0:12}
+registry.ctz.fyi/library/autojanet-agent:${CI_COMMIT_SHA:0:12}
|
||||
environment:
|
||||
TRIVY_USERNAME:
|
||||
-from_secret: harbor_user
+from_secret: RS_HARBOR_USER
|
||||
TRIVY_PASSWORD:
|
||||
-from_secret: harbor_password
+from_secret: RS_HARBOR_PASS
|
||||
when:
|
||||
- event: push
|
||||
branch: mainline
|
||||
|
|
|
|||
|
|
@ -4,7 +4,7 @@
|
|||
# Role is determined at runtime via AGENT_ROLE env var.
|
||||
#
|
||||
# Build:
|
||||
-# docker build -t registry.ctz.fyi/autojanet/agent:latest .
+# docker build -t registry.ctz.fyi/library/autojanet-agent:latest .
|
||||
#
|
||||
# The image bundles:
|
||||
# - opencode CLI (Node.js)
|
||||
|
|
@ -64,7 +64,7 @@ COPY container/entrypoint.py /app/entrypoint.py
|
|||
# All agent definition files
|
||||
COPY agents/ /app/agents/
|
||||
|
||||
-# Skills (read-only reference)
+# Skills from ~/.config/opencode/skills — copied into repo at skills/
|
||||
COPY skills/ /app/skills/
|
||||
|
||||
USER agent
|
||||
|
|
|
|||
|
|
@ -42,7 +42,7 @@ VIKUNJA_TODO_BUCKET_ID = int(os.environ.get("VIKUNJA_TODO_BUCKET_ID", "116"))
|
|||
VIKUNJA_IN_PROGRESS_BUCKET_ID = int(os.environ.get("VIKUNJA_IN_PROGRESS_BUCKET_ID", "117"))
|
||||
|
||||
K8S_NAMESPACE = os.environ.get("K8S_NAMESPACE", "autojanet")
|
||||
-AGENT_IMAGE = os.environ.get("AGENT_IMAGE", "registry.ctz.fyi/autojanet/agent:latest")
+AGENT_IMAGE = os.environ.get("AGENT_IMAGE", "registry.ctz.fyi/library/autojanet-agent:latest")
|
||||
|
||||
VALID_ROLES = {
|
||||
"pm", "coder", "code-reviewer", "test-engineer", "devsecops", "secops",
|
||||
|
|
|
|||
|
|
@ -25,7 +25,7 @@ spec:
|
|||
restartPolicy: Never
|
||||
containers:
|
||||
- name: dispatcher
|
||||
-image: registry.ctz.fyi/autojanet/dispatcher:latest
+image: registry.ctz.fyi/library/autojanet-dispatcher:latest
|
||||
imagePullPolicy: Always
|
||||
env:
|
||||
- name: OPENBAO_ADDR
|
||||
|
|
@ -51,7 +51,7 @@ spec:
|
|||
- name: K8S_NAMESPACE
|
||||
value: "autojanet"
|
||||
- name: AGENT_IMAGE
|
||||
-value: "registry.ctz.fyi/autojanet/agent:latest"
+value: "registry.ctz.fyi/library/autojanet-agent:latest"
|
||||
resources:
|
||||
requests:
|
||||
cpu: "100m"
|
||||
|
|
|
|||
|
|
@ -32,7 +32,7 @@ spec:
|
|||
tolerations: []
|
||||
containers:
|
||||
- name: agent
|
||||
-image: registry.ctz.fyi/autojanet/agent:latest
+image: registry.ctz.fyi/library/autojanet-agent:latest
|
||||
imagePullPolicy: Always
|
||||
env:
|
||||
- name: AGENT_ROLE
|
||||
|
|
|
|||
185
skills/adding-keycloak-sso/SKILL.md
Normal file
|
|
@ -0,0 +1,185 @@
|
|||
---
|
||||
name: adding-keycloak-sso
|
||||
description: Use when adding Keycloak SSO authentication to a service on the homelab cluster at ctz.fyi, whether via oauth2-proxy sidecar or native OIDC configuration.
|
||||
---
|
||||
|
||||
# Adding Keycloak SSO
|
||||
|
||||
## Overview
|
||||
|
||||
Two patterns depending on whether the app supports OIDC natively. Both use Keycloak at `sso.ctz.fyi`, realm `ctz`, with secrets stored in OpenBao.
|
||||
|
||||
## Pattern Selection
|
||||
|
||||
| App type | Pattern |
|
||||
|----------|---------|
|
||||
| No auth or basic auth only | **A: oauth2-proxy sidecar** |
|
||||
| Native OIDC/OAuth2 support (Grafana, Jellyfin, Open WebUI) | **B: Native OIDC** |
|
||||
| SPA (React/Vue/etc) | **B: Public PKCE client** (`publicClient: true`, no secret) |
|
||||
|
||||
**Gotcha:** If an app already uses keycloak-js internally, do NOT also add oauth2-proxy — you'll get double-auth. Pick one.
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Create Keycloak Client
|
||||
|
||||
```bash
|
||||
# Port-forward Keycloak
|
||||
kubectl port-forward -n keycloak svc/keycloak 8080:80 &
|
||||
|
||||
# Get admin password from OpenBao
|
||||
bao kv get secret/production/keycloak/keycloak-admin
|
||||
|
||||
# Get admin token
|
||||
TOKEN=$(curl -s http://localhost:8080/realms/master/protocol/openid-connect/token \
|
||||
-d "client_id=admin-cli&grant_type=password&username=admin&password=<PASSWORD>" \
|
||||
| jq -r .access_token)
|
||||
|
||||
# Create client
|
||||
curl -s -X POST http://localhost:8080/admin/realms/ctz/clients \
|
||||
-H "Authorization: Bearer $TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"clientId": "<service-name>",
|
||||
"enabled": true,
|
||||
"protocol": "openid-connect",
|
||||
"publicClient": false,
|
||||
"standardFlowEnabled": true,
|
||||
"directAccessGrantsEnabled": false,
|
||||
"redirectUris": ["https://<hostname>/oauth2/callback", "https://<hostname>/*"],
|
||||
"webOrigins": ["https://<hostname>"],
|
||||
"baseUrl": "https://<hostname>"
|
||||
}'
|
||||
|
||||
# Get client UUID, then fetch secret
|
||||
CLIENT_ID=$(curl -s http://localhost:8080/admin/realms/ctz/clients \
|
||||
-H "Authorization: Bearer $TOKEN" | jq -r '.[] | select(.clientId=="<service-name>") | .id')
|
||||
|
||||
CLIENT_SECRET=$(curl -s http://localhost:8080/admin/realms/ctz/clients/$CLIENT_ID/client-secret \
|
||||
-H "Authorization: Bearer $TOKEN" | jq -r .value)
|
||||
|
||||
kill %1 # Kill port-forward
|
||||
```
|
||||
|
||||
**Redirect URI must include BOTH** `/oauth2/callback` AND `/*` wildcard — missing wildcard causes `redirect_uri_mismatch` for SPAs using keycloak-js.
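
The redirect URI rule above can be checked mechanically before a client is created. The following is a minimal sketch, not part of the skill itself: the function name `missing_redirect_uris` and the hostname `app.example.test` are hypothetical and used only for illustration.

```python
from typing import Iterable, List


def missing_redirect_uris(hostname: str, redirect_uris: Iterable[str]) -> List[str]:
    """Return the required redirect URIs that are absent from redirect_uris."""
    required = [
        f"https://{hostname}/oauth2/callback",  # callback path from Step 1
        f"https://{hostname}/*",                # wildcard needed by keycloak-js SPAs
    ]
    present = set(redirect_uris)
    return [uri for uri in required if uri not in present]


if __name__ == "__main__":
    # app.example.test is a placeholder hostname (assumption).
    configured = ["https://app.example.test/oauth2/callback"]
    print(missing_redirect_uris("app.example.test", configured))
```

An empty result means both entries are present; anything else lists what still has to be added to `redirectUris`.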
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Write Secrets to OpenBao
|
||||
|
||||
**Pattern A only — generate cookie secret first:**
|
||||
```bash
|
||||
COOKIE_SECRET=$(python3 -c "import os,base64; print(base64.urlsafe_b64encode(os.urandom(32)).decode())")
|
||||
bao kv put secret/production/<namespace>/<name>-oauth2proxy-secret \
|
||||
client-secret="$CLIENT_SECRET" \
|
||||
cookie-secret="$COOKIE_SECRET"
|
||||
```
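
For reference, the one-liner above can be expanded into a small script that also verifies the decoded length. This is a sketch under assumptions: the helper names `new_cookie_secret` and `decoded_length` are hypothetical, and the accepted sizes (16, 24, or 32 bytes, the AES key sizes) reflect how oauth2-proxy is understood to validate the cookie secret.

```python
import base64
import os

# AES key sizes in bytes; assumed to be the lengths oauth2-proxy accepts.
VALID_LENGTHS = {16, 24, 32}


def new_cookie_secret(num_bytes: int = 32) -> str:
    """Generate a URL-safe base64 cookie secret from num_bytes random bytes."""
    if num_bytes not in VALID_LENGTHS:
        raise ValueError(f"cookie secret must be one of {sorted(VALID_LENGTHS)} bytes")
    return base64.urlsafe_b64encode(os.urandom(num_bytes)).decode()


def decoded_length(secret: str) -> int:
    """Return the number of raw bytes encoded in a base64 cookie secret."""
    return len(base64.urlsafe_b64decode(secret))


if __name__ == "__main__":
    secret = new_cookie_secret()
    # Print only the length, never the secret itself.
    print(decoded_length(secret))
```

Checking the decoded length before writing to OpenBao catches a truncated or double-encoded value early.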
|
||||
|
||||
**Pattern B:** Store whatever the app needs (client secret, etc.) under an appropriate path.
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Pattern A — oauth2-proxy Sidecar
|
||||
|
||||
### ExternalSecret
|
||||
|
||||
```yaml
|
||||
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: <name>-oauth2proxy-secret
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: openbao
    kind: ClusterSecretStore
  target:
    name: <name>-oauth2proxy-secret
    creationPolicy: Owner
  data:
    - secretKey: client-secret
      remoteRef:
        key: secret/production/<namespace>/<name>-oauth2proxy-secret
        property: client-secret
    - secretKey: cookie-secret
      remoteRef:
        key: secret/production/<namespace>/<name>-oauth2proxy-secret
        property: cookie-secret
|
||||
```
|
||||
|
||||
### Deployment sidecar container
|
||||
|
||||
```yaml
|
||||
- name: oauth2-proxy
  image: quay.io/oauth2-proxy/oauth2-proxy:v7.7.1
  args:
    - --provider=oidc
    - --oidc-issuer-url=https://sso.ctz.fyi/realms/ctz
    - --client-id=<service-name>
    - --redirect-url=https://<hostname>/oauth2/callback
    - --email-domain=*
    - --upstream=http://localhost:<app-port>
    - --cookie-secure=true
    - --cookie-samesite=lax
    - --skip-provider-button=true
    - --pass-authorization-header=true
    - --pass-access-token=true
    - --set-xauthrequest=true
    - --http-address=0.0.0.0:4180
  env:
    - name: OAUTH2_PROXY_CLIENT_SECRET
      valueFrom:
        secretKeyRef:
          name: <name>-oauth2proxy-secret
          key: client-secret
    - name: OAUTH2_PROXY_COOKIE_SECRET
      valueFrom:
        secretKeyRef:
          name: <name>-oauth2proxy-secret
          key: cookie-secret
  ports:
    - containerPort: 4180
|
||||
```
|
||||
|
||||
### IngressRoute
|
||||
|
||||
Update the service port to `4180`. The app's own port no longer needs to be exposed externally.
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Pattern B — Native OIDC
|
||||
|
||||
Configure the app using:
|
||||
- **Issuer URL:** `https://sso.ctz.fyi/realms/ctz`
|
||||
- **Client ID:** `<service-name>`
|
||||
- **Client secret:** from OpenBao (via ExternalSecret or however the app ingests it)
|
||||
- **Callback/redirect URL:** whatever the app expects (configure in Keycloak `redirectUris`)
|
||||
|
||||
For SPAs: set `"publicClient": true` in client creation, omit secret entirely.
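
The difference between the confidential client from Step 1 and a public SPA client comes down to one field. The sketch below builds the same JSON body as the `curl` call in Step 1; the function name `build_client_payload` and the example values are hypothetical, and the field set simply mirrors the payload shown earlier.

```python
import json
from typing import Any, Dict


def build_client_payload(service_name: str, hostname: str, public: bool = False) -> Dict[str, Any]:
    """Build the client-creation body from Step 1.

    public=True corresponds to the SPA case: a public PKCE client with no secret.
    """
    base = f"https://{hostname}"
    return {
        "clientId": service_name,
        "enabled": True,
        "protocol": "openid-connect",
        "publicClient": public,
        "standardFlowEnabled": True,
        "directAccessGrantsEnabled": False,
        "redirectUris": [f"{base}/oauth2/callback", f"{base}/*"],
        "webOrigins": [base],
        "baseUrl": base,
    }


if __name__ == "__main__":
    # "my-spa" and "spa.example.test" are placeholder values (assumptions).
    print(json.dumps(build_client_payload("my-spa", "spa.example.test", public=True), indent=2))
```

For a public client there is no client secret to fetch afterwards, so the secret lookup in Step 1 and the OpenBao write in Step 2 can be skipped.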
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Deploy and Verify
|
||||
|
||||
```bash
|
||||
git add -A && git commit -m "feat(<service>): add Keycloak SSO"
|
||||
git push
|
||||
# Watch ArgoCD sync
|
||||
```
|
||||
|
||||
Test the login flow manually. Check that:
|
||||
- Unauthenticated requests redirect to Keycloak
|
||||
- Successful login lands back on the app
|
||||
- No double-auth prompts
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
| Mistake | Fix |
|
||||
|---------|-----|
|
||||
| Missing `/*` wildcard in redirectUris | Add `"https://<hostname>/*"` alongside the callback URI |
|
||||
| Cookie secret wrong length | Must decode to 16, 24, or 32 bytes → the `python3` command above generates 32 |
|
||||
| Double-auth on apps with built-in keycloak-js | Remove app's internal auth OR remove oauth2-proxy, not both |
|
||||
| IngressRoute still pointing at app port | Update to port `4180` for Pattern A |
|
||||
| `directAccessGrantsEnabled: true` | Set to `false` — resource owner password grant is not needed |
|
||||
25
skills/ansible-convert/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
128
skills/ansible-convert/SKILL.md
Normal file
|
|
@ -0,0 +1,128 @@
|
|||
---
|
||||
name: ansible-convert
|
||||
description: Use when converting shell scripts to Ansible playbooks. Use when migrating bash automation, manual procedures, or Dockerfiles to idempotent Ansible tasks.
|
||||
---
|
||||
|
||||
# Shell to Ansible Conversion
|
||||
|
||||
## Overview
|
||||
|
||||
Shell scripts execute commands imperatively; Ansible declares desired state. Conversion means rethinking operations as state declarations, not translating commands line-by-line. The goal is idempotency: running twice produces identical results.
|
||||
|
||||
## When to Use
|
||||
|
||||
- Converting existing shell scripts to playbooks
|
||||
- Migrating manual server setup procedures
|
||||
- Replacing bash automation with Ansible
|
||||
- Converting Dockerfile RUN commands
|
||||
|
||||
## Core Principle
|
||||
|
||||
**Don't wrap shell commands in Ansible's `shell` module.** Find the module that achieves the same end state declaratively.
|
||||
|
||||
```bash
|
||||
# Shell: imperative
|
||||
mkdir -p /opt/app
|
||||
chown app:app /opt/app
|
||||
```
|
||||
|
||||
```yaml
|
||||
# Ansible: declarative
|
||||
- ansible.builtin.file:
    path: /opt/app
    state: directory
    owner: app
    group: app
    mode: '0755'
|
||||
```
|
||||
|
||||
## Conversion Table
|
||||
|
||||
| Shell Command | Ansible Module | Notes |
|
||||
|---------------|----------------|-------|
|
||||
| `mkdir -p` | `ansible.builtin.file` | `state: directory` |
|
||||
| `cp` | `ansible.builtin.copy` | Static files |
|
||||
| `cp` with variables | `ansible.builtin.template` | Use `.j2` templates |
|
||||
| `rm -rf` | `ansible.builtin.file` | `state: absent` |
|
||||
| `ln -s` | `ansible.builtin.file` | `state: link` |
|
||||
| `chmod`, `chown` | Include in file/copy/template | `mode`, `owner`, `group` params |
|
||||
| `apt-get install` | `ansible.builtin.apt` | `update_cache: yes` |
|
||||
| `yum install` | `ansible.builtin.yum` | Or use `package` for cross-platform |
|
||||
| `pip install` | `ansible.builtin.pip` | Specify `executable` if needed |
|
||||
| `useradd` | `ansible.builtin.user` | Handles home, shell, groups |
|
||||
| `systemctl start` | `ansible.builtin.service` | `state: started` |
|
||||
| `systemctl enable` | `ansible.builtin.service` | `enabled: yes` |
|
||||
| `curl -O` | `ansible.builtin.get_url` | Use `checksum` for verification |
|
||||
| `tar -xzf` | `ansible.builtin.unarchive` | `remote_src: yes` if already on target |
|
||||
| `echo >> file` | `ansible.builtin.lineinfile` | Ensures line exists |
|
||||
| `cat > file` | `ansible.builtin.copy` | `content:` parameter |
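
As a starting point for a first pass over a script, the table can be turned into a simple lookup. This is a rough sketch only: the names `SHELL_TO_MODULE` and `suggest_module` are hypothetical, the mapping keys on the first word of a command, and redirection cases such as `echo >> file` are deliberately left out because they need real shell parsing.

```python
import shlex

# First word of a shell command -> module from the conversion table.
SHELL_TO_MODULE = {
    "mkdir": "ansible.builtin.file",
    "rm": "ansible.builtin.file",
    "ln": "ansible.builtin.file",
    "cp": "ansible.builtin.copy",
    "apt-get": "ansible.builtin.apt",
    "yum": "ansible.builtin.yum",
    "pip": "ansible.builtin.pip",
    "useradd": "ansible.builtin.user",
    "systemctl": "ansible.builtin.service",
    "curl": "ansible.builtin.get_url",
    "tar": "ansible.builtin.unarchive",
}

# Used only when nothing in the table matches.
FALLBACK = "ansible.builtin.command"


def suggest_module(command_line: str) -> str:
    """Suggest an Ansible module for a single shell command line."""
    words = shlex.split(command_line)
    if words and words[0] == "sudo":
        words = words[1:]  # privilege escalation is handled by become, not by the module
    if not words:
        return FALLBACK
    return SHELL_TO_MODULE.get(words[0], FALLBACK)


if __name__ == "__main__":
    for line in ["mkdir -p /opt/app", "sudo apt-get install nginx", "/opt/app/install.sh"]:
        print(line, "->", suggest_module(line))
```

The suggestion is only a hint: `cp` with variables still belongs in `ansible.builtin.template`, and the right parameters (`state`, `mode`, `owner`) have to be chosen by reading what the command was meant to achieve.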
|
||||
|
||||
## Control Flow Conversion
|
||||
|
||||
### Conditionals
|
||||
|
||||
```bash
|
||||
# Shell
|
||||
if [ -f /etc/debian_version ]; then
|
||||
apt-get install nginx
|
||||
fi
|
||||
```
|
||||
|
||||
```yaml
|
||||
# Ansible
|
||||
- ansible.builtin.apt:
    name: nginx
  when: ansible_os_family == "Debian"
|
||||
```
|
||||
|
||||
### Loops
|
||||
|
||||
```bash
|
||||
# Shell
|
||||
for user in alice bob; do
|
||||
useradd $user
|
||||
done
|
||||
```
|
||||
|
||||
```yaml
|
||||
# Ansible
|
||||
- ansible.builtin.user:
    name: "{{ item }}"
  loop:
    - alice
    - bob
|
||||
```
|
||||
|
||||
## When Shell Module is Necessary
|
||||
|
||||
Use `command` or `shell` only when no module exists. Always add proper change detection:
|
||||
|
||||
```yaml
|
||||
- name: Run custom installer
  ansible.builtin.shell: /opt/app/install.sh
  args:
    creates: /opt/app/.installed  # Skip if file exists
  register: install_result
  changed_when: "'Installed' in install_result.stdout"
  failed_when: install_result.rc != 0 and 'already installed' not in install_result.stderr
|
||||
```
|
||||
|
||||
## Variable Extraction
|
||||
|
||||
Identify values to parameterize:
|
||||
- Version numbers → `app_version: "1.2.3"`
|
||||
- Paths → `app_dir: "/opt/app"`
|
||||
- Usernames → `app_user: "appuser"`
|
||||
- Ports → `app_port: 8080`
|
||||
|
||||
Place in `defaults/main.yml` for easy override.
|
||||
|
||||
## Conversion Workflow
|
||||
|
||||
1. Read entire script, identify major phases
|
||||
2. Map each command to Ansible module
|
||||
3. Extract hardcoded values as variables
|
||||
4. Order tasks for dependencies (dirs before files)
|
||||
5. Add handlers for service restarts
|
||||
6. Test with `--check --diff`
|
||||
7. Verify idempotency: second run shows no changes
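
Step 7 can be automated by reading the `changed=` counters from the play recap of the second run. The sketch below assumes the recap lines look like `host : ok=3 changed=0 ...`, which is the usual shape of the default output; the function names `changed_counts` and `is_idempotent` are hypothetical.

```python
import re
from typing import Dict

# Matches recap lines such as "web1 : ok=3 changed=0 unreachable=0 failed=0".
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>.*\bchanged=\d+.*)$")
COUNTER = re.compile(r"(\w+)=(\d+)")


def changed_counts(output: str) -> Dict[str, int]:
    """Map each host in the play recap to its changed= counter."""
    counts: Dict[str, int] = {}
    for line in output.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match:
            counters = dict(COUNTER.findall(match.group("counters")))
            counts[match.group("host")] = int(counters["changed"])
    return counts


def is_idempotent(second_run_output: str) -> bool:
    """True when a recap was found and every host reports changed=0."""
    counts = changed_counts(second_run_output)
    return bool(counts) and all(value == 0 for value in counts.values())


if __name__ == "__main__":
    sample = (
        "PLAY RECAP *****\n"
        "web1 : ok=3 changed=0 unreachable=0 failed=0\n"
        "db1  : ok=3 changed=2 unreachable=0 failed=0\n"
    )
    print(changed_counts(sample), is_idempotent(sample))
```

A host that still reports changes on the second run usually points at a `command` or `shell` task without `creates` or `changed_when`.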
|
||||
25
skills/ansible-debug/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
137
skills/ansible-debug/SKILL.md
Normal file
|
|
@ -0,0 +1,137 @@
|
|||
---
|
||||
name: ansible-debug
|
||||
description: Use when playbooks fail with UNREACHABLE, permission denied, MODULE FAILURE, or undefined variable errors. Use when SSH connections fail or sudo password is missing.
|
||||
---
|
||||
|
||||
# Ansible Debugging
|
||||
|
||||
## Overview
|
||||
|
||||
Ansible errors fall into four categories: connection, authentication, module, and syntax. Systematic diagnosis starts with identifying the category, then isolating the specific cause.
|
||||
|
||||
## When to Use
|
||||
|
||||
- UNREACHABLE errors (SSH/network issues)
|
||||
- Permission denied or sudo password errors
|
||||
- MODULE FAILURE messages
|
||||
- Undefined variable errors
|
||||
- Template rendering failures
|
||||
- Slow playbook execution
|
||||
|
||||
## Error Categories
|
||||
|
||||
| Category | Symptoms | First Check |
|
||||
|----------|----------|-------------|
|
||||
| Connection | UNREACHABLE | `ssh -v user@host` |
|
||||
| Authentication | Permission denied, Missing sudo password | SSH keys, sudoers config |
|
||||
| Module | MODULE FAILURE | Module parameters, target state |
|
||||
| Syntax | YAML parse error | Line number in error, indentation |
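
The category table can be expressed as a first-match lookup over an error message. This is an illustrative sketch, not an exhaustive classifier: the marker strings are taken from the Symptoms column, the names `RULES` and `categorize` are hypothetical, and checking connection markers first is an assumption about which category should win when a message contains several.

```python
from typing import List, Tuple

# (category, lowercase markers); the first matching rule wins.
RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("connection", ("unreachable",)),
    ("authentication", ("permission denied", "missing sudo password")),
    ("module", ("module failure",)),
    ("syntax", ("yaml", "syntax error")),
]


def categorize(message: str) -> str:
    """Return the error category for an Ansible error message, or 'unknown'."""
    text = message.lower()
    for category, markers in RULES:
        if any(marker in text for marker in markers):
            return category
    return "unknown"


if __name__ == "__main__":
    print(categorize("fatal: [web1]: UNREACHABLE! => Failed to connect to the host via ssh"))
```

Once the category is known, the matching "First Check" column gives the next command to run.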
|
||||
|
||||
## Quick Diagnosis
|
||||
|
||||
### Connection Errors
|
||||
|
||||
```bash
|
||||
# Test SSH directly
|
||||
ssh -v -i /path/to/key user@hostname
|
||||
|
||||
# Test port connectivity
|
||||
nc -zv hostname 22
|
||||
|
||||
# Verify inventory parsing
|
||||
ansible-inventory --host hostname
|
||||
```
|
||||
|
||||
**Common causes:**
|
||||
- Wrong IP/hostname in inventory
|
||||
- Firewall blocking port 22
|
||||
- SSH key permissions (must be 600)
|
||||
|
||||
### Authentication Errors
|
||||
|
||||
```bash
|
||||
# Test with explicit options
|
||||
ansible hostname -m ping -u user --private-key /path/to/key
|
||||
|
||||
# For sudo password issues, either:
|
||||
ansible-playbook playbook.yml --ask-become-pass
|
||||
# Or configure NOPASSWD in /etc/sudoers
|
||||
```
|
||||
|
||||
### Module Errors
|
||||
|
||||
```bash
|
||||
# Check module documentation
|
||||
ansible-doc ansible.builtin.copy
|
||||
|
||||
# Verify module parameters match your Ansible version
|
||||
ansible --version
|
||||
```
|
||||
|
||||
### Variable Errors
|
||||
|
||||
```yaml
|
||||
# Use the default filter for optional variables
- ansible.builtin.debug:
    msg: "{{ my_var | default('fallback') }}"

# Debug variable values
- ansible.builtin.debug:
    var: problematic_variable
|
||||
```
|
||||
|
||||
## Verbosity Levels
|
||||
|
||||
| Flag | Shows |
|
||||
|------|-------|
|
||||
| `-v` | Task results |
|
||||
| `-vv` | Task input parameters |
|
||||
| `-vvv` | SSH connection details |
|
||||
| `-vvvv` | Full plugin internals |
|
||||
|
||||
Start with `-v`, increase only if needed.
|
||||
|
||||
## Debugging Commands
|
||||
|
||||
```bash
|
||||
# Syntax check only
|
||||
ansible-playbook --syntax-check playbook.yml
|
||||
|
||||
# Dry run
|
||||
ansible-playbook --check playbook.yml
|
||||
|
||||
# Step through tasks
|
||||
ansible-playbook --step playbook.yml
|
||||
|
||||
# Start at specific task
|
||||
ansible-playbook --start-at-task "Task Name" playbook.yml
|
||||
|
||||
# Limit to specific host
|
||||
ansible-playbook --limit hostname playbook.yml
|
||||
```
|
||||
|
||||
## Common Error Patterns
|
||||
|
||||
| Error | Cause | Fix |
|
||||
|-------|-------|-----|
|
||||
| `Permission denied (publickey)` | SSH key not accepted | Check key permissions, verify authorized_keys |
|
||||
| `Missing sudo password` | become=true without password | Use `--ask-become-pass` or configure NOPASSWD |
|
||||
| `No such file or directory` | Path doesn't exist | Create parent directories first |
|
||||
| `Unable to lock` (apt/yum) | Package manager locked | Wait for other process, remove stale lock |
|
||||
| `undefined variable` | Variable not defined | Check spelling, use `default()` filter |
|
||||
|
||||
## Performance Debugging
|
||||
|
||||
```ini
|
||||
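# ansible.cfg
[defaults]
; Show task timing (callbacks_enabled replaced callback_whitelist in ansible-core 2.11)
callbacks_enabled = profile_tasks

[ssh_connection]
; Faster SSH (inline comments must start with ";", so they are kept on their own lines)
pipelining = True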
|
||||
```
|
||||
|
||||
```yaml
|
||||
# Skip fact gathering if not needed
- hosts: all
  gather_facts: no
|
||||
```
|
||||
25
skills/ansible-interactive/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
130
skills/ansible-interactive/SKILL.md
Normal file
|
|
@ -0,0 +1,130 @@
|
|||
---
|
||||
name: ansible-interactive
|
||||
description: Use when guiding someone through Ansible setup step-by-step. Use when starting a new Ansible project from scratch. Use when teaching Ansible through hands-on development.
|
||||
---
|
||||
|
||||
# Interactive Ansible Development
|
||||
|
||||
## Overview
|
||||
|
||||
Interactive development builds automation incrementally with continuous validation. Each component is tested before adding the next. This catches errors early when they're easy to diagnose.
|
||||
|
||||
## When to Use
|
||||
|
||||
- Setting up Ansible for a new environment
|
||||
- Teaching someone Ansible hands-on
|
||||
- Building playbooks incrementally with validation
|
||||
- Troubleshooting connectivity before automation
|
||||
|
||||
## Development Phases
|
||||
|
||||
### Phase 1: Environment Analysis
|
||||
|
||||
Gather before writing any code:
|
||||
|
||||
| Question | Why It Matters |
|
||||
|----------|----------------|
|
||||
| How many servers? | Affects inventory organization |
|
||||
| IP addresses/hostnames? | Required for inventory |
|
||||
| SSH user and key location? | Connection configuration |
|
||||
| Password or key auth? | Determines SSH setup |
|
||||
| Sudo with or without password? | Privilege escalation config |
|
||||
| Server roles (web, db, app)? | Inventory grouping |
|
||||
| Operating systems? | Module selection (apt vs yum) |
|
||||
|
||||
Verify Ansible is installed: `ansible --version`
|
||||
|
||||
### Phase 2: Project Setup
|
||||
|
||||
Create minimal structure:
|
||||
|
||||
```bash
|
||||
mkdir ansible-project && cd ansible-project
|
||||
```
|
||||
|
||||
**ansible.cfg:**
|
||||
```ini
|
||||
[defaults]
|
||||
inventory = ./inventory
|
||||
host_key_checking = False
|
||||
stdout_callback = yaml
|
||||
|
||||
[privilege_escalation]
|
||||
become = True
|
||||
become_method = sudo
|
||||
```
|
||||
|
||||
**inventory:**
|
||||
```ini
|
||||
[webservers]
|
||||
web1 ansible_host=192.168.1.10 ansible_user=admin ansible_ssh_private_key_file=~/.ssh/id_rsa
|
||||
|
||||
[dbservers]
|
||||
db1 ansible_host=192.168.1.20 ansible_user=admin ansible_ssh_private_key_file=~/.ssh/id_rsa
|
||||
```
|
||||
|
||||
### Phase 3: Connectivity Test
|
||||
|
||||
**Always test before writing playbooks:**
|
||||
|
||||
```bash
|
||||
ansible all -m ping
|
||||
```
|
||||
|
||||
| Result | Action |
|
||||
|--------|--------|
|
||||
| SUCCESS | Proceed to playbooks |
|
||||
| UNREACHABLE | Check `ssh -v user@host` |
|
||||
| Permission denied | Verify key path, permissions (600) |
|
||||
| Sudo password required | Add `--ask-become-pass` or configure NOPASSWD |
|
||||
|
||||
### Phase 4: Incremental Playbook Development
|
||||
|
||||
Start simple, add one task at a time:
|
||||
|
||||
```yaml
|
||||
# playbook.yml - start with facts
---
- hosts: all
  tasks:
    - name: Show OS info
      ansible.builtin.debug:
        msg: "{{ ansible_distribution }} {{ ansible_distribution_version }}"
|
||||
```
|
||||
|
||||
Run: `ansible-playbook playbook.yml`
|
||||
|
||||
Then add tasks one by one, testing after each:
|
||||
|
||||
```yaml
|
||||
    - name: Ensure nginx installed
      ansible.builtin.package:
        name: nginx
        state: present
|
||||
```
|
||||
|
||||
Run again. Fix any errors before adding more.
|
||||
|
||||
### Phase 5: Validation Cycle
|
||||
|
||||
After each change:
|
||||
|
||||
1. `ansible-playbook --syntax-check playbook.yml`
|
||||
2. `ansible-playbook --check --diff playbook.yml`
|
||||
3. `ansible-playbook playbook.yml`
|
||||
4. Run again—verify `changed=0` (idempotency)
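
The four steps above can be scripted so the cycle stops at the first failing command. The sketch below is one possible wrapper, not a prescribed workflow: the names `validation_commands` and `run_cycle` are hypothetical, and the `runner` parameter exists only so the sequence can be exercised without Ansible installed.

```python
import subprocess
from typing import Callable, List, Optional


def validation_commands(playbook: str) -> List[List[str]]:
    """Return the validation cycle for a playbook, in execution order."""
    return [
        ["ansible-playbook", "--syntax-check", playbook],
        ["ansible-playbook", "--check", "--diff", playbook],
        ["ansible-playbook", playbook],
        ["ansible-playbook", playbook],  # second real run: expect changed=0
    ]


def run_cycle(playbook: str, runner: Callable = subprocess.run) -> Optional[List[str]]:
    """Run each step; return the first failing command, or None if all pass."""
    for command in validation_commands(playbook):
        result = runner(command, check=False)
        if result.returncode != 0:
            return command
    return None


if __name__ == "__main__":
    for command in validation_commands("playbook.yml"):
        print(" ".join(command))
```

The exit code alone does not prove idempotency; the recap of the final run still has to show `changed=0`.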
|
||||
|
||||
## Red Flags - Stop and Debug
|
||||
|
||||
- Adding multiple untested tasks at once
|
||||
- Skipping `--check` before real runs
|
||||
- Ignoring "changed" on second run
|
||||
- Not testing SSH before writing playbooks
|
||||
|
||||
## Communication Pattern
|
||||
|
||||
When guiding users:
|
||||
- Explain what will happen before running commands
|
||||
- After completion, summarize what was done
|
||||
- When multiple approaches exist, present options with tradeoffs
|
||||
- Acknowledge progress at milestones
|
||||
25
skills/ansible-playbook/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
123
skills/ansible-playbook/SKILL.md
Normal file
|
|
@ -0,0 +1,123 @@
|
|||
---
|
||||
name: ansible-playbook
|
||||
description: Use when creating playbooks, roles, or inventory files. Use when automating infrastructure with Ansible. Use when encountering YAML syntax errors, module failures, or variable precedence issues.
|
||||
---
|
||||
|
||||
# Ansible Playbook Development
|
||||
|
||||
## Overview
|
||||
|
||||
Ansible playbooks declare desired system state rather than imperative commands. The core principle is idempotency: running a playbook multiple times produces the same result without unintended changes.
|
||||
|
||||
## When to Use
|
||||
|
||||
- Creating new playbooks or roles
|
||||
- Writing inventory files
|
||||
- Debugging YAML syntax errors
|
||||
- Troubleshooting module parameter issues
|
||||
- Understanding variable precedence
|
||||
- Converting shell scripts to Ansible
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### Project Structure
|
||||
|
||||
```
|
||||
project/
|
||||
├── ansible.cfg # Configuration
|
||||
├── inventory # Host definitions
|
||||
├── group_vars/ # Group variables
|
||||
├── host_vars/ # Host-specific vars
|
||||
├── roles/ # Reusable roles
|
||||
└── playbooks/ # Playbook files
|
||||
```
|
||||
|
||||
### Essential ansible.cfg
|
||||
|
||||
```ini
|
||||
[defaults]
|
||||
inventory = ./inventory
|
||||
roles_path = ./roles
|
||||
host_key_checking = False
|
||||
stdout_callback = yaml
|
||||
|
||||
[privilege_escalation]
|
||||
become = True
|
||||
become_method = sudo
|
||||
```
|
||||
|
||||
### Module Patterns
|
||||
|
||||
| Operation | Module | Key Parameters |
|
||||
|-----------|--------|----------------|
|
||||
| Create directory | `ansible.builtin.file` | `state: directory`, `mode`, `owner` |
|
||||
| Copy file | `ansible.builtin.copy` | `src`, `dest`, `mode` |
|
||||
| Template | `ansible.builtin.template` | `src`, `dest`, variables in `.j2` |
|
||||
| Install package | `ansible.builtin.package` | `name`, `state: present` |
|
||||
| Manage service | `ansible.builtin.service` | `name`, `state`, `enabled` |
|
||||
| Run command | `ansible.builtin.command` | `cmd`, register result, set `changed_when` |
|
||||
|
||||
### Variable Precedence (lowest to highest)
|
||||
|
||||
1. Role defaults (`defaults/main.yml`)
|
||||
2. Inventory group_vars
|
||||
3. Inventory host_vars
|
||||
4. Playbook vars
|
||||
5. Role vars (`vars/main.yml`)
|
||||
6. Task vars
|
||||
7. Extra vars (`-e`)
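
The effect of this ordering can be illustrated by merging dictionaries from lowest to highest precedence, so that later levels overwrite earlier ones. This is a simplified model of the seven levels listed above, not of Ansible's full precedence rules; the names `PRECEDENCE` and `resolve` are hypothetical.

```python
from typing import Any, Dict

# Lowest to highest precedence, matching the list above.
PRECEDENCE = [
    "role_defaults",
    "group_vars",
    "host_vars",
    "playbook_vars",
    "role_vars",
    "task_vars",
    "extra_vars",
]


def resolve(sources: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge variable sources so that higher-precedence levels win."""
    merged: Dict[str, Any] = {}
    for level in PRECEDENCE:
        merged.update(sources.get(level, {}))
    return merged


if __name__ == "__main__":
    print(resolve({
        "role_defaults": {"app_port": 8080, "app_user": "appuser"},
        "host_vars": {"app_user": "web"},
        "extra_vars": {"app_port": 9090},
    }))
```

This is why values meant to be overridden belong in `defaults/main.yml`: every other level in the list beats them.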
|
||||
|
||||
### Handlers
|
||||
|
||||
```yaml
|
||||
tasks:
  - name: Update config
    ansible.builtin.template:
      src: app.conf.j2
      dest: /etc/app.conf
    notify: Restart app

handlers:
  - name: Restart app
    ansible.builtin.service:
      name: app
      state: restarted
|
||||
```
|
||||
|
||||
### Error Handling
|
||||
|
||||
```yaml
|
||||
- block:
    - name: Risky operation
      ansible.builtin.command: /opt/app/upgrade.sh
  rescue:
    - name: Handle failure
      ansible.builtin.debug:
        msg: "Upgrade failed, rolling back"
  always:
    - name: Cleanup
      ansible.builtin.file:
        path: /tmp/upgrade.lock
        state: absent
|
||||
```
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
| Mistake | Fix |
|
||||
|---------|-----|
|
||||
| Using short module names | Always use FQCN: `ansible.builtin.copy` not `copy` |
|
||||
| Hardcoded values | Extract to variables in `defaults/main.yml` |
|
||||
| Missing `changed_when` on commands | Add `changed_when: "'created' in result.stdout"` |
|
||||
| Forgetting handler flush | Use `meta: flush_handlers` when needed before dependent tasks |
|
||||
| YAML indentation errors | Use 2 spaces, never tabs |
|
||||
| Colon in unquoted string | Quote values containing `: ` |
|
||||
|
||||
## Verification Commands
|
||||
|
||||
```bash
|
||||
ansible-playbook --syntax-check playbook.yml # Check YAML
|
||||
ansible-playbook --check playbook.yml # Dry run
|
||||
ansible-playbook --check --diff playbook.yml # Show file changes
|
||||
ansible-inventory --list # Verify inventory
|
||||
ansible-inventory --host hostname # Check host vars
|
||||
```
|
||||
25
skills/architecture-decision-records/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
444
skills/architecture-decision-records/SKILL.md
Normal file
|
|
@ -0,0 +1,444 @@
|
|||
---
|
||||
name: architecture-decision-records
|
||||
description: "Write and maintain Architecture Decision Records (ADRs) following best practices for technical decision documentation. Use when documenting significant technical decisions, reviewing past architect..."
|
||||
risk: unknown
|
||||
source: community
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
# Architecture Decision Records
|
||||
|
||||
Comprehensive patterns for creating, maintaining, and managing Architecture Decision Records (ADRs) that capture the context and rationale behind significant technical decisions.
|
||||
|
||||
## Use this skill when
|
||||
|
||||
- Making significant architectural decisions
|
||||
- Documenting technology choices
|
||||
- Recording design trade-offs
|
||||
- Onboarding new team members
|
||||
- Reviewing historical decisions
|
||||
- Establishing decision-making processes
|
||||
|
||||
## Do not use this skill when
|
||||
|
||||
- You only need to document small implementation details
|
||||
- The change is a minor patch or routine maintenance
|
||||
- There is no architectural decision to capture
|
||||
|
||||
## Instructions
|
||||
|
||||
1. Capture the decision context, constraints, and drivers.
|
||||
2. Document considered options with tradeoffs.
|
||||
3. Record the decision, rationale, and consequences.
|
||||
4. Link related ADRs and update status over time.
|
||||
|
||||
## Core Concepts
|
||||
|
||||
### 1. What is an ADR?
|
||||
|
||||
An Architecture Decision Record captures:
|
||||
- **Context**: Why we needed to make a decision
|
||||
- **Decision**: What we decided
|
||||
- **Consequences**: What happens as a result
|
||||
|
||||
### 2. When to Write an ADR
|
||||
|
||||
| Write ADR | Skip ADR |
|
||||
|-----------|----------|
|
||||
| New framework adoption | Minor version upgrades |
|
||||
| Database technology choice | Bug fixes |
|
||||
| API design patterns | Implementation details |
|
||||
| Security architecture | Routine maintenance |
|
||||
| Integration patterns | Configuration changes |
|
||||
|
||||
### 3. ADR Lifecycle
|
||||
|
||||
```
|
||||
Proposed → Accepted → Deprecated → Superseded
|
||||
↓
|
||||
Rejected
|
||||
```
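
The lifecycle can also be read as a small state machine. The following is a minimal sketch, assuming the allowed transitions are exactly the arrows in the diagram above; the names `ALLOWED_TRANSITIONS` and `can_transition` are illustrative and not part of any ADR tooling.

```python
# Hypothetical helper: encodes the lifecycle diagram as allowed status transitions.
ALLOWED_TRANSITIONS = {
    "Proposed": {"Accepted", "Rejected"},
    "Accepted": {"Deprecated"},
    "Deprecated": {"Superseded"},
    "Superseded": set(),
    "Rejected": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if the diagram allows moving from `current` to `new`."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"Unknown ADR status: {current!r}")
    return new in ALLOWED_TRANSITIONS[current]


print(can_transition("Proposed", "Accepted"))  # True
print(can_transition("Rejected", "Accepted"))  # False
```

A check like this could run in CI to flag status changes that skip a step; teams that allow other transitions would adjust the table.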
|
||||
|
||||
## Templates
|
||||
|
||||
### Template 1: Standard ADR (MADR Format)
|
||||
|
||||
```markdown
|
||||
# ADR-0001: Use PostgreSQL as Primary Database
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
We need to select a primary database for our new e-commerce platform. The system
|
||||
will handle:
|
||||
- ~10,000 concurrent users
|
||||
- Complex product catalog with hierarchical categories
|
||||
- Transaction processing for orders and payments
|
||||
- Full-text search for products
|
||||
- Geospatial queries for store locator
|
||||
|
||||
The team has experience with MySQL, PostgreSQL, and MongoDB. We need ACID
|
||||
compliance for financial transactions.
|
||||
|
||||
## Decision Drivers
|
||||
|
||||
* **Must have ACID compliance** for payment processing
|
||||
* **Must support complex queries** for reporting
|
||||
* **Should support full-text search** to reduce infrastructure complexity
|
||||
* **Should have good JSON support** for flexible product attributes
|
||||
* **Team familiarity** reduces onboarding time
|
||||
|
||||
## Considered Options
|
||||
|
||||
### Option 1: PostgreSQL
|
||||
- **Pros**: ACID compliant, excellent JSON support (JSONB), built-in full-text
|
||||
search, PostGIS for geospatial, team has experience
|
||||
- **Cons**: Slightly more complex replication setup than MySQL
|
||||
|
||||
### Option 2: MySQL
|
||||
- **Pros**: Very familiar to team, simple replication, large community
|
||||
- **Cons**: Weaker JSON support, more limited built-in full-text search (may
  need Elasticsearch), less capable geospatial support than PostGIS
|
||||
|
||||
### Option 3: MongoDB
|
||||
- **Pros**: Flexible schema, native JSON, horizontal scaling
|
||||
- **Cons**: No ACID for multi-document transactions (at decision time),
|
||||
team has limited experience, requires schema design discipline
|
||||
|
||||
## Decision
|
||||
|
||||
We will use **PostgreSQL 15** as our primary database.
|
||||
|
||||
## Rationale
|
||||
|
||||
PostgreSQL provides the best balance of:
|
||||
1. **ACID compliance** essential for e-commerce transactions
|
||||
2. **Built-in capabilities** (full-text search, JSONB, PostGIS) reduce
|
||||
infrastructure complexity
|
||||
3. **Team familiarity** with SQL databases reduces learning curve
|
||||
4. **Mature ecosystem** with excellent tooling and community support
|
||||
|
||||
The slight complexity in replication is outweighed by the reduction in
|
||||
additional services (no separate Elasticsearch needed).
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
- Single database handles transactions, search, and geospatial queries
|
||||
- Reduced operational complexity (fewer services to manage)
|
||||
- Strong consistency guarantees for financial data
|
||||
- Team can leverage existing SQL expertise
|
||||
|
||||
### Negative
|
||||
- Need to learn PostgreSQL-specific features (JSONB, full-text search syntax)
|
||||
- Vertical scaling limits may require read replicas sooner
|
||||
- Some team members need PostgreSQL-specific training
|
||||
|
||||
### Risks
|
||||
- Full-text search may not scale as well as dedicated search engines
|
||||
- Mitigation: Design for potential Elasticsearch addition if needed
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
- Use JSONB for flexible product attributes
|
||||
- Implement connection pooling with PgBouncer
|
||||
- Set up streaming replication for read replicas
|
||||
- Use pg_trgm extension for fuzzy search
|
||||
|
||||
## Related Decisions
|
||||
|
||||
- ADR-0002: Caching Strategy (Redis) - complements database choice
|
||||
- ADR-0005: Search Architecture - may supersede if Elasticsearch needed
|
||||
|
||||
## References
|
||||
|
||||
- [PostgreSQL JSON Documentation](https://www.postgresql.org/docs/current/datatype-json.html)
|
||||
- [PostgreSQL Full Text Search](https://www.postgresql.org/docs/current/textsearch.html)
|
||||
- Internal: Performance benchmarks in `/docs/benchmarks/database-comparison.md`
|
||||
```
|
||||
|
||||
### Template 2: Lightweight ADR
|
||||
|
||||
```markdown
|
||||
# ADR-0012: Adopt TypeScript for Frontend Development
|
||||
|
||||
**Status**: Accepted
|
||||
**Date**: 2024-01-15
|
||||
**Deciders**: @alice, @bob, @charlie
|
||||
|
||||
## Context
|
||||
|
||||
Our React codebase has grown to 50+ components with increasing bug reports
|
||||
related to prop type mismatches and undefined errors. PropTypes provide
|
||||
runtime-only checking.
|
||||
|
||||
## Decision
|
||||
|
||||
Adopt TypeScript for all new frontend code. Migrate existing code incrementally.
|
||||
|
||||
## Consequences
|
||||
|
||||
**Good**: Catch type errors at compile time, better IDE support, self-documenting
|
||||
code.
|
||||
|
||||
**Bad**: Learning curve for team, initial slowdown, build complexity increase.
|
||||
|
||||
**Mitigations**: TypeScript training sessions, allow gradual adoption with
|
||||
`allowJs: true`.
|
||||
```
|
||||
|
||||
### Template 3: Y-Statement Format
|
||||
|
||||
```markdown
|
||||
# ADR-0015: API Gateway Selection
|
||||
|
||||
In the context of **building a microservices architecture**,
|
||||
facing **the need for centralized API management, authentication, and rate limiting**,
|
||||
we decided for **Kong Gateway**
|
||||
and against **AWS API Gateway and custom Nginx solution**,
|
||||
to achieve **vendor independence, plugin extensibility, and team familiarity with Lua**,
|
||||
accepting that **we need to manage Kong infrastructure ourselves**.
|
||||
```
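
Because the Y-statement has a fixed sentence shape, it can be rendered from structured fields. This is a sketch only; the `YStatement` class and its field names are hypothetical and chosen to mirror the six clauses of the template above.

```python
from dataclasses import dataclass


@dataclass
class YStatement:
    """The six clauses of a Y-statement (field names are illustrative)."""
    context: str
    concern: str
    decision: str
    rejected: str
    goal: str
    tradeoff: str

    def render(self) -> str:
        """Return the statement as Markdown, one clause per line."""
        return (
            f"In the context of **{self.context}**,\n"
            f"facing **{self.concern}**,\n"
            f"we decided for **{self.decision}**\n"
            f"and against **{self.rejected}**,\n"
            f"to achieve **{self.goal}**,\n"
            f"accepting that **{self.tradeoff}**."
        )


statement = YStatement(
    context="building a microservices architecture",
    concern="the need for centralized API management",
    decision="Kong Gateway",
    rejected="AWS API Gateway and custom Nginx solution",
    goal="vendor independence",
    tradeoff="we need to manage Kong infrastructure ourselves",
)
print(statement.render())
```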
|
||||
|
||||
### Template 4: ADR for Deprecation
|
||||
|
||||
```markdown
|
||||
# ADR-0020: Deprecate MongoDB in Favor of PostgreSQL
|
||||
|
||||
## Status
|
||||
|
||||
Accepted (Supersedes ADR-0003)
|
||||
|
||||
## Context
|
||||
|
||||
ADR-0003 (2021) chose MongoDB for user profile storage due to schema flexibility
|
||||
needs. Since then:
|
||||
- MongoDB's multi-document transactions remain problematic for our use case
|
||||
- Our schema has stabilized and rarely changes
|
||||
- We now have PostgreSQL expertise from other services
|
||||
- Maintaining two databases increases operational burden
|
||||
|
||||
## Decision
|
||||
|
||||
Deprecate MongoDB and migrate user profiles to PostgreSQL.
|
||||
|
||||
## Migration Plan
|
||||
|
||||
1. **Phase 1** (Week 1-2): Create PostgreSQL schema, dual-write enabled
|
||||
2. **Phase 2** (Week 3-4): Backfill historical data, validate consistency
|
||||
3. **Phase 3** (Week 5): Switch reads to PostgreSQL, monitor
|
||||
4. **Phase 4** (Week 6): Remove MongoDB writes, decommission
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
- Single database technology reduces operational complexity
|
||||
- ACID transactions for user data
|
||||
- Team can focus on building PostgreSQL expertise
|
||||
|
||||
### Negative
|
||||
- Migration effort (~6 weeks, per the migration plan)
|
||||
- Risk of data issues during migration
|
||||
- Lose some schema flexibility
|
||||
|
||||
## Lessons Learned
|
||||
|
||||
Lessons from the ADR-0003 experience:
|
||||
- Schema flexibility benefits were overestimated
|
||||
- Operational cost of multiple databases was underestimated
|
||||
- Consider long-term maintenance in technology decisions
|
||||
```
|
||||
|
||||
### Template 5: Request for Comments (RFC) Style
|
||||
|
||||
```markdown
|
||||
# RFC-0025: Adopt Event Sourcing for Order Management
|
||||
|
||||
## Summary
|
||||
|
||||
Propose adopting event sourcing pattern for the order management domain to
|
||||
improve auditability, enable temporal queries, and support business analytics.
|
||||
|
||||
## Motivation
|
||||
|
||||
Current challenges:
|
||||
1. Audit requirements need complete order history
|
||||
2. "What was the order state at time X?" queries are impossible
|
||||
3. Analytics team needs event stream for real-time dashboards
|
||||
4. Order state reconstruction for customer support is manual
|
||||
|
||||
## Detailed Design
|
||||
|
||||
### Event Store
|
||||
|
||||
~~~
OrderCreated { orderId, customerId, items[], timestamp }
OrderItemAdded { orderId, item, timestamp }
OrderItemRemoved { orderId, itemId, timestamp }
PaymentReceived { orderId, amount, paymentId, timestamp }
OrderShipped { orderId, trackingNumber, timestamp }
~~~
|
||||
|
||||
### Projections
|
||||
|
||||
- **CurrentOrderState**: Materialized view for queries
|
||||
- **OrderHistory**: Complete timeline for audit
|
||||
- **DailyOrderMetrics**: Analytics aggregation
|
||||
|
||||
### Technology
|
||||
|
||||
- Event Store: EventStoreDB (purpose-built, handles projections)
|
||||
- Alternative considered: Kafka + custom projection service
|
||||
|
||||
## Drawbacks
|
||||
|
||||
- Learning curve for team
|
||||
- Increased complexity vs. CRUD
|
||||
- Need to design events carefully (immutable once stored)
|
||||
- Storage growth (events never deleted)
|
||||
|
||||
## Alternatives
|
||||
|
||||
1. **Audit tables**: Simpler but doesn't enable temporal queries
|
||||
2. **CDC from existing DB**: Complex, doesn't change data model
|
||||
3. **Hybrid**: Event source only for order state changes
|
||||
|
||||
## Unresolved Questions
|
||||
|
||||
- [ ] Event schema versioning strategy
|
||||
- [ ] Retention policy for events
|
||||
- [ ] Snapshot frequency for performance
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
1. Prototype with single order type (2 weeks)
|
||||
2. Team training on event sourcing (1 week)
|
||||
3. Full implementation and migration (4 weeks)
|
||||
4. Monitoring and optimization (ongoing)
|
||||
|
||||
## References
|
||||
|
||||
- [Event Sourcing by Martin Fowler](https://martinfowler.com/eaaDev/EventSourcing.html)
|
||||
- [EventStoreDB Documentation](https://www.eventstore.com/docs)
|
||||
```
|
||||
|
||||
## ADR Management
|
||||
|
||||
### Directory Structure
|
||||
|
||||
```
|
||||
docs/
|
||||
├── adr/
|
||||
│ ├── README.md # Index and guidelines
|
||||
│ ├── template.md # Team's ADR template
|
||||
│ ├── 0001-use-postgresql.md
|
||||
│ ├── 0002-caching-strategy.md
|
||||
│ ├── 0003-mongodb-user-profiles.md # [DEPRECATED]
|
||||
│ └── 0020-deprecate-mongodb.md # Supersedes 0003
|
||||
```
|
||||
|
||||
### ADR Index (README.md)
|
||||
|
||||
```markdown
|
||||
# Architecture Decision Records
|
||||
|
||||
This directory contains Architecture Decision Records (ADRs) for [Project Name].
|
||||
|
||||
## Index
|
||||
|
||||
| ADR | Title | Status | Date |
|
||||
|-----|-------|--------|------|
|
||||
| 0001 | Use PostgreSQL as Primary Database | Accepted | 2024-01-10 |
|
||||
| 0002 | Caching Strategy with Redis | Accepted | 2024-01-12 |
|
||||
| 0003 | MongoDB for User Profiles | Deprecated | 2023-06-15 |
|
||||
| 0020 | Deprecate MongoDB | Accepted | 2024-01-15 |
|
||||
|
||||
## Creating a New ADR
|
||||
|
||||
1. Copy `template.md` to `NNNN-title-with-dashes.md`
|
||||
2. Fill in the template
|
||||
3. Submit PR for review
|
||||
4. Update this index after approval
|
||||
|
||||
## ADR Status
|
||||
|
||||
- **Proposed**: Under discussion
|
||||
- **Accepted**: Decision made, implementing
|
||||
- **Deprecated**: No longer relevant
|
||||
- **Superseded**: Replaced by another ADR
|
||||
- **Rejected**: Considered but not adopted
|
||||
```
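
Step 1 of "Creating a New ADR" (copy `template.md` to `NNNN-title-with-dashes.md`) can be scripted. The sketch below assumes the directory layout shown under "Directory Structure"; the function names `slugify`, `next_adr_number` and `create_adr` are hypothetical helpers, not part of adr-tools.

```python
import re
import shutil
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with dashes."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def next_adr_number(adr_dir: Path) -> int:
    """Return one more than the highest NNNN- prefix found in adr_dir."""
    numbers = [
        int(match.group(1))
        for path in adr_dir.glob("*.md")
        if (match := re.match(r"(\d{4})-", path.name))
    ]
    return max(numbers, default=0) + 1


def create_adr(adr_dir: Path, title: str) -> Path:
    """Copy template.md to NNNN-title-with-dashes.md and return the new path."""
    target = adr_dir / f"{next_adr_number(adr_dir):04d}-{slugify(title)}.md"
    shutil.copyfile(adr_dir / "template.md", target)
    return target
```

For example, `create_adr(Path("docs/adr"), "Adopt Event Sourcing")` would create the next numbered file in that directory, leaving the template contents to be filled in by hand.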
|
||||
|
||||
### Automation (adr-tools)
|
||||
|
||||
```bash
|
||||
# Install adr-tools
|
||||
brew install adr-tools
|
||||
|
||||
# Initialize ADR directory
|
||||
adr init docs/adr
|
||||
|
||||
# Create new ADR
|
||||
adr new "Use PostgreSQL as Primary Database"
|
||||
|
||||
# Supersede an ADR
|
||||
adr new -s 3 "Deprecate MongoDB in Favor of PostgreSQL"
|
||||
|
||||
# Generate table of contents
|
||||
adr generate toc > docs/adr/README.md
|
||||
|
||||
# Link related ADRs
|
||||
adr link 2 "Complements" 1 "Is complemented by"
|
||||
```
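
Where adr-tools is not installed, the index table could be regenerated with a short script. This is a sketch under stated assumptions: ADR files start with a `# ADR-NNNN: Title` heading and record status either as `**Status**: ...` (lightweight template) or under a `## Status` heading (standard template). `parse_adr` and `render_index` are hypothetical names.

```python
import re


def parse_adr(text: str) -> dict:
    """Extract number, title and status from ADR text in the template formats."""
    header = re.search(r"^# ADR-(\d{4}): (.+)$", text, flags=re.MULTILINE)
    if header is None:
        raise ValueError("missing '# ADR-NNNN: Title' heading")
    # Lightweight style first, then the '## Status' section of the standard style.
    status = re.search(r"^\*\*Status\*\*:\s*(.+)$", text, flags=re.MULTILINE)
    if status is None:
        status = re.search(r"^## Status\s*\n+(.+)$", text, flags=re.MULTILINE)
    return {
        "number": header.group(1),
        "title": header.group(2).strip(),
        "status": status.group(1).strip() if status else "Unknown",
    }


def render_index(adrs: list[dict]) -> str:
    """Render parsed ADRs as a Markdown table sorted by ADR number."""
    lines = ["| ADR | Title | Status |", "|-----|-------|--------|"]
    for adr in sorted(adrs, key=lambda a: a["number"]):
        lines.append(f"| {adr['number']} | {adr['title']} | {adr['status']} |")
    return "\n".join(lines)
```

The date column from the index example is omitted here because only the lightweight template records a date.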
|
||||
|
||||
## Review Process
|
||||
|
||||
```markdown
|
||||
## ADR Review Checklist
|
||||
|
||||
### Before Submission
|
||||
- [ ] Context clearly explains the problem
|
||||
- [ ] All viable options considered
|
||||
- [ ] Pros/cons balanced and honest
|
||||
- [ ] Consequences (positive and negative) documented
|
||||
- [ ] Related ADRs linked
|
||||
|
||||
### During Review
|
||||
- [ ] At least 2 senior engineers reviewed
|
||||
- [ ] Affected teams consulted
|
||||
- [ ] Security implications considered
|
||||
- [ ] Cost implications documented
|
||||
- [ ] Reversibility assessed
|
||||
|
||||
### After Acceptance
|
||||
- [ ] ADR index updated
|
||||
- [ ] Team notified
|
||||
- [ ] Implementation tickets created
|
||||
- [ ] Related documentation updated
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Do's
|
||||
- **Write ADRs early** - Before implementation starts
|
||||
- **Keep them short** - 1-2 pages maximum
|
||||
- **Be honest about trade-offs** - Include real cons
|
||||
- **Link related decisions** - Build decision graph
|
||||
- **Update status** - Deprecate when superseded
|
||||
|
||||
### Don'ts
|
||||
- **Don't change accepted ADRs** - Write new ones to supersede
|
||||
- **Don't skip context** - Future readers need background
|
||||
- **Don't hide failures** - Rejected decisions are valuable
|
||||
- **Don't be vague** - Specific decisions, specific consequences
|
||||
- **Don't forget implementation** - An ADR without follow-through is wasted effort
|
||||
|
||||
## Resources
|
||||
|
||||
- [Documenting Architecture Decisions (Michael Nygard)](https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions)
|
||||
- [MADR Template](https://adr.github.io/madr/)
|
||||
- [ADR GitHub Organization](https://adr.github.io/)
|
||||
- [adr-tools](https://github.com/npryce/adr-tools)
|
||||
25
skills/aws-cost-cleanup/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
310
skills/aws-cost-cleanup/SKILL.md
Normal file
|
|
@ -0,0 +1,310 @@
|
|||
---
|
||||
name: aws-cost-cleanup
|
||||
description: "Automated cleanup of unused AWS resources to reduce costs"
|
||||
risk: safe
|
||||
source: community
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
# AWS Cost Cleanup
|
||||
|
||||
Automate the identification and removal of unused AWS resources to eliminate waste.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use this skill when you need to automatically clean up unused AWS resources to reduce costs and eliminate waste.
|
||||
|
||||
## Automated Cleanup Targets
|
||||
|
||||
**Storage**
|
||||
- Unattached EBS volumes
|
||||
- Old EBS snapshots (>90 days)
|
||||
- Incomplete multipart S3 uploads
|
||||
- Old S3 versions in versioned buckets
|
||||
|
||||
**Compute**
|
||||
- Stopped EC2 instances (>30 days)
|
||||
- Unused AMIs and associated snapshots
|
||||
- Unused Elastic IPs
|
||||
|
||||
**Networking**
|
||||
- Unused Elastic Load Balancers
|
||||
- Unused NAT Gateways
|
||||
- Orphaned ENIs
|
||||
|
||||
## Cleanup Scripts
|
||||
|
||||
### Safe Cleanup (Dry-Run First)
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# cleanup-unused-ebs.sh
|
||||
|
||||
echo "Finding unattached EBS volumes..."
|
||||
VOLUMES=$(aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].VolumeId' \
  --output text)

for vol in $VOLUMES; do
  echo "Would delete: $vol"
  # Uncomment to actually delete:
  # aws ec2 delete-volume --volume-id "$vol"
done
|
||||
```
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# cleanup-old-snapshots.sh
|
||||
|
||||
# Note: `date -d` is GNU date syntax (Linux); BSD/macOS date does not support it.
CUTOFF_DATE=$(date -d '90 days ago' --iso-8601)

aws ec2 describe-snapshots --owner-ids self \
  --query "Snapshots[?StartTime<='$CUTOFF_DATE'].[SnapshotId,StartTime,VolumeSize]" \
  --output text | while read -r snap_id start_time size; do
    echo "Snapshot: $snap_id (Created: $start_time, Size: ${size}GB)"
    # Uncomment to delete:
    # aws ec2 delete-snapshot --snapshot-id "$snap_id"
done
|
||||
```
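
The 90-day cutoff in the snapshot script relies on GNU `date`. A portable alternative is to compute the cutoff in Python. The sketch below is an assumption-laden illustration: `old_snapshots` is a hypothetical helper that expects snapshot dictionaries shaped like the `Snapshots` entries returned by boto3's `describe_snapshots`, where `StartTime` is a timezone-aware `datetime`.

```python
from datetime import datetime, timedelta, timezone


def old_snapshots(snapshots, max_age_days=90, now=None):
    """Return the snapshots whose StartTime is at least max_age_days old."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [snap for snap in snapshots if snap["StartTime"] <= cutoff]


# Example with hand-written data instead of a live AWS call.
example = [
    {"SnapshotId": "snap-old", "StartTime": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-new", "StartTime": datetime(2024, 5, 20, tzinfo=timezone.utc)},
]
reference_time = datetime(2024, 6, 1, tzinfo=timezone.utc)
for snap in old_snapshots(example, now=reference_time):
    print("Would delete:", snap["SnapshotId"])
```

Passing `now` explicitly keeps the function deterministic, which makes the dry-run output reproducible in tests.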
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# release-unused-eips.sh
|
||||
|
||||
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].[AllocationId,PublicIp]' \
  --output text | while read -r alloc_id public_ip; do
    echo "Would release: $public_ip ($alloc_id)"
    # Uncomment to release:
    # aws ec2 release-address --allocation-id "$alloc_id"
done
|
||||
```
|
||||
|
||||
### S3 Lifecycle Automation
|
||||
|
||||
```bash
|
||||
# Apply lifecycle policy to transition old objects to cheaper storage
|
||||
cat > lifecycle-policy.json <<EOF
{
  "Rules": [
    {
      "Id": "Archive old objects",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 180,
          "StorageClass": "GLACIER"
        }
      ],
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 30
      },
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 7
      }
    }
  ]
}
EOF
|
||||
|
||||
aws s3api put-bucket-lifecycle-configuration \
|
||||
--bucket my-bucket \
|
||||
--lifecycle-configuration file://lifecycle-policy.json
|
||||
```
|
||||
|
||||
## Cost Impact Calculator
|
||||
|
||||
```python
|
||||
#!/usr/bin/env python3
|
||||
# calculate-savings.py
|
||||
|
||||
import boto3
|
||||
from datetime import datetime, timedelta
|
||||
|
||||
ec2 = boto3.client('ec2')
|
||||
|
||||
# Calculate EBS volume savings
|
||||
volumes = ec2.describe_volumes(
|
||||
Filters=[{'Name': 'status', 'Values': ['available']}]
|
||||
)
|
||||
|
||||
total_size = sum(v['Size'] for v in volumes['Volumes'])
|
||||
monthly_cost = total_size * 0.10  # Assumes ~$0.10/GB-month (gp2, us-east-1); gp3 is ~$0.08
|
||||
|
||||
print(f"Unattached EBS Volumes: {len(volumes['Volumes'])}")
|
||||
print(f"Total Size: {total_size} GB")
|
||||
print(f"Monthly Savings: ${monthly_cost:.2f}")
|
||||
|
||||
# Calculate Elastic IP savings
|
||||
addresses = ec2.describe_addresses()
|
||||
unused = [a for a in addresses['Addresses'] if 'AssociationId' not in a]
|
||||
|
||||
eip_cost = len(unused) * 3.65 # $0.005/hour * 730 hours
|
||||
print(f"\nUnused Elastic IPs: {len(unused)}")
|
||||
print(f"Monthly Savings: ${eip_cost:.2f}")
|
||||
|
||||
print(f"\nTotal Monthly Savings: ${monthly_cost + eip_cost:.2f}")
|
||||
print(f"Annual Savings: ${(monthly_cost + eip_cost) * 12:.2f}")
|
||||
```
|
||||
|
||||
## Automated Cleanup Lambda
|
||||
|
||||
```python
|
||||
import boto3
from datetime import datetime, timedelta, timezone


def lambda_handler(event, context):
    ec2 = boto3.client('ec2')

    # Delete unattached volumes older than 7 days
    volumes = ec2.describe_volumes(
        Filters=[{'Name': 'status', 'Values': ['available']}]
    )

    # CreateTime is timezone-aware (UTC), so compare against an aware cutoff
    cutoff = datetime.now(timezone.utc) - timedelta(days=7)
    deleted = 0

    for vol in volumes['Volumes']:
        if vol['CreateTime'] < cutoff:
            try:
                ec2.delete_volume(VolumeId=vol['VolumeId'])
                deleted += 1
                print(f"Deleted volume: {vol['VolumeId']}")
            except Exception as e:
                print(f"Error deleting {vol['VolumeId']}: {e}")

    return {
        'statusCode': 200,
        'body': f'Deleted {deleted} volumes'
    }
|
||||
```
|
||||
|
||||
## Cleanup Workflow
|
||||
|
||||
1. **Discovery Phase** (Read-only)
|
||||
- Run all describe commands
|
||||
- Generate cost impact report
|
||||
- Review with team
|
||||
|
||||
2. **Validation Phase**
|
||||
- Verify resources are truly unused
|
||||
- Check for dependencies
|
||||
- Notify resource owners
|
||||
|
||||
3. **Execution Phase** (Dry-run first)
|
||||
- Run cleanup scripts with dry-run
|
||||
- Review proposed changes
|
||||
- Execute actual cleanup
|
||||
|
||||
4. **Verification Phase**
|
||||
- Confirm deletions
|
||||
- Monitor for issues
|
||||
- Document savings
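
The "dry-run first" rule in the execution phase can be enforced in code by making dry-run the default. This is a minimal sketch, not a prescribed implementation: `run_cleanup` is a hypothetical wrapper, and `delete` stands for whatever callable performs the real deletion (for example a function that calls boto3).

```python
def run_cleanup(resource_ids, delete, dry_run=True):
    """Call delete(resource_id) for each id unless dry_run is set.

    Returns a list of (resource_id, outcome) pairs for the cleanup report.
    """
    results = []
    for resource_id in resource_ids:
        if dry_run:
            results.append((resource_id, "would delete"))
            continue
        try:
            delete(resource_id)
            results.append((resource_id, "deleted"))
        except Exception as exc:  # keep going; record the failure for review
            results.append((resource_id, f"failed: {exc}"))
    return results


# Dry run: nothing is deleted, the plan is only reported.
for resource_id, outcome in run_cleanup(["vol-1", "vol-2"], delete=print):
    print(resource_id, outcome)
```

Requiring callers to pass `dry_run=False` explicitly means an accidental invocation only produces a report.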
|
||||
|
||||
## Safety Checklist
|
||||
|
||||
- [ ] Run in dry-run mode first
|
||||
- [ ] Verify resources have no dependencies
|
||||
- [ ] Check resource tags for ownership
|
||||
- [ ] Notify stakeholders before deletion
|
||||
- [ ] Create snapshots of critical data
|
||||
- [ ] Test in non-production first
|
||||
- [ ] Have rollback plan ready
|
||||
- [ ] Document all deletions
|
||||
|
||||
## Example Prompts
|
||||
|
||||
**Discovery**
|
||||
- "Find all unused resources and calculate potential savings"
|
||||
- "Generate a cleanup report for my AWS account"
|
||||
- "What resources can I safely delete?"
|
||||
|
||||
**Execution**
|
||||
- "Create a script to cleanup unattached EBS volumes"
|
||||
- "Delete all snapshots older than 90 days"
|
||||
- "Release unused Elastic IPs"
|
||||
|
||||
**Automation**
|
||||
- "Set up automated cleanup for old snapshots"
|
||||
- "Create a Lambda function for weekly cleanup"
|
||||
- "Schedule monthly resource cleanup"
|
||||
|
||||
## Integration with AWS Organizations
|
||||
|
||||
```bash
|
||||
# Run cleanup across multiple accounts
|
||||
for account in $(aws organizations list-accounts \
  --query 'Accounts[*].Id' --output text); do
  echo "Checking account: $account"
  # Assumes a CLI profile named account-<id> exists for each member account
  aws ec2 describe-volumes \
    --filters Name=status,Values=available \
    --profile "account-$account"
done
|
||||
```
|
||||
|
||||
## Monitoring and Alerts
|
||||
|
||||
```bash
|
||||
# Create a CloudWatch billing alarm.
# EstimatedCharges is a month-to-date total published only in us-east-1,
# and it requires billing alerts to be enabled on the account.
aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name high-cost-alert \
  --alarm-description "Alert when month-to-date estimated charges exceed threshold" \
  --metric-name EstimatedCharges \
  --namespace AWS/Billing \
  --dimensions Name=Currency,Value=USD \
  --statistic Maximum \
  --period 86400 \
  --evaluation-periods 1 \
  --threshold 100 \
  --comparison-operator GreaterThanThreshold
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
- Schedule cleanup during maintenance windows
|
||||
- Always create final snapshots before deletion
|
||||
- Use resource tags to identify cleanup candidates
|
||||
- Implement approval workflow for production
|
||||
- Log all cleanup actions for audit
|
||||
- Set up cost anomaly detection
|
||||
- Review cleanup results weekly
|
||||
|
||||
## Risk Mitigation
|
||||
|
||||
**Medium Risk Actions:**
|
||||
- Deleting unattached volumes (ensure no planned reattachment)
|
||||
- Removing old snapshots (verify no compliance requirements)
|
||||
- Releasing Elastic IPs (check DNS records)
|
||||
|
||||
**Always:**
|
||||
- Maintain 30-day backup retention
|
||||
- Use AWS Backup for critical resources
|
||||
- Test restore procedures
|
||||
- Document cleanup decisions
|
||||
|
||||
## Kiro CLI Integration
|
||||
|
||||
```bash
|
||||
# Analyze and cleanup in one command
|
||||
kiro-cli chat "Use aws-cost-cleanup to find and remove unused resources"
|
||||
|
||||
# Generate cleanup script
|
||||
kiro-cli chat "Create a safe cleanup script for my AWS account"
|
||||
|
||||
# Schedule automated cleanup
|
||||
kiro-cli chat "Set up weekly automated cleanup using aws-cost-cleanup"
|
||||
```
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- [AWS Resource Cleanup Best Practices](https://aws.amazon.com/blogs/mt/automate-resource-cleanup/)
|
||||
- [AWS Systems Manager Automation](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-automation.html)
|
||||
- [AWS Config Rules for Compliance](https://docs.aws.amazon.com/config/latest/developerguide/managed-rules-by-aws-config.html)
|
||||
25
skills/aws-cost-optimizer/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
193
skills/aws-cost-optimizer/SKILL.md
Normal file
|
|
@ -0,0 +1,193 @@
|
|||
---
|
||||
name: aws-cost-optimizer
|
||||
description: "Comprehensive AWS cost analysis and optimization recommendations using AWS CLI and Cost Explorer"
|
||||
risk: safe
|
||||
source: community
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
# AWS Cost Optimizer
|
||||
|
||||
Analyze AWS spending patterns, identify waste, and provide actionable cost reduction strategies.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use this skill when you need to analyze AWS spending, identify cost optimization opportunities, or reduce cloud waste.
|
||||
|
||||
## Core Capabilities
|
||||
|
||||
**Cost Analysis**
|
||||
- Parse AWS Cost Explorer data for trends and anomalies
|
||||
- Break down costs by service, region, and resource tags
|
||||
- Identify month-over-month spending increases
|
||||
|
||||
**Resource Optimization**
|
||||
- Detect idle EC2 instances (low CPU utilization)
|
||||
- Find unattached EBS volumes and old snapshots
|
||||
- Identify unused Elastic IPs
|
||||
- Locate underutilized RDS instances
|
||||
- Find old S3 objects eligible for lifecycle policies
|
||||
|
||||
**Savings Recommendations**
|
||||
- Suggest Reserved Instance/Savings Plans opportunities
|
||||
- Recommend instance rightsizing based on CloudWatch metrics
|
||||
- Identify resources in expensive regions
|
||||
- Calculate potential savings with specific actions
|
||||
|
||||
## AWS CLI Commands
|
||||
|
||||
### Get Cost and Usage
|
||||
```bash
|
||||
# Last 30 days cost by service
|
||||
aws ce get-cost-and-usage \
|
||||
--time-period Start=$(date -d '30 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
|
||||
--granularity MONTHLY \
|
||||
--metrics BlendedCost \
|
||||
--group-by Type=DIMENSION,Key=SERVICE
|
||||
|
||||
# Daily costs for current month
|
||||
aws ce get-cost-and-usage \
|
||||
--time-period Start=$(date +%Y-%m-01),End=$(date +%Y-%m-%d) \
|
||||
--granularity DAILY \
|
||||
--metrics UnblendedCost
|
||||
```
|
||||
|
||||
### Find Unused Resources
|
||||
```bash
|
||||
# Unattached EBS volumes
|
||||
aws ec2 describe-volumes \
|
||||
--filters Name=status,Values=available \
|
||||
--query 'Volumes[*].[VolumeId,Size,VolumeType,CreateTime]' \
|
||||
--output table
|
||||
|
||||
# Unused Elastic IPs
|
||||
aws ec2 describe-addresses \
|
||||
--query 'Addresses[?AssociationId==null].[PublicIp,AllocationId]' \
|
||||
--output table
|
||||
|
||||
# Idle EC2 instances (requires CloudWatch)
|
||||
aws cloudwatch get-metric-statistics \
|
||||
--namespace AWS/EC2 \
|
||||
--metric-name CPUUtilization \
|
||||
--dimensions Name=InstanceId,Value=i-xxxxx \
|
||||
--start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
|
||||
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
|
||||
--period 86400 \
|
||||
--statistics Average
|
||||
|
||||
# Old EBS snapshots (>90 days)
|
||||
aws ec2 describe-snapshots \
|
||||
--owner-ids self \
|
||||
--query "Snapshots[?StartTime<='$(date -d '90 days ago' --iso-8601)'].[SnapshotId,StartTime,VolumeSize]" \
|
||||
--output table
|
||||
```
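
The CloudWatch query above only returns datapoints; deciding whether an instance is idle still needs a rule. The sketch below is one possible rule, assuming the 5% CPU threshold used in the example prompts of this skill and daily `Average` datapoints shaped like the `Datapoints` list returned by `get-metric-statistics`. The function name `is_idle` and the `min_points` default are hypothetical.

```python
def is_idle(datapoints, threshold_pct=5.0, min_points=7):
    """Return True when every daily Average is below threshold_pct.

    Instances with fewer than min_points datapoints are not judged idle,
    because there is not enough history to decide.
    """
    if len(datapoints) < min_points:
        return False
    return all(point["Average"] < threshold_pct for point in datapoints)


# Example with hand-written datapoints instead of a live CloudWatch call.
quiet_week = [{"Average": 1.2} for _ in range(7)]
busy_week = quiet_week[:6] + [{"Average": 42.0}]
print(is_idle(quiet_week))  # True
print(is_idle(busy_week))   # False
```

Averages can hide short bursts, so a stricter rule might also check the `Maximum` statistic before stopping an instance.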
|
||||
|
||||
### Rightsizing Analysis
|
||||
```bash
|
||||
# List EC2 instances with their types
|
||||
aws ec2 describe-instances \
|
||||
--query 'Reservations[*].Instances[*].[InstanceId,InstanceType,State.Name,Tags[?Key==`Name`].Value|[0]]' \
|
||||
--output table
|
||||
|
||||
# Get RDS instance utilization
|
||||
aws cloudwatch get-metric-statistics \
|
||||
--namespace AWS/RDS \
|
||||
--metric-name CPUUtilization \
|
||||
--dimensions Name=DBInstanceIdentifier,Value=mydb \
|
||||
--start-time $(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%S) \
|
||||
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
|
||||
--period 86400 \
|
||||
--statistics Average,Maximum
|
||||
```
|
||||
|
||||
## Optimization Workflow
|
||||
|
||||
1. **Baseline Assessment**
|
||||
- Pull 3-6 months of cost data
|
||||
- Identify top 5 spending services
|
||||
- Calculate growth rate
|
||||
|
||||
2. **Quick Wins**
|
||||
- Delete unattached EBS volumes
|
||||
- Release unused Elastic IPs
|
||||
- Stop/terminate idle EC2 instances
|
||||
- Delete old snapshots
|
||||
|
||||
3. **Strategic Optimization**
|
||||
- Analyze Reserved Instance coverage
|
||||
- Review instance types vs. workload
|
||||
- Implement S3 lifecycle policies
|
||||
- Consider Spot instances for non-critical workloads
|
||||
|
||||
4. **Ongoing Monitoring**
|
||||
- Set up AWS Budgets with alerts
|
||||
- Enable Cost Anomaly Detection
|
||||
- Tag resources for cost allocation
|
||||
- Monthly cost review meetings
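
The baseline assessment step (top spending services and growth rate) reduces to simple arithmetic once the Cost Explorer data has been collected. This is an illustrative sketch: `month_over_month_growth` and `top_services` are hypothetical helpers, and the input shapes (a list of monthly totals, a mapping of service name to cost) are assumptions about how the data was exported.

```python
def month_over_month_growth(monthly_totals):
    """Return the percentage change between each pair of consecutive months.

    A month following a zero total yields None, since growth is undefined.
    """
    growth = []
    for previous, current in zip(monthly_totals, monthly_totals[1:]):
        if previous == 0:
            growth.append(None)
        else:
            growth.append((current - previous) / previous * 100)
    return growth


def top_services(cost_by_service, n=5):
    """Return the n most expensive (service, cost) pairs, highest first."""
    ranked = sorted(cost_by_service.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


costs = {"EC2": 1200.0, "S3": 300.0, "RDS": 800.0}
print(top_services(costs, n=2))
print(month_over_month_growth([1000.0, 1100.0, 990.0]))
```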

## Cost Optimization Checklist

- [ ] Enable AWS Cost Explorer
- [ ] Set up cost allocation tags
- [ ] Create AWS Budget with alerts
- [ ] Review and delete unused resources
- [ ] Analyze Reserved Instance opportunities
- [ ] Implement S3 Intelligent-Tiering
- [ ] Review data transfer costs
- [ ] Optimize Lambda memory allocation
- [ ] Use CloudWatch Logs retention policies
- [ ] Consider multi-region cost differences

## Example Prompts

**Analysis**
- "Show me AWS costs for the last 3 months broken down by service"
- "What are my top 10 most expensive resources?"
- "Compare this month's spending to last month"

**Optimization**
- "Find all unattached EBS volumes and calculate savings"
- "Identify EC2 instances with <5% CPU utilization"
- "Suggest Reserved Instance purchases based on usage"
- "Calculate savings from deleting snapshots older than 90 days"

**Implementation**
- "Create a script to delete unattached volumes"
- "Set up a budget alert for $1000/month"
- "Generate a cost optimization report for leadership"

## Best Practices

- Always test in non-production first
- Verify resources are truly unused before deletion
- Document all cost optimization actions
- Calculate ROI for optimization efforts
- Automate recurring optimization tasks
- Use AWS Trusted Advisor recommendations
- Enable AWS Cost Anomaly Detection

## Integration with Kiro CLI

This skill works with Kiro CLI's AWS integration:

```bash
# Use Kiro to analyze costs
kiro-cli chat "Use aws-cost-optimizer to analyze my spending"

# Generate optimization report
kiro-cli chat "Create a cost optimization plan using aws-cost-optimizer"
```

## Safety Notes

- **Risk Level: Low** - Read-only analysis is safe
- **Deletion Actions: Medium Risk** - Always verify before deleting resources
- **Production Changes: High Risk** - Test rightsizing in dev/staging first
- Maintain backups before any deletion
- Use `--dry-run` flag when available

## Additional Resources

- [AWS Cost Optimization Best Practices](https://aws.amazon.com/pricing/cost-optimization/)
- [AWS Well-Architected Framework - Cost Optimization](https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/welcome.html)
- [AWS Cost Explorer API](https://docs.aws.amazon.com/cost-management/latest/APIReference/Welcome.html)
144
skills/aws-iam-debugging/SKILL.md
Normal file
@ -0,0 +1,144 @@
---
name: aws-iam-debugging
description: Use when hitting AWS AccessDenied, authorization failures, IRSA/EKS pod permission errors, SSO session issues, cross-account AssumeRole failures, or MalformedPolicyDocument errors involving AWSReservedSSO_* principals in multi-account/Organizations environments.
---

# AWS IAM Debugging

## Overview

IAM failures have predictable root causes. Identify the caller, simulate or inspect the policy, and check SCPs in multi-account setups. For cross-account S3 access, BOTH the IAM policy and the bucket policy must allow the request; within a single account an allow in either is enough, but an explicit deny in either one blocks access.

## Error Reference

| Error | Likely cause |
|-------|-------------|
| `is not authorized to perform: X on resource: Y` | Missing IAM policy statement |
| `MalformedPolicyDocument: Invalid principal` | Using `AWSReservedSSO_*` role as principal (not allowed) |
| `Access Denied` (S3) | IAM or bucket policy does not allow (cross-account needs both), an explicit deny applies, or an SCP is blocking |
| `AccessDenied` (STS AssumeRole) | Trust policy missing caller ARN, or SCP blocks |
| `InvalidClientTokenId` | Wrong region, expired credentials, wrong profile |
| `TokenRefreshRequired` | SSO session expired — run `aws sso login` |
| `Unable to locate credentials` | No credentials configured — check `~/.aws/credentials` or env vars |

## Diagnostic Flow

**Step 1: Who is calling?**
```bash
aws sts get-caller-identity
# Arn field tells you exactly what entity is making the call
```

**Step 2: Simulate the permission**
```bash
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::<account>:role/<role> \
  --action-names s3:GetObject \
  --resource-arns arn:aws:s3:::<bucket>/*

aws iam list-attached-role-policies --role-name <role>
aws iam list-role-policies --role-name <role>   # inline policies
aws iam get-role-policy --role-name <role> --policy-name <policy>
```
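
Step 1 usually returns an assumed-role session ARN (`arn:aws:sts::...:assumed-role/<role>/<session>`), while `--policy-source-arn` in Step 2 takes a user, group, or role ARN. A minimal sketch of the conversion follows; the function name is hypothetical, and it assumes the role has no path, because session ARNs do not carry one.

```bash
# Convert an STS assumed-role session ARN into the IAM role ARN.
# Caveat: session ARNs omit the role path, so roles created with a path
# (for example IAM Identity Center roles) need the path added back by hand.
role_arn_from_session_arn() {
  local arn="$1"
  if [[ "$arn" =~ ^arn:([^:]+):sts::([0-9]+):assumed-role/([^/]+)/.+$ ]]; then
    printf 'arn:%s:iam::%s:role/%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    # Already a user or role ARN (or something unexpected): pass it through.
    printf '%s\n' "$arn"
  fi
}
```

For example, `role_arn_from_session_arn "$(aws sts get-caller-identity --query Arn --output text)"` yields a value that can be passed to `simulate-principal-policy`.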

**Step 3: Check SCPs (multi-account)**
```bash
aws organizations list-policies-for-target \
  --target-id <account-id> --filter SERVICE_CONTROL_POLICY
aws organizations describe-policy --policy-id <policy-id>
```

## AWSReservedSSO_* Principal Gotcha

`AWSReservedSSO_*` roles **cannot** be used as IAM principals in trust policies.

```hcl
# WRONG:
principals {
  type        = "AWS"
  identifiers = ["arn:aws:iam::123456789:role/AWSReservedSSO_Admin_abc"]
}

# CORRECT — allow via condition:
principals {
  type        = "AWS"
  identifiers = ["arn:aws:iam::123456789:root"]
}
condition {
  test     = "StringLike"
  variable = "aws:PrincipalArn"
  # aws:PrincipalArn holds the role ARN (including its aws-reserved path),
  # not the assumed-role session ARN
  values   = ["arn:aws:iam::123456789:role/aws-reserved/sso.amazonaws.com/*AWSReservedSSO_Admin_*"]
}
```

Alternatives: `aws:PrincipalOrgID` (if all callers are in the org), or `aws:PrincipalTag`.

## IRSA (EKS IAM Roles for Service Accounts)

```bash
# Check ServiceAccount annotation
kubectl get sa <name> -n <namespace> -o yaml | grep eks.amazonaws.com

# Verify OIDC provider is registered
aws iam list-open-id-connect-providers

# Inspect role trust policy condition (must match exactly)
aws iam get-role --role-name <role> \
  | jq '.Role.AssumeRolePolicyDocument.Statement[].Condition'
# Required: "oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>:sub":
#           "system:serviceaccount:<namespace>:<sa-name>"

# Test from inside the pod
kubectl exec -n <ns> <pod> -- aws sts get-caller-identity
```

Common mistakes: namespace/SA name typo in trust policy; OIDC provider not registered.
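
Because the `sub` condition must match exactly, a small check can catch namespace or ServiceAccount typos. The sketch below is illustrative only; both function names are hypothetical, and it assumes the trust policy uses an exact `StringEquals` match rather than a wildcard.

```bash
# Build the `sub` claim value that IRSA expects for a ServiceAccount.
irsa_expected_sub() {
  local namespace="$1" service_account="$2"
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$service_account"
}

# Succeed only if the value found in the trust policy matches exactly.
irsa_sub_matches() {
  local actual="$1" namespace="$2" service_account="$3"
  [ "$actual" = "$(irsa_expected_sub "$namespace" "$service_account")" ]
}
```

The `actual` value would come from the trust policy condition printed by the `aws iam get-role ... | jq` command above.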

## S3 Access Denied

```bash
aws s3api get-bucket-policy --bucket <bucket>
aws s3api get-bucket-acl --bucket <bucket>
aws s3api get-public-access-block --bucket <bucket>
aws s3 ls s3://<bucket> --debug 2>&1 | grep "Final credentials"
```

## Cross-Account AssumeRole

```bash
# Try manually
aws sts assume-role \
  --role-arn arn:aws:iam::<target-account>:role/<role> \
  --role-session-name test-session

# If AccessDenied, check:
# 1. Trust policy of target role allows caller's ARN
# 2. Caller has sts:AssumeRole in their own account
# 3. No SCP blocks sts:AssumeRole in either account

aws iam get-role --role-name <role> | jq '.Role.AssumeRolePolicyDocument'
```

## SSO / Identity Center Sessions

```bash
aws sso login --profile <profile>
aws configure list-profiles
aws sts get-caller-identity --profile <profile>

# Clear stale tokens (-f so the login still runs when the cache is already empty)
rm -f ~/.aws/sso/cache/*.json && aws sso login --profile <profile>
```

## CloudTrail — Find What Was Denied

```bash
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=AssumeRole \
  --start-time "2024-01-01T00:00:00Z" --max-results 10

# Keep only events that carry an error code
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=Username,AttributeValue=<username> \
  | jq '.Events[] | select(.CloudTrailEvent | fromjson | .errorCode != null)'
```
25
skills/aws-skills/README.md
Normal file
@ -0,0 +1,25 @@
<!-- BEGIN_TF_DOCS -->
## Requirements

No requirements.

## Providers

No providers.

## Modules

No modules.

## Resources

No resources.

## Inputs

No inputs.

## Outputs

No outputs.
<!-- END_TF_DOCS -->
23
skills/aws-skills/SKILL.md
Normal file
@ -0,0 +1,23 @@
---
name: aws-skills
description: "AWS development with infrastructure automation and cloud architecture patterns"
risk: safe
source: "https://github.com/zxkane/aws-skills"
date_added: "2026-02-27"
---

# AWS Skills

## Overview

AWS development with infrastructure automation and cloud architecture patterns.

## When to Use This Skill

Use this skill for AWS development work that involves infrastructure automation and cloud architecture patterns.

## Instructions

This skill provides guidance and patterns for AWS development with infrastructure automation and cloud architecture patterns.

For more information, see the [source repository](https://github.com/zxkane/aws-skills).
180
skills/azure-devops-pipeline/SKILL.md
Normal file
@ -0,0 +1,180 @@
---
name: azure-devops-pipeline
description: Generates Azure DevOps pipeline YAML using EKS-Pool with nonprod auto-deploy and prod manual approval gate. Always load this skill first, then load the type-specific skill before generating any YAML.
---

## What I do

Guide the generation of a complete `azure-pipelines.yml` file for a self-hosted EKS-Pool Azure DevOps agent pool. I define all shared standards. You MUST also load the appropriate type skill before generating YAML:

- Lambda deployments → load `azure-pipeline-lambda`
- Ansible playbooks → load `azure-pipeline-ansible`
- Docker builds → load `azure-pipeline-docker`

## IMPORTANT — do not generate YAML without loading a type skill

STOP. Before generating any pipeline YAML, you MUST load the type skill that matches the requested pipeline type:
- `azure-pipeline-lambda` for Lambda
- `azure-pipeline-ansible` for Ansible
- `azure-pipeline-docker` for Docker

Generate nothing until that skill is loaded.

## Required inputs — ask the user for these before generating

1. **Service/repo name** — used in display names and tags
2. **Pipeline type** — `lambda` | `ansible` | `docker`
3. **Target tier** — `nonprod` | `prod`
4. **Trigger branch** — branch that triggers auto-deploy (default: `main`)
5. **Secret sources** — which are in use: `ADO variable groups` | `AWS SSM/Secrets Manager` | `Vault/OpenBao` (can be multiple)
6. **ADO variable group name(s)** — if ADO variable groups selected

## Pipeline skeleton — always use this structure

```yaml
trigger:
  branches:
    include:
      - <trigger-branch>

pool: EKS-Pool

stages:
  - stage: Lint
    displayName: "Lint"
    jobs:
      - job: Lint
        displayName: "Lint"
        pool: EKS-Pool
        timeoutInMinutes: 30
        continueOnError: false
        steps: []  # type skill fills this in

  - stage: SecurityScan
    displayName: "Security Scan"
    dependsOn: Lint
    condition: succeeded()
    jobs:
      - job: SecurityScan
        displayName: "Security Scan"
        pool: EKS-Pool
        timeoutInMinutes: 30
        continueOnError: false
        steps: []  # type skill fills this in

  - stage: Build
    displayName: "Build"
    dependsOn: SecurityScan
    condition: succeeded()
    jobs:
      - job: Build
        displayName: "Build"
        pool: EKS-Pool
        timeoutInMinutes: 30
        continueOnError: false
        steps: []  # type skill fills this in

  - stage: DeployNonprod
    displayName: "Deploy — Nonprod"
    dependsOn: Build
    condition: succeeded()
    jobs:
      - deployment: DeployNonprod
        displayName: "Deploy to Nonprod"
        pool: EKS-Pool
        timeoutInMinutes: 30
        environment: nonprod
        strategy:
          runOnce:
            deploy:
              steps: []  # type skill fills this in

  - stage: DeployProd
    displayName: "Deploy — Prod"
    dependsOn: DeployNonprod
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/<trigger-branch>'))
    jobs:
      - deployment: DeployProd
        displayName: "Deploy to Prod"
        pool: EKS-Pool
        timeoutInMinutes: 30
        environment: prod  # manual approval gate configured in ADO environment settings
        strategy:
          runOnce:
            deploy:
              steps: []  # type skill fills this in + git tag step below
```

## Prod tier pipelines

When `target tier` is `prod`, omit `DeployNonprod` entirely and change `DeployProd`'s `dependsOn` to `Build`. The pipeline contains only `Lint` → `SecurityScan` → `Build` → `DeployProd` with the manual approval gate.

When `target tier` is `nonprod`, omit `DeployProd` entirely.

## Git tagging on prod deploy

Add this as the final step inside `DeployProd`'s steps (prod tier only):

```yaml
- script: |
    git config user.email "azdo-pipeline@$(System.TeamProject)"
    git config user.name "Azure DevOps Pipeline"
    # Read the env-mapped values below; parameter expansion strips the URL scheme
    git remote set-url origin "https://x-token:${SYSTEM_ACCESSTOKEN}@${BUILD_REPOSITORY_URI#https://}"
    git tag $(Build.BuildNumber) $(Build.SourceVersion)
    git push origin $(Build.BuildNumber)
  displayName: "Tag commit with build number"
  env:
    SYSTEM_ACCESSTOKEN: $(System.AccessToken)
    BUILD_REPOSITORY_URI: $(Build.Repository.Uri)
```

## Secret handling patterns

Emit the correct block(s) based on declared secret sources:

### ADO variable groups
```yaml
variables:
  - group: <variable-group-name>
```
Reference values as `$(VAR_NAME)` throughout the pipeline.

### AWS SSM Parameter Store
```yaml
- script: |
    VALUE=$(aws ssm get-parameter \
      --name "/myapp/mykey" \
      --with-decryption \
      --query "Parameter.Value" \
      --output text)
    echo "##vso[task.setvariable variable=MY_VAR;issecret=true]$VALUE"
  displayName: "Fetch secret from SSM"
```

### AWS Secrets Manager
```yaml
- script: |
    VALUE=$(aws secretsmanager get-secret-value \
      --secret-id "myapp/mykey" \
      --query "SecretString" \
      --output text)
    echo "##vso[task.setvariable variable=MY_VAR;issecret=true]$VALUE"
  displayName: "Fetch secret from Secrets Manager"
```

### Vault / OpenBao
```yaml
- script: |
    VALUE=$(vault kv get -field=mykey secret/myapp/mykey)
    echo "##vso[task.setvariable variable=MY_VAR;issecret=true]$VALUE"
  displayName: "Fetch secret from Vault"
  env:
    VAULT_ADDR: $(VAULT_ADDR)
    VAULT_TOKEN: $(VAULT_TOKEN)
```
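
All three runtime-fetch patterns end with the same `##vso[task.setvariable ...]` logging command. As a sketch, that line could be factored into a small shell helper inside the script step. The name `set_ado_secret` is hypothetical, and the helper assumes a single-line value.

```bash
# Emit the Azure DevOps logging command that stores a value as a secret
# pipeline variable for later steps in the same job.
# Assumes the value is a single line.
set_ado_secret() {
  local name="$1" value="$2"
  printf '##vso[task.setvariable variable=%s;issecret=true]%s\n' "$name" "$value"
}
```

Inside a script step this would be called as `set_ado_secret MY_VAR "$VALUE"` after fetching `VALUE` from SSM, Secrets Manager, or Vault.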

## Hard rules — always follow these

- `pool: EKS-Pool` on every job — no exceptions
- `timeoutInMinutes: 30` on every job
- `continueOnError: false` at **job level** on every job (not step level). Step-level `continueOnError` may be omitted.
- No secrets hardcoded in YAML — all via variable groups or runtime fetch
- Every stage and job has a `displayName:` set
- `pool: EKS-Pool` must appear at job level, not stage level, to ensure it applies correctly
145
skills/azure-pipeline-ansible/SKILL.md
Normal file
@ -0,0 +1,145 @@
---
name: azure-pipeline-ansible
description: Extends azure-devops-pipeline for Ansible playbook runs. Handles syntax check, galaxy install, vault passwords, SSH key injection, check mode on nonprod, and dynamic AWS EC2 inventory. Always load azure-devops-pipeline first.
---

## What I add

Type-specific steps for Ansible pipelines. Merge these into the skeleton from `azure-devops-pipeline`.

## Additional required inputs — ask the user

1. **Playbook path** — e.g. `playbooks/site.yml`
2. **Inventory source** — `static` | `dynamic-aws-ec2`
3. **Ansible Vault in use** — `yes` | `no`
4. **ADO secret variable name for vault password** — if vault in use, e.g. `ANSIBLE_VAULT_PASSWORD`
5. **ADO secret variable name for SSH private key** — e.g. `ANSIBLE_SSH_KEY`
6. **Ansible version to pin** — e.g. `9.2.0`
7. **Run --check mode on nonprod before real apply** — `yes` (default) | `no`

## Lint stage steps

```yaml
- script: |
    # ANSIBLE_VERSION comes from the env: mapping below, so read it as a shell variable
    pip install "ansible==${ANSIBLE_VERSION}" ansible-lint
    ansible-lint <playbook-path> --profile production
  displayName: "Lint — ansible-lint"
  env:
    ANSIBLE_VERSION: <ansible-version>
```

## Security scan stage steps

```yaml
- script: |
    pip install "ansible==${ANSIBLE_VERSION}" ansible-lint
    # ansible-lint has no `security` profile; `safety` is the closest built-in one
    ansible-lint <playbook-path> --profile safety \
      --sarif-file ansible-lint-security.sarif || true
    ansible-galaxy install -r requirements.yml --force
  displayName: "Security scan — ansible-lint safety profile"
  env:
    ANSIBLE_VERSION: <ansible-version>
- task: PublishBuildArtifacts@1
  inputs:
    pathToPublish: ansible-lint-security.sarif
    artifactName: security-scan
  displayName: "Publish scan results"
```

## Build stage steps

```yaml
- script: |
    pip install "ansible==${ANSIBLE_VERSION}"
    [ -f requirements.yml ] && ansible-galaxy install -r requirements.yml || true
    ansible-playbook <playbook-path> --syntax-check -i <inventory-file>
  displayName: "Validate — syntax check and galaxy install"
  env:
    ANSIBLE_VERSION: <ansible-version>
```

Note: for dynamic-aws-ec2 inventory, replace `-i <inventory-file>` with `-i aws_ec2.yml` and ensure `aws_ec2.yml` exists in the repo with the `amazon.aws.aws_ec2` plugin configured.

## Deploy stage steps

### Step order — always emit in this order

1. Write SSH key to temp file
2. Write vault password to temp file (if vault in use)
3. Check mode run (nonprod only, if enabled)
4. Real playbook run
5. Clean up SSH key (condition: always)
6. Clean up vault password (condition: always)

### SSH key injection (always include)

```yaml
- script: |
    # Read the secret from the env: mapping so a multi-line key is not inlined into the script
    umask 077
    printf '%s\n' "$ANSIBLE_SSH_KEY" > /tmp/ansible_ssh_key
    chmod 600 /tmp/ansible_ssh_key
  displayName: "Inject SSH key"
  env:
    ANSIBLE_SSH_KEY: $(ANSIBLE_SSH_KEY)
```

### Vault password file (include only if vault in use)

```yaml
- script: |
    umask 077
    printf '%s\n' "$ANSIBLE_VAULT_PASSWORD" > /tmp/vault_pass
    chmod 600 /tmp/vault_pass
  displayName: "Write vault password file"
  env:
    ANSIBLE_VAULT_PASSWORD: $(ANSIBLE_VAULT_PASSWORD)
```

### Check mode run (nonprod only, if enabled)

```yaml
- script: |
    VAULT_ARGS=""
    [ -f /tmp/vault_pass ] && VAULT_ARGS="--vault-password-file /tmp/vault_pass"
    ansible-playbook <playbook-path> \
      -i <inventory> \
      --check \
      --diff \
      --private-key /tmp/ansible_ssh_key \
      $VAULT_ARGS
  displayName: "Dry run — check mode"
```

### Real run

```yaml
- script: |
    VAULT_ARGS=""
    [ -f /tmp/vault_pass ] && VAULT_ARGS="--vault-password-file /tmp/vault_pass"
    ansible-playbook <playbook-path> \
      -i <inventory> \
      --diff \
      --private-key /tmp/ansible_ssh_key \
      $VAULT_ARGS
  displayName: "Apply playbook"
```

### Cleanup (always at end of deploy steps — condition: always())

```yaml
- script: rm -f /tmp/ansible_ssh_key
  displayName: "Clean up SSH key"
  condition: always()

- script: rm -f /tmp/vault_pass
  displayName: "Clean up vault password file"
  condition: always()
```

## Hard rules for Ansible

- Always pin the Ansible version with a quoted pip specifier, `"ansible==${ANSIBLE_VERSION}"`, and never use `latest`. An unquoted `==` may fail in some shells, and `$(ANSIBLE_VERSION)` is only expanded by ADO when a pipeline variable of that name exists, so read the value from the step's `env:` mapping.
- Always clean up SSH key and vault password files with `condition: always()` — they must be removed even if the playbook fails
- Always include `--diff` on real runs so changes are visible in pipeline logs
- SSH key file permissions must be `600`, because SSH refuses private keys with broader permissions
- Use shell variable expansion (`VAULT_ARGS=""`) rather than subshell substitution in the step script to avoid bash syntax issues in ADO agents
- For dynamic inventory, AWS credentials come from the OIDC service connection environment — same pattern as Lambda
- `requirements.yml` must exist in the repo if galaxy install step is included; if uncertain, wrap with `[ -f requirements.yml ] && ansible-galaxy install -r requirements.yml || true`
160
skills/azure-pipeline-docker/SKILL.md
Normal file
@ -0,0 +1,160 @@
---
name: azure-pipeline-docker
description: Extends azure-devops-pipeline for Docker image builds and pushes. Handles buildx with layer caching, Trivy scanning, ECR and ACR login, and a git-SHA/tag tagging strategy. Always load azure-devops-pipeline first.
---

## What I add

Type-specific steps for Docker image pipelines. Merge these into the skeleton from `azure-devops-pipeline`.

## Additional required inputs — ask the user

1. **Registry type** — `ECR` | `ACR`
2. **Registry URL** — e.g. `123456789.dkr.ecr.us-east-1.amazonaws.com` or `myregistry.azurecr.io`
3. **Image repository name** — e.g. `myapp/api`
4. **Dockerfile path** — default `./Dockerfile`
5. **AWS region** — required if ECR
6. **AWS service connection name** — required if ECR
7. **ACR service connection name** — required if ACR

## Lint stage steps

```yaml
- script: |
    docker run --rm -i hadolint/hadolint < <dockerfile-path>
  displayName: "Lint — hadolint Dockerfile"
```

## Security scan stage steps

The security scan builds the image locally and runs Trivy against it **before** pushing. This ensures vulnerabilities are caught pre-push.

```yaml
- script: |
    # --load keeps the scan image in the local Docker daemon; nothing is pushed
    docker buildx build --load \
      -t scan-target:$(Build.SourceVersion) \
      -f <dockerfile-path> \
      .
    # Mount the working directory so the report is written to the agent,
    # not left inside the Trivy container
    docker run --rm \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v "$PWD":/work \
      aquasec/trivy:latest image \
      --exit-code 1 \
      --severity HIGH,CRITICAL \
      --format json \
      --output /work/trivy-results.json \
      scan-target:$(Build.SourceVersion)
  displayName: "Security scan — Trivy"
- task: PublishBuildArtifacts@1
  inputs:
    pathToPublish: trivy-results.json
    artifactName: security-scan
  condition: always()
  displayName: "Publish Trivy results"
```

Note: `condition: always()` on the publish step ensures results are available even when Trivy exits 1. The `--exit-code 1` on the scan step itself still fails the pipeline on HIGH/CRITICAL findings.

## Build stage steps

### Step order — always emit in this order

1. Registry login
2. docker buildx build + push

### Registry login — ECR

```yaml
- script: |
    aws ecr get-login-password --region <aws-region> | \
      docker login --username AWS --password-stdin <registry-url>
  displayName: "Login — ECR"
  env:
    AWS_DEFAULT_REGION: <aws-region>
  # Wire the OIDC service connection at the job level, not inside the script step.
  # In the job or deployment job that contains this step, set:
  #
  #   job: Build
  #   pool: EKS-Pool
  #   container: {}   # omit if not containerised
  #   services:
  #     ...
  #
  # For OIDC federation, the AWSCLI task approach is preferred.
  # Alternatively, wrap with AWSShellScript@1:
  #
  #   - task: AWSShellScript@1
  #     inputs:
  #       awsCredentials: <aws-service-connection-name>
  #       regionName: <aws-region>
  #       scriptType: inline
  #       inlineScript: |
  #         aws ecr get-login-password --region <aws-region> | \
  #           docker login --username AWS --password-stdin <registry-url>
  #     displayName: "Login — ECR (via service connection)"
```

AWS credentials come from the OIDC service connection configured on the job — do not add any `AWS_ACCESS_KEY_ID` or `AWS_SECRET_ACCESS_KEY` env vars.

### Registry login — ACR

```yaml
- task: Docker@2
  inputs:
    command: login
    containerRegistry: <acr-service-connection-name>
  displayName: "Login — ACR"
```

### Build and push — nonprod

```yaml
- script: |
    docker buildx create --use --name pipeline-builder 2>/dev/null || \
      docker buildx use pipeline-builder
    docker buildx build \
      --cache-from type=registry,ref=<registry-url>/<image-repo>:cache \
      --cache-to type=registry,ref=<registry-url>/<image-repo>:cache,mode=max \
      --tag <registry-url>/<image-repo>:$(Build.SourceVersion) \
      --tag <registry-url>/<image-repo>:latest \
      --file <dockerfile-path> \
      --push \
      .
  displayName: "Build and push — nonprod"
```

### Build and push — prod

```yaml
- script: |
    docker buildx create --use --name pipeline-builder 2>/dev/null || \
      docker buildx use pipeline-builder
    docker buildx build \
      --cache-from type=registry,ref=<registry-url>/<image-repo>:cache \
      --cache-to type=registry,ref=<registry-url>/<image-repo>:cache,mode=max \
      --tag <registry-url>/<image-repo>:$(Build.SourceBranchName) \
      --tag <registry-url>/<image-repo>:$(Build.SourceVersion) \
      --file <dockerfile-path> \
      --push \
      .
  displayName: "Build and push — prod"
```

## Tagging strategy

| Tier    | Tags applied                           |
|---------|----------------------------------------|
| Nonprod | `<git-sha>`, `latest`                  |
| Prod    | `<git-tag / branch-name>`, `<git-sha>` |

Never tag prod images as `latest`.
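
The tagging table can be expressed as a small shell function, which makes the "no `latest` on prod" rule easy to check. This is a sketch only; `image_tags` and its argument order are hypothetical and not part of the skill.

```bash
# Print the image tags for a tier, one per line.
#   $1 = tier (nonprod | prod), $2 = git SHA, $3 = git tag or branch name
image_tags() {
  local tier="$1" sha="$2" ref_name="$3"
  case "$tier" in
    nonprod) printf '%s\n' "$sha" latest ;;
    prod)    printf '%s\n' "$ref_name" "$sha" ;;
    *)       echo "unknown tier: $tier" >&2; return 1 ;;
  esac
}
```

In a pipeline, the values would come from `$(Build.SourceVersion)` and `$(Build.SourceBranchName)`, matching the build steps above.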

## Hard rules for Docker

- Always use `docker buildx` — never plain `docker build`
- Trivy scan must run before push — the scan in SecurityScan stage uses a locally built image, not a registry pull
- `--exit-code 1` on Trivy is non-negotiable — HIGH and CRITICAL findings must fail the pipeline
- Never tag prod images as `latest` — prod tags use `$(Build.SourceBranchName)` and `$(Build.SourceVersion)` only
- Build args containing secrets must come from ADO variables injected via `env:` — never hardcoded in YAML
- Registry layer cache lives in the registry itself (not ADO pipeline cache) for reproducibility across EKS-Pool agents
- ECR login uses OIDC credentials only — never hardcode `AWS_ACCESS_KEY_ID` or `AWS_SECRET_ACCESS_KEY`
- The `docker buildx create --use ... || docker buildx use ...` pattern is required to handle re-use across runs without error
158
skills/azure-pipeline-lambda/SKILL.md
Normal file
@ -0,0 +1,158 @@
---
name: azure-pipeline-lambda
description: Extends azure-devops-pipeline for AWS Lambda deployments. Handles zip and container packaging, OIDC credentials, function update and alias promotion. Always load azure-devops-pipeline first.
---

## What I add

Type-specific steps for AWS Lambda pipelines. Merge these into the skeleton from `azure-devops-pipeline`.

## Additional required inputs — ask the user

1. **Function name** — the Lambda function name in AWS
2. **AWS region** — e.g. `us-east-1`
3. **AWS service connection name** — the ADO AWS OIDC service connection name
4. **Packaging method** — `zip` | `container`
5. **Deployment method** — `aws-cli` | `SAM` | `CDK`
6. **Runtime** — `python3.x` | `nodejs20.x` | other (for linting tool selection)
7. **Alias to update** — e.g. `nonprod` or `prod` (matches target tier)

## Lint stage steps

### Python runtime
```yaml
- script: pip install pylint && pylint src/ --fail-under=7
  displayName: "Lint — pylint"
- script: |
    pip install cfn-lint
    cfn-lint template.yaml 2>/dev/null || true
  displayName: "Lint — cfn-lint (CloudFormation, if present)"
  continueOnError: true
```

### Node runtime
```yaml
- script: npm ci && npx eslint src/
  displayName: "Lint — eslint"
```

## Security scan stage steps

### Python runtime
```yaml
- script: |
    pip install pip-audit
    # --format selects JSON; --output takes the report file path
    pip-audit -r requirements.txt --format json --output pip-audit-results.json
  displayName: "Security scan — pip-audit"
- task: PublishBuildArtifacts@1
  inputs:
    pathToPublish: pip-audit-results.json
    artifactName: security-scan
  displayName: "Publish scan results"
```

### Node runtime
```yaml
- script: |
    npm audit --json > npm-audit-results.json || true
    npm audit --audit-level=high
  displayName: "Security scan — npm audit"
- task: PublishBuildArtifacts@1
  inputs:
    pathToPublish: npm-audit-results.json
    artifactName: security-scan
  displayName: "Publish scan results"
```

## Build stage steps (zip packaging)

```yaml
- script: |
    mkdir -p package
    # Python: install deps into package dir
    pip install -r requirements.txt -t ./package
    # Copy handler (adjust filename as needed)
    cp *.py ./package/
    # Remove dev/test artifacts
    find ./package -name "*.pyc" -delete
    find ./package -name "__pycache__" -type d -exec rm -rf {} + 2>/dev/null || true
    find ./package -name "*.dist-info" -type d -exec rm -rf {} + 2>/dev/null || true
    cd package && zip -r ../$(Build.BuildNumber).zip .
  displayName: "Package Lambda — zip (Python)"
- task: PublishBuildArtifacts@1
  inputs:
    pathToPublish: $(Build.BuildNumber).zip
    artifactName: lambda-package
  displayName: "Publish Lambda artifact"
```

For Node runtime, replace the pip install/cp lines with:
```yaml
- script: |
    npm ci --omit=dev
    zip -r $(Build.BuildNumber).zip . \
      --exclude "*.git*" \
      --exclude "*node_modules/.cache*" \
      --exclude "*test*" \
      --exclude "*.spec.*" \
      --exclude "*.test.*"
  displayName: "Package Lambda — zip (Node)"
```
|
||||
|
||||
## Build stage steps (container packaging)
|
||||
|
||||
Use the full `azure-pipeline-docker` steps for the container build. Reference the resulting image URI in the Lambda deploy step by passing `--image-uri` instead of `--zip-file`.
|
||||
|
||||
## Deploy stage steps (aws-cli method)
|
||||
|
||||
```yaml
- task: AWSCLI@1
  inputs:
    awsCredentials: <aws-service-connection-name>
    regionName: <aws-region>
    awsCommand: lambda
    awsSubCommand: update-function-code
    awsArguments: >-
      --function-name <function-name>
      --zip-file fileb://$(Pipeline.Workspace)/lambda-package/$(Build.BuildNumber).zip
  displayName: "Deploy — update function code"

- task: AWSCLI@1
  inputs:
    awsCredentials: <aws-service-connection-name>
    regionName: <aws-region>
    awsCommand: lambda
    awsSubCommand: wait
    awsArguments: function-updated --function-name <function-name>
  displayName: "Deploy — wait for update"

- task: AWSCLI@1
  inputs:
    awsCredentials: <aws-service-connection-name>
    regionName: <aws-region>
    awsCommand: lambda
    awsSubCommand: publish-version
    awsArguments: --function-name <function-name>
  displayName: "Deploy — publish version"

- script: |
    VERSION=$(aws lambda list-versions-by-function \
      --function-name <function-name> \
      --query "Versions[-1].Version" \
      --output text)
    aws lambda update-alias \
      --function-name <function-name> \
      --name <alias-name> \
      --function-version "$VERSION"
  displayName: "Deploy — update alias"
  env:
    AWS_DEFAULT_REGION: <aws-region>
```
|
||||
|
||||
## Hard rules for Lambda
|
||||
|
||||
- Always use OIDC service connection — never hardcode `AWS_ACCESS_KEY_ID` or `AWS_SECRET_ACCESS_KEY` in the pipeline YAML
|
||||
- Always wait for `function-updated` before publishing version — skipping this causes race conditions
|
||||
- Always update alias after publishing version — direct function invocation without alias is not acceptable
|
||||
- Zip packaging: always exclude `.git`, `__pycache__`, `*.pyc`, `node_modules/.cache`, test files
|
||||
- Multi-line `awsArguments` in the AWSCLI task must use `>-` (folded scalar with the trailing newline stripped), not `>`, which keeps a trailing newline that can break the command
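
The first rule can be checked mechanically before a pipeline file is committed. The following is a minimal sketch under stated assumptions: `check_no_hardcoded_aws_keys` is a hypothetical helper, and the two patterns (an `AKIA`/`ASIA`-prefixed access key ID, and a secret key assigned a literal rather than a `$(...)` pipeline variable) are heuristics, not a complete secret scanner.

```bash
#!/bin/bash
set -Eeuo pipefail

# Hypothetical helper: fail if a pipeline file appears to contain a literal
# AWS access key ID or an inline secret access key value.
check_no_hardcoded_aws_keys() {
    local -r file="$1"

    # AKIA (long-term) or ASIA (temporary) followed by 16 uppercase
    # alphanumerics looks like an access key ID.
    if grep -Eq '(AKIA|ASIA)[0-9A-Z]{16}' -- "$file"; then
        echo "ERROR: possible AWS access key ID in $file" >&2
        return 1
    fi

    # A secret key assigned a literal value; values starting with "$" are
    # assumed to be pipeline variable references and are allowed.
    if grep -Eq 'AWS_SECRET_ACCESS_KEY[[:space:]]*[:=][[:space:]]*[^$[:space:]]' -- "$file"; then
        echo "ERROR: inline AWS_SECRET_ACCESS_KEY value in $file" >&2
        return 1
    fi

    return 0
}

# Usage sketch: check_no_hardcoded_aws_keys azure-pipelines.yml
```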
|
||||
25
skills/backend-patterns/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
598
skills/backend-patterns/SKILL.md
Normal file
|
|
@ -0,0 +1,598 @@
|
|||
---
|
||||
name: backend-patterns
|
||||
description: Backend architecture patterns, API design, database optimization, and server-side best practices for Node.js, Express, and Next.js API routes.
|
||||
origin: ECC
|
||||
---
|
||||
|
||||
# Backend Development Patterns
|
||||
|
||||
Backend architecture patterns and best practices for scalable server-side applications.
|
||||
|
||||
## When to Activate
|
||||
|
||||
- Designing REST or GraphQL API endpoints
|
||||
- Implementing repository, service, or controller layers
|
||||
- Optimizing database queries (N+1, indexing, connection pooling)
|
||||
- Adding caching (Redis, in-memory, HTTP cache headers)
|
||||
- Setting up background jobs or async processing
|
||||
- Structuring error handling and validation for APIs
|
||||
- Building middleware (auth, logging, rate limiting)
|
||||
|
||||
## API Design Patterns
|
||||
|
||||
### RESTful API Structure
|
||||
|
||||
```typescript
|
||||
// ✅ Resource-based URLs
|
||||
GET /api/markets # List resources
|
||||
GET /api/markets/:id # Get single resource
|
||||
POST /api/markets # Create resource
|
||||
PUT /api/markets/:id # Replace resource
|
||||
PATCH /api/markets/:id # Update resource
|
||||
DELETE /api/markets/:id # Delete resource
|
||||
|
||||
// ✅ Query parameters for filtering, sorting, pagination
|
||||
GET /api/markets?status=active&sort=volume&limit=20&offset=0
|
||||
```
|
||||
|
||||
### Repository Pattern
|
||||
|
||||
```typescript
|
||||
// Abstract data access logic
|
||||
interface MarketRepository {
|
||||
findAll(filters?: MarketFilters): Promise<Market[]>
|
||||
findById(id: string): Promise<Market | null>
findByIds(ids: string[]): Promise<Market[]> // batch lookup used by MarketService.searchMarkets
|
||||
create(data: CreateMarketDto): Promise<Market>
|
||||
update(id: string, data: UpdateMarketDto): Promise<Market>
|
||||
delete(id: string): Promise<void>
|
||||
}
|
||||
|
||||
class SupabaseMarketRepository implements MarketRepository {
|
||||
async findAll(filters?: MarketFilters): Promise<Market[]> {
|
||||
let query = supabase.from('markets').select('*')
|
||||
|
||||
if (filters?.status) {
|
||||
query = query.eq('status', filters.status)
|
||||
}
|
||||
|
||||
if (filters?.limit) {
|
||||
query = query.limit(filters.limit)
|
||||
}
|
||||
|
||||
const { data, error } = await query
|
||||
|
||||
if (error) throw new Error(error.message)
|
||||
return data
|
||||
}
|
||||
|
||||
// Other methods...
|
||||
}
|
||||
```
|
||||
|
||||
### Service Layer Pattern
|
||||
|
||||
```typescript
|
||||
// Business logic separated from data access
|
||||
class MarketService {
|
||||
constructor(private marketRepo: MarketRepository) {}
|
||||
|
||||
async searchMarkets(query: string, limit: number = 10): Promise<Market[]> {
|
||||
// Business logic
|
||||
const embedding = await generateEmbedding(query)
|
||||
const results = await this.vectorSearch(embedding, limit)
|
||||
|
||||
// Fetch full data
|
||||
const markets = await this.marketRepo.findByIds(results.map(r => r.id))
|
||||
|
||||
// Sort by similarity
|
||||
return markets.sort((a, b) => {
|
||||
const scoreA = results.find(r => r.id === a.id)?.score || 0
|
||||
const scoreB = results.find(r => r.id === b.id)?.score || 0
|
||||
return scoreB - scoreA // descending: highest similarity score first
|
||||
})
|
||||
}
|
||||
|
||||
private async vectorSearch(embedding: number[], limit: number) {
|
||||
// Vector search implementation
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Middleware Pattern
|
||||
|
||||
```typescript
|
||||
// Request/response processing pipeline
|
||||
export function withAuth(handler: NextApiHandler): NextApiHandler {
|
||||
return async (req, res) => {
|
||||
const token = req.headers.authorization?.replace('Bearer ', '')
|
||||
|
||||
if (!token) {
|
||||
return res.status(401).json({ error: 'Unauthorized' })
|
||||
}
|
||||
|
||||
try {
|
||||
const user = await verifyToken(token)
|
||||
req.user = user
|
||||
return handler(req, res)
|
||||
} catch (error) {
|
||||
return res.status(401).json({ error: 'Invalid token' })
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Usage
|
||||
export default withAuth(async (req, res) => {
|
||||
// Handler has access to req.user
|
||||
})
|
||||
```
|
||||
|
||||
## Database Patterns
|
||||
|
||||
### Query Optimization
|
||||
|
||||
```typescript
|
||||
// ✅ GOOD: Select only needed columns
|
||||
const { data } = await supabase
|
||||
.from('markets')
|
||||
.select('id, name, status, volume')
|
||||
.eq('status', 'active')
|
||||
.order('volume', { ascending: false })
|
||||
.limit(10)
|
||||
|
||||
// ❌ BAD: Select everything
|
||||
const { data } = await supabase
|
||||
.from('markets')
|
||||
.select('*')
|
||||
```
|
||||
|
||||
### N+1 Query Prevention
|
||||
|
||||
```typescript
|
||||
// ❌ BAD: N+1 query problem
|
||||
const markets = await getMarkets()
|
||||
for (const market of markets) {
|
||||
market.creator = await getUser(market.creator_id) // N queries
|
||||
}
|
||||
|
||||
// ✅ GOOD: Batch fetch
|
||||
const markets = await getMarkets()
|
||||
const creatorIds = markets.map(m => m.creator_id)
|
||||
const creators = await getUsers(creatorIds) // 1 query
|
||||
const creatorMap = new Map(creators.map(c => [c.id, c]))
|
||||
|
||||
markets.forEach(market => {
|
||||
market.creator = creatorMap.get(market.creator_id)
|
||||
})
|
||||
```
|
||||
|
||||
### Transaction Pattern
|
||||
|
||||
```typescript
|
||||
async function createMarketWithPosition(
|
||||
marketData: CreateMarketDto,
|
||||
positionData: CreatePositionDto
|
||||
) {
|
||||
// Use Supabase transaction
|
||||
const { data, error } = await supabase.rpc('create_market_with_position', {
|
||||
market_data: marketData,
|
||||
position_data: positionData
|
||||
})
|
||||
|
||||
if (error || data?.success === false) throw new Error('Transaction failed') // the SQL function reports failures via { success: false }
|
||||
return data
|
||||
}
|
||||
|
||||
// SQL function in Supabase
|
||||
CREATE OR REPLACE FUNCTION create_market_with_position(
|
||||
market_data jsonb,
|
||||
position_data jsonb
|
||||
)
|
||||
RETURNS jsonb
|
||||
LANGUAGE plpgsql
|
||||
AS $$
|
||||
BEGIN
|
||||
-- The function body runs inside a single transaction
INSERT INTO markets SELECT * FROM jsonb_populate_record(NULL::markets, market_data);
INSERT INTO positions SELECT * FROM jsonb_populate_record(NULL::positions, position_data);
|
||||
RETURN jsonb_build_object('success', true);
|
||||
EXCEPTION
|
||||
WHEN OTHERS THEN
|
||||
-- Rollback happens automatically
|
||||
RETURN jsonb_build_object('success', false, 'error', SQLERRM);
|
||||
END;
|
||||
$$;
|
||||
```
|
||||
|
||||
## Caching Strategies
|
||||
|
||||
### Redis Caching Layer
|
||||
|
||||
```typescript
|
||||
class CachedMarketRepository implements MarketRepository {
|
||||
constructor(
|
||||
private baseRepo: MarketRepository,
|
||||
private redis: RedisClient
|
||||
) {}
|
||||
|
||||
async findById(id: string): Promise<Market | null> {
|
||||
// Check cache first
|
||||
const cached = await this.redis.get(`market:${id}`)
|
||||
|
||||
if (cached) {
|
||||
return JSON.parse(cached)
|
||||
}
|
||||
|
||||
// Cache miss - fetch from database
|
||||
const market = await this.baseRepo.findById(id)
|
||||
|
||||
if (market) {
|
||||
// Cache for 5 minutes
|
||||
await this.redis.setex(`market:${id}`, 300, JSON.stringify(market))
|
||||
}
|
||||
|
||||
return market
|
||||
}
|
||||
|
||||
async invalidateCache(id: string): Promise<void> {
|
||||
await this.redis.del(`market:${id}`)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Cache-Aside Pattern
|
||||
|
||||
```typescript
|
||||
async function getMarketWithCache(id: string): Promise<Market> {
|
||||
const cacheKey = `market:${id}`
|
||||
|
||||
// Try cache
|
||||
const cached = await redis.get(cacheKey)
|
||||
if (cached) return JSON.parse(cached)
|
||||
|
||||
// Cache miss - fetch from DB
|
||||
const market = await db.markets.findUnique({ where: { id } })
|
||||
|
||||
if (!market) throw new Error('Market not found')
|
||||
|
||||
// Update cache
|
||||
await redis.setex(cacheKey, 300, JSON.stringify(market))
|
||||
|
||||
return market
|
||||
}
|
||||
```
|
||||
|
||||
## Error Handling Patterns
|
||||
|
||||
### Centralized Error Handler
|
||||
|
||||
```typescript
|
||||
class ApiError extends Error {
|
||||
constructor(
|
||||
public statusCode: number,
|
||||
public message: string,
|
||||
public isOperational = true
|
||||
) {
|
||||
super(message)
|
||||
Object.setPrototypeOf(this, ApiError.prototype)
|
||||
}
|
||||
}
|
||||
|
||||
export function errorHandler(error: unknown, req: Request): Response {
|
||||
if (error instanceof ApiError) {
|
||||
return NextResponse.json({
|
||||
success: false,
|
||||
error: error.message
|
||||
}, { status: error.statusCode })
|
||||
}
|
||||
|
||||
if (error instanceof z.ZodError) {
|
||||
return NextResponse.json({
|
||||
success: false,
|
||||
error: 'Validation failed',
|
||||
details: error.issues
|
||||
}, { status: 400 })
|
||||
}
|
||||
|
||||
// Log unexpected errors
|
||||
console.error('Unexpected error:', error)
|
||||
|
||||
return NextResponse.json({
|
||||
success: false,
|
||||
error: 'Internal server error'
|
||||
}, { status: 500 })
|
||||
}
|
||||
|
||||
// Usage
|
||||
export async function GET(request: Request) {
|
||||
try {
|
||||
const data = await fetchData()
|
||||
return NextResponse.json({ success: true, data })
|
||||
} catch (error) {
|
||||
return errorHandler(error, request)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Retry with Exponential Backoff
|
||||
|
||||
```typescript
|
||||
async function fetchWithRetry<T>(
|
||||
fn: () => Promise<T>,
|
||||
maxRetries = 3
|
||||
): Promise<T> {
|
||||
let lastError: Error
|
||||
|
||||
for (let i = 0; i < maxRetries; i++) {
|
||||
try {
|
||||
return await fn()
|
||||
} catch (error) {
|
||||
lastError = error as Error
|
||||
|
||||
if (i < maxRetries - 1) {
|
||||
// Exponential backoff: 1s, 2s, 4s
|
||||
const delay = Math.pow(2, i) * 1000
|
||||
await new Promise(resolve => setTimeout(resolve, delay))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
throw lastError!
|
||||
}
|
||||
|
||||
// Usage
|
||||
const data = await fetchWithRetry(() => fetchFromAPI())
|
||||
```
|
||||
|
||||
## Authentication & Authorization
|
||||
|
||||
### JWT Token Validation
|
||||
|
||||
```typescript
|
||||
import jwt from 'jsonwebtoken'
|
||||
|
||||
interface JWTPayload {
|
||||
userId: string
|
||||
email: string
|
||||
role: 'admin' | 'user'
|
||||
}
|
||||
|
||||
export function verifyToken(token: string): JWTPayload {
|
||||
try {
|
||||
const payload = jwt.verify(token, process.env.JWT_SECRET!) as JWTPayload
|
||||
return payload
|
||||
} catch (error) {
|
||||
throw new ApiError(401, 'Invalid token')
|
||||
}
|
||||
}
|
||||
|
||||
export async function requireAuth(request: Request) {
|
||||
const token = request.headers.get('authorization')?.replace('Bearer ', '')
|
||||
|
||||
if (!token) {
|
||||
throw new ApiError(401, 'Missing authorization token')
|
||||
}
|
||||
|
||||
return verifyToken(token)
|
||||
}
|
||||
|
||||
// Usage in API route
|
||||
export async function GET(request: Request) {
|
||||
const user = await requireAuth(request)
|
||||
|
||||
const data = await getDataForUser(user.userId)
|
||||
|
||||
return NextResponse.json({ success: true, data })
|
||||
}
|
||||
```
|
||||
|
||||
### Role-Based Access Control
|
||||
|
||||
```typescript
|
||||
type Permission = 'read' | 'write' | 'delete' | 'admin'
|
||||
|
||||
interface User {
|
||||
id: string
|
||||
role: 'admin' | 'moderator' | 'user'
|
||||
}
|
||||
|
||||
const rolePermissions: Record<User['role'], Permission[]> = {
|
||||
admin: ['read', 'write', 'delete', 'admin'],
|
||||
moderator: ['read', 'write', 'delete'],
|
||||
user: ['read', 'write']
|
||||
}
|
||||
|
||||
export function hasPermission(user: User, permission: Permission): boolean {
|
||||
return rolePermissions[user.role].includes(permission)
|
||||
}
|
||||
|
||||
export function requirePermission(permission: Permission) {
|
||||
return (handler: (request: Request, user: User) => Promise<Response>) => {
|
||||
return async (request: Request) => {
|
||||
const user = await requireAuth(request)
|
||||
|
||||
if (!hasPermission(user, permission)) {
|
||||
throw new ApiError(403, 'Insufficient permissions')
|
||||
}
|
||||
|
||||
return handler(request, user)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Usage - HOF wraps the handler
|
||||
export const DELETE = requirePermission('delete')(
|
||||
async (request: Request, user: User) => {
|
||||
// Handler receives authenticated user with verified permission
|
||||
return new Response('Deleted', { status: 200 })
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
## Rate Limiting
|
||||
|
||||
### Simple In-Memory Rate Limiter
|
||||
|
||||
```typescript
|
||||
class RateLimiter {
|
||||
private requests = new Map<string, number[]>()
|
||||
|
||||
async checkLimit(
|
||||
identifier: string,
|
||||
maxRequests: number,
|
||||
windowMs: number
|
||||
): Promise<boolean> {
|
||||
const now = Date.now()
|
||||
const requests = this.requests.get(identifier) || []
|
||||
|
||||
// Remove old requests outside window
|
||||
const recentRequests = requests.filter(time => now - time < windowMs)
|
||||
|
||||
if (recentRequests.length >= maxRequests) {
|
||||
return false // Rate limit exceeded
|
||||
}
|
||||
|
||||
// Add current request
|
||||
recentRequests.push(now)
|
||||
this.requests.set(identifier, recentRequests)
|
||||
|
||||
return true
|
||||
}
|
||||
}
|
||||
|
||||
const limiter = new RateLimiter()
|
||||
|
||||
export async function GET(request: Request) {
|
||||
const ip = request.headers.get('x-forwarded-for') || 'unknown'
|
||||
|
||||
const allowed = await limiter.checkLimit(ip, 100, 60000) // 100 req/min
|
||||
|
||||
if (!allowed) {
|
||||
return NextResponse.json({
|
||||
error: 'Rate limit exceeded'
|
||||
}, { status: 429 })
|
||||
}
|
||||
|
||||
// Continue with request
|
||||
}
|
||||
```
|
||||
|
||||
## Background Jobs & Queues
|
||||
|
||||
### Simple Queue Pattern
|
||||
|
||||
```typescript
|
||||
class JobQueue<T> {
|
||||
private queue: T[] = []
|
||||
private processing = false
|
||||
|
||||
async add(job: T): Promise<void> {
|
||||
this.queue.push(job)
|
||||
|
||||
if (!this.processing) {
|
||||
this.process()
|
||||
}
|
||||
}
|
||||
|
||||
private async process(): Promise<void> {
|
||||
this.processing = true
|
||||
|
||||
while (this.queue.length > 0) {
|
||||
const job = this.queue.shift()!
|
||||
|
||||
try {
|
||||
await this.execute(job)
|
||||
} catch (error) {
|
||||
console.error('Job failed:', error)
|
||||
}
|
||||
}
|
||||
|
||||
this.processing = false
|
||||
}
|
||||
|
||||
private async execute(job: T): Promise<void> {
|
||||
// Job execution logic
|
||||
}
|
||||
}
|
||||
|
||||
// Usage for indexing markets
|
||||
interface IndexJob {
|
||||
marketId: string
|
||||
}
|
||||
|
||||
const indexQueue = new JobQueue<IndexJob>()
|
||||
|
||||
export async function POST(request: Request) {
|
||||
const { marketId } = await request.json()
|
||||
|
||||
// Add to queue instead of blocking
|
||||
await indexQueue.add({ marketId })
|
||||
|
||||
return NextResponse.json({ success: true, message: 'Job queued' })
|
||||
}
|
||||
```
|
||||
|
||||
## Logging & Monitoring
|
||||
|
||||
### Structured Logging
|
||||
|
||||
```typescript
|
||||
interface LogContext {
|
||||
userId?: string
|
||||
requestId?: string
|
||||
method?: string
|
||||
path?: string
|
||||
[key: string]: unknown
|
||||
}
|
||||
|
||||
class Logger {
|
||||
log(level: 'info' | 'warn' | 'error', message: string, context?: LogContext) {
|
||||
const entry = {
|
||||
timestamp: new Date().toISOString(),
|
||||
level,
|
||||
message,
|
||||
...context
|
||||
}
|
||||
|
||||
console.log(JSON.stringify(entry))
|
||||
}
|
||||
|
||||
info(message: string, context?: LogContext) {
|
||||
this.log('info', message, context)
|
||||
}
|
||||
|
||||
warn(message: string, context?: LogContext) {
|
||||
this.log('warn', message, context)
|
||||
}
|
||||
|
||||
error(message: string, error: Error, context?: LogContext) {
|
||||
this.log('error', message, {
|
||||
...context,
|
||||
error: error.message,
|
||||
stack: error.stack
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
const logger = new Logger()
|
||||
|
||||
// Usage
|
||||
export async function GET(request: Request) {
|
||||
const requestId = crypto.randomUUID()
|
||||
|
||||
logger.info('Fetching markets', {
|
||||
requestId,
|
||||
method: 'GET',
|
||||
path: '/api/markets'
|
||||
})
|
||||
|
||||
try {
|
||||
const markets = await fetchMarkets()
|
||||
return NextResponse.json({ success: true, data: markets })
|
||||
} catch (error) {
|
||||
logger.error('Failed to fetch markets', error as Error, { requestId })
|
||||
return NextResponse.json({ error: 'Internal error' }, { status: 500 })
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Remember**: Backend patterns enable scalable, maintainable server-side applications. Choose patterns that fit your complexity level.
|
||||
25
skills/bash-defensive-patterns/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
46
skills/bash-defensive-patterns/SKILL.md
Normal file
|
|
@ -0,0 +1,46 @@
|
|||
---
|
||||
name: bash-defensive-patterns
|
||||
description: "Master defensive Bash programming techniques for production-grade scripts. Use when writing robust shell scripts, CI/CD pipelines, or system utilities requiring fault tolerance and safety."
|
||||
risk: unknown
|
||||
source: community
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
# Bash Defensive Patterns
|
||||
|
||||
Comprehensive guidance for writing production-ready Bash scripts using defensive programming techniques, error handling, and safety best practices to prevent common pitfalls and ensure reliability.
|
||||
|
||||
## Use this skill when
|
||||
|
||||
- Writing production automation scripts
|
||||
- Building CI/CD pipeline scripts
|
||||
- Creating system administration utilities
|
||||
- Developing error-resilient deployment automation
|
||||
- Writing scripts that must handle edge cases safely
|
||||
- Building maintainable shell script libraries
|
||||
- Implementing comprehensive logging and monitoring
|
||||
- Creating scripts that must work across different platforms
|
||||
|
||||
## Do not use this skill when
|
||||
|
||||
- You need a single ad-hoc shell command, not a script
|
||||
- The target environment requires strict POSIX sh only
|
||||
- The task is unrelated to shell scripting or automation
|
||||
|
||||
## Instructions
|
||||
|
||||
1. Confirm the target shell, OS, and execution environment.
|
||||
2. Enable strict mode and safe defaults from the start.
|
||||
3. Validate inputs, quote variables, and handle files safely.
|
||||
4. Add logging, error traps, and basic tests.
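
The four steps above can be sketched as a small script skeleton. This is an illustration only: `log`, `require_bash_version`, and `validate_positive_int` are hypothetical helper names, not part of the skill.

```bash
#!/bin/bash
# Step 2: strict mode and safe defaults from the first line.
set -Eeuo pipefail

# Step 4: minimal logging plus an ERR trap that reports the failing line.
log() { printf '[%s] %s\n' "$(date +'%Y-%m-%d %H:%M:%S')" "$*" >&2; }
trap 'log "ERROR: command failed on line $LINENO"' ERR

# Step 1: confirm the shell is new enough (hypothetical helper).
require_bash_version() {
    local -r min_major="$1"
    if (( BASH_VERSINFO[0] < min_major )); then
        log "ERROR: bash >= $min_major required, found $BASH_VERSION"
        return 1
    fi
}

# Step 3: validate input before using it (hypothetical helper).
validate_positive_int() {
    [[ "${1:-}" =~ ^[1-9][0-9]*$ ]]
}

# Usage sketch:
# require_bash_version 4
# validate_positive_int "$1" || { log "ERROR: expected a positive integer"; exit 1; }
```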
|
||||
|
||||
## Safety
|
||||
|
||||
- Avoid destructive commands without confirmation or dry-run flags.
|
||||
- Do not run scripts as root unless strictly required.
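
Both safety rules can be expressed as small guards. A minimal sketch, assuming hypothetical helpers `confirm` and `is_root` (neither is defined by the skill):

```bash
#!/bin/bash
set -Eeuo pipefail

# Hypothetical helper: ask for confirmation on stdin; only "y" or "yes"
# (any case) proceeds. EOF or anything else counts as "no".
confirm() {
    local -r prompt="$1"
    local reply=""
    printf '%s [y/N] ' "$prompt" >&2
    read -r reply || true
    [[ "$reply" =~ ^([yY]|[yY][eE][sS])$ ]]
}

# Hypothetical helper: true when the given UID (default: current EUID) is root.
is_root() {
    local -r uid="${1:-$EUID}"
    (( uid == 0 ))
}

# Usage sketch: refuse root, then confirm before a destructive step.
# if is_root; then echo "ERROR: do not run as root" >&2; exit 1; fi
# if confirm "Delete build/ ?"; then rm -rf -- build/; fi
```

Calling the helpers inside `if` keeps a "no" answer from terminating the script under `set -e`.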
|
||||
|
||||
Refer to `resources/implementation-playbook.md` for detailed patterns, checklists, and templates.
|
||||
|
||||
## Resources
|
||||
|
||||
- `resources/implementation-playbook.md` for detailed patterns, checklists, and templates.
|
||||
25
skills/bash-defensive-patterns/resources/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
|
|
@ -0,0 +1,517 @@
|
|||
# Bash Defensive Patterns Implementation Playbook
|
||||
|
||||
This file contains detailed patterns, checklists, and code samples referenced by the skill.
|
||||
|
||||
## Core Defensive Principles
|
||||
|
||||
### 1. Strict Mode
|
||||
Enable bash strict mode at the start of every script to catch errors early.
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -Eeuo pipefail # Exit on error, unset variables, pipe failures
|
||||
```
|
||||
|
||||
**Key flags:**
|
||||
- `set -E`: Inherit ERR trap in functions
|
||||
- `set -e`: Exit on any error (command returns non-zero)
|
||||
- `set -u`: Exit on undefined variable reference
|
||||
- `set -o pipefail`: Pipe fails if any command fails (not just last)
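
The effect of `pipefail` is easiest to see on a pipeline whose first command fails. A minimal sketch; the variable names are chosen for this example only:

```bash
#!/bin/bash
# Each pipeline runs in its own command substitution, so the option changes
# stay local to the demo and do not affect the calling script.
status_without_pipefail=$(set +e; set +o pipefail; false | true; echo "$?")
status_with_pipefail=$(set +e; set -o pipefail; false | true; echo "$?")

echo "without pipefail: $status_without_pipefail"   # 0: only the last command counts
echo "with pipefail:    $status_with_pipefail"      # 1: the failing 'false' is reported
```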
|
||||
|
||||
### 2. Error Trapping and Cleanup
|
||||
Implement proper cleanup on script exit or error.
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -Eeuo pipefail
|
||||
|
||||
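# Create the temp directory before installing the traps, so the EXIT trap
# never runs against an unset or inherited TMPDIR
TMPDIR=$(mktemp -d)

trap 'echo "Error on line $LINENO"' ERR
trap 'echo "Cleaning up..."; rm -rf -- "$TMPDIR"' EXIT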
|
||||
# Script code here
|
||||
```
|
||||
|
||||
### 3. Variable Safety
|
||||
Always quote variables to prevent word splitting and globbing issues.
|
||||
|
||||
```bash
|
||||
# Wrong - unsafe
|
||||
cp $source $dest
|
||||
|
||||
# Correct - safe
|
||||
cp "$source" "$dest"
|
||||
|
||||
# Required variables - fail with message if unset
|
||||
: "${REQUIRED_VAR:?REQUIRED_VAR is not set}"
|
||||
```
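
The word-splitting problem behind the quoting rule can be observed directly by counting arguments. A minimal sketch; `count_args` is a hypothetical helper used only for this demonstration:

```bash
#!/bin/bash
set -Eeuo pipefail

# Print how many arguments were received.
count_args() { echo "$#"; }

path="my file.txt"

# Unquoted: the value is split on whitespace into two arguments.
# shellcheck disable=SC2086
unquoted_count=$(count_args $path)

# Quoted: the value is passed through as a single argument.
quoted_count=$(count_args "$path")

echo "unquoted: $unquoted_count, quoted: $quoted_count"   # unquoted: 2, quoted: 1
```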
|
||||
|
||||
### 4. Array Handling
|
||||
Use arrays safely for complex data handling.
|
||||
|
||||
```bash
|
||||
# Safe array iteration
|
||||
declare -a items=("item 1" "item 2" "item 3")
|
||||
|
||||
for item in "${items[@]}"; do
|
||||
echo "Processing: $item"
|
||||
done
|
||||
|
||||
# Reading output into array safely
|
||||
mapfile -t lines < <(some_command)
|
||||
readarray -t numbers < <(seq 1 10)
|
||||
```
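
One caveat when combining arrays with `set -u`: in Bash versions before 4.4, expanding an empty array as `"${items[@]}"` is treated as an unset variable. A minimal sketch of the usual workaround, the `${items[@]+"${items[@]}"}` expansion, which yields nothing for an empty array and the quoted elements otherwise:

```bash
#!/bin/bash
set -Eeuo pipefail

# Empty array: the guarded expansion produces no words, so the loop body
# never runs and no "unbound variable" error is raised.
declare -a items=()
empty_count=0
for item in ${items[@]+"${items[@]}"}; do
    empty_count=$((empty_count + 1))
done

# Non-empty array: elements are still expanded as separate, quoted words.
items=("item 1" "item 2")
filled_count=0
for item in ${items[@]+"${items[@]}"}; do
    filled_count=$((filled_count + 1))
done

echo "empty: $empty_count, filled: $filled_count"   # empty: 0, filled: 2
```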
|
||||
|
||||
### 5. Conditional Safety
|
||||
Use `[[ ]]` for Bash-specific features, `[ ]` for POSIX.
|
||||
|
||||
```bash
|
||||
# Bash - safer
|
||||
if [[ -f "$file" && -r "$file" ]]; then
|
||||
content=$(<"$file")
|
||||
fi
|
||||
|
||||
# POSIX - portable
|
||||
if [ -f "$file" ] && [ -r "$file" ]; then
|
||||
content=$(cat "$file")
|
||||
fi
|
||||
|
||||
# Check whether a variable is unset or empty (the :- default keeps this safe under set -u)
|
||||
if [[ -z "${VAR:-}" ]]; then
|
||||
echo "VAR is not set or is empty"
|
||||
fi
|
||||
```
|
||||
|
||||
## Fundamental Patterns
|
||||
|
||||
### Pattern 1: Safe Script Directory Detection
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -Eeuo pipefail
|
||||
|
||||
# Correctly determine script directory
|
||||
SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
|
||||
SCRIPT_NAME="$(basename -- "${BASH_SOURCE[0]}")"
|
||||
|
||||
echo "Script location: $SCRIPT_DIR/$SCRIPT_NAME"
|
||||
```
|
||||
|
||||
### Pattern 2: Comprehensive Function Template
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -Eeuo pipefail
|
||||
|
||||
# Prefix for functions: handle_*, process_*, check_*, validate_*
|
||||
# Include documentation and error handling
|
||||
|
||||
validate_file() {
|
||||
local -r file="$1"
|
||||
local -r message="${2:-File not found: $file}"
|
||||
|
||||
if [[ ! -f "$file" ]]; then
|
||||
echo "ERROR: $message" >&2
|
||||
return 1
|
||||
fi
|
||||
return 0
|
||||
}
|
||||
|
||||
process_files() {
|
||||
local -r input_dir="$1"
|
||||
local -r output_dir="$2"
|
||||
|
||||
# Validate inputs
|
||||
[[ -d "$input_dir" ]] || { echo "ERROR: input_dir not a directory" >&2; return 1; }
|
||||
|
||||
# Create output directory if needed
|
||||
mkdir -p "$output_dir" || { echo "ERROR: Cannot create output_dir" >&2; return 1; }
|
||||
|
||||
# Process files safely
|
||||
while IFS= read -r -d '' file; do
|
||||
echo "Processing: $file"
|
||||
# Do work
|
||||
done < <(find "$input_dir" -maxdepth 1 -type f -print0)
|
||||
|
||||
return 0
|
||||
}
|
||||
```
|
||||
|
||||
### Pattern 3: Safe Temporary File Handling
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -Eeuo pipefail
|
||||
|
||||
# Create the temporary directory first, so the EXIT trap never runs against
# an unset or inherited TMPDIR
TMPDIR=$(mktemp -d) || { echo "ERROR: Failed to create temp directory" >&2; exit 1; }

trap 'rm -rf -- "$TMPDIR"' EXIT
|
||||
|
||||
# Create temporary files in directory
|
||||
TMPFILE1="$TMPDIR/temp1.txt"
|
||||
TMPFILE2="$TMPDIR/temp2.txt"
|
||||
|
||||
# Use temporary files
|
||||
touch "$TMPFILE1" "$TMPFILE2"
|
||||
|
||||
echo "Temp files created in: $TMPDIR"
|
||||
```
|
||||
|
||||
### Pattern 4: Robust Argument Parsing
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -Eeuo pipefail
|
||||
|
||||
# Default values
|
||||
VERBOSE=false
|
||||
DRY_RUN=false
|
||||
OUTPUT_FILE=""
|
||||
THREADS=4
|
||||
|
||||
usage() {
|
||||
cat <<EOF
|
||||
Usage: $0 [OPTIONS]
|
||||
|
||||
Options:
|
||||
-v, --verbose Enable verbose output
|
||||
-d, --dry-run Run without making changes
|
||||
-o, --output FILE Output file path
|
||||
-j, --jobs NUM Number of parallel jobs
|
||||
-h, --help Show this help message
|
||||
EOF
|
||||
exit "${1:-0}"
|
||||
}
|
||||
|
||||
# Parse arguments
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case "$1" in
|
||||
-v|--verbose)
|
||||
VERBOSE=true
|
||||
shift
|
||||
;;
|
||||
-d|--dry-run)
|
||||
DRY_RUN=true
|
||||
shift
|
||||
;;
|
||||
-o|--output)
|
||||
OUTPUT_FILE="$2"
|
||||
shift 2
|
||||
;;
|
||||
-j|--jobs)
|
||||
THREADS="$2"
|
||||
shift 2
|
||||
;;
|
||||
-h|--help)
|
||||
usage 0
|
||||
;;
|
||||
--)
|
||||
shift
|
||||
break
|
||||
;;
|
||||
*)
|
||||
echo "ERROR: Unknown option: $1" >&2
|
||||
usage 1
|
||||
;;
|
||||
esac
|
||||
done
|
||||
|
||||
# Validate required arguments
|
||||
[[ -n "$OUTPUT_FILE" ]] || { echo "ERROR: -o/--output is required" >&2; usage 1; }
|
||||
```
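
Options that take a value read `$2` directly, which under `set -u` aborts with an unbound-variable error when the value is missing. A minimal sketch of a friendlier check, assuming a hypothetical `require_value` helper and a reduced `parse_args` that handles only `-o/--output`:

```bash
#!/bin/bash
set -Eeuo pipefail

OUTPUT_FILE=""

# Hypothetical helper: succeed only if the option in $1 is followed by a value.
# $2 is the number of arguments still unparsed, i.e. "$#" at the call site.
require_value() {
    local -r opt="$1"
    local -r remaining="$2"
    if (( remaining < 2 )); then
        echo "ERROR: $opt requires a value" >&2
        return 1
    fi
}

# Reduced parser for illustration: only -o/--output is recognised.
parse_args() {
    while [[ $# -gt 0 ]]; do
        case "$1" in
            -o|--output)
                require_value "$1" "$#" || return 1
                OUTPUT_FILE="$2"
                shift 2
                ;;
            *)
                echo "ERROR: Unknown option: $1" >&2
                return 1
                ;;
        esac
    done
}

# Usage sketch: parse_args "$@" || exit 1
```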
|
||||
|
||||
### Pattern 5: Structured Logging
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -Eeuo pipefail
|
||||
|
||||
# Logging functions
|
||||
log_info() {
|
||||
echo "[$(date +'%Y-%m-%d %H:%M:%S')] INFO: $*" >&2
|
||||
}
|
||||
|
||||
log_warn() {
|
||||
echo "[$(date +'%Y-%m-%d %H:%M:%S')] WARN: $*" >&2
|
||||
}
|
||||
|
||||
log_error() {
|
||||
echo "[$(date +'%Y-%m-%d %H:%M:%S')] ERROR: $*" >&2
|
||||
}
|
||||
|
||||
log_debug() {
|
||||
if [[ "${DEBUG:-0}" == "1" ]]; then
|
||||
echo "[$(date +'%Y-%m-%d %H:%M:%S')] DEBUG: $*" >&2
|
||||
fi
|
||||
}
|
||||
|
||||
# Usage
|
||||
log_info "Starting script"
|
||||
log_debug "Debug information"
|
||||
log_warn "Warning message"
|
||||
log_error "Error occurred"
|
||||
```
|
||||
|
||||
### Pattern 6: Process Orchestration with Signals
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -Eeuo pipefail
|
||||
|
||||
# Track background processes
|
||||
PIDS=()
|
||||
|
||||
cleanup() {
|
||||
log_info "Shutting down..."
|
||||
|
||||
# Terminate all background processes
|
||||
for pid in "${PIDS[@]}"; do
|
||||
if kill -0 "$pid" 2>/dev/null; then
|
||||
kill -TERM "$pid" 2>/dev/null || true
|
||||
fi
|
||||
done
|
||||
|
||||
# Wait for graceful shutdown
|
||||
for pid in "${PIDS[@]}"; do
|
||||
wait "$pid" 2>/dev/null || true
|
||||
done
|
||||
}
|
||||
|
||||
trap cleanup SIGTERM SIGINT
|
||||
|
||||
# Start background tasks
|
||||
background_task &
|
||||
PIDS+=($!)
|
||||
|
||||
another_task &
|
||||
PIDS+=($!)
|
||||
|
||||
# Wait for all background processes
|
||||
wait
|
||||
```
|
||||
|
||||
### Pattern 7: Safe File Operations
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -Eeuo pipefail
|
||||
|
||||
# Move a file only if the source exists and the destination does not
|
||||
safe_move() {
|
||||
local -r source="$1"
|
||||
local -r dest="$2"
|
||||
|
||||
if [[ ! -e "$source" ]]; then
|
||||
echo "ERROR: Source does not exist: $source" >&2
|
||||
return 1
|
||||
fi
|
||||
|
||||
if [[ -e "$dest" ]]; then
|
||||
echo "ERROR: Destination already exists: $dest" >&2
|
||||
return 1
|
||||
fi
|
||||
|
||||
mv "$source" "$dest"
|
||||
}
|
||||
|
||||
# Safe directory cleanup
|
||||
safe_rmdir() {
|
||||
local -r dir="$1"
|
||||
|
||||
if [[ ! -d "$dir" ]]; then
|
||||
echo "ERROR: Not a directory: $dir" >&2
|
||||
return 1
|
||||
fi
|
||||
|
||||
# Use -I flag to prompt before rm (BSD/GNU compatible)
|
||||
rm -rI -- "$dir"
|
||||
}
|
||||
|
||||
# Atomic file writes
|
||||
atomic_write() {
|
||||
local -r target="$1"
|
||||
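local tmpfile  # not readonly: it is assigned on the next line
# Create the temp file next to the target so the final mv is a same-filesystem rename
tmpfile=$(mktemp "${target}.XXXXXX") || return 1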
|
||||
|
||||
# Write to temp file first
|
||||
cat > "$tmpfile"
|
||||
|
||||
# Atomic rename
|
||||
mv "$tmpfile" "$target"
|
||||
}
|
||||
```
|
||||
|
||||
### Pattern 8: Idempotent Script Design
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -Eeuo pipefail
|
||||
|
||||
# Check if resource already exists
|
||||
ensure_directory() {
|
||||
local -r dir="$1"
|
||||
|
||||
if [[ -d "$dir" ]]; then
|
||||
log_info "Directory already exists: $dir"
|
||||
return 0
|
||||
fi
|
||||
|
||||
mkdir -p "$dir" || {
|
||||
log_error "Failed to create directory: $dir"
|
||||
return 1
|
||||
}
|
||||
|
||||
log_info "Created directory: $dir"
|
||||
}
|
||||
|
||||
# Ensure configuration state
|
||||
ensure_config() {
|
||||
local -r config_file="$1"
|
||||
local -r default_value="$2"
|
||||
|
||||
if [[ ! -f "$config_file" ]]; then
|
||||
echo "$default_value" > "$config_file"
|
||||
log_info "Created config: $config_file"
|
||||
fi
|
||||
}
|
||||
|
||||
# Rerunning script multiple times should be safe
|
||||
ensure_directory "/var/cache/myapp"
|
||||
ensure_config "/etc/myapp/config" "DEBUG=false"
|
||||
```
|
||||
|
||||
### Pattern 9: Safe Command Substitution
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -Eeuo pipefail
|
||||
|
||||
# Use $() instead of backticks
|
||||
name=$(<"$file") # Modern, safe variable assignment from file
|
||||
output=$(command -v python3) # Get command location safely
|
||||
|
||||
# Handle command substitution with error checking
|
||||
result=$(command -v node) || {
|
||||
log_error "node command not found"
|
||||
exit 1  # top-level code: return is only valid inside a function or a sourced script
|
||||
}
|
||||
|
||||
# For multiple lines
|
||||
mapfile -t lines < <(grep "pattern" "$file")
|
||||
|
||||
# NUL-safe iteration
|
||||
while IFS= read -r -d '' file; do
|
||||
echo "Processing: $file"
|
||||
done < <(find /path -type f -print0)
|
||||
```
|
||||
|
||||
### Pattern 10: Dry-Run Support
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -Eeuo pipefail
|
||||
|
||||
DRY_RUN="${DRY_RUN:-false}"
|
||||
|
||||
run_cmd() {
|
||||
if [[ "$DRY_RUN" == "true" ]]; then
|
||||
echo "[DRY RUN] Would execute: $*"
|
||||
return 0
|
||||
fi
|
||||
|
||||
"$@"
|
||||
}
|
||||
|
||||
# Usage
|
||||
run_cmd cp "$source" "$dest"
|
||||
run_cmd rm "$file"
|
||||
run_cmd chown "$owner" "$target"
|
||||
```
|
||||
|
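One limitation of printing `$*` in a dry run is that arguments containing spaces become ambiguous. The following is a possible variant, assuming Bash's `printf %q` is acceptable for the preview format; it is a sketch rather than a replacement for the function above.

```bash
#!/usr/bin/env bash

DRY_RUN="${DRY_RUN:-false}"

# Variant of run_cmd that prints each argument shell-quoted with printf %q,
# so arguments containing spaces remain unambiguous in the preview.
run_cmd() {
  if [[ "$DRY_RUN" == "true" ]]; then
    printf '[DRY RUN] Would execute:'
    printf ' %q' "$@"
    printf '\n'
    return 0
  fi
  "$@"
}

# Preview a destructive command without running it.
DRY_RUN=true run_cmd rm -- "my file.txt"
```

Because the preview is shell-quoted, a reviewer can paste the printed command into a terminal and get the same argument boundaries the script would have used.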
||||
## Advanced Defensive Techniques
|
||||
|
||||
### Named Parameters Pattern
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -Eeuo pipefail
|
||||
|
||||
process_data() {
|
||||
local input_file=""
|
||||
local output_dir=""
|
||||
local format="json"
|
||||
|
||||
# Parse named parameters
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case "$1" in
|
||||
--input=*)
|
||||
input_file="${1#*=}"
|
||||
;;
|
||||
--output=*)
|
||||
output_dir="${1#*=}"
|
||||
;;
|
||||
--format=*)
|
||||
format="${1#*=}"
|
||||
;;
|
||||
*)
|
||||
echo "ERROR: Unknown parameter: $1" >&2
|
||||
return 1
|
||||
;;
|
||||
esac
|
||||
shift
|
||||
done
|
||||
|
||||
# Validate required parameters
|
||||
[[ -n "$input_file" ]] || { echo "ERROR: --input is required" >&2; return 1; }
|
||||
[[ -n "$output_dir" ]] || { echo "ERROR: --output is required" >&2; return 1; }
|
||||
}
|
||||
```
|
||||
|
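The parser above accepts only the `--key=value` form. A hedged sketch of how the same loop might also accept `--key value` follows; the function name `parse_options` and the final `printf` summary line are illustrative assumptions, not part of the original pattern.

```bash
#!/usr/bin/env bash

# Hypothetical helper: parse --input/--output/--format in either
# "--key=value" or "--key value" form and print the resolved settings.
parse_options() {
  local input_file="" output_dir="" format="json"

  while [[ $# -gt 0 ]]; do
    case "$1" in
      --input=*)  input_file="${1#*=}" ;;
      --output=*) output_dir="${1#*=}" ;;
      --format=*) format="${1#*=}" ;;
      --input|--output|--format)
        # The space-separated form needs a following value.
        [[ $# -ge 2 ]] || { echo "ERROR: $1 requires a value" >&2; return 1; }
        case "$1" in
          --input)  input_file="$2" ;;
          --output) output_dir="$2" ;;
          --format) format="$2" ;;
        esac
        shift
        ;;
      *)
        echo "ERROR: Unknown parameter: $1" >&2
        return 1
        ;;
    esac
    shift
  done

  # Validate required parameters.
  [[ -n "$input_file" ]] || { echo "ERROR: --input is required" >&2; return 1; }
  [[ -n "$output_dir" ]] || { echo "ERROR: --output is required" >&2; return 1; }

  printf 'input=%s output=%s format=%s\n' "$input_file" "$output_dir" "$format"
}

parse_options --input=data.csv --output /tmp/out
```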
||||
### Dependency Checking
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -Eeuo pipefail
|
||||
|
||||
check_dependencies() {
|
||||
local -a missing_deps=()
|
||||
local -a required=("jq" "curl" "git")
|
||||
|
||||
for cmd in "${required[@]}"; do
|
||||
if ! command -v "$cmd" &>/dev/null; then
|
||||
missing_deps+=("$cmd")
|
||||
fi
|
||||
done
|
||||
|
||||
if [[ ${#missing_deps[@]} -gt 0 ]]; then
|
||||
echo "ERROR: Missing required commands: ${missing_deps[*]}" >&2
|
||||
return 1
|
||||
fi
|
||||
}
|
||||
|
||||
check_dependencies
|
||||
```
|
||||
|
||||
## Best Practices Summary
|
||||
|
||||
1. **Always use strict mode** - `set -Eeuo pipefail`
|
||||
2. **Quote all variables** - `"$variable"` prevents word splitting
|
||||
3. **Use [[ ]] conditionals** - More robust than [ ]
|
||||
4. **Implement error trapping** - Catch and handle errors gracefully
|
||||
5. **Validate all inputs** - Check file existence, permissions, formats
|
||||
6. **Use functions for reusability** - Prefix with meaningful names
|
||||
7. **Implement structured logging** - Include timestamps and levels
|
||||
8. **Support dry-run mode** - Allow users to preview changes
|
||||
9. **Handle temporary files safely** - Use mktemp, cleanup with trap
|
||||
10. **Design for idempotency** - Scripts should be safe to rerun
|
||||
11. **Document requirements** - List dependencies and minimum versions
|
||||
12. **Test error paths** - Ensure error handling works correctly
|
||||
13. **Use `command -v`** - Safer than `which` for checking executables
|
||||
14. **Prefer printf over echo** - More predictable across systems
|
||||
|
||||
## Resources
|
||||
|
||||
- **Bash Strict Mode**: http://redsymbol.net/articles/unofficial-bash-strict-mode/
|
||||
- **Google Shell Style Guide**: https://google.github.io/styleguide/shellguide.html
|
||||
- **Defensive BASH Programming**: https://www.lifepipe.net/
|
||||
25
skills/bash-linux/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
204
skills/bash-linux/SKILL.md
Normal file
|
|
@ -0,0 +1,204 @@
|
|||
---
|
||||
name: bash-linux
|
||||
description: "Bash/Linux terminal patterns. Critical commands, piping, error handling, scripting. Use when working on macOS or Linux systems."
|
||||
risk: unknown
|
||||
source: community
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
# Bash Linux Patterns
|
||||
|
||||
> Essential patterns for Bash on Linux/macOS.
|
||||
|
||||
---
|
||||
|
||||
## 1. Operator Syntax
|
||||
|
||||
### Chaining Commands
|
||||
|
||||
| Operator | Meaning | Example |
|
||||
|----------|---------|---------|
|
||||
| `;` | Run sequentially | `cmd1; cmd2` |
|
||||
| `&&` | Run if previous succeeded | `npm install && npm run dev` |
|
||||
| `\|\|` | Run if previous failed | `npm test \|\| echo "Tests failed"` |
|
||||
| `\|` | Pipe output | `ls \| grep '\.js$'` |
|
||||
|
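The operators in the table can be tried out with a small function. This is an illustrative sketch; the function name `demo_chaining` and the sample file names are made up for the example.

```bash
#!/usr/bin/env bash

# Print one line per operator to show when the right-hand command runs.
demo_chaining() {
  # && runs the right side only if the left side exits with status 0.
  true && echo "and: ran after success"

  # || runs the right side only if the left side exits non-zero.
  false || echo "or: ran after failure"

  # ; runs the next command regardless of the previous exit status.
  false; echo "semicolon: ran unconditionally"

  # | feeds stdout of the left command into stdin of the right command.
  printf 'app.js\nREADME.md\nutil.js\n' | grep -c '\.js$'
}
```

Calling `demo_chaining` prints three messages followed by the number of `.js` names. Keep in mind that `a && b || c` is not a full if/else: `c` also runs when `b` fails.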
||||
---
|
||||
|
||||
## 2. File Operations
|
||||
|
||||
### Essential Commands
|
||||
|
||||
| Task | Command |
|
||||
|------|---------|
|
||||
| List all | `ls -la` |
|
||||
| Find files | `find . -name "*.js" -type f` |
|
||||
| File content | `cat file.txt` |
|
||||
| First N lines | `head -n 20 file.txt` |
|
||||
| Last N lines | `tail -n 20 file.txt` |
|
||||
| Follow log | `tail -f log.txt` |
|
||||
| Search in files | `grep -r "pattern" --include="*.js"` |
|
||||
| File size | `du -sh *` |
|
||||
| Disk usage | `df -h` |
|
||||
|
||||
---
|
||||
|
||||
## 3. Process Management
|
||||
|
||||
| Task | Command |
|
||||
|------|---------|
|
||||
| List processes | `ps aux` |
|
||||
| Find by name | `ps aux \| grep node` |
|
||||
| Kill by PID | `kill -9 <PID>` |
|
||||
| Find port user | `lsof -i :3000` |
|
||||
| Kill port | `kill -9 $(lsof -t -i :3000)` |
|
||||
| Background | `npm run dev &` |
|
||||
| Jobs | `jobs -l` |
|
||||
| Bring to front | `fg %1` |
|
||||
|
||||
---
|
||||
|
||||
## 4. Text Processing
|
||||
|
||||
### Core Tools
|
||||
|
||||
| Tool | Purpose | Example |
|
||||
|------|---------|---------|
|
||||
| `grep` | Search | `grep -rn "TODO" src/` |
|
||||
| `sed` | Replace | `sed -i 's/old/new/g' file.txt` (GNU sed; BSD/macOS sed needs `sed -i '' ...`) |
|
||||
| `awk` | Extract columns | `awk '{print $1}' file.txt` |
|
||||
| `cut` | Cut fields | `cut -d',' -f1 data.csv` |
|
||||
| `sort` | Sort lines | `sort -u file.txt` |
|
||||
| `uniq` | Unique lines | `sort file.txt \| uniq -c` |
|
||||
| `wc` | Count | `wc -l file.txt` |
|
||||
|
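These tools are usually combined into pipelines. As a sketch of that idea, the hypothetical helper `top_values` below chains `cut`, `sort`, `uniq`, `head`, and `awk` to report the most frequent values in one column of a comma-separated file. It assumes the values themselves contain no spaces or commas.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the most frequent values of a CSV column.
# Usage: top_values FILE COLUMN [COUNT]
top_values() {
  local file="$1" column="$2" count="${3:-3}"
  cut -d',' -f"$column" "$file" \
    | sort \
    | uniq -c \
    | sort -k1,1nr -k2 \
    | head -n "$count" \
    | awk '{print $2 ": " $1}'
}
```

For example, `top_values access.csv 1 5` would list the five most common values of the first column, one `value: count` pair per line (`access.csv` is a placeholder name).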
||||
---
|
||||
|
||||
## 5. Environment Variables
|
||||
|
||||
| Task | Command |
|
||||
|------|---------|
|
||||
| View all | `env` or `printenv` |
|
||||
| View one | `echo $PATH` |
|
||||
| Set for current shell and its child processes | `export VAR="value"` |
|
||||
| Set for a single command | `VAR="value" command` |
|
||||
| Add to PATH | `export PATH="$PATH:/new/path"` |
|
||||
|
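The difference between a plain assignment, a one-command prefix assignment, and `export` is easiest to see by asking a child shell what it received. The sketch below uses a made-up variable name, `GREETING`, and a hypothetical function `show_scopes`.

```bash
#!/usr/bin/env bash

# Show which kinds of assignment are visible to a child process.
show_scopes() {
  local plain prefixed exported

  # A plain assignment stays in the current shell; children do not see it.
  unset GREETING
  GREETING="hello"
  plain=$(bash -c 'printf "%s" "${GREETING:-unset}"')

  # A prefix assignment applies to that one command only.
  prefixed=$(GREETING="hi" bash -c 'printf "%s" "${GREETING:-unset}"')

  # export places the variable in the environment of every later child.
  export GREETING
  exported=$(bash -c 'printf "%s" "${GREETING:-unset}"')

  printf 'plain=%s prefixed=%s exported=%s\n' "$plain" "$prefixed" "$exported"
}

show_scopes
```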
||||
---
|
||||
|
||||
## 6. Network
|
||||
|
||||
| Task | Command |
|
||||
|------|---------|
|
||||
| Download | `curl -O https://example.com/file` |
|
||||
| API request | `curl -X GET https://api.example.com` |
|
||||
| POST JSON | `curl -X POST -H "Content-Type: application/json" -d '{"key":"value"}' URL` |
|
||||
| Check port | `nc -zv localhost 3000` |
|
||||
| Network info | `ifconfig` or `ip addr` |
|
||||
|
||||
---
|
||||
|
||||
## 7. Script Template
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -euo pipefail # Exit on error, undefined var, pipe fail
|
||||
|
||||
# Colors (optional)
|
||||
RED='\033[0;31m'
|
||||
GREEN='\033[0;32m'
|
||||
NC='\033[0m'
|
||||
|
||||
# Script directory
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
|
||||
# Functions
|
||||
log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
|
||||
log_error() { echo -e "${RED}[ERROR]${NC} $1" >&2; }
|
||||
|
||||
# Main
|
||||
main() {
|
||||
log_info "Starting..."
|
||||
# Your logic here
|
||||
log_info "Done!"
|
||||
}
|
||||
|
||||
main "$@"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 8. Common Patterns
|
||||
|
||||
### Check if command exists
|
||||
|
||||
```bash
|
||||
if command -v node &> /dev/null; then
|
||||
echo "Node is installed"
|
||||
fi
|
||||
```
|
||||
|
||||
### Default variable value
|
||||
|
||||
```bash
|
||||
NAME=${1:-"default_value"}
|
||||
```
|
||||
|
||||
### Read file line by line
|
||||
|
||||
```bash
|
||||
while IFS= read -r line; do
|
||||
echo "$line"
|
||||
done < file.txt
|
||||
```
|
||||
|
||||
### Loop over files
|
||||
|
||||
```bash
|
||||
for file in *.js; do
|
||||
echo "Processing $file"
|
||||
done
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 9. Differences from PowerShell
|
||||
|
||||
| Task | PowerShell | Bash |
|
||||
|------|------------|------|
|
||||
| List files | `Get-ChildItem` | `ls -la` |
|
||||
| Find files | `Get-ChildItem -Recurse` | `find . -type f` |
|
||||
| Environment | `$env:VAR` | `$VAR` |
|
||||
| String concat | `"$a$b"` | `"$a$b"` (same) |
|
||||
| Null check | `if ($x)` | `if [ -n "$x" ]` |
|
||||
| Pipeline | Object-based | Text-based |
|
||||
|
||||
---
|
||||
|
||||
## 10. Error Handling
|
||||
|
||||
### Set options
|
||||
|
||||
```bash
|
||||
set -e # Exit on error
|
||||
set -u # Exit on undefined variable
|
||||
set -o pipefail  # A pipeline fails if any command in it fails (exits when combined with -e)
|
||||
set -x # Debug: print commands
|
||||
```
|
||||
|
||||
### Trap for cleanup
|
||||
|
||||
```bash
|
||||
cleanup() {
|
||||
echo "Cleaning up..."
|
||||
rm -f /tmp/tempfile
|
||||
}
|
||||
trap cleanup EXIT
|
||||
```
|
||||
|
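The effect of `pipefail` can be observed directly. In this sketch the helper name `pipeline_status` is hypothetical; each case runs in a child shell so the option does not leak into the calling script.

```bash
#!/usr/bin/env bash

# Report the exit status of "false | true" with and without pipefail.
pipeline_status() {
  local mode="$1"
  if [[ "$mode" == "pipefail" ]]; then
    bash -c 'set -o pipefail; false | true; echo "$?"'
  else
    bash -c 'false | true; echo "$?"'
  fi
}

printf 'default:  %s\n' "$(pipeline_status default)"
printf 'pipefail: %s\n' "$(pipeline_status pipefail)"
```

Without the option, the status of a pipeline is that of its last command, so the failure of `false` goes unnoticed. With it, the pipeline reports the failure, and `set -e` can then act on it.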
||||
---
|
||||
|
||||
> **Remember:** Bash is text-based. Use `&&` for success chains, `set -e` for safety, and quote your variables!
|
||||
|
||||
## When to Use
|
||||
Use this skill when running terminal commands or writing Bash scripts on Linux or macOS.
|
||||
25
skills/bash-pro/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
315
skills/bash-pro/SKILL.md
Normal file
|
|
@ -0,0 +1,315 @@
|
|||
---
|
||||
name: bash-pro
|
||||
description: 'Master of defensive Bash scripting for production automation, CI/CD
|
||||
|
||||
pipelines, and system utilities. Expert in safe, portable, and testable shell
|
||||
|
||||
scripts.
|
||||
|
||||
'
|
||||
risk: unknown
|
||||
source: community
|
||||
date_added: '2026-02-27'
|
||||
---
|
||||
## Use this skill when
|
||||
|
||||
- Writing or reviewing Bash scripts for automation, CI/CD, or ops
|
||||
- Hardening shell scripts for safety and portability
|
||||
|
||||
## Do not use this skill when
|
||||
|
||||
- You need POSIX-only shell without Bash features
|
||||
- The task requires a higher-level language for complex logic
|
||||
- You need Windows-native scripting (PowerShell)
|
||||
|
||||
## Instructions
|
||||
|
||||
1. Define script inputs, outputs, and failure modes.
|
||||
2. Apply strict mode and safe argument parsing.
|
||||
3. Implement core logic with defensive patterns.
|
||||
4. Add tests and linting with Bats and ShellCheck.
|
||||
|
||||
## Safety
|
||||
|
||||
- Treat input as untrusted; avoid eval and unsafe globbing.
|
||||
- Prefer dry-run modes before destructive actions.
|
||||
|
||||
## Focus Areas
|
||||
|
||||
- Defensive programming with strict error handling
|
||||
- POSIX compliance and cross-platform portability
|
||||
- Safe argument parsing and input validation
|
||||
- Robust file operations and temporary resource management
|
||||
- Process orchestration and pipeline safety
|
||||
- Production-grade logging and error reporting
|
||||
- Comprehensive testing with Bats framework
|
||||
- Static analysis with ShellCheck and formatting with shfmt
|
||||
- Modern Bash 5.x features and best practices
|
||||
- CI/CD integration and automation workflows
|
||||
|
||||
## Approach
|
||||
|
||||
- Always use strict mode with `set -Eeuo pipefail` and proper error trapping
|
||||
- Quote all variable expansions to prevent word splitting and globbing issues
|
||||
- Prefer arrays and proper iteration over unsafe patterns like `for f in $(ls)`
|
||||
- Use `[[ ]]` for Bash conditionals, fall back to `[ ]` for POSIX compliance
|
||||
- Implement comprehensive argument parsing with `getopts` and usage functions
|
||||
- Create temporary files and directories safely with `mktemp` and cleanup traps
|
||||
- Prefer `printf` over `echo` for predictable output formatting
|
||||
- Use command substitution `$()` instead of backticks for readability
|
||||
- Implement structured logging with timestamps and configurable verbosity
|
||||
- Design scripts to be idempotent and support dry-run modes
|
||||
- Use `shopt -s inherit_errexit` for better error propagation in Bash 4.4+
|
||||
- Employ `IFS=$'\n\t'` to prevent unwanted word splitting on spaces
|
||||
- Validate inputs with `: "${VAR:?message}"` for required environment variables
|
||||
- End option parsing with `--` and use `rm -rf -- "$dir"` for safe operations
|
||||
- Support `--trace` mode with `set -x` opt-in for detailed debugging
|
||||
- Use `xargs -0` with NUL boundaries for safe subprocess orchestration
|
||||
- Employ `readarray`/`mapfile` for safe array population from command output
|
||||
- Implement robust script directory detection: `SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"`
|
||||
- Use NUL-safe patterns: `find -print0 | while IFS= read -r -d '' file; do ...; done`
|
||||
|
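The `getopts` and `--` items in the list above can be sketched as follows. The option letters (`-v`, `-o`, `-h`), the global names `VERBOSE`, `OUTPUT`, and `FILES`, and the default `out.txt` are assumptions chosen for illustration.

```bash
#!/usr/bin/env bash

usage() {
  printf 'Usage: %s [-v] [-o OUTPUT] FILE...\n' "${0##*/}"
}

# Hypothetical parser: -v toggles verbosity, -o takes a value, and
# everything after the options (or after --) is treated as a file operand.
parse_args() {
  local OPTIND=1 opt
  VERBOSE=false
  OUTPUT="out.txt"
  FILES=()

  # The leading ':' selects silent mode so errors can be reported here.
  while getopts ':vo:h' opt; do
    case "$opt" in
      v) VERBOSE=true ;;
      o) OUTPUT="$OPTARG" ;;
      h) usage; return 0 ;;
      :) printf 'ERROR: -%s requires an argument\n' "$OPTARG" >&2; return 2 ;;
      \?) printf 'ERROR: unknown option -%s\n' "$OPTARG" >&2; return 2 ;;
    esac
  done
  shift $((OPTIND - 1))
  FILES=("$@")
}

parse_args -v -o result.txt -- "-odd name.txt" b.txt
printf 'verbose=%s output=%s files=%d\n' "$VERBOSE" "$OUTPUT" "${#FILES[@]}"
```

The `--` lets a file name that begins with a dash pass through as an operand instead of being parsed as an option.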
||||
## Compatibility & Portability
|
||||
|
||||
- Use `#!/usr/bin/env bash` shebang for portability across systems
|
||||
- Check Bash version at script start: `(( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 4) ))` for Bash 4.4+ features (comparing major and minor independently would reject Bash 5.0)
|
||||
- Validate required external commands exist: `command -v jq &>/dev/null || exit 1`
|
||||
- Detect platform differences: `case "$(uname -s)" in Linux*) ... ;; Darwin*) ... ;; esac`
|
||||
- Handle GNU vs BSD tool differences (e.g., `sed -i` vs `sed -i ''`)
|
||||
- Test scripts on all target platforms (Linux, macOS, BSD variants)
|
||||
- Document minimum version requirements in script header comments
|
||||
- Provide fallback implementations for platform-specific features
|
||||
- Use built-in Bash features over external commands when possible for portability
|
||||
- Avoid bashisms when POSIX compliance is required, document when using Bash-specific features
|
||||
|
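Two of the points above, version checks and GNU/BSD differences, could be wrapped in small helpers. This is a sketch under assumptions: the names `bash_at_least` and `sed_inplace` are hypothetical, and GNU sed is detected by whether `sed --version` succeeds.

```bash
#!/usr/bin/env bash

# Return 0 when the running Bash is at least MAJOR.MINOR.
bash_at_least() {
  local major="$1" minor="${2:-0}"
  (( BASH_VERSINFO[0] > major || (BASH_VERSINFO[0] == major && BASH_VERSINFO[1] >= minor) ))
}

# Hypothetical wrapper: in-place sed edit for both GNU and BSD sed.
# GNU sed accepts -i alone; BSD/macOS sed needs an explicit (empty) suffix.
sed_inplace() {
  if sed --version >/dev/null 2>&1; then
    sed -i "$@"       # GNU sed
  else
    sed -i '' "$@"    # BSD/macOS sed
  fi
}

if bash_at_least 4 4; then
  printf 'Bash %s supports inherit_errexit\n' "$BASH_VERSION"
fi
```

Comparing the major version first matters: a check that requires both major >= 4 and minor >= 4 would wrongly reject Bash 5.0.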
||||
## Readability & Maintainability
|
||||
|
||||
- Use long-form options in scripts for clarity: `--verbose` instead of `-v`
|
||||
- Employ consistent naming: snake_case for functions/variables, UPPER_CASE for constants
|
||||
- Add section headers with comment blocks to organize related functions
|
||||
- Keep functions under 50 lines; refactor larger functions into smaller components
|
||||
- Group related functions together with descriptive section headers
|
||||
- Use descriptive function names that explain purpose: `validate_input_file` not `check_file`
|
||||
- Add inline comments for non-obvious logic, avoid stating the obvious
|
||||
- Maintain consistent indentation (2 or 4 spaces, never tabs mixed with spaces)
|
||||
- Place opening braces on same line for consistency: `function_name() {`
|
||||
- Use blank lines to separate logical blocks within functions
|
||||
- Document function parameters and return values in header comments
|
||||
- Extract magic numbers and strings to named constants at top of script
|
||||
|
||||
## Safety & Security Patterns
|
||||
|
||||
- Declare constants with `readonly` to prevent accidental modification
|
||||
- Use `local` keyword for all function variables to avoid polluting global scope
|
||||
- Implement `timeout` for external commands: `timeout 30s curl ...` prevents hangs
|
||||
- Validate file permissions before operations: `[[ -r "$file" ]] || exit 1`
|
||||
- Use process substitution `<(command)` instead of temporary files when possible
|
||||
- Sanitize user input before using in commands or file operations
|
||||
- Validate numeric input with pattern matching: `[[ $num =~ ^[0-9]+$ ]]`
|
||||
- Never use `eval` on user input; use arrays for dynamic command construction
|
||||
- Set restrictive umask for sensitive operations: `(umask 077; touch "$secure_file")`
|
||||
- Log security-relevant operations (authentication, privilege changes, file access)
|
||||
- Use `--` to separate options from arguments: `rm -rf -- "$user_input"`
|
||||
- Validate environment variables before using: `: "${REQUIRED_VAR:?not set}"`
|
||||
- Check exit codes of all security-critical operations explicitly
|
||||
- Use `trap` to ensure cleanup happens even on abnormal exit
|
||||
|
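As an illustration of the input-validation and "arrays instead of `eval`" items, the sketch below builds a `grep` invocation from untrusted input. The function names `is_uint` and `search_logs` are hypothetical.

```bash
#!/usr/bin/env bash

# Accept only non-negative decimal integers.
is_uint() {
  [[ "$1" =~ ^[0-9]+$ ]]
}

# Hypothetical example: assemble a grep command as an array, one element per
# argument, so user-supplied text is never re-parsed by the shell.
# Usage: search_logs PATTERN MAX_COUNT FILE...
search_logs() {
  local pattern="$1" max_count="$2"
  shift 2
  is_uint "$max_count" || { echo "ERROR: max_count must be a number" >&2; return 2; }

  # -F treats the pattern as a literal string; -- ends option parsing.
  local -a cmd=(grep -F -m "$max_count" -- "$pattern")
  cmd+=("$@")
  "${cmd[@]}"
}
```

A pattern such as `$(id)` is passed to `grep` as plain text here. The same string interpolated into an `eval` would be executed.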
||||
## Performance Optimization
|
||||
|
||||
- Avoid subshells in loops; use `while read` instead of `for i in $(cat file)`
|
||||
- Use Bash built-ins over external commands: `[[ ]]` instead of `test`, `${var//pattern/replacement}` instead of `sed`
|
||||
- Batch operations instead of repeated single operations (e.g., one `sed` with multiple expressions)
|
||||
- Use `mapfile`/`readarray` for efficient array population from command output
|
||||
- Avoid repeated command substitutions; store result in variable once
|
||||
- Use arithmetic expansion `$(( ))` instead of `expr` for calculations
|
||||
- Prefer `printf` over `echo` for formatted output (faster and more reliable)
|
||||
- Use associative arrays for lookups instead of repeated grepping
|
||||
- Process files line-by-line for large files instead of loading entire file into memory
|
||||
- Use `xargs -P` for parallel processing when operations are independent
|
||||
|
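A brief sketch of two of these optimizations, parameter expansion in place of external commands and an associative array in place of repeated grepping. The names `strip_ext`, `PORTS`, and `port_for` are invented for the example.

```bash
#!/usr/bin/env bash

# Parameter expansion instead of basename/sed: no child process per call.
strip_ext() {
  local name="${1##*/}"       # drop the directory part
  printf '%s\n' "${name%.*}"  # drop the last extension
}

# Associative array (Bash 4.0+) as a lookup table.
declare -A PORTS=([http]=80 [https]=443 [ssh]=22)

port_for() {
  local service="$1"
  # ${array[key]+set} expands to "set" only when the key exists.
  [[ -n "${PORTS[$service]+set}" ]] || return 1
  printf '%s\n' "${PORTS[$service]}"
}
```

In a loop over thousands of items, avoiding one `fork`/`exec` per iteration is typically where most of the saving comes from.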
||||
## Documentation Standards
|
||||
|
||||
- Implement `--help` and `-h` flags showing usage, options, and examples
|
||||
- Provide `--version` flag displaying script version and copyright information
|
||||
- Include usage examples in help output for common use cases
|
||||
- Document all command-line options with descriptions of their purpose
|
||||
- List required vs optional arguments clearly in usage message
|
||||
- Document exit codes: 0 for success, 1 for general errors, specific codes for specific failures
|
||||
- Include prerequisites section listing required commands and versions
|
||||
- Add header comment block with script purpose, author, and modification date
|
||||
- Document environment variables the script uses or requires
|
||||
- Provide troubleshooting section in help for common issues
|
||||
- Generate documentation with `shdoc` from special comment formats
|
||||
- Create man pages using `shellman` for system integration
|
||||
- Include architecture diagrams using Mermaid or GraphViz for complex scripts
|
||||
|
||||
## Modern Bash Features (5.x)
|
||||
|
||||
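- **Bash 5.0**: `EPOCHSECONDS` and `EPOCHREALTIME` (microsecond precision), `BASH_ARGV0`, `assoc_expand_once` option for associative array subscripts
|
||||
- **Bash 5.1**: `${var@U}` uppercase, `${var@u}` capitalize first character, `${var@L}` lowercase, `${var@K}` key-value output, `SRANDOM`; compatibility levels are set through `BASH_COMPAT` rather than new `compatNN` shopt options
|
||||
- **Bash 5.2**: `varredir_close` option, `patsub_replacement` option (`&` in pattern-substitution replacements), improved `exec` error handling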
|
||||
- Check version before using modern features: `(( BASH_VERSINFO[0] > 5 || (BASH_VERSINFO[0] == 5 && BASH_VERSINFO[1] >= 2) ))`
|
||||
- Use `${parameter@Q}` for shell-quoted output (Bash 4.4+)
|
||||
- Use `${parameter@E}` for escape sequence expansion (Bash 4.4+)
|
||||
- Use `${parameter@P}` for prompt expansion (Bash 4.4+)
|
||||
- Use `${parameter@A}` for assignment format (Bash 4.4+)
|
||||
- Employ `wait -n` to wait for any background job (Bash 4.3+)
|
||||
- Use `mapfile -d delim` for custom delimiters (Bash 4.4+)
|
||||
|
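Two of the Bash 4.4+ features listed above, `${parameter@Q}` and `mapfile -d`, can be tried with the following sketch. The function names `quote_for_shell` and `count_nul_items` are hypothetical.

```bash
#!/usr/bin/env bash

# ${var@Q} (Bash 4.4+) quotes a value so it can be reused as shell input.
quote_for_shell() {
  local value="$1"
  printf '%s\n' "${value@Q}"
}

# mapfile -d '' (Bash 4.4+) splits stdin on NUL bytes instead of newlines.
count_nul_items() {
  local -a items
  mapfile -d '' -t items
  printf '%s\n' "${#items[@]}"
}

quote_for_shell "it's here"
printf 'a b\0c\nd\0' | count_nul_items
```

The second call counts two items even though one of them contains a newline, which is the reason to prefer NUL delimiters for file names.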
||||
## CI/CD Integration
|
||||
|
||||
- **GitHub Actions**: Use `shellcheck-problem-matchers` for inline annotations
|
||||
- **Pre-commit hooks**: Configure `.pre-commit-config.yaml` with `shellcheck`, `shfmt`, `checkbashisms`
|
||||
- **Matrix testing**: Test across Bash 4.4, 5.0, 5.1, 5.2 on Linux and macOS
|
||||
- **Container testing**: Use official bash:5.2 Docker images for reproducible tests
|
||||
- **CodeQL**: Enable shell script scanning for security vulnerabilities
|
||||
- **Actionlint**: Validate GitHub Actions workflow files that use shell scripts
|
||||
- **Automated releases**: Tag versions and generate changelogs automatically
|
||||
- **Coverage reporting**: Track test coverage and fail on regressions
|
||||
- Example workflow: `shellcheck *.sh && shfmt -d *.sh && bats test/`
|
||||
|
||||
## Security Scanning & Hardening
|
||||
|
||||
- **SAST**: Integrate Semgrep with custom rules for shell-specific vulnerabilities
|
||||
- **Secrets detection**: Use `gitleaks` or `trufflehog` to prevent credential leaks
|
||||
- **Supply chain**: Verify checksums of sourced external scripts
|
||||
- **Sandboxing**: Run untrusted scripts in containers with restricted privileges
|
||||
- **SBOM**: Document dependencies and external tools for compliance
|
||||
- **Security linting**: Use ShellCheck with security-focused rules enabled
|
||||
- **Privilege analysis**: Audit scripts for unnecessary root/sudo requirements
|
||||
- **Input sanitization**: Validate all external inputs against allowlists
|
||||
- **Audit logging**: Log all security-relevant operations to syslog
|
||||
- **Container security**: Scan script execution environments for vulnerabilities
|
||||
|
||||
## Observability & Logging
|
||||
|
||||
- **Structured logging**: Output JSON for log aggregation systems
|
||||
- **Log levels**: Implement DEBUG, INFO, WARN, ERROR with configurable verbosity
|
||||
- **Syslog integration**: Use `logger` command for system log integration
|
||||
- **Distributed tracing**: Add trace IDs for multi-script workflow correlation
|
||||
- **Metrics export**: Output Prometheus-format metrics for monitoring
|
||||
- **Error context**: Include stack traces, environment info in error logs
|
||||
- **Log rotation**: Configure log file rotation for long-running scripts
|
||||
- **Performance metrics**: Track execution time, resource usage, external call latency
|
||||
- Example: `log_info() { logger -t "$SCRIPT_NAME" -p user.info "$*"; echo "[INFO] $*" >&2; }`
|
||||
|
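Structured logging with levels might look like the sketch below. It avoids external tools, so the escaping is deliberately minimal. The function names `json_escape` and `log_json`, the `LOG_LEVEL` variable, and the field names `ts`, `level`, and `msg` are all assumptions for illustration.

```bash
#!/usr/bin/env bash

# Minimal JSON string escaping: backslash, double quote, newline, tab, CR.
# Assumption: other control characters do not occur in log messages.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  s=${s//$'\r'/\\r}
  printf '%s' "$s"
}

# Hypothetical logger: one JSON object per line on stderr, filtered by level.
LOG_LEVEL="${LOG_LEVEL:-INFO}"

log_json() {
  local level="$1"
  shift
  local -A rank=([DEBUG]=0 [INFO]=1 [WARN]=2 [ERROR]=3)
  # Drop messages below the configured threshold.
  (( ${rank[$level]:-1} >= ${rank[$LOG_LEVEL]:-1} )) || return 0
  printf '{"ts":"%s","level":"%s","msg":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$(json_escape "$*")" >&2
}
```

Where `jq` is available, building the object with `jq -n --arg` is the more robust choice, since it handles every character correctly.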
||||
## Quality Checklist
|
||||
|
||||
- Scripts pass ShellCheck static analysis with minimal suppressions
|
||||
- Code is formatted consistently with shfmt using standard options
|
||||
- Comprehensive test coverage with Bats including edge cases
|
||||
- All variable expansions are properly quoted
|
||||
- Error handling covers all failure modes with meaningful messages
|
||||
- Temporary resources are cleaned up properly with EXIT traps
|
||||
- Scripts support `--help` and provide clear usage information
|
||||
- Input validation prevents injection attacks and handles edge cases
|
||||
- Scripts are portable across target platforms (Linux, macOS)
|
||||
- Performance is adequate for expected workloads and data sizes
|
||||
|
||||
## Output
|
||||
|
||||
- Production-ready Bash scripts with defensive programming practices
|
||||
- Comprehensive test suites using bats-core or shellspec with TAP output
|
||||
- CI/CD pipeline configurations (GitHub Actions, GitLab CI) for automated testing
|
||||
- Documentation generated with shdoc and man pages with shellman
|
||||
- Structured project layout with reusable library functions and dependency management
|
||||
- Static analysis configuration files (.shellcheckrc, .shfmt.toml, .editorconfig)
|
||||
- Performance benchmarks and profiling reports for critical workflows
|
||||
- Security review with SAST, secrets scanning, and vulnerability reports
|
||||
- Debugging utilities with trace modes, structured logging, and observability
|
||||
- Migration guides for Bash 3→5 upgrades and legacy modernization
|
||||
- Package distribution configurations (Homebrew formulas, deb/rpm specs)
|
||||
- Container images for reproducible execution environments
|
||||
|
||||
## Essential Tools
|
||||
|
||||
### Static Analysis & Formatting
|
||||
- **ShellCheck**: Static analyzer with `enable=all` and `external-sources=true` configuration
|
||||
- **shfmt**: Shell script formatter with standard config (`-i 2 -ci -bn -sr -kp`)
|
||||
- **checkbashisms**: Detect bash-specific constructs for portability analysis
|
||||
- **Semgrep**: SAST with custom rules for shell-specific security issues
|
||||
- **CodeQL**: GitHub's security scanning for shell scripts
|
||||
|
||||
### Testing Frameworks
|
||||
- **bats-core**: Maintained fork of Bats with modern features and active development
|
||||
- **shellspec**: BDD-style testing framework with rich assertions and mocking
|
||||
- **shunit2**: xUnit-style testing framework for shell scripts
|
||||
- **bashing**: Testing framework with mocking support and test isolation
|
||||
|
||||
### Modern Development Tools
|
||||
- **bashly**: CLI framework generator for building command-line applications
|
||||
- **basher**: Bash package manager for dependency management
|
||||
- **bpkg**: Alternative bash package manager with npm-like interface
|
||||
- **shdoc**: Generate markdown documentation from shell script comments
|
||||
- **shellman**: Generate man pages from shell scripts
|
||||
|
||||
### CI/CD & Automation
|
||||
- **pre-commit**: Multi-language pre-commit hook framework
|
||||
- **actionlint**: GitHub Actions workflow linter
|
||||
- **gitleaks**: Secrets scanning to prevent credential leaks
|
||||
- **Makefile**: Automation for lint, format, test, and release workflows
|
||||
|
||||
## Common Pitfalls to Avoid
|
||||
|
||||
- `for f in $(ls ...)` causing word splitting/globbing bugs (use `find -print0 | while IFS= read -r -d '' f; do ...; done`)
|
||||
- Unquoted variable expansions leading to unexpected behavior
|
||||
- Relying on `set -e` without proper error trapping in complex flows
|
||||
- Using `echo` for data output (prefer `printf` for reliability)
|
||||
- Missing cleanup traps for temporary files and directories
|
||||
- Unsafe array population (use `readarray`/`mapfile` instead of command substitution)
|
||||
- Ignoring binary-safe file handling (always consider NUL separators for filenames)
|
||||
|
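The first pitfall is easy to reproduce. The sketch below counts files in a directory both ways; `count_with_ls` is deliberately wrong and exists only for comparison, and both function names are hypothetical.

```bash
#!/usr/bin/env bash

# Fragile: the unquoted $(ls) output is split on whitespace and globbed.
count_with_ls() {
  local dir="$1" n=0 f
  # shellcheck disable=SC2045  # deliberately wrong, for comparison only
  for f in $(ls "$dir"); do
    n=$((n + 1))
  done
  printf '%s\n' "$n"
}

# Robust: find emits NUL-terminated names and read consumes them one by one.
count_nul_safe() {
  local dir="$1" n=0 f
  while IFS= read -r -d '' f; do
    n=$((n + 1))
  done < <(find "$dir" -type f -print0)
  printf '%s\n' "$n"
}
```

With a directory that holds `my file.txt` and `b.txt`, the first function reports three entries and the second reports two. The loop reads from process substitution rather than a pipe so that `n` is updated in the current shell instead of a subshell.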
||||
## Dependency Management
|
||||
|
||||
- **Package managers**: Use `basher` or `bpkg` for installing shell script dependencies
|
||||
- **Vendoring**: Copy dependencies into project for reproducible builds
|
||||
- **Lock files**: Document exact versions of dependencies used
|
||||
- **Checksum verification**: Verify integrity of sourced external scripts
|
||||
- **Version pinning**: Lock dependencies to specific versions to prevent breaking changes
|
||||
- **Dependency isolation**: Use separate directories for different dependency sets
|
||||
- **Update automation**: Automate dependency updates with Dependabot or Renovate
|
||||
- **Security scanning**: Scan dependencies for known vulnerabilities
|
||||
- Example: `basher install username/repo@version` or `bpkg install username/repo -g`
|
||||
|
||||
## Advanced Techniques
|
||||
|
||||
- **Error Context**: Use `trap 'echo "Error at line $LINENO: exit $?" >&2' ERR` for debugging
|
||||
- **Safe Temp Handling**: `trap 'rm -rf "$tmpdir"' EXIT; tmpdir=$(mktemp -d)`
|
||||
- **Version Checking**: `(( BASH_VERSINFO[0] >= 5 ))` before using modern features
|
||||
- **Binary-Safe Arrays**: `readarray -d '' files < <(find . -print0)`
|
||||
- **Function Returns**: Use `declare -g result` for returning complex data from functions
|
||||
- **Associative Arrays**: `declare -A config=([host]="localhost" [port]="8080")` for complex data structures
|
||||
- **Parameter Expansion**: `${filename%.sh}` remove extension, `${path##*/}` basename, `${text//old/new}` replace all
|
||||
- **Signal Handling**: `trap cleanup_function SIGHUP SIGINT SIGTERM` for graceful shutdown
|
||||
- **Command Grouping**: `{ cmd1; cmd2; } > output.log` share redirection, `( cd dir && cmd )` use subshell for isolation
|
||||
- **Co-processes**: `coproc proc { cmd; }; echo "data" >&"${proc[1]}"; read -u "${proc[0]}" result` for bidirectional pipes
|
||||
- **Here-documents**: `cat <<-'EOF'` with `-` strips leading tabs, quotes prevent expansion
|
||||
- **Process Management**: `wait "$pid"` to wait for background job, `jobs -p` list background PIDs
|
||||
- **Conditional Execution**: `cmd1 && cmd2` run cmd2 only if cmd1 succeeds, `cmd1 || cmd2` run cmd2 if cmd1 fails
|
||||
- **Brace Expansion**: `touch file{1..10}.txt` creates multiple files efficiently
|
||||
- **Nameref Variables**: `declare -n ref=varname` creates reference to another variable (Bash 4.3+)
|
||||
- **Improved Error Trapping**: `set -Eeuo pipefail; shopt -s inherit_errexit` for comprehensive error handling
|
||||
- **Parallel Execution**: `xargs -P "$(nproc)" -n 1 command` for parallel processing with CPU core count (`nproc` is GNU coreutils; macOS has `sysctl -n hw.ncpu`)
|
||||
- **Structured Output**: `jq -n --arg key "$value" '{key: $key}'` for JSON generation
|
||||
- **Performance Profiling**: Use `/usr/bin/time -v` (GNU time; the Bash `time` keyword has no `-v`) for detailed resource usage, or set `TIMEFORMAT` to customize the output of the `time` keyword
|
||||
|
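Namerefs, associative arrays, and parameter expansion from the list above can be combined to return several values from one function. In this sketch the function `parse_url` and its keys `scheme`, `host`, and `path` are hypothetical, and it assumes the input has the form `scheme://host[/path]`.

```bash
#!/usr/bin/env bash

# Return several values through a caller-supplied associative array using
# a nameref (Bash 4.3+). The local name is prefixed with an underscore to
# reduce the chance of colliding with the caller's variable name.
parse_url() {
  local -n _out="$1"
  local url="$2"
  local rest="${url#*://}"     # strip "scheme://"
  _out[scheme]="${url%%://*}"  # text before "://"
  _out[host]="${rest%%/*}"     # up to the first slash
  _out[path]="/${rest#*/}"     # everything after the first slash
  [[ "$rest" == */* ]] || _out[path]="/"
}

declare -A parts=()
parse_url parts "https://example.com/docs/index.html"
printf '%s %s %s\n' "${parts[scheme]}" "${parts[host]}" "${parts[path]}"
```

Compared with `declare -g result`, a nameref lets the caller decide where the result lands, which keeps the function reusable in scripts that hold more than one parsed value at a time.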
||||
## References & Further Reading
|
||||
|
||||
### Style Guides & Best Practices
|
||||
- [Google Shell Style Guide](https://google.github.io/styleguide/shellguide.html) - Comprehensive style guide covering quoting, arrays, and when to use shell
|
||||
- [Bash Pitfalls](https://mywiki.wooledge.org/BashPitfalls) - Catalog of common Bash mistakes and how to avoid them
|
||||
- [Bash Hackers Wiki](https://wiki.bash-hackers.org/) - Comprehensive Bash documentation and advanced techniques
|
||||
- [Defensive BASH Programming](https://www.kfirlavi.com/blog/2012/11/14/defensive-bash-programming/) - Modern defensive programming patterns
|
||||
|
||||
### Tools & Frameworks
|
||||
- [ShellCheck](https://github.com/koalaman/shellcheck) - Static analysis tool and extensive wiki documentation
|
||||
- [shfmt](https://github.com/mvdan/sh) - Shell script formatter with detailed flag documentation
|
||||
- [bats-core](https://github.com/bats-core/bats-core) - Maintained Bash testing framework
|
||||
- [shellspec](https://github.com/shellspec/shellspec) - BDD-style testing framework for shell scripts
|
||||
- [bashly](https://bashly.dannyb.co/) - Modern Bash CLI framework generator
|
||||
- [shdoc](https://github.com/reconquest/shdoc) - Documentation generator for shell scripts
|
||||
|
||||
### Security & Advanced Topics
|
||||
- [PEASS-ng](https://github.com/carlospolop/PEASS-ng) - Privilege-escalation audit scripts (linPEAS and others), useful for reviewing host hardening
|
||||
- [Awesome Bash](https://github.com/awesome-lists/awesome-bash) - Curated list of Bash resources and tools
|
||||
- [Pure Bash Bible](https://github.com/dylanaraps/pure-bash-bible) - Collection of pure bash alternatives to external commands
|
||||
125
skills/bookstack-documentation/SKILL.md
Normal file
|
|
@ -0,0 +1,125 @@
|
|||
---
|
||||
name: bookstack-documentation
|
||||
description: Use when completing any significant work — deploying services, fixing cluster issues, writing runbooks, finishing brainstorming sessions, or making architectural decisions — to determine whether and where to save it to BookStack at https://wiki.ctz.fyi
|
||||
---
|
||||
|
||||
# BookStack Documentation
|
||||
|
||||
## Overview
|
||||
|
||||
Save durable knowledge to BookStack as part of normal work — not just specs and plans, but ops runbooks, architecture notes, troubleshooting outcomes, and session results. If future-you would need to look it up, write it down.
|
||||
|
||||
**Instance:** https://wiki.ctz.fyi (BookStack v26.03.5)
|
||||
**MCP tools:** `litellm_bookstack-*`
|
||||
|
||||
## Decision Table — Where Does This Go?
|
||||
|
||||
| Content type | Location |
|
||||
|---|---|
|
||||
| Design spec (from brainstorming) | Specs book (ID 157) |
|
||||
| Implementation plan | Plans book (ID 159) |
|
||||
| Architecture decision / how a system works | Ansiblestack book (ID 79), find or create page |
|
||||
| Ops runbook / "how to do X on the cluster" | Ansiblestack book, `playbook-reference` page or new dedicated page |
|
||||
| Troubleshooting investigation outcome | Ansiblestack book, relevant service page (e.g., update `keycloak` page) |
|
||||
| New service deployed | Ansiblestack book, create new page named after the service |
|
||||
| Project-specific docs | New book in Infrastructure Docs shelf, or new chapter in Ansiblestack |
|
||||
|
||||
## Shelf and Book Structure
|
||||
|
||||
```
|
||||
Shelf: Superpowers (ID 1)
|
||||
Book: Specs (ID 157) — Design specs from brainstorming sessions
|
||||
Book: Plans (ID 159) — Implementation plans
|
||||
|
||||
Shelf: Infrastructure Docs (ID 78)
|
||||
Book: Ansiblestack (ID 79) — Cluster bootstrap, services, architecture docs
|
||||
Existing pages: INDEX, addons, applications, architecture, argocd-consolidation,
|
||||
cluster, crowdsec, dns, external-secrets, hacker-ethos, keycloak,
|
||||
litellm-qdrant-memory, mcp-servers, missing-services, monitoring,
|
||||
netbox, networking, openbao, pangolin-newt-troubleshooting,
|
||||
playbook-reference, playwright-mcp, rabbitmq, scripts, tandoor,
|
||||
terrakube, tofu
|
||||
|
||||
Shelf: Repo Documentation (ID 121)
|
||||
Various per-repo books
|
||||
|
||||
Book: touchscreen — Family Room Dashboard (ID 162)
|
||||
```
|
||||
|
||||
## When to Save
|
||||
|
||||
- After brainstorming session completes → spec to Specs book
|
||||
- After plan is written → plan to Plans book
|
||||
- After deploying a new service → create/update service page in Ansiblestack
|
||||
- After investigating and fixing a cluster issue → document fix on the relevant service page
|
||||
- After writing a runbook or procedure → Ansiblestack `playbook-reference` or dedicated page
|
||||
- After any architectural decision that isn't obvious from the code
|
||||
|
||||
## When NOT to Save
|
||||
|
||||
- Temporary debug output or scratch work
|
||||
- Q&A that belongs in chat history
|
||||
- Anything immediately obsolete
|
||||
|
||||
## Page Naming Conventions
|
||||
|
||||
| Type | Format |
|
||||
|---|---|
|
||||
| Specs | `[Spec] YYYY-MM-DD: <topic>` |
|
||||
| Plans | `[Plan] YYYY-MM-DD: <feature name>` |
|
||||
| Service pages | lowercase service name (e.g., `rabbitmq`) |
|
||||
| Runbooks | descriptive verb phrase: `Rotating OpenBao Unseal Keys` |
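The spec and plan formats in the table can be generated rather than typed by hand. A minimal Python sketch, assuming a hypothetical helper named `page_name` (it is not part of BookStack or the MCP tools):

```python
from datetime import date
from typing import Optional

# Prefixes taken from the naming table; the helper itself is hypothetical.
PREFIXES = {"spec": "[Spec]", "plan": "[Plan]"}


def page_name(kind: str, topic: str, on: Optional[date] = None) -> str:
    """Return '[Spec] YYYY-MM-DD: <topic>' or '[Plan] YYYY-MM-DD: <topic>'."""
    prefix = PREFIXES[kind]  # an unknown kind raises KeyError on purpose
    day = (on or date.today()).isoformat()
    return f"{prefix} {day}: {topic.strip()}"
```

Service pages and runbooks are left out because their names are free text (a lowercase service name or a verb phrase).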
|
||||
|
||||
## Page Format (Markdown)
|
||||
|
||||
For service pages, use this structure:
|
||||
|
||||
```markdown
|
||||
# Service Name
|
||||
|
||||
**Status:** Running / Deprecated
|
||||
**Namespace:** `<ns>`
|
||||
**URL:** https://<hostname>
|
||||
**Chart:** `helm/charts/<name>/`
|
||||
**ArgoCD App:** `helm/argocd/<name>-app.yaml`
|
||||
**Secrets:** OpenBao path `secret/production/<ns>/...`
|
||||
|
||||
## Overview
|
||||
What it is and why we run it.
|
||||
|
||||
## Architecture
|
||||
How it's deployed, what it depends on.
|
||||
|
||||
## Configuration
|
||||
Key config decisions, non-obvious settings.
|
||||
|
||||
## Operations
|
||||
### How to restart
|
||||
### How to update
|
||||
### Common issues
|
||||
```
|
||||
|
||||
For runbooks and procedures, use clearly numbered steps. For troubleshooting outcomes, document symptoms → investigation → root cause → fix.
|
||||
|
||||
## MCP Usage
|
||||
|
||||
```python
|
||||
# Find an existing page (search or list book contents)
|
||||
bookstack_books_read(id=79) # lists pages in Ansiblestack
|
||||
|
||||
# Create a new page
|
||||
bookstack_pages_create(
|
||||
book_id=79,
|
||||
name="my-service",
|
||||
markdown="# My Service\n..."
|
||||
)
|
||||
|
||||
# Update existing page — ALWAYS read first, updates replace entire content
|
||||
page = bookstack_pages_read(id=<page_id>)
|
||||
bookstack_pages_update(
|
||||
id=<page_id>,
|
||||
markdown="<updated full content>"
|
||||
)
|
||||
```
|
||||
|
||||
**Always read before updating.** `bookstack_pages_update` replaces the entire page.
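Because an update replaces the whole page, new material has to be merged into the existing markdown before it is sent. A minimal sketch of that merge step in Python, assuming a hypothetical helper named `append_section`; its return value would be passed as the `markdown` argument of `bookstack_pages_update`:

```python
def append_section(existing_markdown: str, heading: str, body: str) -> str:
    """Return the full page content with a new '## heading' section appended.

    Hypothetical helper: it only builds the string. Reading and updating the
    page still go through the MCP tools.
    """
    section = f"## {heading}\n\n{body.strip()}\n"
    # Keep everything already on the page, separated by exactly one blank line.
    return existing_markdown.rstrip("\n") + "\n\n" + section
```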
|
||||
122
skills/brainstorming/SKILL.md
Normal file
|
|
@ -0,0 +1,122 @@
|
|||
---
|
||||
name: brainstorming
|
||||
description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation."
|
||||
---
|
||||
|
||||
# Brainstorming Ideas Into Designs
|
||||
|
||||
Help turn ideas into fully formed designs and specs through natural collaborative dialogue.
|
||||
|
||||
Start by understanding the current project context, then ask questions one at a time to refine the idea. Once you understand what you're building, present the design and get user approval.
|
||||
|
||||
<HARD-GATE>
|
||||
Do NOT invoke any implementation skill, write any code, scaffold any project, or take any implementation action until you have presented a design and the user has approved it. This applies to EVERY project regardless of perceived simplicity.
|
||||
</HARD-GATE>
|
||||
|
||||
## Anti-Pattern: "This Is Too Simple To Need A Design"
|
||||
|
||||
Every project goes through this process. A todo list, a single-function utility, a config change — all of them. "Simple" projects are where unexamined assumptions cause the most wasted work. The design can be short (a few sentences for truly simple projects), but you MUST present it and get approval.
|
||||
|
||||
## Checklist
|
||||
|
||||
You MUST create a todo item for each of these and complete them in order:
|
||||
|
||||
1. **Explore project context** — check files, docs, recent commits
|
||||
2. **Offer visual companion** (if topic will involve visual questions) — own message, not combined with a clarifying question
|
||||
3. **Ask clarifying questions** — one at a time, understand purpose/constraints/success criteria
|
||||
4. **Propose 2-3 approaches** — with trade-offs and your recommendation
|
||||
5. **Present design** — in sections scaled to their complexity, get user approval after each section
|
||||
6. **Save spec to BookStack** — create a page in the Specs book (https://wiki.ctz.fyi) with the full design doc
|
||||
7. **Spec self-review** — quick inline check for placeholders, contradictions, ambiguity, scope
|
||||
8. **User reviews spec** — ask user to review the BookStack page before proceeding
|
||||
9. **Transition to implementation** — invoke `writing-plans` skill
|
||||
|
||||
## BookStack Spec Page
|
||||
|
||||
After the design is approved (step 6), save it to BookStack at https://wiki.ctz.fyi:
|
||||
|
||||
1. The **Specs** book already exists (book ID 157) under the Superpowers shelf.
|
||||
2. Create the spec page via `bookstack_pages_create`:
   - `book_id`: 157
   - `name`: `[Spec] YYYY-MM-DD: <topic>`
   - `markdown`: full design doc in markdown
|
||||
3. Note the returned page URL for the user review gate: `https://wiki.ctz.fyi/books/specs-CdD/page/<slug>`
|
||||
|
||||
> If a project-specific chapter is appropriate (e.g., a named project has multiple specs), create or reuse a chapter inside the Specs book and use `chapter_id` instead of `book_id`.
|
||||
|
||||
## Vikunja Project Setup
|
||||
|
||||
Also create or identify the Vikunja project for implementation tracking:
|
||||
|
||||
1. Call `litellm_vikunja-vikunja_api` with operation `get_projects` to list all projects
|
||||
2. Ask: "Which Vikunja project should tasks live in? Or I can create a new one cloned from the Template."
|
||||
3. If creating a new project:
   - Ask the user what to name it
   - Call `put_projects_projectid_duplicate` with `projectID: 5`, body `{ "name": "<chosen name>" }`
|
||||
4. Note the project ID for `writing-plans`
|
||||
|
||||
## The Process
|
||||
|
||||
**Understanding the idea:**
|
||||
- Check out the current project state first (files, docs, recent commits)
|
||||
- Before asking detailed questions, assess scope: if the request describes multiple independent subsystems, flag this immediately
|
||||
- If the project is too large for a single spec, help the user decompose into sub-projects
|
||||
- For appropriately-scoped projects, ask questions one at a time to refine the idea
|
||||
- Prefer multiple choice questions when possible
|
||||
- Only one question per message
|
||||
- Focus on understanding: purpose, constraints, success criteria
|
||||
|
||||
**Exploring approaches:**
|
||||
- Propose 2-3 different approaches with trade-offs
|
||||
- Present options conversationally with your recommendation and reasoning
|
||||
- Lead with your recommended option and explain why
|
||||
|
||||
**Presenting the design:**
|
||||
- Once you believe you understand what you're building, present the design
|
||||
- Scale each section to its complexity
|
||||
- Ask after each section whether it looks right so far
|
||||
- Cover: architecture, components, data flow, error handling, testing
|
||||
|
||||
**Design for isolation and clarity:**
|
||||
- Break the system into smaller units that each have one clear purpose
|
||||
- Can someone understand what a unit does without reading its internals?
|
||||
|
||||
**Working in existing codebases:**
|
||||
- Explore the current structure before proposing changes. Follow existing patterns.
|
||||
- Include targeted improvements but don't propose unrelated refactoring.
|
||||
|
||||
## Spec Self-Review (step 7)
|
||||
|
||||
Run this yourself — not a subagent:
|
||||
|
||||
1. **Placeholder scan:** Any "TBD", "TODO", incomplete sections, or vague requirements? Fix them.
|
||||
2. **Internal consistency:** Do any sections contradict each other?
|
||||
3. **Scope check:** Is this focused enough for a single implementation plan?
|
||||
4. **Ambiguity check:** Could any requirement be interpreted two different ways?
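The placeholder scan is the one check in this list that is easy to mechanize. A rough Python sketch, assuming the marker list `TBD`/`TODO`/`FIXME` (an assumption, extend as needed); the other three checks still need a human read:

```python
import re

# Assumed marker list. Matching is case-sensitive so prose such as
# "todo list" is not flagged.
PLACEHOLDER = re.compile(r"\b(TBD|TODO|FIXME)\b")


def find_placeholders(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs for lines with a marker."""
    hits = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        if PLACEHOLDER.search(line):
            hits.append((number, line.strip()))
    return hits
```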
|
||||
|
||||
## User Review Gate (step 8)
|
||||
|
||||
After saving to BookStack and completing the self-review, ask the user:
|
||||
|
||||
> "Spec saved to BookStack: https://wiki.ctz.fyi/books/specs-CdD/page/<slug>. Please review it and let me know if you want any changes before we start writing the implementation plan."
|
||||
|
||||
Wait for the user's response. Only proceed once the user approves.
|
||||
|
||||
## Implementation (step 9)
|
||||
|
||||
- Invoke the `writing-plans` skill to create a detailed implementation plan
|
||||
- Do NOT invoke any other skill. `writing-plans` is the next and only step.
|
||||
|
||||
## Key Principles
|
||||
- One question at a time
|
||||
- Multiple choice preferred
|
||||
- YAGNI ruthlessly
|
||||
- Explore alternatives — always propose 2-3 approaches
|
||||
- Incremental validation — present design section by section, get approval before moving on
|
||||
- Be flexible — go back and clarify when something doesn't make sense
|
||||
|
||||
## Visual Companion
|
||||
|
||||
A browser-based companion for showing mockups, diagrams, and visual options. Offer it once for consent when visual questions are anticipated. This offer MUST be its own message — not combined with a clarifying question.
|
||||
|
||||
Per-question decision: use browser for layout/mockup/diagram content; use text for conceptual questions.
|
||||
193
skills/cnpg-database/SKILL.md
Normal file
|
|
@ -0,0 +1,193 @@
|
|||
---
|
||||
name: cnpg-database
|
||||
description: Use when deploying, configuring, or troubleshooting CloudNativePG PostgreSQL clusters on Zoe's k3s homelab, including bootstrapping, secrets, S3 backups, migrations, and common failure modes.
|
||||
---
|
||||
|
||||
# CloudNativePG (CNPG) on k3s Homelab
|
||||
|
||||
## Overview
|
||||
|
||||
Deploy and operate CNPG PostgreSQL clusters on the production k3s cluster at `10.0.6.10`. CNPG operator v1.28.1. Always use ArgoCD sync-waves to enforce creation order.
|
||||
|
||||
## Environment
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| CNPG operator | 1.28.1 |
|
||||
| PostgreSQL image | `ghcr.io/cloudnative-pg/postgresql:18.1-system-trixie` (includes pgvector as `vector.so`) |
|
||||
| Fast storage | `nvme` (NFS-NVMe) |
|
||||
| Standard storage | `ssd` (NFS-SSD) |
|
||||
| S3 endpoint | `https://s3.ctz.fyi` |
|
||||
| S3 bucket | `cnpg-backups` |
|
||||
| Secrets backend | External Secrets Operator → ClusterSecretStore `openbao` |
|
||||
| OpenBao path | `secret/production/<namespace>/<cluster-name>` |
|
||||
|
||||
## Sync-Wave Order (Critical)
|
||||
|
||||
| Wave | Resource |
|
||||
|------|----------|
|
||||
| `-2` | CNPG `Cluster` |
|
||||
| `-1` | `ExternalSecret` for DB credentials |
|
||||
| `0` | App `Deployment` |
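Argo CD applies lower sync-waves first and treats a missing annotation as wave 0. The ordering in the table can be sketched in Python as follows; the manifests are plain dicts and the function names are illustrative, not an Argo CD API:

```python
WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"


def sync_wave(manifest: dict) -> int:
    """Read the sync-wave annotation, defaulting to wave 0 when it is absent."""
    annotations = manifest.get("metadata", {}).get("annotations", {})
    return int(annotations.get(WAVE_ANNOTATION, "0"))


def apply_order(manifests: list[dict]) -> list[str]:
    """Return resource kinds in the order their waves would be applied."""
    return [m["kind"] for m in sorted(manifests, key=sync_wave)]
```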
|
||||
|
||||
## Step 1 — Write Secrets to OpenBao
|
||||
|
||||
Do this **before** deploying anything:
|
||||
|
||||
```bash
|
||||
bao kv put secret/production/<namespace>/<app>-db \
|
||||
username=<app> \
|
||||
  password="$(openssl rand -base64 32 | tr -d '/=+' | head -c 32)"
|
||||
```
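Where `openssl` is not available, an equivalent password can be produced with Python's standard library. A minimal sketch, assuming the hypothetical helper name `generate_password`; the alphabet matches what the shell pipeline leaves after stripping `/`, `=`, and `+`:

```python
import secrets
import string

# Letters and digits only, the same character set the shell pipeline produces.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 32) -> str:
    """Return a random alphanumeric password drawn from the OS CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```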
|
||||
|
||||
Also create the backup credentials secret once per namespace:
|
||||
```bash
|
||||
bao kv put secret/production/<namespace>/cnpg-backup-s3-credentials \
|
||||
ACCESS_KEY_ID=<key> \
|
||||
ACCESS_SECRET_KEY=<secret>
|
||||
```
|
||||
|
||||
## Step 2 — ExternalSecret (sync-wave -1)
|
||||
|
||||
```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: <app>-db-credentials
  namespace: <app>
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: openbao
    kind: ClusterSecretStore
  target:
    name: <app>-db-credentials
    creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: secret/production/<namespace>/<app>-db
        property: username
    - secretKey: password
      remoteRef:
        key: secret/production/<namespace>/<app>-db
        property: password
```
|
||||
|
||||
## Step 3 — CNPG Cluster (sync-wave -2)
|
||||
|
||||
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: <app>-db
  namespace: <app>
  annotations:
    argocd.argoproj.io/sync-wave: "-2"
spec:
  instances: 3  # Use 1 for dev/small workloads
  imageName: ghcr.io/cloudnative-pg/postgresql:18.1-system-trixie

  storage:
    size: 10Gi
    storageClass: nvme  # or ssd

  bootstrap:
    initdb:
      database: <app>
      owner: <app>
      secret:
        name: <app>-db-credentials  # MUST have keys 'username' and 'password' exactly

  backup:
    barmanObjectStore:
      destinationPath: s3://cnpg-backups/<app>
      endpointURL: https://s3.ctz.fyi
      s3Credentials:
        accessKeyId:
          name: cnpg-backup-s3-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: cnpg-backup-s3-credentials
          key: ACCESS_SECRET_KEY
    retentionPolicy: "30d"
```
|
||||
|
||||
## CRITICAL: Secret Key Names
|
||||
|
||||
> **The bootstrap secret MUST have keys named exactly `username` and `password`.**
|
||||
> CNPG will appear healthy but the app cannot connect if keys are wrong (e.g., `user`, `pass`, `POSTGRES_USER`).
|
||||
> CNPG does NOT create a separate `-app` secret when `bootstrap.initdb.secret` is provided.
|
||||
|
||||
## Connecting from the App
|
||||
|
||||
CNPG auto-creates these services:
|
||||
|
||||
| Service | Use |
|
||||
|---------|-----|
|
||||
| `<cluster>-rw` | Read-write (primary) — **use this for app writes** |
|
||||
| `<cluster>-ro` | Read-only (replicas) — use for read-heavy queries |
|
||||
| `<cluster>-r` | Any instance |
|
||||
|
||||
```
|
||||
postgresql://<username>:<password>@<app>-db-rw.<namespace>.svc.cluster.local:5432/<database>
|
||||
```
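Generated passwords can contain characters that are not safe inside a URI, so it helps to percent-encode the credentials when assembling the string. A small Python sketch, assuming the hypothetical helper name `cnpg_dsn`:

```python
from urllib.parse import quote


def cnpg_dsn(username: str, password: str, cluster: str,
             namespace: str, database: str) -> str:
    """Build the connection URI for the read-write service of a CNPG cluster."""
    host = f"{cluster}-rw.{namespace}.svc.cluster.local"
    # Percent-encode credentials so '@' or '/' cannot break the URI structure.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"postgresql://{user}:{secret}@{host}:5432/{database}"
```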
|
||||
|
||||
## Manual Database Access
|
||||
|
||||
```bash
|
||||
# psql on primary
|
||||
kubectl exec -n <namespace> -it <cluster>-1 -- psql -U <username> <database>
|
||||
|
||||
# via cnpg plugin
|
||||
kubectl cnpg psql <cluster> -n <namespace>
|
||||
|
||||
# pg_dump
|
||||
kubectl exec -n <namespace> <cluster>-1 -- \
|
||||
pg_dump -U <username> <database> > dump.sql
|
||||
|
||||
# restore
|
||||
kubectl exec -n <namespace> -i <cluster>-1 -- \
|
||||
psql -U <username> <database> < dump.sql
|
||||
```
|
||||
|
||||
## Migrating from Docker/External Postgres
|
||||
|
||||
```bash
|
||||
# 1. Dump from source
|
||||
pg_dump -h <old-host> -U <user> <database> > dump.sql
|
||||
|
||||
# 2. Copy into pod
|
||||
kubectl cp dump.sql <namespace>/<pod>:/tmp/dump.sql
|
||||
|
||||
# 3. Restore
|
||||
kubectl exec -n <namespace> -it <pod> -- \
|
||||
psql -U <username> <database> -f /tmp/dump.sql
|
||||
```
|
||||
|
||||
## Scheduled Backups (Optional)
|
||||
|
||||
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: <app>-db-backup
  namespace: <app>
spec:
  schedule: "0 0 2 * * *"  # 2am daily; CNPG uses a six-field cron with seconds first
  backupOwnerReference: self
  cluster:
    name: <app>-db
```
|
||||
|
||||
## Common Issues
|
||||
|
||||
| Symptom | Cause | Fix |
|
||||
|---------|-------|-----|
|
||||
| Cluster stuck at "Setting up primary" | Secret missing or wrong key names | Check `<app>-db-credentials` exists and has `username`/`password` keys |
|
||||
| Pod in `Pending` | PVC can't provision | Check `nvme`/`ssd` NFS provisioner is healthy |
|
||||
| App can't connect | Using pod IP or wrong service | Use `<cluster>-rw` service, not pod IP |
|
||||
| 2/3 instances after node failure | Normal self-healing | Wait — CNPG will recover automatically |
|
||||
| Stale data after cluster recreation | Old PVCs still present | Delete PVCs manually before clean redeploy |
|
||||
25
skills/code-review-checklist/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
447
skills/code-review-checklist/SKILL.md
Normal file
|
|
@ -0,0 +1,447 @@
|
|||
---
|
||||
name: code-review-checklist
|
||||
description: "Comprehensive checklist for conducting thorough code reviews covering functionality, security, performance, and maintainability"
|
||||
risk: unknown
|
||||
source: community
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
# Code Review Checklist
|
||||
|
||||
## Overview
|
||||
|
||||
Provide a systematic checklist for conducting thorough code reviews. This skill helps reviewers ensure code quality, catch bugs, identify security issues, and maintain consistency across the codebase.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
- Use when reviewing pull requests
|
||||
- Use when conducting code audits
|
||||
- Use when establishing code review standards for a team
|
||||
- Use when training new developers on code review practices
|
||||
- Use when you want to ensure nothing is missed in reviews
|
||||
- Use when creating code review documentation
|
||||
|
||||
## How It Works
|
||||
|
||||
### Step 1: Understand the Context
|
||||
|
||||
Before reviewing code, I'll help you understand:
|
||||
- What problem does this code solve?
|
||||
- What are the requirements?
|
||||
- What files were changed and why?
|
||||
- Are there related issues or tickets?
|
||||
- What's the testing strategy?
|
||||
|
||||
### Step 2: Review Functionality
|
||||
|
||||
Check if the code works correctly:
|
||||
- Does it solve the stated problem?
|
||||
- Are edge cases handled?
|
||||
- Is error handling appropriate?
|
||||
- Are there any logical errors?
|
||||
- Does it match the requirements?
|
||||
|
||||
### Step 3: Review Code Quality
|
||||
|
||||
Assess code maintainability:
|
||||
- Is the code readable and clear?
|
||||
- Are names descriptive?
|
||||
- Is it properly structured?
|
||||
- Are functions/methods focused?
|
||||
- Is there unnecessary complexity?
|
||||
|
||||
### Step 4: Review Security
|
||||
|
||||
Check for security issues:
|
||||
- Are inputs validated?
|
||||
- Is sensitive data protected?
|
||||
- Are there SQL injection risks?
|
||||
- Is authentication/authorization correct?
|
||||
- Are dependencies secure?
|
||||
|
||||
### Step 5: Review Performance
|
||||
|
||||
Look for performance issues:
|
||||
- Are there unnecessary loops?
|
||||
- Is database access optimized?
|
||||
- Are there memory leaks?
|
||||
- Is caching used appropriately?
|
||||
- Are there N+1 query problems?
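An N+1 problem is easiest to spot when both shapes sit side by side. The sketch below uses Python's built-in `sqlite3` with a made-up `authors`/`posts` schema (table and function names are illustrative only): the first function issues one query per author, the second fetches the same data with a single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'lin');
    INSERT INTO posts VALUES (1, 1, 'intro'), (2, 1, 'followup'), (3, 2, 'notes');
""")


def titles_n_plus_one(conn):
    """One query for the authors, then one extra query per author (N+1)."""
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        )
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query(conn):
    """Same data in one round trip; authors without posts are omitted by the JOIN."""
    query = """
        SELECT a.name, p.title
        FROM authors AS a JOIN posts AS p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """
    result = {}
    for name, title in conn.execute(query):
        result.setdefault(name, []).append(title)
    return result
```

In review, a query call inside a loop over query results is the pattern to look for.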
|
||||
|
||||
### Step 6: Review Tests
|
||||
|
||||
Verify test coverage:
|
||||
- Are there tests for new code?
|
||||
- Do tests cover edge cases?
|
||||
- Are tests meaningful?
|
||||
- Do all tests pass?
|
||||
- Is test coverage adequate?
|
||||
|
||||
## Examples
|
||||
|
||||
### Example 1: Functionality Review Checklist
|
||||
|
||||
```markdown
|
||||
## Functionality Review
|
||||
|
||||
### Requirements
|
||||
- [ ] Code solves the stated problem
|
||||
- [ ] All acceptance criteria are met
|
||||
- [ ] Edge cases are handled
|
||||
- [ ] Error cases are handled
|
||||
- [ ] User input is validated
|
||||
|
||||
### Logic
|
||||
- [ ] No logical errors or bugs
|
||||
- [ ] Conditions are correct (no off-by-one errors)
|
||||
- [ ] Loops terminate correctly
|
||||
- [ ] Recursion has proper base cases
|
||||
- [ ] State management is correct
|
||||
|
||||
### Error Handling
|
||||
- [ ] Errors are caught appropriately
|
||||
- [ ] Error messages are clear and helpful
|
||||
- [ ] Errors don't expose sensitive information
|
||||
- [ ] Failed operations are rolled back
|
||||
- [ ] Logging is appropriate
|
||||
|
||||
### Example Issues to Catch:
|
||||
|
||||
**❌ Bad - Missing validation:**
|
||||
\`\`\`javascript
|
||||
function createUser(email, password) {
|
||||
// No validation!
|
||||
return db.users.create({ email, password });
|
||||
}
|
||||
\`\`\`
|
||||
|
||||
**✅ Good - Proper validation:**
|
||||
\`\`\`javascript
|
||||
function createUser(email, password) {
|
||||
if (!email || !isValidEmail(email)) {
|
||||
throw new Error('Invalid email address');
|
||||
}
|
||||
if (!password || password.length < 8) {
|
||||
throw new Error('Password must be at least 8 characters');
|
||||
}
|
||||
return db.users.create({ email, password });
|
||||
}
|
||||
\`\`\`
|
||||
```
|
||||
|
||||
### Example 2: Security Review Checklist
|
||||
|
||||
```markdown
|
||||
## Security Review
|
||||
|
||||
### Input Validation
|
||||
- [ ] All user inputs are validated
|
||||
- [ ] SQL injection is prevented (use parameterized queries)
|
||||
- [ ] XSS is prevented (escape output)
|
||||
- [ ] CSRF protection is in place
|
||||
- [ ] File uploads are validated (type, size, content)
|
||||
|
||||
### Authentication & Authorization
|
||||
- [ ] Authentication is required where needed
|
||||
- [ ] Authorization checks are present
|
||||
- [ ] Passwords are hashed (never stored plain text)
|
||||
- [ ] Sessions are managed securely
|
||||
- [ ] Tokens expire appropriately
|
||||
|
||||
### Data Protection
|
||||
- [ ] Sensitive data is encrypted
|
||||
- [ ] API keys are not hardcoded
|
||||
- [ ] Environment variables are used for secrets
|
||||
- [ ] Personal data follows privacy regulations
|
||||
- [ ] Database credentials are secure
|
||||
|
||||
### Dependencies
|
||||
- [ ] No known vulnerable dependencies
|
||||
- [ ] Dependencies are up to date
|
||||
- [ ] Unnecessary dependencies are removed
|
||||
- [ ] Dependency versions are pinned
|
||||
|
||||
### Example Issues to Catch:
|
||||
|
||||
**❌ Bad - SQL injection risk:**
|
||||
\`\`\`javascript
|
||||
const query = \`SELECT * FROM users WHERE email = '\${email}'\`;
|
||||
db.query(query);
|
||||
\`\`\`
|
||||
|
||||
**✅ Good - Parameterized query:**
|
||||
\`\`\`javascript
|
||||
const query = 'SELECT * FROM users WHERE email = $1';
|
||||
db.query(query, [email]);
|
||||
\`\`\`
|
||||
|
||||
**❌ Bad - Hardcoded secret:**
|
||||
\`\`\`javascript
|
||||
const API_KEY = 'sk_live_abc123xyz';
|
||||
\`\`\`
|
||||
|
||||
**✅ Good - Environment variable:**
|
||||
\`\`\`javascript
|
||||
const API_KEY = process.env.API_KEY;
|
||||
if (!API_KEY) {
|
||||
throw new Error('API_KEY environment variable is required');
|
||||
}
|
||||
\`\`\`
|
||||
```
|
||||
|
||||
### Example 3: Code Quality Review Checklist
|
||||
|
||||
```markdown
|
||||
## Code Quality Review
|
||||
|
||||
### Readability
|
||||
- [ ] Code is easy to understand
|
||||
- [ ] Variable names are descriptive
|
||||
- [ ] Function names explain what they do
|
||||
- [ ] Complex logic has comments
|
||||
- [ ] Magic numbers are replaced with constants
|
||||
|
||||
### Structure
|
||||
- [ ] Functions are small and focused
|
||||
- [ ] Code follows DRY principle (Don't Repeat Yourself)
|
||||
- [ ] Proper separation of concerns
|
||||
- [ ] Consistent code style
|
||||
- [ ] No dead code or commented-out code
|
||||
|
||||
### Maintainability
|
||||
- [ ] Code is modular and reusable
|
||||
- [ ] Dependencies are minimal
|
||||
- [ ] Changes are backwards compatible
|
||||
- [ ] Breaking changes are documented
|
||||
- [ ] Technical debt is noted
|
||||
|
||||
### Example Issues to Catch:
|
||||
|
||||
**❌ Bad - Unclear naming:**
|
||||
\`\`\`javascript
|
||||
function calc(a, b, c) {
|
||||
return a * b + c;
|
||||
}
|
||||
\`\`\`
|
||||
|
||||
**✅ Good - Descriptive naming:**
|
||||
\`\`\`javascript
|
||||
function calculateTotalPrice(quantity, unitPrice, tax) {
|
||||
return quantity * unitPrice + tax;
|
||||
}
|
||||
\`\`\`
|
||||
|
||||
**❌ Bad - Function doing too much:**
|
||||
\`\`\`javascript
|
||||
function processOrder(order) {
|
||||
// Validate order
|
||||
if (!order.items) throw new Error('No items');
|
||||
|
||||
// Calculate total
|
||||
let total = 0;
|
||||
for (let item of order.items) {
|
||||
total += item.price * item.quantity;
|
||||
}
|
||||
|
||||
// Apply discount
|
||||
if (order.coupon) {
|
||||
total *= 0.9;
|
||||
}
|
||||
|
||||
// Process payment
|
||||
const payment = stripe.charge(total);
|
||||
|
||||
// Send email
|
||||
sendEmail(order.email, 'Order confirmed');
|
||||
|
||||
// Update inventory
|
||||
updateInventory(order.items);
|
||||
|
||||
return { orderId: order.id, total };
|
||||
}
|
||||
\`\`\`
|
||||
|
||||
**✅ Good - Separated concerns:**
|
||||
\`\`\`javascript
|
||||
function processOrder(order) {
|
||||
validateOrder(order);
|
||||
const total = calculateOrderTotal(order);
|
||||
const payment = processPayment(total);
|
||||
sendOrderConfirmation(order.email);
|
||||
updateInventory(order.items);
|
||||
|
||||
return { orderId: order.id, total };
|
||||
}
|
||||
\`\`\`
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
### ✅ Do This
|
||||
|
||||
- **Review Small Changes** - Smaller PRs are easier to review thoroughly
|
||||
- **Check Tests First** - Verify tests pass and cover new code
|
||||
- **Run the Code** - Test it locally when possible
|
||||
- **Ask Questions** - Don't assume, ask for clarification
|
||||
- **Be Constructive** - Suggest improvements, don't just criticize
|
||||
- **Focus on Important Issues** - Don't nitpick minor style issues
|
||||
- **Use Automated Tools** - Linters, formatters, security scanners
|
||||
- **Review Documentation** - Check if docs are updated
|
||||
- **Consider Performance** - Think about scale and efficiency
|
||||
- **Check for Regressions** - Ensure existing functionality still works
|
||||
|
||||
### ❌ Don't Do This
|
||||
|
||||
- **Don't Approve Without Reading** - Actually review the code
|
||||
- **Don't Be Vague** - Provide specific feedback with examples
|
||||
- **Don't Ignore Security** - Security issues are critical
|
||||
- **Don't Skip Tests** - Untested code will cause problems
|
||||
- **Don't Be Rude** - Be respectful and professional
|
||||
- **Don't Rubber Stamp** - Every review should add value
|
||||
- **Don't Review When Tired** - You'll miss important issues
|
||||
- **Don't Forget Context** - Understand the bigger picture
|
||||
|
||||
## Complete Review Checklist
|
||||
|
||||
### Pre-Review
|
||||
- [ ] Read the PR description and linked issues
|
||||
- [ ] Understand what problem is being solved
|
||||
- [ ] Check if tests pass in CI/CD
|
||||
- [ ] Pull the branch and run it locally
|
||||
|
||||
### Functionality
|
||||
- [ ] Code solves the stated problem
|
||||
- [ ] Edge cases are handled
|
||||
- [ ] Error handling is appropriate
|
||||
- [ ] User input is validated
|
||||
- [ ] No logical errors
|
||||
|
||||
### Security
|
||||
- [ ] No SQL injection vulnerabilities
|
||||
- [ ] No XSS vulnerabilities
|
||||
- [ ] Authentication/authorization is correct
|
||||
- [ ] Sensitive data is protected
|
||||
- [ ] No hardcoded secrets
|
||||
|
||||
### Performance
|
||||
- [ ] No unnecessary database queries
|
||||
- [ ] No N+1 query problems
|
||||
- [ ] Efficient algorithms used
|
||||
- [ ] No memory leaks
|
||||
- [ ] Caching used appropriately
|
||||
|
||||
### Code Quality
|
||||
- [ ] Code is readable and clear
|
||||
- [ ] Names are descriptive
|
||||
- [ ] Functions are focused and small
|
||||
- [ ] No code duplication
|
||||
- [ ] Follows project conventions
|
||||
|
||||
### Tests
|
||||
- [ ] New code has tests
|
||||
- [ ] Tests cover edge cases
|
||||
- [ ] Tests are meaningful
|
||||
- [ ] All tests pass
|
||||
- [ ] Test coverage is adequate
|
||||
|
||||
### Documentation
|
||||
- [ ] Code comments explain why, not what
|
||||
- [ ] API documentation is updated
|
||||
- [ ] README is updated if needed
|
||||
- [ ] Breaking changes are documented
|
||||
- [ ] Migration guide provided if needed
|
||||
|
||||
### Git
|
||||
- [ ] Commit messages are clear
|
||||
- [ ] No merge conflicts
|
||||
- [ ] Branch is up to date with main
|
||||
- [ ] No unnecessary files committed
|
||||
- [ ] .gitignore is properly configured
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
### Problem: Missing Edge Cases
|
||||
**Symptoms:** Code works for happy path but fails on edge cases
|
||||
**Solution:** Ask "What if...?" questions
|
||||
- What if the input is null?
|
||||
- What if the array is empty?
|
||||
- What if the user is not authenticated?
|
||||
- What if the network request fails?
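The first two "What if" questions translate directly into guard clauses. A small illustrative Python function, not taken from any reviewed codebase:

```python
from typing import Optional


def average(values: Optional[list[float]]) -> Optional[float]:
    """Return the mean, or None when there is nothing to average."""
    if not values:  # covers both None and an empty list
        return None
    return sum(values) / len(values)
```

A reviewer would expect tests for each branch: `None`, an empty list, and a normal input.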
|
||||
|
||||
### Problem: Security Vulnerabilities
|
||||
**Symptoms:** Code exposes security risks
|
||||
**Solution:** Use security checklist
|
||||
- Run security scanners (npm audit, Snyk)
|
||||
- Check OWASP Top 10
|
||||
- Validate all inputs
|
||||
- Use parameterized queries
|
||||
- Never trust user input
|
||||
|
||||
### Problem: Poor Test Coverage
|
||||
**Symptoms:** New code has no tests or inadequate tests
|
||||
**Solution:** Require tests for all new code
|
||||
- Unit tests for functions
|
||||
- Integration tests for features
|
||||
- Edge case tests
|
||||
- Error case tests
|
||||
|
||||
### Problem: Unclear Code
|
||||
**Symptoms:** Reviewer can't understand what code does
|
||||
**Solution:** Request improvements
|
||||
- Better variable names
|
||||
- Explanatory comments
|
||||
- Smaller functions
|
||||
- Clear structure
|
||||
|
||||
## Review Comment Templates
|
||||
|
||||
### Requesting Changes
|
||||
```markdown
|
||||
**Issue:** [Describe the problem]
|
||||
|
||||
**Current code:**
|
||||
\`\`\`javascript
|
||||
// Show problematic code
|
||||
\`\`\`
|
||||
|
||||
**Suggested fix:**
|
||||
\`\`\`javascript
|
||||
// Show improved code
|
||||
\`\`\`
|
||||
|
||||
**Why:** [Explain why this is better]
|
||||
```
|
||||
|
||||
### Asking Questions
|
||||
```markdown
|
||||
**Question:** [Your question]
|
||||
|
||||
**Context:** [Why you're asking]
|
||||
|
||||
**Suggestion:** [If you have one]
|
||||
```
|
||||
|
||||
### Praising Good Code
|
||||
```markdown
|
||||
**Nice!** [What you liked]
|
||||
|
||||
This is great because [explain why]
|
||||
```
|
||||
|
||||
## Related Skills
|
||||
|
||||
- `@requesting-code-review` - Prepare code for review
|
||||
- `@receiving-code-review` - Handle review feedback
|
||||
- `@systematic-debugging` - Debug issues found in review
|
||||
- `@test-driven-development` - Ensure code has tests
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- [Google Code Review Guidelines](https://google.github.io/eng-practices/review/)
|
||||
- [OWASP Top 10](https://owasp.org/www-project-top-ten/)
|
||||
- [Code Review Best Practices](https://github.com/thoughtbot/guides/tree/main/code-review)
|
||||
- [How to Review Code](https://www.kevinlondon.com/2015/05/05/code-review-best-practices.html)
|
||||
|
||||
---
|
||||
|
||||
**Pro Tip:** Use a checklist template for every review to ensure consistency and thoroughness. Customize it for your team's specific needs!
|
||||
25
skills/code-review-excellence/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
43
skills/code-review-excellence/SKILL.md
Normal file
|
|
@ -0,0 +1,43 @@
|
|||
---
|
||||
name: code-review-excellence
|
||||
description: "Master effective code review practices to provide constructive feedback, catch bugs early, and foster knowledge sharing while maintaining team morale. Use when reviewing pull requests, establishing..."
|
||||
risk: unknown
|
||||
source: community
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
# Code Review Excellence
|
||||
|
||||
Transform code reviews from gatekeeping to knowledge sharing through constructive feedback, systematic analysis, and collaborative improvement.
|
||||
|
||||
## Use this skill when
|
||||
|
||||
- Reviewing pull requests and code changes
|
||||
- Establishing code review standards
|
||||
- Mentoring developers through review feedback
|
||||
- Auditing for correctness, security, or performance
|
||||
|
||||
## Do not use this skill when
|
||||
|
||||
- There are no code changes to review
|
||||
- The task is a design-only discussion without code
|
||||
- You need to implement fixes instead of reviewing
|
||||
|
||||
## Instructions
|
||||
|
||||
- Read context, requirements, and test signals first.
|
||||
- Review for correctness, security, performance, and maintainability.
|
||||
- Provide actionable feedback with severity and rationale.
|
||||
- Ask clarifying questions when intent is unclear.
|
||||
- If detailed checklists are required, open `resources/implementation-playbook.md`.
|
||||
|
||||
## Output Format
|
||||
|
||||
- High-level summary of findings
|
||||
- Issues grouped by severity (blocking, important, minor)
|
||||
- Suggestions and questions
|
||||
- Test and coverage notes
|
||||
|
||||
## Resources
|
||||
|
||||
- `resources/implementation-playbook.md` for detailed review patterns and templates.
|
||||
25
skills/code-review-excellence/resources/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
|
|
@ -0,0 +1,515 @@
|
|||
# Code Review Excellence Implementation Playbook
|
||||
|
||||
This file contains detailed patterns, checklists, and code samples referenced by the skill.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
- Reviewing pull requests and code changes
|
||||
- Establishing code review standards for teams
|
||||
- Mentoring junior developers through reviews
|
||||
- Conducting architecture reviews
|
||||
- Creating review checklists and guidelines
|
||||
- Improving team collaboration
|
||||
- Reducing code review cycle time
|
||||
- Maintaining code quality standards
|
||||
|
||||
## Core Principles
|
||||
|
||||
### 1. The Review Mindset
|
||||
|
||||
**Goals of Code Review:**
|
||||
- Catch bugs and edge cases
|
||||
- Ensure code maintainability
|
||||
- Share knowledge across the team
|
||||
- Enforce coding standards
|
||||
- Improve design and architecture
|
||||
- Build team culture
|
||||
|
||||
**Not the Goals:**
|
||||
- Show off knowledge
|
||||
- Nitpick formatting (use linters)
|
||||
- Block progress unnecessarily
|
||||
- Rewrite to your preference
|
||||
|
||||
### 2. Effective Feedback
|
||||
|
||||
**Good Feedback is:**
|
||||
- Specific and actionable
|
||||
- Educational, not judgmental
|
||||
- Focused on the code, not the person
|
||||
- Balanced (praise good work too)
|
||||
- Prioritized (critical vs nice-to-have)
|
||||
|
||||
```markdown
|
||||
❌ Bad: "This is wrong."
|
||||
✅ Good: "This could cause a race condition when multiple users
|
||||
access simultaneously. Consider using a mutex here."
|
||||
|
||||
❌ Bad: "Why didn't you use X pattern?"
|
||||
✅ Good: "Have you considered the Repository pattern? It would
|
||||
make this easier to test. Here's an example: [link]"
|
||||
|
||||
❌ Bad: "Rename this variable."
|
||||
✅ Good: "[nit] Consider `userCount` instead of `uc` for
|
||||
clarity. Not blocking if you prefer to keep it."
|
||||
```
|
||||
|
||||
### 3. Review Scope
|
||||
|
||||
**What to Review:**
|
||||
- Logic correctness and edge cases
|
||||
- Security vulnerabilities
|
||||
- Performance implications
|
||||
- Test coverage and quality
|
||||
- Error handling
|
||||
- Documentation and comments
|
||||
- API design and naming
|
||||
- Architectural fit
|
||||
|
||||
**What Not to Review Manually:**
|
||||
- Code formatting (use Prettier, Black, etc.)
|
||||
- Import organization
|
||||
- Linting violations
|
||||
- Simple typos
|
||||
|
||||
## Review Process
|
||||
|
||||
### Phase 1: Context Gathering (2-3 minutes)
|
||||
|
||||
```markdown
|
||||
Before diving into code, understand:
|
||||
|
||||
1. Read PR description and linked issue
|
||||
2. Check PR size (>400 lines? Ask to split)
|
||||
3. Review CI/CD status (tests passing?)
|
||||
4. Understand the business requirement
|
||||
5. Note any relevant architectural decisions
|
||||
```
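
The PR size check in step 2 can be automated. The sketch below assumes the input is the text produced by `git diff --numstat` (tab-separated added count, deleted count, and path, with `-` for binary files); the helper names and the sample input are hypothetical.

```python
def changed_lines(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        # Binary files are reported as "-" in both columns; ignore them.
        if added.isdigit() and deleted.isdigit():
            total += int(added) + int(deleted)
    return total


def should_split(numstat: str, limit: int = 400) -> bool:
    """Return True when the change exceeds the suggested review size."""
    return changed_lines(numstat) > limit


# Hypothetical sample; real input would come from `git diff --numstat`.
sample = "120\t30\tsrc/app.py\n300\t10\ttests/test_app.py\n-\t-\tdocs/logo.png\n"
print(changed_lines(sample), should_split(sample))
```

The 400-line default mirrors the threshold named in the list; teams that count generated files or lockfiles differently would need to filter paths first.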
|
||||
|
||||
### Phase 2: High-Level Review (5-10 minutes)
|
||||
|
||||
```markdown
|
||||
1. **Architecture & Design**
|
||||
- Does the solution fit the problem?
|
||||
- Are there simpler approaches?
|
||||
- Is it consistent with existing patterns?
|
||||
- Will it scale?
|
||||
|
||||
2. **File Organization**
|
||||
- Are new files in the right places?
|
||||
- Is code grouped logically?
|
||||
- Are there duplicate files?
|
||||
|
||||
3. **Testing Strategy**
|
||||
- Are there tests?
|
||||
- Do tests cover edge cases?
|
||||
- Are tests readable?
|
||||
```
|
||||
|
||||
### Phase 3: Line-by-Line Review (10-20 minutes)
|
||||
|
||||
```markdown
|
||||
For each file:
|
||||
|
||||
1. **Logic & Correctness**
|
||||
- Edge cases handled?
|
||||
- Off-by-one errors?
|
||||
- Null/undefined checks?
|
||||
- Race conditions?
|
||||
|
||||
2. **Security**
|
||||
- Input validation?
|
||||
- SQL injection risks?
|
||||
- XSS vulnerabilities?
|
||||
- Sensitive data exposure?
|
||||
|
||||
3. **Performance**
|
||||
- N+1 queries?
|
||||
- Unnecessary loops?
|
||||
- Memory leaks?
|
||||
- Blocking operations?
|
||||
|
||||
4. **Maintainability**
|
||||
- Clear variable names?
|
||||
- Functions doing one thing?
|
||||
- Complex code commented?
|
||||
- Magic numbers extracted?
|
||||
```
|
||||
|
||||
### Phase 4: Summary & Decision (2-3 minutes)
|
||||
|
||||
```markdown
|
||||
1. Summarize key concerns
|
||||
2. Highlight what you liked
|
||||
3. Make clear decision:
|
||||
- ✅ Approve
|
||||
- 💬 Comment (minor suggestions)
|
||||
- 🔄 Request Changes (must address)
|
||||
4. Offer to pair if complex
|
||||
```
|
||||
|
||||
## Review Techniques
|
||||
|
||||
### Technique 1: The Checklist Method
|
||||
|
||||
```markdown
|
||||
## Security Checklist
|
||||
- [ ] User input validated and sanitized
|
||||
- [ ] SQL queries use parameterization
|
||||
- [ ] Authentication/authorization checked
|
||||
- [ ] Secrets not hardcoded
|
||||
- [ ] Error messages don't leak info
|
||||
|
||||
## Performance Checklist
|
||||
- [ ] No N+1 queries
|
||||
- [ ] Database queries indexed
|
||||
- [ ] Large lists paginated
|
||||
- [ ] Expensive operations cached
|
||||
- [ ] No blocking I/O in hot paths
|
||||
|
||||
## Testing Checklist
|
||||
- [ ] Happy path tested
|
||||
- [ ] Edge cases covered
|
||||
- [ ] Error cases tested
|
||||
- [ ] Test names are descriptive
|
||||
- [ ] Tests are deterministic
|
||||
```
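
To make the "No N+1 queries" item concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. The `authors` and `posts` tables and both helper functions are hypothetical; they only contrast one query per row with a single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors (id, name) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts (author_id, title) VALUES (1, 'a1'), (1, 'a2'), (2, 'b1');
    """
)


def titles_n_plus_one(conn):
    """One query for the authors, then one more query per author (N+1)."""
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query(conn):
    """The same data fetched with a single JOIN."""
    result = {}
    rows = conn.execute(
        "SELECT a.name, p.title FROM authors a "
        "JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    )
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result


print(titles_n_plus_one(conn) == titles_single_query(conn))
```

Note that the inner JOIN omits authors with no posts; a `LEFT JOIN` would be needed if those must appear in the result.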
|
||||
|
||||
### Technique 2: The Question Approach
|
||||
|
||||
Instead of stating problems, ask questions to encourage thinking:
|
||||
|
||||
```markdown
|
||||
❌ "This will fail if the list is empty."
|
||||
✅ "What happens if `items` is an empty array?"
|
||||
|
||||
❌ "You need error handling here."
|
||||
✅ "How should this behave if the API call fails?"
|
||||
|
||||
❌ "This is inefficient."
|
||||
✅ "I see this loops through all users. Have we considered
|
||||
the performance impact with 100k users?"
|
||||
```
|
||||
|
||||
### Technique 3: Suggest, Don't Command
|
||||
|
||||
````markdown
## Use Collaborative Language

❌ "You must change this to use async/await"
✅ "Suggestion: async/await might make this more readable:
```typescript
async function fetchUser(id: string) {
  const user = await db.query('SELECT * FROM users WHERE id = ?', id);
  return user;
}
```
What do you think?"

❌ "Extract this into a function"
✅ "This logic appears in 3 places. Would it make sense to
   extract it into a shared utility function?"
````
|
||||
|
||||
### Technique 4: Differentiate Severity
|
||||
|
||||
```markdown
|
||||
Use labels to indicate priority:
|
||||
|
||||
🔴 [blocking] - Must fix before merge
|
||||
🟡 [important] - Should fix, discuss if disagree
|
||||
🟢 [nit] - Nice to have, not blocking
|
||||
💡 [suggestion] - Alternative approach to consider
|
||||
📚 [learning] - Educational comment, no action needed
|
||||
🎉 [praise] - Good work, keep it up!
|
||||
|
||||
Example:
|
||||
"🔴 [blocking] This SQL query is vulnerable to injection.
|
||||
Please use parameterized queries."
|
||||
|
||||
"🟢 [nit] Consider renaming `data` to `userData` for clarity."
|
||||
|
||||
"🎉 [praise] Excellent test coverage! This will catch edge cases."
|
||||
```
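
The severity labels lend themselves to simple tooling. The following sketch, with hypothetical helper names, groups comments by the bracketed label and treats only `[blocking]` as preventing approval; that rule is an assumption for illustration, not a policy stated in this playbook.

```python
import re

# Labels taken from the list above.
LABEL_PATTERN = re.compile(r"\[(blocking|important|nit|suggestion|learning|praise)\]")


def group_by_label(comments: list[str]) -> dict[str, list[str]]:
    """Group review comments by their first severity label."""
    grouped: dict[str, list[str]] = {}
    for comment in comments:
        match = LABEL_PATTERN.search(comment)
        label = match.group(1) if match else "unlabeled"
        grouped.setdefault(label, []).append(comment)
    return grouped


def verdict(comments: list[str]) -> str:
    """Request changes if anything is blocking, otherwise approve."""
    return "request changes" if "blocking" in group_by_label(comments) else "approve"


comments = [
    "[blocking] This SQL query is vulnerable to injection.",
    "[nit] Consider renaming `data` to `userData` for clarity.",
    "[praise] Excellent test coverage!",
]
print(verdict(comments))
```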
|
||||
|
||||
## Language-Specific Patterns
|
||||
|
||||
### Python Code Review
|
||||
|
||||
```python
|
||||
# Check for Python-specific issues

# ❌ Mutable default arguments
def add_item(item, items=[]):  # Bug! Shared across calls
    items.append(item)
    return items

# ✅ Use None as default
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

# ❌ Catching too broad
try:
    result = risky_operation()
except:  # Catches everything, even KeyboardInterrupt!
    pass

# ✅ Catch specific exceptions
try:
    result = risky_operation()
except ValueError as e:
    logger.error(f"Invalid value: {e}")
    raise

# ❌ Using mutable class attributes
class User:
    permissions = []  # Shared across all instances!

# ✅ Initialize in __init__
class User:
    def __init__(self):
        self.permissions = []
|
||||
```
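
The mutable default argument pitfall is easy to demonstrate when a reviewer needs to justify the comment. This is a small runnable sketch; the function names `add_item_shared` and `add_item_fresh` are renamed variants of the example above so that both can coexist in one file.

```python
def add_item_shared(item, items=[]):  # deliberately buggy: one list for all calls
    items.append(item)
    return items


def add_item_fresh(item, items=None):
    if items is None:
        items = []  # a new list on every call
    items.append(item)
    return items


first = add_item_shared("a")
second = add_item_shared("b")
print(second)           # the list still holds "a" from the first call
print(first is second)  # both names refer to the same default list

print(add_item_fresh("a"), add_item_fresh("b"))
```

The default list is created once, when the `def` statement runs, which is why every call without an explicit `items` argument sees the same object.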
|
||||
|
||||
### TypeScript/JavaScript Code Review
|
||||
|
||||
```typescript
|
||||
// Check for TypeScript-specific issues
|
||||
|
||||
// ❌ Using any defeats type safety
|
||||
function processData(data: any) { // Avoid any
|
||||
return data.value;
|
||||
}
|
||||
|
||||
// ✅ Use proper types
|
||||
interface DataPayload {
|
||||
value: string;
|
||||
}
|
||||
function processData(data: DataPayload) {
|
||||
return data.value;
|
||||
}
|
||||
|
||||
// ❌ Not handling async errors
|
||||
async function fetchUser(id: string) {
|
||||
const response = await fetch(`/api/users/${id}`);
|
||||
return response.json(); // What if network fails?
|
||||
}
|
||||
|
||||
// ✅ Handle errors properly
|
||||
async function fetchUser(id: string): Promise<User> {
|
||||
try {
|
||||
const response = await fetch(`/api/users/${id}`);
|
||||
if (!response.ok) {
|
||||
throw new Error(`HTTP ${response.status}`);
|
||||
}
|
||||
return await response.json();
|
||||
} catch (error) {
|
||||
console.error('Failed to fetch user:', error);
|
||||
throw error;
|
||||
}
|
||||
}
|
||||
|
||||
// ❌ Mutation of props
|
||||
function UserProfile({ user }: Props) {
|
||||
user.lastViewed = new Date(); // Mutating prop!
|
||||
return <div>{user.name}</div>;
|
||||
}
|
||||
|
||||
// ✅ Don't mutate props
|
||||
function UserProfile({ user, onView }: Props) {
|
||||
useEffect(() => {
|
||||
onView(user.id); // Notify parent to update
|
||||
}, [user.id, onView]);
|
||||
return <div>{user.name}</div>;
|
||||
}
|
||||
```
|
||||
|
||||
## Advanced Review Patterns
|
||||
|
||||
### Pattern 1: Architectural Review
|
||||
|
||||
```markdown
|
||||
When reviewing significant changes:
|
||||
|
||||
1. **Design Document First**
|
||||
- For large features, request design doc before code
|
||||
- Review design with team before implementation
|
||||
- Agree on approach to avoid rework
|
||||
|
||||
2. **Review in Stages**
|
||||
- First PR: Core abstractions and interfaces
|
||||
- Second PR: Implementation
|
||||
- Third PR: Integration and tests
|
||||
- Easier to review, faster to iterate
|
||||
|
||||
3. **Consider Alternatives**
|
||||
- "Have we considered using [pattern/library]?"
|
||||
- "What's the tradeoff vs. the simpler approach?"
|
||||
- "How will this evolve as requirements change?"
|
||||
```
|
||||
|
||||
### Pattern 2: Test Quality Review
|
||||
|
||||
```typescript
|
||||
// ❌ Poor test: Implementation detail testing
|
||||
test('increments counter variable', () => {
|
||||
const component = render(<Counter />);
|
||||
const button = component.getByRole('button');
|
||||
fireEvent.click(button);
|
||||
expect(component.state.counter).toBe(1); // Testing internal state
|
||||
});
|
||||
|
||||
// ✅ Good test: Behavior testing
|
||||
test('displays incremented count when clicked', () => {
|
||||
render(<Counter />);
|
||||
const button = screen.getByRole('button', { name: /increment/i });
|
||||
fireEvent.click(button);
|
||||
expect(screen.getByText('Count: 1')).toBeInTheDocument();
|
||||
});
|
||||
|
||||
// Review questions for tests:
|
||||
// - Do tests describe behavior, not implementation?
|
||||
// - Are test names clear and descriptive?
|
||||
// - Do tests cover edge cases?
|
||||
// - Are tests independent (no shared state)?
|
||||
// - Can tests run in any order?
|
||||
```
|
||||
|
||||
### Pattern 3: Security Review
|
||||
|
||||
```markdown
|
||||
## Security Review Checklist
|
||||
|
||||
### Authentication & Authorization
|
||||
- [ ] Is authentication required where needed?
|
||||
- [ ] Are authorization checks performed before every action?
|
||||
- [ ] Is JWT validation proper (signature, expiry)?
|
||||
- [ ] Are API keys/secrets properly secured?
|
||||
|
||||
### Input Validation
|
||||
- [ ] All user inputs validated?
|
||||
- [ ] File uploads restricted (size, type)?
|
||||
- [ ] SQL queries parameterized?
|
||||
- [ ] XSS protection (escape output)?
|
||||
|
||||
### Data Protection
|
||||
- [ ] Passwords hashed (bcrypt/argon2)?
|
||||
- [ ] Sensitive data encrypted at rest?
|
||||
- [ ] HTTPS enforced for sensitive data?
|
||||
- [ ] PII handled according to regulations?
|
||||
|
||||
### Common Vulnerabilities
|
||||
- [ ] No eval() or similar dynamic execution?
|
||||
- [ ] No hardcoded secrets?
|
||||
- [ ] CSRF protection for state-changing operations?
|
||||
- [ ] Rate limiting on public endpoints?
|
||||
```
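
For the "Passwords hashed" item, a reviewer mainly looks for a salted, deliberately slow hash and a timing-safe comparison. The sketch below uses PBKDF2 from Python's standard library as a stand-in, since bcrypt and argon2 require third-party packages; the iteration count and helper names are assumptions for illustration, not recommendations from this playbook.

```python
import hashlib
import hmac
import os
from typing import Optional

ITERATIONS = 200_000  # assumed work factor; tune for your hardware and policy


def hash_password(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Derive a salted hash; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare safely."""
    _, digest = hash_password(password, salt)
    # compare_digest is designed to resist timing attacks.
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

In a review, red flags are the opposite of each step here: no salt, a fast hash such as plain SHA-256, or comparison with `==`.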
|
||||
|
||||
## Giving Difficult Feedback
|
||||
|
||||
### Pattern: The Sandwich Method (Modified)
|
||||
|
||||
```markdown
|
||||
Traditional: Praise + Criticism + Praise (feels fake)
|
||||
|
||||
Better: Context + Specific Issue + Helpful Solution
|
||||
|
||||
Example:
|
||||
"I noticed the payment processing logic is inline in the
|
||||
controller. This makes it harder to test and reuse.
|
||||
|
||||
[Specific Issue]
|
||||
The calculateTotal() function mixes tax calculation,
|
||||
discount logic, and database queries, making it difficult
|
||||
to unit test and reason about.
|
||||
|
||||
[Helpful Solution]
|
||||
Could we extract this into a PaymentService class? That
|
||||
would make it testable and reusable. I can pair with you
|
||||
on this if helpful."
|
||||
```
|
||||
|
||||
### Handling Disagreements
|
||||
|
||||
```markdown
|
||||
When the author disagrees with your feedback:
|
||||
|
||||
1. **Seek to Understand**
|
||||
"Help me understand your approach. What led you to
|
||||
choose this pattern?"
|
||||
|
||||
2. **Acknowledge Valid Points**
|
||||
"That's a good point about X. I hadn't considered that."
|
||||
|
||||
3. **Provide Data**
|
||||
"I'm concerned about performance. Can we add a benchmark
|
||||
to validate the approach?"
|
||||
|
||||
4. **Escalate if Needed**
|
||||
"Let's get [architect/senior dev] to weigh in on this."
|
||||
|
||||
5. **Know When to Let Go**
|
||||
If it's working and not a critical issue, approve it.
|
||||
Perfection is the enemy of progress.
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Review Promptly**: Within 24 hours, ideally same day
|
||||
2. **Limit PR Size**: 200-400 lines max for effective review
|
||||
3. **Review in Time Blocks**: 60 minutes max, take breaks
|
||||
4. **Use Review Tools**: GitHub, GitLab, or dedicated tools
|
||||
5. **Automate What You Can**: Linters, formatters, security scans
|
||||
6. **Build Rapport**: Emoji, praise, and empathy matter
|
||||
7. **Be Available**: Offer to pair on complex issues
|
||||
8. **Learn from Others**: Review others' review comments
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
- **Perfectionism**: Blocking PRs for minor style preferences
|
||||
- **Scope Creep**: "While you're at it, can you also..."
|
||||
- **Inconsistency**: Different standards for different people
|
||||
- **Delayed Reviews**: Letting PRs sit for days
|
||||
- **Ghosting**: Requesting changes then disappearing
|
||||
- **Rubber Stamping**: Approving without actually reviewing
|
||||
- **Bike Shedding**: Debating trivial details extensively
|
||||
|
||||
## Templates
|
||||
|
||||
### PR Review Comment Template
|
||||
|
||||
```markdown
|
||||
## Summary
|
||||
[Brief overview of what was reviewed]
|
||||
|
||||
## Strengths
|
||||
- [What was done well]
|
||||
- [Good patterns or approaches]
|
||||
|
||||
## Required Changes
|
||||
🔴 [Blocking issue 1]
|
||||
🔴 [Blocking issue 2]
|
||||
|
||||
## Suggestions
|
||||
💡 [Improvement 1]
|
||||
💡 [Improvement 2]
|
||||
|
||||
## Questions
|
||||
❓ [Clarification needed on X]
|
||||
❓ [Alternative approach consideration]
|
||||
|
||||
## Verdict
|
||||
✅ Approve after addressing required changes
|
||||
```
|
||||
|
||||
## Resources
|
||||
|
||||
- **references/code-review-best-practices.md**: Comprehensive review guidelines
|
||||
- **references/common-bugs-checklist.md**: Language-specific bugs to watch for
|
||||
- **references/security-review-guide.md**: Security-focused review checklist
|
||||
- **assets/pr-review-template.md**: Standard review comment template
|
||||
- **assets/review-checklist.md**: Quick reference checklist
|
||||
- **scripts/pr-analyzer.py**: Analyze PR complexity and suggest reviewers
|
||||
25
skills/code-reviewer/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
175
skills/code-reviewer/SKILL.md
Normal file
|
|
@ -0,0 +1,175 @@
|
|||
---
|
||||
name: code-reviewer
|
||||
description: "Elite code review expert specializing in modern AI-powered code"
|
||||
risk: unknown
|
||||
source: community
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
## Use this skill when
|
||||
|
||||
- Working on code review tasks or workflows
|
||||
- Needing guidance, best practices, or checklists for code review
|
||||
|
||||
## Do not use this skill when
|
||||
|
||||
- The task is unrelated to code review
|
||||
- You need a different domain or tool outside this scope
|
||||
|
||||
## Instructions
|
||||
|
||||
- Clarify goals, constraints, and required inputs.
|
||||
- Apply relevant best practices and validate outcomes.
|
||||
- Provide actionable steps and verification.
|
||||
- If detailed examples are required, open `resources/implementation-playbook.md`.
|
||||
|
||||
You are an elite code review expert specializing in modern code analysis techniques, AI-powered review tools, and production-grade quality assurance.
|
||||
|
||||
## Expert Purpose
|
||||
Master code reviewer focused on ensuring code quality, security, performance, and maintainability using cutting-edge analysis tools and techniques. Combines deep technical expertise with modern AI-assisted review processes, static analysis tools, and production reliability practices to deliver comprehensive code assessments that prevent bugs, security vulnerabilities, and production incidents.
|
||||
|
||||
## Capabilities
|
||||
|
||||
### AI-Powered Code Analysis
|
||||
- Integration with modern AI review tools (Trag, Bito, Codiga, GitHub Copilot)
|
||||
- Natural language pattern definition for custom review rules
|
||||
- Context-aware code analysis using LLMs and machine learning
|
||||
- Automated pull request analysis and comment generation
|
||||
- Real-time feedback integration with CLI tools and IDEs
|
||||
- Custom rule-based reviews with team-specific patterns
|
||||
- Multi-language AI code analysis and suggestion generation
|
||||
|
||||
### Modern Static Analysis Tools
|
||||
- SonarQube, CodeQL, and Semgrep for comprehensive code scanning
|
||||
- Security-focused analysis with Snyk, Bandit, and OWASP tools
|
||||
- Performance analysis with profilers and complexity analyzers
|
||||
- Dependency vulnerability scanning with npm audit, pip-audit
|
||||
- License compliance checking and open source risk assessment
|
||||
- Code quality metrics with cyclomatic complexity analysis
|
||||
- Technical debt assessment and code smell detection
|
||||
|
||||
### Security Code Review
|
||||
- OWASP Top 10 vulnerability detection and prevention
|
||||
- Input validation and sanitization review
|
||||
- Authentication and authorization implementation analysis
|
||||
- Cryptographic implementation and key management review
|
||||
- SQL injection, XSS, and CSRF prevention verification
|
||||
- Secrets and credential management assessment
|
||||
- API security patterns and rate limiting implementation
|
||||
- Container and infrastructure security code review
|
||||
|
||||
### Performance & Scalability Analysis
|
||||
- Database query optimization and N+1 problem detection
|
||||
- Memory leak and resource management analysis
|
||||
- Caching strategy implementation review
|
||||
- Asynchronous programming pattern verification
|
||||
- Load testing integration and performance benchmark review
|
||||
- Connection pooling and resource limit configuration
|
||||
- Microservices performance patterns and anti-patterns
|
||||
- Cloud-native performance optimization techniques
|
||||
|
||||
### Configuration & Infrastructure Review
|
||||
- Production configuration security and reliability analysis
|
||||
- Database connection pool and timeout configuration review
|
||||
- Container orchestration and Kubernetes manifest analysis
|
||||
- Infrastructure as Code (Terraform, CloudFormation) review
|
||||
- CI/CD pipeline security and reliability assessment
|
||||
- Environment-specific configuration validation
|
||||
- Secrets management and credential security review
|
||||
- Monitoring and observability configuration verification
|
||||
|
||||
### Modern Development Practices
|
||||
- Test-Driven Development (TDD) and test coverage analysis
|
||||
- Behavior-Driven Development (BDD) scenario review
|
||||
- Contract testing and API compatibility verification
|
||||
- Feature flag implementation and rollback strategy review
|
||||
- Blue-green and canary deployment pattern analysis
|
||||
- Observability and monitoring code integration review
|
||||
- Error handling and resilience pattern implementation
|
||||
- Documentation and API specification completeness
|
||||
|
||||
### Code Quality & Maintainability
|
||||
- Clean Code principles and SOLID pattern adherence
|
||||
- Design pattern implementation and architectural consistency
|
||||
- Code duplication detection and refactoring opportunities
|
||||
- Naming convention and code style compliance
|
||||
- Technical debt identification and remediation planning
|
||||
- Legacy code modernization and refactoring strategies
|
||||
- Code complexity reduction and simplification techniques
|
||||
- Maintainability metrics and long-term sustainability assessment
|
||||
|
||||
### Team Collaboration & Process
|
||||
- Pull request workflow optimization and best practices
|
||||
- Code review checklist creation and enforcement
|
||||
- Team coding standards definition and compliance
|
||||
- Mentor-style feedback and knowledge sharing facilitation
|
||||
- Code review automation and tool integration
|
||||
- Review metrics tracking and team performance analysis
|
||||
- Documentation standards and knowledge base maintenance
|
||||
- Onboarding support and code review training
|
||||
|
||||
### Language-Specific Expertise
|
||||
- JavaScript/TypeScript modern patterns and React/Vue best practices
|
||||
- Python code quality with PEP 8 compliance and performance optimization
|
||||
- Java enterprise patterns and Spring framework best practices
|
||||
- Go concurrent programming and performance optimization
|
||||
- Rust memory safety and performance-critical code review
|
||||
- C# .NET Core patterns and Entity Framework optimization
|
||||
- PHP modern frameworks and security best practices
|
||||
- Database query optimization across SQL and NoSQL platforms
|
||||
|
||||
### Integration & Automation
|
||||
- GitHub Actions, GitLab CI/CD, and Jenkins pipeline integration
|
||||
- Slack, Teams, and communication tool integration
|
||||
- IDE integration with VS Code, IntelliJ, and development environments
|
||||
- Custom webhook and API integration for workflow automation
|
||||
- Code quality gates and deployment pipeline integration
|
||||
- Automated code formatting and linting tool configuration
|
||||
- Review comment template and checklist automation
|
||||
- Metrics dashboard and reporting tool integration
|
||||
|
||||
## Behavioral Traits
|
||||
- Maintains constructive and educational tone in all feedback
|
||||
- Focuses on teaching and knowledge transfer, not just finding issues
|
||||
- Balances thorough analysis with practical development velocity
|
||||
- Prioritizes security and production reliability above all else
|
||||
- Emphasizes testability and maintainability in every review
|
||||
- Encourages best practices while being pragmatic about deadlines
|
||||
- Provides specific, actionable feedback with code examples
|
||||
- Considers long-term technical debt implications of all changes
|
||||
- Stays current with emerging security threats and mitigation strategies
|
||||
- Champions automation and tooling to improve review efficiency
|
||||
|
||||
## Knowledge Base
|
||||
- Modern code review tools and AI-assisted analysis platforms
|
||||
- OWASP security guidelines and vulnerability assessment techniques
|
||||
- Performance optimization patterns for high-scale applications
|
||||
- Cloud-native development and containerization best practices
|
||||
- DevSecOps integration and shift-left security methodologies
|
||||
- Static analysis tool configuration and custom rule development
|
||||
- Production incident analysis and preventive code review techniques
|
||||
- Modern testing frameworks and quality assurance practices
|
||||
- Software architecture patterns and design principles
|
||||
- Regulatory compliance requirements (SOC2, PCI DSS, GDPR)
|
||||
|
||||
## Response Approach
|
||||
1. **Analyze code context** and identify review scope and priorities
|
||||
2. **Apply automated tools** for initial analysis and vulnerability detection
|
||||
3. **Conduct manual review** for logic, architecture, and business requirements
|
||||
4. **Assess security implications** with focus on production vulnerabilities
|
||||
5. **Evaluate performance impact** and scalability considerations
|
||||
6. **Review configuration changes** with special attention to production risks
|
||||
7. **Provide structured feedback** organized by severity and priority
|
||||
8. **Suggest improvements** with specific code examples and alternatives
|
||||
9. **Document decisions** and rationale for complex review points
|
||||
10. **Follow up** on implementation and provide continuous guidance
|
||||
|
||||
## Example Interactions
|
||||
- "Review this microservice API for security vulnerabilities and performance issues"
|
||||
- "Analyze this database migration for potential production impact"
|
||||
- "Assess this React component for accessibility and performance best practices"
|
||||
- "Review this Kubernetes deployment configuration for security and reliability"
|
||||
- "Evaluate this authentication implementation for OAuth2 compliance"
|
||||
- "Analyze this caching strategy for race conditions and data consistency"
|
||||
- "Review this CI/CD pipeline for security and deployment best practices"
|
||||
- "Assess this error handling implementation for observability and debugging"
|
||||
25
skills/comprehensive-review-pr-enhance/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
49
skills/comprehensive-review-pr-enhance/SKILL.md
Normal file
|
|
@ -0,0 +1,49 @@
|
|||
---
|
||||
name: comprehensive-review-pr-enhance
|
||||
description: "You are a PR optimization expert specializing in creating high-quality pull requests that facilitate efficient code reviews. Generate comprehensive PR descriptions, automate review processes, and e..."
|
||||
risk: unknown
|
||||
source: community
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
# Pull Request Enhancement
|
||||
|
||||
You are a PR optimization expert specializing in creating high-quality pull requests that facilitate efficient code reviews. Generate comprehensive PR descriptions, automate review processes, and ensure PRs follow best practices for clarity, size, and reviewability.
|
||||
|
||||
## Use this skill when
|
||||
|
||||
- Writing or improving PR descriptions
|
||||
- Summarizing changes for faster reviews
|
||||
- Organizing tests, risks, and rollout notes
|
||||
- Reducing PR size or improving reviewability
|
||||
|
||||
## Do not use this skill when
|
||||
|
||||
- There is no PR or change list to summarize
|
||||
- You need a full code review instead of PR polishing
|
||||
- The task is unrelated to software delivery
|
||||
|
||||
## Context
|
||||
The user needs to create or improve pull requests with detailed descriptions, proper documentation, test coverage analysis, and review facilitation. Focus on making PRs that are easy to review, well-documented, and include all necessary context.
|
||||
|
||||
## Requirements
|
||||
$ARGUMENTS
|
||||
|
||||
## Instructions
|
||||
|
||||
- Analyze the diff and identify intent and scope.
|
||||
- Summarize changes, tests, and risks clearly.
|
||||
- Highlight breaking changes and rollout notes.
|
||||
- Add checklists and reviewer guidance.
|
||||
- If detailed templates are required, open `resources/implementation-playbook.md`.
|
||||
|
||||
## Output Format
|
||||
|
||||
- PR summary and scope
|
||||
- What changed and why
|
||||
- Tests performed and results
|
||||
- Risks, rollbacks, and reviewer notes
|
||||
|
||||
## Resources
|
||||
|
||||
- `resources/implementation-playbook.md` for detailed templates and examples.
|
||||
25
skills/comprehensive-review-pr-enhance/resources/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
|
|
@ -0,0 +1,691 @@
|
|||
# Pull Request Enhancement Implementation Playbook
|
||||
|
||||
This file contains detailed patterns, checklists, and code samples referenced by the skill.
|
||||
|
||||
## Instructions
|
||||
|
||||
### 1. PR Analysis
|
||||
|
||||
Analyze the changes and generate insights:
|
||||
|
||||
**Change Summary Generator**
|
||||
```python
import re
import subprocess
from collections import defaultdict


class PRAnalyzer:
    def analyze_changes(self, base_branch='main'):
        """
        Analyze changes between current branch and base
        """
        # _categorize_changes, _assess_impacts, _check_dependencies and
        # _parse_status are project-specific helpers that are not shown here.
        analysis = {
            'files_changed': self._get_changed_files(base_branch),
            'change_statistics': self._get_change_stats(base_branch),
            'change_categories': self._categorize_changes(base_branch),
            'potential_impacts': self._assess_impacts(base_branch),
            'dependencies_affected': self._check_dependencies(base_branch)
        }

        return analysis

    def _get_changed_files(self, base_branch):
        """Get list of changed files with status and category"""
        cmd = ['git', 'diff', '--name-status', f'{base_branch}...HEAD']
        result = subprocess.run(cmd, capture_output=True, text=True)

        files = []
        for line in result.stdout.strip().split('\n'):
            if line:
                # Renames and copies are printed as "R100\told\tnew";
                # keep the last path, which is the file as it exists on HEAD.
                status, *paths = line.split('\t')
                filename = paths[-1]
                files.append({
                    'filename': filename,
                    'status': self._parse_status(status),
                    'category': self._categorize_file(filename)
                })

        return files

    def _get_change_stats(self, base_branch):
        """Get detailed change statistics"""
        cmd = ['git', 'diff', '--shortstat', f'{base_branch}...HEAD']
        result = subprocess.run(cmd, capture_output=True, text=True)

        # Parse output like: "10 files changed, 450 insertions(+), 123 deletions(-)"
        stats_pattern = r'(\d+) files? changed(?:, (\d+) insertions?\(\+\))?(?:, (\d+) deletions?\(-\))?'
        match = re.search(stats_pattern, result.stdout)

        if match:
            files, insertions, deletions = match.groups()
            return {
                'files_changed': int(files),
                'insertions': int(insertions or 0),
                'deletions': int(deletions or 0),
                'net_change': int(insertions or 0) - int(deletions or 0)
            }

        return {'files_changed': 0, 'insertions': 0, 'deletions': 0, 'net_change': 0}

    def _categorize_file(self, filename):
        """Categorize file by type"""
        # The first matching category wins, so tests are checked before
        # generic source extensions ("foo.test.js" is a test, not source).
        categories = {
            'test': ['test', 'spec', '.test.', '.spec.'],
            'source': ['.js', '.ts', '.py', '.java', '.go', '.rs'],
            'config': ['config', '.json', '.yml', '.yaml', '.toml'],
            'docs': ['.md', 'README', 'CHANGELOG', '.rst'],
            'styles': ['.css', '.scss', '.less'],
            'build': ['Makefile', 'Dockerfile', '.gradle', 'pom.xml']
        }

        for category, patterns in categories.items():
            for pattern in patterns:
                # Extensions must match the end of the name; a plain substring
                # test would classify "package.json" as source via ".js".
                if pattern.startswith('.') and not pattern.endswith('.'):
                    matched = filename.endswith(pattern)
                else:
                    matched = pattern in filename
                if matched:
                    return category

        return 'other'
```
|
||||
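`_get_changed_files` calls a `_parse_status` helper that this playbook never defines. A minimal sketch of what it could look like, assuming it only needs to turn the status codes printed by `git diff --name-status` into readable labels. The standalone name `parse_status` and the label strings are illustrative choices, not part of the original skill:

```python
def parse_status(status: str) -> str:
    """Map a `git diff --name-status` code to a human-readable label.

    Hypothetical helper: the label wording is an assumption for illustration.
    """
    # Rename and copy codes carry a similarity score (e.g. "R100", "C75"),
    # so only the first letter identifies the kind of change.
    labels = {
        'A': 'Added',
        'M': 'Modified',
        'D': 'Deleted',
        'R': 'Renamed',
        'C': 'Copied',
        'T': 'Type changed',
    }
    return labels.get(status[:1], 'Changed')
```

Falling back to a generic label keeps the PR description usable even when git reports a code the table does not list.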
|
||||
### 2. PR Description Generation
|
||||
|
||||
Create comprehensive PR descriptions:
|
||||
|
||||
**Description Template Generator**
|
||||
```python
|
||||
def generate_pr_description(analysis, commits):
|
||||
"""
|
||||
Generate detailed PR description from analysis
|
||||
"""
|
||||
description = f"""
|
||||
## Summary
|
||||
|
||||
{generate_summary(analysis, commits)}
|
||||
|
||||
## What Changed
|
||||
|
||||
{generate_change_list(analysis)}
|
||||
|
||||
## Why These Changes
|
||||
|
||||
{extract_why_from_commits(commits)}
|
||||
|
||||
## Type of Change
|
||||
|
||||
{determine_change_types(analysis)}
|
||||
|
||||
## How Has This Been Tested?
|
||||
|
||||
{generate_test_section(analysis)}
|
||||
|
||||
## Visual Changes
|
||||
|
||||
{generate_visual_section(analysis)}
|
||||
|
||||
## Performance Impact
|
||||
|
||||
{analyze_performance_impact(analysis)}
|
||||
|
||||
## Breaking Changes
|
||||
|
||||
{identify_breaking_changes(analysis)}
|
||||
|
||||
## Dependencies
|
||||
|
||||
{list_dependency_changes(analysis)}
|
||||
|
||||
## Checklist
|
||||
|
||||
{generate_review_checklist(analysis)}
|
||||
|
||||
## Additional Notes
|
||||
|
||||
{generate_additional_notes(analysis)}
|
||||
"""
|
||||
return description
|
||||
|
||||
def generate_summary(analysis, commits):
|
||||
"""Generate executive summary"""
|
||||
stats = analysis['change_statistics']
|
||||
|
||||
# Extract main purpose from commits
|
||||
main_purpose = extract_main_purpose(commits)
|
||||
|
||||
summary = f"""
|
||||
This PR {main_purpose}.
|
||||
|
||||
**Impact**: {stats['files_changed']} files changed ({stats['insertions']} additions, {stats['deletions']} deletions)
|
||||
**Risk Level**: {calculate_risk_level(analysis)}
|
||||
**Review Time**: ~{estimate_review_time(stats)} minutes
|
||||
"""
|
||||
return summary
|
||||
|
||||
def generate_change_list(analysis):
|
||||
"""Generate categorized change list"""
|
||||
changes_by_category = defaultdict(list)
|
||||
|
||||
for file in analysis['files_changed']:
|
||||
changes_by_category[file['category']].append(file)
|
||||
|
||||
change_list = ""
|
||||
icons = {
|
||||
'source': '🔧',
|
||||
'test': '✅',
|
||||
'docs': '📝',
|
||||
'config': '⚙️',
|
||||
'styles': '🎨',
|
||||
'build': '🏗️',
|
||||
'other': '📁'
|
||||
}
|
||||
|
||||
for category, files in changes_by_category.items():
|
||||
change_list += f"\n### {icons.get(category, '📁')} {category.title()} Changes\n"
|
||||
for file in files[:10]: # Limit to 10 files per category
|
||||
change_list += f"- {file['status']}: `{file['filename']}`\n"
|
||||
if len(files) > 10:
|
||||
change_list += f"- ...and {len(files) - 10} more\n"
|
||||
|
||||
return change_list
|
||||
```
|
||||
|
||||
### 3. Review Checklist Generation
|
||||
|
||||
Create automated review checklists:
|
||||
|
||||
**Smart Checklist Generator**
|
||||
```python
|
||||
def generate_review_checklist(analysis):
|
||||
"""
|
||||
Generate context-aware review checklist
|
||||
"""
|
||||
checklist = ["## Review Checklist\n"]
|
||||
|
||||
# General items
|
||||
general_items = [
|
||||
"Code follows project style guidelines",
|
||||
"Self-review completed",
|
||||
"Comments added for complex logic",
|
||||
"No debugging code left",
|
||||
"No sensitive data exposed"
|
||||
]
|
||||
|
||||
# Add general items
|
||||
checklist.append("### General")
|
||||
for item in general_items:
|
||||
checklist.append(f"- [ ] {item}")
|
||||
|
||||
# File-specific checks
|
||||
file_types = {file['category'] for file in analysis['files_changed']}
|
||||
|
||||
if 'source' in file_types:
|
||||
checklist.append("\n### Code Quality")
|
||||
checklist.extend([
|
||||
"- [ ] No code duplication",
|
||||
"- [ ] Functions are focused and small",
|
||||
"- [ ] Variable names are descriptive",
|
||||
"- [ ] Error handling is comprehensive",
|
||||
"- [ ] No performance bottlenecks introduced"
|
||||
])
|
||||
|
||||
if 'test' in file_types:
|
||||
checklist.append("\n### Testing")
|
||||
checklist.extend([
|
||||
"- [ ] All new code is covered by tests",
|
||||
"- [ ] Tests are meaningful and not just for coverage",
|
||||
"- [ ] Edge cases are tested",
|
||||
"- [ ] Tests follow AAA pattern (Arrange, Act, Assert)",
|
||||
"- [ ] No flaky tests introduced"
|
||||
])
|
||||
|
||||
if 'config' in file_types:
|
||||
checklist.append("\n### Configuration")
|
||||
checklist.extend([
|
||||
"- [ ] No hardcoded values",
|
||||
"- [ ] Environment variables documented",
|
||||
"- [ ] Backwards compatibility maintained",
|
||||
"- [ ] Security implications reviewed",
|
||||
"- [ ] Default values are sensible"
|
||||
])
|
||||
|
||||
if 'docs' in file_types:
|
||||
checklist.append("\n### Documentation")
|
||||
checklist.extend([
|
||||
"- [ ] Documentation is clear and accurate",
|
||||
"- [ ] Examples are provided where helpful",
|
||||
"- [ ] API changes are documented",
|
||||
"- [ ] README updated if necessary",
|
||||
"- [ ] Changelog updated"
|
||||
])
|
||||
|
||||
# Security checks
|
||||
if has_security_implications(analysis):
|
||||
checklist.append("\n### Security")
|
||||
checklist.extend([
|
||||
"- [ ] No SQL injection vulnerabilities",
|
||||
"- [ ] Input validation implemented",
|
||||
"- [ ] Authentication/authorization correct",
|
||||
"- [ ] No sensitive data in logs",
|
||||
"- [ ] Dependencies are secure"
|
||||
])
|
||||
|
||||
return '\n'.join(checklist)
|
||||
```
|
||||
|
||||
### 4. Code Review Automation
|
||||
|
||||
Automate common review tasks:
|
||||
|
||||
**Automated Review Bot**
|
||||
```python
import re


class ReviewBot:
    def perform_automated_checks(self, pr_diff):
        """
        Perform automated code review checks
        """
        findings = []

        # Check for common issues
        # (only _check_console_logs and _check_large_functions are shown below)
        checks = [
            self._check_console_logs,
            self._check_commented_code,
            self._check_large_functions,
            self._check_todo_comments,
            self._check_hardcoded_values,
            self._check_missing_error_handling,
            self._check_security_issues
        ]

        for check in checks:
            findings.extend(check(pr_diff))

        return findings

    def _check_console_logs(self, diff):
        """Check for console.log statements"""
        findings = []
        # Match added lines only: "+" at the start of a line, but not the
        # "+++ b/file" header. re.MULTILINE makes "^" match at every line start.
        pattern = r'^\+(?!\+\+).*console\.(log|debug|info|warn|error)'

        for file, content in diff.items():
            matches = re.finditer(pattern, content, re.MULTILINE)
            for match in matches:
                findings.append({
                    'type': 'warning',
                    'file': file,
                    'line': self._get_line_number(match, content),
                    'message': 'Console statement found - remove before merging',
                    'suggestion': 'Use proper logging framework instead'
                })

        return findings

    def _check_large_functions(self, diff):
        """Check for functions that are too large"""
        findings = []

        # Simple heuristic: count lines between function start and end
        for file, content in diff.items():
            if file.endswith(('.js', '.ts', '.py')):
                functions = self._extract_functions(content)
                for func in functions:
                    if func['lines'] > 50:
                        findings.append({
                            'type': 'suggestion',
                            'file': file,
                            'line': func['start_line'],
                            'message': f"Function '{func['name']}' is {func['lines']} lines long",
                            'suggestion': 'Consider breaking into smaller functions'
                        })

        return findings
```
|
||||
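`_check_console_logs` relies on `_get_line_number`, which is not defined anywhere in this playbook. One possible implementation, assuming `content` is the same text the regex was run against and that a 1-based line number within that text is good enough for the finding. Mapping back to line numbers in the changed file would need the hunk headers and is not attempted here. The standalone function name and the sample diff are made up for illustration:

```python
import re


def get_line_number(match: re.Match, content: str) -> int:
    """Return the 1-based line of `content` on which `match` starts."""
    # Count the newlines that precede the match; the first line is line 1.
    return content.count('\n', 0, match.start()) + 1


# Usage with a small, made-up diff fragment
sample = "+const a = 1;\n+console.log(a);\n context line\n"
hit = re.search(r'^\+.*console\.log', sample, re.MULTILINE)
print(get_line_number(hit, sample))  # prints 2
```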
|
||||
### 5. PR Size Optimization
|
||||
|
||||
Help split large PRs:
|
||||
|
||||
**PR Splitter Suggestions**
|
||||
````python
def suggest_pr_splits(analysis):
    """
    Suggest how to split large PRs
    """
    stats = analysis['change_statistics']

    # Check if PR is too large
    if stats['files_changed'] > 20 or stats['insertions'] + stats['deletions'] > 1000:
        suggestions = analyze_split_opportunities(analysis)

        return f"""
## ⚠️ Large PR Detected

This PR changes {stats['files_changed']} files with {stats['insertions'] + stats['deletions']} total changes.
Large PRs are harder to review and more likely to introduce bugs.

### Suggested Splits:

{format_split_suggestions(suggestions)}

### How to Split:

1. Create a feature branch from the base branch
2. Cherry-pick commits for first logical unit
3. Create PR for first unit
4. Repeat for remaining units

```bash
# Example split workflow (assumes the base branch is main)
git checkout -b feature/part-1 main
git cherry-pick <commit-hashes-for-part-1>
git push origin feature/part-1
# Create PR for part 1

git checkout -b feature/part-2 main
git cherry-pick <commit-hashes-for-part-2>
git push origin feature/part-2
# Create PR for part 2
```
"""

    return ""


def analyze_split_opportunities(analysis):
    """Find logical units for splitting"""
    suggestions = []

    # Group by feature areas
    feature_groups = defaultdict(list)
    for file in analysis['files_changed']:
        feature = extract_feature_area(file['filename'])
        feature_groups[feature].append(file)

    # Suggest splits
    for feature, files in feature_groups.items():
        if len(files) >= 5:
            suggestions.append({
                'name': f"{feature} changes",
                'files': files,
                'reason': f"Isolated changes to {feature} feature"
            })

    return suggestions
````
|
||||
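`analyze_split_opportunities` groups files with `extract_feature_area`, which the playbook leaves undefined. A minimal sketch under one simple assumption: the top-level directory of a path is a usable proxy for the feature it belongs to. Real repositories may need a project-specific rule (for example, skipping a leading `src/`), and the `'root'` label is an arbitrary choice:

```python
from pathlib import PurePosixPath


def extract_feature_area(filename: str) -> str:
    """Guess the feature area of a changed file from its path (assumed rule)."""
    # git prints paths with forward slashes on every platform.
    parts = PurePosixPath(filename).parts
    # Files at the repository root have no directory to group by.
    if len(parts) < 2:
        return 'root'
    # Assumption: the top-level directory identifies the feature.
    return parts[0]
```

With this rule, a path such as `skills/create-pr/SKILL.md` is grouped under `skills`, and `README.md` under `root`.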
|
||||
### 6. Visual Diff Enhancement
|
||||
|
||||
Generate visual representations:
|
||||
|
||||
**Mermaid Diagram Generator**
|
||||
````python
def generate_architecture_diff(analysis):
    """
    Generate diagram showing architectural changes
    """
    if has_architectural_changes(analysis):
        return f"""
## Architecture Changes

```mermaid
graph LR
    subgraph "Before"
        A1[Component A] --> B1[Component B]
        B1 --> C1[Database]
    end

    subgraph "After"
        A2[Component A] --> B2[Component B]
        B2 --> C2[Database]
        B2 --> D2[New Cache Layer]
        A2 --> E2[New API Gateway]
    end

    style D2 fill:#90EE90
    style E2 fill:#90EE90
```

### Key Changes:
1. Added caching layer for performance
2. Introduced API gateway for better routing
3. Refactored component communication
"""
    return ""
````
|
||||
|
||||
### 7. Test Coverage Report
|
||||
|
||||
Include test coverage analysis:
|
||||
|
||||
**Coverage Report Generator**
|
||||
```python
def generate_coverage_report(base_branch='main'):
    """
    Generate test coverage comparison
    """
    # Get coverage before and after
    before_coverage = get_coverage_for_branch(base_branch)
    after_coverage = get_coverage_for_branch('HEAD')

    # Coverage results are dicts keyed by metric, so subtract per key
    # (dict - dict would raise a TypeError).
    coverage_diff = {
        metric: after_coverage[metric] - before_coverage[metric]
        for metric in ('lines', 'functions', 'branches')
    }

    report = f"""
## Test Coverage

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Lines | {before_coverage['lines']:.1f}% | {after_coverage['lines']:.1f}% | {format_diff(coverage_diff['lines'])} |
| Functions | {before_coverage['functions']:.1f}% | {after_coverage['functions']:.1f}% | {format_diff(coverage_diff['functions'])} |
| Branches | {before_coverage['branches']:.1f}% | {after_coverage['branches']:.1f}% | {format_diff(coverage_diff['branches'])} |

### Uncovered Files
"""

    # List files with low coverage
    for file in get_low_coverage_files():
        report += f"- `{file['name']}`: {file['coverage']:.1f}% coverage\n"

    return report


def format_diff(value):
    """Format coverage difference"""
    if value > 0:
        return f"<span style='color: green'>+{value:.1f}%</span> ✅"
    elif value < 0:
        return f"<span style='color: red'>{value:.1f}%</span> ⚠️"
    else:
        return "No change"
```
|
||||
|
||||
### 8. Risk Assessment
|
||||
|
||||
Evaluate PR risk:
|
||||
|
||||
**Risk Calculator**
|
||||
```python
|
||||
def calculate_pr_risk(analysis):
|
||||
"""
|
||||
Calculate risk score for PR
|
||||
"""
|
||||
risk_factors = {
|
||||
'size': calculate_size_risk(analysis),
|
||||
'complexity': calculate_complexity_risk(analysis),
|
||||
'test_coverage': calculate_test_risk(analysis),
|
||||
'dependencies': calculate_dependency_risk(analysis),
|
||||
'security': calculate_security_risk(analysis)
|
||||
}
|
||||
|
||||
overall_risk = sum(risk_factors.values()) / len(risk_factors)
|
||||
|
||||
risk_report = f"""
|
||||
## Risk Assessment
|
||||
|
||||
**Overall Risk Level**: {get_risk_level(overall_risk)} ({overall_risk:.1f}/10)
|
||||
|
||||
### Risk Factors
|
||||
|
||||
| Factor | Score | Details |
|
||||
|--------|-------|---------|
|
||||
| Size | {risk_factors['size']:.1f}/10 | {get_size_details(analysis)} |
|
||||
| Complexity | {risk_factors['complexity']:.1f}/10 | {get_complexity_details(analysis)} |
|
||||
| Test Coverage | {risk_factors['test_coverage']:.1f}/10 | {get_test_details(analysis)} |
|
||||
| Dependencies | {risk_factors['dependencies']:.1f}/10 | {get_dependency_details(analysis)} |
|
||||
| Security | {risk_factors['security']:.1f}/10 | {get_security_details(analysis)} |
|
||||
|
||||
### Mitigation Strategies
|
||||
|
||||
{generate_mitigation_strategies(risk_factors)}
|
||||
"""
|
||||
|
||||
return risk_report
|
||||
|
||||
def get_risk_level(score):
|
||||
"""Convert score to risk level"""
|
||||
if score < 3:
|
||||
return "🟢 Low"
|
||||
elif score < 6:
|
||||
return "🟡 Medium"
|
||||
elif score < 8:
|
||||
return "🟠 High"
|
||||
else:
|
||||
return "🔴 Critical"
|
||||
```
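The risk calculator above delegates to per-factor functions such as `calculate_size_risk` without showing any of them. As an illustration only, here is one way the size factor could be scored on the same 0-10 scale. The thresholds are an assumption borrowed from the "large PR" limits in `suggest_pr_splits` (20 files, 1000 changed lines), and the linear scaling is a hypothetical rule rather than anything the skill prescribes:

```python
def calculate_size_risk(analysis: dict) -> float:
    """Score PR size on a 0-10 scale (hypothetical scoring rule)."""
    stats = analysis['change_statistics']
    total_lines = stats['insertions'] + stats['deletions']

    # Assumption: 1000 changed lines or 20 files map to the maximum score;
    # smaller PRs scale linearly below that.
    line_score = min(total_lines / 1000, 1.0) * 10
    file_score = min(stats['files_changed'] / 20, 1.0) * 10

    # The larger of the two dominates, so a single huge file still counts.
    return max(line_score, file_score)
```

Taking the maximum rather than the average is a deliberate choice in this sketch: a PR that is small in file count but enormous in line count should not be averaged down to a medium score.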
|
||||
|
||||
### 9. PR Templates
|
||||
|
||||
Generate context-specific templates:
|
||||
|
||||
```python
|
||||
def generate_pr_template(pr_type, analysis):
|
||||
"""
|
||||
Generate PR template based on type
|
||||
"""
|
||||
templates = {
|
||||
'feature': f"""
|
||||
## Feature: {extract_feature_name(analysis)}
|
||||
|
||||
### Description
|
||||
{generate_feature_description(analysis)}
|
||||
|
||||
### User Story
|
||||
As a [user type]
|
||||
I want [feature]
|
||||
So that [benefit]
|
||||
|
||||
### Acceptance Criteria
|
||||
- [ ] Criterion 1
|
||||
- [ ] Criterion 2
|
||||
- [ ] Criterion 3
|
||||
|
||||
### Demo
|
||||
[Link to demo or screenshots]
|
||||
|
||||
### Technical Implementation
|
||||
{generate_technical_summary(analysis)}
|
||||
|
||||
### Testing Strategy
|
||||
{generate_test_strategy(analysis)}
|
||||
""",
|
||||
'bugfix': f"""
|
||||
## Bug Fix: {extract_bug_description(analysis)}
|
||||
|
||||
### Issue
|
||||
- **Reported in**: #[issue-number]
|
||||
- **Severity**: {determine_severity(analysis)}
|
||||
- **Affected versions**: {get_affected_versions(analysis)}
|
||||
|
||||
### Root Cause
|
||||
{analyze_root_cause(analysis)}
|
||||
|
||||
### Solution
|
||||
{describe_solution(analysis)}
|
||||
|
||||
### Testing
|
||||
- [ ] Bug is reproducible before fix
|
||||
- [ ] Bug is resolved after fix
|
||||
- [ ] No regressions introduced
|
||||
- [ ] Edge cases tested
|
||||
|
||||
### Verification Steps
|
||||
1. Step to reproduce original issue
|
||||
2. Apply this fix
|
||||
3. Verify issue is resolved
|
||||
""",
|
||||
'refactor': f"""
|
||||
## Refactoring: {extract_refactor_scope(analysis)}
|
||||
|
||||
### Motivation
|
||||
{describe_refactor_motivation(analysis)}
|
||||
|
||||
### Changes Made
|
||||
{list_refactor_changes(analysis)}
|
||||
|
||||
### Benefits
|
||||
- Improved {list_improvements(analysis)}
|
||||
- Reduced {list_reductions(analysis)}
|
||||
|
||||
### Compatibility
|
||||
- [ ] No breaking changes
|
||||
- [ ] API remains unchanged
|
||||
- [ ] Performance maintained or improved
|
||||
|
||||
### Metrics
|
||||
| Metric | Before | After |
|
||||
|--------|--------|-------|
|
||||
| Complexity | X | Y |
|
||||
| Test Coverage | X% | Y% |
|
||||
| Performance | Xms | Yms |
|
||||
"""
|
||||
}
|
||||
|
||||
return templates.get(pr_type, templates['feature'])
|
||||
```
|
||||
|
||||
### 10. Review Response Templates
|
||||
|
||||
Help with review responses:
|
||||
|
||||
```python
|
||||
review_response_templates = {
|
||||
'acknowledge_feedback': """
|
||||
Thank you for the thorough review! I'll address these points.
|
||||
""",
|
||||
|
||||
'explain_decision': """
|
||||
Great question! I chose this approach because:
|
||||
1. [Reason 1]
|
||||
2. [Reason 2]
|
||||
|
||||
Alternative approaches considered:
|
||||
- [Alternative 1]: [Why not chosen]
|
||||
- [Alternative 2]: [Why not chosen]
|
||||
|
||||
Happy to discuss further if you have concerns.
|
||||
""",
|
||||
|
||||
'request_clarification': """
|
||||
Thanks for the feedback. Could you clarify what you mean by [specific point]?
|
||||
I want to make sure I understand your concern correctly before making changes.
|
||||
""",
|
||||
|
||||
'disagree_respectfully': """
|
||||
I appreciate your perspective on this. I have a slightly different view:
|
||||
|
||||
[Your reasoning]
|
||||
|
||||
However, I'm open to discussing this further. What do you think about [compromise/middle ground]?
|
||||
""",
|
||||
|
||||
'commit_to_change': """
|
||||
Good catch! I'll update this to [specific change].
|
||||
This should address [concern] while maintaining [other requirement].
|
||||
"""
|
||||
}
|
||||
```
|
||||
|
||||
## Output Format
|
||||
|
||||
1. **PR Summary**: Executive summary with key metrics
|
||||
2. **Detailed Description**: Comprehensive PR description
|
||||
3. **Review Checklist**: Context-aware review items
|
||||
4. **Risk Assessment**: Risk analysis with mitigation strategies
|
||||
5. **Test Coverage**: Before/after coverage comparison
|
||||
6. **Visual Aids**: Diagrams and visual diffs where applicable
|
||||
7. **Size Recommendations**: Suggestions for splitting large PRs
|
||||
8. **Review Automation**: Automated checks and findings
|
||||
|
||||
Focus on creating PRs that are a pleasure to review, with all the context and documentation needed for an efficient code review process.
|
||||
25
skills/create-pr/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
12
skills/create-pr/SKILL.md
Normal file
|
|
@ -0,0 +1,12 @@
|
|||
---
|
||||
name: create-pr
|
||||
description: Alias for sentry-skills:pr-writer. Use when users explicitly ask for "create-pr" or reference the legacy skill name. Redirects to the canonical PR writing workflow.
|
||||
---
|
||||
|
||||
# Alias: create-pr
|
||||
|
||||
This skill name is kept for compatibility.
|
||||
|
||||
Use `sentry-skills:pr-writer` as the canonical skill for creating and editing pull requests.
|
||||
|
||||
If invoked via `create-pr`, run the same workflow and conventions documented in `sentry-skills:pr-writer`.
|
||||
119
skills/creating-grafana-dashboard/SKILL.md
Normal file
|
|
@ -0,0 +1,119 @@
|
|||
---
|
||||
name: creating-grafana-dashboard
|
||||
description: Use when adding a dashboard to Zoe's Grafana monitoring stack — whether importing from grafana.com or creating from scratch — including datasource UID patching, GitOps deployment via the grafana-dashboards repo, and verification.
|
||||
---
|
||||
|
||||
# Creating a Grafana Dashboard
|
||||
|
||||
## Overview
|
||||
|
||||
Dashboards are delivered via GitOps from `git@git.ctz.fyi:zoe/grafana-dashboards.git`. Push to main → Woodpecker CI auto-deploys to Grafana at `grafana.monitoring.ctz.fyi`. The critical gotcha: any downloaded dashboard will have wrong datasource UIDs and must be patched before committing.
|
||||
|
||||
## Stack Reference
|
||||
|
||||
| Service | URL / Context |
|
||||
|---------|--------------|
|
||||
| Grafana | grafana.monitoring.ctz.fyi (v11.6.1, Postgres backend) |
|
||||
| Cluster | k3s `monitoring` context |
|
||||
| Mimir (metrics) | datasource UID: `mimir`, type: `prometheus` |
|
||||
| Loki (logs) | datasource UID: `loki`, type: `loki` |
|
||||
| Tempo (traces) | datasource UID: `tempo`, type: `tempo` |
|
||||
| Pyroscope (profiling) | datasource UID: `pyroscope`, type: `grafana-pyroscope-datasource` |
|
||||
| Grafana API key | `secret/production/grafana/api-key` in OpenBao |
|
||||
|
||||
## Datasource UID Mapping (ALWAYS CHECK THIS)
|
||||
|
||||
| What the dashboard JSON says | What to set |
|
||||
|-----------------------------|-------------|
|
||||
| `type: prometheus`, any UID | `uid: "mimir"` |
|
||||
| `type: loki`, any UID | `uid: "loki"` |
|
||||
| `type: tempo`, any UID | `uid: "tempo"` |
|
||||
| `type: grafana-pyroscope-datasource`, any UID | `uid: "pyroscope"` |
|
||||
| `${DS_PROMETHEUS}` template variable | set default to `mimir` |
|
||||
|
||||
## Repo Structure
|
||||
|
||||
```
grafana-dashboards/
  dashboards/
    cilium/                 # Cilium CNI dashboards
    lgtm/                   # Mimir, Loki, Tempo, Pyroscope dashboards
    infra/                  # Node, k8s cluster dashboards
    apps/                   # Application-specific dashboards
  scripts/
    sources.sh              # upstream dashboard sources list
    update-dashboards.sh    # pull from upstream + patch UIDs
    push-to-grafana.sh      # push to live Grafana via API
  .woodpecker.yml
```
|
||||
|
||||
## Path A: Import from grafana.com
|
||||
|
||||
```bash
# 1. Download
DASH="dashboards/<folder>/<name>.json"
curl -o "$DASH" \
  "https://grafana.com/api/dashboards/<id>/revisions/latest/download"

# 2. Patch datasource UIDs (REQUIRED: the dashboard will show "No data" otherwise)
jq '
  (.templating.list[] | select(.type == "datasource") | .query) = "prometheus" |
  (.panels[].datasource | select(.type == "prometheus") | .uid) = "mimir" |
  (.panels[].targets[]? | .datasource | select(.type == "prometheus") | .uid) = "mimir"
' "$DASH" > dashboard-patched.json
mv dashboard-patched.json "$DASH"

# Repeat for loki/tempo/pyroscope as needed

# 3. Set a unique explicit UID
jq '.uid = "descriptive-slug-here"' "$DASH" > tmp.json && mv tmp.json "$DASH"

# 4. Check for UID collisions before committing
jq -r '.uid' dashboards/**/*.json | sort | uniq -d  # should output nothing

# 5. Add to sources.sh for future updates, then commit + push
```
|
||||
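The jq filter in step 2 only touches top-level panels and has to be repeated per datasource type. As an alternative sketch, the same mapping can be applied in one pass by walking the whole dashboard JSON, which also reaches panels nested inside rows. This is not the repo's `update-dashboards.sh`; the function names are hypothetical, the UID table is copied from the mapping section above, and `${DS_PROMETHEUS}`-style template variable defaults are not handled here:

```python
import json

# UID mapping taken from the "Datasource UID Mapping" table
UID_BY_TYPE = {
    'prometheus': 'mimir',
    'loki': 'loki',
    'tempo': 'tempo',
    'grafana-pyroscope-datasource': 'pyroscope',
}


def patch_datasources(node):
    """Recursively rewrite datasource UIDs in a dashboard JSON structure."""
    if isinstance(node, dict):
        ds = node.get('datasource')
        # Only object-style references ({"type": ..., "uid": ...}) are patched;
        # legacy string references are left untouched.
        if isinstance(ds, dict) and ds.get('type') in UID_BY_TYPE:
            ds['uid'] = UID_BY_TYPE[ds['type']]
        for value in node.values():
            patch_datasources(value)
    elif isinstance(node, list):
        for item in node:
            patch_datasources(item)
    return node


def patch_file(path: str) -> None:
    """Patch a dashboard file in place (the path is supplied by the caller)."""
    with open(path, encoding='utf-8') as fh:
        dashboard = json.load(fh)
    patch_datasources(dashboard)
    with open(path, 'w', encoding='utf-8') as fh:
        json.dump(dashboard, fh, indent=2)
        fh.write('\n')
```

Calling `patch_file` on the downloaded JSON would stand in for the jq patch in step 2; steps 3 to 5 stay the same.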
|
||||
## Path B: Create from scratch in UI
|
||||
|
||||
1. Build panels at `grafana.monitoring.ctz.fyi`
|
||||
2. Export: Dashboard → Share → Export → Save to file
|
||||
3. Save to `dashboards/<folder>/<name>.json`
|
||||
4. Verify `.uid` is set to a unique descriptive slug
|
||||
5. Commit and push
|
||||
|
||||
For new app dashboards: check what metrics are exposed first.
|
||||
```bash
|
||||
# See what labels Alloy exposes for a service
|
||||
kubectl --context monitoring exec -n monitoring ds/alloy -- alloy targets
|
||||
|
||||
# Or port-forward to the app's /metrics endpoint (port-forward blocks, so run curl from a second terminal)
|
||||
kubectl port-forward svc/<app> 9090:9090
|
||||
curl localhost:9090/metrics | grep -v '^#' | head -50
|
||||
```
|
||||
|
||||
## Deployment
|
||||
|
||||
Push to main triggers Woodpecker automatically. To deploy manually:
|
||||
|
||||
```bash
|
||||
cd grafana-dashboards
|
||||
export GRAFANA_API_KEY="$(bao kv get -field=api-key secret/production/grafana/api-key)"
|
||||
./scripts/push-to-grafana.sh
|
||||
```
|
||||
|
||||
Check pipeline status at `ci.ctz.fyi` → grafana-dashboards repo.
|
||||
|
||||
## Verification
|
||||
|
||||
- Go to `grafana.monitoring.ctz.fyi` → Dashboards → find the dashboard
|
||||
- All panels should show data (no "No data" panels)
|
||||
- If "No data": datasource UIDs weren't patched — re-run jq patch
|
||||
|
||||
## Common Issues
|
||||
|
||||
| Symptom | Cause | Fix |
|
||||
|---------|-------|-----|
|
||||
| "No data" on panels | Datasource UID not patched | Re-run jq patch for that datasource type |
|
||||
| Dashboard import fails | Duplicate UID | `jq -r '.uid' dashboards/**/*.json \| sort \| uniq -d` then rename |
|
||||
| Wrong data in panels | Wrong label matchers | Check `alloy targets` for actual label names |
|
||||
| UID collision silently replaces existing dashboard | Forgot to set explicit UID | Always set `.uid` to unique slug before commit |
|
||||
316
skills/deploying-new-k8s-service/SKILL.md
Normal file
|
|
@ -0,0 +1,316 @@
|
|||
---
|
||||
name: deploying-new-k8s-service
|
||||
description: Use when deploying a new service to Zoe's homelab k3s cluster (ansiblestack). Covers scaffolding Helm charts, writing ArgoCD app manifests, wiring ExternalSecrets via OpenBao, configuring Traefik IngressRoutes with cert-manager TLS, and watching GitOps sync to completion.
|
||||
---
|
||||
|
||||
# Deploying a New k3s Service (ansiblestack)
|
||||
|
||||
## Overview
|
||||
|
||||
All services deploy via GitOps: Helm chart in `ansiblestack` repo → ArgoCD syncs → k3s cluster. Never `kubectl apply` workload manifests directly. Always commit and let ArgoCD drive.
|
||||
|
||||
## Cluster Quick Reference
|
||||
|
||||
| Thing | Value |
|---|---|
| Cluster | k3s at `10.0.6.10:6443` |
| GitOps repo | `git@git.ctz.fyi:zoe/ansiblestack.git` (GitHub mirror: `ZoesDev/ansiblestack`) |
| ArgoCD | `argocd.ctz.fyi` |
| Secrets | External Secrets Operator → OpenBao (`bao.ctz.fyi`); ClusterSecretStore: `openbao` |
| Ingress | Traefik IngressRoute CRDs |
| TLS | cert-manager, ClusterIssuer: `letsencrypt-production` |
| DNS | external-dns via annotation |
| Registry | Harbor at `registry.ctz.fyi`, project `library` |
| Storage | `ssd` (NFS-SSD, preferred for stateful), `local-path` (node-local) |
| Hostname convention | Public: `<svc>.ctz.fyi` · Internal: `<svc>.i.ctz.fyi` |
| OpenBao KV path | `secret/production/<namespace>/<secret-name>` |
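
The naming conventions in the table can be captured as small shell helpers. This is only a sketch: the function names are hypothetical and are not part of the `ansiblestack` repo; only the patterns themselves come from the table above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers encoding the conventions from the quick-reference table.

# Public hostname: <svc>.ctz.fyi
public_hostname() { printf '%s.ctz.fyi\n' "$1"; }

# Internal hostname: <svc>.i.ctz.fyi
internal_hostname() { printf '%s.i.ctz.fyi\n' "$1"; }

# OpenBao KV path: secret/production/<namespace>/<secret-name>
bao_kv_path() { printf 'secret/production/%s/%s\n' "$1" "$2"; }
```

For example, `bao_kv_path jellyfin jellyfin` prints `secret/production/jellyfin/jellyfin`, the path an ExternalSecret for that namespace would reference.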
|
||||
|
||||
---
|
||||
|
||||
## Workflow
|
||||
|
||||
### 1. Research the app
|
||||
|
||||
Before touching any file:
|
||||
- Read the upstream GitHub repo or Docker Hub page
|
||||
- Identify: **ports**, **required env vars**, **config file mounts**, **volume paths**, **default user/UID**
|
||||
- Wrong env vars = silent failure. Don't skip this.
|
||||
|
||||
### 2. Check existing charts for patterns
|
||||
|
||||
```
helm/charts/
  jellyfin/    ← stateful reference
  tandoor/     ← stateful with DB reference
  crucix/      ← simple stateless reference
  convertx/    ← simple stateless reference
```
|
||||
|
||||
Match the pattern to your app type before scaffolding.
|
||||
|
||||
### 3. Scaffold chart files
|
||||
|
||||
Path: `helm/charts/<name>/`
|
||||
|
||||
```
Chart.yaml
values.yaml
templates/
  _helpers.tpl
  deployment.yaml
  service.yaml
  ingressroute.yaml
  external-secrets.yaml   # only if secrets needed
```
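
The layout above can be created in one go. A minimal sketch, assuming it is run from the root of the `ansiblestack` checkout; `scaffold_chart` and its `--with-secrets` flag are hypothetical names, and the files are created empty for you to fill in from the templates in this section.

```bash
#!/usr/bin/env bash
# scaffold_chart NAME [REPO_ROOT] [--with-secrets]
# Creates the empty chart skeleton under helm/charts/NAME (hypothetical helper).
scaffold_chart() {
  local name="$1" root="${2:-.}"
  local chart="$root/helm/charts/$name"
  local f

  mkdir -p "$chart/templates"
  touch "$chart/Chart.yaml" "$chart/values.yaml"

  for f in _helpers.tpl deployment.yaml service.yaml ingressroute.yaml; do
    touch "$chart/templates/$f"
  done

  # external-secrets.yaml is only needed when the app has secrets.
  if [ "${3:-}" = "--with-secrets" ]; then
    touch "$chart/templates/external-secrets.yaml"
  fi

  printf '%s\n' "$chart"
}
```

Usage: `scaffold_chart myapp . --with-secrets` prints the chart path and leaves empty files ready to edit.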
|
||||
|
||||
#### Chart.yaml
|
||||
|
||||
```yaml
|
||||
apiVersion: v2
|
||||
name: <name>
|
||||
description: <one-liner>
|
||||
version: 0.1.0
|
||||
appVersion: "latest"
|
||||
```
|
||||
|
||||
#### values.yaml (minimum)
|
||||
|
||||
```yaml
image:
  repository: registry.ctz.fyi/library/<name>  # or upstream image
  tag: latest
  pullPolicy: IfNotPresent

service:
  hostname: <name>.ctz.fyi

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    memory: 512Mi

# persistence:            # include for stateful apps
#   enabled: true
#   storageClass: ssd
#   size: 10Gi
#   mountPath: /data
```
|
||||
|
||||
#### templates/_helpers.tpl
|
||||
|
||||
```
|
||||
{{- define "<name>.fullname" -}}
|
||||
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
|
||||
{{- end }}
|
||||
```
|
||||
|
||||
#### templates/deployment.yaml
|
||||
|
||||
Standard Deployment. Key points:
|
||||
- `namespace: {{ .Release.Namespace }}`
|
||||
- Use `{{ include "<name>.fullname" . }}` for all name references
|
||||
- Mount secrets from ExternalSecret-created Secret if needed
|
||||
- For stateful: use `PersistentVolumeClaim` via `volumes` + `volumeMounts`, storageClass `ssd`
|
||||
|
||||
#### templates/service.yaml
|
||||
|
||||
```yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ include "<name>.fullname" . }}
  namespace: {{ .Release.Namespace }}
spec:
  type: ClusterIP
  selector:
    app: {{ include "<name>.fullname" . }}
  ports:
    - port: <port>
      targetPort: <port>
```
|
||||
|
||||
#### templates/ingressroute.yaml
|
||||
|
||||
**CRITICAL: You need BOTH objects. Do not omit either.**
|
||||
|
||||
```yaml
# 1. Traefik IngressRoute: actual routing
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: {{ include "<name>.fullname" . }}
  namespace: {{ .Release.Namespace }}
  annotations:
    external-dns.alpha.kubernetes.io/hostname: {{ .Values.service.hostname }}
spec:
  entryPoints: [websecure]
  routes:
    - match: Host(`{{ .Values.service.hostname }}`)
      kind: Rule
      services:
        - name: {{ include "<name>.fullname" . }}
          port: <port>
  tls:
    secretName: {{ include "<name>.fullname" . }}-tls

---
# 2. Companion Ingress: cert-manager TLS + external-dns ONLY (Traefik ignores this)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ include "<name>.fullname" . }}-cm
  namespace: {{ .Release.Namespace }}
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-production
    external-dns.alpha.kubernetes.io/hostname: {{ .Values.service.hostname }}
    # Add this only for Pangolin/externally-tunneled services:
    # external-dns.alpha.kubernetes.io/target: "external"
spec:
  ingressClassName: traefik
  rules:
    - host: {{ .Values.service.hostname }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: placeholder
                port:
                  number: 80
  tls:
    - hosts: [{{ .Values.service.hostname }}]
      secretName: {{ include "<name>.fullname" . }}-tls
```
|
||||
|
||||
#### templates/external-secrets.yaml (only if secrets needed)
|
||||
|
||||
```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: {{ include "<name>.fullname" . }}-secret
  namespace: {{ .Release.Namespace }}
  annotations:
    argocd.argoproj.io/sync-wave: "-1"  # ← REQUIRED: must exist before Deployment
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: openbao
    kind: ClusterSecretStore
  target:
    name: {{ include "<name>.fullname" . }}-secret
    creationPolicy: Owner
  data:
    - secretKey: <key>
      remoteRef:
        key: secret/production/{{ .Release.Namespace }}/{{ include "<name>.fullname" . }}
        property: <key>
```
|
||||
|
||||
### 4. Write ArgoCD app manifest
|
||||
|
||||
Path: `helm/argocd/<name>-app.yaml`
|
||||
|
||||
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: <name>
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "10"
spec:
  project: default
  source:
    repoURL: https://git.ctz.fyi/zoe/ansiblestack
    targetRevision: main
    path: helm/charts/<name>
    helm:
      valueFiles: [values.yaml]
  destination:
    server: https://kubernetes.default.svc
    namespace: <name>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions: [CreateNamespace=true]
```
|
||||
|
||||
### 5. Write secrets to OpenBao (if needed)
|
||||
|
||||
```bash
|
||||
bao kv put secret/production/<namespace>/<name> \
|
||||
key1=value1 \
|
||||
key2=value2
|
||||
```
|
||||
|
||||
Do this **before** applying the ArgoCD app. ExternalSecret will pull on first sync.
|
||||
|
||||
### 6. Commit and push
|
||||
|
||||
```bash
|
||||
cd ansiblestack
|
||||
git add helm/charts/<name>/ helm/argocd/<name>-app.yaml
|
||||
git commit -m "feat: add <name> service"
|
||||
git push
|
||||
```
|
||||
|
||||
### 7. Apply the ArgoCD Application
|
||||
|
||||
```bash
|
||||
kubectl apply -f helm/argocd/<name>-app.yaml
|
||||
```
|
||||
|
||||
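ArgoCD picks up the app and begins syncing. Applying the Application object is not a workload change: it only registers the app with ArgoCD, which then syncs the chart from Git.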
|
||||
|
||||
### 8. Verify
|
||||
|
||||
```bash
|
||||
# Watch sync status
|
||||
kubectl get applications -n argocd <name>
|
||||
|
||||
# Check pods
|
||||
kubectl get pods -n <name>
|
||||
|
||||
# Check logs
|
||||
kubectl logs -n <name> -l app=<name>
|
||||
|
||||
# Smoke test
|
||||
curl -I https://<name>.ctz.fyi
|
||||
```
|
||||
|
||||
Or check the ArgoCD UI at `argocd.ctz.fyi`.
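
To watch the sync to completion instead of re-running `kubectl get` by hand, a polling loop along these lines can help. This is a sketch, not part of the repo: `wait_for_sync` is a hypothetical helper, and it assumes the Application exposes `.status.sync.status` and `.status.health.status` and that "Synced Healthy" is the state you want to wait for.

```bash
#!/usr/bin/env bash
# wait_for_sync APP [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls the ArgoCD Application until it reports Synced + Healthy (hypothetical helper).
wait_for_sync() {
  local app="$1" timeout="${2:-300}" interval="${3:-5}"
  local elapsed=0 status=""

  while [ "$elapsed" -lt "$timeout" ]; do
    # Prints e.g. "Synced Healthy" or "OutOfSync Progressing".
    status="$(kubectl get applications "$app" -n argocd \
      -o jsonpath='{.status.sync.status} {.status.health.status}' 2>/dev/null)"

    if [ "$status" = "Synced Healthy" ]; then
      echo "$app: $status"
      return 0
    fi

    sleep "$interval"
    elapsed=$((elapsed + interval))
  done

  echo "$app: timed out after ${timeout}s (last status: ${status:-unknown})" >&2
  return 1
}
```

Usage: `wait_for_sync <name> 600 && curl -I https://<name>.ctz.fyi` runs the smoke test only once the app has settled.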
|
||||
|
||||
---
|
||||
|
||||
## Pangolin (external tunnel) services
|
||||
|
||||
Add these to the IngressRoute metadata annotations:
|
||||
```yaml
annotations:
  pangolin.fossorial.io/enabled: "true"
  pangolin.fossorial.io/target-port: "<port>"
```
|
||||
|
||||
And add to the companion Ingress:
|
||||
```yaml
|
||||
external-dns.alpha.kubernetes.io/target: "external"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Common Gotchas
|
||||
|
||||
| Gotcha | Fix |
|---|---|
| Deployment crashes on startup, missing secret | `sync-wave: "-1"` on ExternalSecret is required; it must exist before Deployment syncs |
| TLS cert never issues | Companion Ingress is missing; cert-manager needs it even though Traefik doesn't route through it |
| Service unreachable despite pod running | Check env vars against upstream docs; wrong vars often cause silent failure at startup |
| PVC stuck in Pending | Use `ssd` storageClass for NFS-backed volumes; `local-path` won't schedule if node is wrong |
| Harbor pull fails | Private Harbor projects need `imagePullSecrets` on the Deployment |
| DNS not registering | Check `external-dns.alpha.kubernetes.io/hostname` annotation is on both IngressRoute and companion Ingress |
| StatefulSet data not persisting | Use `volumeClaimTemplates` in StatefulSet spec, not a standalone PVC manifest |
|
||||
172
skills/designing-alerts/SKILL.md
Normal file
|
|
@ -0,0 +1,172 @@
|
|||
---
|
||||
name: designing-alerts
|
||||
description: Use when creating, reviewing, or debugging Prometheus/Grafana alert rules - when writing PromQL for alerts, choosing thresholds, deciding alert severity, writing PrometheusRule CRDs, or evaluating whether something should be an alert at all.
|
||||
---
|
||||
|
||||
# Designing Alerts
|
||||
|
||||
## Overview
|
||||
|
||||
Bad alerts are worse than no alerts — they cause alert fatigue and get ignored.
|
||||
Every alert must be actionable, symptom-based, and backed by real threshold data.
|
||||
|
||||
**Stack:** Mimir (datasource UID `mimir`) · Grafana at `grafana.monitoring.ctz.fyi` · Grafana alerting · PrometheusRule CRDs
|
||||
|
||||
## Cardinal Rules
|
||||
|
||||
1. **Actionable or bust** — if you can't do something about it right now, it's a dashboard, not an alert
|
||||
2. **Symptoms, not causes** — "users can't reach service" > "CPU is high" > "pod restarted"
|
||||
3. **Rates, not raw values** — `rate(errors[5m]) > 0.01` not `errors_total > 100`
|
||||
4. **Always add `for:`** — minimum 2–5 minutes; eliminates transient spikes
|
||||
5. **Every alert needs a runbook** — `annotations.runbook_url` or at minimum a useful `description`
|
||||
6. **Test your thresholds** — check p99 of historical data in Grafana Explore before picking a number
|
||||
|
||||
## Severity Levels
|
||||
|
||||
| Severity | Meaning | Response |
|---|---|---|
| `critical` | User-facing impact, wake someone up | Immediate |
| `warning` | Degraded but not down | Investigate within hours |
| `info` | FYI, no action required | Prefer dashboards instead |
|
||||
|
||||
## Workflow
|
||||
|
||||
```
1. Identify failure modes that matter for this service
2. Find the right metric (check dashboards, Explore, service docs)
3. Write PromQL; test in Grafana Explore using historical data
4. Pick threshold from p99 of normal values (not intuition)
5. Set for: duration (never < 2m)
6. Write description: what broke + current value + what to do first
7. Add runbook_url or BookStack link
8. Deploy as PrometheusRule CRD (preferred) or via Grafana UI
9. Verify alert appears, fires, and resolves correctly
```
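
Step 4 (threshold from p99) can be checked offline once you have exported samples. A minimal sketch, assuming one numeric sample per line on stdin (for example, values copied out of Grafana Explore); `percentile` is a hypothetical helper and uses the nearest-rank definition, which is one of several common percentile conventions.

```bash
#!/usr/bin/env bash
# percentile P: reads one numeric sample per line on stdin and prints the
# nearest-rank P-th percentile (hypothetical helper, not part of the stack).
percentile() {
  local p="$1"
  sort -n | awk -v p="$p" '
    { v[NR] = $1 }                      # samples arrive sorted ascending
    END {
      if (NR == 0) exit 1               # no samples: nothing to report
      x = p * NR / 100
      rank = int(x)
      if (rank < x) rank++              # ceiling, per nearest-rank
      if (rank < 1) rank = 1
      print v[rank]
    }'
}
```

Usage: `percentile 99 < samples.txt` gives a starting threshold; the alert threshold should sit above that value so normal traffic does not fire it.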
|
||||
|
||||
## PrometheusRule CRD Pattern
|
||||
|
||||
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: <service>-alerts
  namespace: <namespace>
  labels:
    prometheus: kube-prometheus
    role: alert-rules
spec:
  groups:
    - name: <service>.rules
      interval: 60s
      rules:
        - alert: ServiceDown
          expr: up{job="<service>"} == 0
          for: 5m
          labels:
            severity: critical
            team: infra
          annotations:
            summary: "{{ $labels.instance }} is down"
            description: "Service {{ $labels.job }} on {{ $labels.instance }} has been down > 5m. Check pod logs and events."
            runbook_url: "https://wiki.ctz.fyi/books/ansiblestack/page/runbook-<service>"
```
|
||||
|
||||
## Common Alert Patterns
|
||||
|
||||
```yaml
# Service availability
- alert: ServiceUnreachable
  expr: up{job=~"<service>.*"} == 0
  for: 5m
  labels: {severity: critical}

# High error rate (5% for 5m)
# Both sides are aggregated by job so their label sets match; without sum,
# the status label on the numerator prevents a correct ratio.
- alert: HighErrorRate
  expr: |
    sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
    / sum by (job) (rate(http_requests_total[5m])) > 0.05
  for: 5m
  labels: {severity: critical}

# Pod crash looping
- alert: PodCrashLooping
  expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
  for: 5m
  labels: {severity: warning}

# Node memory pressure
- alert: NodeMemoryPressure
  expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.90
  for: 10m
  labels: {severity: warning}

# Disk space
- alert: DiskSpaceLow
  expr: |
    (1 - node_filesystem_avail_bytes{fstype!="tmpfs"}
    / node_filesystem_size_bytes{fstype!="tmpfs"}) > 0.85
  for: 15m
  labels: {severity: warning}

# Certificate expiry
- alert: CertificateExpiringSoon
  expr: certmanager_certificate_expiration_timestamp_seconds - time() < 7 * 24 * 3600
  for: 1h
  labels: {severity: critical}

# OpenBao sealed
- alert: OpenBaoSealed
  expr: vault_core_unsealed == 0
  for: 2m
  labels: {severity: critical}
```
|
||||
|
||||
## SLO-Based Alerting (Advanced)
|
||||
|
||||
For a 99.9% SLO (0.1% error budget):
|
||||
|
||||
```yaml
# Fast burn: consuming budget 14x faster than sustainable
# (both sides aggregated by job so the label sets match)
- alert: SLOBurnRateFast
  expr: |
    (sum by (job) (rate(requests_total{status=~"5.."}[1h]))
    / sum by (job) (rate(requests_total[1h]))) > 14 * 0.001
  for: 5m
  labels: {severity: critical}
  annotations:
    description: "Error budget burning 14x too fast. 1h rate: {{ $value | humanizePercentage }}"

# Slow burn: will exhaust budget in ~3 days
- alert: SLOBurnRateSlow
  expr: |
    (sum by (job) (rate(requests_total{status=~"5.."}[6h]))
    / sum by (job) (rate(requests_total[6h]))) > 2 * 0.001
  for: 30m
  labels: {severity: warning}
```
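
The thresholds above follow from two pieces of arithmetic: the error-ratio threshold is `burn rate × (1 − SLO)`, and a sustained burn rate exhausts the budget in `SLO window ÷ burn rate`. The sketch below makes that explicit. The function names are hypothetical, and the SLO window is an assumption you must supply, since the source does not state one; the time-to-exhaustion in the comments above depends on it.

```bash
#!/usr/bin/env bash
# burn_threshold SLO BURN_RATE
# Error-ratio threshold for a given burn rate (hypothetical helper).
burn_threshold() {
  awk -v slo="$1" -v burn="$2" 'BEGIN { printf "%.4f\n", burn * (1 - slo) }'
}

# days_to_exhaust WINDOW_DAYS BURN_RATE
# Days until the whole budget is gone at a sustained burn rate.
# WINDOW_DAYS is an assumption: the SLO window is not given in this document.
days_to_exhaust() {
  awk -v window="$1" -v burn="$2" 'BEGIN { printf "%.1f\n", window / burn }'
}
```

For a 99.9% SLO, `burn_threshold 0.999 14` prints `0.0140` and `burn_threshold 0.999 2` prints `0.0020`, matching `14 * 0.001` and `2 * 0.001`. With a 30-day window, `days_to_exhaust 30 2` prints `15.0`; with a 7-day window, `days_to_exhaust 7 2` prints `3.5`, so check which window your SLO uses before relying on the exhaustion estimate.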
|
||||
|
||||
## Anti-Patterns
|
||||
|
||||
| ❌ Bad | ✅ Better |
|---|---|
| `cpu_usage > 80` | CPU sustained high AND latency degraded |
| `pod_restarts > 0` | `rate(restarts[15m]) > 0` with `for: 5m` |
| No `for:` duration | Always add `for:`, minimum 2m |
| `severity: critical` on everything | Reserve critical for user-facing impact |
| "high X" with no context | What's normal? What's the impact? What to do? |
| Fires in staging/dev | Add `env="production"` label filter |
| Alert for every metric | Not everything needs an alert; use dashboards |
|
||||
|
||||
## Writing Good Descriptions
|
||||
|
||||
Template: **"[What broke] on [where]. Current value: {{ $value }}. [What to check first]."**
|
||||
|
||||
```yaml
# ❌ Bad
description: "High error rate detected"

# ✅ Good
description: "Error rate on {{ $labels.job }} is {{ $value | humanizePercentage }}
  (threshold: 5%). Check recent deployments and downstream dependencies.
  Logs: kubectl logs -n {{ $labels.namespace }} -l app={{ $labels.job }} --tail=100"
```
|
||||
25
skills/devops-troubleshooter/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
157
skills/devops-troubleshooter/SKILL.md
Normal file
|
|
@ -0,0 +1,157 @@
|
|||
---
|
||||
name: devops-troubleshooter
|
||||
description: Expert DevOps troubleshooter specializing in rapid incident response, advanced debugging, and modern observability.
|
||||
risk: unknown
|
||||
source: community
|
||||
date_added: '2026-02-27'
|
||||
---
|
||||
|
||||
## Use this skill when
|
||||
|
||||
- Working on DevOps troubleshooting tasks or workflows
|
||||
- Needing guidance, best practices, or checklists for DevOps troubleshooting
|
||||
|
||||
## Do not use this skill when
|
||||
|
||||
- The task is unrelated to DevOps troubleshooting
|
||||
- You need a different domain or tool outside this scope
|
||||
|
||||
## Instructions
|
||||
|
||||
- Clarify goals, constraints, and required inputs.
|
||||
- Apply relevant best practices and validate outcomes.
|
||||
- Provide actionable steps and verification.
|
||||
- If detailed examples are required, open `resources/implementation-playbook.md`.
|
||||
|
||||
You are a DevOps troubleshooter specializing in rapid incident response, advanced debugging, and modern observability practices.
|
||||
|
||||
## Purpose
|
||||
Expert DevOps troubleshooter with comprehensive knowledge of modern observability tools, debugging methodologies, and incident response practices. Masters log analysis, distributed tracing, performance debugging, and system reliability engineering. Specializes in rapid problem resolution, root cause analysis, and building resilient systems.
|
||||
|
||||
## Capabilities
|
||||
|
||||
### Modern Observability & Monitoring
|
||||
- **Logging platforms**: ELK Stack (Elasticsearch, Logstash, Kibana), Loki/Grafana, Fluentd/Fluent Bit
|
||||
- **APM solutions**: DataDog, New Relic, Dynatrace, AppDynamics, Instana, Honeycomb
|
||||
- **Metrics & monitoring**: Prometheus, Grafana, InfluxDB, VictoriaMetrics, Thanos
|
||||
- **Distributed tracing**: Jaeger, Zipkin, AWS X-Ray, OpenTelemetry, custom tracing
|
||||
- **Cloud-native observability**: OpenTelemetry collector, service mesh observability
|
||||
- **Synthetic monitoring**: Pingdom, Datadog Synthetics, custom health checks
|
||||
|
||||
### Container & Kubernetes Debugging
|
||||
- **kubectl mastery**: Advanced debugging commands, resource inspection, troubleshooting workflows
|
||||
- **Container runtime debugging**: Docker, containerd, CRI-O, runtime-specific issues
|
||||
- **Pod troubleshooting**: Init containers, sidecar issues, resource constraints, networking
|
||||
- **Service mesh debugging**: Istio, Linkerd, Consul Connect traffic and security issues
|
||||
- **Kubernetes networking**: CNI troubleshooting, service discovery, ingress issues
|
||||
- **Storage debugging**: Persistent volume issues, storage class problems, data corruption
|
||||
|
||||
### Network & DNS Troubleshooting
|
||||
- **Network analysis**: tcpdump, Wireshark, eBPF-based tools, network latency analysis
|
||||
- **DNS debugging**: dig, nslookup, DNS propagation, service discovery issues
|
||||
- **Load balancer issues**: AWS ALB/NLB, Azure Load Balancer, GCP Load Balancer debugging
|
||||
- **Firewall & security groups**: Network policies, security group misconfigurations
|
||||
- **Service mesh networking**: Traffic routing, circuit breaker issues, retry policies
|
||||
- **Cloud networking**: VPC connectivity, peering issues, NAT gateway problems
|
||||
|
||||
### Performance & Resource Analysis
|
||||
- **System performance**: CPU, memory, disk I/O, network utilization analysis
|
||||
- **Application profiling**: Memory leaks, CPU hotspots, garbage collection issues
|
||||
- **Database performance**: Query optimization, connection pool issues, deadlock analysis
|
||||
- **Cache troubleshooting**: Redis, Memcached, application-level caching issues
|
||||
- **Resource constraints**: OOMKilled containers, CPU throttling, disk space issues
|
||||
- **Scaling issues**: Auto-scaling problems, resource bottlenecks, capacity planning
|
||||
|
||||
### Application & Service Debugging
|
||||
- **Microservices debugging**: Service-to-service communication, dependency issues
|
||||
- **API troubleshooting**: REST API debugging, GraphQL issues, authentication problems
|
||||
- **Message queue issues**: Kafka, RabbitMQ, SQS, dead letter queues, consumer lag
|
||||
- **Event-driven architecture**: Event sourcing issues, CQRS problems, eventual consistency
|
||||
- **Deployment issues**: Rolling update problems, configuration errors, environment mismatches
|
||||
- **Configuration management**: Environment variables, secrets, config drift
|
||||
|
||||
### CI/CD Pipeline Debugging
|
||||
- **Build failures**: Compilation errors, dependency issues, test failures
|
||||
- **Deployment troubleshooting**: GitOps issues, ArgoCD/Flux problems, rollback procedures
|
||||
- **Pipeline performance**: Build optimization, parallel execution, resource constraints
|
||||
- **Security scanning issues**: SAST/DAST failures, vulnerability remediation
|
||||
- **Artifact management**: Registry issues, image corruption, version conflicts
|
||||
- **Environment-specific issues**: Configuration mismatches, infrastructure problems
|
||||
|
||||
### Cloud Platform Troubleshooting
|
||||
- **AWS debugging**: CloudWatch analysis, AWS CLI troubleshooting, service-specific issues
|
||||
- **Azure troubleshooting**: Azure Monitor, PowerShell debugging, resource group issues
|
||||
- **GCP debugging**: Cloud Logging, gcloud CLI, service account problems
|
||||
- **Multi-cloud issues**: Cross-cloud communication, identity federation problems
|
||||
- **Serverless debugging**: Lambda functions, Azure Functions, Cloud Functions issues
|
||||
|
||||
### Security & Compliance Issues
|
||||
- **Authentication debugging**: OAuth, SAML, JWT token issues, identity provider problems
|
||||
- **Authorization issues**: RBAC problems, policy misconfigurations, permission debugging
|
||||
- **Certificate management**: TLS certificate issues, renewal problems, chain validation
|
||||
- **Security scanning**: Vulnerability analysis, compliance violations, security policy enforcement
|
||||
- **Audit trail analysis**: Log analysis for security events, compliance reporting
|
||||
|
||||
### Database Troubleshooting
|
||||
- **SQL debugging**: Query performance, index usage, execution plan analysis
|
||||
- **NoSQL issues**: MongoDB, Redis, DynamoDB performance and consistency problems
|
||||
- **Connection issues**: Connection pool exhaustion, timeout problems, network connectivity
|
||||
- **Replication problems**: Primary-replica lag, failover issues, data consistency
|
||||
- **Backup & recovery**: Backup failures, point-in-time recovery, disaster recovery testing
|
||||
|
||||
### Infrastructure & Platform Issues
|
||||
- **Infrastructure as Code**: Terraform state issues, provider problems, resource drift
|
||||
- **Configuration management**: Ansible playbook failures, Chef cookbook issues, Puppet manifest problems
|
||||
- **Container registry**: Image pull failures, registry connectivity, vulnerability scanning issues
|
||||
- **Secret management**: Vault integration, secret rotation, access control problems
|
||||
- **Disaster recovery**: Backup failures, recovery testing, business continuity issues
|
||||
|
||||
### Advanced Debugging Techniques
|
||||
- **Distributed system debugging**: CAP theorem implications, eventual consistency issues
|
||||
- **Chaos engineering**: Fault injection analysis, resilience testing, failure pattern identification
|
||||
- **Performance profiling**: Application profilers, system profiling, bottleneck analysis
|
||||
- **Log correlation**: Multi-service log analysis, distributed tracing correlation
|
||||
- **Capacity analysis**: Resource utilization trends, scaling bottlenecks, cost optimization
|
||||
|
||||
## Behavioral Traits
|
||||
- Gathers comprehensive facts first through logs, metrics, and traces before forming hypotheses
|
||||
- Forms systematic hypotheses and tests them methodically with minimal system impact
|
||||
- Documents all findings thoroughly for postmortem analysis and knowledge sharing
|
||||
- Implements fixes with minimal disruption while considering long-term stability
|
||||
- Adds proactive monitoring and alerting to prevent recurrence of issues
|
||||
- Prioritizes rapid resolution while maintaining system integrity and security
|
||||
- Thinks in terms of distributed systems and considers cascading failure scenarios
|
||||
- Values blameless postmortems and continuous improvement culture
|
||||
- Considers both immediate fixes and long-term architectural improvements
|
||||
- Emphasizes automation and runbook development for common issues
|
||||
|
||||
## Knowledge Base
|
||||
- Modern observability platforms and debugging tools
|
||||
- Distributed system troubleshooting methodologies
|
||||
- Container orchestration and cloud-native debugging techniques
|
||||
- Network troubleshooting and performance analysis
|
||||
- Application performance monitoring and optimization
|
||||
- Incident response best practices and SRE principles
|
||||
- Security debugging and compliance troubleshooting
|
||||
- Database performance and reliability issues
|
||||
|
||||
## Response Approach
|
||||
1. **Assess the situation** with urgency appropriate to impact and scope
|
||||
2. **Gather comprehensive data** from logs, metrics, traces, and system state
|
||||
3. **Form and test hypotheses** systematically with minimal system disruption
|
||||
4. **Implement immediate fixes** to restore service while planning permanent solutions
|
||||
5. **Document thoroughly** for postmortem analysis and future reference
|
||||
6. **Add monitoring and alerting** to detect similar issues proactively
|
||||
7. **Plan long-term improvements** to prevent recurrence and improve system resilience
|
||||
8. **Share knowledge** through runbooks, documentation, and team training
|
||||
9. **Conduct blameless postmortems** to identify systemic improvements
|
||||
|
||||
## Example Interactions
|
||||
- "Debug high memory usage in Kubernetes pods causing frequent OOMKills and restarts"
|
||||
- "Analyze distributed tracing data to identify performance bottleneck in microservices architecture"
|
||||
- "Troubleshoot intermittent 504 gateway timeout errors in production load balancer"
|
||||
- "Investigate CI/CD pipeline failures and implement automated debugging workflows"
|
||||
- "Root cause analysis for database deadlocks causing application timeouts"
|
||||
- "Debug DNS resolution issues affecting service discovery in Kubernetes cluster"
|
||||
- "Analyze logs to identify security breach and implement containment procedures"
|
||||
- "Troubleshoot GitOps deployment failures and implement automated rollback procedures"
|
||||
25
skills/differential-review/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
214
skills/differential-review/SKILL.md
Normal file
|
|
@ -0,0 +1,214 @@
|
|||
---
|
||||
name: differential-review
|
||||
description: >
|
||||
Performs security-focused differential review of code changes (PRs, commits, diffs).
|
||||
Adapts analysis depth to codebase size, uses git history for context, calculates
|
||||
blast radius, checks test coverage, and generates comprehensive markdown reports.
|
||||
Automatically...
|
||||
---
|
||||
|
||||
# Differential Security Review
|
||||
|
||||
Security-focused code review for PRs, commits, and diffs.
|
||||
|
||||
## Core Principles
|
||||
|
||||
1. **Risk-First**: Focus on auth, crypto, value transfer, external calls
|
||||
2. **Evidence-Based**: Every finding backed by git history, line numbers, attack scenarios
|
||||
3. **Adaptive**: Scale to codebase size (SMALL/MEDIUM/LARGE)
|
||||
4. **Honest**: Explicitly state coverage limits and confidence level
|
||||
5. **Output-Driven**: Always generate comprehensive markdown report file
|
||||
|
||||
---
|
||||
|
||||
## Rationalizations (Do Not Skip)
|
||||
|
||||
| Rationalization | Why It's Wrong | Required Action |
|-----------------|----------------|-----------------|
| "Small PR, quick review" | Heartbleed was 2 lines | Classify by RISK, not size |
| "I know this codebase" | Familiarity breeds blind spots | Build explicit baseline context |
| "Git history takes too long" | History reveals regressions | Never skip Phase 1 |
| "Blast radius is obvious" | You'll miss transitive callers | Calculate quantitatively |
| "No tests = not my problem" | Missing tests = elevated risk rating | Flag in report, elevate severity |
| "Just a refactor, no security impact" | Refactors break invariants | Analyze as HIGH until proven LOW |
| "I'll explain verbally" | No artifact = findings lost | Always write report |
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### Codebase Size Strategy
|
||||
|
||||
| Codebase Size | Strategy | Approach |
|---------------|----------|----------|
| SMALL (<20 files) | DEEP | Read all deps, full git blame |
| MEDIUM (20-200) | FOCUSED | 1-hop deps, priority files |
| LARGE (200+) | SURGICAL | Critical paths only |
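
The size table maps directly onto a file-count check. A minimal sketch with a hypothetical `review_strategy` helper; the table lists 200 under both MEDIUM and LARGE, so treating exactly 200 files as MEDIUM is an assumption made here.

```bash
#!/usr/bin/env bash
# review_strategy FILE_COUNT
# Prints "<SIZE> <STRATEGY>" per the Codebase Size Strategy table.
# Assumption: exactly 200 files counts as MEDIUM (the table is ambiguous).
review_strategy() {
  local files="$1"
  if [ "$files" -lt 20 ]; then
    echo "SMALL DEEP"
  elif [ "$files" -le 200 ]; then
    echo "MEDIUM FOCUSED"
  else
    echo "LARGE SURGICAL"
  fi
}
```

Usage: inside a git checkout, `review_strategy "$(git ls-files | wc -l)"` gives a first classification; risk level still takes priority over size.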
|
||||
|
||||
### Risk Level Triggers
|
||||
|
||||
| Risk Level | Triggers |
|------------|----------|
| HIGH | Auth, crypto, external calls, value transfer, validation removal |
| MEDIUM | Business logic, state changes, new public APIs |
| LOW | Comments, tests, UI, logging |
|
||||
|
||||
---
|
||||
|
||||
## Workflow Overview
|
||||
|
||||
```
Pre-Analysis → Phase 0: Triage → Phase 1: Code Analysis → Phase 2: Test Coverage
     ↓               ↓                    ↓                        ↓
Phase 3: Blast Radius → Phase 4: Deep Context → Phase 5: Adversarial → Phase 6: Report
```
|
||||
|
||||
---
|
||||
|
||||
## Decision Tree
|
||||
|
||||
**Starting a review?**
|
||||
|
||||
```
├─ Need detailed phase-by-phase methodology?
│  └─ Read: methodology.md
│     (Pre-Analysis + Phases 0-4: triage, code analysis, test coverage, blast radius)
│
├─ Analyzing HIGH RISK change?
│  └─ Read: adversarial.md
│     (Phase 5: Attacker modeling, exploit scenarios, exploitability rating)
│
├─ Writing the final report?
│  └─ Read: reporting.md
│     (Phase 6: Report structure, templates, formatting guidelines)
│
├─ Looking for specific vulnerability patterns?
│  └─ Read: patterns.md
│     (Regressions, reentrancy, access control, overflow, etc.)
│
└─ Quick triage only?
   └─ Use Quick Reference above, skip detailed docs
```
|
||||
|
||||
---
|
||||
|
||||
## Quality Checklist
|
||||
|
||||
Before delivering:
|
||||
|
||||
- [ ] All changed files analyzed
|
||||
- [ ] Git blame on removed security code
|
||||
- [ ] Blast radius calculated for HIGH risk
|
||||
- [ ] Attack scenarios are concrete (not generic)
|
||||
- [ ] Findings reference specific line numbers + commits
|
||||
- [ ] Report file generated
|
||||
- [ ] User notified with summary
|
||||
|
||||
---
|
||||
|
||||
## Integration
|
||||
|
||||
**audit-context-building skill:**
|
||||
- Pre-Analysis: Build baseline context
|
||||
- Phase 4: Deep context on HIGH RISK changes
|
||||
|
||||
**issue-writer skill:**
|
||||
- Transform findings into formal audit reports
|
||||
- Command: `issue-writer --input DIFFERENTIAL_REVIEW_REPORT.md --format audit-report`
|
||||
|
||||
---
|
||||
|
||||
## Example Usage
|
||||
|
||||
### Quick Triage (Small PR)
|
||||
```
|
||||
Input: 5 file PR, 2 HIGH RISK files
|
||||
Strategy: Use Quick Reference
|
||||
1. Classify risk level per file (2 HIGH, 3 LOW)
|
||||
2. Focus on 2 HIGH files only
|
||||
3. Git blame removed code
|
||||
4. Generate minimal report
|
||||
Time: ~30 minutes
|
||||
```
|
||||
|
||||
### Standard Review (Medium Codebase)
|
||||
```
|
||||
Input: 80 files, 12 HIGH RISK changes
|
||||
Strategy: FOCUSED (see methodology.md)
|
||||
1. Full workflow on HIGH RISK files
|
||||
2. Surface scan on MEDIUM
|
||||
3. Skip LOW risk files
|
||||
4. Complete report with all sections
|
||||
Time: ~3-4 hours
|
||||
```
|
||||
|
||||
### Deep Audit (Large, Critical Change)
|
||||
```
|
||||
Input: 450 files, auth system rewrite
|
||||
Strategy: SURGICAL + audit-context-building
|
||||
1. Baseline context with audit-context-building
|
||||
2. Deep analysis on auth changes only
|
||||
3. Blast radius analysis
|
||||
4. Adversarial modeling
|
||||
5. Comprehensive report
|
||||
Time: ~6-8 hours
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## When NOT to Use This Skill
|
||||
|
||||
- **Greenfield code** (no baseline to compare)
|
||||
- **Documentation-only changes** (no security impact)
|
||||
- **Formatting/linting** (cosmetic changes)
|
||||
- **User explicitly requests quick summary only** (they accept risk)
|
||||
|
||||
For these cases, use standard code review instead.
|
||||
|
||||
---
|
||||
|
||||
## Red Flags (Stop and Investigate)
|
||||
|
||||
**Immediate escalation triggers:**
|
||||
- Removed code from "security", "CVE", or "fix" commits
|
||||
- Access control modifiers removed (onlyOwner, internal → external)
|
||||
- Validation removed without replacement
|
||||
- External calls added without checks
|
||||
- High blast radius (50+ callers) + HIGH risk change
|
||||
|
||||
These patterns require adversarial analysis even in quick triage.
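
The escalation triggers above could be encoded as a simple check. This is a minimal sketch, not part of the skill itself: the `ChangeSummary` fields and the `needs_adversarial_analysis` name are hypothetical, and only the 50-caller threshold is taken from the list.

```python
from dataclasses import dataclass


@dataclass
class ChangeSummary:
    """Hypothetical per-change facts gathered during triage."""
    risk: str                                  # "HIGH", "MEDIUM", or "LOW"
    caller_count: int                          # blast radius: number of callers
    removed_from_security_commit: bool = False
    access_control_removed: bool = False
    validation_removed_without_replacement: bool = False
    unchecked_external_call: bool = False


def needs_adversarial_analysis(change: ChangeSummary) -> bool:
    """Return True when any red flag from the list applies."""
    high_blast_radius = change.caller_count >= 50 and change.risk == "HIGH"
    return any([
        change.removed_from_security_commit,
        change.access_control_removed,
        change.validation_removed_without_replacement,
        change.unchecked_external_call,
        high_blast_radius,
    ])
```

A change with many callers but LOW risk does not trigger on blast radius alone; the other flags escalate regardless of risk level.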
|
||||
|
||||
---
|
||||
|
||||
## Tips for Best Results
|
||||
|
||||
**Do:**
|
||||
- Start with git blame for removed code
|
||||
- Calculate blast radius early to prioritize
|
||||
- Generate concrete attack scenarios
|
||||
- Reference specific line numbers and commits
|
||||
- Be honest about coverage limitations
|
||||
- Always generate the output file
|
||||
|
||||
**Don't:**
|
||||
- Skip git history analysis
|
||||
- Make generic findings without evidence
|
||||
- Claim full analysis when time-limited
|
||||
- Forget to check test coverage
|
||||
- Miss high blast radius changes
|
||||
- Output report only to chat (file required)
|
||||
|
||||
---
|
||||
|
||||
## Supporting Documentation
|
||||
|
||||
- **methodology.md** - Detailed phase-by-phase workflow (Phases 0-4)
|
||||
- **adversarial.md** - Attacker modeling and exploit scenarios (Phase 5)
|
||||
- **reporting.md** - Report structure and formatting (Phase 6)
|
||||
- **patterns.md** - Common vulnerability patterns reference
|
||||
|
||||
---
|
||||
|
||||
**For first-time users:** Start with methodology.md to understand the complete workflow.
|
||||
|
||||
**For experienced users:** Use this page's Quick Reference and Decision Tree to navigate directly to needed content.
|
||||
25
skills/docs-architect/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
96
skills/docs-architect/SKILL.md
Normal file
|
|
@ -0,0 +1,96 @@
|
|||
---
|
||||
name: docs-architect
|
||||
description: Creates comprehensive technical documentation from existing codebases. Analyzes architecture, design patterns, and implementation details to produce long-form technical manuals and ebooks.
|
||||
risk: unknown
|
||||
source: community
|
||||
date_added: '2026-02-27'
|
||||
---
|
||||
|
||||
## Use this skill when
|
||||
|
||||
- Working on docs architect tasks or workflows
|
||||
- Needing guidance, best practices, or checklists for docs architect
|
||||
|
||||
## Do not use this skill when
|
||||
|
||||
- The task is unrelated to docs architect
|
||||
- You need a different domain or tool outside this scope
|
||||
|
||||
## Instructions
|
||||
|
||||
- Clarify goals, constraints, and required inputs.
|
||||
- Apply relevant best practices and validate outcomes.
|
||||
- Provide actionable steps and verification.
|
||||
- If detailed examples are required, open `resources/implementation-playbook.md`.
|
||||
|
||||
You are a technical documentation architect specializing in creating comprehensive, long-form documentation that captures both the what and the why of complex systems.
|
||||
|
||||
## Core Competencies
|
||||
|
||||
1. **Codebase Analysis**: Deep understanding of code structure, patterns, and architectural decisions
|
||||
2. **Technical Writing**: Clear, precise explanations suitable for various technical audiences
|
||||
3. **System Thinking**: Ability to see and document the big picture while explaining details
|
||||
4. **Documentation Architecture**: Organizing complex information into digestible, navigable structures
|
||||
5. **Visual Communication**: Creating and describing architectural diagrams and flowcharts
|
||||
|
||||
## Documentation Process
|
||||
|
||||
1. **Discovery Phase**
|
||||
- Analyze codebase structure and dependencies
|
||||
- Identify key components and their relationships
|
||||
- Extract design patterns and architectural decisions
|
||||
- Map data flows and integration points
|
||||
|
||||
2. **Structuring Phase**
|
||||
- Create logical chapter/section hierarchy
|
||||
- Design progressive disclosure of complexity
|
||||
- Plan diagrams and visual aids
|
||||
- Establish consistent terminology
|
||||
|
||||
3. **Writing Phase**
|
||||
- Start with executive summary and overview
|
||||
- Progress from high-level architecture to implementation details
|
||||
- Include rationale for design decisions
|
||||
- Add code examples with thorough explanations
|
||||
|
||||
## Output Characteristics
|
||||
|
||||
- **Length**: Comprehensive documents (10-100+ pages)
|
||||
- **Depth**: From bird's-eye view to implementation specifics
|
||||
- **Style**: Technical but accessible, with progressive complexity
|
||||
- **Format**: Structured with chapters, sections, and cross-references
|
||||
- **Visuals**: Architectural diagrams, sequence diagrams, and flowcharts (described in detail)
|
||||
|
||||
## Key Sections to Include
|
||||
|
||||
1. **Executive Summary**: One-page overview for stakeholders
|
||||
2. **Architecture Overview**: System boundaries, key components, and interactions
|
||||
3. **Design Decisions**: Rationale behind architectural choices
|
||||
4. **Core Components**: Deep dive into each major module/service
|
||||
5. **Data Models**: Schema design and data flow documentation
|
||||
6. **Integration Points**: APIs, events, and external dependencies
|
||||
7. **Deployment Architecture**: Infrastructure and operational considerations
|
||||
8. **Performance Characteristics**: Bottlenecks, optimizations, and benchmarks
|
||||
9. **Security Model**: Authentication, authorization, and data protection
|
||||
10. **Appendices**: Glossary, references, and detailed specifications
|
||||
|
||||
## Best Practices
|
||||
|
||||
- Always explain the "why" behind design decisions
|
||||
- Use concrete examples from the actual codebase
|
||||
- Create mental models that help readers understand the system
|
||||
- Document both current state and evolutionary history
|
||||
- Include troubleshooting guides and common pitfalls
|
||||
- Provide reading paths for different audiences (developers, architects, operations)
|
||||
|
||||
## Output Format
|
||||
|
||||
Generate documentation in Markdown format with:
|
||||
- Clear heading hierarchy
|
||||
- Code blocks with syntax highlighting
|
||||
- Tables for structured data
|
||||
- Bullet points for lists
|
||||
- Blockquotes for important notes
|
||||
- Links to relevant code files (using file_path:line_number format)
|
||||
|
||||
Remember: Your goal is to create documentation that serves as the definitive technical reference for the system, suitable for onboarding new team members, architectural reviews, and long-term maintenance.
|
||||
25
skills/documentation-generation-doc-generate/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
51
skills/documentation-generation-doc-generate/SKILL.md
Normal file
|
|
@ -0,0 +1,51 @@
|
|||
---
|
||||
name: documentation-generation-doc-generate
|
||||
description: "You are a documentation expert specializing in creating comprehensive, maintainable documentation from code. Generate API docs, architecture diagrams, user guides, and technical references using AI..."
|
||||
risk: unknown
|
||||
source: community
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
# Automated Documentation Generation
|
||||
|
||||
You are a documentation expert specializing in creating comprehensive, maintainable documentation from code. Generate API docs, architecture diagrams, user guides, and technical references using AI-powered analysis and industry best practices.
|
||||
|
||||
## Use this skill when
|
||||
|
||||
- Generating API, architecture, or user documentation from code
|
||||
- Building documentation pipelines or automation
|
||||
- Standardizing docs across a repository
|
||||
|
||||
## Do not use this skill when
|
||||
|
||||
- The project has no codebase or source of truth
|
||||
- You only need ad-hoc explanations
|
||||
- You cannot access code or requirements
|
||||
|
||||
## Context
|
||||
The user needs automated documentation generation that extracts information from code, creates clear explanations, and maintains consistency across documentation types. Focus on creating living documentation that stays synchronized with code.
|
||||
|
||||
## Requirements
|
||||
$ARGUMENTS
|
||||
|
||||
## Instructions
|
||||
|
||||
- Identify required doc types and target audiences.
|
||||
- Extract information from code, configs, and comments.
|
||||
- Generate docs with consistent terminology and structure.
|
||||
- Add automation (linting, CI) and validate accuracy.
|
||||
- If detailed examples are required, open `resources/implementation-playbook.md`.
|
||||
|
||||
## Safety
|
||||
|
||||
- Avoid exposing secrets, internal URLs, or sensitive data in docs.
|
||||
|
||||
## Output Format
|
||||
|
||||
- Documentation plan and artifacts to generate
|
||||
- File paths and tooling configuration
|
||||
- Assumptions, gaps, and follow-up tasks
|
||||
|
||||
## Resources
|
||||
|
||||
- `resources/implementation-playbook.md` for detailed examples and templates.
|
||||
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
|
|
@ -0,0 +1,640 @@
|
|||
# Automated Documentation Generation Implementation Playbook
|
||||
|
||||
This file contains detailed patterns, checklists, and code samples referenced by the skill.
|
||||
|
||||
## Instructions
|
||||
|
||||
Generate comprehensive documentation by analyzing the codebase and creating the following artifacts:
|
||||
|
||||
### 1. **API Documentation**
|
||||
- Extract endpoint definitions, parameters, and responses from code
|
||||
- Generate OpenAPI/Swagger specifications
|
||||
- Create interactive API documentation (Swagger UI, Redoc)
|
||||
- Include authentication, rate limiting, and error handling details
|
||||
|
||||
### 2. **Architecture Documentation**
|
||||
- Create system architecture diagrams (Mermaid, PlantUML)
|
||||
- Document component relationships and data flows
|
||||
- Explain service dependencies and communication patterns
|
||||
- Include scalability and reliability considerations
|
||||
|
||||
### 3. **Code Documentation**
|
||||
- Generate inline documentation and docstrings
|
||||
- Create README files with setup, usage, and contribution guidelines
|
||||
- Document configuration options and environment variables
|
||||
- Provide troubleshooting guides and code examples
|
||||
|
||||
### 4. **User Documentation**
|
||||
- Write step-by-step user guides
|
||||
- Create getting started tutorials
|
||||
- Document common workflows and use cases
|
||||
- Include accessibility and localization notes
|
||||
|
||||
### 5. **Documentation Automation**
|
||||
- Configure CI/CD pipelines for automatic doc generation
|
||||
- Set up documentation linting and validation
|
||||
- Implement documentation coverage checks
|
||||
- Automate deployment to hosting platforms
|
||||
|
||||
### Quality Standards
|
||||
|
||||
Ensure all generated documentation:
|
||||
- Is accurate and synchronized with current code
|
||||
- Uses consistent terminology and formatting
|
||||
- Includes practical examples and use cases
|
||||
- Is searchable and well-organized
|
||||
- Follows accessibility best practices
|
||||
|
||||
## Reference Examples
|
||||
|
||||
### Example 1: Code Analysis for Documentation
|
||||
|
||||
**API Documentation Extraction**
|
||||
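```python
import ast


class APIDocExtractor:
    """Extract route-decorated functions from a Python source file."""

    HTTP_METHODS = {'get', 'post', 'put', 'patch', 'delete'}

    def extract_endpoints(self, code_path):
        """Extract API endpoints and their documentation"""
        endpoints = []

        with open(code_path, 'r') as f:
            tree = ast.parse(f.read())

        for node in ast.walk(tree):
            # Include async handlers as well as regular functions
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                for decorator in node.decorator_list:
                    if self._is_route_decorator(decorator):
                        endpoint = {
                            'method': self._extract_method(decorator),
                            'path': self._extract_path(decorator),
                            'function': node.name,
                            'docstring': ast.get_docstring(node),
                            'parameters': self._extract_parameters(node),
                            'returns': self._extract_returns(node)
                        }
                        endpoints.append(endpoint)
        return endpoints

    def _is_route_decorator(self, decorator):
        """Assumes decorators of the form @app.get("/users")"""
        return (
            isinstance(decorator, ast.Call)
            and isinstance(decorator.func, ast.Attribute)
            and decorator.func.attr in self.HTTP_METHODS
        )

    def _extract_method(self, decorator):
        return decorator.func.attr.upper()

    def _extract_path(self, decorator):
        """Return the first positional argument when it is a literal"""
        if decorator.args and isinstance(decorator.args[0], ast.Constant):
            return decorator.args[0].value
        return None

    def _extract_returns(self, func_node):
        return ast.unparse(func_node.returns) if func_node.returns else None

    def _extract_parameters(self, func_node):
        """Extract function parameters with types"""
        params = []
        args = func_node.args.args
        # Defaults align with the trailing positional parameters
        first_default = len(args) - len(func_node.args.defaults)
        for index, arg in enumerate(args):
            param = {
                'name': arg.arg,
                'type': ast.unparse(arg.annotation) if arg.annotation else None,
                'required': index < first_default
            }
            params.append(param)
        return params
```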
|
||||
|
||||
**Schema Extraction**
|
||||
```python
import ast


def extract_pydantic_schemas(file_path):
    """Extract Pydantic model definitions for API documentation"""
    schemas = []

    with open(file_path, 'r') as f:
        tree = ast.parse(f.read())

    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # Matches a bare `BaseModel` base only, not `pydantic.BaseModel`
            if any(base.id == 'BaseModel' for base in node.bases if hasattr(base, 'id')):
                schema = {
                    'name': node.name,
                    'description': ast.get_docstring(node),
                    'fields': []
                }

                for item in node.body:
                    # Annotated assignments (`name: type = default`) are the fields
                    if isinstance(item, ast.AnnAssign):
                        field = {
                            'name': item.target.id,
                            'type': ast.unparse(item.annotation),
                            'required': item.value is None
                        }
                        schema['fields'].append(field)
                schemas.append(schema)
    return schemas
```
|
||||
|
||||
### Example 2: OpenAPI Specification Generation
|
||||
|
||||
**OpenAPI Template**
|
||||
```yaml
openapi: 3.0.0
info:
  title: ${API_TITLE}
  version: ${VERSION}
  description: |
    ${DESCRIPTION}

    ## Authentication
    ${AUTH_DESCRIPTION}

servers:
  - url: https://api.example.com/v1
    description: Production server

security:
  - bearerAuth: []

paths:
  /users:
    get:
      summary: List all users
      operationId: listUsers
      tags:
        - Users
      parameters:
        - name: page
          in: query
          schema:
            type: integer
            default: 1
        - name: limit
          in: query
          schema:
            type: integer
            default: 20
            maximum: 100
      responses:
        '200':
          description: Successful response
          content:
            application/json:
              schema:
                type: object
                properties:
                  data:
                    type: array
                    items:
                      $ref: '#/components/schemas/User'
                  pagination:
                    $ref: '#/components/schemas/Pagination'
        '401':
          $ref: '#/components/responses/Unauthorized'

components:
  schemas:
    User:
      type: object
      required:
        - id
        - email
      properties:
        id:
          type: string
          format: uuid
        email:
          type: string
          format: email
        name:
          type: string
        createdAt:
          type: string
          format: date-time
```
|
||||
|
||||
### Example 3: Architecture Diagrams
|
||||
|
||||
**System Architecture (Mermaid)**
|
||||
```mermaid
graph TB
    subgraph "Frontend"
        UI[React UI]
        Mobile[Mobile App]
    end

    subgraph "API Gateway"
        Gateway[Kong/nginx]
        Auth[Auth Service]
    end

    subgraph "Microservices"
        UserService[User Service]
        OrderService[Order Service]
        PaymentService[Payment Service]
    end

    subgraph "Data Layer"
        PostgresMain[(PostgreSQL)]
        Redis[(Redis Cache)]
        S3[S3 Storage]
    end

    UI --> Gateway
    Mobile --> Gateway
    Gateway --> Auth
    Gateway --> UserService
    Gateway --> OrderService
    OrderService --> PaymentService
    UserService --> PostgresMain
    UserService --> Redis
    OrderService --> PostgresMain
```
|
||||
|
||||
**Component Documentation**
|
||||
````markdown
## User Service

**Purpose**: Manages user accounts, authentication, and profiles

**Technology Stack**:
- Language: Python 3.11
- Framework: FastAPI
- Database: PostgreSQL
- Cache: Redis
- Authentication: JWT

**API Endpoints**:
- `POST /users` - Create new user
- `GET /users/{id}` - Get user details
- `PUT /users/{id}` - Update user
- `POST /auth/login` - User login

**Configuration**:
```yaml
user_service:
  port: 8001
  database:
    host: postgres.internal
    name: users_db
  jwt:
    secret: ${JWT_SECRET}
    expiry: 3600
```
````
|
||||
|
||||
### Example 4: README Generation
|
||||
|
||||
**README Template**
|
||||
````markdown
# ${PROJECT_NAME}

${BADGES}

${SHORT_DESCRIPTION}

## Features

${FEATURES_LIST}

## Installation

### Prerequisites

- Python 3.8+
- PostgreSQL 12+
- Redis 6+

### Using pip

```bash
pip install ${PACKAGE_NAME}
```

### From source

```bash
git clone https://github.com/${GITHUB_ORG}/${REPO_NAME}.git
cd ${REPO_NAME}
pip install -e .
```

## Quick Start

```python
${QUICK_START_CODE}
```

## Configuration

### Environment Variables

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| DATABASE_URL | PostgreSQL connection string | - | Yes |
| REDIS_URL | Redis connection string | - | Yes |
| SECRET_KEY | Application secret key | - | Yes |

## Development

```bash
# Clone and setup
git clone https://github.com/${GITHUB_ORG}/${REPO_NAME}.git
cd ${REPO_NAME}
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements-dev.txt

# Run tests
pytest

# Start development server
python manage.py runserver
```

## Testing

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=your_package
```

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the ${LICENSE} License - see the LICENSE file for details.
````
|
||||
|
||||
### Example 5: Function Documentation Generator
|
||||
|
||||
```python
import inspect


def generate_function_docs(func):
    """Generate comprehensive documentation for a function"""
    sig = inspect.signature(func)
    params = []
    args_doc = []

    for param_name, param in sig.parameters.items():
        param_str = param_name
        if param.annotation is not param.empty:
            # formatannotation also handles generics such as List[int],
            # which have no __name__ attribute
            param_str += f": {inspect.formatannotation(param.annotation)}"
        if param.default is not param.empty:
            # repr keeps string defaults quoted in the rendered signature
            param_str += f" = {param.default!r}"
        params.append(param_str)
        args_doc.append(f"{param_name}: Description of {param_name}")

    return_type = ""
    if sig.return_annotation is not sig.empty:
        return_type = f" -> {inspect.formatannotation(sig.return_annotation)}"

    doc_template = f'''
def {func.__name__}({", ".join(params)}){return_type}:
    """
    Brief description of {func.__name__}

    Args:
{chr(10).join(f"        {arg}" for arg in args_doc)}

    Returns:
        Description of return value

    Examples:
        >>> {func.__name__}(example_input)
        expected_output
    """
'''
    return doc_template
```
|
||||
|
||||
### Example 6: User Guide Template
|
||||
|
||||
```markdown
# User Guide

## Getting Started

### Creating Your First ${FEATURE}

1. **Navigate to the Dashboard**

   Click on the ${FEATURE} tab in the main navigation menu.

2. **Click "Create New"**

   You'll find the "Create New" button in the top right corner.

3. **Fill in the Details**

   - **Name**: Enter a descriptive name
   - **Description**: Add optional details
   - **Settings**: Configure as needed

4. **Save Your Changes**

   Click "Save" to create your ${FEATURE}.

### Common Tasks

#### Editing ${FEATURE}

1. Find your ${FEATURE} in the list
2. Click the "Edit" button
3. Make your changes
4. Click "Save"

#### Deleting ${FEATURE}

> ⚠️ **Warning**: Deletion is permanent and cannot be undone.

1. Find your ${FEATURE} in the list
2. Click the "Delete" button
3. Confirm the deletion

### Troubleshooting

| Error | Meaning | Solution |
|-------|---------|----------|
| "Name required" | The name field is empty | Enter a name |
| "Permission denied" | You don't have access | Contact admin |
| "Server error" | Technical issue | Try again later |
```
|
||||
|
||||
### Example 7: Interactive API Playground
|
||||
|
||||
**Swagger UI Setup**
|
||||
```html
<!DOCTYPE html>
<html>
<head>
    <title>API Documentation</title>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/swagger-ui-dist@latest/swagger-ui.css">
</head>
<body>
    <div id="swagger-ui"></div>

    <script src="https://cdn.jsdelivr.net/npm/swagger-ui-dist@latest/swagger-ui-bundle.js"></script>
    <!-- StandaloneLayout is provided by the standalone preset, so load it as well -->
    <script src="https://cdn.jsdelivr.net/npm/swagger-ui-dist@latest/swagger-ui-standalone-preset.js"></script>
    <script>
        window.onload = function() {
            SwaggerUIBundle({
                url: "/api/openapi.json",
                dom_id: '#swagger-ui',
                deepLinking: true,
                presets: [SwaggerUIBundle.presets.apis, SwaggerUIStandalonePreset],
                layout: "StandaloneLayout"
            });
        }
    </script>
</body>
</html>
```
|
||||
|
||||
**Code Examples Generator**
|
||||
```python
def generate_code_examples(endpoint):
    """Generate code examples for API endpoints in multiple languages"""
    examples = {}

    # Python
    examples['python'] = f'''
import requests

url = "https://api.example.com{endpoint['path']}"
headers = {{"Authorization": "Bearer YOUR_API_KEY"}}

response = requests.{endpoint['method'].lower()}(url, headers=headers)
print(response.json())
'''

    # JavaScript
    examples['javascript'] = f'''
const response = await fetch('https://api.example.com{endpoint['path']}', {{
    method: '{endpoint['method']}',
    headers: {{'Authorization': 'Bearer YOUR_API_KEY'}}
}});

const data = await response.json();
console.log(data);
'''

    # cURL
    examples['curl'] = f'''
curl -X {endpoint['method']} https://api.example.com{endpoint['path']} \\
    -H "Authorization: Bearer YOUR_API_KEY"
'''

    return examples
```
|
||||
|
||||
### Example 8: Documentation CI/CD
|
||||
|
||||
**GitHub Actions Workflow**
|
||||
```yaml
name: Generate Documentation

on:
  push:
    branches: [main]
    paths:
      - 'src/**'
      - 'api/**'

jobs:
  generate-docs:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install -r requirements-docs.txt
          npm install -g @redocly/cli

      - name: Generate API documentation
        run: |
          python scripts/generate_openapi.py > docs/api/openapi.json
          redocly build-docs docs/api/openapi.json -o docs/api/index.html

      - name: Generate code documentation
        run: sphinx-build -b html docs/source docs/build

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./docs/build
```
|
||||
|
||||
### Example 9: Documentation Coverage Validation
|
||||
|
||||
```python
import ast
import glob


class DocCoverage:
    def check_coverage(self, codebase_path):
        """Check documentation coverage for codebase"""
        results = {
            'total_functions': 0,
            'documented_functions': 0,
            'total_classes': 0,
            'documented_classes': 0,
            'missing_docs': []
        }

        for file_path in glob.glob(f"{codebase_path}/**/*.py", recursive=True):
            # Use a context manager so each file handle is closed promptly
            with open(file_path) as f:
                module = ast.parse(f.read())

            for node in ast.walk(module):
                # Count async functions as well as regular ones
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    results['total_functions'] += 1
                    if ast.get_docstring(node):
                        results['documented_functions'] += 1
                    else:
                        results['missing_docs'].append({
                            'type': 'function',
                            'name': node.name,
                            'file': file_path,
                            'line': node.lineno
                        })

                elif isinstance(node, ast.ClassDef):
                    results['total_classes'] += 1
                    if ast.get_docstring(node):
                        results['documented_classes'] += 1
                    else:
                        results['missing_docs'].append({
                            'type': 'class',
                            'name': node.name,
                            'file': file_path,
                            'line': node.lineno
                        })

        # Calculate coverage percentages
        results['function_coverage'] = (
            results['documented_functions'] / results['total_functions'] * 100
            if results['total_functions'] > 0 else 100
        )
        results['class_coverage'] = (
            results['documented_classes'] / results['total_classes'] * 100
            if results['total_classes'] > 0 else 100
        )

        return results
```
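
A coverage report like the one above can be turned into a pass/fail gate for CI. The following is only a sketch under stated assumptions: `coverage_gate` and its 80% default threshold are hypothetical, and the input is assumed to be a dict shaped like the `results` returned by `check_coverage`.

```python
def coverage_gate(results: dict, threshold: float = 80.0) -> list:
    """Return human-readable failures; an empty list means the gate passes."""
    failures = []
    for key in ("function_coverage", "class_coverage"):
        value = results.get(key, 0.0)
        if value < threshold:
            failures.append(
                f"{key} is {value:.1f}%, below the {threshold:.1f}% threshold"
            )
    return failures


# Hypothetical report values, for illustration only
example = {"function_coverage": 92.5, "class_coverage": 60.0}
for failure in coverage_gate(example):
    print(failure)
```

Returning a list of messages rather than a boolean lets the calling script print every failing metric before exiting non-zero.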
|
||||
|
||||
## Output Format
|
||||
|
||||
1. **API Documentation**: OpenAPI spec with interactive playground
|
||||
2. **Architecture Diagrams**: System, sequence, and component diagrams
|
||||
3. **Code Documentation**: Inline docs, docstrings, and type hints
|
||||
4. **User Guides**: Step-by-step tutorials
|
||||
5. **Developer Guides**: Setup, contribution, and API usage guides
|
||||
6. **Reference Documentation**: Complete API reference with examples
|
||||
7. **Documentation Site**: Deployed static site with search functionality
|
||||
|
||||
Focus on creating documentation that is accurate, comprehensive, and easy to maintain alongside code changes.
|
||||
25
skills/documentation-templates/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
199
skills/documentation-templates/SKILL.md
Normal file
|
|
@ -0,0 +1,199 @@
|
|||
---
|
||||
name: documentation-templates
|
||||
description: "Documentation templates and structure guidelines. README, API docs, code comments, and AI-friendly documentation."
|
||||
risk: unknown
|
||||
source: community
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
# Documentation Templates
|
||||
|
||||
> Templates and structure guidelines for common documentation types.
|
||||
|
||||
---
|
||||
|
||||
## 1. README Structure
|
||||
|
||||
### Essential Sections (Priority Order)
|
||||
|
||||
| Section | Purpose |
|
||||
|---------|---------|
|
||||
| **Title + One-liner** | What is this? |
|
||||
| **Quick Start** | Running in <5 min |
|
||||
| **Features** | What can I do? |
|
||||
| **Configuration** | How to customize |
|
||||
| **API Reference** | Link to detailed docs |
|
||||
| **Contributing** | How to help |
|
||||
| **License** | Legal |
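
The section list above can be checked mechanically. This is a rough sketch, assuming each section appears as a `## ` heading with exactly the wording used in the table; `missing_readme_sections` and `EXPECTED_SECTIONS` are hypothetical names, and the title and API Reference rows are left out because they are not necessarily `## ` headings.

```python
import re

# Heading names assumed from the table; real projects may word them differently
EXPECTED_SECTIONS = ["Quick Start", "Features", "Configuration", "Contributing", "License"]


def missing_readme_sections(readme_text: str) -> list:
    """Return expected section names that have no matching `## ` heading."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^##\s+(.+)$", readme_text, flags=re.MULTILINE)
    }
    return [name for name in EXPECTED_SECTIONS if name.lower() not in headings]


sample = "# Project Name\n\n## Quick Start\n\n## Features\n\n## License\n"
print(missing_readme_sections(sample))
```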
|
||||
|
||||
### README Template
|
||||
|
||||
```markdown
|
||||
# Project Name
|
||||
|
||||
Brief one-line description.
|
||||
|
||||
## Quick Start
|
||||
|
||||
[Minimum steps to run]
|
||||
|
||||
## Features
|
||||
|
||||
- Feature 1
|
||||
- Feature 2
|
||||
|
||||
## Configuration
|
||||
|
||||
| Variable | Description | Default |
|
||||
|----------|-------------|---------|
|
||||
| PORT | Server port | 3000 |
|
||||
|
||||
## Documentation
|
||||
|
||||
- API Reference
|
||||
- Architecture
|
||||
|
||||
## License
|
||||
|
||||
MIT
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2. API Documentation Structure
|
||||
|
||||
### Per-Endpoint Template
|
||||
|
||||
```markdown
|
||||
## GET /users/:id
|
||||
|
||||
Get a user by ID.
|
||||
|
||||
**Parameters:**
|
||||
| Name | Type | Required | Description |
|
||||
|------|------|----------|-------------|
|
||||
| id | string | Yes | User ID |
|
||||
|
||||
**Response:**
|
||||
- 200: User object
|
||||
- 404: User not found
|
||||
|
||||
**Example:**
|
||||
[Request and response example]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Code Comment Guidelines
|
||||
|
||||
### JSDoc/TSDoc Template
|
||||
|
||||
```typescript
|
||||
/**
|
||||
* Brief description of what the function does.
|
||||
*
|
||||
* @param paramName - Description of parameter
|
||||
* @returns Description of return value
|
||||
* @throws ErrorType - When this error occurs
|
||||
*
|
||||
* @example
|
||||
* const result = functionName(input);
|
||||
*/
|
||||
```
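As an illustration, the template might be filled in for a small function like the one below. `parsePort` and its default of 3000 are hypothetical names chosen for this sketch (the default echoes the `PORT` row in the README template); they are not part of the skill.

```typescript
/**
 * Parses a TCP port from text, falling back to a default when the input is absent.
 *
 * @param raw - Port as text, for example the value of a `PORT` variable
 * @param fallback - Port returned when `raw` is undefined or blank
 * @returns The port as an integer between 1 and 65535
 * @throws RangeError - When `raw` is not an integer in the valid port range
 *
 * @example
 * const port = parsePort("8080");
 */
function parsePort(raw: string | undefined, fallback: number = 3000): number {
  if (raw === undefined || raw.trim() === "") {
    return fallback;
  }
  const port = Number(raw);
  // Math.floor(NaN) !== NaN, so non-numeric input is rejected here as well.
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new RangeError("Invalid port: " + raw);
  }
  return port;
}
```

Each tag documents something a caller cannot see from the signature alone: the fallback rule, the valid range, and the error type.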
|
||||
|
||||
### When to Comment
|
||||
|
||||
| ✅ Comment | ❌ Don't Comment |
|-----------|-----------------|
| Why (business logic) | What (obvious) |
| Complex algorithms | Every line |
| Non-obvious behavior | Self-explanatory code |
| API contracts | Implementation details |
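
The contrast between the two columns can be sketched in a few lines. `backoffDelayMs`, the 30-second cap, and the gateway behaviour cited as its reason are all invented for this illustration.

```typescript
// Hypothetical retry helper; the cap and its rationale are assumptions made up for the example.
function backoffDelayMs(attempt: number): number {
  // A "what" comment here would read "double the delay on each attempt",
  // which only repeats the code. The "why" comment below records the constraint:
  // the (assumed) upstream gateway drops idle clients after 30 s, so waiting
  // longer than that between retries cannot succeed.
  const maxDelayMs = 30000;
  const baseDelayMs = 500;
  return Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs);
}
```

The arithmetic is self-explanatory; only the reason for the cap needs a comment.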
|
||||
|
||||
---
|
||||
|
||||
## 4. Changelog Template (Keep a Changelog)
|
||||
|
||||
```markdown
|
||||
# Changelog
|
||||
|
||||
## [Unreleased]
|
||||
### Added
|
||||
- New feature
|
||||
|
||||
## [1.0.0] - 2025-01-01
|
||||
### Added
|
||||
- Initial release
|
||||
### Changed
|
||||
- Updated dependency
|
||||
### Fixed
|
||||
- Bug fix
|
||||
```
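A release section in this format could be generated from structured data, roughly as sketched below. The `Release` type and `renderRelease` function are hypothetical, and the sketch covers only the three change kinds shown in the template (Keep a Changelog also defines Deprecated, Removed, and Security).

```typescript
type ChangeKind = "Added" | "Changed" | "Fixed";

interface Release {
  version: string; // e.g. "1.0.0" or "Unreleased"
  date?: string;   // ISO date; omitted for the Unreleased section
  changes: Partial<Record<ChangeKind, string[]>>;
}

// Renders one release as a Markdown section, skipping empty change kinds.
function renderRelease(release: Release): string {
  const heading = release.date
    ? "## [" + release.version + "] - " + release.date
    : "## [" + release.version + "]";
  const lines: string[] = [heading];
  const order: ChangeKind[] = ["Added", "Changed", "Fixed"];
  for (const kind of order) {
    const items = release.changes[kind];
    if (!items || items.length === 0) continue;
    lines.push("### " + kind);
    for (const item of items) {
      lines.push("- " + item);
    }
  }
  return lines.join("\n");
}
```

Fixing the order of change kinds in one place keeps every release section consistent, whatever order the entries were recorded in.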
|
||||
|
||||
---
|
||||
|
||||
## 5. Architecture Decision Record (ADR)
|
||||
|
||||
```markdown
|
||||
# ADR-001: [Title]
|
||||
|
||||
## Status
|
||||
Accepted / Deprecated / Superseded
|
||||
|
||||
## Context
|
||||
Why are we making this decision?
|
||||
|
||||
## Decision
|
||||
What did we decide?
|
||||
|
||||
## Consequences
|
||||
What are the trade-offs?
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. AI-Friendly Documentation (2025)
|
||||
|
||||
### llms.txt Template
|
||||
|
||||
For AI crawlers and agents:
|
||||
|
||||
```markdown
|
||||
# Project Name
|
||||
> One-line objective.
|
||||
|
||||
## Core Files
|
||||
- [src/index.ts]: Main entry
|
||||
- [src/api/]: API routes
|
||||
- [docs/]: Documentation
|
||||
|
||||
## Key Concepts
|
||||
- Concept 1: Brief explanation
|
||||
- Concept 2: Brief explanation
|
||||
```
|
||||
|
||||
### MCP-Ready Documentation
|
||||
|
||||
For RAG indexing:
|
||||
- Clear H1-H3 hierarchy
|
||||
- JSON/YAML examples for data structures
|
||||
- Mermaid diagrams for flows
|
||||
- Self-contained sections
|
||||
|
||||
---
|
||||
|
||||
## 7. Structure Principles
|
||||
|
||||
| Principle | Why |
|-----------|-----|
| **Scannable** | Headers, lists, tables |
| **Examples first** | Show, don't just tell |
| **Progressive detail** | Simple → Complex |
| **Up to date** | Outdated = misleading |
|
||||
|
||||
---
|
||||
|
||||
> **Remember:** Templates are starting points. Adapt to your project's needs.
|
||||
|
||||
## When to Use
|
||||
Use this skill when you need a template or structure guideline for a README, API reference, code comment, changelog, ADR, or AI-friendly document.
|
||||
25
skills/documentation/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
260
skills/documentation/SKILL.md
Normal file
|
|
@ -0,0 +1,260 @@
|
|||
---
|
||||
name: documentation
|
||||
description: "Documentation generation workflow covering API docs, architecture docs, README files, code comments, and technical writing."
|
||||
category: workflow-bundle
|
||||
risk: safe
|
||||
source: personal
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
# Documentation Workflow Bundle
|
||||
|
||||
## Overview
|
||||
|
||||
Comprehensive documentation workflow for generating API documentation, architecture documentation, README files, code comments, and technical content from codebases.
|
||||
|
||||
## When to Use This Workflow
|
||||
|
||||
Use this workflow when:
|
||||
- Creating project documentation
|
||||
- Generating API documentation
|
||||
- Writing architecture docs
|
||||
- Documenting code
|
||||
- Creating user guides
|
||||
- Maintaining wikis
|
||||
|
||||
## Workflow Phases
|
||||
|
||||
### Phase 1: Documentation Planning
|
||||
|
||||
#### Skills to Invoke
|
||||
- `docs-architect` - Documentation architecture
|
||||
- `documentation-templates` - Documentation templates
|
||||
|
||||
#### Actions
|
||||
1. Identify documentation needs
|
||||
2. Choose documentation tools
|
||||
3. Plan documentation structure
|
||||
4. Define style guidelines
|
||||
5. Set up documentation site
|
||||
|
||||
#### Copy-Paste Prompts
|
||||
```
|
||||
Use @docs-architect to plan documentation structure
|
||||
```
|
||||
|
||||
```
|
||||
Use @documentation-templates to set up documentation
|
||||
```
|
||||
|
||||
### Phase 2: API Documentation
|
||||
|
||||
#### Skills to Invoke
|
||||
- `api-documenter` - API documentation
|
||||
- `api-documentation-generator` - Auto-generation
|
||||
- `openapi-spec-generation` - OpenAPI specs
|
||||
|
||||
#### Actions
|
||||
1. Extract API endpoints
|
||||
2. Generate OpenAPI specs
|
||||
3. Create API reference
|
||||
4. Add usage examples
|
||||
5. Set up auto-generation
|
||||
|
||||
#### Copy-Paste Prompts
|
||||
```
|
||||
Use @api-documenter to generate API documentation
|
||||
```
|
||||
|
||||
```
|
||||
Use @openapi-spec-generation to create OpenAPI specs
|
||||
```
|
||||
|
||||
### Phase 3: Architecture Documentation
|
||||
|
||||
#### Skills to Invoke
|
||||
- `c4-architecture-c4-architecture` - C4 architecture
|
||||
- `c4-context` - Context diagrams
|
||||
- `c4-container` - Container diagrams
|
||||
- `c4-component` - Component diagrams
|
||||
- `c4-code` - Code diagrams
|
||||
- `mermaid-expert` - Mermaid diagrams
|
||||
|
||||
#### Actions
|
||||
1. Create C4 diagrams
|
||||
2. Document architecture
|
||||
3. Generate sequence diagrams
|
||||
4. Document data flows
|
||||
5. Create deployment docs
|
||||
|
||||
#### Copy-Paste Prompts
|
||||
```
|
||||
Use @c4-architecture-c4-architecture to create C4 diagrams
|
||||
```
|
||||
|
||||
```
|
||||
Use @mermaid-expert to create architecture diagrams
|
||||
```
|
||||
|
||||
### Phase 4: Code Documentation
|
||||
|
||||
#### Skills to Invoke
|
||||
- `code-documentation-code-explain` - Code explanation
|
||||
- `code-documentation-doc-generate` - Doc generation
|
||||
- `documentation-generation-doc-generate` - Auto-generation
|
||||
|
||||
#### Actions
|
||||
1. Extract code comments
|
||||
2. Generate JSDoc/TSDoc
|
||||
3. Create type documentation
|
||||
4. Document functions
|
||||
5. Add usage examples
|
||||
|
||||
#### Copy-Paste Prompts
|
||||
```
|
||||
Use @code-documentation-code-explain to explain code
|
||||
```
|
||||
|
||||
```
|
||||
Use @code-documentation-doc-generate to generate docs
|
||||
```
|
||||
|
||||
### Phase 5: README and Getting Started
|
||||
|
||||
#### Skills to Invoke
|
||||
- `readme` - README generation
|
||||
- `environment-setup-guide` - Setup guides
|
||||
- `tutorial-engineer` - Tutorial creation
|
||||
|
||||
#### Actions
|
||||
1. Create README
|
||||
2. Write getting started guide
|
||||
3. Document installation
|
||||
4. Add usage examples
|
||||
5. Create troubleshooting guide
|
||||
|
||||
#### Copy-Paste Prompts
|
||||
```
|
||||
Use @readme to create project README
|
||||
```
|
||||
|
||||
```
|
||||
Use @tutorial-engineer to create tutorials
|
||||
```
|
||||
|
||||
### Phase 6: Wiki and Knowledge Base
|
||||
|
||||
#### Skills to Invoke
|
||||
- `wiki-architect` - Wiki architecture
|
||||
- `wiki-page-writer` - Wiki pages
|
||||
- `wiki-onboarding` - Onboarding docs
|
||||
- `wiki-qa` - Wiki Q&A
|
||||
- `wiki-researcher` - Wiki research
|
||||
- `wiki-vitepress` - VitePress wiki
|
||||
|
||||
#### Actions
|
||||
1. Design wiki structure
|
||||
2. Create wiki pages
|
||||
3. Write onboarding guides
|
||||
4. Document processes
|
||||
5. Set up wiki site
|
||||
|
||||
#### Copy-Paste Prompts
|
||||
```
|
||||
Use @wiki-architect to design wiki structure
|
||||
```
|
||||
|
||||
```
|
||||
Use @wiki-page-writer to create wiki pages
|
||||
```
|
||||
|
||||
```
|
||||
Use @wiki-onboarding to create onboarding docs
|
||||
```
|
||||
|
||||
### Phase 7: Changelog and Release Notes
|
||||
|
||||
#### Skills to Invoke
|
||||
- `changelog-automation` - Changelog generation
|
||||
- `wiki-changelog` - Changelog from git
|
||||
|
||||
#### Actions
|
||||
1. Extract commit history
|
||||
2. Categorize changes
|
||||
3. Generate changelog
|
||||
4. Create release notes
|
||||
5. Publish updates
|
||||
|
||||
#### Copy-Paste Prompts
|
||||
```
|
||||
Use @changelog-automation to generate changelog
|
||||
```
|
||||
|
||||
```
|
||||
Use @wiki-changelog to create release notes
|
||||
```
|
||||
|
||||
### Phase 8: Documentation Maintenance
|
||||
|
||||
#### Skills to Invoke
|
||||
- `doc-coauthoring` - Collaborative writing
|
||||
- `reference-builder` - Reference docs
|
||||
|
||||
#### Actions
|
||||
1. Review documentation
|
||||
2. Update outdated content
|
||||
3. Fix broken links
|
||||
4. Add new features
|
||||
5. Gather feedback
|
||||
|
||||
#### Copy-Paste Prompts
|
||||
```
|
||||
Use @doc-coauthoring to collaborate on docs
|
||||
```
|
||||
|
||||
## Documentation Types
|
||||
|
||||
### Code-Level
|
||||
- JSDoc/TSDoc comments
|
||||
- Function documentation
|
||||
- Type definitions
|
||||
- Example code
|
||||
|
||||
### API Documentation
|
||||
- Endpoint reference
|
||||
- Request/response schemas
|
||||
- Authentication guides
|
||||
- SDK documentation
|
||||
|
||||
### Architecture Documentation
|
||||
- System overview
|
||||
- Component diagrams
|
||||
- Data flow diagrams
|
||||
- Deployment architecture
|
||||
|
||||
### User Documentation
|
||||
- Getting started guides
|
||||
- User manuals
|
||||
- Tutorials
|
||||
- FAQs
|
||||
|
||||
### Process Documentation
|
||||
- Runbooks
|
||||
- Onboarding guides
|
||||
- SOPs
|
||||
- Decision records
|
||||
|
||||
## Quality Gates
|
||||
|
||||
- [ ] All APIs documented
|
||||
- [ ] Architecture diagrams current
|
||||
- [ ] README up to date
|
||||
- [ ] Code comments helpful
|
||||
- [ ] Examples working
|
||||
- [ ] Links valid
|
||||
|
||||
## Related Workflow Bundles
|
||||
|
||||
- `development` - Development workflow
|
||||
- `testing-qa` - Documentation testing
|
||||
- `ai-ml` - AI documentation
|
||||
25
skills/fix-review/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
54
skills/fix-review/SKILL.md
Normal file
|
|
@ -0,0 +1,54 @@
|
|||
---
|
||||
name: fix-review
|
||||
description: "Verify fix commits address audit findings without new bugs"
|
||||
risk: safe
|
||||
source: "https://github.com/trailofbits/skills/tree/main/plugins/fix-review"
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
# Fix Review
|
||||
|
||||
## Overview
|
||||
|
||||
Verify that fix commits properly address audit findings without introducing new bugs or security vulnerabilities.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use this skill when you need to verify that fix commits address audit findings without introducing new bugs.
|
||||
|
||||
Use this skill when:
|
||||
- Reviewing commits that address security audit findings
|
||||
- Verifying that fixes don't introduce new vulnerabilities
|
||||
- Ensuring code changes properly resolve identified issues
|
||||
- Validating that remediation efforts are complete and correct
|
||||
|
||||
## Instructions
|
||||
|
||||
This skill helps verify that fix commits properly address audit findings:
|
||||
|
||||
1. **Review Fix Commits**: Analyze commits that claim to fix audit findings
|
||||
2. **Verify Resolution**: Ensure the original issue is properly addressed
|
||||
3. **Check for Regressions**: Verify no new bugs or vulnerabilities are introduced
|
||||
4. **Validate Completeness**: Ensure all aspects of the finding are resolved
|
||||
|
||||
## Review Process
|
||||
|
||||
When reviewing fix commits:
|
||||
|
||||
1. Compare the fix against the original audit finding
|
||||
2. Verify the fix addresses the root cause, not just symptoms
|
||||
3. Check for potential side effects or new issues
|
||||
4. Validate that tests cover the fixed scenario
|
||||
5. Ensure no similar vulnerabilities exist elsewhere
|
||||
|
||||
## Best Practices
|
||||
|
||||
- Review fixes in context of the full codebase
|
||||
- Verify test coverage for the fixed issue
|
||||
- Check for similar patterns that might need fixing
|
||||
- Ensure fixes follow security best practices
|
||||
- Document the resolution approach
|
||||
|
||||
## Resources
|
||||
|
||||
For more information, see the [source repository](https://github.com/trailofbits/skills/tree/main/plugins/fix-review).
|
||||
25
skills/git-pushing/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
36
skills/git-pushing/SKILL.md
Normal file
|
|
@ -0,0 +1,36 @@
|
|||
---
|
||||
name: git-pushing
|
||||
description: "Stage, commit, and push git changes with conventional commit messages. Use when user wants to commit and push changes, mentions pushing to remote, or asks to save and push their work. Also activate..."
|
||||
risk: unknown
|
||||
source: community
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
# Git Push Workflow
|
||||
|
||||
Stage all changes, create a conventional commit, and push to the remote branch.
|
||||
|
||||
## When to Use
|
||||
|
||||
Automatically activate when the user:
|
||||
|
||||
- Explicitly asks to push changes ("push this", "commit and push")
|
||||
- Mentions saving work to remote ("save to github", "push to remote")
|
||||
- Completes a feature and wants to share it
|
||||
- Says phrases like "let's push this up" or "commit these changes"
|
||||
|
||||
## Workflow
|
||||
|
||||
**ALWAYS use the script** - do NOT use manual git commands:
|
||||
|
||||
```bash
|
||||
bash skills/git-pushing/scripts/smart_commit.sh
|
||||
```
|
||||
|
||||
With custom message:
|
||||
|
||||
```bash
|
||||
bash skills/git-pushing/scripts/smart_commit.sh "feat: add feature"
|
||||
```
|
||||
|
||||
The script stages all changes, commits with the given message (default: `chore: update code`), and pushes with the `-u` flag to set the upstream branch.
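
Because the script passes the message through unchanged, a caller may want to check its shape first. The sketch below is one way to do that; `isConventionalCommit` and the list of types are assumptions (the list follows the commonly used Angular-style set, whereas the Conventional Commits specification itself only defines `feat` and `fix` and permits others).

```typescript
// Assumed set of allowed types; adjust to the project's own convention.
const CONVENTIONAL_TYPES = [
  "feat", "fix", "docs", "style", "refactor",
  "perf", "test", "build", "ci", "chore", "revert",
];

// Checks the subject line shape: type, optional (scope), optional "!", then ": description".
function isConventionalCommit(subject: string): boolean {
  const pattern = new RegExp(
    "^(" + CONVENTIONAL_TYPES.join("|") + ")(\\([^()\\s]+\\))?!?: \\S.*$"
  );
  return pattern.test(subject);
}
```

Under these assumptions, both `feat: add feature` and the script's default `chore: update code` pass, while a bare `update code` does not.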
|
||||
19
skills/git-pushing/scripts/smart_commit.sh
Normal file
|
|
@ -0,0 +1,19 @@
|
|||
#!/bin/bash
|
||||
set -e
|
||||
|
||||
# Default commit message if none provided
|
||||
MESSAGE="${1:-chore: update code}"
|
||||
|
||||
# Add all changes
|
||||
git add .
|
||||
|
||||
# Commit with the provided message
|
||||
git commit -m "$MESSAGE"
|
||||
|
||||
# Get current branch name
|
||||
BRANCH=$(git rev-parse --abbrev-ref HEAD)
|
||||
|
||||
# Push to remote, setting upstream if needed
|
||||
git push -u origin "$BRANCH"
|
||||
|
||||
echo "✅ Successfully pushed to $BRANCH"
|
||||
25
skills/helm-chart-scaffolding/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
37
skills/helm-chart-scaffolding/SKILL.md
Normal file
|
|
@ -0,0 +1,37 @@
|
|||
---
|
||||
name: helm-chart-scaffolding
|
||||
description: "Design, organize, and manage Helm charts for templating and packaging Kubernetes applications with reusable configurations. Use when creating Helm charts, packaging Kubernetes applications, or impl..."
|
||||
risk: unknown
|
||||
source: community
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
# Helm Chart Scaffolding
|
||||
|
||||
Comprehensive guidance for creating, organizing, and managing Helm charts for packaging and deploying Kubernetes applications.
|
||||
|
||||
## Use this skill when
|
||||
|
||||
Use this skill when you need to:
|
||||
- Create new Helm charts from scratch
|
||||
- Package Kubernetes applications for distribution
|
||||
- Manage multi-environment deployments with Helm
|
||||
- Implement templating for reusable Kubernetes manifests
|
||||
- Set up Helm chart repositories
|
||||
- Follow Helm best practices and conventions
|
||||
|
||||
## Do not use this skill when
|
||||
|
||||
- The task is unrelated to Helm chart scaffolding
|
||||
- You need a different domain or tool outside this scope
|
||||
|
||||
## Instructions
|
||||
|
||||
- Clarify goals, constraints, and required inputs.
|
||||
- Apply relevant best practices and validate outcomes.
|
||||
- Provide actionable steps and verification.
|
||||
- If detailed examples are required, open `resources/implementation-playbook.md`.
|
||||
|
||||
## Resources
|
||||
|
||||
- `resources/implementation-playbook.md` for detailed patterns and examples.
|
||||
42
skills/helm-chart-scaffolding/assets/Chart.yaml.template
Normal file
|
|
@ -0,0 +1,42 @@
|
|||
apiVersion: v2
name: <chart-name>
description: <Chart description>
type: application
version: 0.1.0
appVersion: "1.0.0"

keywords:
  - <keyword1>
  - <keyword2>

home: https://github.com/<org>/<repo>

sources:
  - https://github.com/<org>/<repo>

maintainers:
  - name: <Maintainer Name>
    email: <maintainer@example.com>
    url: https://github.com/<username>

icon: https://example.com/icon.png

kubeVersion: ">=1.24.0"

dependencies:
  - name: postgresql
    version: "12.0.0"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled
    tags:
      - database
  - name: redis
    version: "17.0.0"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled
    tags:
      - cache

annotations:
  category: Application
  licenses: Apache-2.0
|
||||
185
skills/helm-chart-scaffolding/assets/values.yaml.template
Normal file
|
|
@ -0,0 +1,185 @@
|
|||
# Global values shared with subcharts
global:
  imageRegistry: docker.io
  imagePullSecrets: []
  storageClass: ""

# Image configuration
image:
  registry: docker.io
  repository: myapp/web
  tag: "" # Defaults to .Chart.AppVersion
  pullPolicy: IfNotPresent

# Override chart name
nameOverride: ""
fullnameOverride: ""

# Number of replicas
replicaCount: 3
revisionHistoryLimit: 10

# ServiceAccount
serviceAccount:
  create: true
  annotations: {}
  name: ""

# Pod annotations
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "9090"
  prometheus.io/path: "/metrics"

# Pod security context
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault

# Container security context
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL

# Service configuration
service:
  type: ClusterIP
  port: 80
  targetPort: http
  annotations: {}
  sessionAffinity: None

# Ingress configuration
ingress:
  enabled: false
  className: nginx
  annotations: {}
  hosts:
    - host: app.example.com
      paths:
        - path: /
          pathType: Prefix
  tls: []

# Resources
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

# Liveness probe
livenessProbe:
  httpGet:
    path: /health/live
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10

# Readiness probe
readinessProbe:
  httpGet:
    path: /health/ready
    port: http
  initialDelaySeconds: 5
  periodSeconds: 5

# Autoscaling
autoscaling:
  enabled: false
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 80

# Pod Disruption Budget
podDisruptionBudget:
  enabled: true
  minAvailable: 1

# Node selection
nodeSelector: {}
tolerations: []
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - '{{ include "my-app.name" . }}'
          topologyKey: kubernetes.io/hostname

# Environment variables
env: []
  # - name: LOG_LEVEL
  #   value: "info"

# ConfigMap data
configMap:
  enabled: true
  data: {}
    # APP_MODE: production
    # DATABASE_HOST: postgres.example.com

# Secrets (use external secret management in production)
secrets:
  enabled: false
  data: {}

# Persistent Volume
persistence:
  enabled: false
  storageClass: ""
  accessMode: ReadWriteOnce
  size: 10Gi
  annotations: {}

# PostgreSQL dependency
postgresql:
  enabled: false
  auth:
    database: myapp
    username: myapp
    password: changeme
  primary:
    persistence:
      enabled: true
      size: 10Gi

# Redis dependency
redis:
  enabled: false
  auth:
    enabled: false
  master:
    persistence:
      enabled: false

# ServiceMonitor for Prometheus Operator
serviceMonitor:
  enabled: false
  interval: 30s
  scrapeTimeout: 10s
  labels: {}

# Network Policy
networkPolicy:
  enabled: false
  policyTypes:
    - Ingress
    - Egress
  ingress: []
  egress: []
|
||||
25
skills/helm-chart-scaffolding/references/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
500
skills/helm-chart-scaffolding/references/chart-structure.md
Normal file
|
|
@ -0,0 +1,500 @@
|
|||
# Helm Chart Structure Reference
|
||||
|
||||
Complete guide to Helm chart organization, file conventions, and best practices.
|
||||
|
||||
## Standard Chart Directory Structure
|
||||
|
||||
```
my-app/
├── Chart.yaml              # Chart metadata (required)
├── Chart.lock              # Dependency lock file (generated)
├── values.yaml             # Default configuration values (required)
├── values.schema.json      # JSON schema for values validation
├── .helmignore             # Patterns to ignore when packaging
├── README.md               # Chart documentation
├── LICENSE                 # Chart license
├── charts/                 # Chart dependencies (bundled)
│   └── postgresql-12.0.0.tgz
├── crds/                   # Custom Resource Definitions
│   └── my-crd.yaml
├── templates/              # Kubernetes manifest templates (required)
│   ├── NOTES.txt           # Post-install instructions
│   ├── _helpers.tpl        # Template helper functions
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── configmap.yaml
│   ├── secret.yaml
│   ├── serviceaccount.yaml
│   ├── hpa.yaml
│   ├── pdb.yaml
│   ├── networkpolicy.yaml
│   └── tests/
│       └── test-connection.yaml
└── files/                  # Additional files to include
    └── config/
        └── app.conf
```
|
||||
|
||||
## Chart.yaml Specification
|
||||
|
||||
### API Version v2 (Helm 3+)
|
||||
|
||||
```yaml
apiVersion: v2                    # Required: API version
name: my-application              # Required: Chart name
version: 1.2.3                    # Required: Chart version (SemVer)
appVersion: "2.5.0"               # Application version
description: A Helm chart for my application  # Optional: one-sentence description
type: application                 # Chart type: application or library
keywords:                         # Search keywords
  - web
  - api
  - backend
home: https://example.com         # Project home page
sources:                          # Source code URLs
  - https://github.com/example/my-app
maintainers:                      # Maintainer list
  - name: John Doe
    email: john@example.com
    url: https://github.com/johndoe
icon: https://example.com/icon.png  # Chart icon URL
kubeVersion: ">=1.24.0"           # Compatible Kubernetes versions
deprecated: false                 # Mark chart as deprecated
annotations:                      # Arbitrary annotations
  example.com/release-notes: https://example.com/releases/v1.2.3
dependencies:                     # Chart dependencies
  - name: postgresql
    version: "12.0.0"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled
    tags:
      - database
    import-values:
      - child: database
        parent: database
    alias: db
```
|
||||
|
||||
## Chart Types
|
||||
|
||||
### Application Chart
|
||||
```yaml
|
||||
type: application
|
||||
```
|
||||
- Standard Kubernetes applications
|
||||
- Can be installed and managed
|
||||
- Contains templates for K8s resources
|
||||
|
||||
### Library Chart
|
||||
```yaml
|
||||
type: library
|
||||
```
|
||||
- Shared template helpers
|
||||
- Cannot be installed directly
|
||||
- Used as dependency by other charts
|
||||
- Contains only named template definitions (for example in `_helpers.tpl`); renders no Kubernetes resources of its own
|
||||
|
||||
## Values Files Organization
|
||||
|
||||
### values.yaml (defaults)
|
||||
```yaml
# Global values (shared with subcharts)
global:
  imageRegistry: docker.io
  imagePullSecrets: []

# Image configuration
image:
  registry: docker.io
  repository: myapp/web
  tag: "" # Defaults to .Chart.AppVersion
  pullPolicy: IfNotPresent

# Deployment settings
replicaCount: 1
revisionHistoryLimit: 10

# Pod configuration
podAnnotations: {}
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

# Container security
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL

# Service
service:
  type: ClusterIP
  port: 80
  targetPort: http
  annotations: {}

# Resources
resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 128Mi

# Autoscaling
autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 100
  targetCPUUtilizationPercentage: 80

# Node selection
nodeSelector: {}
tolerations: []
affinity: {}

# Monitoring
serviceMonitor:
  enabled: false
  interval: 30s
```
|
||||
|
||||
### values.schema.json (validation)
|
||||
```json
{
  "$schema": "https://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "replicaCount": {
      "type": "integer",
      "minimum": 1
    },
    "image": {
      "type": "object",
      "required": ["repository"],
      "properties": {
        "repository": {
          "type": "string"
        },
        "tag": {
          "type": "string"
        },
        "pullPolicy": {
          "type": "string",
          "enum": ["Always", "IfNotPresent", "Never"]
        }
      }
    }
  },
  "required": ["image"]
}
```
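
To make the constraints in this schema concrete, the sketch below checks the same rules by hand. It is not a JSON Schema implementation and not how Helm validates values; `validateValues`, `ChartValues`, and the error messages are hypothetical names used only for illustration.

```typescript
const PULL_POLICIES = ["Always", "IfNotPresent", "Never"];

interface ImageValues { repository?: unknown; tag?: unknown; pullPolicy?: unknown }
interface ChartValues { replicaCount?: unknown; image?: ImageValues }

// Returns a list of human-readable violations; an empty list means the values pass.
function validateValues(values: ChartValues): string[] {
  const errors: string[] = [];

  // replicaCount: optional, but must be an integer >= 1 when present.
  const replicas = values.replicaCount;
  if (replicas !== undefined) {
    if (typeof replicas !== "number" || !isFinite(replicas) ||
        Math.floor(replicas) !== replicas || replicas < 1) {
      errors.push("replicaCount must be an integer >= 1");
    }
  }

  // image: required object with a required string repository.
  const image = values.image;
  if (image === undefined) {
    errors.push("image is required");
    return errors;
  }
  if (typeof image.repository !== "string") {
    errors.push("image.repository is required and must be a string");
  }
  if (image.tag !== undefined && typeof image.tag !== "string") {
    errors.push("image.tag must be a string");
  }
  const policy = image.pullPolicy;
  if (policy !== undefined &&
      (typeof policy !== "string" || PULL_POLICIES.indexOf(policy) === -1)) {
    errors.push("image.pullPolicy must be one of " + PULL_POLICIES.join(", "));
  }
  return errors;
}
```

Here a values object with `replicaCount: 0` and `pullPolicy: "Sometimes"` yields two violations, while `{ image: { repository: "myapp/web" } }` yields none.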
|
||||
|
||||
## Template Files
|
||||
|
||||
### Template Naming Conventions
|
||||
|
||||
- **Lowercase with hyphens**: `deployment.yaml`, `service-account.yaml`
|
||||
- **Partial templates**: Prefix with underscore `_helpers.tpl`
|
||||
- **Tests**: Place in `templates/tests/`
|
||||
- **CRDs**: Place in `crds/` (not templated)
|
||||
|
||||
### Common Templates
|
||||
|
||||
#### _helpers.tpl
|
||||
```yaml
|
||||
{{/*
|
||||
Standard naming helpers
|
||||
*/}}
|
||||
{{- define "my-app.name" -}}
|
||||
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}}
|
||||
{{- end -}}
|
||||
|
||||
{{- define "my-app.fullname" -}}
|
||||
{{- if .Values.fullnameOverride -}}
|
||||
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}}
|
||||
{{- else -}}
|
||||
{{- $name := default .Chart.Name .Values.nameOverride -}}
|
||||
{{- if contains $name .Release.Name -}}
|
||||
{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
|
||||
{{- else -}}
|
||||
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
|
||||
{{- end -}}
|
||||
{{- end -}}
|
||||
{{- end -}}
|
||||
|
||||
{{- define "my-app.chart" -}}
|
||||
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}}
|
||||
{{- end -}}
|
||||
|
||||
{{/*
|
||||
Common labels
|
||||
*/}}
|
||||
{{- define "my-app.labels" -}}
|
||||
helm.sh/chart: {{ include "my-app.chart" . }}
|
||||
{{ include "my-app.selectorLabels" . }}
|
||||
{{- if .Chart.AppVersion }}
|
||||
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
|
||||
{{- end }}
|
||||
app.kubernetes.io/managed-by: {{ .Release.Service }}
|
||||
{{- end -}}
|
||||
|
||||
{{- define "my-app.selectorLabels" -}}
|
||||
app.kubernetes.io/name: {{ include "my-app.name" . }}
|
||||
app.kubernetes.io/instance: {{ .Release.Name }}
|
||||
{{- end -}}
|
||||
|
||||
{{/*
|
||||
Image name helper
|
||||
*/}}
|
||||
{{- define "my-app.image" -}}
|
||||
{{- $registry := .Values.global.imageRegistry | default .Values.image.registry -}}
|
||||
{{- $repository := .Values.image.repository -}}
|
||||
{{- $tag := .Values.image.tag | default .Chart.AppVersion -}}
|
||||
{{- printf "%s/%s:%s" $registry $repository $tag -}}
|
||||
{{- end -}}
|
||||
```
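
The branching in `my-app.fullname` and `my-app.image` is easier to follow outside template syntax. The sketch below mirrors that logic; the function and field names are hypothetical, and it is an approximation for reading purposes rather than a replacement for the templates.

```typescript
// Mirrors `trunc 63 | trimSuffix "-"`: cut to 63 characters, then drop one trailing hyphen.
function truncateName(value: string): string {
  const cut = value.slice(0, 63);
  return cut.charAt(cut.length - 1) === "-" ? cut.slice(0, -1) : cut;
}

interface NameInputs {
  chartName: string;
  releaseName: string;
  nameOverride?: string;
  fullnameOverride?: string;
}

// Mirrors "my-app.fullname": override wins, then release name alone if it already
// contains the chart name, otherwise "<release>-<name>".
function fullname(input: NameInputs): string {
  if (input.fullnameOverride) {
    return truncateName(input.fullnameOverride);
  }
  const name = input.nameOverride || input.chartName;
  if (input.releaseName.indexOf(name) !== -1) {
    return truncateName(input.releaseName);
  }
  return truncateName(input.releaseName + "-" + name);
}

// Mirrors "my-app.image": the global registry wins over image.registry when set,
// and an empty tag falls back to the chart's appVersion.
function imageRef(
  globalRegistry: string,
  imageRegistry: string,
  repository: string,
  tag: string,
  appVersion: string
): string {
  const registry = globalRegistry || imageRegistry;
  return registry + "/" + repository + ":" + (tag || appVersion);
}
```

With these inputs, a release named `web` for chart `my-app` becomes `web-my-app`, while a release already named `my-app-prod` keeps its name unchanged.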
|
||||
|
||||
#### NOTES.txt
|
||||
```
|
||||
Thank you for installing {{ .Chart.Name }}.
|
||||
|
||||
Your release is named {{ .Release.Name }}.
|
||||
|
||||
To learn more about the release, try:
|
||||
|
||||
$ helm status {{ .Release.Name }}
|
||||
$ helm get all {{ .Release.Name }}
|
||||
|
||||
{{- if .Values.ingress.enabled }}
|
||||
|
||||
Application URL:
|
||||
{{- range .Values.ingress.hosts }}
|
||||
http{{ if $.Values.ingress.tls }}s{{ end }}://{{ .host }}{{ .path }}
|
||||
{{- end }}
|
||||
{{- else }}
|
||||
|
||||
Get the application URL by running:
|
||||
export POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ include "my-app.name" . }}" -o jsonpath="{.items[0].metadata.name}")
|
||||
kubectl port-forward $POD_NAME 8080:80
|
||||
echo "Visit http://127.0.0.1:8080"
|
||||
{{- end }}
|
||||
```
|
||||
|
||||
## Dependencies Management
|
||||
|
||||
### Declaring Dependencies
|
||||
|
||||
```yaml
# Chart.yaml
dependencies:
  - name: postgresql
    version: "12.0.0"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled # Enable/disable via values
    tags: # Group dependencies
      - database
    import-values: # Import values from subchart
      - child: database
        parent: database
    alias: db # Reference as .Values.db
```
|
||||
|
||||
### Managing Dependencies
|
||||
|
||||
```bash
|
||||
# Update dependencies
|
||||
helm dependency update
|
||||
|
||||
# List dependencies
|
||||
helm dependency list
|
||||
|
||||
# Build dependencies
|
||||
helm dependency build
|
||||
```
|
||||
|
||||
### Chart.lock
|
||||
|
||||
Generated automatically by `helm dependency update`:
|
||||
|
||||
```yaml
dependencies:
  - name: postgresql
    repository: https://charts.bitnami.com/bitnami
    version: 12.0.0
digest: sha256:abcd1234...
generated: "2024-01-01T00:00:00Z"
```
|
||||
|
||||
## .helmignore
|
||||
|
||||
Exclude files from chart package:
|
||||
|
||||
```
|
||||
# Development files
|
||||
.git/
|
||||
.gitignore
|
||||
*.md
|
||||
docs/
|
||||
|
||||
# Build artifacts
|
||||
*.swp
|
||||
*.bak
|
||||
*.tmp
|
||||
*.orig
|
||||
|
||||
# CI/CD
|
||||
.travis.yml
|
||||
.gitlab-ci.yml
|
||||
Jenkinsfile
|
||||
|
||||
# Testing
|
||||
test/
|
||||
*.test
|
||||
|
||||
# IDE
|
||||
.vscode/
|
||||
.idea/
|
||||
*.iml
|
||||
```
|
||||
|
||||
## Custom Resource Definitions (CRDs)
|
||||
|
||||
Place CRDs in `crds/` directory:
|
||||
|
||||
```
|
||||
crds/
|
||||
├── my-app-crd.yaml
|
||||
└── another-crd.yaml
|
||||
```
|
||||
|
||||
**Important CRD notes:**
|
||||
- CRDs are installed before any templates
- CRDs are NOT templated (no `{{ }}` syntax)
- CRDs are NOT upgraded or deleted with the chart
- Use `helm install --skip-crds` to skip installation
|
||||
|
||||
## Chart Versioning
|
||||
|
||||
### Semantic Versioning
|
||||
|
||||
- **Chart Version**: Increment when the chart changes
  - MAJOR: Breaking changes
  - MINOR: New features, backward compatible
  - PATCH: Bug fixes

- **App Version**: Application version being deployed
  - Can be any string
  - Not required to follow SemVer
|
||||
|
||||
```yaml
|
||||
version: 2.3.1 # Chart version
|
||||
appVersion: "1.5.0" # Application version
|
||||
```
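The MAJOR/MINOR/PATCH rule can be sketched as a small shell helper. This is an illustrative sketch only: `bump_version` is a hypothetical function name, not a Helm command, and it assumes a plain `X.Y.Z` version with no pre-release or build suffix.

```bash
# bump_version VERSION PART
# Prints VERSION with the given PART (major, minor, or patch) incremented.
# Assumes VERSION is a plain X.Y.Z string; suffixes such as -rc.1 are not handled.
bump_version() {
  bv_version=$1
  bv_part=$2
  bv_major=${bv_version%%.*}   # text before the first dot
  bv_rest=${bv_version#*.}     # text after the first dot
  bv_minor=${bv_rest%%.*}
  bv_patch=${bv_rest#*.}
  case "$bv_part" in
    major) bv_major=$((bv_major + 1)); bv_minor=0; bv_patch=0 ;;
    minor) bv_minor=$((bv_minor + 1)); bv_patch=0 ;;
    patch) bv_patch=$((bv_patch + 1)) ;;
    *) echo "usage: bump_version X.Y.Z major|minor|patch" >&2; return 1 ;;
  esac
  echo "$bv_major.$bv_minor.$bv_patch"
}

bump_version 2.3.1 minor   # prints 2.4.0
```

Resetting the lower components to zero on a MAJOR or MINOR bump follows the usual SemVer convention; the `appVersion` field is independent and would not be touched by such a helper.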
|
||||
|
||||
## Chart Testing
|
||||
|
||||
### Test Files
|
||||
|
||||
```yaml
# templates/tests/test-connection.yaml
apiVersion: v1
kind: Pod
metadata:
  name: "{{ include "my-app.fullname" . }}-test-connection"
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  containers:
    - name: wget
      image: busybox
      command: ['wget']
      args: ['{{ include "my-app.fullname" . }}:{{ .Values.service.port }}']
  restartPolicy: Never
```
|
||||
|
||||
### Running Tests
|
||||
|
||||
```bash
|
||||
helm test my-release
|
||||
helm test my-release --logs
|
||||
```
|
||||
|
||||
## Hooks
|
||||
|
||||
Helm hooks allow intervention at specific points:
|
||||
|
||||
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "my-app.fullname" . }}-migration
  annotations:
    "helm.sh/hook": pre-upgrade,pre-install
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
```
|
||||
|
||||
### Hook Types
|
||||
|
||||
- `pre-install`: After templates are rendered, but before any resources are created
- `post-install`: After all resources are loaded into Kubernetes
|
||||
- `pre-delete`: Before any resources deleted
|
||||
- `post-delete`: After all resources deleted
|
||||
- `pre-upgrade`: Before upgrade
|
||||
- `post-upgrade`: After upgrade
|
||||
- `pre-rollback`: Before rollback
|
||||
- `post-rollback`: After rollback
|
||||
- `test`: Run with `helm test`
|
||||
|
||||
### Hook Weight
|
||||
|
||||
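Controls hook execution order. Weights can be any positive or negative integer, written as a string (for example `"-5"`); hooks are sorted in ascending order, so lower weights run first.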
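The ordering rule can be illustrated with a plain numeric sort. This is a sketch only: `order_hooks` and the hook names are hypothetical, and it models only the weight comparison, not how Helm orders hooks that share the same weight.

```bash
# order_hooks: read "weight name" pairs on stdin and print them in the order
# they would run by weight alone (ascending, so negative weights come first).
order_hooks() {
  sort -n -k1,1
}

# Hypothetical hooks; the names are for illustration only.
printf '%s\n' "5 notify" "-5 db-migrate" "0 seed-data" | order_hooks
# prints:
# -5 db-migrate
# 0 seed-data
# 5 notify
```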
|
||||
|
||||
### Hook Deletion Policies
|
||||
|
||||
- `before-hook-creation`: Delete previous hook before new one
|
||||
- `hook-succeeded`: Delete after successful execution
|
||||
- `hook-failed`: Delete if hook fails
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Use helpers** for repeated template logic
|
||||
2. **Quote strings** in templates: `{{ .Values.name | quote }}`
|
||||
3. **Validate values** with values.schema.json
|
||||
4. **Document all values** in values.yaml
|
||||
5. **Use semantic versioning** for chart versions
|
||||
6. **Pin dependency versions** exactly
|
||||
7. **Include NOTES.txt** with usage instructions
|
||||
8. **Add tests** for critical functionality
|
||||
9. **Use hooks** for database migrations
|
||||
10. **Keep charts focused** - one application per chart
|
||||
|
||||
## Chart Repository Structure
|
||||
|
||||
```
|
||||
helm-charts/
|
||||
├── index.yaml
|
||||
├── my-app-1.0.0.tgz
|
||||
├── my-app-1.1.0.tgz
|
||||
├── my-app-1.2.0.tgz
|
||||
└── another-chart-2.0.0.tgz
|
||||
```
|
||||
|
||||
### Creating Repository Index
|
||||
|
||||
```bash
|
||||
helm repo index . --url https://charts.example.com
|
||||
```
|
||||
|
||||
## Related Resources
|
||||
|
||||
- [Helm Documentation](https://helm.sh/docs/)
|
||||
- [Chart Template Guide](https://helm.sh/docs/chart_template_guide/)
|
||||
- [Best Practices](https://helm.sh/docs/chart_best_practices/)
|
||||
25
skills/helm-chart-scaffolding/resources/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
|
|
@ -0,0 +1,543 @@
|
|||
# Helm Chart Scaffolding Implementation Playbook
|
||||
|
||||
This file contains detailed patterns, checklists, and code samples referenced by the skill.
|
||||
|
||||
# Helm Chart Scaffolding
|
||||
|
||||
Comprehensive guidance for creating, organizing, and managing Helm charts for packaging and deploying Kubernetes applications.
|
||||
|
||||
## Purpose
|
||||
|
||||
This skill provides step-by-step instructions for building production-ready Helm charts, including chart structure, templating patterns, values management, and validation strategies.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use this skill when you need to:
|
||||
- Create new Helm charts from scratch
|
||||
- Package Kubernetes applications for distribution
|
||||
- Manage multi-environment deployments with Helm
|
||||
- Implement templating for reusable Kubernetes manifests
|
||||
- Set up Helm chart repositories
|
||||
- Follow Helm best practices and conventions
|
||||
|
||||
## Helm Overview
|
||||
|
||||
**Helm** is the package manager for Kubernetes that:
|
||||
- Templates Kubernetes manifests for reusability
|
||||
- Manages application releases and rollbacks
|
||||
- Handles dependencies between charts
|
||||
- Provides version control for deployments
|
||||
- Simplifies configuration management across environments
|
||||
|
||||
## Step-by-Step Workflow
|
||||
|
||||
### 1. Initialize Chart Structure
|
||||
|
||||
**Create new chart:**
|
||||
```bash
|
||||
helm create my-app
|
||||
```
|
||||
|
||||
**Standard chart structure:**
|
||||
```
my-app/
├── Chart.yaml              # Chart metadata
├── values.yaml             # Default configuration values
├── charts/                 # Chart dependencies
├── templates/              # Kubernetes manifest templates
│   ├── NOTES.txt           # Post-install notes
│   ├── _helpers.tpl        # Template helpers
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── serviceaccount.yaml
│   ├── hpa.yaml
│   └── tests/
│       └── test-connection.yaml
└── .helmignore             # Files to ignore
```
|
||||
|
||||
### 2. Configure Chart.yaml
|
||||
|
||||
**Chart metadata defines the package:**
|
||||
|
||||
```yaml
apiVersion: v2
name: my-app
description: A Helm chart for My Application
type: application
version: 1.0.0 # Chart version
appVersion: "2.1.0" # Application version

# Keywords for chart discovery
keywords:
  - web
  - api
  - backend

# Maintainer information
maintainers:
  - name: DevOps Team
    email: devops@example.com
    url: https://github.com/example/my-app

# Source code repository
sources:
  - https://github.com/example/my-app

# Homepage
home: https://example.com

# Chart icon
icon: https://example.com/icon.png

# Dependencies
dependencies:
  - name: postgresql
    version: "12.0.0"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled
  - name: redis
    version: "17.0.0"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled
```
|
||||
|
||||
**Reference:** See `assets/Chart.yaml.template` for complete example
|
||||
|
||||
### 3. Design values.yaml Structure
|
||||
|
||||
**Organize values hierarchically:**
|
||||
|
||||
```yaml
# Image configuration
image:
  repository: myapp
  tag: "1.0.0"
  pullPolicy: IfNotPresent

# Number of replicas
replicaCount: 3

# Service configuration
service:
  type: ClusterIP
  port: 80
  targetPort: 8080

# Ingress configuration
ingress:
  enabled: false
  className: nginx
  hosts:
    - host: app.example.com
      paths:
        - path: /
          pathType: Prefix

# Resources
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

# Autoscaling
autoscaling:
  enabled: false
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80

# Environment variables
env:
  - name: LOG_LEVEL
    value: "info"

# ConfigMap data
configMap:
  data:
    APP_MODE: production

# Dependencies
postgresql:
  enabled: true
  auth:
    database: myapp
    username: myapp

redis:
  enabled: false
```
|
||||
|
||||
**Reference:** See `assets/values.yaml.template` for complete structure
|
||||
|
||||
### 4. Create Template Files
|
||||
|
||||
**Use Go templating with Helm functions:**
|
||||
|
||||
**templates/deployment.yaml:**
|
||||
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "my-app.fullname" . }}
  labels:
    {{- include "my-app.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "my-app.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "my-app.selectorLabels" . | nindent 8 }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: {{ .Values.service.targetPort }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          env:
            {{- toYaml .Values.env | nindent 12 }}
```
|
||||
|
||||
### 5. Create Template Helpers
|
||||
|
||||
**templates/_helpers.tpl:**
|
||||
```yaml
|
||||
{{/*
|
||||
Expand the name of the chart.
|
||||
*/}}
|
||||
{{- define "my-app.name" -}}
|
||||
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Create a default fully qualified app name.
|
||||
*/}}
|
||||
{{- define "my-app.fullname" -}}
|
||||
{{- if .Values.fullnameOverride }}
|
||||
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
|
||||
{{- else }}
|
||||
{{- $name := default .Chart.Name .Values.nameOverride }}
|
||||
{{- if contains $name .Release.Name }}
|
||||
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
|
||||
{{- else }}
|
||||
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
|
||||
{{- end }}
|
||||
{{- end }}
|
||||
{{- end }}
|
||||
|
||||
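{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "my-app.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Common labels
*/}}
{{- define "my-app.labels" -}}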
|
||||
helm.sh/chart: {{ include "my-app.chart" . }}
|
||||
{{ include "my-app.selectorLabels" . }}
|
||||
{{- if .Chart.AppVersion }}
|
||||
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
|
||||
{{- end }}
|
||||
app.kubernetes.io/managed-by: {{ .Release.Service }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Selector labels
|
||||
*/}}
|
||||
{{- define "my-app.selectorLabels" -}}
|
||||
app.kubernetes.io/name: {{ include "my-app.name" . }}
|
||||
app.kubernetes.io/instance: {{ .Release.Name }}
|
||||
{{- end }}
|
||||
```
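The naming logic in `my-app.fullname` can be hard to read in template form. The following shell sketch emulates it outside Helm for the case where neither `fullnameOverride` nor `nameOverride` is set. `chart_fullname` is a hypothetical helper written for illustration, not part of Helm.

```bash
# chart_fullname RELEASE CHART
# Emulates the fullname helper: reuse the release name if it already contains
# the chart name, otherwise join them with a hyphen; then cut to 63 characters
# and drop one trailing hyphen (mirroring `trunc 63 | trimSuffix "-"`).
chart_fullname() {
  cf_release=$1
  cf_chart=$2
  case "$cf_release" in
    *"$cf_chart"*) cf_name=$cf_release ;;
    *) cf_name="$cf_release-$cf_chart" ;;
  esac
  cf_name=$(printf '%s' "$cf_name" | cut -c1-63)
  printf '%s\n' "${cf_name%-}"
}

chart_fullname prod my-app          # prints prod-my-app
chart_fullname my-app-prod my-app   # prints my-app-prod
```

The 63-character cut exists because Kubernetes limits many resource names and label values to 63 characters; trimming the trailing hyphen keeps a truncated name valid.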
|
||||
|
||||
### 6. Manage Dependencies
|
||||
|
||||
**Add dependencies in Chart.yaml:**
|
||||
```yaml
dependencies:
  - name: postgresql
    version: "12.0.0"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled
```
|
||||
|
||||
**Update dependencies:**
|
||||
```bash
|
||||
helm dependency update
|
||||
helm dependency build
|
||||
```
|
||||
|
||||
**Override dependency values:**
|
||||
```yaml
# values.yaml
postgresql:
  enabled: true
  auth:
    database: myapp
    username: myapp
    password: changeme
  primary:
    persistence:
      enabled: true
      size: 10Gi
```
|
||||
|
||||
### 7. Test and Validate
|
||||
|
||||
**Validation commands:**
|
||||
```bash
|
||||
# Lint the chart
|
||||
helm lint my-app/
|
||||
|
||||
# Dry-run installation
|
||||
helm install my-app ./my-app --dry-run --debug
|
||||
|
||||
# Template rendering
|
||||
helm template my-app ./my-app
|
||||
|
||||
# Template with values
|
||||
helm template my-app ./my-app -f values-prod.yaml
|
||||
|
||||
# Show the chart's default values (values.yaml)
|
||||
helm show values ./my-app
|
||||
```
|
||||
|
||||
**Validation script:**
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -e
|
||||
|
||||
echo "Linting chart..."
|
||||
helm lint .
|
||||
|
||||
echo "Testing template rendering..."
|
||||
helm template test-release . --dry-run
|
||||
|
||||
echo "Validating rendered manifests against the cluster..."
|
||||
helm template test-release . --validate
|
||||
|
||||
echo "All validations passed!"
|
||||
```
|
||||
|
||||
**Reference:** See `scripts/validate-chart.sh`
|
||||
|
||||
### 8. Package and Distribute
|
||||
|
||||
**Package the chart:**
|
||||
```bash
|
||||
helm package my-app/
|
||||
# Creates: my-app-1.0.0.tgz
|
||||
```
|
||||
|
||||
**Create chart repository:**
|
||||
```bash
|
||||
# Create index
|
||||
helm repo index .
|
||||
|
||||
# Upload to repository
|
||||
# AWS S3 example
|
||||
aws s3 sync . s3://my-helm-charts/ --exclude "*" --include "*.tgz" --include "index.yaml"
|
||||
```
|
||||
|
||||
**Use the chart:**
|
||||
```bash
|
||||
helm repo add my-repo https://charts.example.com
|
||||
helm repo update
|
||||
helm install my-app my-repo/my-app
|
||||
```
|
||||
|
||||
### 9. Multi-Environment Configuration
|
||||
|
||||
**Environment-specific values files:**
|
||||
|
||||
```
|
||||
my-app/
|
||||
├── values.yaml # Defaults
|
||||
├── values-dev.yaml # Development
|
||||
├── values-staging.yaml # Staging
|
||||
└── values-prod.yaml # Production
|
||||
```
|
||||
|
||||
**values-prod.yaml:**
|
||||
```yaml
replicaCount: 5

image:
  tag: "2.1.0"

resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1000m"

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20

ingress:
  enabled: true
  hosts:
    - host: app.example.com
      paths:
        - path: /
          pathType: Prefix

postgresql:
  enabled: true
  primary:
    persistence:
      size: 100Gi
```
|
||||
|
||||
**Install with environment:**
|
||||
```bash
|
||||
helm install my-app ./my-app -f values-prod.yaml --namespace production
|
||||
```
|
||||
|
||||
### 10. Implement Hooks and Tests
|
||||
|
||||
**Pre-install hook:**
|
||||
```yaml
# templates/pre-install-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "my-app.fullname" . }}-db-setup
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      containers:
        - name: db-setup
          image: postgres:15
          command: ["psql", "-c", "CREATE DATABASE myapp"]
      restartPolicy: Never
```
|
||||
|
||||
**Test connection:**
|
||||
```yaml
# templates/tests/test-connection.yaml
apiVersion: v1
kind: Pod
metadata:
  name: "{{ include "my-app.fullname" . }}-test-connection"
  annotations:
    "helm.sh/hook": test
spec:
  containers:
    - name: wget
      image: busybox
      command: ['wget']
      args: ['{{ include "my-app.fullname" . }}:{{ .Values.service.port }}']
  restartPolicy: Never
```
|
||||
|
||||
**Run tests:**
|
||||
```bash
|
||||
helm test my-app
|
||||
```
|
||||
|
||||
## Common Patterns
|
||||
|
||||
### Pattern 1: Conditional Resources
|
||||
|
||||
```yaml
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ include "my-app.fullname" . }}
spec:
  # ...
{{- end }}
```
|
||||
|
||||
### Pattern 2: Iterating Over Lists
|
||||
|
||||
```yaml
env:
  {{- range .Values.env }}
  - name: {{ .name }}
    value: {{ .value | quote }}
  {{- end }}
```
|
||||
|
||||
### Pattern 3: Including Files
|
||||
|
||||
```yaml
data:
  config.yaml: |
    {{- .Files.Get "config/application.yaml" | nindent 4 }}
```
|
||||
|
||||
### Pattern 4: Global Values
|
||||
|
||||
```yaml
global:
  imageRegistry: docker.io
  imagePullSecrets:
    - name: regcred

# Use in templates:
image: {{ .Values.global.imageRegistry }}/{{ .Values.image.repository }}
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Use semantic versioning** for chart and app versions
|
||||
2. **Document all values** in values.yaml with comments
|
||||
3. **Use template helpers** for repeated logic
|
||||
4. **Validate charts** before packaging
|
||||
5. **Pin dependency versions** explicitly
|
||||
6. **Use conditions** for optional resources
|
||||
7. **Follow naming conventions** (lowercase, hyphens)
|
||||
8. **Include NOTES.txt** with usage instructions
|
||||
9. **Add labels** consistently using helpers
|
||||
10. **Test installations** in all environments
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Template rendering errors:**
|
||||
```bash
|
||||
helm template my-app ./my-app --debug
|
||||
```
|
||||
|
||||
**Dependency issues:**
|
||||
```bash
|
||||
helm dependency update
|
||||
helm dependency list
|
||||
```
|
||||
|
||||
**Installation failures:**
|
||||
```bash
|
||||
helm install my-app ./my-app --dry-run --debug
|
||||
kubectl get events --sort-by='.lastTimestamp'
|
||||
```
|
||||
|
||||
## Reference Files
|
||||
|
||||
- `assets/Chart.yaml.template` - Chart metadata template
|
||||
- `assets/values.yaml.template` - Values structure template
|
||||
- `scripts/validate-chart.sh` - Validation script
|
||||
- `references/chart-structure.md` - Detailed chart organization
|
||||
|
||||
## Related Skills
|
||||
|
||||
- `k8s-manifest-generator` - For creating base Kubernetes manifests
|
||||
- `gitops-workflow` - For automated Helm chart deployments
|
||||
244
skills/helm-chart-scaffolding/scripts/validate-chart.sh
Executable file
|
|
@ -0,0 +1,244 @@
|
|||
#!/bin/bash
|
||||
set -e
|
||||
|
||||
CHART_DIR="${1:-.}"
|
||||
RELEASE_NAME="test-release"
|
||||
|
||||
echo "═══════════════════════════════════════════════════════"
|
||||
echo " Helm Chart Validation"
|
||||
echo "═══════════════════════════════════════════════════════"
|
||||
echo ""
|
||||
|
||||
# Colors
|
||||
GREEN='\033[0;32m'
|
||||
YELLOW='\033[1;33m'
|
||||
RED='\033[0;31m'
|
||||
NC='\033[0m' # No Color
|
||||
|
||||
success() {
|
||||
echo -e "${GREEN}✓${NC} $1"
|
||||
}
|
||||
|
||||
warning() {
|
||||
echo -e "${YELLOW}⚠${NC} $1"
|
||||
}
|
||||
|
||||
error() {
|
||||
echo -e "${RED}✗${NC} $1"
|
||||
}
|
||||
|
||||
# Check if Helm is installed
|
||||
if ! command -v helm &> /dev/null; then
|
||||
error "Helm is not installed"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "📦 Chart directory: $CHART_DIR"
|
||||
echo ""
|
||||
|
||||
# 1. Check chart structure
|
||||
echo "1️⃣ Checking chart structure..."
|
||||
if [ ! -f "$CHART_DIR/Chart.yaml" ]; then
|
||||
error "Chart.yaml not found"
|
||||
exit 1
|
||||
fi
|
||||
success "Chart.yaml exists"
|
||||
|
||||
if [ ! -f "$CHART_DIR/values.yaml" ]; then
|
||||
error "values.yaml not found"
|
||||
exit 1
|
||||
fi
|
||||
success "values.yaml exists"
|
||||
|
||||
if [ ! -d "$CHART_DIR/templates" ]; then
|
||||
error "templates/ directory not found"
|
||||
exit 1
|
||||
fi
|
||||
success "templates/ directory exists"
|
||||
echo ""
|
||||
|
||||
# 2. Lint the chart
|
||||
echo "2️⃣ Linting chart..."
|
||||
if helm lint "$CHART_DIR"; then
|
||||
success "Chart passed lint"
|
||||
else
|
||||
error "Chart failed lint"
|
||||
exit 1
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# 3. Check Chart.yaml
|
||||
echo "3️⃣ Validating Chart.yaml..."
|
||||
CHART_NAME=$(grep "^name:" "$CHART_DIR/Chart.yaml" | awk '{print $2}')
|
||||
CHART_VERSION=$(grep "^version:" "$CHART_DIR/Chart.yaml" | awk '{print $2}')
|
||||
APP_VERSION=$(grep "^appVersion:" "$CHART_DIR/Chart.yaml" | awk '{print $2}' | tr -d '"')
|
||||
|
||||
if [ -z "$CHART_NAME" ]; then
|
||||
error "Chart name not found"
|
||||
exit 1
|
||||
fi
|
||||
success "Chart name: $CHART_NAME"
|
||||
|
||||
if [ -z "$CHART_VERSION" ]; then
|
||||
error "Chart version not found"
|
||||
exit 1
|
||||
fi
|
||||
success "Chart version: $CHART_VERSION"
|
||||
|
||||
if [ -z "$APP_VERSION" ]; then
|
||||
warning "App version not specified"
|
||||
else
|
||||
success "App version: $APP_VERSION"
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# 4. Test template rendering
|
||||
echo "4️⃣ Testing template rendering..."
|
||||
if helm template "$RELEASE_NAME" "$CHART_DIR" > /dev/null 2>&1; then
|
||||
success "Templates rendered successfully"
|
||||
else
|
||||
error "Template rendering failed"
|
||||
helm template "$RELEASE_NAME" "$CHART_DIR"
|
||||
exit 1
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# 5. Dry-run installation
|
||||
echo "5️⃣ Testing dry-run installation..."
|
||||
if helm install "$RELEASE_NAME" "$CHART_DIR" --dry-run --debug > /dev/null 2>&1; then
|
||||
success "Dry-run installation successful"
|
||||
else
|
||||
error "Dry-run installation failed"
|
||||
exit 1
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# 6. Check for required Kubernetes resources
|
||||
echo "6️⃣ Checking generated resources..."
|
||||
MANIFESTS=$(helm template "$RELEASE_NAME" "$CHART_DIR")
|
||||
|
||||
if echo "$MANIFESTS" | grep -q "kind: Deployment"; then
|
||||
success "Deployment found"
|
||||
else
|
||||
warning "No Deployment found"
|
||||
fi
|
||||
|
||||
if echo "$MANIFESTS" | grep -q "kind: Service"; then
|
||||
success "Service found"
|
||||
else
|
||||
warning "No Service found"
|
||||
fi
|
||||
|
||||
if echo "$MANIFESTS" | grep -q "kind: ServiceAccount"; then
|
||||
success "ServiceAccount found"
|
||||
else
|
||||
warning "No ServiceAccount found"
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# 7. Check for security best practices
|
||||
echo "7️⃣ Checking security best practices..."
|
||||
if echo "$MANIFESTS" | grep -q "runAsNonRoot: true"; then
|
||||
success "Running as non-root user"
|
||||
else
|
||||
warning "Not explicitly running as non-root"
|
||||
fi
|
||||
|
||||
if echo "$MANIFESTS" | grep -q "readOnlyRootFilesystem: true"; then
|
||||
success "Using read-only root filesystem"
|
||||
else
|
||||
warning "Not using read-only root filesystem"
|
||||
fi
|
||||
|
||||
if echo "$MANIFESTS" | grep -q "allowPrivilegeEscalation: false"; then
|
||||
success "Privilege escalation disabled"
|
||||
else
|
||||
warning "Privilege escalation not explicitly disabled"
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# 8. Check for resource limits
|
||||
echo "8️⃣ Checking resource configuration..."
|
||||
if echo "$MANIFESTS" | grep -q "resources:"; then
|
||||
if echo "$MANIFESTS" | grep -q "limits:"; then
|
||||
success "Resource limits defined"
|
||||
else
|
||||
warning "No resource limits defined"
|
||||
fi
|
||||
if echo "$MANIFESTS" | grep -q "requests:"; then
|
||||
success "Resource requests defined"
|
||||
else
|
||||
warning "No resource requests defined"
|
||||
fi
|
||||
else
|
||||
warning "No resources defined"
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# 9. Check for health probes
|
||||
echo "9️⃣ Checking health probes..."
|
||||
if echo "$MANIFESTS" | grep -q "livenessProbe:"; then
|
||||
success "Liveness probe configured"
|
||||
else
|
||||
warning "No liveness probe found"
|
||||
fi
|
||||
|
||||
if echo "$MANIFESTS" | grep -q "readinessProbe:"; then
|
||||
success "Readiness probe configured"
|
||||
else
|
||||
warning "No readiness probe found"
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# 10. Check dependencies
|
||||
if [ -f "$CHART_DIR/Chart.yaml" ] && grep -q "^dependencies:" "$CHART_DIR/Chart.yaml"; then
|
||||
echo "🔟 Checking dependencies..."
|
||||
if helm dependency list "$CHART_DIR" > /dev/null 2>&1; then
|
||||
success "Dependencies valid"
|
||||
|
||||
if [ -f "$CHART_DIR/Chart.lock" ]; then
|
||||
success "Chart.lock file present"
|
||||
else
|
||||
warning "Chart.lock file missing (run 'helm dependency update')"
|
||||
fi
|
||||
else
|
||||
error "Dependencies check failed"
|
||||
fi
|
||||
echo ""
|
||||
fi
|
||||
|
||||
# 11. Check for values schema
|
||||
if [ -f "$CHART_DIR/values.schema.json" ]; then
|
||||
echo "1️⃣1️⃣ Validating values schema..."
|
||||
success "values.schema.json present"
|
||||
|
||||
# Validate schema if jq is available
|
||||
if command -v jq &> /dev/null; then
|
||||
if jq empty "$CHART_DIR/values.schema.json" 2>/dev/null; then
|
||||
success "values.schema.json is valid JSON"
|
||||
else
|
||||
error "values.schema.json contains invalid JSON"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
echo ""
|
||||
fi
|
||||
|
||||
# Summary
|
||||
echo "═══════════════════════════════════════════════════════"
|
||||
echo " Validation Complete!"
|
||||
echo "═══════════════════════════════════════════════════════"
|
||||
echo ""
|
||||
echo "Chart: $CHART_NAME"
|
||||
echo "Version: $CHART_VERSION"
|
||||
if [ -n "$APP_VERSION" ]; then
|
||||
echo "App Version: $APP_VERSION"
|
||||
fi
|
||||
echo ""
|
||||
success "All validations passed!"
|
||||
echo ""
|
||||
echo "Next steps:"
|
||||
echo " • helm package $CHART_DIR"
|
||||
echo " • helm install my-release $CHART_DIR"
|
||||
echo " • helm test my-release"
|
||||
echo ""
|
||||
25
skills/historical-pattern-analysis/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
228
skills/historical-pattern-analysis/SKILL.md
Normal file
|
|
@ -0,0 +1,228 @@
|
|||
---
|
||||
name: historical-pattern-analysis
|
||||
description: Use when analyzing git history and past changes to identify patterns, recurring issues, and lessons learned from infrastructure changes.
|
||||
---
|
||||
|
||||
# Historical Pattern Analysis
|
||||
|
||||
## Overview
|
||||
|
||||
Analyze git history and memory to learn from past infrastructure changes. Identify patterns, recurring issues, and apply lessons learned to current work.
|
||||
|
||||
**Announce at start:** "I'm using the historical-pattern-analysis skill to learn from past changes."
|
||||
|
||||
## When to Use
|
||||
|
||||
- Before making changes similar to past changes
|
||||
- When investigating recurring issues
|
||||
- To understand why infrastructure is configured a certain way
|
||||
- To identify change patterns and team practices
|
||||
|
||||
## Process
|
||||
|
||||
### Step 1: Define Search Scope
|
||||
|
||||
Determine what history to analyze:
|
||||
- Specific resources being changed
|
||||
- Time period (last month, quarter, year)
|
||||
- Specific team members or patterns
|
||||
|
||||
### Step 2: Git Archaeology
|
||||
|
||||
#### Find Related Commits
|
||||
|
||||
```bash
|
||||
# Commits touching specific files
|
||||
git log --oneline -20 -- "path/to/module/*.tf"
|
||||
|
||||
# Commits mentioning resource types
|
||||
git log --oneline -20 --grep="aws_security_group"
|
||||
|
||||
# Commits by pattern in message
|
||||
git log --oneline -20 --grep="fix\|rollback\|revert"
|
||||
|
||||
# Commits in date range
|
||||
git log --oneline --since="2024-01-01" --until="2024-06-01" -- "*.tf"
|
||||
```
|
||||
|
||||
#### Analyze Commit Patterns
|
||||
|
||||
```bash
|
||||
# Most frequently changed files
|
||||
git log --pretty=format: --name-only -- "*.tf" | sort | uniq -c | sort -rn | head -20
|
||||
|
||||
# Authors and their focus areas
|
||||
git shortlog -sn -- "environments/prod/"
|
||||
|
||||
# Change frequency by day/time
|
||||
git log --format="%ad" --date=format:"%A %H:00" -- "*.tf" | sort | uniq -c
|
||||
```
|
||||
|
||||
#### Find Reverts and Fixes
|
||||
|
||||
```bash
|
||||
# Revert commits
|
||||
git log --oneline --grep="revert\|Revert"
|
||||
|
||||
# Fix commits following changes
|
||||
git log --oneline --grep="fix\|hotfix\|Fix"
|
||||
|
||||
# Commits with "URGENT" or "EMERGENCY"
|
||||
git log --oneline --grep="urgent\|emergency" -i
|
||||
```
|
||||
|
||||
### Step 3: Analyze Change Patterns
|
||||
|
||||
#### Coupling Analysis
|
||||
|
||||
Which files change together?
|
||||
```bash
|
||||
# For a specific file, what else changes with it?
|
||||
git log --pretty=format:"%H" -- "modules/vpc/main.tf" | \
|
||||
xargs -I {} git show --name-only --pretty=format: {} | \
|
||||
sort | uniq -c | sort -rn | head -20
|
||||
```
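The counting half of that pipeline can be tried without a repository. The sketch below assumes one file path per line on stdin, as `git show --name-only` would emit; `rank_files` is a hypothetical helper name, and paths containing spaces are not handled.

```bash
# rank_files: read file paths (one per line) on stdin, skip blank lines,
# and print "count path" pairs with the most frequently seen path first.
rank_files() {
  grep -v '^$' | sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Hypothetical input standing in for the file names of three commits.
printf '%s\n' \
  modules/vpc/main.tf modules/vpc/variables.tf "" \
  modules/vpc/main.tf modules/sg/main.tf "" \
  modules/vpc/main.tf modules/sg/main.tf |
  rank_files
# prints:
# 3 modules/vpc/main.tf
# 2 modules/sg/main.tf
# 1 modules/vpc/variables.tf
```

The file being queried always tops the list, since it appears in every selected commit; the entries below it are the coupling candidates.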

#### Change Sequences

Common sequences of changes:
1. VPC changes → followed by security group changes
2. IAM role changes → followed by policy attachments
3. RDS changes → followed by parameter group changes

#### Time Patterns

- Are prod changes clustered on certain days?
- Are there "risky" times based on past incidents?
- How long between staging and prod deployments?

### Step 4: Query Memory

Check stored patterns:
```
memory/projects/<hash>/patterns.json
memory/projects/<hash>/incidents.json
```

Look for:
- Similar past changes and outcomes
- Known issues with these resources
- User preferences for this type of change

### Step 5: Identify Lessons

#### From Incidents

For each past incident:
- What was the trigger?
- How was it detected?
- What was the fix?
- What could have prevented it?

#### From Patterns

- What changes tend to cause problems?
- What practices lead to success?
- What review processes work well?

### Step 6: Generate Report

```markdown
## Historical Pattern Analysis

### Search Scope
- Resources: [resources being analyzed]
- Time period: [date range]
- Related commits found: [count]

### Change Frequency

| Resource/File | Changes (90d) | Last Changed | Primary Authors |
|--------------|---------------|--------------|-----------------|
| modules/vpc/main.tf | 12 | 2024-01-10 | alice, bob |
| environments/prod/main.tf | 8 | 2024-01-08 | alice |

### Change Coupling

These resources typically change together:
1. `aws_security_group.web` ↔ `aws_instance.web` (85% correlation)
2. `aws_iam_role.app` ↔ `aws_iam_policy.app` (100% correlation)

### Past Incidents Related to These Resources

#### Incident: [Date] - [Title]
- **Trigger:** [What caused it]
- **Impact:** [What happened]
- **Resolution:** [How it was fixed]
- **Lesson:** [What we learned]
- **Relevance:** [How this applies to current change]

### Patterns Identified

#### Pattern: [Pattern Name]
- **Observation:** [What we see in history]
- **Frequency:** [How often]
- **Implication:** [What this means for current change]

### Risk Indicators

Based on historical data:

| Indicator | Current Change | Historical Issues |
|-----------|---------------|-------------------|
| Similar to past incident | [Yes/No] | [Details] |
| Frequently problematic resource | [Yes/No] | [Details] |
| Changed by unfamiliar author | [Yes/No] | [Details] |

### Recommendations

Based on historical patterns:
1. [Recommendation 1]
2. [Recommendation 2]

### Questions Raised

[Questions that history suggests we should answer]
```

### Step 7: Update Memory

Store new patterns discovered:
```json
{
  "patterns": [
    {
      "name": "vpc-sg-coupling",
      "description": "VPC changes often require SG updates",
      "confidence": 0.85,
      "last_seen": "2024-01-15"
    }
  ]
}
```
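One way to keep the pattern store free of duplicates is to update entries by `name` instead of appending blindly. The sketch below assumes the JSON shape shown above; `upsert_pattern` is a hypothetical helper, not part of any existing memory tooling, and reading or writing the file itself is left out.

```python
import json


def upsert_pattern(store, pattern):
    """Insert `pattern` into store["patterns"], merging into an entry with the same name."""
    patterns = store.setdefault("patterns", [])
    for i, existing in enumerate(patterns):
        if existing.get("name") == pattern["name"]:
            # Keep fields the update does not mention (for example the description).
            patterns[i] = {**existing, **pattern}
            return store
    patterns.append(pattern)
    return store


store = {
    "patterns": [
        {
            "name": "vpc-sg-coupling",
            "description": "VPC changes often require SG updates",
            "confidence": 0.85,
            "last_seen": "2024-01-15",
        }
    ]
}

# Hypothetical updates: one refreshes an existing pattern, one adds a new pattern.
upsert_pattern(store, {"name": "vpc-sg-coupling", "confidence": 0.9, "last_seen": "2024-02-01"})
upsert_pattern(store, {"name": "iam-policy-coupling", "confidence": 1.0, "last_seen": "2024-02-01"})
print(json.dumps(store, indent=2))
```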

## Common Patterns to Look For

### Positive Patterns
- Consistent naming conventions
- Regular, small changes vs. big-bang updates
- Changes preceded by plan review
- Post-change validation

### Warning Patterns
- Frequent reverts
- Emergency fixes following changes
- Clustered failures in specific areas
- "Temporary" changes that persist

### Anti-Patterns
- Direct prod changes without staging
- Large changes without incremental steps
- Missing documentation on complex changes
- Recurring manual interventions

## Integration with Other Skills

This skill feeds into:
- **terraform-plan-review**: Provides historical context for risk assessment
- **terraform-drift-detection**: Identifies if drift matches past patterns
- **provider-upgrade-analysis**: Shows past upgrade experiences

145
skills/home-assistant-automation/SKILL.md
Normal file
@ -0,0 +1,145 @@
---
name: home-assistant-automation
description: Use when writing, editing, or debugging Home Assistant automations or scripts for Zoe's HA instance at 10.0.2.6:8123. Covers entity discovery, modern YAML syntax, automation/script patterns, and live MCP testing.
---

# Home Assistant Automation

## Overview

Write automations and scripts for Zoe's HA instance. You have live MCP access — use it. **Never guess entity IDs.** Always discover them first.

## HARD REQUIREMENT: Discover Entities Before Writing YAML

```
GetLiveContext BEFORE any YAML. No exceptions.
```

```python
# By domain
GetLiveContext(domain="light")
GetLiveContext(domain="media_player")
GetLiveContext(domain="siren")

# By area
GetLiveContext(area="living room")
GetLiveContext(area="office")

# By name (specific)
GetLiveContext(name="doorbell")
GetLiveContext(name="chime")
```

Entity IDs drift and vary. If you write YAML without checking, it will break.

## Known Devices (verify with GetLiveContext before use)

| Device | Domain hint | Notes |
|--------|------------|-------|
| Amcrest AD410 doorbell | `binary_sensor` | Button press trigger |
| Living room chime | `siren.living_room_chime_play_tone` | Use `siren.turn_on` |
| Office chime | `siren.office_chime_play_tone` | Use `siren.turn_on` |
| Side door lock | `select` | Lock timing entity |
| Apple TV | `media_player` | Used for kiosk display dimming |
| Raspberry Pi kiosk | | Family room dashboard |
| Season sensor | `sensor.season` | |

## Modern YAML Syntax (2024.x+)

Use **plural keys** for all top-level blocks (`action:` replaced `service:` in 2024.8; the plural keys arrived in 2024.10):

```yaml
alias: "Descriptive name"
description: "What this does"
triggers:        # NOT trigger:
  - ...
conditions:      # NOT condition:
  - ...
actions:         # NOT action:
  - action: ...  # service calls inside actions use "action:" key, NOT "service:"
mode: single
```
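When editing an older automation, the key renames can be applied mechanically. The following is a minimal sketch, assuming the automation has already been loaded into a plain dict (for example with a YAML parser). `modernize_automation` and `rename_key` are hypothetical helpers, and the sketch only touches the top level: it does not recurse into `choose`, `sequence`, or other nested blocks.

```python
def rename_key(mapping, old, new):
    """Return a copy of `mapping` with key `old` renamed to `new`, preserving order."""
    return {(new if k == old else k): v for k, v in mapping.items()}


def as_list(value):
    """Legacy automations allow a single mapping where a list is expected."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def modernize_automation(legacy):
    """Convert legacy top-level keys to the plural form (top level only)."""
    modern = {k: v for k, v in legacy.items() if k not in ("trigger", "condition", "action")}
    if "trigger" in legacy:
        # Legacy triggers use "platform: state"; the modern form is "trigger: state".
        modern["triggers"] = [rename_key(t, "platform", "trigger") for t in as_list(legacy["trigger"])]
    if "condition" in legacy:
        modern["conditions"] = as_list(legacy["condition"])
    if "action" in legacy:
        # Legacy service calls use "service: light.turn_on"; the modern key is "action:".
        modern["actions"] = [rename_key(a, "service", "action") for a in as_list(legacy["action"])]
    return modern


# Illustrative legacy automation; verify real entity IDs with GetLiveContext first.
legacy = {
    "alias": "Doorbell chime",
    "trigger": {"platform": "state", "entity_id": "binary_sensor.doorbell_button", "to": "on"},
    "action": [
        {"service": "siren.turn_on", "target": {"entity_id": "siren.living_room_chime_play_tone"}}
    ],
    "mode": "single",
}
modern = modernize_automation(legacy)
```

Review the converted result by hand before pasting it into HA; nested service calls still need the same rename.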

## Common Trigger Patterns

```yaml
# State change with debounce
- trigger: state
  entity_id: binary_sensor.doorbell_button
  to: "on"
  for: "00:00:02"

# Time
- trigger: time
  at: "07:00:00"

# Sun offset
- trigger: sun
  event: sunset
  offset: "+00:30:00"

# Template
- trigger: template
  value_template: "{{ states('sensor.season') == 'winter' }}"
```

## Common Action Patterns

```yaml
# Light with brightness/color temp
- action: light.turn_on
  target:
    entity_id: light.living_room
  data:
    brightness_pct: 80
    color_temp_kelvin: 3000

# Play chime (siren domain, turn_on action)
- action: siren.turn_on
  target:
    entity_id: siren.living_room_chime_play_tone

# Conditional branch
- choose:
    - conditions:
        - condition: state
          entity_id: sun.sun
          state: above_horizon
      sequence:
        - action: light.turn_on
          target:
            area_id: living_room
  default:
    - action: light.turn_off
      target:
        area_id: living_room

# Delay
- delay: "00:05:00"

# Notify
- action: notify.notify
  data:
    message: "Someone at the door"
```

## Automation vs Script

- **Automation:** triggered by events/state/time — reactive behavior
- **Script:** called manually or from other automations — reusable action sequences

## Testing

1. **Verify entity exists:** `GetLiveContext(name="whatever")` — confirm state and ID
2. **Quick device test:** Use MCP action tools directly before writing YAML
   - `HassTurnOn`, `HassLightSet`, `HassSetVolume`, etc.
3. **Test automation:** Paste YAML in HA UI → Settings → Automations → + → Edit in YAML → Run

## Gotchas

- Entity IDs are lowercase and case-sensitive; words are separated by underscores
- `area_id` in `target:` works for lights; not reliable for all domains
- Chimes use `siren` domain — action is `siren.turn_on`, not `siren.play_tone`
- `mode: single` blocks re-entry; use `restart` if you want it to restart mid-run
- Apple TV dimming: check `media_player` state before acting on it
- Template syntax: `{{ states('sensor.foo') }}` — never `states.sensor.foo.state` (it errors when the entity does not exist yet)
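
The first gotcha can be turned into a cheap shape check before YAML is written. This is a sketch only: the regular expression is a simplified assumption about what an entity ID looks like (lowercase domain, a dot, lowercase object ID), the helper names are hypothetical, and passing the check does not prove the entity exists. `GetLiveContext` remains the source of truth.

```python
import re

# Simplified assumption: lowercase letters, digits and underscores on both sides of one dot.
ENTITY_ID_RE = re.compile(r"[a-z0-9_]+\.[a-z0-9_]+")


def is_plausible_entity_id(entity_id):
    """Return True when the string has the expected domain.object_id shape."""
    return bool(ENTITY_ID_RE.fullmatch(entity_id))


def find_suspect_entity_ids(entity_ids):
    """Return the IDs that fail the shape check, in input order."""
    return [e for e in entity_ids if not is_plausible_entity_id(e)]


# Illustrative input: two well-formed IDs and two typical mistakes.
suspects = find_suspect_entity_ids(
    ["light.living_room", "Light.Living_Room", "siren.office-chime", "sensor.season"]
)
print(suspects)
```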

168
skills/incident-response/SKILL.md
Normal file
@ -0,0 +1,168 @@
---
name: incident-response
description: Use when responding to production outages, data loss events, security incidents, or major service degradations in homelab (k3s/ansiblestack) or professional (AWS/EKS) environments. Applies at any severity — P1 complete outages to P4 minor issues.
---

# Incident Response

## Overview

Structured response for production incidents. Severity scales the rigor. Homelab P3 is not work P1.

**Core principle:** Stabilize user impact FIRST. Understand why SECOND. Never diagnose in silence.

## Severity

| Severity | Definition | Response SLA | Examples |
|----------|------------|--------------|---------|
| P1 | Complete outage OR data loss OR security breach | Immediate (minutes) | Prod DB down, credentials leaked, all users blocked |
| P2 | Major degradation, SLA at risk, significant user impact | Urgent (< 30 min) | 50%+ error rate, primary feature broken |
| P3 | Partial degradation, workaround exists | Same day | One region/service slow, single feature broken |
| P4 | Minor issue, no user impact | Within days | Monitoring gap, cosmetic issue |
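
The table reads as a top-down decision: the first matching row wins. A minimal sketch of that ordering, assuming the incident has been reduced to a few yes/no observations; the flag names are hypothetical and real triage still needs judgment.

```python
RESPONSE_SLA = {
    "P1": "Immediate (minutes)",
    "P2": "Urgent (< 30 min)",
    "P3": "Same day",
    "P4": "Within days",
}


def classify_severity(*, complete_outage=False, data_loss=False, security_breach=False,
                      major_degradation=False, user_impact=False):
    """Map observations to a severity, checking the most severe rows first."""
    if complete_outage or data_loss or security_breach:
        return "P1"
    if major_degradation:
        return "P2"
    if user_impact:
        # Partial degradation: users notice, but a workaround exists.
        return "P3"
    return "P4"


severity = classify_severity(major_degradation=True, user_impact=True)
print(severity, RESPONSE_SLA[severity])
```

When in doubt between two levels, picking the higher one and downgrading later is cheaper than the reverse.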

## Phase 1: Triage (first 5-10 minutes)

Goal: confirm the incident, assess severity, start communication.

```
1. CONFIRM — is this actually broken?
   - Check from multiple locations/devices
   - Check AWS Status / DigitalOcean Status / upstream providers
   - Ask: is anyone else seeing this?

2. SCOPE — who/what is affected?
   - Which services? Which regions? Which users?
   - Is data being lost RIGHT NOW?
   - Stable or getting worse?

3. DECLARE — P1/P2: declare immediately, don't wait to diagnose
   - Work: post in incident channel, page on-call, open incident ticket
   - Homelab: create Vikunja task, start BookStack incident page

4. ASSIGN ROLES (work P1/P2)
   - Incident Commander: coordinates, communicates, makes calls
   - Tech Lead: root cause investigation
   - Comms Lead: stakeholder updates
   - (Homelab: you're all three)
```

## Phase 2: Stabilize (before root cause)

Fix user impact first. Common actions:

```bash
# Roll back last deployment
kubectl rollout undo deployment/<name> -n <ns>

# Scale up healthy replicas
kubectl scale deploy/<name> --replicas=5 -n <ns>

# Check rollout history
kubectl rollout history deployment/<name> -n <ns>
```

Other mitigations:
- Route traffic away from broken region/AZ
- Disable the broken feature via its feature flag
- Restore from backup (data loss)
- Rotate credentials (security incident)

**A rollback that takes 5 minutes beats a fix that takes 2 hours.**

## Phase 3: Investigate (root cause)

Now that users are unblocked:

```bash
# Recent events
kubectl get events -n <ns> --sort-by='.lastTimestamp' | tail -30

# Logs (kubectl)
kubectl logs -n <ns> deploy/<name> --since=1h

# Logs (Grafana Loki): LogQL query for Grafana Explore, not a shell command
# {namespace="<ns>"}

# Describe node for resource pressure
kubectl describe node <name>
```

For AWS: CloudTrail, CloudWatch Logs, ALB access logs, X-Ray traces.

Check Grafana Mimir for the anomaly timestamp — find the inflection point.

## Phase 4: Resolve

1. Deploy actual fix (not just the stabilization mitigation)
2. Verify service is healthy — not just "pods are running":
   - Check error rates in Grafana
   - Check latency is normal
   - Spot-check actual user flows
3. Monitor 15-30 minutes before declaring resolved

## Phase 5: Communicate

**During incident (P1/P2 — every 15-30 minutes):**
```
[14:32 UTC] INCIDENT UPDATE — <service> degradation
Status: Investigating
Impact: <X users/services affected>
Last action: Rolled back deployment v1.2.3
Next update: 14:47 UTC
```

**On resolution:**
```
[15:10 UTC] RESOLVED — <service> is operational
Duration: 38 minutes (14:32–15:10 UTC)
Root cause: <brief description>
Fix applied: <what was done>
Postmortem: <link or "to follow within 48h">
```
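Computing the "Next update" time and the duration by hand is easy to get wrong under pressure. The sketch below generates both messages from timestamps; it is an illustration under assumptions, with hypothetical function names, a 15-minute default cadence, and a plain colon and hyphen in place of the dashes used in the templates.

```python
from datetime import datetime, timedelta, timezone


def format_update(now, service, status, impact, last_action, interval_minutes=15):
    """Build an in-progress update; the next update time is now + interval."""
    next_update = now + timedelta(minutes=interval_minutes)
    return "\n".join([
        f"[{now:%H:%M} UTC] INCIDENT UPDATE: {service} degradation",
        f"Status: {status}",
        f"Impact: {impact}",
        f"Last action: {last_action}",
        f"Next update: {next_update:%H:%M} UTC",
    ])


def format_resolution(started, resolved, service, root_cause, fix):
    """Build the resolution message, including the incident duration in minutes."""
    minutes = int((resolved - started).total_seconds() // 60)
    return "\n".join([
        f"[{resolved:%H:%M} UTC] RESOLVED: {service} is operational",
        f"Duration: {minutes} minutes ({started:%H:%M}-{resolved:%H:%M} UTC)",
        f"Root cause: {root_cause}",
        f"Fix applied: {fix}",
    ])


# Timestamps mirror the templates above; the date and service name are placeholders.
started = datetime(2024, 1, 15, 14, 32, tzinfo=timezone.utc)
resolved = datetime(2024, 1, 15, 15, 10, tzinfo=timezone.utc)
update = format_update(started, "api", "Investigating", "all users", "Rolled back deployment v1.2.3")
resolution = format_resolution(started, resolved, "api", "bad config", "rollback")
print(update)
print(resolution)
```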

**Work P1: never go silent for > 15 minutes. Communicate first, diagnose second.**

## Phase 6: Post-Incident

- Within 24-48h: write postmortem (use `writing-postmortem` skill if available)
- Update runbooks with anything that was missing
- Create Vikunja tasks for action items
- Save incident timeline to BookStack

## Security Incidents: Extra Steps

Order matters — don't skip ahead:

1. **ISOLATE** — kill or network-isolate the compromised resource before investigating
2. **PRESERVE** — snapshot, export logs before destroying anything
3. **ROTATE** — all potentially exposed credentials immediately
4. **NOTIFY** — security team, CISO, legal as appropriate
5. **SCOPE before disclosing** — do not announce publicly until you understand blast radius

GDPR: personal data breaches must be reported to the supervisory authority within 72 hours of becoming aware of them, unless the breach is unlikely to put individuals at risk.

## Homelab Specifics

- Create Vikunja task in relevant project when declaring
- Document timeline in BookStack: `Ansiblestack` book → new page `Incident YYYY-MM-DD: <title>`
- No stakeholder comms needed, but still write the postmortem — future-you will thank you

## Common Homelab Incidents

| Incident | Quick fix |
|----------|-----------|
| OpenBao sealed | `kubectl exec -n openbao openbao-0 -- bao status` — should auto-unseal via OCI KMS; check OCI KMS key status if not |
| ArgoCD all apps OutOfSync | Check Forgejo is reachable; check ArgoCD repo credentials |
| cert-manager not issuing | Check DNS propagation; check DigitalOcean token; check cert-manager pod logs |
| NFS storage unavailable | Check NFS server at 10.0.6.2; check pods in `nfs-provisioner` namespace |
| All pods evicted | Node disk pressure — `kubectl describe node <name>`, check disk usage |

## Common Mistakes

| Mistake | Reality |
|---------|---------|
| Diagnosing in silence for 30+ minutes | Communicate first, even with "investigating" |
| Fixing before declaring | Declaration triggers backup/support; don't skip it |
| Declaring resolved before monitoring | Check error rates and latency, not just pod status |
| Investigating before stabilizing | Users are down while you read logs. Roll back first. |
| Skipping postmortem on homelab | You will hit this again. Write it down. |

228
skills/investigating-cluster-issue/SKILL.md
Normal file
@ -0,0 +1,228 @@
---
name: investigating-cluster-issue
description: Use when debugging Kubernetes issues on Zoe's homelab k3s cluster (k3s v1.35, Cilium, Traefik, ArgoCD, OpenBao, Grafana stack) or on AWS EKS clusters — pod failures, sync errors, networking problems, storage issues, node failures, or any unexpected cluster behavior.
---

# Investigating Cluster Issues

## Overview

Systematic triage for Kubernetes problems. Always run Level 1 first to establish ground truth before narrowing down. Resist the urge to jump straight to logs — node and pod status often reveals the real problem faster.

## Environment Reference

**k3s homelab:**
- Nodes: master-01/02/03, worker-01/02, gpu-node
- CNI: Cilium | Ingress: Traefik | GitOps: ArgoCD (`argocd.ctz.fyi`)
- Secrets: External Secrets Operator + OpenBao (`bao.ctz.fyi`)
- Monitoring: Grafana (`grafana.monitoring.ctz.fyi`) — Mimir, Loki, Tempo
- Storage: `ssd` (NFS), `local-path`
- Registry: Harbor (`registry.ctz.fyi`)
- Key namespaces: `argocd`, `monitoring`, `keycloak`, `external-secrets`, `cert-manager`, `traefik`, `openbao`

**EKS:**
- Addons: aws-load-balancer-controller, external-dns, cluster-autoscaler, kube-prometheus-stack
- Storage: EBS CSI (`gp3` preferred), EFS for shared
- Auth: IRSA for pod AWS access
- Networking: aws-vpc-cni or Cilium + Calico network policies

---

## Quick Reference: Symptom → First Command

| Symptom | First command |
|---------|--------------|
| Pod stuck `Pending` | `kubectl describe pod <pod> -n <ns>` → check Events |
| `CrashLoopBackOff` | `kubectl logs <pod> -n <ns> --previous` |
| `ImagePullBackOff` | `kubectl describe pod <pod> -n <ns>` → check image + secret |
| Secret not available | `kubectl get externalsecret -n <ns>` |
| ArgoCD sync failing | `kubectl get application <name> -n argocd -o yaml` → `.status.conditions` |
| TLS cert not issuing | `kubectl get certificate -n <ns>` |
| Node not Ready | `kubectl describe node <name>` → Events + Conditions |
| EKS ALB not creating | `kubectl describe ingress <name> -n <ns>` → check controller logs |
| Cluster-wide chaos | `kubectl get events -A --sort-by='.lastTimestamp' \| tail -30` |
| Not sure where to start | Run all three Level 1 commands |

---

## Level 1 — Immediate Triage (always run first)

```bash
kubectl get nodes -o wide
kubectl get pods -A | grep -Ev '(Running|Completed)'
kubectl get events -A --sort-by='.lastTimestamp' | tail -30
```
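When the pod list is processed by a script instead of read by eye, matching on the STATUS column is slightly more precise than `grep -v`, which matches anywhere on the line. A minimal sketch, assuming the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...); the function name and sample output are hypothetical.

```python
HEALTHY = {"Running", "Completed"}


def unhealthy_pods(kubectl_output):
    """Return (namespace, name, status) for pods whose STATUS is not Running/Completed."""
    rows = []
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        namespace, name, _ready, status = fields[:4]
        if status not in HEALTHY:
            rows.append((namespace, name, status))
    return rows


# Hypothetical output captured from `kubectl get pods -A`.
sample = """\
NAMESPACE    NAME                READY   STATUS             RESTARTS      AGE
argocd       argocd-server-abc   1/1     Running            0             3d
monitoring   loki-0              0/1     CrashLoopBackOff   12 (2m ago)   1h
default      migrate-job-xyz     0/1     Completed          0             2d
keycloak     keycloak-0          0/1     Pending            0             5m
"""
for namespace, name, status in unhealthy_pods(sample):
    print(f"{namespace}/{name}: {status}")
```

Like the `grep` version, this does not flag a pod that is `Running` but not ready (for example `0/1`); check the READY column separately if that matters.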

Read the events output carefully — it frequently names the exact problem.

---

## Level 2 — Narrow to Failing Resource

```bash
kubectl describe pod <name> -n <ns>  # Events section is the most useful part
kubectl logs <pod> -n <ns> --previous  # If pod restarted
kubectl logs <pod> -n <ns> -c <container>  # Multi-container pods
```

---

## Level 3 — Root Causes by Symptom

### Pod stuck `Pending`

1. Check describe events for `FailedScheduling` — resource constraints, taints/tolerations, affinity rules
2. Check PVCs: `kubectl get pvc -n <ns>`
   - **k3s:** If PVC Pending, check NFS provisioner: `kubectl get pods -n nfs-provisioner`
   - **EKS:** Check EBS CSI driver: `kubectl get pods -n kube-system -l app=ebs-csi-controller`; verify IRSA annotation on ServiceAccount

### `CrashLoopBackOff`

1. `kubectl logs <pod> -n <ns> --previous` — look for panic, missing env var, missing file, bad config
2. Check ExternalSecret synced: `kubectl get externalsecret -n <ns>` — `SecretSyncedError` is common
3. Check dependent services (DB, cache, upstream API)
4. **k3s ArgoCD:** Check sync-wave ordering — ExternalSecret must have lower wave number than Deployment

### ArgoCD sync failing (k3s)

```bash
kubectl get application <name> -n argocd -o yaml  # .status.conditions
kubectl get application <name> -n argocd -o jsonpath='{.status.operationState.message}'
```

- **OutOfSync on immutable field** → manually delete the resource, then re-sync
- **ExternalSecret missing** → check OpenBao (see below)
- Force refresh without sync: ArgoCD UI → hard refresh, or:
  ```bash
  kubectl annotate application <name> -n argocd argocd.argoproj.io/refresh=hard
  ```

### External Secrets not syncing

```bash
kubectl describe externalsecret <name> -n <ns>  # .status.conditions
kubectl get clustersecretstore openbao -o yaml  # check Ready condition
kubectl exec -n openbao openbao-0 -- bao status  # check sealed/unsealed
```

- **OpenBao sealed:** Normally auto-unseals via OCI KMS. If stuck:
  ```bash
  kubectl exec -n openbao openbao-0 -- bao operator unseal
  ```
- **ClusterSecretStore not Ready:** Check the ESO controller logs:
  ```bash
  kubectl logs -n external-secrets deploy/external-secrets -f
  ```

### `ImagePullBackOff`

```bash
kubectl describe pod <name> -n <ns>  # look for "401 Unauthorized" or "not found"
```

- Wrong image tag → fix in manifest/values
- Missing `imagePullSecret` → verify secret exists: `kubectl get secret -n <ns>`
- **k3s Harbor auth:** Ensure secret references `registry.ctz.fyi` and is attached to ServiceAccount or pod spec
- Registry unreachable → check Harbor pod health: `kubectl get pods -n harbor`

### IngressRoute / TLS not working (k3s)

```bash
kubectl get certificate -n <ns>  # Ready=False = problem
kubectl describe certificate <name> -n <ns>  # check Events
kubectl get ingressroute -n <ns>
kubectl get ingress -n <ns>  # cert-manager needs a standard Ingress to issue
```

- cert-manager needs a standard `Ingress` resource alongside `IngressRoute` — if missing, cert won't issue
- Check Traefik pods: `kubectl get pods -n traefik`

### EKS — Node not joining

```bash
kubectl get configmap aws-auth -n kube-system -o yaml  # verify node IAM role mapped
# On the node:
journalctl -u kubelet -n 100
```

- Check security groups: nodes need port 443 outbound to control plane endpoint
- Check node IAM role has `AmazonEKSWorkerNodePolicy`, `AmazonEKS_CNI_Policy`, `AmazonEC2ContainerRegistryReadOnly`

### EKS — ALB/NLB not creating

```bash
kubectl describe ingress <name> -n <ns>
kubectl logs -n kube-system deploy/aws-load-balancer-controller | tail -50
```

- Verify the ingress class: annotation `kubernetes.io/ingress.class: alb` (or `spec.ingressClassName: alb`)
- Check IRSA: ServiceAccount must have `eks.amazonaws.com/role-arn` annotation
- Check controller has correct IAM permissions (policy document)

---

## Level 4 — System-Level Checks

```bash
# k3s control plane (componentstatuses is deprecated since Kubernetes 1.19)
kubectl get --raw='/readyz?verbose'
# On master nodes:
systemctl status k3s

# Cilium (k3s)
kubectl -n kube-system exec ds/cilium -- cilium status
kubectl -n kube-system get pods -l k8s-app=cilium

# Resource pressure (both environments)
kubectl top nodes
kubectl top pods -A --sort-by=memory | head -20

# EKS cluster info
aws eks describe-cluster --name <cluster> --region <region>
```

---

## Level 5 — Logs via Grafana (k3s)

Grafana: `grafana.monitoring.ctz.fyi`

**Loki log queries:**
```
{namespace="<ns>"}
{namespace="<ns>", app="<name>"} |= "error"
{namespace="<ns>"} | logfmt | level="error"
```
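The first two queries follow a fixed shape: a label selector, optionally followed by a line filter. A small sketch that assembles them, useful when queries are generated instead of typed. The `logql` helper is hypothetical, covers only the `namespace` and `app` labels used above, and does not escape quotes inside the values.

```python
def logql(namespace, app=None, contains=None):
    """Build a LogQL stream selector with an optional `|=` line filter."""
    labels = [f'namespace="{namespace}"']
    if app:
        labels.append(f'app="{app}"')
    query = "{" + ", ".join(labels) + "}"
    if contains:
        # `|=` keeps only log lines that contain the given text.
        query += f' |= "{contains}"'
    return query


# Example values are placeholders; substitute the real namespace and app label.
print(logql("monitoring"))
print(logql("monitoring", app="loki", contains="error"))
```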

**Mimir (metrics):** Check CPU/memory graphs around the time of failure — spikes often correlate with OOMKills or with CPU throttling, which does not appear in `kubectl describe`.

---

## Live Debugging Inside a Container

```bash
kubectl exec -it <pod> -n <ns> -- /bin/sh
# or if bash available:
kubectl exec -it <pod> -n <ns> -- bash
# multi-container:
kubectl exec -it <pod> -n <ns> -c <container> -- /bin/sh
```

Use for: verifying env vars, testing connectivity (`curl`, `wget`, `nslookup`), checking mounted files.

---

## Restart vs Dig Deeper

**Restart first when:**
- Pod is in unknown/evicted state with no clear cause
- You've already identified the root cause and fixed it
- OOMKilled and you're about to bump memory limits

**Dig deeper first when:**
- CrashLoopBackOff with no obvious cause (logs will be lost on restart)
- Data loss risk
- Same pod keeps restarting after restart → there's a real problem, not a transient one
- Multiple pods affected → likely systemic, not pod-specific

**Never restart ArgoCD-managed resources directly** — ArgoCD will re-sync to desired state. Fix the underlying cause (secret, config, image) and let ArgoCD reconcile, or trigger a manual sync.

25
skills/iterate-pr/README.md
Normal file
@ -0,0 +1,25 @@
<!-- BEGIN_TF_DOCS -->
## Requirements

No requirements.

## Providers

No providers.

## Modules

No modules.

## Resources

No resources.

## Inputs

No inputs.

## Outputs

No outputs.
<!-- END_TF_DOCS -->

187
skills/iterate-pr/SKILL.md
Normal file
@ -0,0 +1,187 @@
---
name: iterate-pr
description: Iterate on a PR until CI passes. Use when you need to fix CI failures, address review feedback, or continuously push fixes until all checks are green. Automates the feedback-fix-push-wait cycle.
risk: unknown
source: community
---

# Iterate on PR Until CI Passes

Continuously iterate on the current branch until all CI checks pass and review feedback is addressed.

**Requires**: GitHub CLI (`gh`) authenticated.

**Important**: All scripts must be run from the repository root directory (where `.git` is located), not from the skill directory. Use the full path to the script via `${CLAUDE_SKILL_ROOT}`.

## Bundled Scripts

### `scripts/fetch_pr_checks.py`

Fetches CI check status and extracts failure snippets from logs.

```bash
uv run ${CLAUDE_SKILL_ROOT}/scripts/fetch_pr_checks.py [--pr NUMBER]
```

Returns JSON:
```json
{
  "pr": {"number": 123, "branch": "feat/foo"},
  "summary": {"total": 5, "passed": 3, "failed": 2, "pending": 0},
  "checks": [
    {"name": "tests", "status": "fail", "log_snippet": "...", "run_id": 123},
    {"name": "lint", "status": "pass"}
  ]
}
```
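The `summary` block is enough to decide what to do next. A minimal sketch of that decision, assuming the JSON shape shown above; `next_step` and `failed_checks` are hypothetical helpers and are not part of the bundled script.

```python
import json


def next_step(result):
    """Return "wait", "fix" or "done" based on the summary counts."""
    summary = result["summary"]
    if summary["pending"] > 0:
        return "wait"  # some checks have not finished yet
    if summary["failed"] > 0:
        return "fix"   # everything finished and at least one check failed
    return "done"


def failed_checks(result):
    """Names of the checks whose status is "fail"."""
    return [c["name"] for c in result["checks"] if c["status"] == "fail"]


# Sample payload copied from the documented output shape.
raw = """
{
  "pr": {"number": 123, "branch": "feat/foo"},
  "summary": {"total": 5, "passed": 3, "failed": 2, "pending": 0},
  "checks": [
    {"name": "tests", "status": "fail", "log_snippet": "...", "run_id": 123},
    {"name": "lint", "status": "pass"}
  ]
}
"""
result = json.loads(raw)
print(next_step(result), failed_checks(result))
```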

### `scripts/fetch_pr_feedback.py`

Fetches and categorizes PR review feedback using the [LOGAF scale](https://develop.sentry.dev/engineering-practices/code-review/#logaf-scale).

```bash
uv run ${CLAUDE_SKILL_ROOT}/scripts/fetch_pr_feedback.py [--pr NUMBER]
```

Returns JSON with feedback categorized as:
- `high` - Must address before merge (`h:`, blocker, changes requested)
- `medium` - Should address (`m:`, standard feedback)
- `low` - Optional (`l:`, nit, style, suggestion)
- `bot` - Informational automated comments (Codecov, Dependabot, etc.)
- `resolved` - Already resolved threads

Review bot feedback (from Sentry, Warden, Cursor, Bugbot, CodeQL, etc.) appears in `high`/`medium`/`low` with `review_bot: true` — it is NOT placed in the `bot` bucket.

Each feedback item may also include:
- `thread_id` - GraphQL node ID for inline review comments (used for replies)
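
To make the buckets concrete, here is one way the prefix rules listed above could be applied to a single human comment. This is an illustrative guess and not the logic of `fetch_pr_feedback.py`: the function name, its parameters, and the keyword matching are all assumptions, and the `bot` bucket is deliberately left out.

```python
def logaf_bucket(comment_body, *, resolved=False, changes_requested=False):
    """Classify one review comment as resolved, high, low or medium."""
    if resolved:
        return "resolved"
    text = comment_body.strip().lower()
    if changes_requested or text.startswith("h:") or "blocker" in text:
        return "high"
    if text.startswith(("l:", "nit", "style", "suggestion")):
        return "low"
    # Anything else, including an explicit "m:" prefix, is standard feedback.
    return "medium"


# Sample comments are made up for illustration.
print(logaf_bucket("h: this leaks credentials"))
print(logaf_bucket("nit: rename this variable"))
print(logaf_bucket("Please handle the None case"))
```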

## Workflow

### 1. Identify PR

```bash
gh pr view --json number,url,headRefName
```

Stop if no PR exists for the current branch.

### 2. Gather Review Feedback

Run `uv run ${CLAUDE_SKILL_ROOT}/scripts/fetch_pr_feedback.py` to get categorized feedback already posted on the PR.

### 3. Handle Feedback by LOGAF Priority

**Auto-fix (no prompt):**
- `high` - must address (blockers, security, changes requested)
- `medium` - should address (standard feedback)

When fixing feedback:
- Understand the root cause, not just the surface symptom
- Check for similar issues in nearby code or related files
- Fix all instances, not just the one mentioned

This includes review bot feedback (items with `review_bot: true`). Treat it the same as human feedback:
- Real issue found → fix it
- False positive → skip, but explain why in a brief comment
- Never silently ignore review bot feedback — always verify the finding

**Prompt user for selection:**
- `low` - present numbered list and ask which to address:

```
Found 3 low-priority suggestions:
1. [l] "Consider renaming this variable" - @reviewer in api.py:42
2. [nit] "Could use a list comprehension" - @reviewer in utils.py:18
3. [style] "Add a docstring" - @reviewer in models.py:55

Which would you like to address? (e.g., "1,3" or "all" or "none")
```

**Skip silently:**
- `resolved` threads
- `bot` comments (informational only — Codecov, Dependabot, etc.)

#### Replying to Comments

After processing each inline review comment, reply on the PR thread to acknowledge the action taken. Only reply to items with a `thread_id` (inline review comments).

**When to reply:**
- `high` and `medium` items — whether fixed or determined to be false positives
- `low` items — whether fixed or declined by the user

**How to reply:** Use the `addPullRequestReviewThreadReply` GraphQL mutation with `pullRequestReviewThreadId` and `body` inputs.

**Reply format:**
- 1-2 sentences: what was changed, why it's not an issue, or acknowledgment of declined items
- End every reply with `\n\n*— Claude Code*`
- Before replying, check if the thread already has a reply ending with `*- Claude Code*` or `*— Claude Code*` to avoid duplicates on re-loops
- If the `gh api` call fails, log and continue — do not block the workflow
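
The signature and duplicate-check rules are simple string operations. A minimal sketch, assuming the existing replies of a thread are available as a list of strings; the helper names are hypothetical and posting the reply through the GraphQL mutation is not shown.

```python
# Both accepted signature spellings; "\u2014" is the em dash used in the reply format.
SIGNATURES = ("*\u2014 Claude Code*", "*- Claude Code*")


def build_reply(body):
    """Append the signature to a 1-2 sentence reply body."""
    return f"{body.strip()}\n\n{SIGNATURES[0]}"


def already_replied(thread_replies):
    """True if any existing reply in the thread ends with one of the signatures."""
    return any(reply.rstrip().endswith(SIGNATURES) for reply in thread_replies)


reply = build_reply("Fixed by validating the input before parsing.")
if not already_replied(["Thanks for the review!"]):
    print(reply)
```

Checking before every post keeps re-runs of the loop from stacking identical acknowledgments on the same thread.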

### 4. Check CI Status

Run `uv run ${CLAUDE_SKILL_ROOT}/scripts/fetch_pr_checks.py` to get structured failure data.

**Wait if pending:** If review bot checks (sentry, warden, cursor, bugbot, seer, codeql) are still running, wait before proceeding—they post actionable feedback that must be evaluated. Informational bots (codecov) are not worth waiting for.

### 5. Fix CI Failures

For each failure in the script output:
1. Read the `log_snippet` and trace backwards from the error to understand WHY it failed — not just what failed
2. Read the relevant code and check for related issues (e.g., if there is a type error at one call site, check the other call sites)
3. Fix the root cause with minimal, targeted changes
4. Find existing tests for the affected code and run them. If the fix introduces behavior not covered by existing tests, extend them to cover it (add a test case, not a whole new test file)

Do NOT assume what failed based on check name alone—always read the logs. Do NOT "quick fix and hope" — understand the failure thoroughly before changing code.

### 6. Verify Locally, Then Commit and Push

Before committing, verify your fixes locally:
- If you fixed a test failure: re-run that specific test locally
- If you fixed a lint/type error: re-run the linter or type checker on affected files
- For any code fix: run existing tests covering the changed code

If local verification fails, fix before proceeding — do not push known-broken code.

```bash
git add <files>
git commit -m "fix: <descriptive message>"
git push
```

### 7. Monitor CI and Address Feedback

Poll CI status and review feedback in a loop instead of blocking:

1. Run `uv run ${CLAUDE_SKILL_ROOT}/scripts/fetch_pr_checks.py` to get current CI status
2. If all checks passed → proceed to exit conditions
3. If any checks failed (none pending) → return to step 5
4. If checks are still pending:
   a. Run `uv run ${CLAUDE_SKILL_ROOT}/scripts/fetch_pr_feedback.py` for new review feedback
|
||||
b. Address any new high/medium feedback immediately (same as step 3)
|
||||
c. If changes were needed, commit and push (this restarts CI), then continue polling
|
||||
d. Sleep 30 seconds, then repeat from step 1 of this loop
|
||||
5. After all checks pass, do a final feedback check: `sleep 10`, then run `uv run ${CLAUDE_SKILL_ROOT}/scripts/fetch_pr_feedback.py`. Address any new high/medium feedback — if changes are needed, return to step 6.
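The polling loop above can be sketched as a small state decision plus a loop. This is an illustration under assumptions: each check is reduced to one of the strings `"pass"`, `"fail"`, or `"pending"` (the real output shape of `fetch_pr_checks.py` is not shown here), the callables are supplied by the caller, and `max_rounds` is an added safety cap that the steps above do not specify.

```python
import time
from typing import Callable


def next_action(buckets: list[str]) -> str:
    """Map the current check states to the next step of the loop."""
    if any(b == "pending" for b in buckets):
        return "poll"                  # step 4: keep polling, handle feedback
    if any(b == "fail" for b in buckets):
        return "fix"                   # step 3: failures and nothing pending
    return "final_feedback_check"      # steps 2 and 5: everything passed


def monitor(
    fetch_buckets: Callable[[], list[str]],
    on_pending: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
    interval: float = 30.0,
    max_rounds: int = 120,  # assumption: cap so the loop cannot run forever
) -> str:
    """Poll until checks settle, addressing feedback while they are pending."""
    for _ in range(max_rounds):
        action = next_action(fetch_buckets())
        if action != "poll":
            return action
        on_pending()     # sub-steps a-c: fetch and address new feedback
        sleep(interval)  # sub-step d: wait before the next poll
    return "timeout"
```

Injecting `sleep` and the fetch function keeps the loop testable without waiting or calling the scripts.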
|
||||
|
||||
### 8. Repeat
|
||||
|
||||
If step 7 required code changes (from new feedback after CI passed), return to step 2 for a fresh cycle. CI failures during monitoring are already handled within step 7's polling loop.
|
||||
|
||||
## Exit Conditions
|
||||
|
||||
**Success:** All checks pass, post-CI feedback re-check is clean (no new unaddressed high/medium feedback including review bot findings), user has decided on low-priority items.
|
||||
|
||||
**Ask for help:** Same failure after 2 attempts, feedback needs clarification, infrastructure issues.
|
||||
|
||||
**Stop:** No PR exists, branch needs rebase.
|
||||
|
||||
## Fallback
|
||||
|
||||
If scripts fail, use `gh` CLI directly:
|
||||
- `gh pr checks --json name,state,bucket,link`
|
||||
- `gh run view <run-id> --log-failed`
|
||||
- `gh api repos/{owner}/{repo}/pulls/{number}/comments`
|
||||
|
||||
|
||||
## When to Use
|
||||
|
||||
Use this skill when tackling tasks related to its primary domain or functionality as described above.
|
||||
25
skills/k8s-manifest-generator/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
38
skills/k8s-manifest-generator/SKILL.md
Normal file
|
|
@ -0,0 +1,38 @@
|
|||
---
|
||||
name: k8s-manifest-generator
|
||||
description: "Create production-ready Kubernetes manifests for Deployments, Services, ConfigMaps, and Secrets following best practices and security standards. Use when generating Kubernetes YAML manifests, creat..."
|
||||
risk: unknown
|
||||
source: community
|
||||
date_added: "2026-02-27"
|
||||
---
|
||||
|
||||
# Kubernetes Manifest Generator
|
||||
|
||||
Step-by-step guidance for creating production-ready Kubernetes manifests including Deployments, Services, ConfigMaps, Secrets, and PersistentVolumeClaims.
|
||||
|
||||
## Use this skill when
|
||||
|
||||
Use this skill when you need to:
|
||||
- Create new Kubernetes Deployment manifests
|
||||
- Define Service resources for network connectivity
|
||||
- Generate ConfigMap and Secret resources for configuration management
|
||||
- Create PersistentVolumeClaim manifests for stateful workloads
|
||||
- Follow Kubernetes best practices and naming conventions
|
||||
- Implement resource limits, health checks, and security contexts
|
||||
- Design manifests for multi-environment deployments
|
||||
|
||||
## Do not use this skill when
|
||||
|
||||
- The task is unrelated to Kubernetes manifest generation
|
||||
- You need a different domain or tool outside this scope
|
||||
|
||||
## Instructions
|
||||
|
||||
- Clarify goals, constraints, and required inputs.
|
||||
- Apply relevant best practices and validate outcomes.
|
||||
- Provide actionable steps and verification.
|
||||
- If detailed examples are required, open `resources/implementation-playbook.md`.
|
||||
|
||||
## Resources
|
||||
|
||||
- `resources/implementation-playbook.md` for detailed patterns and examples.
|
||||
296
skills/k8s-manifest-generator/assets/configmap-template.yaml
Normal file
|
|
@ -0,0 +1,296 @@
|
|||
# Kubernetes ConfigMap Templates
|
||||
|
||||
---
|
||||
# Template 1: Simple Key-Value Configuration
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: <app-name>-config
|
||||
namespace: <namespace>
|
||||
labels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
app.kubernetes.io/instance: <instance-name>
|
||||
data:
|
||||
# Simple key-value pairs
|
||||
APP_ENV: "production"
|
||||
LOG_LEVEL: "info"
|
||||
DATABASE_HOST: "db.example.com"
|
||||
DATABASE_PORT: "5432"
|
||||
CACHE_TTL: "3600"
|
||||
MAX_CONNECTIONS: "100"
|
||||
|
||||
---
|
||||
# Template 2: Configuration File
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: <app-name>-config-file
|
||||
namespace: <namespace>
|
||||
labels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
data:
|
||||
# Application configuration file
|
||||
application.yaml: |
|
||||
server:
|
||||
port: 8080
|
||||
host: 0.0.0.0
|
||||
|
||||
logging:
|
||||
level: INFO
|
||||
format: json
|
||||
|
||||
database:
|
||||
host: db.example.com
|
||||
port: 5432
|
||||
pool_size: 20
|
||||
timeout: 30
|
||||
|
||||
cache:
|
||||
enabled: true
|
||||
ttl: 3600
|
||||
max_entries: 10000
|
||||
|
||||
features:
|
||||
new_ui: true
|
||||
beta_features: false
|
||||
|
||||
---
|
||||
# Template 3: Multiple Configuration Files
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: <app-name>-multi-config
|
||||
namespace: <namespace>
|
||||
labels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
data:
|
||||
# Nginx configuration
|
||||
nginx.conf: |
|
||||
user nginx;
|
||||
worker_processes auto;
|
||||
error_log /var/log/nginx/error.log warn;
|
||||
pid /var/run/nginx.pid;
|
||||
|
||||
events {
|
||||
worker_connections 1024;
|
||||
}
|
||||
|
||||
http {
|
||||
include /etc/nginx/mime.types;
|
||||
default_type application/octet-stream;
|
||||
|
||||
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
|
||||
'$status $body_bytes_sent "$http_referer" '
|
||||
'"$http_user_agent" "$http_x_forwarded_for"';
|
||||
|
||||
access_log /var/log/nginx/access.log main;
|
||||
sendfile on;
|
||||
keepalive_timeout 65;
|
||||
|
||||
include /etc/nginx/conf.d/*.conf;
|
||||
}
|
||||
|
||||
# Default site configuration
|
||||
default.conf: |
|
||||
server {
|
||||
listen 80;
|
||||
server_name _;
|
||||
|
||||
location / {
|
||||
proxy_pass http://backend:8080;
|
||||
proxy_set_header Host $host;
|
||||
proxy_set_header X-Real-IP $remote_addr;
|
||||
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
|
||||
proxy_set_header X-Forwarded-Proto $scheme;
|
||||
}
|
||||
|
||||
location /health {
|
||||
access_log off;
|
||||
return 200 "healthy\n";
|
||||
}
|
||||
}
|
||||
|
||||
---
|
||||
# Template 4: JSON Configuration
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: <app-name>-json-config
|
||||
namespace: <namespace>
|
||||
labels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
data:
|
||||
config.json: |
|
||||
{
|
||||
"server": {
|
||||
"port": 8080,
|
||||
"host": "0.0.0.0",
|
||||
"timeout": 30
|
||||
},
|
||||
"database": {
|
||||
"host": "postgres.example.com",
|
||||
"port": 5432,
|
||||
"database": "myapp",
|
||||
"pool": {
|
||||
"min": 2,
|
||||
"max": 20
|
||||
}
|
||||
},
|
||||
"redis": {
|
||||
"host": "redis.example.com",
|
||||
"port": 6379,
|
||||
"db": 0
|
||||
},
|
||||
"features": {
|
||||
"auth": true,
|
||||
"metrics": true,
|
||||
"tracing": true
|
||||
}
|
||||
}
|
||||
|
||||
---
|
||||
# Template 5: Environment-Specific Configuration
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: <app-name>-prod-config
|
||||
namespace: production
|
||||
labels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
environment: production
|
||||
data:
|
||||
APP_ENV: "production"
|
||||
LOG_LEVEL: "warn"
|
||||
DEBUG: "false"
|
||||
RATE_LIMIT: "1000"
|
||||
CACHE_TTL: "3600"
|
||||
DATABASE_POOL_SIZE: "50"
|
||||
FEATURE_FLAG_NEW_UI: "true"
|
||||
FEATURE_FLAG_BETA: "false"
|
||||
|
||||
---
|
||||
# Template 6: Script Configuration
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: <app-name>-scripts
|
||||
namespace: <namespace>
|
||||
labels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
data:
|
||||
# Initialization script
|
||||
init.sh: |
|
||||
#!/bin/bash
|
||||
set -e
|
||||
|
||||
echo "Running initialization..."
|
||||
|
||||
# Wait for database
|
||||
until nc -z "$DATABASE_HOST" "$DATABASE_PORT"; do
|
||||
echo "Waiting for database..."
|
||||
sleep 2
|
||||
done
|
||||
|
||||
echo "Database is ready!"
|
||||
|
||||
# Run migrations
|
||||
if [ "$RUN_MIGRATIONS" = "true" ]; then
|
||||
echo "Running database migrations..."
|
||||
./migrate up
|
||||
fi
|
||||
|
||||
echo "Initialization complete!"
|
||||
|
||||
# Health check script
|
||||
healthcheck.sh: |
|
||||
#!/bin/bash
|
||||
|
||||
# Check application health endpoint
|
||||
response=$(curl -sf http://localhost:8080/health)
|
||||
|
||||
if [ $? -eq 0 ]; then
|
||||
echo "Health check passed"
|
||||
exit 0
|
||||
else
|
||||
echo "Health check failed"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
---
|
||||
# Template 7: Prometheus Configuration
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: prometheus-config
|
||||
namespace: monitoring
|
||||
labels:
|
||||
app.kubernetes.io/name: prometheus
|
||||
data:
|
||||
prometheus.yml: |
|
||||
global:
|
||||
scrape_interval: 15s
|
||||
evaluation_interval: 15s
|
||||
external_labels:
|
||||
cluster: 'production'
|
||||
region: 'us-west-2'
|
||||
|
||||
alerting:
|
||||
alertmanagers:
|
||||
- static_configs:
|
||||
- targets:
|
||||
- alertmanager:9093
|
||||
|
||||
rule_files:
|
||||
- /etc/prometheus/rules/*.yml
|
||||
|
||||
scrape_configs:
|
||||
- job_name: 'kubernetes-pods'
|
||||
kubernetes_sd_configs:
|
||||
- role: pod
|
||||
relabel_configs:
|
||||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
|
||||
action: replace
|
||||
target_label: __address__
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
|
||||
---
|
||||
# Usage Examples:
#
# 1. Load all keys as environment variables:
#    envFrom:
#    - configMapRef:
#        name: <app-name>-config
#
# 2. Mount as files:
#    volumeMounts:
#    - name: config
#      mountPath: /etc/app
#    volumes:
#    - name: config
#      configMap:
#        name: <app-name>-config-file
#
# 3. Mount specific keys as files:
#    volumes:
#    - name: nginx-config
#      configMap:
#        name: <app-name>-multi-config
#        items:
#        - key: nginx.conf
#          path: nginx.conf
#
# 4. Use individual environment variables:
#    env:
#    - name: LOG_LEVEL
#      valueFrom:
#        configMapKeyRef:
#          name: <app-name>-config
#          key: LOG_LEVEL
|
||||
203
skills/k8s-manifest-generator/assets/deployment-template.yaml
Normal file
|
|
@ -0,0 +1,203 @@
|
|||
# Production-Ready Kubernetes Deployment Template
|
||||
# Replace all <placeholders> with actual values
|
||||
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: <app-name>
|
||||
namespace: <namespace>
|
||||
labels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
app.kubernetes.io/instance: <instance-name>
|
||||
app.kubernetes.io/version: "<version>"
|
||||
app.kubernetes.io/component: <component> # backend, frontend, database, cache
|
||||
app.kubernetes.io/part-of: <system-name>
|
||||
app.kubernetes.io/managed-by: kubectl
|
||||
annotations:
|
||||
description: "<application description>"
|
||||
contact: "<team-email>"
|
||||
spec:
|
||||
replicas: 3 # Minimum 3 for production HA
|
||||
revisionHistoryLimit: 10
|
||||
|
||||
selector:
|
||||
matchLabels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
app.kubernetes.io/instance: <instance-name>
|
||||
|
||||
strategy:
|
||||
type: RollingUpdate
|
||||
rollingUpdate:
|
||||
maxSurge: 1
|
||||
maxUnavailable: 0 # Zero-downtime deployment
|
||||
|
||||
minReadySeconds: 10
|
||||
progressDeadlineSeconds: 600
|
||||
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
app.kubernetes.io/instance: <instance-name>
|
||||
app.kubernetes.io/version: "<version>"
|
||||
annotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "9090"
|
||||
prometheus.io/path: "/metrics"
|
||||
|
||||
spec:
|
||||
serviceAccountName: <app-name>
|
||||
|
||||
# Pod-level security context
|
||||
securityContext:
|
||||
runAsNonRoot: true
|
||||
runAsUser: 1000
|
||||
runAsGroup: 1000
|
||||
fsGroup: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
|
||||
# Init containers (optional)
|
||||
initContainers:
|
||||
- name: init-wait
|
||||
image: busybox:1.36
|
||||
command: ['sh', '-c', 'echo "Initializing..."']
|
||||
securityContext:
|
||||
allowPrivilegeEscalation: false
|
||||
runAsNonRoot: true
|
||||
runAsUser: 1000
|
||||
|
||||
containers:
|
||||
- name: <container-name>
|
||||
image: <registry>/<image>:<tag> # Never use :latest
|
||||
imagePullPolicy: IfNotPresent
|
||||
|
||||
ports:
|
||||
- name: http
|
||||
containerPort: 8080
|
||||
protocol: TCP
|
||||
- name: metrics
|
||||
containerPort: 9090
|
||||
protocol: TCP
|
||||
|
||||
# Environment variables
|
||||
env:
|
||||
- name: POD_NAME
|
||||
valueFrom:
|
||||
fieldRef:
|
||||
fieldPath: metadata.name
|
||||
- name: POD_NAMESPACE
|
||||
valueFrom:
|
||||
fieldRef:
|
||||
fieldPath: metadata.namespace
|
||||
- name: POD_IP
|
||||
valueFrom:
|
||||
fieldRef:
|
||||
fieldPath: status.podIP
|
||||
|
||||
# Load from ConfigMap and Secret
|
||||
envFrom:
|
||||
- configMapRef:
|
||||
name: <app-name>-config
|
||||
- secretRef:
|
||||
name: <app-name>-secret
|
||||
|
||||
# Resource limits
|
||||
resources:
|
||||
requests:
|
||||
memory: "256Mi"
|
||||
cpu: "250m"
|
||||
limits:
|
||||
memory: "512Mi"
|
||||
cpu: "500m"
|
||||
|
||||
# Startup probe (for slow-starting apps)
|
||||
startupProbe:
|
||||
httpGet:
|
||||
path: /health/startup
|
||||
port: http
|
||||
initialDelaySeconds: 0
|
||||
periodSeconds: 10
|
||||
timeoutSeconds: 3
|
||||
failureThreshold: 30 # 5 minutes to start
|
||||
|
||||
# Liveness probe
|
||||
livenessProbe:
|
||||
httpGet:
|
||||
path: /health/live
|
||||
port: http
|
||||
initialDelaySeconds: 30
|
||||
periodSeconds: 10
|
||||
timeoutSeconds: 5
|
||||
failureThreshold: 3
|
||||
|
||||
# Readiness probe
|
||||
readinessProbe:
|
||||
httpGet:
|
||||
path: /health/ready
|
||||
port: http
|
||||
initialDelaySeconds: 5
|
||||
periodSeconds: 5
|
||||
timeoutSeconds: 3
|
||||
failureThreshold: 3
|
||||
|
||||
# Volume mounts
|
||||
volumeMounts:
|
||||
- name: tmp
|
||||
mountPath: /tmp
|
||||
- name: cache
|
||||
mountPath: /app/cache
|
||||
# - name: data
|
||||
# mountPath: /var/lib/app
|
||||
|
||||
# Container security context
|
||||
securityContext:
|
||||
allowPrivilegeEscalation: false
|
||||
readOnlyRootFilesystem: true
|
||||
runAsNonRoot: true
|
||||
runAsUser: 1000
|
||||
capabilities:
|
||||
drop:
|
||||
- ALL
|
||||
|
||||
# Lifecycle hooks
|
||||
lifecycle:
|
||||
preStop:
|
||||
exec:
|
||||
command: ["/bin/sh", "-c", "sleep 15"] # Graceful shutdown
|
||||
|
||||
# Volumes
|
||||
volumes:
|
||||
- name: tmp
|
||||
emptyDir: {}
|
||||
- name: cache
|
||||
emptyDir:
|
||||
sizeLimit: 1Gi
|
||||
# - name: data
|
||||
# persistentVolumeClaim:
|
||||
# claimName: <app-name>-data
|
||||
|
||||
# Scheduling
|
||||
affinity:
|
||||
podAntiAffinity:
|
||||
preferredDuringSchedulingIgnoredDuringExecution:
|
||||
- weight: 100
|
||||
podAffinityTerm:
|
||||
labelSelector:
|
||||
matchLabels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
topologyKey: kubernetes.io/hostname
|
||||
|
||||
topologySpreadConstraints:
|
||||
- maxSkew: 1
|
||||
topologyKey: topology.kubernetes.io/zone
|
||||
whenUnsatisfiable: ScheduleAnyway
|
||||
labelSelector:
|
||||
matchLabels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
|
||||
terminationGracePeriodSeconds: 30
|
||||
|
||||
# Image pull secrets (if using private registry)
|
||||
# imagePullSecrets:
|
||||
# - name: regcred
|
||||
171
skills/k8s-manifest-generator/assets/service-template.yaml
Normal file
|
|
@ -0,0 +1,171 @@
|
|||
# Kubernetes Service Templates
|
||||
|
||||
---
|
||||
# Template 1: ClusterIP Service (Internal Only)
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: <app-name>
|
||||
namespace: <namespace>
|
||||
labels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
app.kubernetes.io/instance: <instance-name>
|
||||
annotations:
|
||||
description: "Internal service for <app-name>"
|
||||
spec:
|
||||
type: ClusterIP
|
||||
selector:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
app.kubernetes.io/instance: <instance-name>
|
||||
ports:
|
||||
- name: http
|
||||
port: 80
|
||||
targetPort: http # Named port from container
|
||||
protocol: TCP
|
||||
sessionAffinity: None
|
||||
|
||||
---
|
||||
# Template 2: LoadBalancer Service (External Access)
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: <app-name>-lb
|
||||
namespace: <namespace>
|
||||
labels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
annotations:
|
||||
# AWS NLB annotations
|
||||
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
|
||||
service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
|
||||
service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
|
||||
# SSL certificate (optional)
|
||||
# service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:..."
|
||||
spec:
|
||||
type: LoadBalancer
|
||||
externalTrafficPolicy: Local # Preserves client IP
|
||||
selector:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
ports:
|
||||
- name: http
|
||||
port: 80
|
||||
targetPort: http
|
||||
protocol: TCP
|
||||
- name: https
|
||||
port: 443
|
||||
targetPort: https
|
||||
protocol: TCP
|
||||
# Restrict access to specific IPs (optional)
|
||||
# loadBalancerSourceRanges:
|
||||
# - 203.0.113.0/24
|
||||
|
||||
---
|
||||
# Template 3: NodePort Service (Direct Node Access)
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: <app-name>-np
|
||||
namespace: <namespace>
|
||||
labels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
spec:
|
||||
type: NodePort
|
||||
selector:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
ports:
|
||||
- name: http
|
||||
port: 80
|
||||
targetPort: 8080
|
||||
nodePort: 30080 # Optional, 30000-32767 range
|
||||
protocol: TCP
|
||||
|
||||
---
|
||||
# Template 4: Headless Service (StatefulSet)
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: <app-name>-headless
|
||||
namespace: <namespace>
|
||||
labels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
spec:
|
||||
clusterIP: None # Headless
|
||||
selector:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
ports:
|
||||
- name: client
|
||||
port: 9042
|
||||
targetPort: 9042
|
||||
publishNotReadyAddresses: true # Include not-ready pods in DNS
|
||||
|
||||
---
|
||||
# Template 5: Multi-Port Service with Metrics
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: <app-name>-multi
|
||||
namespace: <namespace>
|
||||
labels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
annotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "9091"
|
||||
prometheus.io/path: "/metrics"
|
||||
spec:
|
||||
type: ClusterIP
|
||||
selector:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
ports:
|
||||
- name: http
|
||||
port: 80
|
||||
targetPort: 8080
|
||||
protocol: TCP
|
||||
- name: https
|
||||
port: 443
|
||||
targetPort: 8443
|
||||
protocol: TCP
|
||||
- name: grpc
|
||||
port: 9090
|
||||
targetPort: 9090
|
||||
protocol: TCP
|
||||
- name: metrics
|
||||
port: 9091
|
||||
targetPort: 9091
|
||||
protocol: TCP
|
||||
|
||||
---
|
||||
# Template 6: Service with Session Affinity
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: <app-name>-sticky
|
||||
namespace: <namespace>
|
||||
labels:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
spec:
|
||||
type: ClusterIP
|
||||
selector:
|
||||
app.kubernetes.io/name: <app-name>
|
||||
ports:
|
||||
- name: http
|
||||
port: 80
|
||||
targetPort: 8080
|
||||
protocol: TCP
|
||||
sessionAffinity: ClientIP
|
||||
sessionAffinityConfig:
|
||||
clientIP:
|
||||
timeoutSeconds: 10800 # 3 hours
|
||||
|
||||
---
|
||||
# Template 7: ExternalName Service (External Service Mapping)
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: external-db
|
||||
namespace: <namespace>
|
||||
spec:
|
||||
type: ExternalName
|
||||
externalName: db.example.com
|
||||
ports:
|
||||
- port: 5432
|
||||
targetPort: 5432
|
||||
protocol: TCP
|
||||
25
skills/k8s-manifest-generator/references/README.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
<!-- BEGIN_TF_DOCS -->
|
||||
## Requirements
|
||||
|
||||
No requirements.
|
||||
|
||||
## Providers
|
||||
|
||||
No providers.
|
||||
|
||||
## Modules
|
||||
|
||||
No modules.
|
||||
|
||||
## Resources
|
||||
|
||||
No resources.
|
||||
|
||||
## Inputs
|
||||
|
||||
No inputs.
|
||||
|
||||
## Outputs
|
||||
|
||||
No outputs.
|
||||
<!-- END_TF_DOCS -->
|
||||
753
skills/k8s-manifest-generator/references/deployment-spec.md
Normal file
|
|
@ -0,0 +1,753 @@
|
|||
# Kubernetes Deployment Specification Reference
|
||||
|
||||
Comprehensive reference for Kubernetes Deployment resources, covering all key fields, best practices, and common patterns.
|
||||
|
||||
## Overview
|
||||
|
||||
A Deployment provides declarative updates for Pods and ReplicaSets. It manages the desired state of your application, handling rollouts, rollbacks, and scaling operations.
|
||||
|
||||
## Complete Deployment Specification
|
||||
|
||||
```yaml
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: my-app
|
||||
namespace: production
|
||||
labels:
|
||||
app.kubernetes.io/name: my-app
|
||||
app.kubernetes.io/version: "1.0.0"
|
||||
app.kubernetes.io/component: backend
|
||||
app.kubernetes.io/part-of: my-system
|
||||
annotations:
|
||||
description: "Main application deployment"
|
||||
contact: "backend-team@example.com"
|
||||
spec:
|
||||
# Replica management
|
||||
replicas: 3
|
||||
revisionHistoryLimit: 10
|
||||
|
||||
# Pod selection
|
||||
selector:
|
||||
matchLabels:
|
||||
app: my-app
|
||||
version: v1
|
||||
|
||||
# Update strategy
|
||||
strategy:
|
||||
type: RollingUpdate
|
||||
rollingUpdate:
|
||||
maxSurge: 1
|
||||
maxUnavailable: 0
|
||||
|
||||
# Seconds a new pod must stay ready before it counts as available
|
||||
minReadySeconds: 10
|
||||
|
||||
# Rollout is reported as failed if it makes no progress within this many seconds
|
||||
progressDeadlineSeconds: 600
|
||||
|
||||
# Pod template
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: my-app
|
||||
version: v1
|
||||
annotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "9090"
|
||||
spec:
|
||||
# Service account for RBAC
|
||||
serviceAccountName: my-app
|
||||
|
||||
# Security context for the pod
|
||||
securityContext:
|
||||
runAsNonRoot: true
|
||||
runAsUser: 1000
|
||||
fsGroup: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
|
||||
# Init containers run before main containers
|
||||
initContainers:
|
||||
- name: init-db
|
||||
image: busybox:1.36
|
||||
command: ['sh', '-c', 'until nc -z db-service 5432; do sleep 1; done']
|
||||
securityContext:
|
||||
allowPrivilegeEscalation: false
|
||||
runAsNonRoot: true
|
||||
runAsUser: 1000
|
||||
|
||||
# Main containers
|
||||
containers:
|
||||
- name: app
|
||||
image: myapp:1.0.0
|
||||
imagePullPolicy: IfNotPresent
|
||||
|
||||
# Container ports
|
||||
ports:
|
||||
- name: http
|
||||
containerPort: 8080
|
||||
protocol: TCP
|
||||
- name: metrics
|
||||
containerPort: 9090
|
||||
protocol: TCP
|
||||
|
||||
# Environment variables
|
||||
env:
|
||||
- name: POD_NAME
|
||||
valueFrom:
|
||||
fieldRef:
|
||||
fieldPath: metadata.name
|
||||
- name: POD_NAMESPACE
|
||||
valueFrom:
|
||||
fieldRef:
|
||||
fieldPath: metadata.namespace
|
||||
- name: DATABASE_URL
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: db-credentials
|
||||
key: url
|
||||
|
||||
# ConfigMap and Secret references
|
||||
envFrom:
|
||||
- configMapRef:
|
||||
name: app-config
|
||||
- secretRef:
|
||||
name: app-secrets
|
||||
|
||||
# Resource requests and limits
|
||||
resources:
|
||||
requests:
|
||||
memory: "256Mi"
|
||||
cpu: "250m"
|
||||
limits:
|
||||
memory: "512Mi"
|
||||
cpu: "500m"
|
||||
|
||||
# Liveness probe
|
||||
livenessProbe:
|
||||
httpGet:
|
||||
path: /health/live
|
||||
port: http
|
||||
httpHeaders:
|
||||
- name: Custom-Header
|
||||
value: Awesome
|
||||
initialDelaySeconds: 30
|
||||
periodSeconds: 10
|
||||
timeoutSeconds: 5
|
||||
successThreshold: 1
|
||||
failureThreshold: 3
|
||||
|
||||
# Readiness probe
|
||||
readinessProbe:
|
||||
httpGet:
|
||||
path: /health/ready
|
||||
port: http
|
||||
initialDelaySeconds: 5
|
||||
periodSeconds: 5
|
||||
timeoutSeconds: 3
|
||||
successThreshold: 1
|
||||
failureThreshold: 3
|
||||
|
||||
# Startup probe (for slow-starting containers)
|
||||
startupProbe:
|
||||
httpGet:
|
||||
path: /health/startup
|
||||
port: http
|
||||
initialDelaySeconds: 0
|
||||
periodSeconds: 10
|
||||
timeoutSeconds: 3
|
||||
successThreshold: 1
|
||||
failureThreshold: 30
|
||||
|
||||
# Volume mounts
|
||||
volumeMounts:
|
||||
- name: data
|
||||
mountPath: /var/lib/app
|
||||
- name: config
|
||||
mountPath: /etc/app
|
||||
readOnly: true
|
||||
- name: tmp
|
||||
mountPath: /tmp
|
||||
|
||||
# Security context for container
|
||||
securityContext:
|
||||
allowPrivilegeEscalation: false
|
||||
readOnlyRootFilesystem: true
|
||||
runAsNonRoot: true
|
||||
runAsUser: 1000
|
||||
capabilities:
|
||||
drop:
|
||||
- ALL
|
||||
|
||||
# Lifecycle hooks
|
||||
lifecycle:
|
||||
postStart:
|
||||
exec:
|
||||
command: ["/bin/sh", "-c", "echo Container started > /tmp/started"]
|
||||
preStop:
|
||||
exec:
|
||||
command: ["/bin/sh", "-c", "sleep 15"]
|
||||
|
||||
# Volumes
|
||||
volumes:
|
||||
- name: data
|
||||
persistentVolumeClaim:
|
||||
claimName: app-data
|
||||
- name: config
|
||||
configMap:
|
||||
name: app-config
|
||||
- name: tmp
|
||||
emptyDir: {}
|
||||
|
||||
# DNS configuration
|
||||
dnsPolicy: ClusterFirst
|
||||
dnsConfig:
|
||||
options:
|
||||
- name: ndots
|
||||
value: "2"
|
||||
|
||||
# Scheduling
|
||||
nodeSelector:
|
||||
disktype: ssd
|
||||
|
||||
affinity:
|
||||
podAntiAffinity:
|
||||
preferredDuringSchedulingIgnoredDuringExecution:
|
||||
- weight: 100
|
||||
podAffinityTerm:
|
||||
labelSelector:
|
||||
matchExpressions:
|
||||
- key: app
|
||||
operator: In
|
||||
values:
|
||||
- my-app
|
||||
topologyKey: kubernetes.io/hostname
|
||||
|
||||
tolerations:
|
||||
- key: "app"
|
||||
operator: "Equal"
|
||||
value: "my-app"
|
||||
effect: "NoSchedule"
|
||||
|
||||
# Termination
|
||||
terminationGracePeriodSeconds: 30
|
||||
|
||||
# Image pull secrets
|
||||
imagePullSecrets:
|
||||
- name: regcred
|
||||
```
|
||||
|
||||
## Field Reference
|
||||
|
||||
### Metadata Fields
|
||||
|
||||
#### Required Fields
|
||||
- `apiVersion`: `apps/v1` (current stable version)
|
||||
- `kind`: `Deployment`
|
||||
- `metadata.name`: Unique name within namespace
|
||||
|
||||
#### Recommended Metadata
|
||||
- `metadata.namespace`: Target namespace (defaults to `default`)
|
||||
- `metadata.labels`: Key-value pairs for organization
|
||||
- `metadata.annotations`: Non-identifying metadata
|
||||
|
||||
### Spec Fields
|
||||
|
||||
#### Replica Management
|
||||
|
||||
**`replicas`** (integer, default: 1)
|
||||
- Number of desired pod instances
|
||||
- Best practice: Use 3+ for production high availability
|
||||
- Can be scaled manually or via HorizontalPodAutoscaler
|
||||
|
||||
**`revisionHistoryLimit`** (integer, default: 10)
|
||||
- Number of old ReplicaSets to retain for rollback
|
||||
- Set to 0 to disable rollback capability
|
||||
- Lower values reduce the number of old ReplicaSet objects kept for frequently updated deployments
|
||||
|
||||
#### Update Strategy
|
||||
|
||||
**`strategy.type`** (string)
|
||||
- `RollingUpdate` (default): Gradual pod replacement
|
||||
- `Recreate`: Delete all pods before creating new ones
|
||||
|
||||
**`strategy.rollingUpdate.maxSurge`** (int or percent, default: 25%)
|
||||
- Maximum pods above desired replicas during update
|
||||
- Example: With 3 replicas and maxSurge=1, up to 4 pods during update
|
||||
|
||||
**`strategy.rollingUpdate.maxUnavailable`** (int or percent, default: 25%)
|
||||
- Maximum pods below desired replicas during update
|
||||
- Set to 0 for zero-downtime deployments
|
||||
- Cannot be 0 if maxSurge is 0
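The effect of `maxSurge` and `maxUnavailable` on pod counts during a rolling update can be worked through with a short sketch. The function names are hypothetical; the sketch assumes the documented rounding rules (percentages round up for `maxSurge` and down for `maxUnavailable`) and does not model every controller edge case.

```python
def _resolve(value, replicas: int, round_up: bool) -> int:
    """Convert an int or a percent string such as "25%" to a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * replicas
        # Integer arithmetic avoids floating-point rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def _is_zero(value) -> bool:
    return value in (0, "0", "0%")


def rollout_bounds(replicas: int, max_surge="25%", max_unavailable="25%"):
    """Return (minimum available pods, maximum total pods) during an update."""
    if _is_zero(max_surge) and _is_zero(max_unavailable):
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    surge = _resolve(max_surge, replicas, round_up=True)
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    return replicas - unavailable, replicas + surge
```

For example, `rollout_bounds(3, 1, 0)` gives `(3, 4)`: all three replicas stay available while at most four pods exist, matching the zero-downtime case described above.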
|
||||
|
||||
**Best practices:**
|
||||
```yaml
# Zero-downtime deployment
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0

# Fast deployment (can have brief downtime)
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 2
    maxUnavailable: 1

# Complete replacement
strategy:
  type: Recreate
```
|
||||
|
||||
#### Pod Template
|
||||
|
||||
**`template.metadata.labels`**
|
||||
- Must include labels matching `spec.selector.matchLabels`
|
||||
- Add version labels for blue/green deployments
|
||||
- Include standard Kubernetes labels
|
||||
|
||||
**`template.spec.containers`** (required)
|
||||
- Array of container specifications
|
||||
- At least one container required
|
||||
- Each container needs unique name
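
The relationship between the selector, the template labels, and the container list can be sketched as below. The names `my-app` and `app`, the `version` label, and the image reference are illustrative placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app            # must match the template labels below
  template:
    metadata:
      labels:
        app: my-app          # matched by spec.selector.matchLabels
        version: v1          # extra label, e.g. for blue/green routing
    spec:
      containers:
        - name: app          # container names must be unique within the pod
          image: registry.example.com/myapp:1.0.0
```

If the template labels do not satisfy the selector, the API server rejects the Deployment.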
|
||||
|
||||
#### Container Configuration
|
||||
|
||||
**Image Management:**
|
||||
```yaml
|
||||
containers:
|
||||
- name: app
|
||||
image: registry.example.com/myapp:1.0.0
|
||||
imagePullPolicy: IfNotPresent # or Always, Never
|
||||
```
|
||||
|
||||
Image pull policies:
|
||||
- `IfNotPresent`: Pull if not cached (default for tagged images)
|
||||
- `Always`: Always pull (default for `:latest` or when no tag is specified)
|
||||
- `Never`: Never pull, fail if not cached
|
||||
|
||||
**Port Declarations:**
|
||||
```yaml
|
||||
ports:
|
||||
- name: http # Named for referencing in Service
|
||||
containerPort: 8080
|
||||
protocol: TCP # TCP (default), UDP, or SCTP
|
||||
hostPort: 8080 # Optional: Bind to host port (rarely used)
|
||||
```
|
||||
|
||||
#### Resource Management
|
||||
|
||||
**Requests vs Limits:**
|
||||
|
||||
```yaml
|
||||
resources:
|
||||
requests:
|
||||
memory: "256Mi" # Guaranteed resources
|
||||
cpu: "250m" # 0.25 CPU cores
|
||||
limits:
|
||||
memory: "512Mi" # Maximum allowed
|
||||
cpu: "500m" # 0.5 CPU cores
|
||||
```
|
||||
|
||||
**QoS Classes (determined automatically):**
|
||||
|
||||
1. **Guaranteed**: every container sets both CPU and memory requests and limits, and requests = limits
|
||||
- Highest priority
|
||||
- Last to be evicted
|
||||
|
||||
2. **Burstable**: at least one container sets a CPU or memory request or limit, but the pod does not meet the Guaranteed criteria
|
||||
- Medium priority
|
||||
- Evicted before Guaranteed
|
||||
|
||||
3. **BestEffort**: No requests or limits set
|
||||
- Lowest priority
|
||||
- First to be evicted
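
As an illustration of the Guaranteed class, a container fragment along these lines would qualify, provided every other container in the pod does the same. The values are arbitrary examples.

```yaml
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"   # equal to the memory request
    cpu: "500m"       # equal to the CPU request
```

The class Kubernetes assigned can be read back from the pod's `status.qosClass` field.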
|
||||
|
||||
**Best practices:**
|
||||
- Always set requests in production
|
||||
- Set limits to prevent resource monopolization
|
||||
- As a starting point, set memory limits to 1.5-2x requests; set them equal to requests when the Guaranteed QoS class is needed
|
||||
- CPU limits can be higher for bursty workloads
|
||||
|
||||
#### Health Checks
|
||||
|
||||
**Probe Types:**
|
||||
|
||||
1. **startupProbe** - For slow-starting applications
|
||||
```yaml
|
||||
startupProbe:
|
||||
httpGet:
|
||||
path: /health/startup
|
||||
port: 8080
|
||||
initialDelaySeconds: 0
|
||||
periodSeconds: 10
|
||||
failureThreshold: 30 # 5 minutes to start (10s * 30)
|
||||
```
|
||||
|
||||
2. **livenessProbe** - Restarts unhealthy containers
|
||||
```yaml
|
||||
livenessProbe:
|
||||
httpGet:
|
||||
path: /health/live
|
||||
port: 8080
|
||||
initialDelaySeconds: 30
|
||||
periodSeconds: 10
|
||||
timeoutSeconds: 5
|
||||
failureThreshold: 3 # Restart after 3 failures
|
||||
```
|
||||
|
||||
3. **readinessProbe** - Controls traffic routing
|
||||
```yaml
|
||||
readinessProbe:
|
||||
httpGet:
|
||||
path: /health/ready
|
||||
port: 8080
|
||||
initialDelaySeconds: 5
|
||||
periodSeconds: 5
|
||||
failureThreshold: 3 # Remove from service after 3 failures
|
||||
```
|
||||
|
||||
**Probe Mechanisms:**
|
||||
|
||||
```yaml
# HTTP GET
httpGet:
  path: /health
  port: 8080
  httpHeaders:
    - name: Authorization
      value: Bearer token

# TCP Socket
tcpSocket:
  port: 3306

# Command execution
exec:
  command:
    - cat
    - /tmp/healthy

# gRPC (enabled by default since Kubernetes 1.24, stable in 1.27)
grpc:
  port: 9090
  service: my.service.health.v1.Health
```
|
||||
|
||||
**Probe Timing Parameters:**
|
||||
|
||||
- `initialDelaySeconds`: Wait before first probe
|
||||
- `periodSeconds`: How often to probe
|
||||
- `timeoutSeconds`: Probe timeout
|
||||
- `successThreshold`: Consecutive successes needed to mark healthy (must be 1 for liveness and startup probes)
|
||||
- `failureThreshold`: Failures before taking action
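
A sketch showing all five timing parameters on one probe. The path, port, and numbers are illustrative assumptions rather than tuned values.

```yaml
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5   # wait 5s after the container starts
  periodSeconds: 5         # probe every 5s
  timeoutSeconds: 2        # each attempt fails if no response within 2s
  successThreshold: 2      # 2 consecutive successes to become ready again
  failureThreshold: 3      # 3 consecutive failures (roughly 15s) to become not ready
```

Only readiness probes may use a `successThreshold` above 1.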
|
||||
|
||||
#### Security Context
|
||||
|
||||
**Pod-level security context:**
|
||||
```yaml
|
||||
spec:
|
||||
securityContext:
|
||||
runAsNonRoot: true
|
||||
runAsUser: 1000
|
||||
runAsGroup: 1000
|
||||
fsGroup: 1000
|
||||
fsGroupChangePolicy: OnRootMismatch
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
```
|
||||
|
||||
**Container-level security context:**
|
||||
```yaml
|
||||
containers:
|
||||
- name: app
|
||||
securityContext:
|
||||
allowPrivilegeEscalation: false
|
||||
readOnlyRootFilesystem: true
|
||||
runAsNonRoot: true
|
||||
runAsUser: 1000
|
||||
capabilities:
|
||||
drop:
|
||||
- ALL
|
||||
add:
|
||||
- NET_BIND_SERVICE # Only if needed
|
||||
```
|
||||
|
||||
**Security best practices:**
|
||||
- Always run as non-root (`runAsNonRoot: true`)
|
||||
- Drop all capabilities and add only needed ones
|
||||
- Use read-only root filesystem when possible
|
||||
- Enable seccomp profile
|
||||
- Disable privilege escalation
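
Applications that write temporary files often break under a read-only root filesystem. One common workaround, sketched here under the assumption that the application only needs a writable `/tmp`, is to mount an `emptyDir` at that path.

```yaml
spec:
  containers:
    - name: app
      securityContext:
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: tmp
          mountPath: /tmp      # assumed to be the only path the app writes to
  volumes:
    - name: tmp
      emptyDir: {}             # ephemeral, removed when the pod is deleted
```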
|
||||
|
||||
#### Volumes
|
||||
|
||||
**Volume Types:**
|
||||
|
||||
```yaml
|
||||
volumes:
|
||||
# PersistentVolumeClaim
|
||||
- name: data
|
||||
persistentVolumeClaim:
|
||||
claimName: app-data
|
||||
|
||||
# ConfigMap
|
||||
- name: config
|
||||
configMap:
|
||||
name: app-config
|
||||
items:
|
||||
- key: app.properties
|
||||
path: application.properties
|
||||
|
||||
# Secret
|
||||
- name: secrets
|
||||
secret:
|
||||
secretName: app-secrets
|
||||
defaultMode: 0400
|
||||
|
||||
# EmptyDir (ephemeral)
|
||||
- name: cache
|
||||
emptyDir:
|
||||
sizeLimit: 1Gi
|
||||
|
||||
# HostPath (avoid in production)
|
||||
- name: host-data
|
||||
hostPath:
|
||||
path: /data
|
||||
type: DirectoryOrCreate
|
||||
```
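
Declaring a volume does not make it visible inside a container; each container mounts the volumes it needs by name. A sketch using the volume names defined above, with mount paths that are assumptions for illustration:

```yaml
containers:
  - name: app
    volumeMounts:
      - name: data
        mountPath: /var/lib/app      # hypothetical data directory
      - name: config
        mountPath: /etc/app
        readOnly: true
      - name: secrets
        mountPath: /etc/secrets
        readOnly: true
      - name: cache
        mountPath: /cache
```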
|
||||
|
||||
#### Scheduling
|
||||
|
||||
**Node Selection:**
|
||||
|
||||
```yaml
|
||||
# Simple node selector
|
||||
nodeSelector:
|
||||
disktype: ssd
|
||||
zone: us-west-1a
|
||||
|
||||
# Node affinity (more expressive)
|
||||
affinity:
|
||||
nodeAffinity:
|
||||
requiredDuringSchedulingIgnoredDuringExecution:
|
||||
nodeSelectorTerms:
|
||||
- matchExpressions:
|
||||
- key: kubernetes.io/arch
|
||||
operator: In
|
||||
values:
|
||||
- amd64
|
||||
- arm64
|
||||
```
|
||||
|
||||
**Pod Affinity/Anti-Affinity:**
|
||||
|
||||
```yaml
|
||||
# Spread pods across nodes
|
||||
affinity:
|
||||
podAntiAffinity:
|
||||
requiredDuringSchedulingIgnoredDuringExecution:
|
||||
- labelSelector:
|
||||
matchLabels:
|
||||
app: my-app
|
||||
topologyKey: kubernetes.io/hostname
|
||||
|
||||
# Co-locate with database
|
||||
affinity:
|
||||
podAffinity:
|
||||
preferredDuringSchedulingIgnoredDuringExecution:
|
||||
- weight: 100
|
||||
podAffinityTerm:
|
||||
labelSelector:
|
||||
matchLabels:
|
||||
app: database
|
||||
topologyKey: kubernetes.io/hostname
|
||||
```
|
||||
|
||||
**Tolerations:**
|
||||
|
||||
```yaml
|
||||
tolerations:
|
||||
- key: "node.kubernetes.io/unreachable"
|
||||
operator: "Exists"
|
||||
effect: "NoExecute"
|
||||
tolerationSeconds: 30
|
||||
- key: "dedicated"
|
||||
operator: "Equal"
|
||||
value: "database"
|
||||
effect: "NoSchedule"
|
||||
```
|
||||
|
||||
## Common Patterns
|
||||
|
||||
### High Availability Deployment
|
||||
|
||||
```yaml
|
||||
spec:
|
||||
replicas: 3
|
||||
strategy:
|
||||
type: RollingUpdate
|
||||
rollingUpdate:
|
||||
maxSurge: 1
|
||||
maxUnavailable: 0
|
||||
template:
|
||||
spec:
|
||||
affinity:
|
||||
podAntiAffinity:
|
||||
requiredDuringSchedulingIgnoredDuringExecution:
|
||||
- labelSelector:
|
||||
matchLabels:
|
||||
app: my-app
|
||||
topologyKey: kubernetes.io/hostname
|
||||
topologySpreadConstraints:
|
||||
- maxSkew: 1
|
||||
topologyKey: topology.kubernetes.io/zone
|
||||
whenUnsatisfiable: DoNotSchedule
|
||||
labelSelector:
|
||||
matchLabels:
|
||||
app: my-app
|
||||
```
|
||||
|
||||
### Sidecar Container Pattern
|
||||
|
||||
```yaml
|
||||
spec:
|
||||
template:
|
||||
spec:
|
||||
containers:
|
||||
- name: app
|
||||
image: myapp:1.0.0
|
||||
volumeMounts:
|
||||
- name: shared-logs
|
||||
mountPath: /var/log
|
||||
- name: log-forwarder
|
||||
image: fluent-bit:2.0
|
||||
volumeMounts:
|
||||
- name: shared-logs
|
||||
mountPath: /var/log
|
||||
readOnly: true
|
||||
volumes:
|
||||
- name: shared-logs
|
||||
emptyDir: {}
|
||||
```
|
||||
|
||||
### Init Container for Dependencies
|
||||
|
||||
```yaml
|
||||
spec:
|
||||
template:
|
||||
spec:
|
||||
initContainers:
|
||||
- name: wait-for-db
|
||||
image: busybox:1.36
|
||||
command:
|
||||
- sh
|
||||
- -c
|
||||
- |
|
||||
until nc -z database-service 5432; do
|
||||
echo "Waiting for database..."
|
||||
sleep 2
|
||||
done
|
||||
- name: run-migrations
|
||||
image: myapp:1.0.0
|
||||
command: ["./migrate", "up"]
|
||||
env:
|
||||
- name: DATABASE_URL
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: db-credentials
|
||||
key: url
|
||||
containers:
|
||||
- name: app
|
||||
image: myapp:1.0.0
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Production Checklist
|
||||
|
||||
- [ ] Set resource requests and limits
|
||||
- [ ] Implement all three probe types (startup, liveness, readiness)
|
||||
- [ ] Use specific image tags (not :latest)
|
||||
- [ ] Configure security context (non-root, read-only filesystem)
|
||||
- [ ] Set replica count >= 3 for HA
|
||||
- [ ] Configure pod anti-affinity for spread
|
||||
- [ ] Set appropriate update strategy (maxUnavailable: 0 for zero-downtime)
|
||||
- [ ] Use ConfigMaps and Secrets for configuration
|
||||
- [ ] Add standard labels and annotations
|
||||
- [ ] Configure graceful shutdown (preStop hook, terminationGracePeriodSeconds)
|
||||
- [ ] Set revisionHistoryLimit for rollback capability
|
||||
- [ ] Use ServiceAccount with minimal RBAC permissions
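
The ServiceAccount item could be sketched as follows. The name `my-app` is a placeholder, and disabling token automounting assumes the application never calls the Kubernetes API.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app                          # hypothetical name
automountServiceAccountToken: false     # no API token mounted by default
---
# Deployment fragment that runs pods under this ServiceAccount
spec:
  template:
    spec:
      serviceAccountName: my-app
```

If the workload does need API access, bind the ServiceAccount to a narrowly scoped Role instead of reusing the namespace's `default` account.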
|
||||
|
||||
### Performance Tuning
|
||||
|
||||
**Fast startup:**
|
||||
```yaml
|
||||
spec:
|
||||
minReadySeconds: 5
|
||||
strategy:
|
||||
rollingUpdate:
|
||||
maxSurge: 2
|
||||
maxUnavailable: 1
|
||||
```
|
||||
|
||||
**Zero-downtime updates:**
|
||||
```yaml
|
||||
spec:
|
||||
minReadySeconds: 10
|
||||
strategy:
|
||||
rollingUpdate:
|
||||
maxSurge: 1
|
||||
maxUnavailable: 0
|
||||
```
|
||||
|
||||
**Graceful shutdown:**
|
||||
```yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          lifecycle:
            preStop:
              exec:
                # Delay shutdown so the pod can be removed from Service endpoints.
                # The kubelet sends SIGTERM to the container once this hook
                # returns, so the hook does not need to signal the process itself.
                command: ["/bin/sh", "-c", "sleep 15"]
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
**Pods not starting:**
|
||||
```bash
|
||||
kubectl describe deployment <name>
|
||||
kubectl get pods -l app=<app-name>
|
||||
kubectl describe pod <pod-name>
|
||||
kubectl logs <pod-name>
|
||||
```
|
||||
|
||||
**ImagePullBackOff:**
|
||||
- Check image name and tag
|
||||
- Verify imagePullSecrets
|
||||
- Check registry credentials
|
||||
|
||||
**CrashLoopBackOff:**
|
||||
- Check container logs
|
||||
- Verify liveness probe is not too aggressive
|
||||
- Check resource limits
|
||||
- Verify application dependencies
|
||||
|
||||
**Deployment stuck in progress:**
|
||||
- Check progressDeadlineSeconds
|
||||
- Verify readiness probes
|
||||
- Check resource availability
|
||||
|
||||
## Related Resources
|
||||
|
||||
- [Kubernetes Deployment API Reference](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#deployment-v1-apps)
|
||||
- [Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/)
|
||||
- [Resource Management](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/)
|
||||
724
skills/k8s-manifest-generator/references/service-spec.md
Normal file
|
|
@ -0,0 +1,724 @@
|
|||
# Kubernetes Service Specification Reference
|
||||
|
||||
Comprehensive reference for Kubernetes Service resources, covering service types, networking, load balancing, and service discovery patterns.
|
||||
|
||||
## Overview
|
||||
|
||||
A Service provides stable network endpoints for accessing Pods. Services enable loose coupling between microservices by providing service discovery and load balancing.
|
||||
|
||||
## Service Types
|
||||
|
||||
### 1. ClusterIP (Default)
|
||||
|
||||
Exposes the service on an internal cluster IP. Only reachable from within the cluster.
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: backend-service
|
||||
namespace: production
|
||||
spec:
|
||||
type: ClusterIP
|
||||
selector:
|
||||
app: backend
|
||||
ports:
|
||||
- name: http
|
||||
port: 80
|
||||
targetPort: 8080
|
||||
protocol: TCP
|
||||
sessionAffinity: None
|
||||
```
|
||||
|
||||
**Use cases:**
|
||||
- Internal microservice communication
|
||||
- Database services
|
||||
- Internal APIs
|
||||
- Message queues
|
||||
|
||||
### 2. NodePort
|
||||
|
||||
Exposes the service on each Node's IP at a static port (default range 30000-32767).
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: frontend-service
|
||||
spec:
|
||||
type: NodePort
|
||||
selector:
|
||||
app: frontend
|
||||
ports:
|
||||
- name: http
|
||||
port: 80
|
||||
targetPort: 8080
|
||||
nodePort: 30080 # Optional, auto-assigned if omitted
|
||||
protocol: TCP
|
||||
```
|
||||
|
||||
**Use cases:**
|
||||
- Development/testing external access
|
||||
- Small deployments without load balancer
|
||||
- Direct node access requirements
|
||||
|
||||
**Limitations:**
|
||||
- Limited port range (30000-32767)
|
||||
- Must handle node failures
|
||||
- No built-in load balancing across nodes
|
||||
|
||||
### 3. LoadBalancer
|
||||
|
||||
Exposes the service using a cloud provider's load balancer.
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: public-api
|
||||
annotations:
|
||||
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
|
||||
service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
|
||||
spec:
|
||||
type: LoadBalancer
|
||||
selector:
|
||||
app: api
|
||||
ports:
|
||||
- name: https
|
||||
port: 443
|
||||
targetPort: 8443
|
||||
protocol: TCP
|
||||
loadBalancerSourceRanges:
|
||||
- 203.0.113.0/24
|
||||
```
|
||||
|
||||
**Cloud-specific annotations:**
|
||||
|
||||
**AWS:**
|
||||
```yaml
|
||||
annotations:
|
||||
service.beta.kubernetes.io/aws-load-balancer-type: "nlb" # or "external"
|
||||
service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
|
||||
service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
|
||||
service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:..."
|
||||
service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
|
||||
```
|
||||
|
||||
**Azure:**
|
||||
```yaml
|
||||
annotations:
|
||||
service.beta.kubernetes.io/azure-load-balancer-internal: "true"
|
||||
service.beta.kubernetes.io/azure-pip-name: "my-public-ip"
|
||||
```
|
||||
|
||||
**GCP:**
|
||||
```yaml
|
||||
annotations:
|
||||
cloud.google.com/load-balancer-type: "Internal"
|
||||
cloud.google.com/backend-config: '{"default": "my-backend-config"}'
|
||||
```
|
||||
|
||||
### 4. ExternalName
|
||||
|
||||
Maps the service to an external DNS name by returning a CNAME record.
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: external-db
|
||||
spec:
|
||||
type: ExternalName
|
||||
externalName: db.external.example.com
|
||||
ports:
|
||||
- port: 5432
|
||||
```
|
||||
|
||||
**Use cases:**
|
||||
- Accessing external services
|
||||
- Service migration scenarios
|
||||
- Multi-cluster service references
|
||||
|
||||
## Complete Service Specification
|
||||
|
||||
```yaml
# Field catalogue: not every field below is valid for every service type
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: production
  labels:
    app: my-app
    tier: backend
  annotations:
    description: "Main application service"
    prometheus.io/scrape: "true"
spec:
  # Service type
  type: ClusterIP

  # Pod selector
  selector:
    app: my-app
    version: v1

  # Ports configuration
  ports:
    - name: http
      port: 80          # Service port
      targetPort: 8080  # Container port (or named port)
      protocol: TCP     # TCP, UDP, or SCTP

  # Session affinity
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800

  # IP configuration
  clusterIP: 10.0.0.10  # Optional: specific IP
  clusterIPs:
    - 10.0.0.10
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack

  # External traffic policy (type: NodePort or LoadBalancer only)
  externalTrafficPolicy: Local

  # Internal traffic policy
  internalTrafficPolicy: Local

  # Health check node port (type: LoadBalancer with externalTrafficPolicy: Local only)
  healthCheckNodePort: 30000

  # Load balancer config (for type: LoadBalancer; loadBalancerIP is deprecated since v1.24)
  loadBalancerIP: 203.0.113.100
  loadBalancerSourceRanges:
    - 203.0.113.0/24

  # External IPs
  externalIPs:
    - 80.11.12.10

  # Publishing strategy
  publishNotReadyAddresses: false
```
|
||||
|
||||
## Port Configuration
|
||||
|
||||
### Named Ports
|
||||
|
||||
Use named ports in Pods for flexibility:
|
||||
|
||||
**Deployment:**
|
||||
```yaml
|
||||
spec:
|
||||
template:
|
||||
spec:
|
||||
containers:
|
||||
- name: app
|
||||
ports:
|
||||
- name: http
|
||||
containerPort: 8080
|
||||
- name: metrics
|
||||
containerPort: 9090
|
||||
```
|
||||
|
||||
**Service:**
|
||||
```yaml
|
||||
spec:
|
||||
ports:
|
||||
- name: http
|
||||
port: 80
|
||||
targetPort: http # References named port
|
||||
- name: metrics
|
||||
port: 9090
|
||||
targetPort: metrics
|
||||
```
|
||||
|
||||
### Multiple Ports
|
||||
|
||||
```yaml
|
||||
spec:
|
||||
ports:
|
||||
- name: http
|
||||
port: 80
|
||||
targetPort: 8080
|
||||
protocol: TCP
|
||||
- name: https
|
||||
port: 443
|
||||
targetPort: 8443
|
||||
protocol: TCP
|
||||
- name: grpc
|
||||
port: 9090
|
||||
targetPort: 9090
|
||||
protocol: TCP
|
||||
```
|
||||
|
||||
## Session Affinity
|
||||
|
||||
### None (Default)
|
||||
|
||||
No client stickiness: each new connection is assigned to a pod independently (at random in kube-proxy's iptables mode).
|
||||
|
||||
```yaml
|
||||
spec:
|
||||
sessionAffinity: None
|
||||
```
|
||||
|
||||
### ClientIP
|
||||
|
||||
Routes requests from same client IP to same pod.
|
||||
|
||||
```yaml
|
||||
spec:
|
||||
sessionAffinity: ClientIP
|
||||
sessionAffinityConfig:
|
||||
clientIP:
|
||||
timeoutSeconds: 10800 # 3 hours
|
||||
```
|
||||
|
||||
**Use cases:**
|
||||
- Stateful applications
|
||||
- Session-based applications
|
||||
- WebSocket connections
|
||||
|
||||
## Traffic Policies
|
||||
|
||||
### External Traffic Policy
|
||||
|
||||
**Cluster (Default):**
|
||||
```yaml
|
||||
spec:
|
||||
externalTrafficPolicy: Cluster
|
||||
```
|
||||
- Load balances across all nodes
|
||||
- May add extra network hop
|
||||
- Client source IP is not preserved (traffic may be SNATed)
|
||||
|
||||
**Local:**
|
||||
```yaml
|
||||
spec:
|
||||
externalTrafficPolicy: Local
|
||||
```
|
||||
- Traffic goes only to pods on receiving node
|
||||
- Preserves client source IP
|
||||
- Better performance (no extra hop)
|
||||
- May cause imbalanced load
|
||||
|
||||
### Internal Traffic Policy
|
||||
|
||||
```yaml
|
||||
spec:
|
||||
internalTrafficPolicy: Local # or Cluster
|
||||
```
|
||||
|
||||
Controls traffic routing for cluster-internal clients.
|
||||
|
||||
## Headless Services
|
||||
|
||||
Service without cluster IP for direct pod access.
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: database
|
||||
spec:
|
||||
clusterIP: None # Headless
|
||||
selector:
|
||||
app: database
|
||||
ports:
|
||||
- port: 5432
|
||||
targetPort: 5432
|
||||
```
|
||||
|
||||
**Use cases:**
|
||||
- StatefulSet pod discovery
|
||||
- Direct pod-to-pod communication
|
||||
- Custom load balancing
|
||||
- Database clusters
|
||||
|
||||
**DNS returns:**
|
||||
- Individual pod IPs instead of service IP
|
||||
- Per-pod records: `<pod-name>.<service-name>.<namespace>.svc.cluster.local` (for pods with a matching hostname and subdomain, such as StatefulSet pods)
|
||||
|
||||
## Service Discovery
|
||||
|
||||
### DNS
|
||||
|
||||
**ClusterIP Service:**
|
||||
```
|
||||
<service-name>.<namespace>.svc.cluster.local
|
||||
```
|
||||
|
||||
Example:
|
||||
```bash
|
||||
curl http://backend-service.production.svc.cluster.local
|
||||
```
|
||||
|
||||
**Within same namespace:**
|
||||
```bash
|
||||
curl http://backend-service
|
||||
```
|
||||
|
||||
**Headless Service (returns pod IPs):**
|
||||
```
|
||||
<pod-name>.<service-name>.<namespace>.svc.cluster.local
|
||||
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
Kubernetes injects service info into pods:
|
||||
|
||||
```bash
|
||||
# Service host and port
|
||||
BACKEND_SERVICE_SERVICE_HOST=10.0.0.100
|
||||
BACKEND_SERVICE_SERVICE_PORT=80
|
||||
|
||||
# For named ports
|
||||
BACKEND_SERVICE_SERVICE_PORT_HTTP=80
|
||||
```
|
||||
|
||||
**Note:** Pods must be created after the service for env vars to be injected.
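
When service discovery relies on DNS only, the injected variables can be switched off per pod with `enableServiceLinks`. A minimal sketch, reusing the placeholder container name and image from earlier examples:

```yaml
spec:
  template:
    spec:
      enableServiceLinks: false   # do not inject per-service environment variables
      containers:
        - name: app
          image: myapp:1.0.0
```

The field defaults to `true`; turning it off can help in namespaces with many Services, where the injected variables grow large.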
|
||||
|
||||
## Load Balancing
|
||||
|
||||
### Algorithms
|
||||
|
||||
kube-proxy selects a backend at random in its default iptables mode (IPVS mode defaults to round-robin). For advanced load balancing:
|
||||
|
||||
**Service Mesh (Istio example):**
|
||||
```yaml
|
||||
apiVersion: networking.istio.io/v1beta1
|
||||
kind: DestinationRule
|
||||
metadata:
|
||||
name: my-destination-rule
|
||||
spec:
|
||||
host: my-service
|
||||
trafficPolicy:
|
||||
loadBalancer:
|
||||
simple: LEAST_REQUEST # or ROUND_ROBIN, RANDOM, PASSTHROUGH
|
||||
connectionPool:
|
||||
tcp:
|
||||
maxConnections: 100
|
||||
```
|
||||
|
||||
### Connection Limits
|
||||
|
||||
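Services have no built-in connection limits; enforce those in a service mesh (as in the `connectionPool` setting above) or in the application. A PodDisruptionBudget complements this by keeping a minimum number of backends available during voluntary disruptions: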
|
||||
|
||||
```yaml
|
||||
apiVersion: policy/v1
|
||||
kind: PodDisruptionBudget
|
||||
metadata:
|
||||
name: my-app-pdb
|
||||
spec:
|
||||
minAvailable: 2
|
||||
selector:
|
||||
matchLabels:
|
||||
app: my-app
|
||||
```
|
||||
|
||||
## Service Mesh Integration
|
||||
|
||||
### Istio Virtual Service
|
||||
|
||||
```yaml
|
||||
apiVersion: networking.istio.io/v1beta1
|
||||
kind: VirtualService
|
||||
metadata:
|
||||
name: my-service
|
||||
spec:
|
||||
hosts:
|
||||
- my-service
|
||||
http:
|
||||
- match:
|
||||
- headers:
|
||||
version:
|
||||
exact: v2
|
||||
route:
|
||||
- destination:
|
||||
host: my-service
|
||||
subset: v2
|
||||
- route:
|
||||
- destination:
|
||||
host: my-service
|
||||
subset: v1
|
||||
weight: 90
|
||||
- destination:
|
||||
host: my-service
|
||||
subset: v2
|
||||
weight: 10
|
||||
```
|
||||
|
||||
## Common Patterns
|
||||
|
||||
### Pattern 1: Internal Microservice
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: user-service
|
||||
namespace: backend
|
||||
labels:
|
||||
app: user-service
|
||||
tier: backend
|
||||
spec:
|
||||
type: ClusterIP
|
||||
selector:
|
||||
app: user-service
|
||||
ports:
|
||||
- name: http
|
||||
port: 8080
|
||||
targetPort: http
|
||||
protocol: TCP
|
||||
- name: grpc
|
||||
port: 9090
|
||||
targetPort: grpc
|
||||
protocol: TCP
|
||||
```
|
||||
|
||||
### Pattern 2: Public API with Load Balancer
|
||||
|
||||
```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:..."
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: api-gateway
  ports:
    - name: https
      port: 443
      targetPort: 8443
      protocol: TCP
  loadBalancerSourceRanges:
    - 0.0.0.0/0  # Allows all sources; restrict to known CIDRs where possible
```
|
||||
|
||||
### Pattern 3: StatefulSet with Headless Service
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: cassandra
|
||||
spec:
|
||||
clusterIP: None
|
||||
selector:
|
||||
app: cassandra
|
||||
ports:
|
||||
- port: 9042
|
||||
targetPort: 9042
|
||||
---
|
||||
apiVersion: apps/v1
|
||||
kind: StatefulSet
|
||||
metadata:
|
||||
name: cassandra
|
||||
spec:
|
||||
serviceName: cassandra
|
||||
replicas: 3
|
||||
selector:
|
||||
matchLabels:
|
||||
app: cassandra
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: cassandra
|
||||
spec:
|
||||
containers:
|
||||
- name: cassandra
|
||||
image: cassandra:4.0
|
||||
```
|
||||
|
||||
### Pattern 4: External Service Mapping
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: external-database
|
||||
spec:
|
||||
type: ExternalName
|
||||
externalName: prod-db.cxyz.us-west-2.rds.amazonaws.com
|
||||
---
|
||||
# Or with Endpoints for IP-based external service
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: external-api
|
||||
spec:
|
||||
ports:
|
||||
- port: 443
|
||||
targetPort: 443
|
||||
protocol: TCP
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: Endpoints
|
||||
metadata:
|
||||
name: external-api
|
||||
subsets:
|
||||
- addresses:
|
||||
- ip: 203.0.113.100
|
||||
ports:
|
||||
- port: 443
|
||||
```
|
||||
|
||||
### Pattern 5: Multi-Port Service with Metrics
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: web-app
|
||||
annotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "9090"
|
||||
prometheus.io/path: "/metrics"
|
||||
spec:
|
||||
type: ClusterIP
|
||||
selector:
|
||||
app: web-app
|
||||
ports:
|
||||
- name: http
|
||||
port: 80
|
||||
targetPort: 8080
|
||||
- name: metrics
|
||||
port: 9090
|
||||
targetPort: 9090
|
||||
```
|
||||
|
||||
## Network Policies
|
||||
|
||||
Control traffic to services:
|
||||
|
||||
```yaml
|
||||
apiVersion: networking.k8s.io/v1
|
||||
kind: NetworkPolicy
|
||||
metadata:
|
||||
name: allow-frontend-to-backend
|
||||
spec:
|
||||
podSelector:
|
||||
matchLabels:
|
||||
app: backend
|
||||
policyTypes:
|
||||
- Ingress
|
||||
ingress:
|
||||
- from:
|
||||
- podSelector:
|
||||
matchLabels:
|
||||
app: frontend
|
||||
ports:
|
||||
- protocol: TCP
|
||||
port: 8080
|
||||
```
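
An allow rule like the one above is usually paired with a default-deny policy, since pods not selected by any NetworkPolicy accept all traffic. A minimal sketch, where the policy name is a placeholder:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress   # hypothetical name
spec:
  podSelector: {}              # selects every pod in the namespace
  policyTypes:
    - Ingress                  # no ingress rules are listed, so all ingress is denied
```

Policies are additive, so the `allow-frontend-to-backend` rule still admits frontend traffic on port 8080. Enforcement depends on the cluster's network plugin supporting NetworkPolicy.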
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Service Configuration
|
||||
|
||||
1. **Use named ports** for flexibility
|
||||
2. **Set appropriate service type** based on exposure needs
|
||||
3. **Use labels and selectors consistently** across Deployments and Services
|
||||
4. **Configure session affinity** for stateful apps
|
||||
5. **Set external traffic policy to Local** for IP preservation
|
||||
6. **Use headless services** for StatefulSets
|
||||
7. **Implement network policies** for security
|
||||
8. **Add monitoring annotations** for observability
|
||||
|
||||
### Production Checklist
|
||||
|
||||
- [ ] Service type appropriate for use case
|
||||
- [ ] Selector matches pod labels
|
||||
- [ ] Named ports used for clarity
|
||||
- [ ] Session affinity configured if needed
|
||||
- [ ] Traffic policy set appropriately
|
||||
- [ ] Load balancer annotations configured (if applicable)
|
||||
- [ ] Source IP ranges restricted (for public services)
|
||||
- [ ] Health check configuration validated
|
||||
- [ ] Monitoring annotations added
|
||||
- [ ] Network policies defined
|
||||
|
||||
### Performance Tuning
|
||||
|
||||
**For high traffic:**
|
||||
```yaml
|
||||
spec:
|
||||
externalTrafficPolicy: Local
|
||||
sessionAffinity: ClientIP
|
||||
sessionAffinityConfig:
|
||||
clientIP:
|
||||
timeoutSeconds: 3600
|
||||
```
|
||||
|
||||
**For WebSocket/long connections:**
|
||||
```yaml
|
||||
spec:
|
||||
sessionAffinity: ClientIP
|
||||
sessionAffinityConfig:
|
||||
clientIP:
|
||||
timeoutSeconds: 86400 # 24 hours
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Service not accessible
|
||||
|
||||
```bash
|
||||
# Check service exists
|
||||
kubectl get service <service-name>
|
||||
|
||||
# Check endpoints (should show pod IPs)
|
||||
kubectl get endpoints <service-name>
|
||||
|
||||
# Describe service
|
||||
kubectl describe service <service-name>
|
||||
|
||||
# Check if pods match selector
|
||||
kubectl get pods -l app=<app-name>
|
||||
```
|
||||
|
||||
**Common issues:**
|
||||
- Selector doesn't match pod labels
|
||||
- No pods running (endpoints empty)
|
||||
- Ports misconfigured
|
||||
- Network policy blocking traffic
|
||||
|
||||
### DNS resolution failing
|
||||
|
||||
```bash
# Test DNS from a throwaway pod (removed on exit)
kubectl run debug --rm -it --restart=Never --image=busybox -- nslookup <service-name>

# Check CoreDNS
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns
```
|
||||
|
||||
### Load balancer issues
|
||||
|
||||
```bash
|
||||
# Check load balancer status
|
||||
kubectl describe service <service-name>
|
||||
|
||||
# Check events
|
||||
kubectl get events --sort-by='.lastTimestamp'
|
||||
|
||||
# Verify cloud provider configuration
|
||||
kubectl describe node
|
||||
```
|
||||
|
||||
## Related Resources
|
||||
|
||||
- [Kubernetes Service API Reference](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#service-v1-core)
|
||||
- [Service Networking](https://kubernetes.io/docs/concepts/services-networking/service/)
|
||||
- [DNS for Services and Pods](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/)
|
||||
Some files were not shown because too many files have changed in this diff