autojanet/skills/cnpg-database/SKILL.md
Zoë cc74ad0bd0
2026-05-30 15:43:14 -07:00

name: cnpg-database
description: Use when deploying, configuring, or troubleshooting CloudNativePG PostgreSQL clusters on Zoe's k3s homelab, including bootstrapping, secrets, S3 backups, migrations, and common failure modes.

CloudNativePG (CNPG) on k3s Homelab

Overview

Deploy and operate CNPG PostgreSQL clusters on the production k3s cluster at 10.0.6.10, which runs CNPG operator v1.28.1. Always use ArgoCD sync-waves to enforce creation order.

Environment

| Setting | Value |
|---|---|
| CNPG operator | 1.28.1 |
| PostgreSQL image | `ghcr.io/cloudnative-pg/postgresql:18.1-system-trixie` (includes pgvector as `vector.so`) |
| Fast storage | `nvme` (NFS-NVMe) |
| Standard storage | `ssd` (NFS-SSD) |
| S3 endpoint | https://s3.ctz.fyi |
| S3 bucket | `cnpg-backups` |
| Secrets backend | External Secrets Operator → ClusterSecretStore `openbao` |
| OpenBao path | `secret/production/<namespace>/<cluster-name>` |

Sync-Wave Order (Critical)

| Wave | Resource |
|---|---|
| -2 | CNPG Cluster |
| -1 | ExternalSecret for DB credentials |
| 0 | App Deployment |

Step 1 — Write Secrets to OpenBao

Do this before deploying anything:

bao kv put secret/production/<namespace>/<app>-db \
  username=<app> \
  password="$(openssl rand -base64 32 | tr -d '/=+' | head -c 32)"

Also create the backup credentials secret once per namespace:

bao kv put secret/production/<namespace>/cnpg-backup-s3-credentials \
  ACCESS_KEY_ID=<key> \
  ACCESS_SECRET_KEY=<secret>

Step 2 — ExternalSecret (sync-wave -1)

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: <app>-db-credentials
  namespace: <app>
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: openbao
    kind: ClusterSecretStore
  target:
    name: <app>-db-credentials
    creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: secret/production/<namespace>/<app>-db
        property: username
    - secretKey: password
      remoteRef:
        key: secret/production/<namespace>/<app>-db
        property: password
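
The Cluster in Step 3 reads its S3 credentials from a Kubernetes Secret named cnpg-backup-s3-credentials, so the OpenBao entry written in Step 1 presumably needs an ExternalSecret of its own. The following is a minimal sketch that mirrors the manifest above. The helper name `render_backup_externalsecret`, the sync-wave value, and the `remoteRef` layout are assumptions, not taken from the original.

```shell
# Print an ExternalSecret manifest for a namespace's backup credentials.
# Assumptions (not in the original): the helper name, sync-wave "-1",
# and a remoteRef layout copied from the DB credentials manifest.
render_backup_externalsecret() {
  ns="$1"   # target namespace
  cat <<EOF
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: cnpg-backup-s3-credentials
  namespace: ${ns}
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: openbao
    kind: ClusterSecretStore
  target:
    name: cnpg-backup-s3-credentials
    creationPolicy: Owner
  data:
    - secretKey: ACCESS_KEY_ID
      remoteRef:
        key: secret/production/${ns}/cnpg-backup-s3-credentials
        property: ACCESS_KEY_ID
    - secretKey: ACCESS_SECRET_KEY
      remoteRef:
        key: secret/production/${ns}/cnpg-backup-s3-credentials
        property: ACCESS_SECRET_KEY
EOF
}
```

For a GitOps flow, redirect the output into the app's manifest directory, for example `render_backup_externalsecret <namespace> > backup-externalsecret.yaml`, and let ArgoCD apply it.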

Step 3 — CNPG Cluster (sync-wave -2)

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: <app>-db
  namespace: <app>
  annotations:
    argocd.argoproj.io/sync-wave: "-2"
spec:
  instances: 3  # Use 1 for dev/small workloads
  imageName: ghcr.io/cloudnative-pg/postgresql:18.1-system-trixie

  storage:
    size: 10Gi
    storageClass: nvme  # or ssd

  bootstrap:
    initdb:
      database: <app>
      owner: <app>
      secret:
        name: <app>-db-credentials  # MUST have keys 'username' and 'password' exactly

  backup:
    barmanObjectStore:
      destinationPath: s3://cnpg-backups/<app>
      endpointURL: https://s3.ctz.fyi
      s3Credentials:
        accessKeyId:
          name: cnpg-backup-s3-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: cnpg-backup-s3-credentials
          key: ACCESS_SECRET_KEY
    retentionPolicy: "30d"

CRITICAL: Secret Key Names

The bootstrap secret MUST have keys named exactly username and password.
CNPG will appear healthy but the app cannot connect if keys are wrong (e.g., user, pass, POSTGRES_USER).
CNPG does NOT create a separate -app secret when bootstrap.initdb.secret is provided.
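
The key-name requirement can be checked before the Cluster is synced. This is a minimal sketch: `check_bootstrap_keys` is a hypothetical helper that reads key names from stdin and reports the first one missing.

```shell
# Read secret key names (one per line) on stdin and succeed only if both
# 'username' and 'password' are present. The helper name is hypothetical.
check_bootstrap_keys() {
  keys="$(cat)"
  for want in username password; do
    # -x matches whole lines only, so 'user' does not satisfy 'username'
    if ! printf '%s\n' "$keys" | grep -qx "$want"; then
      echo "missing key: $want" >&2
      return 1
    fi
  done
  echo "ok"
}

# Usage against a live cluster (placeholders as in the manifests above):
# kubectl get secret <app>-db-credentials -n <namespace> \
#   -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' \
#   | check_bootstrap_keys
```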

Connecting from the App

CNPG auto-creates these services:

| Service | Use |
|---|---|
| `<cluster>-rw` | Read-write (primary): use this for app writes |
| `<cluster>-ro` | Read-only (replicas): use for read-heavy queries |
| `<cluster>-r` | Any instance |

Connection string for the read-write service:

postgresql://<username>:<password>@<app>-db-rw.<namespace>.svc.cluster.local:5432/<database>
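
As an illustration, the connection string can be assembled from its parts. `pg_rw_uri` is a hypothetical helper. It does not URL-encode the password, which is fine for the alphanumeric passwords generated in Step 1 but would need revisiting for passwords containing reserved characters.

```shell
# Print the read-write connection URI for a CNPG cluster.
# pg_rw_uri is a hypothetical helper; the password is inserted verbatim
# (no URL-encoding), so it must not contain characters such as '@' or '/'.
pg_rw_uri() {
  user="$1"; pass="$2"; cluster="$3"; ns="$4"; db="$5"
  printf 'postgresql://%s:%s@%s-rw.%s.svc.cluster.local:5432/%s\n' \
    "$user" "$pass" "$cluster" "$ns" "$db"
}

# Usage: pg_rw_uri <username> <password> <cluster> <namespace> <database>
```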

Manual Database Access

# psql on instance 1 (the primary on a freshly bootstrapped cluster)
kubectl exec -n <namespace> -it <cluster>-1 -- psql -U <username> <database>

# via cnpg plugin
kubectl cnpg psql <cluster> -n <namespace>

# pg_dump
kubectl exec -n <namespace> <cluster>-1 -- \
  pg_dump -U <username> <database> > dump.sql

# restore
kubectl exec -n <namespace> -i <cluster>-1 -- \
  psql -U <username> <database> < dump.sql
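
After a failover or switchover, the primary may no longer be the pod named `<cluster>-1`. A minimal sketch for resolving the current primary first, assuming the Cluster resource's `status.currentPrimary` field; `cnpg_primary` is a hypothetical helper name.

```shell
# Print the pod name of the current primary for a CNPG cluster.
# cnpg_primary is a hypothetical helper; it reads status.currentPrimary
# from the Cluster resource.
cnpg_primary() {
  cluster="$1"; ns="$2"
  kubectl get clusters.postgresql.cnpg.io "$cluster" -n "$ns" \
    -o jsonpath='{.status.currentPrimary}'
}

# Usage:
# kubectl exec -n <namespace> -it "$(cnpg_primary <cluster> <namespace>)" -- \
#   psql -U <username> <database>
```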

Migrating from Docker/External Postgres

# 1. Dump from source
pg_dump -h <old-host> -U <user> <database> > dump.sql

# 2. Copy into pod
kubectl cp dump.sql <namespace>/<pod>:/tmp/dump.sql

# 3. Restore
kubectl exec -n <namespace> -it <pod> -- \
  psql -U <username> <database> -f /tmp/dump.sql

Scheduled Backups (Optional)

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: <app>-db-backup
  namespace: <app>
spec:
  schedule: "0 0 2 * * *"  # 02:00 daily; CNPG uses a six-field cron expression (seconds first)
  backupOwnerReference: self
  cluster:
    name: <app>-db
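
To confirm the backup configuration works without waiting for the schedule, an on-demand backup can be requested through the cnpg plugin. This is a sketch; `backup_now` is a hypothetical helper name.

```shell
# Request an immediate backup, then list the Backup resources in the namespace.
# backup_now is a hypothetical helper; it requires the kubectl cnpg plugin.
backup_now() {
  cluster="$1"; ns="$2"
  kubectl cnpg backup "$cluster" -n "$ns" &&
    kubectl get backups.postgresql.cnpg.io -n "$ns"
}

# Usage: backup_now <app>-db <namespace>
```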

Common Issues

| Symptom | Cause | Fix |
|---|---|---|
| Cluster stuck at "Setting up primary" | Secret missing or wrong key names | Check `<app>-db-credentials` exists and has `username`/`password` keys |
| Pod in Pending | PVC can't provision | Check nvme/ssd NFS provisioner is healthy |
| App can't connect | Using pod IP or wrong service | Use `<cluster>-rw` service, not pod IP |
| 2/3 instances after node failure | Normal self-healing | Wait; CNPG will recover automatically |
| Stale data after cluster recreation | Old PVCs still present | Delete PVCs manually before clean redeploy |
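
For the stale-data case, leftover PVCs can be listed for review before anything is deleted. This is a sketch that assumes CNPG's `cnpg.io/cluster` label on the PVCs it creates; `cluster_pvcs` is a hypothetical helper name.

```shell
# List PVCs that belong to a CNPG cluster so leftovers can be reviewed
# before a clean redeploy. Assumes the cnpg.io/cluster label on the PVCs;
# cluster_pvcs is a hypothetical helper.
cluster_pvcs() {
  cluster="$1"; ns="$2"
  kubectl get pvc -n "$ns" -l "cnpg.io/cluster=$cluster" -o name
}

# After reviewing the list, delete each entry explicitly:
# kubectl delete -n <namespace> persistentvolumeclaim/<name>
```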