---
name: cnpg-database
description: Use when deploying, configuring, or troubleshooting CloudNativePG PostgreSQL clusters on Zoe's k3s homelab, including bootstrapping, secrets, S3 backups, migrations, and common failure modes.
---

# CloudNativePG (CNPG) on k3s Homelab

## Overview

Deploy and operate CNPG PostgreSQL clusters on the production k3s cluster at `10.0.6.10`. The CNPG operator version is v1.28.1. Always use ArgoCD sync-waves to enforce creation order.

## Environment

| Setting | Value |
|---------|-------|
| CNPG operator | 1.28.1 |
| PostgreSQL image | `ghcr.io/cloudnative-pg/postgresql:18.1-system-trixie` (includes pgvector as `vector.so`) |
| Fast storage | `nvme` (NFS-NVMe) |
| Standard storage | `ssd` (NFS-SSD) |
| S3 endpoint | `https://s3.ctz.fyi` |
| S3 bucket | `cnpg-backups` |
| Secrets backend | External Secrets Operator → ClusterSecretStore `openbao` |
| OpenBao path | `secret/production//` |

## Sync-Wave Order (Critical)

| Wave | Resource |
|------|----------|
| `-2` | CNPG `Cluster` |
| `-1` | `ExternalSecret` for DB credentials |
| `0` | App `Deployment` |

## Step 1: Write Secrets to OpenBao

Do this **before** deploying anything:

```bash
bao kv put secret/production//-db \
  username= \
  password="$(openssl rand -base64 32 | tr -d /=+ | head -c 32)"
```

Also create the backup credentials secret once per namespace:

```bash
bao kv put secret/production//cnpg-backup-s3-credentials \
  ACCESS_KEY_ID= \
  ACCESS_SECRET_KEY=
```

## Step 2: ExternalSecret (sync-wave -1)

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: -db-credentials
  namespace:
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: openbao
    kind: ClusterSecretStore
  target:
    name: -db-credentials
    creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: secret/production//-db
        property: username
    - secretKey: password
      remoteRef:
        key: secret/production//-db
        property: password
```

## Step 3: CNPG Cluster (sync-wave -2)

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: -db
  namespace:
  annotations:
    argocd.argoproj.io/sync-wave: "-2"
spec:
  instances: 3  # Use 1 for dev/small workloads
  imageName: ghcr.io/cloudnative-pg/postgresql:18.1-system-trixie
  storage:
    size: 10Gi
    storageClass: nvme  # or ssd
  bootstrap:
    initdb:
      database:
      owner:
      secret:
        name: -db-credentials  # MUST have keys 'username' and 'password' exactly
  backup:
    barmanObjectStore:
      destinationPath: s3://cnpg-backups/
      endpointURL: https://s3.ctz.fyi
      s3Credentials:
        accessKeyId:
          name: cnpg-backup-s3-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: cnpg-backup-s3-credentials
          key: ACCESS_SECRET_KEY
    retentionPolicy: "30d"
```

## CRITICAL: Secret Key Names

> **The bootstrap secret MUST have keys named exactly `username` and `password`.**
> CNPG will appear healthy but the app cannot connect if keys are wrong (e.g., `user`, `pass`, `POSTGRES_USER`).
> CNPG does NOT create a separate `-app` secret when `bootstrap.initdb.secret` is provided.

## Connecting from the App

CNPG auto-creates these services:

| Service | Use |
|---------|-----|
| `-rw` | Read-write (primary); **use this for app writes** |
| `-ro` | Read-only (replicas); use for read-heavy queries |
| `-r` | Any instance |

```
postgresql://:@-db-rw..svc.cluster.local:5432/
```

## Manual Database Access

```bash
# psql on primary
kubectl exec -n -it -1 -- psql -U

# via cnpg plugin
kubectl cnpg psql -n

# pg_dump
kubectl exec -n -1 -- \
  pg_dump -U > dump.sql

# restore
kubectl exec -n -i -1 -- \
  psql -U < dump.sql
```

## Migrating from Docker/External Postgres

```bash
# 1. Dump from source
pg_dump -h -U > dump.sql

# 2. Copy into pod
kubectl cp dump.sql /:/tmp/dump.sql

# 3. Restore
kubectl exec -n -it -- \
  psql -U -f /tmp/dump.sql
```

## Scheduled Backups (Optional)

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: -db-backup
  namespace:
spec:
  schedule: "0 0 2 * * *"  # 2am daily; CNPG cron has six fields, seconds first
  backupOwnerReference: self
  cluster:
    name: -db
```

## Common Issues

| Symptom | Cause | Fix |
|---------|-------|-----|
| Cluster stuck at "Setting up primary" | Secret missing or wrong key names | Check `-db-credentials` exists and has `username`/`password` keys |
| Pod in `Pending` | PVC can't provision | Check `nvme`/`ssd` NFS provisioner is healthy |
| App can't connect | Using pod IP or wrong service | Use `-rw` service, not pod IP |
| 2/3 instances after node failure | Normal self-healing | Wait; CNPG will recover automatically |
| Stale data after cluster recreation | Old PVCs still present | Delete PVCs manually before clean redeploy |
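
### Checking the credentials secret

The first row of the table above (secret missing or wrong key names) can be checked from the command line. The following is a minimal sketch, not part of the original runbook: the function names `has_required_keys` and `check_db_secret` are hypothetical helpers, and the namespace and secret name are passed in as arguments rather than assumed. It relies only on `kubectl get secret` with a `go-template` output to list the key names.

```shell
#!/bin/sh
# Hypothetical helper: reads newline-separated key names on stdin and
# succeeds only if both 'username' and 'password' are present as exact lines.
has_required_keys() {
  keys=$(cat)
  printf '%s\n' "$keys" | grep -qxF 'username' &&
    printf '%s\n' "$keys" | grep -qxF 'password'
}

# Hypothetical helper: check_db_secret NAMESPACE SECRET_NAME
# Lists the key names of the secret and pipes them into has_required_keys.
# If the secret does not exist, kubectl prints nothing to stdout and the
# check fails as well.
check_db_secret() {
  kubectl get secret "$2" -n "$1" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' |
    has_required_keys
}
```

Usage, with the namespace and secret name as illustrative values only: `check_db_secret my-namespace my-app-db-credentials && echo "keys OK"`. Keeping the key check separate from the `kubectl` call means the matching logic can be exercised without a cluster.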