- .woodpecker.yaml: image paths -> library/autojanet-{agent,dispatcher}
- .woodpecker.yaml: secret names RS_HARBOR_USER / RS_HARBOR_PASS (global)
- container/Dockerfile: restore COPY skills/, skills/ populated from opencode config
- skills/: 84 opencode skills bundled into image
- k8s/manifests: update image refs to library/
| name | description |
|---|---|
| creating-grafana-dashboard | Use when adding a dashboard to Zoe's Grafana monitoring stack — whether importing from grafana.com or creating from scratch — including datasource UID patching, GitOps deployment via the grafana-dashboards repo, and verification. |
Creating a Grafana Dashboard
Overview
Dashboards are delivered via GitOps from git@git.ctz.fyi:zoe/grafana-dashboards.git. Push to main → Woodpecker CI auto-deploys to Grafana at grafana.monitoring.ctz.fyi. The critical gotcha: any downloaded dashboard will have wrong datasource UIDs and must be patched before committing.
Stack Reference
| Service | URL / Context |
|---|---|
| Grafana | grafana.monitoring.ctz.fyi (v11.6.1, Postgres backend) |
| Cluster | k3s monitoring context |
| Mimir (metrics) | datasource UID: mimir, type: prometheus |
| Loki (logs) | datasource UID: loki, type: loki |
| Tempo (traces) | datasource UID: tempo, type: tempo |
| Pyroscope (profiling) | datasource UID: pyroscope, type: grafana-pyroscope-datasource |
| Grafana API key | secret/production/grafana/api-key in OpenBao |
Datasource UID Mapping (ALWAYS CHECK THIS)
| What the dashboard JSON says | What to set |
|---|---|
| type: prometheus, any UID | uid: "mimir" |
| type: loki, any UID | uid: "loki" |
| type: tempo, any UID | uid: "tempo" |
| type: grafana-pyroscope-datasource, any UID | uid: "pyroscope" |
| ${DS_PROMETHEUS} template variable | set default to mimir |
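The mapping above can be applied to a whole dashboard file in one pass. The following is a minimal sketch, assuming jq 1.6 or newer (for walk); patch_datasource_uids is a hypothetical helper name, not one of the repo's scripts, and update-dashboards.sh may do this differently. It does not touch template variable defaults.

```shell
# Sketch: rewrite every typed datasource reference to the fixed stack UID.
# patch_datasource_uids is a hypothetical helper, not part of the repo.
patch_datasource_uids() {
  local file="$1" tmp
  tmp="$(mktemp)" || return 1
  if jq '
    # type -> UID mapping copied from the mapping table
    {
      "prometheus": "mimir",
      "loki": "loki",
      "tempo": "tempo",
      "grafana-pyroscope-datasource": "pyroscope"
    } as $uids
    # walk also reaches targets and panels nested inside rows;
    # string-style (legacy) references and unknown types are left untouched
    | walk(
        if type != "object" then .
        elif (.datasource | type) != "object" then .
        else .datasource |= (
               if $uids[.type | tostring] then .uid = $uids[.type] else . end
             )
        end
      )
  ' "$file" > "$tmp"; then
    # write back in place so the original file mode is preserved
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Usage would look like patch_datasource_uids "dashboards/<folder>/<name>.json", followed by a git diff to review what changed.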
Repo Structure
grafana-dashboards/
  dashboards/
    cilium/                 # Cilium CNI dashboards
    lgtm/                   # Mimir, Loki, Tempo, Pyroscope dashboards
    infra/                  # Node, k8s cluster dashboards
    apps/                   # Application-specific dashboards
  scripts/
    sources.sh              # upstream dashboard sources list
    update-dashboards.sh    # pull from upstream + patch UIDs
    push-to-grafana.sh      # push to live Grafana via API
  .woodpecker.yml
Path A: Import from grafana.com
# 1. Download (FILE is reused by every later step)
FILE="dashboards/<folder>/<name>.json"
curl -o "$FILE" \
  "https://grafana.com/api/dashboards/<id>/revisions/latest/download"
# 2. Patch datasource UIDs (REQUIRED: panels show "No data" otherwise)
# "objects" skips legacy string-style datasource references instead of erroring.
# Panels nested inside rows (.panels[].panels[]) are not covered by this filter.
jq '
  (.templating.list[]? | select(.type == "datasource") | .query) = "prometheus" |
  (.panels[]?.datasource | objects | select(.type == "prometheus") | .uid) = "mimir" |
  (.panels[]?.targets[]? | .datasource | objects | select(.type == "prometheus") | .uid) = "mimir"
' "$FILE" > patched.json && mv patched.json "$FILE"
# Repeat for loki/tempo/pyroscope as needed
# 3. Set a unique explicit UID
jq '.uid = "descriptive-slug-here"' "$FILE" > tmp.json && mv tmp.json "$FILE"
# 4. Check for UID collisions before committing
jq -r '.uid' dashboards/**/*.json | sort | uniq -d # should output nothing
# 5. Add to sources.sh for future updates, then commit + push
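The collision check in step 4 prints only the duplicated UID, not the files that share it. As a rough sketch (find_duplicate_uids is a hypothetical helper, and it assumes file paths contain no tab characters), the following variant also names the offending files and does not depend on the shell expanding ** recursively:

```shell
# Sketch: list dashboard UIDs that appear in more than one file.
# find_duplicate_uids is a hypothetical helper; pass the dashboards/ root.
find_duplicate_uids() {
  local root="${1:-dashboards}"
  # find walks nested folders, so no globstar support is needed.
  # Files without a .uid are reported under the literal UID "null".
  find "$root" -type f -name '*.json' \
      -exec jq -r '"\(.uid)\t\(input_filename)"' {} + \
    | sort \
    | awk -F '\t' '
        { count[$1]++; files[$1] = files[$1] " " $2 }
        END { for (uid in count) if (count[uid] > 1) print uid ":" files[uid] }
      '
}
```

Running find_duplicate_uids dashboards before committing should print nothing when every UID is unique.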
Path B: Create from scratch in UI
- Build panels at grafana.monitoring.ctz.fyi
- Export: Dashboard → Share → Export → Save to file
- Save to dashboards/<folder>/<name>.json
- Verify .uid is set to a unique descriptive slug
- Commit and push
For new app dashboards: check what metrics are exposed first.
# See what labels Alloy exposes for a service
kubectl --context monitoring exec -n monitoring ds/alloy -- alloy targets
# Or port-forward to the app's /metrics endpoint
# (port-forward blocks, so run the curl from a second terminal)
kubectl port-forward svc/<app> 9090:9090
curl -s localhost:9090/metrics | grep -v '^#' | head -50
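When the goal is only to see which metrics exist, the sample lines can be reduced to unique metric names. This is a small sketch that assumes the endpoint serves the standard Prometheus text exposition format; metric_names is a hypothetical helper name.

```shell
# Sketch: reduce Prometheus text exposition output to unique metric names.
# metric_names is a hypothetical helper; it reads the /metrics body on stdin.
metric_names() {
  grep -v '^#' \
    | sed -E 's/[{ ].*$//' \
    | grep -v '^$' \
    | sort -u
}
```

With the port-forward running, usage would be curl -s localhost:9090/metrics | metric_names. The sed expression drops everything from the first label brace or space, so each series collapses to its bare name.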
Deployment
Push to main triggers Woodpecker automatically. To deploy manually:
cd grafana-dashboards
# export so the child script can read the key from its environment
export GRAFANA_API_KEY=$(bao kv get -field=api-key secret/production/grafana/api-key)
./scripts/push-to-grafana.sh
Check pipeline status at ci.ctz.fyi → grafana-dashboards repo.
Verification
- Go to grafana.monitoring.ctz.fyi → Dashboards → find the dashboard
- All panels should show data (no "No data" panels)
- If "No data": datasource UIDs weren't patched; re-run the jq patch
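Unpatched UIDs can also be caught before pushing, without opening Grafana. The sketch below lists every datasource reference in a dashboard file whose UID is not one of the four stack UIDs. list_unpatched_datasources is a hypothetical helper, and skipping references of type datasource or grafana (to ignore Grafana's built-in sources such as Mixed or the default annotations) is an assumption about how those references appear in the JSON.

```shell
# Sketch: report datasource references whose UID is not a known stack UID.
# list_unpatched_datasources is a hypothetical helper for a pre-push check.
list_unpatched_datasources() {
  jq -r '
    ["mimir", "loki", "tempo", "pyroscope"] as $known
    | [ .. | objects | .datasource | objects
        # assumption: built-in Grafana sources use these two types
        | select(.type != "datasource" and .type != "grafana")
        | select((.uid | tostring) as $u | any($known[]; . == $u) | not)
        | "\(.type)\t\(.uid)" ]
    | unique[]
  ' "$1"
}
```

Each output line is a datasource type and the leftover UID, separated by a tab. Empty output means every typed reference already points at mimir, loki, tempo, or pyroscope.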
Common Issues
| Symptom | Cause | Fix |
|---|---|---|
| "No data" on panels | Datasource UID not patched | Re-run jq patch for that datasource type |
| Dashboard import fails | Duplicate UID | Run jq -r '.uid' dashboards/**/*.json \| sort \| uniq -d, then rename the duplicate |
| Wrong data in panels | Wrong label matchers | Check alloy targets for actual label names |
| UID collision silently replaces existing dashboard | Forgot to set explicit UID | Always set .uid to unique slug before commit |