autojanet/skills/creating-grafana-dashboard/SKILL.md
Zoë cc74ad0bd0
2026-05-30 15:43:14 -07:00

name: creating-grafana-dashboard
description: Use when adding a dashboard to Zoe's Grafana monitoring stack, whether importing from grafana.com or creating from scratch, including datasource UID patching, GitOps deployment via the grafana-dashboards repo, and verification.

Creating a Grafana Dashboard

Overview

Dashboards are delivered via GitOps from git@git.ctz.fyi:zoe/grafana-dashboards.git. Push to main → Woodpecker CI auto-deploys to Grafana at grafana.monitoring.ctz.fyi. The critical gotcha: any downloaded dashboard will have wrong datasource UIDs and must be patched before committing.

Stack Reference

| Service | URL / Context |
|---|---|
| Grafana | grafana.monitoring.ctz.fyi (v11.6.1, Postgres backend) |
| Cluster | k3s monitoring context |
| Mimir (metrics) | datasource UID: mimir, type: prometheus |
| Loki (logs) | datasource UID: loki, type: loki |
| Tempo (traces) | datasource UID: tempo, type: tempo |
| Pyroscope (profiling) | datasource UID: pyroscope, type: grafana-pyroscope-datasource |
| Grafana API key | secret/production/grafana/api-key in OpenBao |

Datasource UID Mapping (ALWAYS CHECK THIS)

| What the dashboard JSON says | What to set |
|---|---|
| type: prometheus, any UID | uid: "mimir" |
| type: loki, any UID | uid: "loki" |
| type: tempo, any UID | uid: "tempo" |
| type: grafana-pyroscope-datasource, any UID | uid: "pyroscope" |
| ${DS_PROMETHEUS} template variable | set default to mimir |
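
The mapping above can be applied to all four datasource types in one pass. The following is a minimal sketch, assuming jq is installed. The function name `patch_datasource_uids` is hypothetical, and this is not necessarily what scripts/update-dashboards.sh does. It walks every object in the JSON, so panels nested inside collapsed rows and template variables with a `datasource` object are covered as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite every datasource reference in a dashboard
# according to the type -> UID mapping in the table above.
# Prints the patched JSON on stdout; the input file is not modified.
patch_datasource_uids() {
  jq '
    {
      "prometheus": "mimir",
      "loki": "loki",
      "tempo": "tempo",
      "grafana-pyroscope-datasource": "pyroscope"
    } as $uids
    | ( ..
        | objects
        | .datasource?
        | select(type == "object")            # skip legacy string references
        | select(.type | type == "string")
        | select($uids[.type] != null)        # only types listed in the table
      ) |= (.uid = $uids[.type])
  ' "$1"
}
```

Possible usage: `patch_datasource_uids dashboard.json > dashboard-patched.json && mv dashboard-patched.json dashboard.json`. Datasource references of any other type are left untouched.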

Repo Structure

grafana-dashboards/
  dashboards/
    cilium/    # Cilium CNI dashboards
    lgtm/      # Mimir, Loki, Tempo, Pyroscope dashboards
    infra/     # Node, k8s cluster dashboards
    apps/      # Application-specific dashboards
  scripts/
    sources.sh              # upstream dashboard sources list
    update-dashboards.sh    # pull from upstream + patch UIDs
    push-to-grafana.sh      # push to live Grafana via API
  .woodpecker.yml

Path A: Import from grafana.com

# 1. Download (substitute <folder>, <name>, and <id>)
f="dashboards/<folder>/<name>.json"
curl -fsSL -o "$f" \
  "https://grafana.com/api/dashboards/<id>/revisions/latest/download"

# 2. Patch datasource UIDs (REQUIRED: panels will show "No data" otherwise)
#    "objects" skips legacy dashboards that store the datasource as a plain string
jq '
  (.templating.list[]? | select(.type == "datasource") | .query) = "prometheus" |
  (.panels[]?.datasource | objects | select(.type == "prometheus") | .uid) = "mimir" |
  (.panels[]?.targets[]?.datasource | objects | select(.type == "prometheus") | .uid) = "mimir"
' "$f" > "$f.tmp" && mv "$f.tmp" "$f"

# Repeat for loki/tempo/pyroscope as needed

# 3. Set a unique explicit UID
jq '.uid = "descriptive-slug-here"' "$f" > "$f.tmp" && mv "$f.tmp" "$f"

# 4. Check for UID collisions before committing (should output nothing)
find dashboards -name '*.json' -exec jq -r '.uid' {} + | sort | uniq -d

# 5. Add to sources.sh for future updates, then commit + push
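
The collision check in step 4 prints the duplicated UIDs but not the files that carry them. The sketch below lists both. It assumes jq and awk are available, and the name `find_duplicate_uids` is hypothetical. Dashboards with no `.uid` at all are grouped under an empty UID, so two such files are also reported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "uid<TAB>file" for every dashboard whose UID
# is shared with at least one other file under the given directory
# (default: dashboards). Empty output means there are no collisions.
find_duplicate_uids() {
  find "${1:-dashboards}" -name '*.json' -exec \
    jq -r '[(.uid // "" | tostring), input_filename] | @tsv' {} + |
    awk -F '\t' '
      { count[$1]++; rows[NR] = $0; uid[NR] = $1 }
      END {
        for (i = 1; i <= NR; i++)
          if (count[uid[i]] > 1) print rows[i]
      }
    '
}
```

Run from the repo root as `find_duplicate_uids`, then rename the UID in one of the listed files.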

Path B: Create from scratch in UI

  1. Build panels at grafana.monitoring.ctz.fyi
  2. Export: Dashboard → Share → Export → Save to file
  3. Save to dashboards/<folder>/<name>.json
  4. Verify .uid is set to a unique descriptive slug
  5. Commit and push
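
Step 4 can be checked from the command line before committing. This is a minimal sketch, assuming bash and jq. The function name `check_dashboard_uid` is hypothetical, and the accepted format (letters, digits, `-`, `_`, at most 40 characters) is an assumption based on Grafana's UID limits rather than something stated in this repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed and print the UID if the dashboard file has a
# usable explicit .uid; otherwise print a message on stderr and fail.
# Assumption: UIDs use only letters, digits, "-" and "_" and are at most
# 40 characters long.
check_dashboard_uid() {
  local uid
  uid=$(jq -r '.uid // "" | tostring' "$1") || return 2
  case "$uid" in
    '' | *[!A-Za-z0-9_-]*)
      echo "$1: missing or invalid uid '$uid'" >&2
      return 1
      ;;
  esac
  if [ "${#uid}" -gt 40 ]; then
    echo "$1: uid '$uid' is longer than 40 characters" >&2
    return 1
  fi
  echo "$uid"
}
```

This only validates the format of a single file; uniqueness across the repo still needs the collision check from Path A.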

For new app dashboards: check what metrics are exposed first.

# See what labels Alloy exposes for a service
kubectl --context monitoring exec -n monitoring ds/alloy -- alloy targets

# Or port-forward to the app's /metrics endpoint
# (port-forward blocks, so run the curl from a second terminal)
kubectl port-forward svc/<app> 9090:9090
curl -s localhost:9090/metrics | grep -v '^#' | head -50
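
When the raw /metrics output is long, reducing it to unique metric names makes it easier to see what a dashboard could query. A small sketch using only standard tools; the function name `metric_names` is hypothetical.

```shell
#!/usr/bin/env bash
# Reduce Prometheus text-format output on stdin to a sorted list of unique
# metric names: drops comment lines, label sets, and sample values.
metric_names() {
  grep -v '^#' | sed -e 's/[{ ].*$//' -e '/^$/d' | sort -u
}
```

Possible usage: `curl -s localhost:9090/metrics | metric_names`.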

Deployment

Push to main triggers Woodpecker automatically. To deploy manually:

cd grafana-dashboards
# Export the key so push-to-grafana.sh can read it from its environment
GRAFANA_API_KEY=$(bao kv get -field=api-key secret/production/grafana/api-key)
export GRAFANA_API_KEY
./scripts/push-to-grafana.sh

Check pipeline status at ci.ctz.fyi → grafana-dashboards repo.
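
The contents of push-to-grafana.sh are not shown here. As a rough illustration of what pushing a single dashboard involves, the sketch below uses Grafana's HTTP API endpoint `POST /api/dashboards/db`. The function names, the `GRAFANA_URL` variable, and the `https://` scheme are assumptions for illustration; the real script may work differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; scripts/push-to-grafana.sh may work differently.
# Assumption: Grafana is reachable over HTTPS at this host.
GRAFANA_URL="${GRAFANA_URL:-https://grafana.monitoring.ctz.fyi}"

# Wrap a dashboard file in the request body for POST /api/dashboards/db.
# id is nulled so Grafana matches on .uid; overwrite replaces an existing
# dashboard that has the same UID.
build_push_payload() {
  jq '{dashboard: (. + {id: null}), overwrite: true}' "$1"
}

# Send one dashboard file to Grafana. Requires GRAFANA_API_KEY in the
# environment; curl -f makes HTTP errors return a non-zero exit status.
push_dashboard() {
  build_push_payload "$1" |
    curl -fsS -X POST "$GRAFANA_URL/api/dashboards/db" \
      -H "Authorization: Bearer $GRAFANA_API_KEY" \
      -H "Content-Type: application/json" \
      --data-binary @-
}
```

With `overwrite: true`, a dashboard that reuses an existing UID replaces the existing one without warning, which is why the UID checks matter before pushing.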

Verification

  • Go to grafana.monitoring.ctz.fyi → Dashboards → find the dashboard
  • All panels should show data (no "No data" panels)
  • If a panel shows "No data", the datasource UIDs most likely weren't patched; re-run the jq patch from Path A
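
To narrow down a "No data" panel without opening Grafana, the repo can be scanned for datasource references that still point at an unknown UID. A minimal sketch, assuming jq; the function name `find_unpatched_datasources` is hypothetical. References of type `datasource` or `grafana` are skipped on the assumption that they are Grafana built-ins.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list datasource references whose UID is not one of
# the four stack UIDs. Prints "file<TAB>type<TAB>uid" per distinct reference;
# empty output means every typed reference points at a known datasource.
find_unpatched_datasources() {
  find "${1:-dashboards}" -name '*.json' -exec jq -r '
    ["mimir", "loki", "tempo", "pyroscope"] as $known
    | input_filename as $file
    | [ ..
        | objects
        | .datasource?
        | select(type == "object")
        | select((.uid | type) == "string")
        # assumption: these two types are Grafana built-ins, not stack datasources
        | select(.type != "datasource" and .type != "grafana")
        | select(.uid as $u | any($known[]; . == $u) | not)
      ]
    | unique[]
    | [$file, (.type // "" | tostring), .uid]
    | @tsv
  ' {} +
}
```

Run from the repo root as `find_unpatched_datasources`; any line it prints names a file and datasource type that still needs the jq patch.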

Common Issues

| Symptom | Cause | Fix |
|---|---|---|
| "No data" on panels | Datasource UID not patched | Re-run jq patch for that datasource type |
| Dashboard import fails | Duplicate UID | Run the UID collision check (Path A, step 4), then rename the duplicate |
| Wrong data in panels | Wrong label matchers | Check alloy targets for actual label names |
| UID collision silently replaces existing dashboard | Forgot to set explicit UID | Always set .uid to unique slug before commit |