---
name: network-debugging
description: Use when diagnosing network connectivity issues in Zoe's homelab or work environments (DNS not resolving, TLS cert stuck, service unreachable, ingress not routing, Cilium dropping packets, or Pangolin tunnel not working).
---

# Network Debugging

## Overview

Systematic outside-in debugging for Zoe's homelab stack: DigitalOcean DNS + BIND9 split-horizon, cert-manager DNS-01, Traefik IngressRoute, Cilium CNI, and Pangolin tunnels.

**Rule:** Always work from the outside in: DNS → TLS → Ingress → Pod → Cilium → Pangolin. Confirm each layer is healthy before moving on to the next one.

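The outside-in rule can be scripted as an ordered walk that stops at the first broken layer. This is a minimal sketch: `LAYERS`, `first_failing_layer`, and the `check_<layer>` functions are hypothetical names, and each check is a placeholder you would fill with commands from the matching level below.

```bash
# Hypothetical runner: layer order follows the rule above (outside in).
LAYERS="dns tls ingress pod cilium pangolin"

# Print the first layer whose check_<layer> function fails (or "none"); return 1 on failure.
# A missing check_<layer> function counts as a failure, so a gap is never skipped silently.
# Check output is sent to stderr so stdout carries only the layer name.
first_failing_layer() {
  for layer in $LAYERS; do
    if ! command -v "check_$layer" >/dev/null 2>&1 || ! "check_$layer" >&2; then
      echo "$layer"
      return 1
    fi
  done
  echo none
}

# Example check (hypothetical): public DNS has an answer for the host in $HOST.
# check_dns() { [ -n "$(dig +short "$HOST" @8.8.8.8)" ]; }
```
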
## Quick Symptom → First Command

| Symptom | First command |
|---------|---------------|
| Can't reach service from browser | `dig <hostname> @8.8.8.8` |
| Certificate expired / not trusted | `kubectl get certificate -n <ns>` |
| cert-manager stuck in Pending | `kubectl get challenge -A` |
| Service resolves but connection refused | `kubectl get endpoints <svc> -n <ns>` |
| Works internally, not externally | Check Pangolin annotations + external-dns target |
| Works externally, not from cluster | `kubectl run -it --rm nettest --image=nicolaka/netshoot --restart=Never -- bash` |
| Pod can't reach external internet | Check Cilium NetworkPolicy egress rules |
| DNS resolves wrong IP | Compare `dig @8.8.8.8` vs `dig @10.0.6.6` (split-horizon issue) |

## Level 1: DNS

```bash
# Public DNS
dig <hostname> @8.8.8.8
dig <hostname> @ns1.digitalocean.com

# Internal DNS (from within cluster)
kubectl run -it --rm dnsutils --image=busybox --restart=Never -- nslookup <hostname>

# ACME challenge record (cert-manager DNS-01)
dig TXT _acme-challenge.<hostname> @ns1.digitalocean.com

# ExternalDNS registration (with -l, kubectl logs defaults to the last 10 lines per pod, so set --tail explicitly)
kubectl logs -n external-dns -l app.kubernetes.io/name=external-dns --tail=20
```

- **Stack:** DigitalOcean (ctz.fyi public) + BIND9 (10.0.6.6, split-horizon internal)
- **Public NS:** ns1/ns2/ns3.digitalocean.com
- **Domains:** `*.ctz.fyi` (public), `*.i.ctz.fyi` (internal only)

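To compare the two views the split-horizon setup produces, a small helper can query both resolvers and flag the failure patterns the domain split implies: an `*.i.ctz.fyi` name leaking into public DNS, or a public name missing from it. A sketch, assuming `dig` is installed and BIND9 at 10.0.6.6 is reachable from where it runs; `split_horizon_check` is a hypothetical name.

```bash
# Hypothetical helper: compare public (8.8.8.8) and internal (BIND9 at 10.0.6.6) answers for one name.
# Returns 1 when the answers contradict the *.ctz.fyi (public) / *.i.ctz.fyi (internal only) split.
split_horizon_check() (
  host=$1
  # dig +short prints only the answer data, one record per line; sort so record order does not matter.
  public=$(dig +short "$host" @8.8.8.8 | sort)
  internal=$(dig +short "$host" @10.0.6.6 | sort)
  echo "public:   ${public:-<none>}"
  echo "internal: ${internal:-<none>}"
  case $host in
    *.i.ctz.fyi)
      # Internal-only names should resolve on BIND9 and nowhere else.
      if [ -n "$public" ]; then echo "WARN: internal-only name resolves publicly"; return 1; fi
      if [ -z "$internal" ]; then echo "WARN: no internal answer"; return 1; fi
      ;;
    *)
      if [ -z "$public" ]; then echo "WARN: no public answer"; return 1; fi
      # A difference is expected when BIND9 overrides a public name with an internal IP.
      if [ "$public" != "$internal" ]; then echo "NOTE: public and internal answers differ"; fi
      ;;
  esac
  return 0
)
```
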
## Level 2: TLS / cert-manager

```bash
# Certificate status
kubectl get certificate -n <namespace>
kubectl describe certificate <name> -n <namespace>

# Active ACME challenge
kubectl get challenge -A
kubectl describe challenge <name> -n <namespace>

# cert-manager errors
kubectl logs -n cert-manager deploy/cert-manager | grep -i error | tail -20

# Verify cert in secret
kubectl get secret <name>-tls -n <namespace> \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates
```

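The `-dates` check above can be turned into a pass/fail test with `openssl x509 -checkend`, which exits non-zero if the certificate expires within the given number of seconds. A sketch assuming the `<name>-tls` secret naming used above; `cert_ok_for_days` and its 14-day default are hypothetical choices.

```bash
# Hypothetical helper: succeed only if the cert in secret <name>-tls stays valid for at least N more days.
# Args: name namespace [days, default 14]. Needs kubectl, base64 and openssl.
cert_ok_for_days() (
  name=$1 ns=$2 days=${3:-14}
  pem=$(kubectl get secret "${name}-tls" -n "$ns" -o jsonpath='{.data.tls\.crt}' | base64 -d) || return 2
  if [ -z "$pem" ]; then
    echo "secret ${name}-tls has no tls.crt" >&2
    return 2
  fi
  # Print notAfter for humans; openssl reads the first (leaf) certificate in the PEM chain.
  printf '%s\n' "$pem" | openssl x509 -noout -enddate
  # -checkend exits non-zero if the certificate expires within the given number of seconds.
  printf '%s\n' "$pem" | openssl x509 -noout -checkend $((days * 86400)) >/dev/null
)
```

Usage: `cert_ok_for_days <name> <namespace> 30 || echo "renew soon"`.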
**Common issue:** cert-manager can't create the DNS TXT record.

- Check the DigitalOcean token: `kubectl get secret digitalocean-dns -n cert-manager`
- Check outbound UDP 53 (and TCP 443 to the DigitalOcean API): a Cilium NetworkPolicy may block cert-manager egress

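When a DNS-01 challenge hangs, it helps to confirm the TXT record is visible on every public DigitalOcean nameserver, not just `ns1`. A sketch; `acme_txt_check` is a hypothetical name and the loop assumes `dig` is installed.

```bash
# Hypothetical helper: query _acme-challenge.<host> TXT on each public DigitalOcean nameserver.
# Returns 1 if any nameserver has no TXT record yet.
acme_txt_check() (
  host=$1 missing=0
  for ns in ns1.digitalocean.com ns2.digitalocean.com ns3.digitalocean.com; do
    txt=$(dig +short TXT "_acme-challenge.$host" "@$ns")
    if [ -n "$txt" ]; then
      echo "$ns: $txt"
    else
      echo "$ns: <no TXT record>"
      missing=1
    fi
  done
  return "$missing"
)
```

The value cert-manager expects is in the Challenge's `spec.key` field (`kubectl get challenge <name> -n <namespace> -o jsonpath='{.spec.key}'`), so you can compare it with what each nameserver returns.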
## Level 3: Ingress / Traefik

```bash
# Check IngressRoute
kubectl get ingressroute -n <namespace> -o yaml

# Traefik logs for hostname (request lines only appear if Traefik access logs are enabled)
kubectl logs -n traefik deploy/traefik | grep <hostname>
```

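To tell a DNS problem from an ingress problem, you can send the request straight to Traefik while keeping the real hostname for SNI and routing, using curl's `--resolve`. A sketch; `probe_via_traefik` is a hypothetical name, and the way to find the Traefik IP (here, a LoadBalancer Service named `traefik` in namespace `traefik`) is an assumption about your install.

```bash
# Hypothetical helper: HTTPS request to <host> pinned to <traefik-ip>, bypassing DNS.
# --resolve keeps the real hostname for SNI and the Host header, so Traefik matches the same router.
probe_via_traefik() (
  host=$1 ip=$2
  curl -sS -o /dev/null --max-time 5 -w '%{http_code}\n' \
    --resolve "$host:443:$ip" "https://$host/"
)

# Possible way to get the IP (assumed Service name and namespace):
# kubectl get svc traefik -n traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```

A `404` here usually means no Traefik router matched the host; a certificate error (curl prints `000` as the code) points back at Level 2.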
**Critical gotcha:** cert-manager reads `Ingress` objects, not `IngressRoute` CRDs. You **must** have both:

- `IngressRoute`: actual routing
- `Ingress`: cert-manager TLS issuance + external-dns registration

Without the companion `Ingress`, the certificate is never issued and the hostname is never registered.

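A companion `Ingress` mainly has to carry the host, the TLS secret, and the issuer annotation. A sketch that prints one for `kubectl apply -f -`; the function name, the `letsencrypt-prod` ClusterIssuer, the `traefik` ingress class, and the `<name>-tls` secret name (matching the Level 2 check) are assumptions to adapt. Point the `IngressRoute`'s `tls.secretName` at the same secret. If Traefik's Kubernetes Ingress provider is enabled, it will also serve this `Ingress`, which duplicates the route to the same Service.

```bash
# Hypothetical generator for the companion Ingress. Args: name namespace host service port [cluster-issuer]
companion_ingress() (
  name=$1 ns=$2 host=$3 svc=$4 port=$5 issuer=${6:-letsencrypt-prod}
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: $name
  namespace: $ns
  annotations:
    # cert-manager's ingress-shim creates a Certificate from spec.tls using this issuer.
    cert-manager.io/cluster-issuer: $issuer
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - $host
      secretName: $name-tls
  rules:
    # external-dns registers the hosts listed here.
    - host: $host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: $svc
                port:
                  number: $port
EOF
)

# Usage (placeholders):
# companion_ingress <name> <namespace> <hostname> <service> <port> | kubectl apply -f -
```
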
## Level 4: Pod Connectivity

```bash
# Test from inside cluster
kubectl run -it --rm nettest --image=nicolaka/netshoot --restart=Never -- bash
# Then, inside the pod:
# curl http://<service>.<namespace>.svc.cluster.local
# nslookup <service>.<namespace>.svc.cluster.local
# curl -v https://<external-hostname>

# Check service has endpoints (pod actually behind service?)
kubectl get endpoints <service> -n <namespace>
```

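An empty endpoints list has two different causes: the Service selector matches no pods, or it matches pods that are not Ready (Kubernetes lists those under `notReadyAddresses`). A sketch that tells them apart; `endpoint_summary` is a hypothetical name, and it does not distinguish a missing Service from one with no endpoints.

```bash
# Hypothetical helper: explain why <service> has (or lacks) endpoints.
# Args: service namespace
endpoint_summary() (
  svc=$1 ns=$2
  ready=$(kubectl get endpoints "$svc" -n "$ns" -o jsonpath='{.subsets[*].addresses[*].ip}')
  notready=$(kubectl get endpoints "$svc" -n "$ns" -o jsonpath='{.subsets[*].notReadyAddresses[*].ip}')
  if [ -n "$ready" ]; then
    echo "ready: $ready"
    return 0
  fi
  if [ -n "$notready" ]; then
    # Pods match the selector but fail readiness, so the Service will not send them traffic.
    echo "pods match but none are Ready: $notready"
    return 1
  fi
  echo "no endpoints: selector matches no pods (compare the Service selector with pod labels)"
  return 1
)
```
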
## Level 5: Cilium

```bash
# Cilium status (on Cilium 1.15+ the in-pod CLI is named cilium-dbg)
kubectl exec -n kube-system ds/cilium -- cilium status

# Dropped flows. ds/cilium picks one agent pod, which only sees flows on its own node,
# so exec into the agent pod on the same node as the affected workload.
kubectl exec -n kube-system ds/cilium -- \
  hubble observe --namespace <ns> --verdict DROPPED

# Active policies
kubectl get networkpolicy -n <namespace>
kubectl get ciliumnetworkpolicy -n <namespace>

# Pod identity (same node caveat as above)
kubectl exec -n kube-system ds/cilium -- cilium endpoint list | grep <pod-ip>
```

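When Hubble shows cert-manager's DNS or HTTPS traffic as dropped, an allow rule for those ports is the usual fix. A sketch of a standard Kubernetes `NetworkPolicy` (Cilium enforces these alongside `CiliumNetworkPolicy`) that lets every pod in the `cert-manager` namespace reach UDP/TCP 53 and TCP 443 on any destination; the function and policy names are hypothetical, you may want to narrow the destinations, and it does not cover cert-manager's own traffic to the Kubernetes API if egress is otherwise locked down.

```bash
# Hypothetical generator: NetworkPolicy allowing DNS and HTTPS egress from the cert-manager namespace.
certmanager_egress_policy() (
  ns=${1:-cert-manager}
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-cert-manager-egress
  namespace: $ns
spec:
  # Empty podSelector: applies to every pod in the namespace.
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    # No "to" block, so these ports are allowed to any destination.
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
        - protocol: TCP
          port: 443
EOF
)

# Usage: certmanager_egress_policy | kubectl apply -f -
```
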
## Level 6: Pangolin Tunnel

```bash
# Check annotations on IngressRoute (both required annotations)
kubectl get ingressroute <name> -n <namespace> -o yaml | grep -E 'pangolin|external-dns'

# Pangolin/Newt pod health
kubectl get pods -n pangolin
kubectl logs -n pangolin <newt-pod>
```

**Required annotations for Pangolin-routed services:**

```yaml
annotations:
  pangolin.fossorial.io/enabled: "true"
  external-dns.alpha.kubernetes.io/target: "external"
```

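Grepping the YAML shows whether the annotations exist, but not whether their values are right. A sketch that reads both values with kubectl's JSONPath (dots inside annotation keys escaped as `\.`) and compares them with the required ones; the function name is hypothetical, and the object kind defaults to `ingressroute` because that is where the check above looks.

```bash
# Hypothetical check: both Pangolin routing annotations present with the required values.
# Args: name namespace [kind, default ingressroute]
check_pangolin_annotations() (
  name=$1 ns=$2 kind=${3:-ingressroute} rc=0
  enabled=$(kubectl get "$kind" "$name" -n "$ns" \
    -o jsonpath='{.metadata.annotations.pangolin\.fossorial\.io/enabled}')
  target=$(kubectl get "$kind" "$name" -n "$ns" \
    -o jsonpath='{.metadata.annotations.external-dns\.alpha\.kubernetes\.io/target}')
  if [ "$enabled" != "true" ]; then
    echo "pangolin.fossorial.io/enabled is '${enabled}', want 'true'"
    rc=1
  fi
  if [ "$target" != "external" ]; then
    echo "external-dns.alpha.kubernetes.io/target is '${target}', want 'external'"
    rc=1
  fi
  return "$rc"
)
```
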
## EKS / Cloud Extras

```bash
# CoreDNS logs (with -l, kubectl logs defaults to the last 10 lines per pod)
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100

# Security group check
aws ec2 describe-security-groups --group-ids sg-xxxx
```

Also check: VPC flow logs, ALB access logs, inbound/outbound security group rules.

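Raw CoreDNS logs are easier to read as counts per response code, since a spike in `SERVFAIL` or `NXDOMAIN` narrows things down quickly. A sketch that assumes the CoreDNS `log` plugin is enabled (it writes one line per query, with the response code after the quoted question); many default Corefiles only enable `errors`, in which case there is nothing to count. `coredns_rcode_summary` is a hypothetical name.

```bash
# Hypothetical helper: count CoreDNS `log` plugin lines per rcode, read from stdin.
# Matches the closing quote of the question section followed by the rcode, e.g. ...512" NXDOMAIN qr,rd,ra ...
coredns_rcode_summary() {
  grep -oE '" (NOERROR|NXDOMAIN|SERVFAIL|REFUSED|FORMERR|NOTIMP) ' |
    awk '{ count[$2]++ } END { for (rc in count) print rc, count[rc] }' |
    sort
}

# Usage:
# kubectl logs -n kube-system -l k8s-app=kube-dns --tail=1000 | coredns_rcode_summary
```
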
## Common Mistakes

| Mistake | Fix |
|---------|-----|
| Only created `IngressRoute`, no `Ingress` | Add companion `Ingress` for cert-manager + external-dns |
| cert-manager can't do DNS-01 | Check DigitalOcean API token secret exists in cert-manager ns |
| Split-horizon confusion | Always compare `@8.8.8.8` vs `@10.0.6.6` explicitly |
| Pangolin service not externally reachable | Verify both annotations are present |
| Cilium blocking cert-manager | Check egress NetworkPolicy for UDP 53 and TCP 443 |