---
name: network-debugging
description: Use when diagnosing network connectivity issues in Zoe's homelab or work environments (DNS not resolving, TLS cert stuck, service unreachable, ingress not routing, Cilium dropping packets, or Pangolin tunnel not working).
---

# Network Debugging

## Overview

Systematic outside-in debugging for Zoe's homelab stack: DigitalOcean DNS + BIND9 split-horizon, cert-manager DNS-01, Traefik IngressRoute, Cilium CNI, and Pangolin tunnels.

**Rule:** Always work from outside in. DNS → TLS → Ingress → Pod → Cilium → Pangolin.

## Quick Symptom → First Command

| Symptom | First command |
|---------|---------------|
| Can't reach service from browser | `dig @8.8.8.8 <hostname>` |
| Certificate expired / not trusted | `kubectl get certificate -n <namespace>` |
| cert-manager stuck in Pending | `kubectl get challenge -A` |
| Service resolves but connection refused | `kubectl get endpoints -n <namespace>` |
| Works internally, not externally | Check Pangolin annotations + external-dns target |
| Works externally, not from cluster | `kubectl run nettest --image=nicolaka/netshoot` |
| Pod can't reach external internet | Check Cilium NetworkPolicy egress rules |
| DNS resolves wrong IP | Compare `dig @8.8.8.8` vs `dig @10.0.6.6` (split-horizon issue) |

## Level 1: DNS

```bash
# Public DNS
dig @8.8.8.8 <hostname>
dig @ns1.digitalocean.com <hostname>

# Internal DNS (from within cluster)
kubectl run -it --rm dnsutils --image=busybox --restart=Never -- nslookup <hostname>

# ACME challenge record (cert-manager DNS-01)
dig TXT _acme-challenge.<hostname> @ns1.digitalocean.com

# ExternalDNS registration
kubectl logs -n external-dns -l app.kubernetes.io/name=external-dns | tail -20
```

- **Stack:** DigitalOcean (ctz.fyi public) + BIND9 (10.0.6.6, split-horizon internal)
- **Public NS:** ns1/ns2/ns3.digitalocean.com
- **Domains:** `*.ctz.fyi` (public), `*.i.ctz.fyi` (internal only)

## Level 2: TLS / cert-manager

```bash
# Certificate status
kubectl get certificate -n <namespace>
kubectl describe certificate -n <namespace>

# Active ACME challenge
kubectl get challenge -A
kubectl describe challenge -n <namespace>

# cert-manager errors
kubectl logs -n cert-manager deploy/cert-manager | grep -i error | tail -20

# Verify cert in secret
kubectl get secret <name>-tls -n <namespace> \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates
```

**Common issue:** cert-manager can't create the DNS TXT record

- Check the DigitalOcean token: `kubectl get secret digitalocean-dns -n cert-manager`
- Check outbound UDP 53 (DNS) and TCP 443 (DigitalOcean API): a Cilium NetworkPolicy may block cert-manager egress

## Level 3: Ingress / Traefik

```bash
# Check IngressRoute
kubectl get ingressroute -n <namespace> -o yaml

# Traefik logs for hostname
kubectl logs -n traefik deploy/traefik | grep <hostname>
```

**Critical gotcha:** cert-manager reads `Ingress` objects, not `IngressRoute` CRDs. You **must** have both:

- `IngressRoute`: actual routing
- `Ingress`: cert-manager TLS issuance + external-dns registration

Missing the companion `Ingress` = cert never issued, hostname never registered.

## Level 4: Pod Connectivity

```bash
# Test from inside cluster
kubectl run -it --rm nettest --image=nicolaka/netshoot --restart=Never -- bash
# curl http://<service>.<namespace>.svc.cluster.local
# nslookup <service>.<namespace>.svc.cluster.local
# curl -v https://<hostname>

# Check service has endpoints (pod actually behind service?)
kubectl get endpoints -n <namespace>
```

## Level 5: Cilium

```bash
# Cilium status
kubectl exec -n kube-system ds/cilium -- cilium status

# Dropped flows
kubectl exec -n kube-system ds/cilium -- \
  hubble observe --namespace <namespace> --verdict DROPPED

# Active policies
kubectl get networkpolicy -n <namespace>
kubectl get ciliumnetworkpolicy -n <namespace>

# Pod identity
kubectl exec -n kube-system ds/cilium -- cilium endpoint list | grep <pod-name>
```

## Level 6: Pangolin Tunnel

```bash
# Check annotations on IngressRoute
kubectl get ingressroute -n <namespace> -o yaml | grep pangolin

# Pangolin/Newt pod health
kubectl get pods -n pangolin
kubectl logs -n pangolin <pod-name>
```

**Required annotations for Pangolin-routed services:**

```yaml
annotations:
  pangolin.fossorial.io/enabled: "true"
  external-dns.alpha.kubernetes.io/target: "external"
```

## EKS / Cloud Extras

```bash
# CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns

# Security group check
aws ec2 describe-security-groups --group-ids sg-xxxx
```

Also check: VPC flow logs, ALB access logs, inbound/outbound security group rules.

## Common Mistakes

| Mistake | Fix |
|---------|-----|
| Only created `IngressRoute`, no `Ingress` | Add companion `Ingress` for cert-manager + external-dns |
| cert-manager can't do DNS-01 | Check DigitalOcean API token secret exists in cert-manager ns |
| Split-horizon confusion | Always compare `@8.8.8.8` vs `@10.0.6.6` explicitly |
| Pangolin service not externally reachable | Verify both annotations are present |
| Cilium blocking cert-manager | Check egress NetworkPolicy for UDP 53 and TCP 443 |
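
The split-horizon comparison listed under Common Mistakes is easy to get wrong by eye, so it may help to script it. The following is a minimal sketch, not part of the existing tooling: the function names `classify_answers` and `check_split_horizon` are hypothetical, and the resolver addresses are simply the two used throughout this document (`8.8.8.8` and `10.0.6.6`). It assumes `dig` is installed wherever it runs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare the public and internal answers for one hostname.
# Resolver IPs are taken from this document; override them via the environment.

PUBLIC_RESOLVER="${PUBLIC_RESOLVER:-8.8.8.8}"
INTERNAL_RESOLVER="${INTERNAL_RESOLVER:-10.0.6.6}"

# classify_answers PUBLIC INTERNAL
# Prints one word describing how the two (possibly empty) answers relate.
classify_answers() {
  local public="$1" internal="$2"
  if [ -z "$public" ] && [ -z "$internal" ]; then
    echo "unresolved"      # neither resolver returned a record
  elif [ -z "$public" ]; then
    echo "internal-only"   # only the internal view answers
  elif [ -z "$internal" ]; then
    echo "public-only"     # only the public view answers
  elif [ "$public" = "$internal" ]; then
    echo "same"            # both views agree
  else
    echo "split"           # both answer, but with different records
  fi
}

# check_split_horizon HOSTNAME
# Queries both resolvers with dig and prints each answer plus the verdict.
check_split_horizon() {
  local host="$1" public internal
  # Sort and flatten so multi-record answers compare independent of order.
  public="$(dig +short "$host" "@${PUBLIC_RESOLVER}" | sort | tr '\n' ' ')"
  internal="$(dig +short "$host" "@${INTERNAL_RESOLVER}" | sort | tr '\n' ' ')"
  echo "public   (${PUBLIC_RESOLVER}): ${public:-none}"
  echo "internal (${INTERNAL_RESOLVER}): ${internal:-none}"
  classify_answers "$public" "$internal"
}
```

After sourcing the file, run `check_split_horizon <hostname>`. Whether a given verdict indicates a problem depends on the name: for a `*.i.ctz.fyi` host, `internal-only` is presumably the intended state, while `public-only` or `unresolved` would suggest the BIND9 view is missing the record.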