Description
tl;dr: If a nameserver replies status=NOERROR with an empty answer section to a DNS A query, kube-dns caches this result indefinitely. If the domain name later gets an A record, it never resolves from the pods (I waited a few days), but it resolves just fine outside the cluster (e.g. on my laptop).
Repro steps
Prerequisites
- Have a domain name `alp.im` and the nameservers are pointed to CloudFlare.
- Have nslookup/dig installed on your workstation.
- Have a minikube cluster ready on your workstation
  - running kubernetes v1.6.0
  - kube-dns comes by default, running `gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1`
Step 1: Domain does not exist, query from your laptop
Note ANSWER: 0, and status: NOERROR

    $ dig A z.alp.im
    ; <<>> DiG 9.8.3-P1 <<>> A z.alp.im
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64978
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;z.alp.im. IN A
    ;; AUTHORITY SECTION:
    alp.im. 1799 IN SOA ivan.ns.cloudflare.com. dns.cloudflare.com. 2025042470 10000 2400 604800 3600
    ;; Query time: 196 msec
    ;; SERVER: 2401:fa00:fa::1#53(2401:fa00:fa::1)
    ;; WHEN: Thu Jun 29 10:51:35 2017
    ;; MSG SIZE rcvd: 99

Step 2: Domain does not exist, query from Pod on Kubernetes
Start a toolbelt/dig container with shell and run the same query:
⚠️ Do not exit this container as you will reuse it later.
Note the response is the same, ANSWER: 0 and NOERROR.

    $ kubectl run -i -t --rm --image=toolbelt/dig dig --command -- sh
    If you don't see a command prompt, try pressing enter.
    / # dig A z.alp.im
    ; <<>> DiG 9.11.1-P1 <<>> A z.alp.im
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11209
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;z.alp.im. IN A
    ;; AUTHORITY SECTION:
    alp.im. 1724 IN SOA ivan.ns.cloudflare.com. dns.cloudflare.com. 2025042470 10000 2400 604800 3600
    ;; Query time: 74 msec
    ;; SERVER: 10.0.0.10#53(10.0.0.10)
    ;; WHEN: Thu Jun 29 17:55:46 UTC 2017
    ;; MSG SIZE rcvd: 99

(Also note SERVER: 10.0.0.10#53, which is kube-dns.)
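For reference, RFC 2308 bounds how long a NOERROR response with an empty answer section (NODATA) may be cached: the negative TTL is the smaller of the SOA record's own TTL and its MINIMUM field. Below is a minimal sketch of that calculation in Python, fed with the SOA line from the AUTHORITY section above. The function name `negative_ttl` is made up for illustration; this is not kube-dns code, only what a cache following the RFC would be expected to compute.

```python
def negative_ttl(soa_line: str) -> int:
    """Return the RFC 2308 negative-cache TTL for a dig-style SOA line.

    Expected fields: name, ttl, class, "SOA", mname, rname,
    serial, refresh, retry, expire, minimum.
    """
    fields = soa_line.split()
    if len(fields) != 11 or fields[3] != "SOA":
        raise ValueError("not a dig-style SOA line: %r" % soa_line)
    soa_ttl = int(fields[1])    # remaining TTL of the SOA record itself
    minimum = int(fields[10])   # SOA MINIMUM field
    return min(soa_ttl, minimum)


# SOA line copied from the Step 2 output.
soa = ("alp.im. 1724 IN SOA ivan.ns.cloudflare.com. dns.cloudflare.com. "
       "2025042470 10000 2400 604800 3600")
print(negative_ttl(soa))  # 1724
```

By that rule, the empty answer received at 17:55:46 UTC could be served from cache for at most 1724 seconds (just under 29 minutes), not for days.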
Step 3: Create an A record for the domain
Here I use CloudFlare as it manages my DNS.
Step 4: Test DNS record from your laptop
Run dig on your laptop (note ;; ANSWER SECTION: and 8.8.8.8 answer):

    $ dig A z.alp.im
    ; <<>> DiG 9.8.3-P1 <<>> A z.alp.im
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37570
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;z.alp.im. IN A
    ;; ANSWER SECTION:
    z.alp.im. 299 IN A 8.8.8.8
    ;; Query time: 196 msec
    ;; SERVER: 2401:fa00:fa::1#53(2401:fa00:fa::1)
    ;; WHEN: Thu Jun 29 10:54:44 2017
    ;; MSG SIZE rcvd: 53

Step 5: Test DNS record from Pod on Kubernetes
Run the same command again:

    / # dig A z.alp.im
    ; <<>> DiG 9.11.1-P1 <<>> A z.alp.im
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45420
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;z.alp.im. IN A
    ;; Query time: 0 msec
    ;; SERVER: 10.0.0.10#53(10.0.0.10)
    ;; WHEN: Thu Jun 29 18:00:24 UTC 2017
    ;; MSG SIZE rcvd: 37

Note the diff:
- still `ANSWER: 0` and `status: NOERROR` (but it resolves just fine outside the cluster)
- `;; AUTHORITY SECTION:` disappeared and `AUTHORITY:` changed to `0` from the previous time we ran this.
- `;; Query time: 0 msec` (was 74 msec); I assume this means it's just a cached response.
- Query time stays at 0 msec no matter how many times I run the same command.
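The same comparison can be made mechanically. Below is a rough sketch in Python that pulls the status, section counts, and query time out of dig's text output and reports what changed between the two in-cluster runs. `summarize_dig` and `is_nodata` are hypothetical helper names, and the regular expressions assume the output format shown in this report.

```python
import re

STATUS_RE = re.compile(r"status: (\w+)")
COUNTS_RE = re.compile(r"ANSWER: (\d+), AUTHORITY: (\d+)")
TIME_RE = re.compile(r"Query time: (\d+) msec")


def summarize_dig(output: str) -> dict:
    """Extract status, answer/authority counts and query time from dig text."""
    status = STATUS_RE.search(output)
    counts = COUNTS_RE.search(output)
    qtime = TIME_RE.search(output)
    if not (status and counts and qtime):
        raise ValueError("unrecognized dig output")
    return {
        "status": status.group(1),
        "answer": int(counts.group(1)),
        "authority": int(counts.group(2)),
        "query_msec": int(qtime.group(1)),
    }


def is_nodata(summary: dict) -> bool:
    """True for NOERROR with an empty answer section."""
    return summary["status"] == "NOERROR" and summary["answer"] == 0


# Relevant lines from the Step 2 and Step 5 outputs.
first = summarize_dig(
    ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11209\n"
    ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1\n"
    ";; Query time: 74 msec\n"
)
second = summarize_dig(
    ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45420\n"
    ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n"
    ";; Query time: 0 msec\n"
)
changed = {k: (first[k], second[k]) for k in first if first[k] != second[k]}
print(changed)  # {'authority': (1, 0), 'query_msec': (74, 0)}
```

Both responses are still NODATA; only the authority count and the query time differ, which fits the second one being served from a cache.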
What else I tried
- Try it on GKE: I tried with k8s v1.5.x and v1.6.4. → Same issue. (cc: @bowei)
- Query from a different pod on minikube: I started a new Pod and queried from there → Same issue.
- Restart kube-dns Pod → This worked on GKE, but not on minikube.

      $ kubectl delete pods -n kube-system -l k8s-app=kube-dns
      pod "kube-dns-268032401-69xk5" deleted

Impact
I am not sure why this has not been discovered before. I noticed this behavior while using kube-lego on GKE. Once kube-lego applies for a TLS certificate, it polls the domain name of the service (e.g. example.com/.well-known/<token>) before asking Let's Encrypt to validate it. Until I create an Ingress with the kube-lego annotation, I don't have the external IP, so I can't configure the domain yet; but by then the kube-lego Pod has already picked up the Ingress and started querying my domain in an infinite loop. It never succeeds, because the first time it looked up the hostname the A record didn't exist, and that result is cached forever. After I add the A record, it still can't resolve. The moment I delete the kube-dns Pods and they get recreated, it immediately starts working: the hostname resolves and the kube-lego challenge completes.
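To make the failure mode concrete, here is a small, self-contained simulation in Python. It is not kube-lego or kube-dns code: `NegativeCache`, `poll`, and the fake `upstream` are invented for illustration, and time is a simulated counter in seconds rather than a real clock. It contrasts a cache that never expires an empty answer with one that honors a negative TTL.

```python
class NegativeCache:
    """Toy resolver cache that remembers empty answers.

    negative_ttl=None keeps an empty answer forever, which mimics
    the behavior reported here (an assumption about the symptom,
    not about kube-dns internals).
    """

    def __init__(self, upstream, negative_ttl=None):
        self.upstream = upstream      # callable: (name, now) -> list of IPs
        self.negative_ttl = negative_ttl
        self.empty_since = {}         # name -> time the empty answer was cached

    def resolve(self, name, now):
        cached_at = self.empty_since.get(name)
        if cached_at is not None:
            expired = (self.negative_ttl is not None
                       and now - cached_at >= self.negative_ttl)
            if not expired:
                return []             # serve the cached empty answer
            del self.empty_since[name]
        answer = self.upstream(name, now)
        if not answer:
            self.empty_since[name] = now
        return answer


def poll(cache, name, start, stop, interval):
    """Return the first simulated time at which name resolves, else None."""
    for now in range(start, stop, interval):
        if cache.resolve(name, now):
            return now
    return None


def upstream(name, now):
    # Hypothetical timeline: the A record is created 300 s after the first query.
    return ["8.8.8.8"] if now >= 300 else []


forever = NegativeCache(upstream)
bounded = NegativeCache(upstream, negative_ttl=1724)
print(poll(forever, "z.alp.im", 0, 86400, 60))  # None
print(poll(bounded, "z.alp.im", 0, 86400, 60))  # 1740
```

With the never-expiring cache, a full simulated day of polling every 60 seconds never sees the record. With a 1724-second negative TTL, the first poll after expiry succeeds.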
