
I faced multiple problems while installing a k8s multi-master cluster with external etcd. I have done it twice before, on other sites, successfully, but this time I need help.

Calico was installed from the YAML recommended in the guide: https://docs.projectcalico.org/manifests/calico.yaml

First, there was a problem installing Calico: calico-node could not reach the API server whenever apiServer.extraArgs.advertise-address was set in the config.

After that, calico-kube-controllers got stuck in the ContainerCreating state. I managed to fix it by using calico-etcd.yaml instead of calico.yaml. Now the Calico pods are up and running, and calicoctl can see them in etcd.

But the coredns pods are stuck in ContainerCreating. These are the lines I see in `kubectl describe pod`:

```
Warning  FailedScheduling        82s (x2 over 88s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Normal   Scheduled               80s                default-scheduler  Successfully assigned kube-system/coredns-6955765f44-clbhk to master01.<removed>
Warning  FailedCreatePodSandBox  18s                kubelet, master01.<removed>  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9ab9fe3bd3d4e145c218fe59f6578169fa09075c59718fbe2f7033d207c4ea4c" network for pod "coredns-6955765f44-clbhk": networkPlugin cni failed to set up pod "coredns-6955765f44-clbhk_kube-system" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get http:///var/run/cilium/cilium.sock/v1/config: dial unix /var/run/cilium/cilium.sock: connect: no such file or directory Is the agent running?
Normal   SandboxChanged          17s                kubelet, master01.<removed>  Pod sandbox changed, it will be killed and re-created.
```

But I don't use Cilium. I use Calico. I did try Cilium while debugging the first Calico problem, but I removed it, rebuilt the cluster multiple times, and also wiped the etcd data after every try.
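One way to see which CNI plugin the kubelet will actually pick (a sketch, assuming the default dockershim behavior: the kubelet sorts the files in its CNI conf dir, by default /etc/cni/net.d, and uses the first one — so a leftover Cilium conf file sorts ahead of Calico's):

```shell
#!/bin/sh
# first_cni_conf DIR: print the config file the kubelet would select from DIR.
# The kubelet takes the lexicographically first file in its --cni-conf-dir,
# so a leftover 05-cilium.conf wins over Calico's 10-calico.conflist.
first_cni_conf() {
    ls "$1" | sort | head -n 1
}

# On a real node you would run it against the default conf dir:
# first_cni_conf /etc/cni/net.d
```

The directory names above are the kubeadm defaults; adjust them if your kubelet is started with a non-default `--cni-conf-dir`.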

Here is the kubeadm config:

```yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: "v1.17.2"
controlPlaneEndpoint: "192.168.10.100:7443" # balancer ip:port
etcd:
  external:
    endpoints:
      - http://192.168.20.1:2379
      - http://192.168.20.2:2379
      - http://192.168.40.1:2379
      - http://192.168.40.2:2379
      - http://192.168.40.3:2379
#controllerManager:
#  extraArgs:
#    node-monitor-period: "2s"
#    node-monitor-grace-period: "16s"
#    pod-eviction-timeout: "30s"
networking:
  dnsDomain: "cluster.local"
  podSubnet: "10.96.0.0/12"
  serviceSubnet: "172.16.0.0/12"
apiServer:
  timeoutForControlPlane: "60s"
#  extraArgs:
#    advertise-address: "192.168.10.100"
#    bind-address: "192.168.20.1"
#    secure-port: "6443"
```

Kubernetes 1.17.2, etcd 3.3.11, CentOS 7 x64.

It feels like the problem is somewhere between the API server pod and etcd, but I can't locate it.

1 Answer


Oh, never mind. I have found it.

There were cilium-cni and cilium-cni.old files in /opt/cni/bin/. These files were obviously installed with Cilium, so they survived the kubernetes-cni rpm reinstallation. I don't know why, but k8s prefers Cilium if it is available. Is it a bug? Should I report it?
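A sketch of the cleanup check (assuming the default /opt/cni/bin plugin directory; the rpm only owns the files it shipped, so anything Cilium dropped there is untouched by a package reinstall):

```shell
#!/bin/sh
# find_cilium_leftovers DIR: list Cilium CNI binaries left behind in DIR.
find_cilium_leftovers() {
    find "$1" -name 'cilium-cni*'
}

# On a real node, after confirming the list looks right, remove the leftovers
# and restart the kubelet, e.g.:
# find_cilium_leftovers /opt/cni/bin
# rm /opt/cni/bin/cilium-cni /opt/cni/bin/cilium-cni.old
# systemctl restart kubelet
```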
