I faced multiple problems during the installation of a k8s multi-master cluster with external etcd. I have done this twice before, on other sites, successfully, but this time I need help.
Calico was installed from the YAML recommended in the guide: https://docs.projectcalico.org/manifests/calico.yaml
First, there was a problem installing Calico: calico-node could not reach the API server whenever apiServer.extraArgs.advertise-address was set in the config.
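If it matters, my understanding is that kubeadm takes the per-node advertise address from InitConfiguration.localAPIEndpoint, while apiServer.extraArgs in ClusterConfiguration applies the same flag to every control-plane node (so pointing it at the balancer IP breaks nodes that don't own that IP). A minimal sketch, assuming master01's own IP is 192.168.20.1:

```yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "192.168.20.1"  # this node's own IP, not the balancer VIP
  bindPort: 6443
```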
After that, calico-kube-controllers got stuck in the ContainerCreating state. I managed to fix it by using calico-etcd.yaml instead of calico.yaml. Now the Calico pods are up and running, and calicoctl can see them in etcd.
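For completeness, a sketch of the calicoctl datastore config used for that check, assuming the default path /etc/calico/calicoctl.cfg and the same endpoints as in the kubeadm config below:

```yaml
# /etc/calico/calicoctl.cfg -- points calicoctl at the same external etcd
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
metadata: {}
spec:
  datastoreType: "etcdv3"
  etcdEndpoints: "http://192.168.20.1:2379,http://192.168.20.2:2379,http://192.168.40.1:2379,http://192.168.40.2:2379,http://192.168.40.3:2379"
```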
But the coredns pods are stuck in ContainerCreating. These are the lines I see in kubectl describe pod:
```
Warning  FailedScheduling        82s (x2 over 88s)  default-scheduler            0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Normal   Scheduled               80s                default-scheduler            Successfully assigned kube-system/coredns-6955765f44-clbhk to master01.<removed>
Warning  FailedCreatePodSandBox  18s                kubelet, master01.<removed>  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9ab9fe3bd3d4e145c218fe59f6578169fa09075c59718fbe2f7033d207c4ea4c" network for pod "coredns-6955765f44-clbhk": networkPlugin cni failed to set up pod "coredns-6955765f44-clbhk_kube-system" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get http:///var/run/cilium/cilium.sock/v1/config: dial unix /var/run/cilium/cilium.sock: connect: no such file or directory Is the agent running?
Normal   SandboxChanged          17s                kubelet, master01.<removed>  Pod sandbox changed, it will be killed and re-created.
```

But I don't use Cilium, I use Calico. I did try Cilium while debugging the first Calico problem, but I removed it, rebuilt the cluster multiple times, and wiped the etcd data after every try.
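For reference, as far as I know kubelet picks its CNI plugin from the config files in /etc/cni/net.d, and the lexicographically first file wins, so a stale Cilium config left on the node would produce exactly this error even after rebuilding the cluster. A sketch of the check (the Cilium filename here is an assumption about what its install left behind):

```sh
# List CNI configs on the node; kubelet uses the alphabetically first file.
ls -l /etc/cni/net.d/
# A leftover Cilium config (e.g. 05-cilium.conf) would sort before Calico's
# 10-calico.conflist; if present, move it away and restart kubelet:
mv /etc/cni/net.d/05-cilium.conf /root/05-cilium.conf.bak
systemctl restart kubelet
```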
Here is the kubeadm config:
```yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: "v1.17.2"
controlPlaneEndpoint: "192.168.10.100:7443" # balancer ip:port
etcd:
  external:
    endpoints:
      - http://192.168.20.1:2379
      - http://192.168.20.2:2379
      - http://192.168.40.1:2379
      - http://192.168.40.2:2379
      - http://192.168.40.3:2379
#controllerManager:
#  extraArgs:
#    node-monitor-period: "2s"
#    node-monitor-grace-period: "16s"
#    pod-eviction-timeout: "30s"
networking:
  dnsDomain: "cluster.local"
  podSubnet: "10.96.0.0/12"
  serviceSubnet: "172.16.0.0/12"
apiServer:
  timeoutForControlPlane: "60s"
#  extraArgs:
#    advertise-address: "192.168.10.100"
#    bind-address: "192.168.20.1"
#    secure-port: "6443"
```

Kubernetes 1.17.2, etcd 3.3.11, CentOS 7 x64.
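One thing I am not sure about: the stock calico.yaml ships with CALICO_IPV4POOL_CIDR defaulting to 192.168.0.0/16, which overlaps my node network and does not match podSubnet above. If that matters, the env var in the calico-node DaemonSet would need to be aligned, something like:

```yaml
# calico-node DaemonSet container env (in calico.yaml / calico-etcd.yaml):
- name: CALICO_IPV4POOL_CIDR
  value: "10.96.0.0/12"   # should match networking.podSubnet
```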
It feels like the problem is somewhere between the API server pod and etcd, but I can't locate it.
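In case it helps someone spot the issue, this is the kind of check I can run straight against the etcd endpoints (a sketch; the endpoints are plain http, so no TLS flags, and etcdctl 3.3 needs ETCDCTL_API=3):

```sh
# Verify every etcd member is healthy from the master node:
ETCDCTL_API=3 etcdctl \
  --endpoints=http://192.168.20.1:2379,http://192.168.20.2:2379,http://192.168.40.1:2379,http://192.168.40.2:2379,http://192.168.40.3:2379 \
  endpoint health
# Confirm the API server is writing into this store (k8s data lives under /registry):
ETCDCTL_API=3 etcdctl --endpoints=http://192.168.20.1:2379 \
  get /registry/ --prefix --keys-only | head
```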