
I have set up a Kubernetes cluster with 2 master nodes (cp01 192.168.1.42, cp02 192.168.1.46) and 4 worker nodes, with haproxy and keepalived running as static pods in the cluster and a stacked (internal) etcd cluster. For some silly reason, I accidentally ran kubeadm reset -f on cp01. Now I am trying to rejoin the cluster using the kubeadm join command, but I keep getting dial tcp 192.168.1.49:8443: connect: connection refused, where 192.168.1.49 is the load-balancer IP. Please help! Below are the current configurations.

/etc/haproxy/haproxy.cfg on cp02

defaults
    timeout connect 10s
    timeout client 30s
    timeout server 30s

frontend apiserver
    bind *:8443
    mode tcp
    option tcplog
    default_backend apiserver

backend apiserver
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    #server master01 192.168.1.42:6443 check    # <-- the one I accidentally reset
    server master02 192.168.1.46:6443 check
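Side note: the config can be syntax-checked on cp02 before the static pod picks it up, with something like the following (having the haproxy binary on the host, and the image tag in the second variant, are assumptions on my part):

    # if the haproxy binary is installed on the host
    haproxy -c -f /etc/haproxy/haproxy.cfg

    # or via the official image (tag is just an example)
    docker run --rm -v /etc/haproxy/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro \
        haproxy:2.2 haproxy -c -f /usr/local/etc/haproxy/haproxy.cfg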

/etc/keepalived/keepalived.conf on cp02

global_defs {
    router_id LVS_DEVEL
    script_user root
    enable_script_security
    dynamic_interfaces
}
vrrp_script check_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 3
    weight -2
    fall 10
    rise 2
}
vrrp_instance VI_l {
    state BACKUP
    interface ens192
    virtual_router_id 51
    priority 101
    authentication {
        auth_type PASS
        auth_pass ***
    }
    virtual_ipaddress {
        192.168.1.49/24
    }
    track_script {
        check_apiserver
    }
}
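For completeness, /etc/keepalived/check_apiserver.sh is essentially the standard script from the kubeadm HA guide with my VIP and port substituted (reproduced roughly from that guide, so it may differ slightly from what is actually on the node):

    #!/bin/sh
    # Exit non-zero so keepalived lowers this node's priority when the check fails.
    errorExit() {
        echo "*** $*" 1>&2
        exit 1
    }

    # Check the apiserver through the local haproxy frontend first...
    curl --silent --max-time 2 --insecure https://localhost:8443/ -o /dev/null \
        || errorExit "Error GET https://localhost:8443/"
    # ...and through the VIP if this node currently holds it.
    if ip addr | grep -q 192.168.1.49; then
        curl --silent --max-time 2 --insecure https://192.168.1.49:8443/ -o /dev/null \
            || errorExit "Error GET https://192.168.1.49:8443/"
    fi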

cluster kubeadm-config

apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      extraArgs:
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta2
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controlPlaneEndpoint: 192.168.1.49:8443
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: k8s.gcr.io
    kind: ClusterConfiguration
    kubernetesVersion: v1.19.2
    networking:
      dnsDomain: cluster.local
      podSubnet: 10.244.0.0/16
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
  ClusterStatus: |
    apiEndpoints:
      cp02:
        advertiseAddress: 192.168.1.46
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterStatus
...
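For reference, the dump above was taken with something like:

    kubectl -n kube-system get configmap kubeadm-config -o yaml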

kubectl cluster-info

Kubernetes master is running at https://192.168.1.49:8443
KubeDNS is running at https://192.168.1.49:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

More Info

  1. The cluster was initialised with --upload-certs on cp01.

  2. I drained and deleted cp01 from the cluster (roughly the sequence sketched after this list).

  3. kubeadm join --token ... --discovery-token-ca-cert-hash ... --control-plane --certificate-key ... command returned:

    error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get "https://192.168.1.49:8443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp 192.168.1.49:8443: connect: connection refused 
  4. kubectl exec -n kube-system -it etcd-cp02 -- etcdctl --endpoints=https://192.168.1.46:2379 --key=/etc/kubernetes/pki/etcd/peer.key --cert=/etc/kubernetes/pki/etcd/peer.crt --cacert=/etc/kubernetes/pki/etcd/ca.crt member list returned:

    ..., started, cp02, https://192.168.1.46:2380, https://192.168.1.46:2379, false 
  5. kubectl describe pod/etcd-cp02 -n kube-system:

    ...
    Container ID:   docker://...
    Image:          k8s.gcr.io/etcd:3.4.13-0
    Image ID:       docker://...
    Port:           <none>
    Host Port:      <none>
    Command:
      etcd
      --advertise-client-urls=https://192.168.1.46:2379
      --cert-file=/etc/kubernetes/pki/etcd/server.crt
      --client-cert-auth=true
      --data-dir=/var/lib/etcd
      --initial-advertise-peer-urls=https://192.168.1.46:2380
      --initial-cluster=cp01=https://192.168.1.42:2380,cp02=https://192.168.1.46:2380
      --initial-cluster-state=existing
      --key-file=/etc/kubernetes/pki/etcd/server.key
      --listen-client-urls=https://127.0.0.1:2379,https://192.168.1.46:2379
      --listen-metrics-urls=http://127.0.0.1:2381
      --listen-peer-urls=https://192.168.1.46:2380
      --name=cp02
      --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
      --peer-client-cert-auth=true
      --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
      --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
      --snapshot-count=10000
      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    ...
  6. Tried copying the certs to cp01:/etc/kubernetes/pki before running kubeadm join 192.168.1.49:8443 --token ... --discovery-token-ca-cert-hash ..., but it returned the same error (the copy itself is sketched after this list).

    # files copied over to cp01
    ca.crt
    ca.key
    sa.key
    sa.pub
    front-proxy-ca.crt
    front-proxy-ca.key
    etcd/ca.crt
    etcd/ca.key
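The drain/delete in step 2 was roughly the following (reconstructed from memory, so the exact flags may have differed). The etcd member for cp01 was already gone by the time I ran the member list in step 4; if it had still been listed, removing it by ID would have been the next step:

    # run on cp02
    kubectl drain cp01 --ignore-daemonsets --delete-local-data
    kubectl delete node cp01

    # only needed if the old member still shows up in 'member list';
    # <MEMBER_ID> is the hex ID from that output (placeholder here)
    kubectl exec -n kube-system -it etcd-cp02 -- etcdctl \
      --endpoints=https://192.168.1.46:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/peer.crt \
      --key=/etc/kubernetes/pki/etcd/peer.key \
      member remove <MEMBER_ID>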
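The cert copy in step 6 was along these lines, run from cp02 (root SSH access between the nodes is an assumption here). Alternatively, re-running the upload-certs phase on cp02 prints a fresh --certificate-key so no manual copying is needed at all:

    # make sure the target directory exists on cp01 after the reset
    ssh root@192.168.1.42 mkdir -p /etc/kubernetes/pki/etcd

    # shared CA / service-account material
    scp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key \
        /etc/kubernetes/pki/sa.key /etc/kubernetes/pki/sa.pub \
        /etc/kubernetes/pki/front-proxy-ca.crt /etc/kubernetes/pki/front-proxy-ca.key \
        root@192.168.1.42:/etc/kubernetes/pki/
    scp /etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/etcd/ca.key \
        root@192.168.1.42:/etc/kubernetes/pki/etcd/

    # or instead: generate a new certificate key for kubeadm join --control-plane
    kubeadm init phase upload-certs --upload-certs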

Troubleshoot network

  1. Able to ping 192.168.1.49 on cp01

  2. nc -v 192.168.1.49 8443 on cp01 returned Ncat: Connection refused.

  3. curl -k https://192.168.1.49:8443/api/v1... works from cp02 and the worker nodes (returns HTTP 403, which should be normal for an anonymous request).

  4. /etc/cni/net.d/ is removed on cp01

  5. Manually cleared iptables rules containing 'KUBE' or 'cali' on cp01 (roughly as sketched after this list).

  6. firewalld is disabled on both cp01 and cp02.

  7. I tried joining with a new server cp03 192.168.1.48 and encountered the same dial tcp 192.168.1.49:8443: connect: connection refused error.

  8. netstat -tlnp | grep 8443 on cp02 returned:

    tcp        0      0 0.0.0.0:8443        0.0.0.0:*           LISTEN      27316/haproxy
  9. nc -v 192.168.1.46 6443 on cp01 and cp03 returns:

    Ncat: Connected to 192.168.1.46:6443 
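The iptables cleanup in step 5 was done on cp01 roughly like this (reconstructed, so the exact filtering may have differed):

    # drop every saved rule and chain referencing KUBE-* or cali-*
    iptables-save | grep -v KUBE | grep -v cali | iptables-restore
    # flush the standard tables as well
    iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X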

Any advice/guidance would be greatly appreciated as I am at a loss here. I'm thinking it might be due to the network rules on cp02 but I don't really know how to check this. Thank you!!

1 Answer

Figured out what the issue was when I ran ip a: ens192 on cp01 still had the secondary IP address 192.168.1.49 assigned, presumably left behind when kubeadm reset removed the keepalived static pod. Because cp01 held the VIP locally, its own connections to 192.168.1.49:8443 terminated on cp01 itself, where nothing listens on 8443, instead of reaching haproxy on cp02, hence the connection refused (and cp01 was likely still answering ARP for the VIP, which would explain cp03 hitting the same error).

Simply running ip addr del 192.168.1.49/24 dev ens192 followed by kubeadm join ... let cp01 rejoin the cluster successfully. Can't believe I missed that...
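In case it helps anyone else, the whole check and fix on cp01 boils down to this (interface name and VIP are from my setup; the elided join arguments are the usual token, CA cert hash, and certificate key):

    # the stale VIP was still assigned here
    ip a show ens192 | grep 192.168.1.49

    # remove it, then the control-plane join works again
    ip addr del 192.168.1.49/24 dev ens192
    kubeadm join 192.168.1.49:8443 --token ... --discovery-token-ca-cert-hash ... \
        --control-plane --certificate-key ...

Presumably the commented-out master01 backend in haproxy.cfg on cp02 should also be re-enabled once cp01 is healthy again, so it actually receives API traffic through the load balancer.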
