I have been trying for a week to set up a Windows k8s node with Calico, without success. I followed the official Calico documentation at docs.tigera.io and tried both the Operator and the Manual install. In both cases I am stuck at the HPC (HostProcess container) steps, because the Windows node never reaches Ready status on my Linux master. Here are the steps:
- My cluster runs on AWS EC2 with 1 master, 1 Linux node and 1 Windows node. Security groups are set up properly and the ports between the master and the nodes are open.
- My Linux node joined with no issues (I have deployed pods on the master and the Linux node and they are operating as expected).
- I initialized my cluster with:
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
- After that I followed the Calico Operator install, with updates to Pods to allow only VXLAN and to enable strict affinity.
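For readers following along, the VXLAN-only and strict-affinity changes mentioned above correspond roughly to two patches described in the Calico for Windows docs. This is a minimal sketch, not necessarily the exact commands run on this cluster. It assumes the default operator resource names (`installation/default` and `ipamconfigurations/default`). The `KUBECTL` variable and the function name `configure_calico_for_windows` are hypothetical helpers introduced here so the commands can be dry-run with `KUBECTL=echo`.

```shell
# Sketch only: resource names assume a default operator-based install.
# KUBECTL is a hypothetical override; set KUBECTL=echo to print the commands
# instead of running them against a cluster.
KUBECTL="${KUBECTL:-kubectl}"

configure_calico_for_windows() {
  # Disable BGP so that pod networking relies on VXLAN only.
  $KUBECTL patch installation default --type=merge \
    -p '{"spec": {"calicoNetwork": {"bgp": "Disabled"}}}'
  # Enable strict IPAM block affinity, which Windows nodes require.
  $KUBECTL patch ipamconfigurations default --type=merge \
    -p '{"spec": {"strictAffinity": true}}'
}

# Usage (on the control-plane node): configure_calico_for_windows
```

The IP pool itself is expected to use VXLAN encapsulation in the `Installation` resource; that part is not shown here.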
This is what the pods look like:
NAMESPACE         NAME                                     READY  STATUS   RESTARTS  AGE
calico-apiserver  calico-apiserver-844c7896b7-8qr98        1/1    Running  0         86m
calico-apiserver  calico-apiserver-844c7896b7-hhsfn        1/1    Running  0         86m
calico-system     calico-kube-controllers-98fbc76fb-x6p88  1/1    Running  0         86m
calico-system     calico-node-4c2vp                        1/1    Running  0         86m
calico-system     calico-node-cbhld                        1/1    Running  0         86m
calico-system     calico-typha-7b579d4b66-96jm4            1/1    Running  0         86m
calico-system     calico-typha-7b579d4b66-k2lkb            1/1    Running  0         86m
calico-system     csi-node-driver-84jlq                    2/2    Running  0         112m
calico-system     csi-node-driver-l5nmb                    2/2    Running  0         112m
kube-system       coredns-7db6d8ff4d-2hbs9                 1/1    Running  0         114m
kube-system       coredns-7db6d8ff4d-vlvfg                 1/1    Running  0         114m
kube-system       etcd-ip-172-16-8-123                     1/1    Running  9         115m
kube-system       kube-apiserver-ip-172-16-8-123           1/1    Running  9         115m
kube-system       kube-controller-manager-ip-172-16-8-123  1/1    Running  2         115m
kube-system       kube-proxy-566rb                         1/1    Running  0         113m
kube-system       kube-proxy-fkvv2                         1/1    Running  0         114m
kube-system       kube-proxy-windows-j4fgh                 1/1    Running  0         83m
kube-system       kube-scheduler-ip-172-16-8-123           1/1    Running  9         115m
tigera-operator   tigera-operator-76ff79f7fd-tj5pp         1/1    Running  0         113m

Nodes:
NAME             STATUS    ROLES          AGE   VERSION  INTERNAL-IP   EXTERNAL-IP  OS-IMAGE                        KERNEL-VERSION   CONTAINER-RUNTIME
ec2amaz-5bu9t2v  NotReady  <none>         90m   v1.30.0  172.16.8.235  <none>       Windows Server 2022 Datacenter  10.0.20348.2402  containerd://1.7.1
ip-172-16-8-123  Ready     control-plane  115m  v1.30.0  172.16.8.123  <none>       Ubuntu 24.04 LTS                6.8.0-1008-aws   containerd://1.7.16
ip-172-16-8-26   Ready     <none>         114m  v1.30.0  172.16.8.26   <none>       Ubuntu 24.04 LTS                6.8.0-1008-aws   containerd://1.7.16

- I set up Windows Server 2022 following the instructions in the Calico docs. All is well until step 6: "Install kube-proxy on Windows nodes" (If kube-proxy is not running, you must install and run kube-proxy on each of the Windows nodes in your cluster). I was not sure what is meant there, as I understood that this configuration is done with sig-windows-tools on the master node.
- I downloaded sig-windows-tools and modified the scripts to match my cluster. I built all required images for Calico and kube-proxy and uploaded them to the AWS registry.
- Everything deployed with no issues.
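One way to check what the deployment actually scheduled onto the Windows node is to list pods filtered by node name. This is a minimal sketch using kubectl's `spec.nodeName` field selector. The function name `pods_on_node` and the `KUBECTL` override are hypothetical helpers added for illustration.

```shell
# Sketch only. KUBECTL is a hypothetical override; set KUBECTL=echo for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

pods_on_node() {
  # $1: node name exactly as shown by `kubectl get nodes`
  $KUBECTL get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=$1"
}

# Usage: pods_on_node ec2amaz-5bu9t2v
```

For the Windows node in this post, the result can be compared against the full pod list to see which Windows workloads are present.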
Windows and Linux are both running containerd.
Kubernetes: 1.30.0
Calico: 3.28.0
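The node condition quoted below comes from `kubectl describe node`. As an alternative, the message of the Ready condition alone can be extracted with a JSONPath query. This is a minimal sketch; the function name `node_ready_message` and the `KUBECTL` override are hypothetical helpers, not part of kubectl.

```shell
# Sketch only. KUBECTL is a hypothetical override; set KUBECTL=echo for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

node_ready_message() {
  # $1: node name. Prints the message attached to the node's Ready condition.
  $KUBECTL get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
}

# Usage: node_ready_message ec2amaz-5bu9t2v
```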
Describe from Node:
Ready False Tue, 21 May 2024 22:53:04 +0000 Tue, 21 May 2024 21:19:39 +0000 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cd

Logs from the Windows kube-proxy pod:
k logs kube-proxy-windows-j4fgh -n kube-system
WARNING: The names of some imported commands from the module 'hns' include unapproved verbs that might make them less discoverable. To find the commands with unapproved verbs, run the Import-Module command again with the Verbose parameter. For a list of approved verbs, type Get-Verb.
Running kub-proxy service.
Waiting for HNS network Calico to be created...

I am stuck now and do not understand how to get the Windows node into Ready status. Any help is appreciated.