Summary:
I recently migrated a Kubernetes cluster from bare metal to a Proxmox VM on the same physical host. Since the migration, the VM experiences intermittent high CPU spikes and network latency, causing significant performance issues.
Workload & Topology:
I have three identical servers (see Configuration below), each running Proxmox with two VMs: a worker node that uses the majority of the physical host's resources, and a relatively light system node. Only the worker nodes show the problematic behavior. The worker nodes each have ten disks passed through to form two ZFS pools, which OpenEBS then uses to provide persistent storage to Kubernetes.
A noteworthy workload on the Kubernetes cluster is an ECK-managed Elasticsearch cluster, which tends to have the highest CPU utilization of anything running there. Each Kubernetes host runs one Elasticsearch master and two data nodes.
Proxmox resources are not overallocated, and CPU consumption generally sits at 10-20% for the VMs, 20-40% overall for the host.
Symptoms:
- Intermittent High CPU Spikes: When running `htop` on the Proxmox host, I see the `kvm` process associated with the troublesome VM spike up to ~6,000% (near capacity, I believe) during the freezes.
- Network Latency: Continuous local network pings to the worker nodes stop altogether for 3 to 10 seconds. Following these periods, replies come in quick succession with high round-trip times, e.g.: 0.5 ms, 0.4 ms, 0.6 ms, 3000 ms, 2500 ms, 1200 ms, 300 ms, 0.4 ms, 0.5 ms, ...
- Various Kubernetes Issues: inter-pod timeouts, nodes going NotReady arbitrarily, and dozens of Prometheus alerts all point to fundamental issues with the cluster itself.
- Diagnostic Observations:
  - Both `atop` and `iotop` freeze during the ping freeze, suggesting an issue at the CPU level (and making it very difficult to troubleshoot specifics).
  - I wrote a script that loops indefinitely, printing the time to a file and then waiting a second (see the sketch after this list). Even when run with `chrt -f 99 ./crit_test.sh`, it stops updating during the ping freeze as well.
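For reference, the timestamp script is essentially just the following (the log path is illustrative):

```bash
#!/usr/bin/env bash
# crit_test.sh -- append a timestamp roughly once per second.
# Run as: chrt -f 99 ./crit_test.sh   (SCHED_FIFO at the highest priority)
# Gaps in the log file line up with the ping freezes, even at real-time
# priority, which is why I suspect the whole vCPU is stalling.
while true; do
    date '+%F %T.%N' >> /tmp/crit_test.log
    sleep 1
done
```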
Additional Context:
- This issue did not occur when the Kubernetes cluster was running on bare metal.
- A similar issue occurred with an OPNsense VM on a different Proxmox host when Zenarmor was enabled (which managed its own Elasticsearch internally). The issue resolved when the Zenarmor process was stopped.
- When I say I "migrated" from bare metal to Proxmox, I mean I cloned the physical disk onto a virtual one and called it a day when it booted successfully. Since I never actually installed the OS onto the virtualized hardware, I've wondered whether a driver or configuration left over from bare metal needs to be updated.
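On that last point, the sanity check I had in mind is just standard Linux tooling inside the guest, something along these lines:

```bash
# Inside the worker VM: confirm the guest is actually using the virtio/paravirt
# drivers rather than some leftover bare-metal configuration.
lsmod | grep -E 'virtio'            # expect virtio_net, virtio_scsi, virtio_pci, ...
lspci -nnk | grep -iA3 'virtio'     # which kernel driver is bound to each virtio device
systemd-detect-virt                 # should report "kvm"
```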
Configuration:
Host:
- OS: Proxmox VE 8.2.2 (6.5.13-5-pve)
- Hardware: Dell R730xd
- CPUs: 72 x Intel(R) Xeon(R) CPU E5-2696 v3 @ 2.30GHz (2 Sockets)
- RAM: 316 GB
- Boot Mode: Legacy BIOS
- Networking setup: OVS Bridges
K8s Worker VM:
- OS: Ubuntu 22.04.4 (5.15.0-107-generic) (minimal install)
- CPUs: 64 (2 sockets, 32 cores) [host] [numa=1]
- RAM: 245 GiB
- Kubernetes Version: v1.28.6 (kubeadm deployed with Kubespray)
- Container Runtime Version: containerd://1.7.13
- Configuration:
  agent: 1
  boot: order=scsi0
  cores: 32
  cpu: host
  machine: q35
  memory: 250880
  meta: creation-qemu=8.1.2,ctime=1712944757
  name: home-k8s-2
  net0: virtio=BC:...:6E,bridge=vmbr1,firewall=1,queues=63
  net1: virtio=BC:...:E5,bridge=vmbr2,firewall=1,queues=63,tag=4
  numa: 1
  onboot: 1
  ostype: l26
  scsi0: data_zfs:vm-802-disk-0,discard=on,iothread=1,size=256G,ssd=1
  scsi1: /dev/disk/by-id/ata-..._90Y7T077,backup=0,discard=on,iothread=1,size=937692504K,ssd=1
  scsi10: /dev/disk/by-id/scsi-...57c,backup=0,discard=on,iothread=1,size=1172112984K,ssd=1
  scsi11: data_zfs:vm-802-disk-1,backup=0,discard=on,iothread=1,size=100G,ssd=1
  scsi2: /dev/disk/by-id/ata-..._90Y7O01K,backup=0,discard=on,iothread=1,size=937692504K,ssd=1
  scsi3: /dev/disk/by-id/ata-..._90Y7I2MA,backup=0,discard=on,iothread=1,size=937692504K,ssd=1
  scsi4: /dev/disk/by-id/ata-..._90Y7I0C4,backup=0,discard=on,iothread=1,size=937692504K,ssd=1
  scsi5: /dev/disk/by-id/ata-..._0Y7T0JM,backup=0,discard=on,iothread=1,size=937692504K,ssd=1
  scsi6: /dev/disk/by-id/scsi-...b9c,backup=0,discard=on,iothread=1,size=1172112984K
  scsi7: /dev/disk/by-id/scsi-...944,backup=0,discard=on,iothread=1,size=1172112984K
  scsi8: /dev/disk/by-id/scsi-...f60,backup=0,discard=on,iothread=1,size=1172112984K
  scsi9: /dev/disk/by-id/scsi-...5fc,backup=0,discard=on,iothread=1,size=1172112984K
  scsihw: virtio-scsi-single
  smbios1: uuid=cc8c0d42-...-08645523ceac
  sockets: 2
  startup: order=50
  vmgenid: 0e318124-...-18b887322e4
Questions:
- Proxmox Configuration: Is there a potential misconfiguration in my Proxmox setup that's causing this issue?
- Elasticsearch Impact: Could the Elasticsearch cluster running on Kubernetes be causing these intermittent spikes? Are there known issues with JVM GC or other Elasticsearch components causing such behavior in a VM?
- NUMA Configuration: Given the dual-socket, NUMA-supporting hardware, could incorrect NUMA configurations be causing these issues?
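For the NUMA question, I'm happy to post output from the host if it helps; what I was planning to look at is placement via the standard numactl/numastat tooling (I believe Proxmox writes each VM's kvm PID to /run/qemu-server/<vmid>.pid, 802 being this worker's VMID):

```bash
# On the Proxmox host: NUMA topology and where the worker VM's memory lives.
numactl --hardware                              # node count, CPUs and free memory per node
numastat -m                                     # per-node memory breakdown for the whole host
numastat -p "$(cat /run/qemu-server/802.pid)"   # per-node memory of this VM's kvm process

# Inside the guest: with numa=1 and 2 virtual sockets, the VM should see two nodes.
numactl --hardware
```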