Cross-posted from Super User, as I was unable to resolve this issue there and haven't gotten any replies yet. Maybe this community is better suited to help? Thank you!
I have a Fedora Server VM on macOS Sonoma using UTM. UTM is configured to use a shared network, which creates a VLAN the VM joins and through which the host OS can discover it.
Intermittently, all network connections from the macOS host to the VM drop and remain broken. SSH connections time out and become unresponsive, and pings to the VM fail. The only workaround is to reboot the VM, which returns everything to a working state.
Could someone point me in the right direction here? Some facts below.
The VM hostname is work and its IPv4 address is 192.168.64.2. I don't think it's a problem with the IP lease, because the VM retains its IP address when the connections fail.
On the macOS host, ping and traceroute to the VM fail:
    $ traceroute 192.168.64.2
    traceroute to 192.168.64.2 (192.168.64.2), 64 hops max, 40 byte packets
     1  * * *
     2  *^C
    $ ping 192.168.64.2
    PING 192.168.64.2 (192.168.64.2): 56 data bytes
    Request timeout for icmp_seq 0
    Request timeout for icmp_seq 1
    Request timeout for icmp_seq 2
    Request timeout for icmp_seq 3

The routing table looks OK though, AFAICT:
    $ netstat -rn -f inet
    Routing tables

    Internet:
    Destination         Gateway             Flags     Netif      Expire
    default             192.168.178.1       UGScg     en0
    default             link#24             UCSIg     bridge100  !
    127                 127.0.0.1           UCS       lo0
    127.0.0.1           127.0.0.1           UH        lo0
    169.254             link#15             UCS       en0        !
    192.168.64          link#24             UC        bridge100  !
    192.168.64.1        62.3e.5f.73.14.64   UHLWI     lo0
    192.168.64.2        8a.c3.94.a.86.e2    UHLWIi    bridge100  1199
    192.168.178         link#15             UCS       en0        !
    192.168.178.1/32    link#15             UCS       en0        !
    192.168.178.1       dc:15:c8:ef:b8:1d   UHLWIir   en0        1198
    192.168.178.52/32   link#15             UCS       en0        !
    224.0.0/4           link#15             UmCS      en0        !
    224.0.0.251         1:0:5e:0:0:fb       UHmLWI    en0
    224.0.0.251         1:0:5e:0:0:fb       UHmLWIg   bridge100
    255.255.255.255/32  link#15             UCS       en0        !

And when I drill into this specific route:
    $ route get 192.168.64.2
       route to: work
    destination: work
      interface: bridge100
          flags: <UP,HOST,DONE,LLINFO,WASCLONED,IFSCOPE,IFREF>
     recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
           0         0         0         0         0         0      1500      1200

Bridge network interface (I believe UTM creates this as part of the VLAN?):
    $ ifconfig bridge100
    bridge100: flags=8a63<UP,BROADCAST,SMART,RUNNING,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        options=3<RXCSUM,TXCSUM>
        ether 62:3e:5f:73:14:64
        inet 192.168.64.1 netmask 0xffffff00 broadcast 192.168.64.255
        inet6 fe80::603e:5fff:fe73:1464%bridge100 prefixlen 64 scopeid 0x18
        inet6 fd5d:9cb:9b9e:3946:141b:1dc6:2328:9f96 prefixlen 64 autoconf secured
        Configuration:
            id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
            maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
            root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
            ipfilter disabled flags 0x0
        member: vmenet0 flags=3<LEARNING,DISCOVER>
                ifmaxaddr 0 port 23 priority 0 path cost 0
        Address cache:
            8a:c3:94:a:86:e2 Vlan1 vmenet0 1199 flags=0<>
        nd6 options=201<PERFORMNUD,DAD>
        media: autoselect
        status: active

ARP lookups seem to work:
    $ arp work
    work (192.168.64.2) at 8a:c3:94:a:86:e2 on bridge100 ifscope [bridge]

On the guest, I couldn't see anything out of the ordinary.
So bridge100 on the host is enp0s1 on the guest and it is UP.
I started looking for NetworkManager entries in journalctl as well, but since I don't really know what I'm looking for, I wasn't sure what to focus on.
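One way to narrow the journal search is to record exactly when the VM stops answering. The following is only a rough sketch, not something I have run long-term: probe_once and the PROBE_CMD override are names made up for illustration, and the default probe is a plain ping -c 1 from the macOS host.

```shell
#!/bin/sh
# probe_once HOST: run a single reachability check and print one
# timestamped line: "<ISO time> <host> UP|DOWN".
# PROBE_CMD is a hypothetical override (defaults to "ping -c 1") so a
# different check, such as a TCP connect, can be swapped in.
probe_once() {
    host=$1
    # Intentionally unquoted so "ping -c 1" splits into command + args.
    if ${PROBE_CMD:-ping -c 1} "$host" >/dev/null 2>&1; then
        state=UP
    else
        state=DOWN
    fi
    printf '%s %s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$host" "$state"
}

# Example loop (the log file name is arbitrary):
#   while :; do probe_once 192.168.64.2; sleep 5; done >> vm-reachability.log
```

The first DOWN line in the log gives a timestamp that can be passed to journalctl -u NetworkManager with --since and --until on the guest, instead of scrolling through the whole journal.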
I'd appreciate any help.
UPDATE 1:
As suggested in the comments, I ran a TCP capture in Wireshark. Around the time of the connection loss, I see 3 RST packets on 3 different TCP streams:
Given that pings are broken thereafter, this is probably a symptom, not a cause?
UPDATE 2:
I have now captured all traffic on bridge100, and after the disconnect I see a pattern in the ARP requests. Before the disruption, 192.168.64.1 (the macOS host) keeps asking who has 192.168.64.2 (the VM) and receives a reply each time:
After the disruption, however, the host starts broadcasting the same question instead of addressing it directly to the VM's network interface (8a:c3:94:a:86:e2). The VM, in turn, starts asking who has 192.168.64.1, i.e. the host, which it did not do before:
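The same unicast-versus-broadcast pattern can also be counted from the command line rather than read off Wireshark. This is a minimal sketch that assumes the usual text layout of tcpdump -e output ("TIME SRC > DST, ethertype ARP ..."); arp_request_summary and the capture file name are made up for illustration.

```shell
#!/bin/sh
# Count ARP requests by destination type from `tcpdump -e -n ... arp`
# text output read on stdin. Replies and non-ARP lines are ignored.
arp_request_summary() {
    awk '
        / Request who-has / {
            dst = $4
            sub(/,$/, "", dst)   # strip the comma that follows DST
            if (tolower(dst) == "ff:ff:ff:ff:ff:ff") bcast++
            else ucast++
        }
        END { printf "broadcast=%d unicast=%d\n", bcast, ucast }
    '
}

# Example against a saved capture (file name is hypothetical):
#   tcpdump -e -n -r bridge100.pcap arp | arp_request_summary
```

Running it separately on the traffic before and after the disconnect should show the shift from unicast to broadcast requests described above.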
This seems to imply:
- The host is not getting an answer to its ARP lookups. However, arp work at a command prompt still has access to the MAC, as mentioned above. Not sure if this is just another symptom rather than a cause?
- The VM does not have a MAC address associated with 192.168.64.1 in its ARP cache: arp 192.168.64.1 returns (incomplete) for HWaddress. This would at least explain why the VM can no longer reply to any packets it receives from the host.
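For checking the guest's view of the host repeatedly, the iproute2 equivalent of the arp lookup is ip neigh, which reports a state such as REACHABLE, STALE, INCOMPLETE or FAILED per entry. Below is a small sketch that pulls out just that state; neigh_state is a made-up helper, and ABSENT is my own label for "no entry at all", not something ip prints.

```shell
#!/bin/sh
# neigh_state IP: read `ip neigh show` output on stdin and print the
# neighbour-cache state (the last field) for the given IP address.
# Prints ABSENT (an invented label) when the IP has no entry.
neigh_state() {
    awk -v ip="$1" '
        $1 == ip { print $NF; found = 1 }
        END { if (!found) print "ABSENT" }
    '
}

# On the Fedora guest:
#   ip -4 neigh show dev enp0s1 | neigh_state 192.168.64.1
```

If this flips from REACHABLE to INCOMPLETE or FAILED at the moment of the outage, it would match the (incomplete) entry reported by arp.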
UPDATE 3:
It just happened again, and around that time I saw Spurious Retransmission errors in the TCP streams:
On the VM itself, I looked for a similar timestamp in journalctl and noticed that systemd-resolved (Fedora's primary DNS resolver) started to fail over into alternative DNS resolution schemes:





