I have a server (ubuntu/debian) with two ISP connections. Both of these WAN connections have multiple public IP addresses.
    (big pipe)----eth0-->\
                          > server ---eth2--(internal)
    (cable pipe)--eth1-->/

On eth0 I have 4 IPs assigned to me that are part of a broader /24 subnet: 24.xxx.xxx.xxx/24.

On eth1 I have 5 IPs assigned to me, but here I am the only one on the /29 (the 6th IP is the gateway I hit): 71.xxx.xxx.xxx/29.
My goal is to set up source/policy-based routing so that VMs/clients on the various internal subnets (there are multiple actual VLANs on eth2) can be routed out to the internet via any specified WAN IP.
Here's what I've done so far.
First, I have eth0 and eth1 configured in the interfaces file (/etc/network/interfaces):
    auto eth0
    iface eth0 inet static
        address 24.xxx.xxx.66
        netmask 255.255.255.0
        network 24.xxx.xxx.0
        broadcast 24.xxx.xxx.255
        gateway 24.xxx.xxx.1
        dns-nameservers 8.8.8.8
        up /etc/network/rt_scripts/i_eth0

    auto eth1
    iface eth1 inet static
        address 71.xxx.xxx.107
        netmask 255.255.255.248
        network 71.xxx.xxx.105
        broadcast 71.xxx.xxx.111
        up /etc/network/rt_scripts/i_eth1

Then the macvlan devices on the BigPipe (the i_eth0 script called by the up line above):
    #!/bin/sh

    #iface BigPipe67
    ip link add mac0 link eth0 address xx:xx:xx:xx:xx:3c type macvlan
    ip link set mac0 up
    ip address add 24.xxx.xxx.67/24 dev mac0

    #iface BigPipe135
    ip link add mac1 link eth0 address xx:xx:xx:xx:xx:3d type macvlan
    ip link set mac1 up
    ip address add 24.xxx.xxx.135/24 dev mac1

    #iface BigPipe136
    ip link add mac2 link eth0 address xx:xx:xx:xx:xx:3e type macvlan
    ip link set mac2 up
    ip address add 24.xxx.xxx.136/24 dev mac2

    /etc/network/rt_scripts/t_frontdesk
    /etc/network/rt_scripts/t_pubwifi
    /etc/network/rt_scripts/t_mail1
    /etc/network/rt_scripts/t_scansrvc

Then the CBL connection (the i_eth1 script). The missing 5th IP (71.xxx.xxx.106) is a different router sitting in the building:
    #!/bin/sh
    ip route add xxx.xxx.xxx.xxx/20 via 71.xxx.xxx.105 dev eth1
    ip route add xxx.xxx.xxx.xxx/20 via 71.xxx.xxx.105 dev eth1

    #iface CBL108
    ip link add mac3 link eth1 address xx:xx:xx:xx:xx:c5 type macvlan
    ip link set mac3 up
    ip address add 71.xxx.xxx.108/29 dev mac3

    #iface CBL109
    ip link add mac4 link eth1 address xx:xx:xx:xx:xx:c6 type macvlan
    ip link set mac4 up
    ip address add 71.xxx.xxx.109/29 dev mac4

    #iface CBL110
    ip link add mac5 link eth1 address xx:xx:xx:xx:xx:c7 type macvlan
    ip link set mac5 up
    ip address add 71.xxx.xxx.110/29 dev mac5

    /etc/network/rt_scripts/t_jenkins4
    /etc/network/rt_scripts/t_skynet
    /etc/network/rt_scripts/t_lappy386

You'll probably notice I have a couple of routes specified on the main table when I set up the macvlan interfaces on eth1. I have a couple of other routers on the same cable provider as my main server; they VPN back to the main server, while the BigPipe is used for everything else (on the main table).
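For what it's worth, right after boot the macvlan interfaces themselves look right to me. This is the kind of sanity check I do (plain iproute2; mac3 is just one example):

    # confirm the macvlan came up with the intended MAC and address
    ip -d link show mac3
    ip addr show dev mac3

    # and that the connected /29 route landed on it
    ip route show dev mac3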
The "t_" scripts set up the individual rules and tables for the various services/clients that use the IPs created by the macvlan interfaces.
Simplified, they look a little like this.
    #!/bin/sh
    ip rule add from 172.23.1.6 table scansrvc
    ip route add default via 24.xxx.xxx.1 dev mac0 table scansrvc
    ip route add 24.xxx.xxx.0/24 dev mac0 table scansrvc
    ip route add 172.23.0.0/20 dev br1 table scansrvc

So, putting that all together and as a quick recap: I've got the main server using 8 public IPs (4 on BigPipe and 4 on CBL). One of the BigPipe IPs and one of the CBL IPs are used for VPN services, effectively creating a "ghetto internet exchange" if you will; that routing configuration lives on the main table.
Then the remaining 6 IPs are used by various services or clients and those tables are frontdesk, pubwifi, mail1, scansrvc, jenkins4, skynet, and lappy386.
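(In case it matters: those named tables are defined in /etc/iproute2/rt_tables, otherwise the ip rule/ip route commands above wouldn't accept them. The table numbers below are just placeholders, not my exact values.)

    # /etc/iproute2/rt_tables (excerpt; numbers are arbitrary examples)
    101     frontdesk
    102     pubwifi
    103     mail1
    104     scansrvc
    105     jenkins4
    106     skynet
    107     lappy386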
I am masquerading the various internal subnets out through all of the public IPs.
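The NAT side is roughly this (simplified; in reality it's one rule per egress interface/subnet, and the addresses and interface names are the ones from above):

    # masquerade internal clients out whichever public-facing interface they're routed through
    iptables -t nat -A POSTROUTING -s 172.23.0.0/20 -o eth0 -j MASQUERADE
    iptables -t nat -A POSTROUTING -s 172.23.0.0/20 -o mac0 -j MASQUERADE
    iptables -t nat -A POSTROUTING -s 172.23.0.0/20 -o eth1 -j MASQUERADE
    iptables -t nat -A POSTROUTING -s 172.23.0.0/20 -o mac3 -j MASQUERADE
    # ...and so on for the other macvlan interfaces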
Here's where I'm just dumbfounded... it all works until it doesn't. Meaning, when I start up the server everything gets set up correctly and I can see that the routing policies are doing what they're supposed to be doing.
So, on scansrvc, which is a VM on the main server but with an internal IP (172.23.1.6/20):
    waffle@scansrvc:~$ dig +short myip.opendns.com @resolver1.opendns.com
    24.xxx.xxx.67

However, after a while packets stop making it back to the VM behind the main server. I could see in the iptables firewall stats that they'd leave my network but not make it back.
When it's working and I scan from the outside, I can see the service port; after it dies, iptables doesn't even see the packets make it in.
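When it's in the working state, this is roughly how I convince myself the rules and tables are being consulted (br1 is the bridge the scansrvc VM hangs off of, as in the table above):

    # which policy rules exist and what's in the VM's table
    ip rule show
    ip route show table scansrvc

    # what the kernel would do with a forwarded packet from the VM headed to 8.8.8.8
    ip route get 8.8.8.8 from 172.23.1.6 iif br1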
Also, through my searching I started reading about martian packets, so I turned on logging of those through sysctl. Wow. I'm logging a ton of martians from the BigPipe but none from the CBL, perhaps because on the BigPipe I'm not the only one on that subnet?
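For reference, turning the logging on is just the standard sysctl knob, roughly:

    # log packets with impossible/unexpected source addresses
    sysctl -w net.ipv4.conf.all.log_martians=1
    # or per interface, e.g.
    sysctl -w net.ipv4.conf.eth0.log_martians=1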
Here's a snippet
    Nov 22 08:59:03 srv3 kernel: [  271.747016] net_ratelimit: 497 callbacks suppressed
    Nov 22 08:59:03 srv3 kernel: [  271.747027] IPv4: martian source 24.xxx.xxx.43 from 24.xxx.xxx.1, on dev mac0
    Nov 22 08:59:03 srv3 kernel: [  271.747035] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06 .......N$.....
    Nov 22 08:59:03 srv3 kernel: [  271.747046] IPv4: martian source 24.xxx.xxx.43 from 24.xxx.xxx.1, on dev mac2
    Nov 22 08:59:03 srv3 kernel: [  271.747052] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06 .......N$.....
    Nov 22 08:59:03 srv3 kernel: [  271.747061] IPv4: martian source 24.xxx.xxx.43 from 24.xxx.xxx.1, on dev mac1
    Nov 22 08:59:03 srv3 kernel: [  271.747066] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06 .......N$.....
    Nov 22 08:59:03 srv3 kernel: [  271.796429] IPv4: martian source 24.xxx.xxx.211 from 24.xxx.xxx.1, on dev mac0
    Nov 22 08:59:03 srv3 kernel: [  271.796440] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06 .......N$.....
    Nov 22 08:59:03 srv3 kernel: [  271.796450] IPv4: martian source 24.xxx.xxx.211 from 24.xxx.xxx.1, on dev mac2

From what I understand so far about martians, my hypothesis is that having multiple interfaces on the same subnet could be causing packets not meant for an interface to be sent to that interface... somehow... (I thought that since they've all got different MAC addresses, that would be avoided.)
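From what I've read, the knobs that govern this behavior on Linux are rp_filter and the ARP sysctls (arp_ignore/arp_announce, sometimes arp_filter). I haven't confirmed that any of these help in my case; this is just what I'm planning to experiment with:

    # only answer ARP if the target IP is configured on the interface the request arrived on
    sysctl -w net.ipv4.conf.all.arp_ignore=1
    # when sending ARP, pick the best local address for the target instead of the packet's source
    sysctl -w net.ipv4.conf.all.arp_announce=2
    # switch reverse-path filtering from strict to loose; failed rp_filter checks show up in the martian log
    sysctl -w net.ipv4.conf.all.rp_filter=2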
What would cause this? Why, when I freshly boot the system and the VMs, does the setup work and then all of a sudden die after a while? (E.g. if I leave a ping to 8.8.8.8 running on the scansrvc VM, I'll get 100-1000 responses back before it dies.) Could this be something with the ARP cache? It's not like I'm reassigning any IPs to different MAC addresses mid-flight.
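To rule the ARP cache in or out, I figure I can watch the neighbor table and see whether the gateway entries change or go stale around the time it dies:

    # current ARP/neighbor entries on the WAN-facing interfaces
    ip neigh show dev eth0
    ip neigh show dev mac0

    # watch neighbor table changes live
    ip monitor neigh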
I'm stuck. If anyone better versed in networking setups can point out anything I've overlooked, it'd be a huge help! :)

In the meantime, I'm going to start learning some tcpdump to try to shed light on whatever I'm missing.
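Roughly where I plan to start with that (plain captures, nothing specific to my setup beyond the interface names used above):

    # who asks for / answers which IP with which MAC on the BigPipe segment
    tcpdump -eni eth0 arp

    # does the VM's traffic actually leave on mac0, and do the replies come back on it?
    tcpdump -ni mac0 host 8.8.8.8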