I learned a lot from this question, but nftables configuration is not a piece of cake for everyone, so I want to share my own config. I'll be as detailed as I can, so that others can get a more intuitive idea of how to do policy-based routing (PBR) with nftables:
Step 1, define a standalone nft table.
    table inet classify {
        set pbrips {
            type ipv4_addr
            elements = { 1.0.0.1, 8.8.4.4, 8.8.8.8 }
        }

        chain output {
            type route hook output priority filter; policy accept;
            #    ^^^^^ This needs to be route, not filter
            ip daddr @pbrips counter packets 29795 bytes 3021764 ct mark set 0x00000002
            ip daddr @pbrips counter packets 29795 bytes 3021764 meta mark set ct mark
        }

        chain prerouting {
            type filter hook prerouting priority mangle; policy accept;
            ip daddr @pbrips counter packets 193712 bytes 60796012 ct mark set 0x00000002
            ip daddr @pbrips counter packets 193712 bytes 60796012 meta mark set ct mark
        }
    }
Here I defined a standalone table called classify instead of using the existing fw4 table of OpenWrt 23.05. I did this because I first tried to add the rules directly to the fw4 table, but that seemed to have no effect.
In the table I defined an nftset named pbrips with some predefined IPs (in my case I always want to re-route these IPs). Traffic to each IP in this set is meant to be routed through another interface instead of the default gateway, hence the name policy-based routing, i.e. PBR.
You may have noticed the counter packets 29795 bytes 3021764 part. Those numbers are printed by nftables when it lists the ruleset; when you create the rules yourself, all you need is a bare counter keyword.
I had to put two chains there, the output chain and the prerouting chain. Pay attention to the type and hook used in each chain; they are the key points if you want to re-route traffic. I need both because with only the output chain, only traffic (to the 3 IPs in the set) that originates from the OpenWrt router itself is re-routed, while the prerouting chain is what makes traffic from my LAN get re-routed correctly. (P.S. If someone knows how to avoid needing both chains, please kindly let me know. Thanks!)
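To make the "bare counter" point concrete, here is a minimal sketch of what one could actually type to create the table above. It assumes you run it as root on the router and that feeding the ruleset to nft on stdin suits your setup; making it survive a reboot is not covered here.

```shell
# Load the table from stdin. Note the bare "counter" keyword: the
# "packets N bytes M" numbers only appear when nft lists the rules.
nft -f - <<'EOF'
table inet classify {
    set pbrips {
        type ipv4_addr
        elements = { 1.0.0.1, 8.8.4.4, 8.8.8.8 }
    }
    chain output {
        type route hook output priority filter; policy accept;
        ip daddr @pbrips counter ct mark set 0x00000002
        ip daddr @pbrips counter meta mark set ct mark
    }
    chain prerouting {
        type filter hook prerouting priority mangle; policy accept;
        ip daddr @pbrips counter ct mark set 0x00000002
        ip daddr @pbrips counter meta mark set ct mark
    }
}
EOF

# List the table; the counters now show up with their packet/byte values.
nft list table inet classify
```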
Step 2, add a dedicated route table for pbr
Here is my /etc/iproute2/rt_tables:
    root@OpenWrt:~# cat /etc/iproute2/rt_tables
    #
    # reserved values
    #
    128     prelocal
    255     local
    254     main
    253     default
    0       unspec
    #
    # local
    #
    #1      inr.ruhep
    123     pbr

You can see that in the last row I added a new table with id 123 and name pbr. You can use whatever id and name you like here, just don't reuse the existing ones above, as they are system-level values. You can edit this file directly with vi and save it.
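If you would rather script this than open vi, something like the following could work. It is only a sketch: add_rt_table is a helper name I made up, and it only checks whether the table name already exists, not whether the id is already taken.

```shell
# Hypothetical helper: append "<id> <name>" to an rt_tables file
# unless a table with that name is already listed there.
add_rt_table() {
    file="$1"; id="$2"; name="$3"
    if ! grep -qE "^[0-9]+[[:space:]]+${name}\$" "$file" 2>/dev/null; then
        printf '%s\t%s\n' "$id" "$name" >> "$file"
    fi
}

# Usage on the router (as root):
#   add_rt_table /etc/iproute2/rt_tables 123 pbr
```

Calling it twice leaves a single pbr line in the file, so it is safe to put in a startup script.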
Step 3, create a route in the table to receive all traffic
Here is my output of ip route list table 123 (n.b. you can refer to the table either by its id or by its name, both are OK):

    root@OpenWrt:~# ip route list table 123
    default dev tun0 scope link
    root@OpenWrt:~#

As you can see, there is only one route in table 123, and if you want to send all marked traffic through table 123, this single default route is sufficient.
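The listing shows the result but not how the route got there. One way to create it, assuming your tunnel interface is also called tun0 and is already up, would be:

```shell
# Add a default route via tun0 to the pbr table (id 123), run as root.
ip route add default dev tun0 table pbr

# Check the result; it should show the single default route.
ip route list table pbr
```

Keep in mind that routes bound to an interface are removed when that interface goes down, so the route may need to be re-added whenever the tunnel restarts.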
Step 4, the critical step, create an ip rule to consume the fwmark
Here is my output of ip rule
    root@OpenWrt:~# ip rule
    0:      from all lookup local
    32765:  from all fwmark 0x2 lookup pbr
    32766:  from all lookup main
    32767:  from all lookup default
    root@OpenWrt:~#

The key point here is the line with priority 32765, specifically the fwmark 0x2 part. 0x2 is hexadecimal for decimal 2, and the 0x00000002 in my nft rules is exactly the same value, just written with leading zeros. I recommend always writing the mark as a plain decimal value like 1, 2, 3 rather than 0x1, 0x2, 0x3 when you create the rules, so that you don't confuse yourself (as I did before).
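As a small sanity check on the mark value, the sketch below converts the hex spelling to decimal and prints the ip rule command I would expect to produce a rule like the one above. It only prints the command; nothing is changed until you run the printed line yourself as root.

```shell
# The mark as it appears in the nft rules.
MARK=0x00000002

# printf accepts hex input for %d, so this normalises the mark to decimal.
MARK_DEC=$(printf '%d' "$MARK")

# Print the matching ip rule command instead of executing it.
echo "ip rule add fwmark $MARK_DEC lookup pbr"
```

ip rule still displays the mark in hex (0x2) afterwards, which is the same value.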
Step 5, use whatever method you prefer to add elements to the nftset
Once you have added new elements (i.e. IPs) to the nftset pbrips, you can run traceroute 8.8.8.8 (or trace any other IP in the set) to verify whether the PBR policy is working. In theory you are ready to go now.
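For example, adding an address by hand and checking the result could look like this. 9.9.9.9 is just an address I picked for illustration, and the mark option of ip route get may not be available in every ip build, so treat this as a sketch.

```shell
# Add one more address to the set (9.9.9.9 is only an example), run as root.
nft add element inet classify pbrips '{ 9.9.9.9 }'

# Confirm the set now contains it.
nft list set inet classify pbrips

# Ask the kernel which route a packet carrying mark 2 would take.
ip route get 8.8.8.8 mark 2

# Check the actual path.
traceroute 8.8.8.8
```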
Step 6, you may need to change the value of sysctl rp_filter
As mentioned in this Medium blog post, you may need to change the value of rp_filter, either with echo 2 > /proc/sys/net/ipv4/conf/all/rp_filter (loose mode) or with echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter (disabled; risky, check the documentation). To make the setting persistent:

    echo "net.ipv4.conf.all.rp_filter=2" > /etc/sysctl.d/95-IPv4-Filtering.conf
    sysctl -p /etc/sysctl.d/95-IPv4-Filtering.conf
    # Reboot and check the value
    sysctl -a | grep rp_filter
And thank you again, Alexey Martemyanov, for your question and your answer, which were a good reference that helped me learn PBR!