Hello,

I'd like to report a regression that goes back to 2015. I know it's damn
late, but the good thing is, the regression is still easy to reproduce,
verify & revert.

Long story short, starting with commit 66e5133f19e9 ("vlan: Add GRO
support for non hardware accelerated vlan"), which first hit kernel 4.2,
NAT performance of my router dropped by 30% - 40%.

My hardware is a BCM47094 SoC (dual-core ARM) with an integrated network
controller and an external BCM53012 switch.

Relevant setup:
* SoC network controller is wired to the hardware switch
* Switch passes 802.1q frames with VID 1 to four LAN ports
* Switch passes 802.1q frames with VID 2 to WAN port
* Linux does NAT for LAN (eth0.1) to WAN (eth0.2)
* Linux uses pfifo and "echo 2 > rps_cpus"
* Ryzen 5 PRO 2500U (x86_64) laptop connected to a LAN port
* Intel i7-2670QM laptop connected to a WAN port
* Speed of LAN to WAN measured using iperf & TCP over 10 minutes

1) 5.1.0-rc3
[  6]  0.0-600.0 sec  39.9 GBytes   572 Mbits/sec

2) 5.1.0-rc3 + rtcache patch
[  6]  0.0-600.0 sec  40.0 GBytes   572 Mbits/sec

3) 5.1.0-rc3 + disable GRO support
[  6]  0.0-300.4 sec  27.5 GBytes   786 Mbits/sec

4) 5.1.0-rc3 + rtcache patch + disable GRO support
[  6]  0.0-600.0 sec  65.6 GBytes   939 Mbits/sec

5) 4.1.15 + rtcache patch
934 Mb/s

6) 4.3.4 + rtcache patch
565 Mb/s

As you can see, I can achieve a big performance gain by disabling/reverting
the GRO support. Getting up to 65% faster NAT makes a huge difference, and
ideally I'd like to get that with upstream Linux code.

Could someone help me and check the reported commit/code, please? Is
there any other info I can provide or anything I can test for you?

The change I used to disable the VLAN GRO support for testing:

--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -545,6 +545,8 @@ static int __init vlan_offload_init(void)
 {
 	unsigned int i;
 
+	return -ENOTSUPP;
+
 	for (i = 0; i < ARRAY_SIZE(vlan_packet_offloads); i++)
 		dev_add_offload(&vlan_packet_offloads[i]);
 