* IP fragmentation performance and don't fragment bug when forwarding
@ 2018-12-02 12:01 Risto Pajula
  2018-12-02 21:32 ` Risto Pajula
  0 siblings, 1 reply; 14+ messages in thread
From: Risto Pajula @ 2018-12-02 12:01 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov; +Cc: netdev

Hello.

I have encountered a strange performance problem in Linux IP fragmentation 
when using video streaming services behind NAT. I have also been studying 
a possible bug in the DF bit (don't fragment) handling when forwarding 
IP packets.

First the system setup description:

[host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre router]-internet

where:
host1: a Netgem N7800 "cable box" for online video streaming services 
provided by the local telco (can access Netflix, HBO Nordic, "live TV", etc.)
linux router: a Linux machine with a dual-core Intel Celeron G1840, 
currently running Linux kernel 4.20.0-rc2 on openSUSE Leap 15.0
eth1: the Linux router's internal (NAT) interface, 192.168.0.1/24 network, 
MTU set to 1500, RTL8169sb/8110sb
eth0: the Linux router's internet-facing interface, public IP address, MTU 
set to 1500, RTL8168evl/8111evl
fibre router: an Alcatel Lucent fibre router (I-241G-Q), directly connected 
to eth0 of the Linux router.

Now, when the Netgem N7800 is used with the online video services (Netflix, 
HBO Nordic, etc.), the Linux router receives very BIG IP packets on eth0, 
up to ~20 kB. This seems to lead to the following problems in the Linux 
IP stack.
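
One way to confirm that such oversized packets really do arrive on eth0 is 
to match on the IP total-length field with tcpdump (a quick sketch, 
assuming tcpdump is available on the router; the interface name and the 
1500-byte threshold are simply the values from the setup above):

xxx:~ # tcpdump -ni eth0 -c 20 'ip[2:2] > 1500'

ip[2:2] is the 16-bit total-length field of the IPv4 header, so this prints 
only packets larger than the 1500-byte MTU as seen by the kernel on eth0.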

IP fragmentation performance:
When the Linux router receives these large IP packets on eth0 everything 
still works, but they seem to cause a very large latency degradation from 
the internal network to the internet whenever IP fragmentation is 
performed. The ping latency from the internal network to the internet 
increases from a stable 15-20 ms up to 700-800 ms, AND so does the ping 
from the internal network to the Linux router's eth1 (192.168.0.1). The 
uplink, however, works perfectly: the ping from the Linux router itself to 
the internet stays stable while streaming the online services. It seems 
that the IP fragmentation is somehow blocking eth1 reception or 
transmission for a very long time (which it shouldn't). I'm able to test 
and debug the issue further, but advice on where to look would be 
appreciated.
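
To see how much fragmentation is actually happening while this goes on, the 
kernel's IP fragmentation counters can be watched on the router; a minimal 
sketch, assuming the iproute2 nstat tool is installed (netstat -s | grep -i 
frag shows the same counters):

xxx:~ # nstat -az IpFragCreates IpFragOKs IpFragFails

IpFragCreates/IpFragOKs increasing rapidly during streaming would confirm 
that the forwarded packets are being fragmented on the way to eth1.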


DF Bit, mtu bug when forwarding:
I have started to study the above-mentioned problem and have found a 
possible bug in the DF bit and MTU handling in IP forwarding. The BIG 
packets received from the streaming services all have the DF bit set, so 
the question is whether we should be forwarding them at all, given that 
doing so results in them being fragmented. Apparently we currently are... I 
have traced this down to the ip_exceeds_mtu() function in 
net/ipv4/ip_forward.c, and the following patch seems to fix that.

--- net/ipv4/ip_forward.c.orig  2018-12-02 11:09:32.764320780 +0200
+++ net/ipv4/ip_forward.c       2018-12-02 12:53:25.031232347 +0200
@@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
                 return false;

         /* original fragment exceeds mtu and DF is set */
-       if (unlikely(IPCB(skb)->frag_max_size > mtu))
+        if (unlikely(skb->len > mtu))
                 return true;

         if (skb->ignore_df)


This seems to work (in some ways): after the change, IP packets that are 
too large for the internal network get dropped and we send "ICMP 
Destination unreachable, the datagram is too big" messages back to the 
originator (as we should?). However, it seems that not all services really 
like this... Netflix behaves as expected and the ping from the internal 
network to the internet is stable, but for example HBO Nordic no longer 
works (too little buffering? Retransmissions not working?). So it seems the 
original issue should also be fixed (and the fragmentation should be 
allowed?).
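
Those ICMP errors can be verified on the external interface with a capture 
filter for destination-unreachable/fragmentation-needed; a small sketch 
(eth0 is just the external interface from the setup above):

xxx:~ # tcpdump -ni eth0 'icmp[icmptype] == 3 and icmp[icmpcode] == 4'

If the streaming servers honour these, they should start sending MTU-sized 
segments (path MTU discovery); a service that keeps failing may be 
filtering or ignoring them.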



Any advice would be appreciated. Thanks!

PS. Watching TV was not this intensive 20 years ago :)

-- 
Risto Pajula
or.pajula@gmail.com


* Re: IP fragmentation performance and don't fragment bug when forwarding
  2018-12-02 12:01 IP fragmentation performance and don't fragment bug when forwarding Risto Pajula
@ 2018-12-02 21:32 ` Risto Pajula
  2018-12-07 14:46   ` Risto Pajula
  0 siblings, 1 reply; 14+ messages in thread
From: Risto Pajula @ 2018-12-02 21:32 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov; +Cc: netdev

Hello.

You can most likely ignore the "DF Bit, mtu bug when forwarding" case. 
There aren't actually big IP packets on the wire; instead there are bursts 
of packets on the wire, which are combined by GRO... and thus they should 
not be dropped. Sorry about the invalid bug report.

However, the poor latency from the internal network to the internet still 
remains, with GRO both enabled and disabled. I will try to study it further...
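
For anyone who wants to repeat the GRO on/off comparison, the offload can 
be inspected and toggled per interface with ethtool (a sketch; eth0 is the 
receiving interface in this setup):

xxx:~ # ethtool -k eth0 | grep generic-receive-offload
xxx:~ # ethtool -K eth0 gro off
xxx:~ # ethtool -K eth0 gro on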


BR.
Risto


On 2.12.2018 14:01, Risto Pajula wrote:
> Hello.
>
> I have encountered a weird performance problem in Linux IP 
> fragmentation when using video streaming services behind the NAT. Also 
> I have studied a possible bug in the DF bit (don't fragment) handling 
> when forwarding the IP packets.
>
> First the system setup description:
>
> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre router]-internet
>
> where:
> host1: is a Netgem N7800 "cable box" for online video streaming 
> services provided by local telco (Can access Netflix, HBO nordic, 
> "live TV", etc.)
> linux router: Linux computer with Dualcore Intel Celeron G1840, 
> running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, 
> mtu set to 1500, RTL8169sb/8110sb
> eth0: Linux Routers internet facing interface, public ip address, mtu 
> set to 1500,  RTL8168evl/8111evl
> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly 
> connected to the eth0 of the Linux router.
>
> And now when using the Netgem N7800 with online video services 
> (Netflix, HBO nordic, etc) the Linux router will receive very BIG IP 
> packets in the eth0 upto ~20kB, this seems to lead to the following 
> problems in the Linux IP stack.
>
> IP fragmentation performance:
> When the Linux router receives these large IP packets in the eth0 
> everything works, but it seems that them cause very large performance 
> degradation from internal network to the internet regarding the 
> latency when the IP fragmentation is performed. The ping latency from 
> internal network to the internel network increases from stable 
> 15ms-20ms up to 700-800ms AND also the ping from the internal network 
> to the linux router eth1 (192.168.0.). However up link works 
> perfectly, the ping is still stable when streaming the online services 
> (From linux router to the internet). It seems that the IP 
> fragmentation is somehow blocking the eth1 reception or transmission 
> for very long time (which it shouldn't). I'm able to test and debug 
> the issue further, but advice regarding where to look would be 
> appreciated.
>
>
> DF Bit, mtu bug when forwarding:
> I have started to study the above mentioned problem and have found a 
> possible bug in the DF bit and mtu handling in IP forwarding. The BIG 
> packets received from streaming services all have the "DF bit" set and 
> the question is that should we be forwarding them at all as that would 
> result them being fragmented? Apparently we currently are... I have 
> traced this down to the ip_forward.c function ip_exceeds_mtu(), and 
> the following patch seems to fix that.
>
> --- net/ipv4/ip_forward.c.orig  2018-12-02 11:09:32.764320780 +0200
> +++ net/ipv4/ip_forward.c       2018-12-02 12:53:25.031232347 +0200
> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
>                 return false;
>
>         /* original fragment exceeds mtu and DF is set */
> -       if (unlikely(IPCB(skb)->frag_max_size > mtu))
> +        if (unlikely(skb->len > mtu))
>                 return true;
>
>         if (skb->ignore_df)
>
>
> This seems to work (in some ways) - after the change IP packets that 
> are too large to the internal network get dropped and we are sending 
> "ICMP Destination unreachable, The datagram is too big" messages to 
> the originator (as we should?). However it seems that not all services 
> really like this... Netflix behaves as expected and ping is stable 
> from internal network to the internet, but for example HBO nordic will 
> not work anymore (too little buffering? Retransimissions not 
> working?). So it seems the original issue should be also fixed (And 
> the fragmention should be allowed?).
>
>
>
> Any advice would be appreciated. Thanks!
>
> PS. Watching TV was not this intensive 20 years ago :)
>


* Re: IP fragmentation performance and don't fragment bug when forwarding
  2018-12-02 21:32 ` Risto Pajula
@ 2018-12-07 14:46   ` Risto Pajula
  2018-12-09 23:28     ` IP (rtl8169) forwarding bug (performance) Risto Pajula
  0 siblings, 1 reply; 14+ messages in thread
From: Risto Pajula @ 2018-12-07 14:46 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov; +Cc: netdev

Hello.

I have been able to track the poor forwarding latency down to the TCP 
window scale option. The Netgem device uses a rather large window scale 
(x256), and I have been able to reproduce the router's poor forwarding 
latency with a Linux box running in the internal network by changing 
net.ipv4.tcp_rmem to a large value, which makes it advertise a larger TCP 
window scale. I still have no clue why this causes forwarding in the Linux 
kernel to block. Maybe something in the connection tracking....?
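
A rough sketch of that reproduction on the internal Linux box (the exact 
numbers are only an example; anything that raises the maximum receive 
buffer far enough will bump the advertised window scale):

xxx:~ # sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"

and then start a bulk TCP download from the internet while pinging through 
the router.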


With ICMP timestamp messages I have also been able to pinpoint that the 
latency arises on the eth1 sending side (the following hping3 example is 
run on the router toward the internal network):


xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
using eth1, addr: 192.168.0.1, MTU: 1500
HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers + 0 
data bytes
len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
icmp_seq=0 rtt=7.9 ms
ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
ICMP timestamp RTT tsrtt=7

len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
icmp_seq=1 rtt=235.9 ms
ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
ICMP timestamp RTT tsrtt=235

len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
icmp_seq=2 rtt=3.8 ms
ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
ICMP timestamp RTT tsrtt=3

len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
icmp_seq=3 rtt=47.8 ms
ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
ICMP timestamp RTT tsrtt=47

len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
icmp_seq=4 rtt=7.7 ms
ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
ICMP timestamp RTT tsrtt=7

len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
icmp_seq=5 rtt=3.7 ms
ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
ICMP timestamp RTT tsrtt=3

^C
--- 192.168.0.112 hping statistic ---
6 packets tramitted, 6 packets received, 0% packet loss
round-trip min/avg/max = 3.7/51.1/235.9 ms



BR.
Risto

On 2.12.2018 23:32, Risto Pajula wrote:
> Hello.
>
> You can most likely ignore the "DF Bit, mtu bug when forwarding" case. 
> There isn't actually big IP packets on the wire, instead there is 
> burst of packets on the wire, which are combined by the GRO... And 
> thus dropping them should not happen. Sorry about the invalid bug report.
>
> However the poor latency from intenal network to the internet still 
> remain, both GRO enabled and disabled. I will try to study further...
>
>
> BR.
> Risto
>
>
> On 2.12.2018 14:01, Risto Pajula wrote:
>> Hello.
>>
>> I have encountered a weird performance problem in Linux IP 
>> fragmentation when using video streaming services behind the NAT. 
>> Also I have studied a possible bug in the DF bit (don't fragment) 
>> handling when forwarding the IP packets.
>>
>> First the system setup description:
>>
>> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre 
>> router]-internet
>>
>> where:
>> host1: is a Netgem N7800 "cable box" for online video streaming 
>> services provided by local telco (Can access Netflix, HBO nordic, 
>> "live TV", etc.)
>> linux router: Linux computer with Dualcore Intel Celeron G1840, 
>> running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, 
>> mtu set to 1500, RTL8169sb/8110sb
>> eth0: Linux Routers internet facing interface, public ip address, mtu 
>> set to 1500,  RTL8168evl/8111evl
>> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly 
>> connected to the eth0 of the Linux router.
>>
>> And now when using the Netgem N7800 with online video services 
>> (Netflix, HBO nordic, etc) the Linux router will receive very BIG IP 
>> packets in the eth0 upto ~20kB, this seems to lead to the following 
>> problems in the Linux IP stack.
>>
>> IP fragmentation performance:
>> When the Linux router receives these large IP packets in the eth0 
>> everything works, but it seems that them cause very large performance 
>> degradation from internal network to the internet regarding the 
>> latency when the IP fragmentation is performed. The ping latency from 
>> internal network to the internel network increases from stable 
>> 15ms-20ms up to 700-800ms AND also the ping from the internal network 
>> to the linux router eth1 (192.168.0.). However up link works 
>> perfectly, the ping is still stable when streaming the online 
>> services (From linux router to the internet). It seems that the IP 
>> fragmentation is somehow blocking the eth1 reception or transmission 
>> for very long time (which it shouldn't). I'm able to test and debug 
>> the issue further, but advice regarding where to look would be 
>> appreciated.
>>
>>
>> DF Bit, mtu bug when forwarding:
>> I have started to study the above mentioned problem and have found a 
>> possible bug in the DF bit and mtu handling in IP forwarding. The BIG 
>> packets received from streaming services all have the "DF bit" set 
>> and the question is that should we be forwarding them at all as that 
>> would result them being fragmented? Apparently we currently are... I 
>> have traced this down to the ip_forward.c function ip_exceeds_mtu(), 
>> and the following patch seems to fix that.
>>
>> --- net/ipv4/ip_forward.c.orig  2018-12-02 11:09:32.764320780 +0200
>> +++ net/ipv4/ip_forward.c       2018-12-02 12:53:25.031232347 +0200
>> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
>>                 return false;
>>
>>         /* original fragment exceeds mtu and DF is set */
>> -       if (unlikely(IPCB(skb)->frag_max_size > mtu))
>> +        if (unlikely(skb->len > mtu))
>>                 return true;
>>
>>         if (skb->ignore_df)
>>
>>
>> This seems to work (in some ways) - after the change IP packets that 
>> are too large to the internal network get dropped and we are sending 
>> "ICMP Destination unreachable, The datagram is too big" messages to 
>> the originator (as we should?). However it seems that not all 
>> services really like this... Netflix behaves as expected and ping is 
>> stable from internal network to the internet, but for example HBO 
>> nordic will not work anymore (too little buffering? Retransimissions 
>> not working?). So it seems the original issue should be also fixed 
>> (And the fragmention should be allowed?).
>>
>>
>>
>> Any advice would be appreciated. Thanks!
>>
>> PS. Watching TV was not this intensive 20 years ago :)
>>


* IP (rtl8169) forwarding bug (performance)
  2018-12-07 14:46   ` Risto Pajula
@ 2018-12-09 23:28     ` Risto Pajula
  2018-12-10 21:26       ` Heiner Kallweit
  0 siblings, 1 reply; 14+ messages in thread
From: Risto Pajula @ 2018-12-09 23:28 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Realtek linux nic maintainers,
	hkallweit1
  Cc: netdev


Hello.

Old subject: "Re: IP fragmentation performance and don't fragment bug 
when forwarding"


I have now been tracing the kernel, and finding the bug seems difficult. I 
think the bug is a combination of several things; the likely reason it 
only occurs with rtl8169 is the way that driver uses the netdev/DQL API.

From my investigations it seems that the following happens:

* A burst of frames is received from the internet (eth0).
* These are accepted and forwarded to eth1.
* DQL hits its limit, which causes scheduling for the device to be stopped 
(__QUEUE_STATE_STACK_XOFF).
* This, combined with some unfortunate timing, causes the performance 
degradation: packets are then only scheduled for transmission in softirq 
context.

I still do not know why DQL hits the limit. I think it should not; maybe 
something undesirable happens first in rtl8169 (see the quick sysfs check 
sketched below).
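
The DQL/BQL state can be watched from userspace while the problem is 
happening; a minimal sketch using the per-queue sysfs files (eth1 and tx-0 
are the names in this setup):

xxx:~ # watch -n 0.2 'grep . /sys/class/net/eth1/queues/tx-0/byte_queue_limits/*'

If inflight stays at or above limit while the latency is bad, BQL really is 
throttling the queue; if not, something else is stopping transmission.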

I inserted the following trace_printk() calls into the code:

rtl8169_start_xmit:
trace_printk("ORP33 %s rtl8169_start_xmit len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n", tp->dev->name, skb->len, entry, tp->cur_tx, frags, tp->dirty_tx, (tp->cur_tx - tp->dirty_tx));


rtl_tx:
trace_printk("ORP33 %s rtl_tx len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n", tp->dev->name, tx_skb->skb->len, tp->cur_tx % NUM_TX_DESC, tp->cur_tx, 0, dirty_tx, tx_left);


Here is some grepped output, filtering only the ping packets, in two 
different situations:

//trace while downloading an Ubuntu image (~20 MB/s), only about 3 seconds 
fit in the trace buffer. ping is normal the whole time....
//orig tracing.out3754
117493:          <idle>-0     [000] ..s.  5235.407116: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 57 cur_tx: 3747641 frags: 0 dirty_tx: 
3747640 tx_left: 1
118704:          <idle>-0     [000] ..s.  5235.416057: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 20 cur_tx: 3747668 frags: 0 dirty_tx: 
3747667 tx_left: 1
346319:          hping3-2871  [000] ....  5236.002456: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 8 
cur_tx: 3754312 frags: 0 dirty_tx: 3754312 tx_left: 0
346337:          hping3-2871  [000] ..s.  5236.002475: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 9 cur_tx: 3754313 frags: 0 dirty_tx: 
3754312 tx_left: 1
1277155:          hping3-2871  [001] ....  5237.002627: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 38 
cur_tx: 3766502 frags: 0 dirty_tx: 3766502 tx_left: 0
1277173:          <idle>-0     [000] ..s.  5237.002646: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 39 cur_tx: 3766503 frags: 0 dirty_tx: 
3766502 tx_left: 1


//ICMP messages from the router to the internal network while a TCP stream 
is on. tracing.out3236/eth1_ring.txt is the original
//no idea why the first frame is tripled...
200792:          hping3-2871  [001] ....  3239.593384: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 34 
cur_tx: 2132770 frags: 0 dirty_tx: 2132770 tx_left: 0
201578:          hping3-2871  [001] ....  3240.593625: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 40 
cur_tx: 2132776 frags: 0 dirty_tx: 2132776 tx_left: 0
341061:          <idle>-0     [000] ..s.  3241.738011: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 52 
cur_tx: 2133684 frags: 0 dirty_tx: 2133680 tx_left: 4
342112:          <idle>-0     [000] ..s.  3241.745297: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 59 cur_tx: 2133691 frags: 0 dirty_tx: 
2133684 tx_left: 7
500126:          <idle>-0     [000] ..s.  3242.739500: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 49 
cur_tx: 2134513 frags: 0 dirty_tx: 2134505 tx_left: 8
501739:          <idle>-0     [000] ..s.  3242.751654: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 58 cur_tx: 2134522 frags: 0 dirty_tx: 
2134513 tx_left: 9
646220:          <idle>-0     [000] ..s.  3243.745761: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 41 
cur_tx: 2135273 frags: 0 dirty_tx: 2135267 tx_left: 6
647633:          <idle>-0     [000] ..s.  3243.755486: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 49 cur_tx: 2135281 frags: 0 dirty_tx: 
2135273 tx_left: 8
802878:          <idle>-0     [000] ..s.  3244.739947: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 
cur_tx: 2136087 frags: 0 dirty_tx: 2136081 tx_left: 6
804298:          <idle>-0     [000] ..s.  3244.749677: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 31 cur_tx: 2136095 frags: 0 dirty_tx: 
2136087 tx_left: 8
961190:          <idle>-0     [000] ..s.  3245.746217: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 16 
cur_tx: 2136912 frags: 0 dirty_tx: 2136906 tx_left: 6
962610:          <idle>-0     [000] ..s.  3245.755946: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2136920 frags: 0 dirty_tx: 
2136912 tx_left: 8
1118044:          <idle>-0     [000] ..s.  3246.740336: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 62 
cur_tx: 2137726 frags: 0 dirty_tx: 2137720 tx_left: 6
1119443:          <idle>-0     [000] ..s.  3246.750047: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 5 cur_tx: 2137733 frags: 0 dirty_tx: 
2137726 tx_left: 7
1264047:          <idle>-0     [000] ..s.  3247.709202: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 29 
cur_tx: 2138525 frags: 0 dirty_tx: 2138518 tx_left: 7
1264740:          <idle>-0     [000] ..s.  3247.718918: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 33 cur_tx: 2138529 frags: 0 dirty_tx: 
2138525 tx_left: 4
1419958:          <idle>-0     [000] ..s.  3248.736436: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 48 
cur_tx: 2139312 frags: 0 dirty_tx: 2139305 tx_left: 7
1421551:          <idle>-0     [000] ..s.  3248.746189: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 56 cur_tx: 2139320 frags: 0 dirty_tx: 
2139312 tx_left: 8
1578746:          <idle>-0     [000] ..s.  3249.742702: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 46 
cur_tx: 2140142 frags: 0 dirty_tx: 2140135 tx_left: 7
1580194:          <idle>-0     [000] ..s.  3249.752457: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 54 cur_tx: 2140150 frags: 0 dirty_tx: 
2140142 tx_left: 8
1729597:          <idle>-0     [000] ..s.  3250.746839: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 36 
cur_tx: 2140964 frags: 0 dirty_tx: 2140957 tx_left: 7
1731015:          <idle>-0     [000] ..s.  3250.756594: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 43 cur_tx: 2140971 frags: 0 dirty_tx: 
2140964 tx_left: 7
1881028:          <idle>-0     [000] ..s.  3251.740991: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 19 
cur_tx: 2141779 frags: 0 dirty_tx: 2141772 tx_left: 7
1881381:          <idle>-0     [000] ..s.  3251.750718: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 26 cur_tx: 2141786 frags: 0 dirty_tx: 
2141779 tx_left: 7
//something happens and the output queue empties.. now it is fast, and 
rtl8169_start_xmit is no longer called from softirq context
1896178:          hping3-2871  [001] ....  3252.595994: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 
cur_tx: 2141911 frags: 0 dirty_tx: 2141911 tx_left: 0
1896196:          <idle>-0     [000] ..s.  3252.596055: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2141912 frags: 0 dirty_tx: 
2141911 tx_left: 1
1923136:          hping3-2871  [001] ....  3253.596227: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 50 
cur_tx: 2142066 frags: 0 dirty_tx: 2142066 tx_left: 0
1923154:          <idle>-0     [000] ..s.  3253.596289: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 51 cur_tx: 2142067 frags: 0 dirty_tx: 
2142066 tx_left: 1
1951633:          hping3-2871  [001] ....  3254.596459: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 5 
cur_tx: 2142213 frags: 0 dirty_tx: 2142213 tx_left: 0
1951651:          <idle>-0     [000] ..s.  3254.596521: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 6 cur_tx: 2142214 frags: 0 dirty_tx: 
2142213 tx_left: 1
1954472:          hping3-2871  [001] ....  3255.596700: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 15 
cur_tx: 2142223 frags: 0 dirty_tx: 2142223 tx_left: 0
1954490:          <idle>-0     [000] ..s.  3255.596763: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 54 entry: 16 cur_tx: 2142224 frags: 0 dirty_tx: 
2142223 tx_left: 1
//until soon it is slow again..


BR.
Risto



On 7.12.2018 16:46, Risto Pajula wrote:
> Hello.
> 
> I have been to track the poor forwarding latency to the TCP Window scale 
> options. The Netgem device uses rather large windows scale options 
> (x256) and I have been able to reproduce the routers poor forwarding 
> latency also with linux box running in the internal network and changing 
> the net.ipv4.tcp_rmem to a large value and thus changing the TCP window 
> scaling options to larger ones. I still do not have clue why this causes 
> the forwarfing in the linux kernel to block? Maybe something in the 
> connection tracking....?
> 
> 
> With the ICMP timestamp messages I have been able to also pinpoint that 
> the latency is caused in the eth1 sending side (the following hping3 
> example is run in the router toward the internal network...
> 
> 
> xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
> using eth1, addr: 192.168.0.1, MTU: 1500
> HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers + 0 
> data bytes
> len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
> icmp_seq=0 rtt=7.9 ms
> ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
> ICMP timestamp RTT tsrtt=7
> 
> len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
> icmp_seq=1 rtt=235.9 ms
> ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
> ICMP timestamp RTT tsrtt=235
> 
> len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
> icmp_seq=2 rtt=3.8 ms
> ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
> ICMP timestamp RTT tsrtt=3
> 
> len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
> icmp_seq=3 rtt=47.8 ms
> ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
> ICMP timestamp RTT tsrtt=47
> 
> len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
> icmp_seq=4 rtt=7.7 ms
> ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
> ICMP timestamp RTT tsrtt=7
> 
> len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
> icmp_seq=5 rtt=3.7 ms
> ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
> ICMP timestamp RTT tsrtt=3
> 
> ^C
> --- 192.168.0.112 hping statistic ---
> 6 packets tramitted, 6 packets received, 0% packet loss
> round-trip min/avg/max = 3.7/51.1/235.9 ms
> 
> 
> 
> BR.
> Risto
> 
> On 2.12.2018 23:32, Risto Pajula wrote:
>> Hello.
>>
>> You can most likely ignore the "DF Bit, mtu bug when forwarding" case. 
>> There isn't actually big IP packets on the wire, instead there is 
>> burst of packets on the wire, which are combined by the GRO... And 
>> thus dropping them should not happen. Sorry about the invalid bug report.
>>
>> However the poor latency from intenal network to the internet still 
>> remain, both GRO enabled and disabled. I will try to study further...
>>
>>
>> BR.
>> Risto
>>
>>
>> On 2.12.2018 14:01, Risto Pajula wrote:
>>> Hello.
>>>
>>> I have encountered a weird performance problem in Linux IP 
>>> fragmentation when using video streaming services behind the NAT. 
>>> Also I have studied a possible bug in the DF bit (don't fragment) 
>>> handling when forwarding the IP packets.
>>>
>>> First the system setup description:
>>>
>>> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre 
>>> router]-internet
>>>
>>> where:
>>> host1: is a Netgem N7800 "cable box" for online video streaming 
>>> services provided by local telco (Can access Netflix, HBO nordic, 
>>> "live TV", etc.)
>>> linux router: Linux computer with Dualcore Intel Celeron G1840, 
>>> running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
>>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, 
>>> mtu set to 1500, RTL8169sb/8110sb
>>> eth0: Linux Routers internet facing interface, public ip address, mtu 
>>> set to 1500,  RTL8168evl/8111evl
>>> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly 
>>> connected to the eth0 of the Linux router.
>>>
>>> And now when using the Netgem N7800 with online video services 
>>> (Netflix, HBO nordic, etc) the Linux router will receive very BIG IP 
>>> packets in the eth0 upto ~20kB, this seems to lead to the following 
>>> problems in the Linux IP stack.
>>>
>>> IP fragmentation performance:
>>> When the Linux router receives these large IP packets in the eth0 
>>> everything works, but it seems that them cause very large performance 
>>> degradation from internal network to the internet regarding the 
>>> latency when the IP fragmentation is performed. The ping latency from 
>>> internal network to the internel network increases from stable 
>>> 15ms-20ms up to 700-800ms AND also the ping from the internal network 
>>> to the linux router eth1 (192.168.0.). However up link works 
>>> perfectly, the ping is still stable when streaming the online 
>>> services (From linux router to the internet). It seems that the IP 
>>> fragmentation is somehow blocking the eth1 reception or transmission 
>>> for very long time (which it shouldn't). I'm able to test and debug 
>>> the issue further, but advice regarding where to look would be 
>>> appreciated.
>>>
>>>
>>> DF Bit, mtu bug when forwarding:
>>> I have started to study the above mentioned problem and have found a 
>>> possible bug in the DF bit and mtu handling in IP forwarding. The BIG 
>>> packets received from streaming services all have the "DF bit" set 
>>> and the question is that should we be forwarding them at all as that 
>>> would result them being fragmented? Apparently we currently are... I 
>>> have traced this down to the ip_forward.c function ip_exceeds_mtu(), 
>>> and the following patch seems to fix that.
>>>
>>> --- net/ipv4/ip_forward.c.orig  2018-12-02 11:09:32.764320780 +0200
>>> +++ net/ipv4/ip_forward.c       2018-12-02 12:53:25.031232347 +0200
>>> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
>>>                 return false;
>>>
>>>         /* original fragment exceeds mtu and DF is set */
>>> -       if (unlikely(IPCB(skb)->frag_max_size > mtu))
>>> +        if (unlikely(skb->len > mtu))
>>>                 return true;
>>>
>>>         if (skb->ignore_df)
>>>
>>>
>>> This seems to work (in some ways) - after the change IP packets that 
>>> are too large to the internal network get dropped and we are sending 
>>> "ICMP Destination unreachable, The datagram is too big" messages to 
>>> the originator (as we should?). However it seems that not all 
>>> services really like this... Netflix behaves as expected and ping is 
>>> stable from internal network to the internet, but for example HBO 
>>> nordic will not work anymore (too little buffering? Retransimissions 
>>> not working?). So it seems the original issue should be also fixed 
>>> (And the fragmention should be allowed?).
>>>
>>>
>>>
>>> Any advice would be appreciated. Thanks!
>>>
>>> PS. Watching TV was not this intensive 20 years ago :)
>>>


* Re: IP (rtl8169) forwarding bug (performance)
  2018-12-09 23:28     ` IP (rtl8169) forwarding bug (performance) Risto Pajula
@ 2018-12-10 21:26       ` Heiner Kallweit
  2018-12-10 22:20         ` Risto Pajula
  0 siblings, 1 reply; 14+ messages in thread
From: Heiner Kallweit @ 2018-12-10 21:26 UTC (permalink / raw)
  To: Risto Pajula, David S. Miller, Alexey Kuznetsov,
	Realtek linux nic maintainers
  Cc: netdev

Did you also test with the latest linux-next kernel? Some recent changes, like 2e6eedb4813e
("r8169: make use of xmit_more and __netdev_sent_queue"), may have a positive impact.

On 10.12.2018 00:28, Risto Pajula wrote:
> 
> Hello.
> 
> Old subject: "Re: IP fragmentation performance and don't fragment bug when forwarding
> 
> 
> I have now been tracing the kernel and finding the bug seems difficult. I think the bug is combination of several things, likely cause is that it only occurs with rtl8169 and how it is using the netdevapi/DQL api.
> 
> From my investigations seems that following happens:
> 
> *Burst of frames is received from internet. (eth0)
> *These are accepted and forwarded to the (eth1)
> *DQL hits the limit, this causes the scheduling for the device to be stopped (__QUEUE_STATE_STACK_XOFF)
> *This and combination of some timing causes the  performance degradation, messages are only scheduled for transmission in soft Irq context.
> 
> I still do now know, why the DQL hits the limits.. I think it should not,  maybe something undesirable first happens with rtl8169.
> 
> I inserted following trace printk functions to the code..
> 
> rtl8169_start_xmit:
> trace_printk("ORP33 %s rtl8169_start_xmit len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, skb->len, entry, tp->cur_tx, frags, tp->dirty_tx, (tp->cur_tx - tp->dirty_tx) );
> 
> 
> rtl_tx:
> trace_printk("ORP33 %s rtl_tx len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, tx_skb->skb->len, tp->cur_tx % NUM_TX_DESC, tp->cur_tx, 0, dirty_tx, tx_left );
> 
> 
> Here is some grepped output only filtering the ping packet in two different situations:
> 
> //trace when downloading ubuntu (~20MB/s), only 3 secs fit to buffer. ping is all the time normal....
> //orig tracing.out3754
> 117493:          <idle>-0     [000] ..s.  5235.407116: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 57 cur_tx: 3747641 frags: 0 dirty_tx: 3747640 tx_left: 1
> 118704:          <idle>-0     [000] ..s.  5235.416057: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 20 cur_tx: 3747668 frags: 0 dirty_tx: 3747667 tx_left: 1
> 346319:          hping3-2871  [000] ....  5236.002456: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 8 cur_tx: 3754312 frags: 0 dirty_tx: 3754312 tx_left: 0
> 346337:          hping3-2871  [000] ..s.  5236.002475: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 9 cur_tx: 3754313 frags: 0 dirty_tx: 3754312 tx_left: 1
> 1277155:          hping3-2871  [001] ....  5237.002627: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 38 cur_tx: 3766502 frags: 0 dirty_tx: 3766502 tx_left: 0
> 1277173:          <idle>-0     [000] ..s.  5237.002646: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 39 cur_tx: 3766503 frags: 0 dirty_tx: 3766502 tx_left: 1
> 
> 
> //imcp messages from route to intenal network when TCP stream is on. tracing.out3236/eth1_ring.txt is the original
> //no idea why the first frame is tripled...
> 200792:          hping3-2871  [001] ....  3239.593384: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 34 cur_tx: 2132770 frags: 0 dirty_tx: 2132770 tx_left: 0
> 201578:          hping3-2871  [001] ....  3240.593625: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 40 cur_tx: 2132776 frags: 0 dirty_tx: 2132776 tx_left: 0
> 341061:          <idle>-0     [000] ..s.  3241.738011: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 52 cur_tx: 2133684 frags: 0 dirty_tx: 2133680 tx_left: 4
> 342112:          <idle>-0     [000] ..s.  3241.745297: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 59 cur_tx: 2133691 frags: 0 dirty_tx: 2133684 tx_left: 7
> 500126:          <idle>-0     [000] ..s.  3242.739500: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 49 cur_tx: 2134513 frags: 0 dirty_tx: 2134505 tx_left: 8
> 501739:          <idle>-0     [000] ..s.  3242.751654: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 58 cur_tx: 2134522 frags: 0 dirty_tx: 2134513 tx_left: 9
> 646220:          <idle>-0     [000] ..s.  3243.745761: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 41 cur_tx: 2135273 frags: 0 dirty_tx: 2135267 tx_left: 6
> 647633:          <idle>-0     [000] ..s.  3243.755486: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 49 cur_tx: 2135281 frags: 0 dirty_tx: 2135273 tx_left: 8
> 802878:          <idle>-0     [000] ..s.  3244.739947: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2136087 frags: 0 dirty_tx: 2136081 tx_left: 6
> 804298:          <idle>-0     [000] ..s.  3244.749677: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 31 cur_tx: 2136095 frags: 0 dirty_tx: 2136087 tx_left: 8
> 961190:          <idle>-0     [000] ..s.  3245.746217: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 16 cur_tx: 2136912 frags: 0 dirty_tx: 2136906 tx_left: 6
> 962610:          <idle>-0     [000] ..s.  3245.755946: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2136920 frags: 0 dirty_tx: 2136912 tx_left: 8
> 1118044:          <idle>-0     [000] ..s.  3246.740336: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 62 cur_tx: 2137726 frags: 0 dirty_tx: 2137720 tx_left: 6
> 1119443:          <idle>-0     [000] ..s.  3246.750047: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 5 cur_tx: 2137733 frags: 0 dirty_tx: 2137726 tx_left: 7
> 1264047:          <idle>-0     [000] ..s.  3247.709202: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 29 cur_tx: 2138525 frags: 0 dirty_tx: 2138518 tx_left: 7
> 1264740:          <idle>-0     [000] ..s.  3247.718918: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 33 cur_tx: 2138529 frags: 0 dirty_tx: 2138525 tx_left: 4
> 1419958:          <idle>-0     [000] ..s.  3248.736436: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 48 cur_tx: 2139312 frags: 0 dirty_tx: 2139305 tx_left: 7
> 1421551:          <idle>-0     [000] ..s.  3248.746189: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 56 cur_tx: 2139320 frags: 0 dirty_tx: 2139312 tx_left: 8
> 1578746:          <idle>-0     [000] ..s.  3249.742702: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 46 cur_tx: 2140142 frags: 0 dirty_tx: 2140135 tx_left: 7
> 1580194:          <idle>-0     [000] ..s.  3249.752457: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 54 cur_tx: 2140150 frags: 0 dirty_tx: 2140142 tx_left: 8
> 1729597:          <idle>-0     [000] ..s.  3250.746839: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 36 cur_tx: 2140964 frags: 0 dirty_tx: 2140957 tx_left: 7
> 1731015:          <idle>-0     [000] ..s.  3250.756594: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 43 cur_tx: 2140971 frags: 0 dirty_tx: 2140964 tx_left: 7
> 1881028:          <idle>-0     [000] ..s.  3251.740991: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 19 cur_tx: 2141779 frags: 0 dirty_tx: 2141772 tx_left: 7
> 1881381:          <idle>-0     [000] ..s.  3251.750718: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 26 cur_tx: 2141786 frags: 0 dirty_tx: 2141779 tx_left: 7
> //something happens, the output queue empties.. now it is fast, and rtl8169_start_xmit is not called from soft_irq
> 1896178:          hping3-2871  [001] ....  3252.595994: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2141911 frags: 0 dirty_tx: 2141911 tx_left: 0
> 1896196:          <idle>-0     [000] ..s.  3252.596055: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2141912 frags: 0 dirty_tx: 2141911 tx_left: 1
> 1923136:          hping3-2871  [001] ....  3253.596227: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 50 cur_tx: 2142066 frags: 0 dirty_tx: 2142066 tx_left: 0
> 1923154:          <idle>-0     [000] ..s.  3253.596289: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 51 cur_tx: 2142067 frags: 0 dirty_tx: 2142066 tx_left: 1
> 1951633:          hping3-2871  [001] ....  3254.596459: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 5 cur_tx: 2142213 frags: 0 dirty_tx: 2142213 tx_left: 0
> 1951651:          <idle>-0     [000] ..s.  3254.596521: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 6 cur_tx: 2142214 frags: 0 dirty_tx: 2142213 tx_left: 1
> 1954472:          hping3-2871  [001] ....  3255.596700: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 15 cur_tx: 2142223 frags: 0 dirty_tx: 2142223 tx_left: 0
> 1954490:          <idle>-0     [000] ..s.  3255.596763: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 16 cur_tx: 2142224 frags: 0 dirty_tx: 2142223 tx_left: 1
> //until it soon, again is slow..
> 
> 
> BR.
> Risto
> 
> 
> 
> On 7.12.2018 16:46, Risto Pajula wrote:
>> Hello.
>>
>> I have been to track the poor forwarding latency to the TCP Window scale options. The Netgem device uses rather large windows scale options (x256) and I have been able to reproduce the routers poor forwarding latency also with linux box running in the internal network and changing the net.ipv4.tcp_rmem to a large value and thus changing the TCP window scaling options to larger ones. I still do not have clue why this causes the forwarfing in the linux kernel to block? Maybe something in the connection tracking....?
>>
>>
>> With the ICMP timestamp messages I have been able to also pinpoint that the latency is caused in the eth1 sending side (the following hping3 example is run in the router toward the internal network...
>>
>>
>> xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
>> using eth1, addr: 192.168.0.1, MTU: 1500
>> HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers + 0 data bytes
>> len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
>> icmp_seq=0 rtt=7.9 ms
>> ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
>> ICMP timestamp RTT tsrtt=7
>>
>> len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
>> icmp_seq=1 rtt=235.9 ms
>> ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
>> ICMP timestamp RTT tsrtt=235
>>
>> len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
>> icmp_seq=2 rtt=3.8 ms
>> ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
>> ICMP timestamp RTT tsrtt=3
>>
>> len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
>> icmp_seq=3 rtt=47.8 ms
>> ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
>> ICMP timestamp RTT tsrtt=47
>>
>> len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
>> icmp_seq=4 rtt=7.7 ms
>> ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
>> ICMP timestamp RTT tsrtt=7
>>
>> len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
>> icmp_seq=5 rtt=3.7 ms
>> ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
>> ICMP timestamp RTT tsrtt=3
>>
>> ^C
>> --- 192.168.0.112 hping statistic ---
>> 6 packets tramitted, 6 packets received, 0% packet loss
>> round-trip min/avg/max = 3.7/51.1/235.9 ms
>>
>>
>>
>> BR.
>> Risto
>>
>> On 2.12.2018 23:32, Risto Pajula wrote:
>>> Hello.
>>>
>>> You can most likely ignore the "DF Bit, mtu bug when forwarding" case. There isn't actually big IP packets on the wire, instead there is burst of packets on the wire, which are combined by the GRO... And thus dropping them should not happen. Sorry about the invalid bug report.
>>>
>>> However the poor latency from intenal network to the internet still remain, both GRO enabled and disabled. I will try to study further...
>>>
>>>
>>> BR.
>>> Risto
>>>
>>>
>>> On 2.12.2018 14:01, Risto Pajula wrote:
>>>> Hello.
>>>>
>>>> I have encountered a weird performance problem in Linux IP fragmentation when using video streaming services behind the NAT. Also I have studied a possible bug in the DF bit (don't fragment) handling when forwarding the IP packets.
>>>>
>>>> First the system setup description:
>>>>
>>>> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre router]-internet
>>>>
>>>> where:
>>>> host1: is a Netgem N7800 "cable box" for online video streaming services provided by local telco (Can access Netflix, HBO nordic, "live TV", etc.)
>>>> linux router: Linux computer with Dualcore Intel Celeron G1840, running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
>>>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, mtu set to 1500, RTL8169sb/8110sb
>>>> eth0: Linux Routers internet facing interface, public ip address, mtu set to 1500,  RTL8168evl/8111evl
>>>> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly connected to the eth0 of the Linux router.
>>>>
>>>> And now when using the Netgem N7800 with online video services (Netflix, HBO nordic, etc) the Linux router will receive very BIG IP packets in the eth0 upto ~20kB, this seems to lead to the following problems in the Linux IP stack.
>>>>
>>>> IP fragmentation performance:
>>>> When the Linux router receives these large IP packets in the eth0 everything works, but it seems that them cause very large performance degradation from internal network to the internet regarding the latency when the IP fragmentation is performed. The ping latency from internal network to the internel network increases from stable 15ms-20ms up to 700-800ms AND also the ping from the internal network to the linux router eth1 (192.168.0.). However up link works perfectly, the ping is still stable when streaming the online services (From linux router to the internet). It seems that the IP fragmentation is somehow blocking the eth1 reception or transmission for very long time (which it shouldn't). I'm able to test and debug the issue further, but advice regarding where to look would be ap
 preciated.
>>>>
>>>>
>>>> DF Bit, mtu bug when forwarding:
>>>> I have started to study the above mentioned problem and have found a possible bug in the DF bit and mtu handling in IP forwarding. The BIG packets received from streaming services all have the "DF bit" set and the question is that should we be forwarding them at all as that would result them being fragmented? Apparently we currently are... I have traced this down to the ip_forward.c function ip_exceeds_mtu(), and the following patch seems to fix that.
>>>>
>>>> --- net/ipv4/ip_forward.c.orig  2018-12-02 11:09:32.764320780 +0200
>>>> +++ net/ipv4/ip_forward.c       2018-12-02 12:53:25.031232347 +0200
>>>> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
>>>>                 return false;
>>>>
>>>>         /* original fragment exceeds mtu and DF is set */
>>>> -       if (unlikely(IPCB(skb)->frag_max_size > mtu))
>>>> +        if (unlikely(skb->len > mtu))
>>>>                 return true;
>>>>
>>>>         if (skb->ignore_df)
>>>>
>>>>
>>>> This seems to work (in some ways) - after the change IP packets that are too large to the internal network get dropped and we are sending "ICMP Destination unreachable, The datagram is too big" messages to the originator (as we should?). However it seems that not all services really like this... Netflix behaves as expected and ping is stable from internal network to the internet, but for example HBO nordic will not work anymore (too little buffering? Retransimissions not working?). So it seems the original issue should be also fixed (And the fragmention should be allowed?).
>>>>
>>>>
>>>>
>>>> Any advice would be appreciated. Thanks!
>>>>
>>>> PS. Watching TV was not this intensive 20 years ago :)
>>>>
> 


* Re: IP (rtl8169) forwarding bug (performance)
  2018-12-10 21:26       ` Heiner Kallweit
@ 2018-12-10 22:20         ` Risto Pajula
  2018-12-11 17:01           ` Risto Pajula
  0 siblings, 1 reply; 14+ messages in thread
From: Risto Pajula @ 2018-12-10 22:20 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: David S. Miller, Alexey Kuznetsov, Realtek linux nic maintainers, netdev

Hello.

I have not yet tested with linux-next, but I will; thanks for pointing 
that out.

...But I have studied the problem a bit more, and indeed it seems that the 
rtl8169 transmit queue gets stuck.

Below is some trace log. Starting from 802026, a burst of frames is 
forwarded from eth0 to eth1, yet by the time we get to rtl_tx only two 
frames have been transmitted (there should have been about 1.25 ms in which 
to transmit them). The sequence then seems to repeat for a very long time, 
with only two frames getting transmitted each round.
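
For scale: assuming eth1 links at 1 Gbit/s (which the RTL8169sb should), a 
1506-byte frame takes roughly 12 µs on the wire (1506 * 8 / 10^9 s), so 
1.25 ms would be enough time for around a hundred such frames, not two.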

800541:          <idle>-0     [001] ..s.  7237.698143: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 34 
cur_tx: 4406946 frags: 0 dirty_tx: 4406946 tx_left: 0
800671:          <idle>-0     [001] ..s.  7237.698189: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 35 
cur_tx: 4406947 frags: 0 dirty_tx: 4406946 tx_left: 1
800835:          <idle>-0     [001] ..s.  7237.698235: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 36 
cur_tx: 4406948 frags: 0 dirty_tx: 4406946 tx_left: 2
800905:          <idle>-0     [000] ..s.  7237.698255: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 
4406946 tx_left: 3
800932:          <idle>-0     [000] ..s.  7237.698263: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 
4406947 tx_left: 2
800960:          <idle>-0     [000] ..s.  7237.698270: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 
4406948 tx_left: 1
800998:          <idle>-0     [001] ..s.  7237.698282: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 37 
cur_tx: 4406949 frags: 0 dirty_tx: 4406949 tx_left: 0
801085:          <idle>-0     [000] ..s.  7237.698312: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 38 cur_tx: 4406950 frags: 0 dirty_tx: 
4406949 tx_left: 1
801137:          <idle>-0     [001] ..s.  7237.698329: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 38 
cur_tx: 4406950 frags: 0 dirty_tx: 4406950 tx_left: 0
801229:          <idle>-0     [000] ..s.  7237.698363: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 39 cur_tx: 4406951 frags: 0 dirty_tx: 
4406950 tx_left: 1
801264:          <idle>-0     [001] ..s.  7237.698375: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 39 
cur_tx: 4406951 frags: 0 dirty_tx: 4406951 tx_left: 0
801356:          <idle>-0     [000] ..s.  7237.698409: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 40 cur_tx: 4406952 frags: 0 dirty_tx: 
4406951 tx_left: 1
801391:          <idle>-0     [001] ..s.  7237.698421: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 40 
cur_tx: 4406952 frags: 0 dirty_tx: 4406952 tx_left: 0
801482:          <idle>-0     [000] ..s.  7237.698455: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 41 cur_tx: 4406953 frags: 0 dirty_tx: 
4406952 tx_left: 1
801518:          <idle>-0     [001] ..s.  7237.698468: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 41 
cur_tx: 4406953 frags: 0 dirty_tx: 4406953 tx_left: 0
801610:          <idle>-0     [000] ..s.  7237.698502: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 42 cur_tx: 4406954 frags: 0 dirty_tx: 
4406953 tx_left: 1
801645:          <idle>-0     [001] ..s.  7237.698514: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 42 
cur_tx: 4406954 frags: 0 dirty_tx: 4406954 tx_left: 0
801737:          <idle>-0     [000] ..s.  7237.698548: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 43 cur_tx: 4406955 frags: 0 dirty_tx: 
4406954 tx_left: 1
801772:          <idle>-0     [001] ..s.  7237.698560: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 43 
cur_tx: 4406955 frags: 0 dirty_tx: 4406955 tx_left: 0
801864:          <idle>-0     [000] ..s.  7237.698594: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 44 cur_tx: 4406956 frags: 0 dirty_tx: 
4406955 tx_left: 1
801899:          <idle>-0     [001] ..s.  7237.698607: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 44 
cur_tx: 4406956 frags: 0 dirty_tx: 4406956 tx_left: 0
801984:          <idle>-0     [000] ..s.  7237.698640: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 45 cur_tx: 4406957 frags: 0 dirty_tx: 
4406956 tx_left: 1
802026:          <idle>-0     [001] ..s.  7237.698655: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 45 
cur_tx: 4406957 frags: 0 dirty_tx: 4406957 tx_left: 0
802129:          <idle>-0     [001] ..s.  7237.698701: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 46 
cur_tx: 4406958 frags: 0 dirty_tx: 4406957 tx_left: 1
802232:          <idle>-0     [001] ..s.  7237.698747: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 47 
cur_tx: 4406959 frags: 0 dirty_tx: 4406957 tx_left: 2
802335:          <idle>-0     [001] ..s.  7237.698793: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 48 
cur_tx: 4406960 frags: 0 dirty_tx: 4406957 tx_left: 3
802438:          <idle>-0     [001] ..s.  7237.698840: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 49 
cur_tx: 4406961 frags: 0 dirty_tx: 4406957 tx_left: 4
802541:          <idle>-0     [001] ..s.  7237.698885: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 50 
cur_tx: 4406962 frags: 0 dirty_tx: 4406957 tx_left: 5
802644:          <idle>-0     [001] ..s.  7237.698932: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 51 
cur_tx: 4406963 frags: 0 dirty_tx: 4406957 tx_left: 6
802747:          <idle>-0     [001] ..s.  7237.698978: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 52 
cur_tx: 4406964 frags: 0 dirty_tx: 4406957 tx_left: 7
802851:          <idle>-0     [001] ..s.  7237.699025: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 53 
cur_tx: 4406965 frags: 0 dirty_tx: 4406957 tx_left: 8
805094:          <idle>-0     [000] ..s.  7237.699871: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 
4406957 tx_left: 9
805126:          <idle>-0     [000] ..s.  7237.699878: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 
4406958 tx_left: 8
805197:          <idle>-0     [001] ..s.  7237.699905: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 54 
cur_tx: 4406966 frags: 0 dirty_tx: 4406959 tx_left: 7
805234:          <idle>-0     [001] ..s.  7237.699925: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 55 
cur_tx: 4406967 frags: 0 dirty_tx: 4406959 tx_left: 8
811446:          <idle>-0     [000] ..s.  7237.702344: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 
4406959 tx_left: 9
811479:          <idle>-0     [000] ..s.  7237.702354: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 
4406960 tx_left: 8
811641:          <idle>-0     [000] ..s.  7237.702395: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 56 
cur_tx: 4406968 frags: 0 dirty_tx: 4406961 tx_left: 7
811733:          <idle>-0     [000] ..s.  7237.702416: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 57 
cur_tx: 4406969 frags: 0 dirty_tx: 4406961 tx_left: 8
812050:          hping3-2871  [000] ..s.  7237.704770: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 
4406961 tx_left: 9
812061:          hping3-2871  [000] ..s.  7237.704777: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 
4406962 tx_left: 8
812124:          hping3-2871  [000] ..s.  7237.704811: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 58 
cur_tx: 4406970 frags: 0 dirty_tx: 4406963 tx_left: 7
812176:          hping3-2871  [000] ..s.  7237.704831: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 59 
cur_tx: 4406971 frags: 0 dirty_tx: 4406963 tx_left: 8
812300:          <idle>-0     [000] ..s.  7237.707247: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 
4406963 tx_left: 9
812312:          <idle>-0     [000] ..s.  7237.707257: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 
4406964 tx_left: 8
812389:          <idle>-0     [000] ..s.  7237.707297: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 60 
cur_tx: 4406972 frags: 0 dirty_tx: 4406965 tx_left: 7
812426:          <idle>-0     [000] ..s.  7237.707318: 
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 61 
cur_tx: 4406973 frags: 0 dirty_tx: 4406965 tx_left: 8
812827:          <idle>-0     [000] ..s.  7237.709690: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 
4406965 tx_left: 9
812838:          <idle>-0     [000] ..s.  7237.709700: rtl8169_poll: 
ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 
4406966 tx_left: 8
.....


BR.
Risto

On 10.12.2018 23:26, Heiner Kallweit wrote:
> Did you test also with the latest linux-next kernel? Some recent changes like 2e6eedb4813e
> "r8169: make use of xmit_more and __netdev_sent_queue" may have a positive impact.
>
> On 10.12.2018 00:28, Risto Pajula wrote:
>> Hello.
>>
>> Old subject: "Re: IP fragmentation performance and don't fragment bug when forwarding
>>
>>
>> I have now been tracing the kernel and finding the bug seems difficult. I think the bug is combination of several things, likely cause is that it only occurs with rtl8169 and how it is using the netdevapi/DQL api.
>>
>>  From my investigations seems that following happens:
>>
>> *Burst of frames is received from internet. (eth0)
>> *These are accepted and forwarded to the (eth1)
>> *DQL hits the limit, this causes the scheduling for the device to be stopped (__QUEUE_STATE_STACK_XOFF)
>> *This and combination of some timing causes the  performance degradation, messages are only scheduled for transmission in soft Irq context.
>>
>> I still do now know, why the DQL hits the limits.. I think it should not,  maybe something undesirable first happens with rtl8169.
>>
>> I inserted following trace printk functions to the code..
>>
>> rtl8169_start_xmit:
>> trace_printk("ORP33 %s rtl8169_start_xmit len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, skb->len, entry, tp->cur_tx, frags, tp->dirty_tx, (tp->cur_tx - tp->dirty_tx) );
>>
>>
>> rtl_tx:
>> trace_printk("ORP33 %s rtl_tx len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, tx_skb->skb->len, tp->cur_tx % NUM_TX_DESC, tp->cur_tx, 0, dirty_tx, tx_left );
>>
>>
>> Here is some grepped output only filtering the ping packet in two different situations:
>>
>> //trace when downloading ubuntu (~20MB/s), only 3 secs fit to buffer. ping is all the time normal....
>> //orig tracing.out3754
>> 117493:          <idle>-0     [000] ..s.  5235.407116: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 57 cur_tx: 3747641 frags: 0 dirty_tx: 3747640 tx_left: 1
>> 118704:          <idle>-0     [000] ..s.  5235.416057: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 20 cur_tx: 3747668 frags: 0 dirty_tx: 3747667 tx_left: 1
>> 346319:          hping3-2871  [000] ....  5236.002456: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 8 cur_tx: 3754312 frags: 0 dirty_tx: 3754312 tx_left: 0
>> 346337:          hping3-2871  [000] ..s.  5236.002475: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 9 cur_tx: 3754313 frags: 0 dirty_tx: 3754312 tx_left: 1
>> 1277155:          hping3-2871  [001] ....  5237.002627: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 38 cur_tx: 3766502 frags: 0 dirty_tx: 3766502 tx_left: 0
>> 1277173:          <idle>-0     [000] ..s.  5237.002646: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 39 cur_tx: 3766503 frags: 0 dirty_tx: 3766502 tx_left: 1
>>
>>
>> //imcp messages from route to intenal network when TCP stream is on. tracing.out3236/eth1_ring.txt is the original
>> //no idea why the first frame is tripled...
>> 200792:          hping3-2871  [001] ....  3239.593384: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 34 cur_tx: 2132770 frags: 0 dirty_tx: 2132770 tx_left: 0
>> 201578:          hping3-2871  [001] ....  3240.593625: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 40 cur_tx: 2132776 frags: 0 dirty_tx: 2132776 tx_left: 0
>> 341061:          <idle>-0     [000] ..s.  3241.738011: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 52 cur_tx: 2133684 frags: 0 dirty_tx: 2133680 tx_left: 4
>> 342112:          <idle>-0     [000] ..s.  3241.745297: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 59 cur_tx: 2133691 frags: 0 dirty_tx: 2133684 tx_left: 7
>> 500126:          <idle>-0     [000] ..s.  3242.739500: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 49 cur_tx: 2134513 frags: 0 dirty_tx: 2134505 tx_left: 8
>> 501739:          <idle>-0     [000] ..s.  3242.751654: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 58 cur_tx: 2134522 frags: 0 dirty_tx: 2134513 tx_left: 9
>> 646220:          <idle>-0     [000] ..s.  3243.745761: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 41 cur_tx: 2135273 frags: 0 dirty_tx: 2135267 tx_left: 6
>> 647633:          <idle>-0     [000] ..s.  3243.755486: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 49 cur_tx: 2135281 frags: 0 dirty_tx: 2135273 tx_left: 8
>> 802878:          <idle>-0     [000] ..s.  3244.739947: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2136087 frags: 0 dirty_tx: 2136081 tx_left: 6
>> 804298:          <idle>-0     [000] ..s.  3244.749677: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 31 cur_tx: 2136095 frags: 0 dirty_tx: 2136087 tx_left: 8
>> 961190:          <idle>-0     [000] ..s.  3245.746217: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 16 cur_tx: 2136912 frags: 0 dirty_tx: 2136906 tx_left: 6
>> 962610:          <idle>-0     [000] ..s.  3245.755946: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2136920 frags: 0 dirty_tx: 2136912 tx_left: 8
>> 1118044:          <idle>-0     [000] ..s.  3246.740336: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 62 cur_tx: 2137726 frags: 0 dirty_tx: 2137720 tx_left: 6
>> 1119443:          <idle>-0     [000] ..s.  3246.750047: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 5 cur_tx: 2137733 frags: 0 dirty_tx: 2137726 tx_left: 7
>> 1264047:          <idle>-0     [000] ..s.  3247.709202: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 29 cur_tx: 2138525 frags: 0 dirty_tx: 2138518 tx_left: 7
>> 1264740:          <idle>-0     [000] ..s.  3247.718918: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 33 cur_tx: 2138529 frags: 0 dirty_tx: 2138525 tx_left: 4
>> 1419958:          <idle>-0     [000] ..s.  3248.736436: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 48 cur_tx: 2139312 frags: 0 dirty_tx: 2139305 tx_left: 7
>> 1421551:          <idle>-0     [000] ..s.  3248.746189: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 56 cur_tx: 2139320 frags: 0 dirty_tx: 2139312 tx_left: 8
>> 1578746:          <idle>-0     [000] ..s.  3249.742702: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 46 cur_tx: 2140142 frags: 0 dirty_tx: 2140135 tx_left: 7
>> 1580194:          <idle>-0     [000] ..s.  3249.752457: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 54 cur_tx: 2140150 frags: 0 dirty_tx: 2140142 tx_left: 8
>> 1729597:          <idle>-0     [000] ..s.  3250.746839: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 36 cur_tx: 2140964 frags: 0 dirty_tx: 2140957 tx_left: 7
>> 1731015:          <idle>-0     [000] ..s.  3250.756594: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 43 cur_tx: 2140971 frags: 0 dirty_tx: 2140964 tx_left: 7
>> 1881028:          <idle>-0     [000] ..s.  3251.740991: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 19 cur_tx: 2141779 frags: 0 dirty_tx: 2141772 tx_left: 7
>> 1881381:          <idle>-0     [000] ..s.  3251.750718: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 26 cur_tx: 2141786 frags: 0 dirty_tx: 2141779 tx_left: 7
>> //something happens, the output queue empties.. now it is fast, and rtl8169_start_xmit is not called from soft_irq
>> 1896178:          hping3-2871  [001] ....  3252.595994: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2141911 frags: 0 dirty_tx: 2141911 tx_left: 0
>> 1896196:          <idle>-0     [000] ..s.  3252.596055: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2141912 frags: 0 dirty_tx: 2141911 tx_left: 1
>> 1923136:          hping3-2871  [001] ....  3253.596227: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 50 cur_tx: 2142066 frags: 0 dirty_tx: 2142066 tx_left: 0
>> 1923154:          <idle>-0     [000] ..s.  3253.596289: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 51 cur_tx: 2142067 frags: 0 dirty_tx: 2142066 tx_left: 1
>> 1951633:          hping3-2871  [001] ....  3254.596459: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 5 cur_tx: 2142213 frags: 0 dirty_tx: 2142213 tx_left: 0
>> 1951651:          <idle>-0     [000] ..s.  3254.596521: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 6 cur_tx: 2142214 frags: 0 dirty_tx: 2142213 tx_left: 1
>> 1954472:          hping3-2871  [001] ....  3255.596700: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 15 cur_tx: 2142223 frags: 0 dirty_tx: 2142223 tx_left: 0
>> 1954490:          <idle>-0     [000] ..s.  3255.596763: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 16 cur_tx: 2142224 frags: 0 dirty_tx: 2142223 tx_left: 1
>> //until it soon, again is slow..
>>
>>
>> BR.
>> Risto
>>
>>
>>
>> On 7.12.2018 16:46, Risto Pajula wrote:
>>> Hello.
>>>
>>> I have been to track the poor forwarding latency to the TCP Window scale options. The Netgem device uses rather large windows scale options (x256) and I have been able to reproduce the routers poor forwarding latency also with linux box running in the internal network and changing the net.ipv4.tcp_rmem to a large value and thus changing the TCP window scaling options to larger ones. I still do not have clue why this causes the forwarfing in the linux kernel to block? Maybe something in the connection tracking....?
>>>
>>>
>>> With the ICMP timestamp messages I have been able to also pinpoint that the latency is caused in the eth1 sending side (the following hping3 example is run in the router toward the internal network...
>>>
>>>
>>> xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
>>> using eth1, addr: 192.168.0.1, MTU: 1500
>>> HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers + 0 data bytes
>>> len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
>>> icmp_seq=0 rtt=7.9 ms
>>> ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
>>> ICMP timestamp RTT tsrtt=7
>>>
>>> len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
>>> icmp_seq=1 rtt=235.9 ms
>>> ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
>>> ICMP timestamp RTT tsrtt=235
>>>
>>> len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
>>> icmp_seq=2 rtt=3.8 ms
>>> ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
>>> ICMP timestamp RTT tsrtt=3
>>>
>>> len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
>>> icmp_seq=3 rtt=47.8 ms
>>> ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
>>> ICMP timestamp RTT tsrtt=47
>>>
>>> len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
>>> icmp_seq=4 rtt=7.7 ms
>>> ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
>>> ICMP timestamp RTT tsrtt=7
>>>
>>> len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
>>> icmp_seq=5 rtt=3.7 ms
>>> ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
>>> ICMP timestamp RTT tsrtt=3
>>>
>>> ^C
>>> --- 192.168.0.112 hping statistic ---
>>> 6 packets tramitted, 6 packets received, 0% packet loss
>>> round-trip min/avg/max = 3.7/51.1/235.9 ms
>>>
>>>
>>>
>>> BR.
>>> Risto
>>>
>>> On 2.12.2018 23:32, Risto Pajula wrote:
>>>> Hello.
>>>>
>>>> You can most likely ignore the "DF Bit, mtu bug when forwarding" case. There isn't actually big IP packets on the wire, instead there is burst of packets on the wire, which are combined by the GRO... And thus dropping them should not happen. Sorry about the invalid bug report.
>>>>
>>>> However the poor latency from intenal network to the internet still remain, both GRO enabled and disabled. I will try to study further...
>>>>
>>>>
>>>> BR.
>>>> Risto
>>>>
>>>>
>>>> On 2.12.2018 14:01, Risto Pajula wrote:
>>>>> Hello.
>>>>>
>>>>> I have encountered a weird performance problem in Linux IP fragmentation when using video streaming services behind the NAT. Also I have studied a possible bug in the DF bit (don't fragment) handling when forwarding the IP packets.
>>>>>
>>>>> First the system setup description:
>>>>>
>>>>> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre router]-internet
>>>>>
>>>>> where:
>>>>> host1: is a Netgem N7800 "cable box" for online video streaming services provided by local telco (Can access Netflix, HBO nordic, "live TV", etc.)
>>>>> linux router: Linux computer with Dualcore Intel Celeron G1840, running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
>>>>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, mtu set to 1500, RTL8169sb/8110sb
>>>>> eth0: Linux Routers internet facing interface, public ip address, mtu set to 1500,  RTL8168evl/8111evl
>>>>> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly connected to the eth0 of the Linux router.
>>>>>
>>>>> And now when using the Netgem N7800 with online video services (Netflix, HBO nordic, etc) the Linux router will receive very BIG IP packets in the eth0 upto ~20kB, this seems to lead to the following problems in the Linux IP stack.
>>>>>
>>>>> IP fragmentation performance:
>>>>> When the Linux router receives these large IP packets in the eth0 everything works, but it seems that them cause very large performance degradation from internal network to the internet regarding the latency when the IP fragmentation is performed. The ping latency from internal network to the internel network increases from stable 15ms-20ms up to 700-800ms AND also the ping from the internal network to the linux router eth1 (192.168.0.). However up link works perfectly, the ping is still stable when streaming the online services (From linux router to the internet). It seems that the IP fragmentation is somehow blocking the eth1 reception or transmission for very long time (which it shouldn't). I'm able to test and debug the issue further, but advice regarding where to look would be a
 ppreciated.
>>>>>
>>>>>
>>>>> DF Bit, mtu bug when forwarding:
>>>>> I have started to study the above mentioned problem and have found a possible bug in the DF bit and mtu handling in IP forwarding. The BIG packets received from streaming services all have the "DF bit" set and the question is that should we be forwarding them at all as that would result them being fragmented? Apparently we currently are... I have traced this down to the ip_forward.c function ip_exceeds_mtu(), and the following patch seems to fix that.
>>>>>
>>>>> --- net/ipv4/ip_forward.c.orig  2018-12-02 11:09:32.764320780 +0200
>>>>> +++ net/ipv4/ip_forward.c       2018-12-02 12:53:25.031232347 +0200
>>>>> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
>>>>>                  return false;
>>>>>
>>>>>          /* original fragment exceeds mtu and DF is set */
>>>>> -       if (unlikely(IPCB(skb)->frag_max_size > mtu))
>>>>> +        if (unlikely(skb->len > mtu))
>>>>>                  return true;
>>>>>
>>>>>          if (skb->ignore_df)
>>>>>
>>>>>
>>>>> This seems to work (in some ways) - after the change IP packets that are too large to the internal network get dropped and we are sending "ICMP Destination unreachable, The datagram is too big" messages to the originator (as we should?). However it seems that not all services really like this... Netflix behaves as expected and ping is stable from internal network to the internet, but for example HBO nordic will not work anymore (too little buffering? Retransimissions not working?). So it seems the original issue should be also fixed (And the fragmention should be allowed?).
>>>>>
>>>>>
>>>>>
>>>>> Any advice would be appreciated. Thanks!
>>>>>
>>>>> PS. Watching TV was not this intensive 20 years ago :)
>>>>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: IP (rtl8169) forwarding bug (performance)
  2018-12-10 22:20         ` Risto Pajula
@ 2018-12-11 17:01           ` Risto Pajula
  2018-12-11 19:51             ` Heiner Kallweit
  0 siblings, 1 reply; 14+ messages in thread
From: Risto Pajula @ 2018-12-11 17:01 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: David S. Miller, Alexey Kuznetsov, Realtek linux nic maintainers, netdev

Hello.

A freshly built 4.20.0-rc6-next-20181210-lp150.12.25-default kernel was 
waiting for me when I got back from work, but unfortunately it did not 
help at all; it behaved in exactly the same manner.

Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Reply from 192.168.0.1: bytes=32 time=73ms TTL=64
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Reply from 192.168.0.1: bytes=32 time=83ms TTL=64
Reply from 192.168.0.1: bytes=32 time=307ms TTL=64
Reply from 192.168.0.1: bytes=32 time=115ms TTL=64
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
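
One way to confirm whether DQL really is the limiter would be to extend
the ORP33 trace_printk() instrumentation to dump the queue's dql state as
well. A rough, untested sketch (assuming CONFIG_BQL=y, a single TX queue,
and the struct dql field names of 4.20):

#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/dynamic_queue_limits.h>

/* Illustrative only: print the BQL/DQL bookkeeping of TX queue 0 next to
 * the existing ORP33 trace lines, e.g. from rtl8169_start_xmit()/rtl_tx(). */
static void orp33_trace_dql(struct net_device *dev)
{
        struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);

        trace_printk("ORP33 %s dql limit: %u queued: %u completed: %u avail: %d stopped: %d\n",
                     dev->name, txq->dql.limit, txq->dql.num_queued,
                     txq->dql.num_completed, dql_avail(&txq->dql),
                     netif_tx_queue_stopped(txq));
}

If dql_avail() goes negative right where the pings stall, that would back
up the __QUEUE_STATE_STACK_XOFF theory; if it stays positive, the stall is
somewhere else.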

BR.
Risto


On 11.12.2018 0:20, Risto Pajula wrote:
> Hello.
>
> I have not yet tested with linux-next but I will, thanks for pointing 
> that out.
>
> ...But I have studied the problem a bit more, indeed it seems that the 
> rtl8169 transmission queue gets stuck.
>
> Below is some trace log. Starting from 802026 a burst of frames is 
> forwarded from eth0 to eth1 and when we get to the rtl_tx only two 
> frames have been transmitted.. (Should have had 1,25 ms to transmit 
> them).. Also then the sequence seems to repeat for very long time that 
> only two massages get transmitted.
>
> 800541:          <idle>-0     [001] ..s.  7237.698143: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 34 
> cur_tx: 4406946 frags: 0 dirty_tx: 4406946 tx_left: 0
> 800671:          <idle>-0     [001] ..s.  7237.698189: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 35 
> cur_tx: 4406947 frags: 0 dirty_tx: 4406946 tx_left: 1
> 800835:          <idle>-0     [001] ..s.  7237.698235: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 36 
> cur_tx: 4406948 frags: 0 dirty_tx: 4406946 tx_left: 2
> 800905:          <idle>-0     [000] ..s.  7237.698255: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 
> dirty_tx: 4406946 tx_left: 3
> 800932:          <idle>-0     [000] ..s.  7237.698263: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 
> dirty_tx: 4406947 tx_left: 2
> 800960:          <idle>-0     [000] ..s.  7237.698270: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 
> dirty_tx: 4406948 tx_left: 1
> 800998:          <idle>-0     [001] ..s.  7237.698282: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 37 
> cur_tx: 4406949 frags: 0 dirty_tx: 4406949 tx_left: 0
> 801085:          <idle>-0     [000] ..s.  7237.698312: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 38 cur_tx: 4406950 frags: 0 
> dirty_tx: 4406949 tx_left: 1
> 801137:          <idle>-0     [001] ..s.  7237.698329: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 38 
> cur_tx: 4406950 frags: 0 dirty_tx: 4406950 tx_left: 0
> 801229:          <idle>-0     [000] ..s.  7237.698363: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 39 cur_tx: 4406951 frags: 0 
> dirty_tx: 4406950 tx_left: 1
> 801264:          <idle>-0     [001] ..s.  7237.698375: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 39 
> cur_tx: 4406951 frags: 0 dirty_tx: 4406951 tx_left: 0
> 801356:          <idle>-0     [000] ..s.  7237.698409: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 40 cur_tx: 4406952 frags: 0 
> dirty_tx: 4406951 tx_left: 1
> 801391:          <idle>-0     [001] ..s.  7237.698421: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 40 
> cur_tx: 4406952 frags: 0 dirty_tx: 4406952 tx_left: 0
> 801482:          <idle>-0     [000] ..s.  7237.698455: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 41 cur_tx: 4406953 frags: 0 
> dirty_tx: 4406952 tx_left: 1
> 801518:          <idle>-0     [001] ..s.  7237.698468: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 41 
> cur_tx: 4406953 frags: 0 dirty_tx: 4406953 tx_left: 0
> 801610:          <idle>-0     [000] ..s.  7237.698502: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 42 cur_tx: 4406954 frags: 0 
> dirty_tx: 4406953 tx_left: 1
> 801645:          <idle>-0     [001] ..s.  7237.698514: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 42 
> cur_tx: 4406954 frags: 0 dirty_tx: 4406954 tx_left: 0
> 801737:          <idle>-0     [000] ..s.  7237.698548: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 43 cur_tx: 4406955 frags: 0 
> dirty_tx: 4406954 tx_left: 1
> 801772:          <idle>-0     [001] ..s.  7237.698560: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 43 
> cur_tx: 4406955 frags: 0 dirty_tx: 4406955 tx_left: 0
> 801864:          <idle>-0     [000] ..s.  7237.698594: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 44 cur_tx: 4406956 frags: 0 
> dirty_tx: 4406955 tx_left: 1
> 801899:          <idle>-0     [001] ..s.  7237.698607: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 44 
> cur_tx: 4406956 frags: 0 dirty_tx: 4406956 tx_left: 0
> 801984:          <idle>-0     [000] ..s.  7237.698640: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 45 cur_tx: 4406957 frags: 0 
> dirty_tx: 4406956 tx_left: 1
> 802026:          <idle>-0     [001] ..s.  7237.698655: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 45 
> cur_tx: 4406957 frags: 0 dirty_tx: 4406957 tx_left: 0
> 802129:          <idle>-0     [001] ..s.  7237.698701: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 46 
> cur_tx: 4406958 frags: 0 dirty_tx: 4406957 tx_left: 1
> 802232:          <idle>-0     [001] ..s.  7237.698747: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 47 
> cur_tx: 4406959 frags: 0 dirty_tx: 4406957 tx_left: 2
> 802335:          <idle>-0     [001] ..s.  7237.698793: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 48 
> cur_tx: 4406960 frags: 0 dirty_tx: 4406957 tx_left: 3
> 802438:          <idle>-0     [001] ..s.  7237.698840: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 49 
> cur_tx: 4406961 frags: 0 dirty_tx: 4406957 tx_left: 4
> 802541:          <idle>-0     [001] ..s.  7237.698885: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 50 
> cur_tx: 4406962 frags: 0 dirty_tx: 4406957 tx_left: 5
> 802644:          <idle>-0     [001] ..s.  7237.698932: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 51 
> cur_tx: 4406963 frags: 0 dirty_tx: 4406957 tx_left: 6
> 802747:          <idle>-0     [001] ..s.  7237.698978: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 52 
> cur_tx: 4406964 frags: 0 dirty_tx: 4406957 tx_left: 7
> 802851:          <idle>-0     [001] ..s.  7237.699025: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 53 
> cur_tx: 4406965 frags: 0 dirty_tx: 4406957 tx_left: 8
> 805094:          <idle>-0     [000] ..s.  7237.699871: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 
> dirty_tx: 4406957 tx_left: 9
> 805126:          <idle>-0     [000] ..s.  7237.699878: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 
> dirty_tx: 4406958 tx_left: 8
> 805197:          <idle>-0     [001] ..s.  7237.699905: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 54 
> cur_tx: 4406966 frags: 0 dirty_tx: 4406959 tx_left: 7
> 805234:          <idle>-0     [001] ..s.  7237.699925: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 55 
> cur_tx: 4406967 frags: 0 dirty_tx: 4406959 tx_left: 8
> 811446:          <idle>-0     [000] ..s.  7237.702344: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 
> dirty_tx: 4406959 tx_left: 9
> 811479:          <idle>-0     [000] ..s.  7237.702354: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 
> dirty_tx: 4406960 tx_left: 8
> 811641:          <idle>-0     [000] ..s.  7237.702395: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 56 
> cur_tx: 4406968 frags: 0 dirty_tx: 4406961 tx_left: 7
> 811733:          <idle>-0     [000] ..s.  7237.702416: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 57 
> cur_tx: 4406969 frags: 0 dirty_tx: 4406961 tx_left: 8
> 812050:          hping3-2871  [000] ..s.  7237.704770: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 
> dirty_tx: 4406961 tx_left: 9
> 812061:          hping3-2871  [000] ..s.  7237.704777: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 
> dirty_tx: 4406962 tx_left: 8
> 812124:          hping3-2871  [000] ..s.  7237.704811: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 58 
> cur_tx: 4406970 frags: 0 dirty_tx: 4406963 tx_left: 7
> 812176:          hping3-2871  [000] ..s.  7237.704831: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 59 
> cur_tx: 4406971 frags: 0 dirty_tx: 4406963 tx_left: 8
> 812300:          <idle>-0     [000] ..s.  7237.707247: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 
> dirty_tx: 4406963 tx_left: 9
> 812312:          <idle>-0     [000] ..s.  7237.707257: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 
> dirty_tx: 4406964 tx_left: 8
> 812389:          <idle>-0     [000] ..s.  7237.707297: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 60 
> cur_tx: 4406972 frags: 0 dirty_tx: 4406965 tx_left: 7
> 812426:          <idle>-0     [000] ..s.  7237.707318: 
> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 61 
> cur_tx: 4406973 frags: 0 dirty_tx: 4406965 tx_left: 8
> 812827:          <idle>-0     [000] ..s.  7237.709690: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 
> dirty_tx: 4406965 tx_left: 9
> 812838:          <idle>-0     [000] ..s.  7237.709700: rtl8169_poll: 
> ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 
> dirty_tx: 4406966 tx_left: 8
> .....
>
>
> BR.
> Risto
>
> On 10.12.2018 23:26, Heiner Kallweit wrote:
>> Did you test also with the latest linux-next kernel? Some recent 
>> changes like 2e6eedb4813e
>> "r8169: make use of xmit_more and __netdev_sent_queue" may have a 
>> positive impact.
>>
>> On 10.12.2018 00:28, Risto Pajula wrote:
>>> Hello.
>>>
>>> Old subject: "Re: IP fragmentation performance and don't fragment 
>>> bug when forwarding
>>>
>>>
>>> I have now been tracing the kernel and finding the bug seems 
>>> difficult. I think the bug is combination of several things, likely 
>>> cause is that it only occurs with rtl8169 and how it is using the 
>>> netdevapi/DQL api.
>>>
>>>  From my investigations seems that following happens:
>>>
>>> *Burst of frames is received from internet. (eth0)
>>> *These are accepted and forwarded to the (eth1)
>>> *DQL hits the limit, this causes the scheduling for the device to be 
>>> stopped (__QUEUE_STATE_STACK_XOFF)
>>> *This and combination of some timing causes the  performance 
>>> degradation, messages are only scheduled for transmission in soft 
>>> Irq context.
>>>
>>> I still do now know, why the DQL hits the limits.. I think it should 
>>> not,  maybe something undesirable first happens with rtl8169.
>>>
>>> I inserted following trace printk functions to the code..
>>>
>>> rtl8169_start_xmit:
>>> trace_printk("ORP33 %s rtl8169_start_xmit len: %d entry: %u cur_tx: 
>>> %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, skb->len, 
>>> entry, tp->cur_tx, frags, tp->dirty_tx, (tp->cur_tx - tp->dirty_tx) );
>>>
>>>
>>> rtl_tx:
>>> trace_printk("ORP33 %s rtl_tx len: %d entry: %u cur_tx: %u frags: %d 
>>> dirty_tx: %u tx_left: %u\n" , tp->dev->name, tx_skb->skb->len, 
>>> tp->cur_tx % NUM_TX_DESC, tp->cur_tx, 0, dirty_tx, tx_left );
>>>
>>>
>>> Here is some grepped output only filtering the ping packet in two 
>>> different situations:
>>>
>>> //trace when downloading ubuntu (~20MB/s), only 3 secs fit to 
>>> buffer. ping is all the time normal....
>>> //orig tracing.out3754
>>> 117493:          <idle>-0     [000] ..s.  5235.407116: rtl8169_poll: 
>>> ORP33 eth1 rtl_tx len: 54 entry: 57 cur_tx: 3747641 frags: 0 
>>> dirty_tx: 3747640 tx_left: 1
>>> 118704:          <idle>-0     [000] ..s.  5235.416057: rtl8169_poll: 
>>> ORP33 eth1 rtl_tx len: 54 entry: 20 cur_tx: 3747668 frags: 0 
>>> dirty_tx: 3747667 tx_left: 1
>>> 346319:          hping3-2871  [000] ....  5236.002456: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 8 
>>> cur_tx: 3754312 frags: 0 dirty_tx: 3754312 tx_left: 0
>>> 346337:          hping3-2871  [000] ..s.  5236.002475: rtl8169_poll: 
>>> ORP33 eth1 rtl_tx len: 54 entry: 9 cur_tx: 3754313 frags: 0 
>>> dirty_tx: 3754312 tx_left: 1
>>> 1277155:          hping3-2871  [001] ....  5237.002627: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 38 
>>> cur_tx: 3766502 frags: 0 dirty_tx: 3766502 tx_left: 0
>>> 1277173:          <idle>-0     [000] ..s.  5237.002646: 
>>> rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 39 cur_tx: 3766503 
>>> frags: 0 dirty_tx: 3766502 tx_left: 1
>>>
>>>
>>> //imcp messages from route to intenal network when TCP stream is on. 
>>> tracing.out3236/eth1_ring.txt is the original
>>> //no idea why the first frame is tripled...
>>> 200792:          hping3-2871  [001] ....  3239.593384: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 34 
>>> cur_tx: 2132770 frags: 0 dirty_tx: 2132770 tx_left: 0
>>> 201578:          hping3-2871  [001] ....  3240.593625: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 40 
>>> cur_tx: 2132776 frags: 0 dirty_tx: 2132776 tx_left: 0
>>> 341061:          <idle>-0     [000] ..s.  3241.738011: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 52 
>>> cur_tx: 2133684 frags: 0 dirty_tx: 2133680 tx_left: 4
>>> 342112:          <idle>-0     [000] ..s.  3241.745297: rtl8169_poll: 
>>> ORP33 eth1 rtl_tx len: 54 entry: 59 cur_tx: 2133691 frags: 0 
>>> dirty_tx: 2133684 tx_left: 7
>>> 500126:          <idle>-0     [000] ..s.  3242.739500: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 49 
>>> cur_tx: 2134513 frags: 0 dirty_tx: 2134505 tx_left: 8
>>> 501739:          <idle>-0     [000] ..s.  3242.751654: rtl8169_poll: 
>>> ORP33 eth1 rtl_tx len: 54 entry: 58 cur_tx: 2134522 frags: 0 
>>> dirty_tx: 2134513 tx_left: 9
>>> 646220:          <idle>-0     [000] ..s.  3243.745761: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 41 
>>> cur_tx: 2135273 frags: 0 dirty_tx: 2135267 tx_left: 6
>>> 647633:          <idle>-0     [000] ..s.  3243.755486: rtl8169_poll: 
>>> ORP33 eth1 rtl_tx len: 54 entry: 49 cur_tx: 2135281 frags: 0 
>>> dirty_tx: 2135273 tx_left: 8
>>> 802878:          <idle>-0     [000] ..s.  3244.739947: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 
>>> cur_tx: 2136087 frags: 0 dirty_tx: 2136081 tx_left: 6
>>> 804298:          <idle>-0     [000] ..s.  3244.749677: rtl8169_poll: 
>>> ORP33 eth1 rtl_tx len: 54 entry: 31 cur_tx: 2136095 frags: 0 
>>> dirty_tx: 2136087 tx_left: 8
>>> 961190:          <idle>-0     [000] ..s.  3245.746217: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 16 
>>> cur_tx: 2136912 frags: 0 dirty_tx: 2136906 tx_left: 6
>>> 962610:          <idle>-0     [000] ..s.  3245.755946: rtl8169_poll: 
>>> ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2136920 frags: 0 
>>> dirty_tx: 2136912 tx_left: 8
>>> 1118044:          <idle>-0     [000] ..s.  3246.740336: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 62 
>>> cur_tx: 2137726 frags: 0 dirty_tx: 2137720 tx_left: 6
>>> 1119443:          <idle>-0     [000] ..s.  3246.750047: 
>>> rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 5 cur_tx: 2137733 
>>> frags: 0 dirty_tx: 2137726 tx_left: 7
>>> 1264047:          <idle>-0     [000] ..s.  3247.709202: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 29 
>>> cur_tx: 2138525 frags: 0 dirty_tx: 2138518 tx_left: 7
>>> 1264740:          <idle>-0     [000] ..s.  3247.718918: 
>>> rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 33 cur_tx: 2138529 
>>> frags: 0 dirty_tx: 2138525 tx_left: 4
>>> 1419958:          <idle>-0     [000] ..s.  3248.736436: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 48 
>>> cur_tx: 2139312 frags: 0 dirty_tx: 2139305 tx_left: 7
>>> 1421551:          <idle>-0     [000] ..s.  3248.746189: 
>>> rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 56 cur_tx: 2139320 
>>> frags: 0 dirty_tx: 2139312 tx_left: 8
>>> 1578746:          <idle>-0     [000] ..s.  3249.742702: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 46 
>>> cur_tx: 2140142 frags: 0 dirty_tx: 2140135 tx_left: 7
>>> 1580194:          <idle>-0     [000] ..s.  3249.752457: 
>>> rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 54 cur_tx: 2140150 
>>> frags: 0 dirty_tx: 2140142 tx_left: 8
>>> 1729597:          <idle>-0     [000] ..s.  3250.746839: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 36 
>>> cur_tx: 2140964 frags: 0 dirty_tx: 2140957 tx_left: 7
>>> 1731015:          <idle>-0     [000] ..s.  3250.756594: 
>>> rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 43 cur_tx: 2140971 
>>> frags: 0 dirty_tx: 2140964 tx_left: 7
>>> 1881028:          <idle>-0     [000] ..s.  3251.740991: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 19 
>>> cur_tx: 2141779 frags: 0 dirty_tx: 2141772 tx_left: 7
>>> 1881381:          <idle>-0     [000] ..s.  3251.750718: 
>>> rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 26 cur_tx: 2141786 
>>> frags: 0 dirty_tx: 2141779 tx_left: 7
>>> //something happens, the output queue empties.. now it is fast, and 
>>> rtl8169_start_xmit is not called from soft_irq
>>> 1896178:          hping3-2871  [001] ....  3252.595994: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 
>>> cur_tx: 2141911 frags: 0 dirty_tx: 2141911 tx_left: 0
>>> 1896196:          <idle>-0     [000] ..s.  3252.596055: 
>>> rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2141912 
>>> frags: 0 dirty_tx: 2141911 tx_left: 1
>>> 1923136:          hping3-2871  [001] ....  3253.596227: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 50 
>>> cur_tx: 2142066 frags: 0 dirty_tx: 2142066 tx_left: 0
>>> 1923154:          <idle>-0     [000] ..s.  3253.596289: 
>>> rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 51 cur_tx: 2142067 
>>> frags: 0 dirty_tx: 2142066 tx_left: 1
>>> 1951633:          hping3-2871  [001] ....  3254.596459: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 5 
>>> cur_tx: 2142213 frags: 0 dirty_tx: 2142213 tx_left: 0
>>> 1951651:          <idle>-0     [000] ..s.  3254.596521: 
>>> rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 6 cur_tx: 2142214 
>>> frags: 0 dirty_tx: 2142213 tx_left: 1
>>> 1954472:          hping3-2871  [001] ....  3255.596700: 
>>> rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 15 
>>> cur_tx: 2142223 frags: 0 dirty_tx: 2142223 tx_left: 0
>>> 1954490:          <idle>-0     [000] ..s.  3255.596763: 
>>> rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 16 cur_tx: 2142224 
>>> frags: 0 dirty_tx: 2142223 tx_left: 1
>>> //until it soon, again is slow..
>>>
>>>
>>> BR.
>>> Risto
>>>
>>>
>>>
>>> On 7.12.2018 16:46, Risto Pajula wrote:
>>>> Hello.
>>>>
>>>> I have been to track the poor forwarding latency to the TCP Window 
>>>> scale options. The Netgem device uses rather large windows scale 
>>>> options (x256) and I have been able to reproduce the routers poor 
>>>> forwarding latency also with linux box running in the internal 
>>>> network and changing the net.ipv4.tcp_rmem to a large value and 
>>>> thus changing the TCP window scaling options to larger ones. I 
>>>> still do not have clue why this causes the forwarfing in the linux 
>>>> kernel to block? Maybe something in the connection tracking....?
>>>>
>>>>
>>>> With the ICMP timestamp messages I have been able to also pinpoint 
>>>> that the latency is caused in the eth1 sending side (the following 
>>>> hping3 example is run in the router toward the internal network...
>>>>
>>>>
>>>> xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
>>>> using eth1, addr: 192.168.0.1, MTU: 1500
>>>> HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers 
>>>> + 0 data bytes
>>>> len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
>>>> icmp_seq=0 rtt=7.9 ms
>>>> ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
>>>> ICMP timestamp RTT tsrtt=7
>>>>
>>>> len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
>>>> icmp_seq=1 rtt=235.9 ms
>>>> ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
>>>> ICMP timestamp RTT tsrtt=235
>>>>
>>>> len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
>>>> icmp_seq=2 rtt=3.8 ms
>>>> ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
>>>> ICMP timestamp RTT tsrtt=3
>>>>
>>>> len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
>>>> icmp_seq=3 rtt=47.8 ms
>>>> ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
>>>> ICMP timestamp RTT tsrtt=47
>>>>
>>>> len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
>>>> icmp_seq=4 rtt=7.7 ms
>>>> ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
>>>> ICMP timestamp RTT tsrtt=7
>>>>
>>>> len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
>>>> icmp_seq=5 rtt=3.7 ms
>>>> ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
>>>> ICMP timestamp RTT tsrtt=3
>>>>
>>>> ^C
>>>> --- 192.168.0.112 hping statistic ---
>>>> 6 packets tramitted, 6 packets received, 0% packet loss
>>>> round-trip min/avg/max = 3.7/51.1/235.9 ms
>>>>
>>>>
>>>>
>>>> BR.
>>>> Risto
>>>>
>>>> On 2.12.2018 23:32, Risto Pajula wrote:
>>>>> Hello.
>>>>>
>>>>> You can most likely ignore the "DF Bit, mtu bug when forwarding" 
>>>>> case. There isn't actually big IP packets on the wire, instead 
>>>>> there is burst of packets on the wire, which are combined by the 
>>>>> GRO... And thus dropping them should not happen. Sorry about the 
>>>>> invalid bug report.
>>>>>
>>>>> However the poor latency from intenal network to the internet 
>>>>> still remain, both GRO enabled and disabled. I will try to study 
>>>>> further...
>>>>>
>>>>>
>>>>> BR.
>>>>> Risto
>>>>>
>>>>>
>>>>> On 2.12.2018 14:01, Risto Pajula wrote:
>>>>>> Hello.
>>>>>>
>>>>>> I have encountered a weird performance problem in Linux IP 
>>>>>> fragmentation when using video streaming services behind the NAT. 
>>>>>> Also I have studied a possible bug in the DF bit (don't fragment) 
>>>>>> handling when forwarding the IP packets.
>>>>>>
>>>>>> First the system setup description:
>>>>>>
>>>>>> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre 
>>>>>> router]-internet
>>>>>>
>>>>>> where:
>>>>>> host1: is a Netgem N7800 "cable box" for online video streaming 
>>>>>> services provided by local telco (Can access Netflix, HBO nordic, 
>>>>>> "live TV", etc.)
>>>>>> linux router: Linux computer with Dualcore Intel Celeron G1840, 
>>>>>> running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
>>>>>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 
>>>>>> network, mtu set to 1500, RTL8169sb/8110sb
>>>>>> eth0: Linux Routers internet facing interface, public ip address, 
>>>>>> mtu set to 1500,  RTL8168evl/8111evl
>>>>>> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly 
>>>>>> connected to the eth0 of the Linux router.
>>>>>>
>>>>>> And now when using the Netgem N7800 with online video services 
>>>>>> (Netflix, HBO nordic, etc) the Linux router will receive very BIG 
>>>>>> IP packets in the eth0 upto ~20kB, this seems to lead to the 
>>>>>> following problems in the Linux IP stack.
>>>>>>
>>>>>> IP fragmentation performance:
>>>>>> When the Linux router receives these large IP packets in the eth0 
>>>>>> everything works, but it seems that them cause very large 
>>>>>> performance degradation from internal network to the internet 
>>>>>> regarding the latency when the IP fragmentation is performed. The 
>>>>>> ping latency from internal network to the internel network 
>>>>>> increases from stable 15ms-20ms up to 700-800ms AND also the ping 
>>>>>> from the internal network to the linux router eth1 (192.168.0.). 
>>>>>> However up link works perfectly, the ping is still stable when 
>>>>>> streaming the online services (From linux router to the 
>>>>>> internet). It seems that the IP fragmentation is somehow blocking 
>>>>>> the eth1 reception or transmission for very long time (which it 
>>>>>> shouldn't). I'm able to test and debug the issue further, but 
>>>>>> advice regarding where to look would be appreciated.
>>>>>>
>>>>>>
>>>>>> DF Bit, mtu bug when forwarding:
>>>>>> I have started to study the above mentioned problem and have 
>>>>>> found a possible bug in the DF bit and mtu handling in IP 
>>>>>> forwarding. The BIG packets received from streaming services all 
>>>>>> have the "DF bit" set and the question is that should we be 
>>>>>> forwarding them at all as that would result them being 
>>>>>> fragmented? Apparently we currently are... I have traced this 
>>>>>> down to the ip_forward.c function ip_exceeds_mtu(), and the 
>>>>>> following patch seems to fix that.
>>>>>>
>>>>>> --- net/ipv4/ip_forward.c.orig  2018-12-02 11:09:32.764320780 +0200
>>>>>> +++ net/ipv4/ip_forward.c       2018-12-02 12:53:25.031232347 +0200
>>>>>> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
>>>>>>                  return false;
>>>>>>
>>>>>>          /* original fragment exceeds mtu and DF is set */
>>>>>> -       if (unlikely(IPCB(skb)->frag_max_size > mtu))
>>>>>> +        if (unlikely(skb->len > mtu))
>>>>>>                  return true;
>>>>>>
>>>>>>          if (skb->ignore_df)
>>>>>>
>>>>>>
>>>>>> This seems to work (in some ways) - after the change IP packets 
>>>>>> that are too large to the internal network get dropped and we are 
>>>>>> sending "ICMP Destination unreachable, The datagram is too big" 
>>>>>> messages to the originator (as we should?). However it seems that 
>>>>>> not all services really like this... Netflix behaves as expected 
>>>>>> and ping is stable from internal network to the internet, but for 
>>>>>> example HBO nordic will not work anymore (too little buffering? 
>>>>>> Retransimissions not working?). So it seems the original issue 
>>>>>> should be also fixed (And the fragmention should be allowed?).
>>>>>>
>>>>>>
>>>>>>
>>>>>> Any advice would be appreciated. Thanks!
>>>>>>
>>>>>> PS. Watching TV was not this intensive 20 years ago :)
>>>>>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: IP (rtl8169) forwarding bug (performance)
  2018-12-11 17:01           ` Risto Pajula
@ 2018-12-11 19:51             ` Heiner Kallweit
  2018-12-12  1:28               ` Risto Pajula
  0 siblings, 1 reply; 14+ messages in thread
From: Heiner Kallweit @ 2018-12-11 19:51 UTC (permalink / raw)
  To: Risto Pajula
  Cc: David S. Miller, Alexey Kuznetsov, Realtek linux nic maintainers, netdev

OK, then another idea. At the very beginning of the mail thread it
was stated that the router has two network ports:
linux router: Linux computer with Dualcore Intel Celeron G1840, running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, mtu set to 1500, RTL8169sb/8110sb
eth0: Linux Routers internet facing interface, public ip address, mtu set to 1500,  RTL8168evl/8111evl

r8169 supports about 50 members of the RTL8169/RTL8168/RTL8101 family
and more or less every single member needs its own quirks.
RTL8168evl (PCIe) is somewhat recent, RTL8169sb (PCI) is ancient.
To rule out that you triggered some hardware issue: could you switch the
two interfaces and check whether you see a change in system behavior?

On 11.12.2018 18:01, Risto Pajula wrote:
> Hello.
> 
> A freshly built 4.20.0-rc6-next-20181210-lp150.12.25-default waited me when I got back from work, but unfortunately it did not help at all, it behaved exactly in same manner.
> 
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time=73ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time=83ms TTL=64
> Reply from 192.168.0.1: bytes=32 time=307ms TTL=64
> Reply from 192.168.0.1: bytes=32 time=115ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> 
> BR.
> Risto
> 
> 
> On 11.12.2018 0:20, Risto Pajula wrote:
>> Hello.
>>
>> I have not yet tested with linux-next but I will, thanks for pointing that out.
>>
>> ...But I have studied the problem a bit more, indeed it seems that the rtl8169 transmission queue gets stuck.
>>
>> Below is some trace log. Starting from 802026 a burst of frames is forwarded from eth0 to eth1 and when we get to the rtl_tx only two frames have been transmitted.. (Should have had 1,25 ms to transmit them).. Also then the sequence seems to repeat for very long time that only two massages get transmitted.
>>
>> 800541:          <idle>-0     [001] ..s.  7237.698143: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 34 cur_tx: 4406946 frags: 0 dirty_tx: 4406946 tx_left: 0
>> 800671:          <idle>-0     [001] ..s.  7237.698189: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 35 cur_tx: 4406947 frags: 0 dirty_tx: 4406946 tx_left: 1
>> 800835:          <idle>-0     [001] ..s.  7237.698235: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 36 cur_tx: 4406948 frags: 0 dirty_tx: 4406946 tx_left: 2
>> 800905:          <idle>-0     [000] ..s.  7237.698255: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406946 tx_left: 3
>> 800932:          <idle>-0     [000] ..s.  7237.698263: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406947 tx_left: 2
>> 800960:          <idle>-0     [000] ..s.  7237.698270: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406948 tx_left: 1
>> 800998:          <idle>-0     [001] ..s.  7237.698282: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406949 tx_left: 0
>> 801085:          <idle>-0     [000] ..s.  7237.698312: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 38 cur_tx: 4406950 frags: 0 dirty_tx: 4406949 tx_left: 1
>> 801137:          <idle>-0     [001] ..s.  7237.698329: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 38 cur_tx: 4406950 frags: 0 dirty_tx: 4406950 tx_left: 0
>> 801229:          <idle>-0     [000] ..s.  7237.698363: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 39 cur_tx: 4406951 frags: 0 dirty_tx: 4406950 tx_left: 1
>> 801264:          <idle>-0     [001] ..s.  7237.698375: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 39 cur_tx: 4406951 frags: 0 dirty_tx: 4406951 tx_left: 0
>> 801356:          <idle>-0     [000] ..s.  7237.698409: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 40 cur_tx: 4406952 frags: 0 dirty_tx: 4406951 tx_left: 1
>> 801391:          <idle>-0     [001] ..s.  7237.698421: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 40 cur_tx: 4406952 frags: 0 dirty_tx: 4406952 tx_left: 0
>> 801482:          <idle>-0     [000] ..s.  7237.698455: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 41 cur_tx: 4406953 frags: 0 dirty_tx: 4406952 tx_left: 1
>> 801518:          <idle>-0     [001] ..s.  7237.698468: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 41 cur_tx: 4406953 frags: 0 dirty_tx: 4406953 tx_left: 0
>> 801610:          <idle>-0     [000] ..s.  7237.698502: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 42 cur_tx: 4406954 frags: 0 dirty_tx: 4406953 tx_left: 1
>> 801645:          <idle>-0     [001] ..s.  7237.698514: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 42 cur_tx: 4406954 frags: 0 dirty_tx: 4406954 tx_left: 0
>> 801737:          <idle>-0     [000] ..s.  7237.698548: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 43 cur_tx: 4406955 frags: 0 dirty_tx: 4406954 tx_left: 1
>> 801772:          <idle>-0     [001] ..s.  7237.698560: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 43 cur_tx: 4406955 frags: 0 dirty_tx: 4406955 tx_left: 0
>> 801864:          <idle>-0     [000] ..s.  7237.698594: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 44 cur_tx: 4406956 frags: 0 dirty_tx: 4406955 tx_left: 1
>> 801899:          <idle>-0     [001] ..s.  7237.698607: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 44 cur_tx: 4406956 frags: 0 dirty_tx: 4406956 tx_left: 0
>> 801984:          <idle>-0     [000] ..s.  7237.698640: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 45 cur_tx: 4406957 frags: 0 dirty_tx: 4406956 tx_left: 1
>> 802026:          <idle>-0     [001] ..s.  7237.698655: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 45 cur_tx: 4406957 frags: 0 dirty_tx: 4406957 tx_left: 0
>> 802129:          <idle>-0     [001] ..s.  7237.698701: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 46 cur_tx: 4406958 frags: 0 dirty_tx: 4406957 tx_left: 1
>> 802232:          <idle>-0     [001] ..s.  7237.698747: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 47 cur_tx: 4406959 frags: 0 dirty_tx: 4406957 tx_left: 2
>> 802335:          <idle>-0     [001] ..s.  7237.698793: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 48 cur_tx: 4406960 frags: 0 dirty_tx: 4406957 tx_left: 3
>> 802438:          <idle>-0     [001] ..s.  7237.698840: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 49 cur_tx: 4406961 frags: 0 dirty_tx: 4406957 tx_left: 4
>> 802541:          <idle>-0     [001] ..s.  7237.698885: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 50 cur_tx: 4406962 frags: 0 dirty_tx: 4406957 tx_left: 5
>> 802644:          <idle>-0     [001] ..s.  7237.698932: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 51 cur_tx: 4406963 frags: 0 dirty_tx: 4406957 tx_left: 6
>> 802747:          <idle>-0     [001] ..s.  7237.698978: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 52 cur_tx: 4406964 frags: 0 dirty_tx: 4406957 tx_left: 7
>> 802851:          <idle>-0     [001] ..s.  7237.699025: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 53 cur_tx: 4406965 frags: 0 dirty_tx: 4406957 tx_left: 8
>> 805094:          <idle>-0     [000] ..s.  7237.699871: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406957 tx_left: 9
>> 805126:          <idle>-0     [000] ..s.  7237.699878: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406958 tx_left: 8
>> 805197:          <idle>-0     [001] ..s.  7237.699905: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406959 tx_left: 7
>> 805234:          <idle>-0     [001] ..s.  7237.699925: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 55 cur_tx: 4406967 frags: 0 dirty_tx: 4406959 tx_left: 8
>> 811446:          <idle>-0     [000] ..s.  7237.702344: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406959 tx_left: 9
>> 811479:          <idle>-0     [000] ..s.  7237.702354: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406960 tx_left: 8
>> 811641:          <idle>-0     [000] ..s.  7237.702395: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406961 tx_left: 7
>> 811733:          <idle>-0     [000] ..s.  7237.702416: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 57 cur_tx: 4406969 frags: 0 dirty_tx: 4406961 tx_left: 8
>> 812050:          hping3-2871  [000] ..s.  7237.704770: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406961 tx_left: 9
>> 812061:          hping3-2871  [000] ..s.  7237.704777: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406962 tx_left: 8
>> 812124:          hping3-2871  [000] ..s.  7237.704811: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406963 tx_left: 7
>> 812176:          hping3-2871  [000] ..s.  7237.704831: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 59 cur_tx: 4406971 frags: 0 dirty_tx: 4406963 tx_left: 8
>> 812300:          <idle>-0     [000] ..s.  7237.707247: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406963 tx_left: 9
>> 812312:          <idle>-0     [000] ..s.  7237.707257: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406964 tx_left: 8
>> 812389:          <idle>-0     [000] ..s.  7237.707297: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406965 tx_left: 7
>> 812426:          <idle>-0     [000] ..s.  7237.707318: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 61 cur_tx: 4406973 frags: 0 dirty_tx: 4406965 tx_left: 8
>> 812827:          <idle>-0     [000] ..s.  7237.709690: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 4406965 tx_left: 9
>> 812838:          <idle>-0     [000] ..s.  7237.709700: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 4406966 tx_left: 8
>> .....
>>
>>
>> BR.
>> Risto
>>
>> On 10.12.2018 23:26, Heiner Kallweit wrote:
>>> Did you test also with the latest linux-next kernel? Some recent changes like 2e6eedb4813e
>>> "r8169: make use of xmit_more and __netdev_sent_queue" may have a positive impact.
>>>
>>> On 10.12.2018 00:28, Risto Pajula wrote:
>>>> Hello.
>>>>
>>>> Old subject: "Re: IP fragmentation performance and don't fragment bug when forwarding
>>>>
>>>>
>>>> I have now been tracing the kernel and finding the bug seems difficult. I think the bug is combination of several things, likely cause is that it only occurs with rtl8169 and how it is using the netdevapi/DQL api.
>>>>
>>>>  From my investigations seems that following happens:
>>>>
>>>> *Burst of frames is received from internet. (eth0)
>>>> *These are accepted and forwarded to the (eth1)
>>>> *DQL hits the limit, this causes the scheduling for the device to be stopped (__QUEUE_STATE_STACK_XOFF)
>>>> *This and combination of some timing causes the  performance degradation, messages are only scheduled for transmission in soft Irq context.
>>>>
>>>> I still do now know, why the DQL hits the limits.. I think it should not,  maybe something undesirable first happens with rtl8169.
>>>>
>>>> I inserted following trace printk functions to the code..
>>>>
>>>> rtl8169_start_xmit:
>>>> trace_printk("ORP33 %s rtl8169_start_xmit len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, skb->len, entry, tp->cur_tx, frags, tp->dirty_tx, (tp->cur_tx - tp->dirty_tx) );
>>>>
>>>>
>>>> rtl_tx:
>>>> trace_printk("ORP33 %s rtl_tx len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, tx_skb->skb->len, tp->cur_tx % NUM_TX_DESC, tp->cur_tx, 0, dirty_tx, tx_left );
>>>>
>>>>
>>>> Here is some grepped output only filtering the ping packet in two different situations:
>>>>
>>>> //trace when downloading ubuntu (~20MB/s), only 3 secs fit to buffer. ping is all the time normal....
>>>> //orig tracing.out3754
>>>> 117493:          <idle>-0     [000] ..s.  5235.407116: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 57 cur_tx: 3747641 frags: 0 dirty_tx: 3747640 tx_left: 1
>>>> 118704:          <idle>-0     [000] ..s.  5235.416057: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 20 cur_tx: 3747668 frags: 0 dirty_tx: 3747667 tx_left: 1
>>>> 346319:          hping3-2871  [000] ....  5236.002456: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 8 cur_tx: 3754312 frags: 0 dirty_tx: 3754312 tx_left: 0
>>>> 346337:          hping3-2871  [000] ..s.  5236.002475: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 9 cur_tx: 3754313 frags: 0 dirty_tx: 3754312 tx_left: 1
>>>> 1277155:          hping3-2871  [001] ....  5237.002627: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 38 cur_tx: 3766502 frags: 0 dirty_tx: 3766502 tx_left: 0
>>>> 1277173:          <idle>-0     [000] ..s.  5237.002646: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 39 cur_tx: 3766503 frags: 0 dirty_tx: 3766502 tx_left: 1
>>>>
>>>>
>>>> //icmp messages from router to internal network when TCP stream is on. tracing.out3236/eth1_ring.txt is the original
>>>> //no idea why the first frame is tripled...
>>>> 200792:          hping3-2871  [001] ....  3239.593384: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 34 cur_tx: 2132770 frags: 0 dirty_tx: 2132770 tx_left: 0
>>>> 201578:          hping3-2871  [001] ....  3240.593625: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 40 cur_tx: 2132776 frags: 0 dirty_tx: 2132776 tx_left: 0
>>>> 341061:          <idle>-0     [000] ..s.  3241.738011: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 52 cur_tx: 2133684 frags: 0 dirty_tx: 2133680 tx_left: 4
>>>> 342112:          <idle>-0     [000] ..s.  3241.745297: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 59 cur_tx: 2133691 frags: 0 dirty_tx: 2133684 tx_left: 7
>>>> 500126:          <idle>-0     [000] ..s.  3242.739500: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 49 cur_tx: 2134513 frags: 0 dirty_tx: 2134505 tx_left: 8
>>>> 501739:          <idle>-0     [000] ..s.  3242.751654: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 58 cur_tx: 2134522 frags: 0 dirty_tx: 2134513 tx_left: 9
>>>> 646220:          <idle>-0     [000] ..s.  3243.745761: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 41 cur_tx: 2135273 frags: 0 dirty_tx: 2135267 tx_left: 6
>>>> 647633:          <idle>-0     [000] ..s.  3243.755486: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 49 cur_tx: 2135281 frags: 0 dirty_tx: 2135273 tx_left: 8
>>>> 802878:          <idle>-0     [000] ..s.  3244.739947: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2136087 frags: 0 dirty_tx: 2136081 tx_left: 6
>>>> 804298:          <idle>-0     [000] ..s.  3244.749677: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 31 cur_tx: 2136095 frags: 0 dirty_tx: 2136087 tx_left: 8
>>>> 961190:          <idle>-0     [000] ..s.  3245.746217: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 16 cur_tx: 2136912 frags: 0 dirty_tx: 2136906 tx_left: 6
>>>> 962610:          <idle>-0     [000] ..s.  3245.755946: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2136920 frags: 0 dirty_tx: 2136912 tx_left: 8
>>>> 1118044:          <idle>-0     [000] ..s.  3246.740336: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 62 cur_tx: 2137726 frags: 0 dirty_tx: 2137720 tx_left: 6
>>>> 1119443:          <idle>-0     [000] ..s.  3246.750047: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 5 cur_tx: 2137733 frags: 0 dirty_tx: 2137726 tx_left: 7
>>>> 1264047:          <idle>-0     [000] ..s.  3247.709202: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 29 cur_tx: 2138525 frags: 0 dirty_tx: 2138518 tx_left: 7
>>>> 1264740:          <idle>-0     [000] ..s.  3247.718918: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 33 cur_tx: 2138529 frags: 0 dirty_tx: 2138525 tx_left: 4
>>>> 1419958:          <idle>-0     [000] ..s.  3248.736436: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 48 cur_tx: 2139312 frags: 0 dirty_tx: 2139305 tx_left: 7
>>>> 1421551:          <idle>-0     [000] ..s.  3248.746189: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 56 cur_tx: 2139320 frags: 0 dirty_tx: 2139312 tx_left: 8
>>>> 1578746:          <idle>-0     [000] ..s.  3249.742702: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 46 cur_tx: 2140142 frags: 0 dirty_tx: 2140135 tx_left: 7
>>>> 1580194:          <idle>-0     [000] ..s.  3249.752457: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 54 cur_tx: 2140150 frags: 0 dirty_tx: 2140142 tx_left: 8
>>>> 1729597:          <idle>-0     [000] ..s.  3250.746839: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 36 cur_tx: 2140964 frags: 0 dirty_tx: 2140957 tx_left: 7
>>>> 1731015:          <idle>-0     [000] ..s.  3250.756594: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 43 cur_tx: 2140971 frags: 0 dirty_tx: 2140964 tx_left: 7
>>>> 1881028:          <idle>-0     [000] ..s.  3251.740991: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 19 cur_tx: 2141779 frags: 0 dirty_tx: 2141772 tx_left: 7
>>>> 1881381:          <idle>-0     [000] ..s.  3251.750718: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 26 cur_tx: 2141786 frags: 0 dirty_tx: 2141779 tx_left: 7
>>>> //something happens, the output queue empties.. now it is fast, and rtl8169_start_xmit is not called from soft_irq
>>>> 1896178:          hping3-2871  [001] ....  3252.595994: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2141911 frags: 0 dirty_tx: 2141911 tx_left: 0
>>>> 1896196:          <idle>-0     [000] ..s.  3252.596055: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2141912 frags: 0 dirty_tx: 2141911 tx_left: 1
>>>> 1923136:          hping3-2871  [001] ....  3253.596227: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 50 cur_tx: 2142066 frags: 0 dirty_tx: 2142066 tx_left: 0
>>>> 1923154:          <idle>-0     [000] ..s.  3253.596289: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 51 cur_tx: 2142067 frags: 0 dirty_tx: 2142066 tx_left: 1
>>>> 1951633:          hping3-2871  [001] ....  3254.596459: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 5 cur_tx: 2142213 frags: 0 dirty_tx: 2142213 tx_left: 0
>>>> 1951651:          <idle>-0     [000] ..s.  3254.596521: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 6 cur_tx: 2142214 frags: 0 dirty_tx: 2142213 tx_left: 1
>>>> 1954472:          hping3-2871  [001] ....  3255.596700: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 15 cur_tx: 2142223 frags: 0 dirty_tx: 2142223 tx_left: 0
>>>> 1954490:          <idle>-0     [000] ..s.  3255.596763: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 16 cur_tx: 2142224 frags: 0 dirty_tx: 2142223 tx_left: 1
>>>> //until soon it is slow again..
>>>>
>>>>
>>>> BR.
>>>> Risto
>>>>
>>>>
>>>>
>>>> On 7.12.2018 16:46, Risto Pajula wrote:
>>>>> Hello.
>>>>>
>>>>> I have been able to track the poor forwarding latency down to the TCP window scale options. The Netgem device uses a rather large window scale option (x256), and I have been able to reproduce the router's poor forwarding latency with a Linux box running in the internal network by changing net.ipv4.tcp_rmem to a large value and thus raising the advertised TCP window scaling. I still have no clue why this causes forwarding in the Linux kernel to block. Maybe something in the connection tracking....?
>>>>>
>>>>>
>>>>> With the ICMP timestamp messages I have also been able to pinpoint that the latency is caused on the eth1 sending side (the following hping3 example is run on the router toward the internal network; a short worked example follows the output)...
>>>>>
>>>>>
>>>>> xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
>>>>> using eth1, addr: 192.168.0.1, MTU: 1500
>>>>> HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers + 0 data bytes
>>>>> len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
>>>>> icmp_seq=0 rtt=7.9 ms
>>>>> ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
>>>>> ICMP timestamp RTT tsrtt=7
>>>>>
>>>>> len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
>>>>> icmp_seq=1 rtt=235.9 ms
>>>>> ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
>>>>> ICMP timestamp RTT tsrtt=235
>>>>>
>>>>> len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
>>>>> icmp_seq=2 rtt=3.8 ms
>>>>> ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
>>>>> ICMP timestamp RTT tsrtt=3
>>>>>
>>>>> len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
>>>>> icmp_seq=3 rtt=47.8 ms
>>>>> ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
>>>>> ICMP timestamp RTT tsrtt=47
>>>>>
>>>>> len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
>>>>> icmp_seq=4 rtt=7.7 ms
>>>>> ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
>>>>> ICMP timestamp RTT tsrtt=7
>>>>>
>>>>> len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
>>>>> icmp_seq=5 rtt=3.7 ms
>>>>> ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
>>>>> ICMP timestamp RTT tsrtt=3
>>>>>
>>>>> ^C
>>>>> --- 192.168.0.112 hping statistic ---
>>>>> 6 packets tramitted, 6 packets received, 0% packet loss
>>>>> round-trip min/avg/max = 3.7/51.1/235.9 ms
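For instance, in the icmp_seq=1 sample above the outbound leg alone accounts for Receive - Originate = 52296128 - 52295891 = 237 ms, while Transmit equals Receive, so the return path adds essentially nothing; this is consistent with the delay building up on the eth1 transmit side.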
>>>>>
>>>>>
>>>>>
>>>>> BR.
>>>>> Risto
>>>>>
>>>>> On 2.12.2018 23:32, Risto Pajula wrote:
>>>>>> Hello.
>>>>>>
>>>>>> You can most likely ignore the "DF Bit, mtu bug when forwarding" case. There aren't actually big IP packets on the wire; instead there are bursts of packets on the wire, which are combined by GRO... And thus dropping them should not happen. Sorry about the invalid bug report.
>>>>>>
>>>>>> However the poor latency from the internal network to the internet still remains, with GRO both enabled and disabled. I will try to study it further...
>>>>>>
>>>>>>
>>>>>> BR.
>>>>>> Risto
>>>>>>
>>>>>>
>>>>>> On 2.12.2018 14:01, Risto Pajula wrote:
>>>>>>> Hello.
>>>>>>>
>>>>>>> I have encountered a weird performance problem in Linux IP fragmentation when using video streaming services behind the NAT. Also I have studied a possible bug in the DF bit (don't fragment) handling when forwarding the IP packets.
>>>>>>>
>>>>>>> First the system setup description:
>>>>>>>
>>>>>>> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre router]-internet
>>>>>>>
>>>>>>> where:
>>>>>>> host1: is a Netgem N7800 "cable box" for online video streaming services provided by local telco (Can access Netflix, HBO nordic, "live TV", etc.)
>>>>>>> linux router: Linux computer with Dualcore Intel Celeron G1840, running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
>>>>>>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, mtu set to 1500, RTL8169sb/8110sb
>>>>>>> eth0: Linux Routers internet facing interface, public ip address, mtu set to 1500,  RTL8168evl/8111evl
>>>>>>> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly connected to the eth0 of the Linux router.
>>>>>>>
>>>>>>> And now when using the Netgem N7800 with online video services (Netflix, HBO nordic, etc) the Linux router will receive very BIG IP packets in the eth0 upto ~20kB, this seems to lead to the following problems in the Linux IP stack.
>>>>>>>
>>>>>>> IP fragmentation performance:
>>>>>>> When the Linux router receives these large IP packets in the eth0 everything works, but it seems that them cause very large performance degradation from internal network to the internet regarding the latency when the IP fragmentation is performed. The ping latency from internal network to the internel network increases from stable 15ms-20ms up to 700-800ms AND also the ping from the internal network to the linux router eth1 (192.168.0.). However up link works perfectly, the ping is still stable when streaming the online services (From linux router to the internet). It seems that the IP fragmentation is somehow blocking the eth1 reception or transmission for very long time (which it shouldn't). I'm able to test and debug the issue further, but advice regarding where to look would be appreciated.
>>>>>>>
>>>>>>>
>>>>>>> DF Bit, mtu bug when forwarding:
>>>>>>> I have started to study the above mentioned problem and have found a possible bug in the DF bit and mtu handling in IP forwarding. The BIG packets received from streaming services all have the "DF bit" set and the question is that should we be forwarding them at all as that would result them being fragmented? Apparently we currently are... I have traced this down to the ip_forward.c function ip_exceeds_mtu(), and the following patch seems to fix that.
>>>>>>>
>>>>>>> --- net/ipv4/ip_forward.c.orig  2018-12-02 11:09:32.764320780 +0200
>>>>>>> +++ net/ipv4/ip_forward.c       2018-12-02 12:53:25.031232347 +0200
>>>>>>> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
>>>>>>>                  return false;
>>>>>>>
>>>>>>>          /* original fragment exceeds mtu and DF is set */
>>>>>>> -       if (unlikely(IPCB(skb)->frag_max_size > mtu))
>>>>>>> +        if (unlikely(skb->len > mtu))
>>>>>>>                  return true;
>>>>>>>
>>>>>>>          if (skb->ignore_df)
>>>>>>>
>>>>>>>
>>>>>>> This seems to work (in some ways) - after the change IP packets that are too large for the internal network get dropped and we are sending "ICMP Destination unreachable, The datagram is too big" messages to the originator (as we should?). However it seems that not all services really like this... Netflix behaves as expected and ping is stable from the internal network to the internet, but for example HBO nordic will not work anymore (too little buffering? Retransmissions not working?). So it seems the original issue should also be fixed (and the fragmentation should be allowed?).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Any advice would be appreciated. Thanks!
>>>>>>>
>>>>>>> PS. Watching TV was not this intensive 20 years ago :)
>>>>>>>
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: IP (rtl8169) forwarding bug (performance)
  2018-12-11 19:51             ` Heiner Kallweit
@ 2018-12-12  1:28               ` Risto Pajula
  2018-12-12  6:23                 ` Heiner Kallweit
  0 siblings, 1 reply; 14+ messages in thread
From: Risto Pajula @ 2018-12-12  1:28 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: David S. Miller, Alexey Kuznetsov, Realtek linux nic maintainers, netdev

Hello.

I added some debug prints to diagnose the bug properly. I can send the 
patches if you are willing to debug/try... for example this output is 
produced:

96096:          <idle>-0     [000] ..s.   232.466703: rtl8169_start_xmit: RTLDBG221 eth1 rtl8169_start_xmit len: 1506 opts1: B0000000 txpol: 0 entry: 24 cur_tx: 5656 frags: 0 dirty_tx: 5648 tx_left: 8
96097:          <idle>-0     [000] ..s.   232.466704: rtl8169_start_xmit: RTLDBG111 eth1 rtl_tx_slots_avail
96099:          <idle>-0     [000] d.h.   232.468827: rtl8169_interrupt: RTLDBG111 eth1 rtl8169_interrupt
96103:          <idle>-0     [000] ..s.   232.468833: rtl8169_poll: RTLDBG111 eth1 rtl8169_poll
96104:          <idle>-0     [000] ..s.   232.468834: rtl8169_poll: RTLDBG111 eth1 rtl_rx
96105:          <idle>-0     [000] ..s.   232.468835: rtl8169_poll: RTLDBG111 eth1 rtl8169_try_rx_copy
96184:          <idle>-0     [000] ..s.   232.468914: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: B00005E2 txpol: 0 entry: 16 cur_tx: 5657 frags: 0 dirty_tx: 5648 tx_left: 9
96185:          <idle>-0     [000] ..s.   232.468924: rtl_tx: RTLDBG223 eth1 rtl_tx entry opts1: 16, B00005E2 17, B00005E2 18, B00005E2 19, B00005E2 20, B00005E2 21, B00005E2 22, B00005E2 23, B00005E2 24, B00005E2
96189:          <idle>-0     [000] d.h.   232.469006: rtl8169_interrupt: RTLDBG111 eth1 rtl8169_interrupt
96193:          <idle>-0     [000] ..s.   232.469012: rtl8169_poll: RTLDBG111 eth1 rtl8169_poll
96194:          <idle>-0     [000] ..s.   232.469014: rtl8169_poll: RTLDBG111 eth1 rtl_rx
96196:          <idle>-0     [000] ..s.   232.469020: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: 30000000 txpol: 0 entry: 16 cur_tx: 5657 frags: 0 dirty_tx: 5648 tx_left: 9
96204:          <idle>-0     [000] ..s.   232.469032: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: 30000000 txpol: 0 entry: 17 cur_tx: 5657 frags: 0 dirty_tx: 5649 tx_left: 8
96212:          <idle>-0     [000] ..s.   232.469041: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: B00005E2 txpol: 0 entry: 18 cur_tx: 5657 frags: 0 dirty_tx: 5650 tx_left: 7
96213:          <idle>-0     [000] ..s.   232.469050: rtl_tx: RTLDBG223 eth1 rtl_tx entry opts1: 18, B00005E2 19, B00005E2 20, B00005E2 21, B00005E2 22, B00005E2 23, B00005E2 24, B00005E2
96235:          <idle>-0     [000] d.h.   232.471253: rtl8169_interrupt: RTLDBG111 eth1 rtl8169_interrupt
Here we can clearly see that the TX FIFO descriptors are properly set 
up and the send is initiated, but then the rtl8169 stops sending after 
a few frames (see the sketch below for what the RTLDBG223 dump checks). 
I think the bug is on the HW side (or in the rtl8169 firmware). Some 
PCI settings might affect this, but then again shouldn't the rtl8169 
report a PCI error interrupt in that case...
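For reference, here is a rough sketch of what such a ring dump boils 
down to. This is not the actual debug patch: the fields follow the ones 
visible in the trace_printk lines quoted earlier (tp->cur_tx, 
tp->dirty_tx, the opts1 word of each TX descriptor), and treating bit 
31 of opts1 as "owned by the NIC" with the frame length in the low bits 
is an inference from the values above (B00005E2 = a still-owned 
0x5E2 = 1506 byte frame, 30000000 = one the NIC has completed).

/* Hypothetical helper, meant to sit inside r8169.c next to the other
 * debug trace_printk calls: walk the not-yet-reclaimed TX descriptors
 * and report which ones the NIC still owns. */
#define DBG_DESC_OWN    BIT(31)         /* assumed ownership bit */

static void dbg_dump_tx_ring(struct rtl8169_private *tp)
{
        u32 i;

        for (i = tp->dirty_tx; i != tp->cur_tx; i++) {
                unsigned int entry = i % NUM_TX_DESC;
                u32 opts1 = le32_to_cpu(tp->TxDescArray[entry].opts1);

                trace_printk("entry %u opts1 %08X len %u %s\n",
                             entry, opts1, opts1 & 0xffff,
                             opts1 & DBG_DESC_OWN ? "owned-by-NIC" : "done");
        }
}

In the trace above the entries from 18 onward keep that ownership bit 
set across interrupts that are roughly 2 ms apart, even though a 
1506-byte frame should leave the NIC in a few microseconds, which is 
what points at the hardware/firmware side rather than the driver's ring 
handling.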

Also, this TX FIFO stalling happens with any traffic; no 
netflix/netgem/large TCP windows are required to trigger it. What the 
netgem/netflix traffic does is make the bug visible, probably due to 
the timing of the traffic pattern, which in turn causes the upper 
layers of the network stack to take action.

RTL8168evl is indeed the onboard Ethernet controller on a reasonably 
new motherboard and the RTL8169sb is an old PCI card. But the same 
issue happens with both Ethernet adapters (I switched the interfaces).

But then again - this got me thinking... If the issue (send FIFO 
stalling) does not happen on the internet-facing interface even with 
heavy upload traffic, then the problem must be in the internal network 
itself. Tomorrow I will change the Ethernet switch. There could be some 
sort of compatibility problem?

BR.
Risto


On 11.12.2018 21:51, Heiner Kallweit wrote:
> OK, then another idea .. At the very beginning of the mail thread it
> was stated that the router has two network ports:
> linux router: Linux computer with Dualcore Intel Celeron G1840, running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, mtu set to 1500, RTL8169sb/8110sb
> eth0: Linux Routers internet facing interface, public ip address, mtu set to 1500,  RTL8168evl/8111evl
>
> r8169 supports about 50 members of the RTL8169/RTL8168/RTL8101 family
> and more or less every single member needs its own quirks.
> RTL8168evl (PCIe) is somewhat recent, RTL8169sb (PCI) is ancient.
> To rule out you triggered some hardware issue: Could you switch both
> interfaces and check whether you see a change in system behavior?
>
> On 11.12.2018 18:01, Risto Pajula wrote:
>> Hello.
>>
>> A freshly built 4.20.0-rc6-next-20181210-lp150.12.25-default was waiting for me when I got back from work, but unfortunately it did not help at all; it behaved in exactly the same manner.
>>
>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>> Reply from 192.168.0.1: bytes=32 time=73ms TTL=64
>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>> Reply from 192.168.0.1: bytes=32 time=83ms TTL=64
>> Reply from 192.168.0.1: bytes=32 time=307ms TTL=64
>> Reply from 192.168.0.1: bytes=32 time=115ms TTL=64
>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>
>> BR.
>> Risto
>>
>>
>> On 11.12.2018 0:20, Risto Pajula wrote:
>>> Hello.
>>>
>>> I have not yet tested with linux-next but I will, thanks for pointing that out.
>>>
>>> ...But I have studied the problem a bit more, indeed it seems that the rtl8169 transmission queue gets stuck.
>>>
>>> Below is some trace log. Starting from 802026 a burst of frames is forwarded from eth0 to eth1, and by the time we get to the rtl_tx only two frames have been transmitted (it should have had about 1.25 ms to transmit them; see the timing note after the trace). The sequence then seems to repeat for a very long time with only two frames transmitted per cycle.
>>>
>>> 800541:          <idle>-0     [001] ..s.  7237.698143: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 34 cur_tx: 4406946 frags: 0 dirty_tx: 4406946 tx_left: 0
>>> 800671:          <idle>-0     [001] ..s.  7237.698189: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 35 cur_tx: 4406947 frags: 0 dirty_tx: 4406946 tx_left: 1
>>> 800835:          <idle>-0     [001] ..s.  7237.698235: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 36 cur_tx: 4406948 frags: 0 dirty_tx: 4406946 tx_left: 2
>>> 800905:          <idle>-0     [000] ..s.  7237.698255: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406946 tx_left: 3
>>> 800932:          <idle>-0     [000] ..s.  7237.698263: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406947 tx_left: 2
>>> 800960:          <idle>-0     [000] ..s.  7237.698270: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406948 tx_left: 1
>>> 800998:          <idle>-0     [001] ..s.  7237.698282: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406949 tx_left: 0
>>> 801085:          <idle>-0     [000] ..s.  7237.698312: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 38 cur_tx: 4406950 frags: 0 dirty_tx: 4406949 tx_left: 1
>>> 801137:          <idle>-0     [001] ..s.  7237.698329: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 38 cur_tx: 4406950 frags: 0 dirty_tx: 4406950 tx_left: 0
>>> 801229:          <idle>-0     [000] ..s.  7237.698363: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 39 cur_tx: 4406951 frags: 0 dirty_tx: 4406950 tx_left: 1
>>> 801264:          <idle>-0     [001] ..s.  7237.698375: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 39 cur_tx: 4406951 frags: 0 dirty_tx: 4406951 tx_left: 0
>>> 801356:          <idle>-0     [000] ..s.  7237.698409: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 40 cur_tx: 4406952 frags: 0 dirty_tx: 4406951 tx_left: 1
>>> 801391:          <idle>-0     [001] ..s.  7237.698421: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 40 cur_tx: 4406952 frags: 0 dirty_tx: 4406952 tx_left: 0
>>> 801482:          <idle>-0     [000] ..s.  7237.698455: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 41 cur_tx: 4406953 frags: 0 dirty_tx: 4406952 tx_left: 1
>>> 801518:          <idle>-0     [001] ..s.  7237.698468: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 41 cur_tx: 4406953 frags: 0 dirty_tx: 4406953 tx_left: 0
>>> 801610:          <idle>-0     [000] ..s.  7237.698502: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 42 cur_tx: 4406954 frags: 0 dirty_tx: 4406953 tx_left: 1
>>> 801645:          <idle>-0     [001] ..s.  7237.698514: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 42 cur_tx: 4406954 frags: 0 dirty_tx: 4406954 tx_left: 0
>>> 801737:          <idle>-0     [000] ..s.  7237.698548: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 43 cur_tx: 4406955 frags: 0 dirty_tx: 4406954 tx_left: 1
>>> 801772:          <idle>-0     [001] ..s.  7237.698560: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 43 cur_tx: 4406955 frags: 0 dirty_tx: 4406955 tx_left: 0
>>> 801864:          <idle>-0     [000] ..s.  7237.698594: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 44 cur_tx: 4406956 frags: 0 dirty_tx: 4406955 tx_left: 1
>>> 801899:          <idle>-0     [001] ..s.  7237.698607: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 44 cur_tx: 4406956 frags: 0 dirty_tx: 4406956 tx_left: 0
>>> 801984:          <idle>-0     [000] ..s.  7237.698640: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 45 cur_tx: 4406957 frags: 0 dirty_tx: 4406956 tx_left: 1
>>> 802026:          <idle>-0     [001] ..s.  7237.698655: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 45 cur_tx: 4406957 frags: 0 dirty_tx: 4406957 tx_left: 0
>>> 802129:          <idle>-0     [001] ..s.  7237.698701: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 46 cur_tx: 4406958 frags: 0 dirty_tx: 4406957 tx_left: 1
>>> 802232:          <idle>-0     [001] ..s.  7237.698747: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 47 cur_tx: 4406959 frags: 0 dirty_tx: 4406957 tx_left: 2
>>> 802335:          <idle>-0     [001] ..s.  7237.698793: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 48 cur_tx: 4406960 frags: 0 dirty_tx: 4406957 tx_left: 3
>>> 802438:          <idle>-0     [001] ..s.  7237.698840: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 49 cur_tx: 4406961 frags: 0 dirty_tx: 4406957 tx_left: 4
>>> 802541:          <idle>-0     [001] ..s.  7237.698885: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 50 cur_tx: 4406962 frags: 0 dirty_tx: 4406957 tx_left: 5
>>> 802644:          <idle>-0     [001] ..s.  7237.698932: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 51 cur_tx: 4406963 frags: 0 dirty_tx: 4406957 tx_left: 6
>>> 802747:          <idle>-0     [001] ..s.  7237.698978: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 52 cur_tx: 4406964 frags: 0 dirty_tx: 4406957 tx_left: 7
>>> 802851:          <idle>-0     [001] ..s.  7237.699025: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 53 cur_tx: 4406965 frags: 0 dirty_tx: 4406957 tx_left: 8
>>> 805094:          <idle>-0     [000] ..s.  7237.699871: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406957 tx_left: 9
>>> 805126:          <idle>-0     [000] ..s.  7237.699878: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406958 tx_left: 8
>>> 805197:          <idle>-0     [001] ..s.  7237.699905: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406959 tx_left: 7
>>> 805234:          <idle>-0     [001] ..s.  7237.699925: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 55 cur_tx: 4406967 frags: 0 dirty_tx: 4406959 tx_left: 8
>>> 811446:          <idle>-0     [000] ..s.  7237.702344: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406959 tx_left: 9
>>> 811479:          <idle>-0     [000] ..s.  7237.702354: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406960 tx_left: 8
>>> 811641:          <idle>-0     [000] ..s.  7237.702395: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406961 tx_left: 7
>>> 811733:          <idle>-0     [000] ..s.  7237.702416: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 57 cur_tx: 4406969 frags: 0 dirty_tx: 4406961 tx_left: 8
>>> 812050:          hping3-2871  [000] ..s.  7237.704770: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406961 tx_left: 9
>>> 812061:          hping3-2871  [000] ..s.  7237.704777: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406962 tx_left: 8
>>> 812124:          hping3-2871  [000] ..s.  7237.704811: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406963 tx_left: 7
>>> 812176:          hping3-2871  [000] ..s.  7237.704831: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 59 cur_tx: 4406971 frags: 0 dirty_tx: 4406963 tx_left: 8
>>> 812300:          <idle>-0     [000] ..s.  7237.707247: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406963 tx_left: 9
>>> 812312:          <idle>-0     [000] ..s.  7237.707257: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406964 tx_left: 8
>>> 812389:          <idle>-0     [000] ..s.  7237.707297: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406965 tx_left: 7
>>> 812426:          <idle>-0     [000] ..s.  7237.707318: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 61 cur_tx: 4406973 frags: 0 dirty_tx: 4406965 tx_left: 8
>>> 812827:          <idle>-0     [000] ..s.  7237.709690: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 4406965 tx_left: 9
>>> 812838:          <idle>-0     [000] ..s.  7237.709700: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 4406966 tx_left: 8
>>> .....
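To put numbers on the timing: between the burst start at 7237.698655 (entry 802026) and the rtl_tx pass at 7237.699871 roughly 1.2 ms elapse, and a 1506-byte frame needs only about 12 µs on a gigabit link (about 120 µs at 100 Mbit/s), yet dirty_tx only advances from 4406957 to 4406959, i.e. two frames.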
>>>
>>>
>>> BR.
>>> Risto
>>>
>>> On 10.12.2018 23:26, Heiner Kallweit wrote:
>>>> Did you test also with the latest linux-next kernel? Some recent changes like 2e6eedb4813e
>>>> "r8169: make use of xmit_more and __netdev_sent_queue" may have a positive impact.
>>>>
>>>> On 10.12.2018 00:28, Risto Pajula wrote:
>>>>> Hello.
>>>>>
>>>>> Old subject: "Re: IP fragmentation performance and don't fragment bug when forwarding
>>>>>
>>>>>
>>>>> I have now been tracing the kernel and finding the bug seems difficult. I think the bug is combination of several things, likely cause is that it only occurs with rtl8169 and how it is using the netdevapi/DQL api.
>>>>>
>>>>>   From my investigations seems that following happens:
>>>>>
>>>>> *Burst of frames is received from internet. (eth0)
>>>>> *These are accepted and forwarded to the (eth1)
>>>>> *DQL hits the limit, this causes the scheduling for the device to be stopped (__QUEUE_STATE_STACK_XOFF)
>>>>> *This and combination of some timing causes the  performance degradation, messages are only scheduled for transmission in soft Irq context.
>>>>>
>>>>> I still do not know why the DQL hits the limit.. I think it should not; maybe something undesirable happens first with rtl8169.
>>>>>
>>>>> I inserted following trace printk functions to the code..
>>>>>
>>>>> rtl8169_start_xmit:
>>>>> trace_printk("ORP33 %s rtl8169_start_xmit len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, skb->len, entry, tp->cur_tx, frags, tp->dirty_tx, (tp->cur_tx - tp->dirty_tx) );
>>>>>
>>>>>
>>>>> rtl_tx:
>>>>> trace_printk("ORP33 %s rtl_tx len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, tx_skb->skb->len, tp->cur_tx % NUM_TX_DESC, tp->cur_tx, 0, dirty_tx, tx_left );
>>>>>
>>>>>
>>>>> Here is some grepped output only filtering the ping packet in two different situations:
>>>>>
>>>>> //trace when downloading ubuntu (~20MB/s), only 3 secs fit to buffer. ping is all the time normal....
>>>>> //orig tracing.out3754
>>>>> 117493:          <idle>-0     [000] ..s.  5235.407116: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 57 cur_tx: 3747641 frags: 0 dirty_tx: 3747640 tx_left: 1
>>>>> 118704:          <idle>-0     [000] ..s.  5235.416057: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 20 cur_tx: 3747668 frags: 0 dirty_tx: 3747667 tx_left: 1
>>>>> 346319:          hping3-2871  [000] ....  5236.002456: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 8 cur_tx: 3754312 frags: 0 dirty_tx: 3754312 tx_left: 0
>>>>> 346337:          hping3-2871  [000] ..s.  5236.002475: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 9 cur_tx: 3754313 frags: 0 dirty_tx: 3754312 tx_left: 1
>>>>> 1277155:          hping3-2871  [001] ....  5237.002627: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 38 cur_tx: 3766502 frags: 0 dirty_tx: 3766502 tx_left: 0
>>>>> 1277173:          <idle>-0     [000] ..s.  5237.002646: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 39 cur_tx: 3766503 frags: 0 dirty_tx: 3766502 tx_left: 1
>>>>>
>>>>>
>>>>> //icmp messages from router to internal network when TCP stream is on. tracing.out3236/eth1_ring.txt is the original
>>>>> //no idea why the first frame is tripled...
>>>>> 200792:          hping3-2871  [001] ....  3239.593384: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 34 cur_tx: 2132770 frags: 0 dirty_tx: 2132770 tx_left: 0
>>>>> 201578:          hping3-2871  [001] ....  3240.593625: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 40 cur_tx: 2132776 frags: 0 dirty_tx: 2132776 tx_left: 0
>>>>> 341061:          <idle>-0     [000] ..s.  3241.738011: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 52 cur_tx: 2133684 frags: 0 dirty_tx: 2133680 tx_left: 4
>>>>> 342112:          <idle>-0     [000] ..s.  3241.745297: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 59 cur_tx: 2133691 frags: 0 dirty_tx: 2133684 tx_left: 7
>>>>> 500126:          <idle>-0     [000] ..s.  3242.739500: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 49 cur_tx: 2134513 frags: 0 dirty_tx: 2134505 tx_left: 8
>>>>> 501739:          <idle>-0     [000] ..s.  3242.751654: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 58 cur_tx: 2134522 frags: 0 dirty_tx: 2134513 tx_left: 9
>>>>> 646220:          <idle>-0     [000] ..s.  3243.745761: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 41 cur_tx: 2135273 frags: 0 dirty_tx: 2135267 tx_left: 6
>>>>> 647633:          <idle>-0     [000] ..s.  3243.755486: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 49 cur_tx: 2135281 frags: 0 dirty_tx: 2135273 tx_left: 8
>>>>> 802878:          <idle>-0     [000] ..s.  3244.739947: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2136087 frags: 0 dirty_tx: 2136081 tx_left: 6
>>>>> 804298:          <idle>-0     [000] ..s.  3244.749677: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 31 cur_tx: 2136095 frags: 0 dirty_tx: 2136087 tx_left: 8
>>>>> 961190:          <idle>-0     [000] ..s.  3245.746217: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 16 cur_tx: 2136912 frags: 0 dirty_tx: 2136906 tx_left: 6
>>>>> 962610:          <idle>-0     [000] ..s.  3245.755946: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2136920 frags: 0 dirty_tx: 2136912 tx_left: 8
>>>>> 1118044:          <idle>-0     [000] ..s.  3246.740336: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 62 cur_tx: 2137726 frags: 0 dirty_tx: 2137720 tx_left: 6
>>>>> 1119443:          <idle>-0     [000] ..s.  3246.750047: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 5 cur_tx: 2137733 frags: 0 dirty_tx: 2137726 tx_left: 7
>>>>> 1264047:          <idle>-0     [000] ..s.  3247.709202: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 29 cur_tx: 2138525 frags: 0 dirty_tx: 2138518 tx_left: 7
>>>>> 1264740:          <idle>-0     [000] ..s.  3247.718918: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 33 cur_tx: 2138529 frags: 0 dirty_tx: 2138525 tx_left: 4
>>>>> 1419958:          <idle>-0     [000] ..s.  3248.736436: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 48 cur_tx: 2139312 frags: 0 dirty_tx: 2139305 tx_left: 7
>>>>> 1421551:          <idle>-0     [000] ..s.  3248.746189: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 56 cur_tx: 2139320 frags: 0 dirty_tx: 2139312 tx_left: 8
>>>>> 1578746:          <idle>-0     [000] ..s.  3249.742702: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 46 cur_tx: 2140142 frags: 0 dirty_tx: 2140135 tx_left: 7
>>>>> 1580194:          <idle>-0     [000] ..s.  3249.752457: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 54 cur_tx: 2140150 frags: 0 dirty_tx: 2140142 tx_left: 8
>>>>> 1729597:          <idle>-0     [000] ..s.  3250.746839: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 36 cur_tx: 2140964 frags: 0 dirty_tx: 2140957 tx_left: 7
>>>>> 1731015:          <idle>-0     [000] ..s.  3250.756594: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 43 cur_tx: 2140971 frags: 0 dirty_tx: 2140964 tx_left: 7
>>>>> 1881028:          <idle>-0     [000] ..s.  3251.740991: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 19 cur_tx: 2141779 frags: 0 dirty_tx: 2141772 tx_left: 7
>>>>> 1881381:          <idle>-0     [000] ..s.  3251.750718: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 26 cur_tx: 2141786 frags: 0 dirty_tx: 2141779 tx_left: 7
>>>>> //something happens, the output queue empties.. now it is fast, and rtl8169_start_xmit is not called from soft_irq
>>>>> 1896178:          hping3-2871  [001] ....  3252.595994: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2141911 frags: 0 dirty_tx: 2141911 tx_left: 0
>>>>> 1896196:          <idle>-0     [000] ..s.  3252.596055: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2141912 frags: 0 dirty_tx: 2141911 tx_left: 1
>>>>> 1923136:          hping3-2871  [001] ....  3253.596227: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 50 cur_tx: 2142066 frags: 0 dirty_tx: 2142066 tx_left: 0
>>>>> 1923154:          <idle>-0     [000] ..s.  3253.596289: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 51 cur_tx: 2142067 frags: 0 dirty_tx: 2142066 tx_left: 1
>>>>> 1951633:          hping3-2871  [001] ....  3254.596459: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 5 cur_tx: 2142213 frags: 0 dirty_tx: 2142213 tx_left: 0
>>>>> 1951651:          <idle>-0     [000] ..s.  3254.596521: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 6 cur_tx: 2142214 frags: 0 dirty_tx: 2142213 tx_left: 1
>>>>> 1954472:          hping3-2871  [001] ....  3255.596700: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 15 cur_tx: 2142223 frags: 0 dirty_tx: 2142223 tx_left: 0
>>>>> 1954490:          <idle>-0     [000] ..s.  3255.596763: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 16 cur_tx: 2142224 frags: 0 dirty_tx: 2142223 tx_left: 1
>>>>> //until soon it is slow again..
>>>>>
>>>>>
>>>>> BR.
>>>>> Risto
>>>>>
>>>>>
>>>>>
>>>>> On 7.12.2018 16:46, Risto Pajula wrote:
>>>>>> Hello.
>>>>>>
>>>>>> I have been to track the poor forwarding latency to the TCP Window scale options. The Netgem device uses rather large windows scale options (x256) and I have been able to reproduce the routers poor forwarding latency also with linux box running in the internal network and changing the net.ipv4.tcp_rmem to a large value and thus changing the TCP window scaling options to larger ones. I still do not have clue why this causes the forwarfing in the linux kernel to block? Maybe something in the connection tracking....?
>>>>>>
>>>>>>
>>>>>> With the ICMP timestamp messages I have been able to also pinpoint that the latency is caused in the eth1 sending side (the following hping3 example is run in the router toward the internal network...
>>>>>>
>>>>>>
>>>>>> xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
>>>>>> using eth1, addr: 192.168.0.1, MTU: 1500
>>>>>> HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers + 0 data bytes
>>>>>> len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
>>>>>> icmp_seq=0 rtt=7.9 ms
>>>>>> ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
>>>>>> ICMP timestamp RTT tsrtt=7
>>>>>>
>>>>>> len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
>>>>>> icmp_seq=1 rtt=235.9 ms
>>>>>> ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
>>>>>> ICMP timestamp RTT tsrtt=235
>>>>>>
>>>>>> len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
>>>>>> icmp_seq=2 rtt=3.8 ms
>>>>>> ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
>>>>>> ICMP timestamp RTT tsrtt=3
>>>>>>
>>>>>> len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
>>>>>> icmp_seq=3 rtt=47.8 ms
>>>>>> ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
>>>>>> ICMP timestamp RTT tsrtt=47
>>>>>>
>>>>>> len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
>>>>>> icmp_seq=4 rtt=7.7 ms
>>>>>> ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
>>>>>> ICMP timestamp RTT tsrtt=7
>>>>>>
>>>>>> len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
>>>>>> icmp_seq=5 rtt=3.7 ms
>>>>>> ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
>>>>>> ICMP timestamp RTT tsrtt=3
>>>>>>
>>>>>> ^C
>>>>>> --- 192.168.0.112 hping statistic ---
>>>>>> 6 packets tramitted, 6 packets received, 0% packet loss
>>>>>> round-trip min/avg/max = 3.7/51.1/235.9 ms
>>>>>>
>>>>>>
>>>>>>
>>>>>> BR.
>>>>>> Risto
>>>>>>
>>>>>> On 2.12.2018 23:32, Risto Pajula wrote:
>>>>>>> Hello.
>>>>>>>
>>>>>>> You can most likely ignore the "DF Bit, mtu bug when forwarding" case. There aren't actually big IP packets on the wire; instead there are bursts of packets on the wire, which are combined by GRO... And thus dropping them should not happen. Sorry about the invalid bug report.
>>>>>>>
>>>>>>> However the poor latency from the internal network to the internet still remains, with GRO both enabled and disabled. I will try to study it further...
>>>>>>>
>>>>>>>
>>>>>>> BR.
>>>>>>> Risto
>>>>>>>
>>>>>>>
>>>>>>> On 2.12.2018 14:01, Risto Pajula wrote:
>>>>>>>> Hello.
>>>>>>>>
>>>>>>>> I have encountered a weird performance problem in Linux IP fragmentation when using video streaming services behind the NAT. Also I have studied a possible bug in the DF bit (don't fragment) handling when forwarding the IP packets.
>>>>>>>>
>>>>>>>> First the system setup description:
>>>>>>>>
>>>>>>>> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre router]-internet
>>>>>>>>
>>>>>>>> where:
>>>>>>>> host1: is a Netgem N7800 "cable box" for online video streaming services provided by local telco (Can access Netflix, HBO nordic, "live TV", etc.)
>>>>>>>> linux router: Linux computer with Dualcore Intel Celeron G1840, running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
>>>>>>>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, mtu set to 1500, RTL8169sb/8110sb
>>>>>>>> eth0: Linux Routers internet facing interface, public ip address, mtu set to 1500,  RTL8168evl/8111evl
>>>>>>>> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly connected to the eth0 of the Linux router.
>>>>>>>>
>>>>>>>> And now when using the Netgem N7800 with online video services (Netflix, HBO nordic, etc) the Linux router will receive very BIG IP packets in the eth0 upto ~20kB, this seems to lead to the following problems in the Linux IP stack.
>>>>>>>>
>>>>>>>> IP fragmentation performance:
>>>>>>>> When the Linux router receives these large IP packets in the eth0 everything works, but it seems that them cause very large performance degradation from internal network to the internet regarding the latency when the IP fragmentation is performed. The ping latency from internal network to the internel network increases from stable 15ms-20ms up to 700-800ms AND also the ping from the internal network to the linux router eth1 (192.168.0.). However up link works perfectly, the ping is still stable when streaming the online services (From linux router to the internet). It seems that the IP fragmentation is somehow blocking the eth1 reception or transmission for very long time (which it shouldn't). I'm able to test and debug the issue further, but advice regarding where to look would be appreciated.
>>>>>>>>
>>>>>>>>
>>>>>>>> DF Bit, mtu bug when forwarding:
>>>>>>>> I have started to study the above mentioned problem and have found a possible bug in the DF bit and mtu handling in IP forwarding. The BIG packets received from streaming services all have the "DF bit" set and the question is that should we be forwarding them at all as that would result them being fragmented? Apparently we currently are... I have traced this down to the ip_forward.c function ip_exceeds_mtu(), and the following patch seems to fix that.
>>>>>>>>
>>>>>>>> --- net/ipv4/ip_forward.c.orig  2018-12-02 11:09:32.764320780 +0200
>>>>>>>> +++ net/ipv4/ip_forward.c       2018-12-02 12:53:25.031232347 +0200
>>>>>>>> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
>>>>>>>>                   return false;
>>>>>>>>
>>>>>>>>           /* original fragment exceeds mtu and DF is set */
>>>>>>>> -       if (unlikely(IPCB(skb)->frag_max_size > mtu))
>>>>>>>> +        if (unlikely(skb->len > mtu))
>>>>>>>>                   return true;
>>>>>>>>
>>>>>>>>           if (skb->ignore_df)
>>>>>>>>
>>>>>>>>
>>>>>>>> This seems to work (in some ways) - after the change IP packets that are too large to the internal network get dropped and we are sending "ICMP Destination unreachable, The datagram is too big" messages to the originator (as we should?). However it seems that not all services really like this... Netflix behaves as expected and ping is stable from internal network to the internet, but for example HBO nordic will not work anymore (too little buffering? Retransimissions not working?). So it seems the original issue should be also fixed (And the fragmention should be allowed?).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Any advice would be appreciated. Thanks!
>>>>>>>>
>>>>>>>> PS. Watching TV was not this intensive 20 years ago :)
>>>>>>>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: IP (rtl8169) forwarding bug (performance)
  2018-12-12  1:28               ` Risto Pajula
@ 2018-12-12  6:23                 ` Heiner Kallweit
  2018-12-12 23:20                   ` Risto Pajula
  0 siblings, 1 reply; 14+ messages in thread
From: Heiner Kallweit @ 2018-12-12  6:23 UTC (permalink / raw)
  To: Risto Pajula
  Cc: David S. Miller, Alexey Kuznetsov, Realtek linux nic maintainers, netdev

According to your description of the issue it doesn't take a very exotic scenario to trigger it.
And given that Realtek network chips are used on a lot of consumer mainboards, I would
assume quite a few people are using such a mainboard for a use case like yours.
This makes it somewhat surprising that at least I'm not aware of more such reports.
Therefore I think it's a good idea to check the other components of the internal network.

On 12.12.2018 02:28, Risto Pajula wrote:
> Hello.
> 
> I added some debug prints to diagnose the bug properly. I can send the patches if you are willing to debug/try... for example this output is produced:
> 
> 96096:          <idle>-0     [000] ..s.   232.466703: rtl8169_start_xmit: RTLDBG221 eth1 rtl8169_start_xmit len: 1506 opts1: B0000000 txpol: 0 entry: 24 cur_tx: 5656 frags: 0 dirty_tx: 5648 tx_left: 8
> 96097:          <idle>-0     [000] ..s.   232.466704: rtl8169_start_xmit: RTLDBG111 eth1 rtl_tx_slots_avail
> 96099:          <idle>-0     [000] d.h.   232.468827: rtl8169_interrupt: RTLDBG111 eth1 rtl8169_interrupt
> 96103:          <idle>-0     [000] ..s.   232.468833: rtl8169_poll: RTLDBG111 eth1 rtl8169_poll
> 96104:          <idle>-0     [000] ..s.   232.468834: rtl8169_poll: RTLDBG111 eth1 rtl_rx
> 96105:          <idle>-0     [000] ..s.   232.468835: rtl8169_poll: RTLDBG111 eth1 rtl8169_try_rx_copy
> 96184:          <idle>-0     [000] ..s.   232.468914: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: B00005E2 txpol: 0 entry: 16 cur_tx: 5657 frags: 0 dirty_tx: 5648 tx_left: 9
> 96185:          <idle>-0     [000] ..s.   232.468924: rtl_tx: RTLDBG223 eth1 rtl_tx entry opts1: 16, B00005E2 17, B00005E2 18, B00005E2 19, B00005E2 20, B00005E2 21, B00005E2 22, B00005E2 23, B00005E2 24, B00005E2
> 96189:          <idle>-0     [000] d.h.   232.469006: rtl8169_interrupt: RTLDBG111 eth1 rtl8169_interrupt
> 96193:          <idle>-0     [000] ..s.   232.469012: rtl8169_poll: RTLDBG111 eth1 rtl8169_poll
> 96194:          <idle>-0     [000] ..s.   232.469014: rtl8169_poll: RTLDBG111 eth1 rtl_rx
> 96196:          <idle>-0     [000] ..s.   232.469020: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: 30000000 txpol: 0 entry: 16 cur_tx: 5657 frags: 0 dirty_tx: 5648 tx_left: 9
> 96204:          <idle>-0     [000] ..s.   232.469032: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: 30000000 txpol: 0 entry: 17 cur_tx: 5657 frags: 0 dirty_tx: 5649 tx_left: 8
> 96212:          <idle>-0     [000] ..s.   232.469041: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: B00005E2 txpol: 0 entry: 18 cur_tx: 5657 frags: 0 dirty_tx: 5650 tx_left: 7
> 96213:          <idle>-0     [000] ..s.   232.469050: rtl_tx: RTLDBG223 eth1 rtl_tx entry opts1: 18, B00005E2 19, B00005E2 20, B00005E2 21, B00005E2 22, B00005E2 23, B00005E2 24, B00005E2
> 96235:          <idle>-0     [000] d.h.   232.471253: rtl8169_interrupt: RTLDBG111 eth1 rtl8169_interrupt
> 
> Here we can clearly see that the TX FIFO descriptors are properly set up and the send is initiated, but then the rtl8169 stops sending after a few frames. I think the bug is on the HW side (or in the rtl8169 firmware). Some PCI settings might affect this, but then again shouldn't the rtl8169 report a PCI error interrupt in that case...
> 
> Also, this TX FIFO stalling happens with any traffic; no netflix/netgem/large TCP windows are required to trigger it. What the netgem/netflix traffic does is make the bug visible, probably due to the timing of the traffic pattern, which in turn causes the upper layers of the network stack to take action.
> 
> RTL8168evl is indeed the onboard Ethernet controller on a reasonably new motherboard and the RTL8169sb is an old PCI card. But the same issue happens with both Ethernet adapters (I switched the interfaces).
> 
> But then again - this got me thinking... If the issue (send fifo stalling) does not happen in the internet facing interface even with heavy upload traffic, then the problem must be in the internal network itself. Tomorrow I will change the Ethernet switch. There could be some sort of compatibility problem?
> 
> BR.
> Risto
> 
> 
> On 11.12.2018 21:51, Heiner Kallweit wrote:
>> OK, then another idea .. At the very beginning of the mail thread it
>> was stated that the router has two network ports:
>> linux router: Linux computer with Dualcore Intel Celeron G1840, running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, mtu set to 1500, RTL8169sb/8110sb
>> eth0: Linux Routers internet facing interface, public ip address, mtu set to 1500,  RTL8168evl/8111evl
>>
>> r8169 supports about 50 members of the RTL8169/RTL8168/RTL8101 family
>> and more or less every single member needs its own quirks.
>> RTL8168evl (PCIe) is somewhat recent, RTL8169sb (PCI) is ancient.
>> To rule out you triggered some hardware issue: Could you switch both
>> interfaces and check whether you see a change in system behavior?
>>
>> On 11.12.2018 18:01, Risto Pajula wrote:
>>> Hello.
>>>
>>> A freshly built 4.20.0-rc6-next-20181210-lp150.12.25-default was waiting for me when I got back from work, but unfortunately it did not help at all; it behaved in exactly the same manner.
>>>
>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>> Reply from 192.168.0.1: bytes=32 time=73ms TTL=64
>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>> Reply from 192.168.0.1: bytes=32 time=83ms TTL=64
>>> Reply from 192.168.0.1: bytes=32 time=307ms TTL=64
>>> Reply from 192.168.0.1: bytes=32 time=115ms TTL=64
>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>>
>>> BR.
>>> Risto
>>>
>>>
>>> On 11.12.2018 0:20, Risto Pajula wrote:
>>>> Hello.
>>>>
>>>> I have not yet tested with linux-next but I will, thanks for pointing that out.
>>>>
>>>> ...But I have studied the problem a bit more, indeed it seems that the rtl8169 transmission queue gets stuck.
>>>>
>>>> Below is some trace log. Starting from 802026 a burst of frames is forwarded from eth0 to eth1, and by the time we get to the rtl_tx only two frames have been transmitted (it should have had about 1.25 ms to transmit them). The sequence then seems to repeat for a very long time with only two frames transmitted per cycle.
>>>>
>>>> 800541:          <idle>-0     [001] ..s.  7237.698143: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 34 cur_tx: 4406946 frags: 0 dirty_tx: 4406946 tx_left: 0
>>>> 800671:          <idle>-0     [001] ..s.  7237.698189: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 35 cur_tx: 4406947 frags: 0 dirty_tx: 4406946 tx_left: 1
>>>> 800835:          <idle>-0     [001] ..s.  7237.698235: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 36 cur_tx: 4406948 frags: 0 dirty_tx: 4406946 tx_left: 2
>>>> 800905:          <idle>-0     [000] ..s.  7237.698255: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406946 tx_left: 3
>>>> 800932:          <idle>-0     [000] ..s.  7237.698263: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406947 tx_left: 2
>>>> 800960:          <idle>-0     [000] ..s.  7237.698270: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406948 tx_left: 1
>>>> 800998:          <idle>-0     [001] ..s.  7237.698282: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406949 tx_left: 0
>>>> 801085:          <idle>-0     [000] ..s.  7237.698312: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 38 cur_tx: 4406950 frags: 0 dirty_tx: 4406949 tx_left: 1
>>>> 801137:          <idle>-0     [001] ..s.  7237.698329: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 38 cur_tx: 4406950 frags: 0 dirty_tx: 4406950 tx_left: 0
>>>> 801229:          <idle>-0     [000] ..s.  7237.698363: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 39 cur_tx: 4406951 frags: 0 dirty_tx: 4406950 tx_left: 1
>>>> 801264:          <idle>-0     [001] ..s.  7237.698375: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 39 cur_tx: 4406951 frags: 0 dirty_tx: 4406951 tx_left: 0
>>>> 801356:          <idle>-0     [000] ..s.  7237.698409: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 40 cur_tx: 4406952 frags: 0 dirty_tx: 4406951 tx_left: 1
>>>> 801391:          <idle>-0     [001] ..s.  7237.698421: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 40 cur_tx: 4406952 frags: 0 dirty_tx: 4406952 tx_left: 0
>>>> 801482:          <idle>-0     [000] ..s.  7237.698455: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 41 cur_tx: 4406953 frags: 0 dirty_tx: 4406952 tx_left: 1
>>>> 801518:          <idle>-0     [001] ..s.  7237.698468: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 41 cur_tx: 4406953 frags: 0 dirty_tx: 4406953 tx_left: 0
>>>> 801610:          <idle>-0     [000] ..s.  7237.698502: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 42 cur_tx: 4406954 frags: 0 dirty_tx: 4406953 tx_left: 1
>>>> 801645:          <idle>-0     [001] ..s.  7237.698514: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 42 cur_tx: 4406954 frags: 0 dirty_tx: 4406954 tx_left: 0
>>>> 801737:          <idle>-0     [000] ..s.  7237.698548: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 43 cur_tx: 4406955 frags: 0 dirty_tx: 4406954 tx_left: 1
>>>> 801772:          <idle>-0     [001] ..s.  7237.698560: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 43 cur_tx: 4406955 frags: 0 dirty_tx: 4406955 tx_left: 0
>>>> 801864:          <idle>-0     [000] ..s.  7237.698594: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 44 cur_tx: 4406956 frags: 0 dirty_tx: 4406955 tx_left: 1
>>>> 801899:          <idle>-0     [001] ..s.  7237.698607: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 44 cur_tx: 4406956 frags: 0 dirty_tx: 4406956 tx_left: 0
>>>> 801984:          <idle>-0     [000] ..s.  7237.698640: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 45 cur_tx: 4406957 frags: 0 dirty_tx: 4406956 tx_left: 1
>>>> 802026:          <idle>-0     [001] ..s.  7237.698655: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 45 cur_tx: 4406957 frags: 0 dirty_tx: 4406957 tx_left: 0
>>>> 802129:          <idle>-0     [001] ..s.  7237.698701: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 46 cur_tx: 4406958 frags: 0 dirty_tx: 4406957 tx_left: 1
>>>> 802232:          <idle>-0     [001] ..s.  7237.698747: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 47 cur_tx: 4406959 frags: 0 dirty_tx: 4406957 tx_left: 2
>>>> 802335:          <idle>-0     [001] ..s.  7237.698793: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 48 cur_tx: 4406960 frags: 0 dirty_tx: 4406957 tx_left: 3
>>>> 802438:          <idle>-0     [001] ..s.  7237.698840: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 49 cur_tx: 4406961 frags: 0 dirty_tx: 4406957 tx_left: 4
>>>> 802541:          <idle>-0     [001] ..s.  7237.698885: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 50 cur_tx: 4406962 frags: 0 dirty_tx: 4406957 tx_left: 5
>>>> 802644:          <idle>-0     [001] ..s.  7237.698932: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 51 cur_tx: 4406963 frags: 0 dirty_tx: 4406957 tx_left: 6
>>>> 802747:          <idle>-0     [001] ..s.  7237.698978: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 52 cur_tx: 4406964 frags: 0 dirty_tx: 4406957 tx_left: 7
>>>> 802851:          <idle>-0     [001] ..s.  7237.699025: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 53 cur_tx: 4406965 frags: 0 dirty_tx: 4406957 tx_left: 8
>>>> 805094:          <idle>-0     [000] ..s.  7237.699871: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406957 tx_left: 9
>>>> 805126:          <idle>-0     [000] ..s.  7237.699878: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406958 tx_left: 8
>>>> 805197:          <idle>-0     [001] ..s.  7237.699905: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406959 tx_left: 7
>>>> 805234:          <idle>-0     [001] ..s.  7237.699925: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 55 cur_tx: 4406967 frags: 0 dirty_tx: 4406959 tx_left: 8
>>>> 811446:          <idle>-0     [000] ..s.  7237.702344: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406959 tx_left: 9
>>>> 811479:          <idle>-0     [000] ..s.  7237.702354: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406960 tx_left: 8
>>>> 811641:          <idle>-0     [000] ..s.  7237.702395: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406961 tx_left: 7
>>>> 811733:          <idle>-0     [000] ..s.  7237.702416: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 57 cur_tx: 4406969 frags: 0 dirty_tx: 4406961 tx_left: 8
>>>> 812050:          hping3-2871  [000] ..s.  7237.704770: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406961 tx_left: 9
>>>> 812061:          hping3-2871  [000] ..s.  7237.704777: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406962 tx_left: 8
>>>> 812124:          hping3-2871  [000] ..s.  7237.704811: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406963 tx_left: 7
>>>> 812176:          hping3-2871  [000] ..s.  7237.704831: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 59 cur_tx: 4406971 frags: 0 dirty_tx: 4406963 tx_left: 8
>>>> 812300:          <idle>-0     [000] ..s.  7237.707247: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406963 tx_left: 9
>>>> 812312:          <idle>-0     [000] ..s.  7237.707257: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406964 tx_left: 8
>>>> 812389:          <idle>-0     [000] ..s.  7237.707297: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406965 tx_left: 7
>>>> 812426:          <idle>-0     [000] ..s.  7237.707318: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 61 cur_tx: 4406973 frags: 0 dirty_tx: 4406965 tx_left: 8
>>>> 812827:          <idle>-0     [000] ..s.  7237.709690: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 4406965 tx_left: 9
>>>> 812838:          <idle>-0     [000] ..s.  7237.709700: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 4406966 tx_left: 8
>>>> .....
>>>>
>>>>
>>>> BR.
>>>> Risto
>>>>
>>>> On 10.12.2018 23:26, Heiner Kallweit wrote:
>>>>> Did you test also with the latest linux-next kernel? Some recent changes like 2e6eedb4813e
>>>>> "r8169: make use of xmit_more and __netdev_sent_queue" may have a positive impact.
>>>>>
>>>>> On 10.12.2018 00:28, Risto Pajula wrote:
>>>>>> Hello.
>>>>>>
>>>>>> Old subject: "Re: IP fragmentation performance and don't fragment bug when forwarding
>>>>>>
>>>>>>
>>>>>> I have now been tracing the kernel and finding the bug seems difficult. I think the bug is combination of several things, likely cause is that it only occurs with rtl8169 and how it is using the netdevapi/DQL api.
>>>>>>
>>>>>>   From my investigations seems that following happens:
>>>>>>
>>>>>> *Burst of frames is received from internet. (eth0)
>>>>>> *These are accepted and forwarded to the (eth1)
>>>>>> *DQL hits the limit, this causes the scheduling for the device to be stopped (__QUEUE_STATE_STACK_XOFF)
>>>>>> *This and combination of some timing causes the  performance degradation, messages are only scheduled for transmission in soft Irq context.
>>>>>>
>>>>>> I still do now know, why the DQL hits the limits.. I think it should not,  maybe something undesirable first happens with rtl8169.
>>>>>>
>>>>>> I inserted following trace printk functions to the code..
>>>>>>
>>>>>> rtl8169_start_xmit:
>>>>>> trace_printk("ORP33 %s rtl8169_start_xmit len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, skb->len, entry, tp->cur_tx, frags, tp->dirty_tx, (tp->cur_tx - tp->dirty_tx) );
>>>>>>
>>>>>>
>>>>>> rtl_tx:
>>>>>> trace_printk("ORP33 %s rtl_tx len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, tx_skb->skb->len, tp->cur_tx % NUM_TX_DESC, tp->cur_tx, 0, dirty_tx, tx_left );
>>>>>>
>>>>>>
>>>>>> Here is some grepped output only filtering the ping packet in two different situations:
>>>>>>
>>>>>> //trace when downloading ubuntu (~20MB/s), only 3 secs fit to buffer. ping is all the time normal....
>>>>>> //orig tracing.out3754
>>>>>> 117493:          <idle>-0     [000] ..s.  5235.407116: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 57 cur_tx: 3747641 frags: 0 dirty_tx: 3747640 tx_left: 1
>>>>>> 118704:          <idle>-0     [000] ..s.  5235.416057: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 20 cur_tx: 3747668 frags: 0 dirty_tx: 3747667 tx_left: 1
>>>>>> 346319:          hping3-2871  [000] ....  5236.002456: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 8 cur_tx: 3754312 frags: 0 dirty_tx: 3754312 tx_left: 0
>>>>>> 346337:          hping3-2871  [000] ..s.  5236.002475: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 9 cur_tx: 3754313 frags: 0 dirty_tx: 3754312 tx_left: 1
>>>>>> 1277155:          hping3-2871  [001] ....  5237.002627: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 38 cur_tx: 3766502 frags: 0 dirty_tx: 3766502 tx_left: 0
>>>>>> 1277173:          <idle>-0     [000] ..s.  5237.002646: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 39 cur_tx: 3766503 frags: 0 dirty_tx: 3766502 tx_left: 1
>>>>>>
>>>>>>
>>>>>> //imcp messages from route to intenal network when TCP stream is on. tracing.out3236/eth1_ring.txt is the original
>>>>>> //no idea why the first frame is tripled...
>>>>>> 200792:          hping3-2871  [001] ....  3239.593384: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 34 cur_tx: 2132770 frags: 0 dirty_tx: 2132770 tx_left: 0
>>>>>> 201578:          hping3-2871  [001] ....  3240.593625: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 40 cur_tx: 2132776 frags: 0 dirty_tx: 2132776 tx_left: 0
>>>>>> 341061:          <idle>-0     [000] ..s.  3241.738011: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 52 cur_tx: 2133684 frags: 0 dirty_tx: 2133680 tx_left: 4
>>>>>> 342112:          <idle>-0     [000] ..s.  3241.745297: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 59 cur_tx: 2133691 frags: 0 dirty_tx: 2133684 tx_left: 7
>>>>>> 500126:          <idle>-0     [000] ..s.  3242.739500: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 49 cur_tx: 2134513 frags: 0 dirty_tx: 2134505 tx_left: 8
>>>>>> 501739:          <idle>-0     [000] ..s.  3242.751654: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 58 cur_tx: 2134522 frags: 0 dirty_tx: 2134513 tx_left: 9
>>>>>> 646220:          <idle>-0     [000] ..s.  3243.745761: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 41 cur_tx: 2135273 frags: 0 dirty_tx: 2135267 tx_left: 6
>>>>>> 647633:          <idle>-0     [000] ..s.  3243.755486: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 49 cur_tx: 2135281 frags: 0 dirty_tx: 2135273 tx_left: 8
>>>>>> 802878:          <idle>-0     [000] ..s.  3244.739947: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2136087 frags: 0 dirty_tx: 2136081 tx_left: 6
>>>>>> 804298:          <idle>-0     [000] ..s.  3244.749677: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 31 cur_tx: 2136095 frags: 0 dirty_tx: 2136087 tx_left: 8
>>>>>> 961190:          <idle>-0     [000] ..s.  3245.746217: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 16 cur_tx: 2136912 frags: 0 dirty_tx: 2136906 tx_left: 6
>>>>>> 962610:          <idle>-0     [000] ..s.  3245.755946: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2136920 frags: 0 dirty_tx: 2136912 tx_left: 8
>>>>>> 1118044:          <idle>-0     [000] ..s.  3246.740336: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 62 cur_tx: 2137726 frags: 0 dirty_tx: 2137720 tx_left: 6
>>>>>> 1119443:          <idle>-0     [000] ..s.  3246.750047: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 5 cur_tx: 2137733 frags: 0 dirty_tx: 2137726 tx_left: 7
>>>>>> 1264047:          <idle>-0     [000] ..s.  3247.709202: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 29 cur_tx: 2138525 frags: 0 dirty_tx: 2138518 tx_left: 7
>>>>>> 1264740:          <idle>-0     [000] ..s.  3247.718918: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 33 cur_tx: 2138529 frags: 0 dirty_tx: 2138525 tx_left: 4
>>>>>> 1419958:          <idle>-0     [000] ..s.  3248.736436: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 48 cur_tx: 2139312 frags: 0 dirty_tx: 2139305 tx_left: 7
>>>>>> 1421551:          <idle>-0     [000] ..s.  3248.746189: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 56 cur_tx: 2139320 frags: 0 dirty_tx: 2139312 tx_left: 8
>>>>>> 1578746:          <idle>-0     [000] ..s.  3249.742702: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 46 cur_tx: 2140142 frags: 0 dirty_tx: 2140135 tx_left: 7
>>>>>> 1580194:          <idle>-0     [000] ..s.  3249.752457: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 54 cur_tx: 2140150 frags: 0 dirty_tx: 2140142 tx_left: 8
>>>>>> 1729597:          <idle>-0     [000] ..s.  3250.746839: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 36 cur_tx: 2140964 frags: 0 dirty_tx: 2140957 tx_left: 7
>>>>>> 1731015:          <idle>-0     [000] ..s.  3250.756594: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 43 cur_tx: 2140971 frags: 0 dirty_tx: 2140964 tx_left: 7
>>>>>> 1881028:          <idle>-0     [000] ..s.  3251.740991: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 19 cur_tx: 2141779 frags: 0 dirty_tx: 2141772 tx_left: 7
>>>>>> 1881381:          <idle>-0     [000] ..s.  3251.750718: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 26 cur_tx: 2141786 frags: 0 dirty_tx: 2141779 tx_left: 7
>>>>>> //something happens, the output queue empties.. now it is fast, and rtl8169_start_xmit is not called from soft_irq
>>>>>> 1896178:          hping3-2871  [001] ....  3252.595994: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2141911 frags: 0 dirty_tx: 2141911 tx_left: 0
>>>>>> 1896196:          <idle>-0     [000] ..s.  3252.596055: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2141912 frags: 0 dirty_tx: 2141911 tx_left: 1
>>>>>> 1923136:          hping3-2871  [001] ....  3253.596227: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 50 cur_tx: 2142066 frags: 0 dirty_tx: 2142066 tx_left: 0
>>>>>> 1923154:          <idle>-0     [000] ..s.  3253.596289: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 51 cur_tx: 2142067 frags: 0 dirty_tx: 2142066 tx_left: 1
>>>>>> 1951633:          hping3-2871  [001] ....  3254.596459: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 5 cur_tx: 2142213 frags: 0 dirty_tx: 2142213 tx_left: 0
>>>>>> 1951651:          <idle>-0     [000] ..s.  3254.596521: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 6 cur_tx: 2142214 frags: 0 dirty_tx: 2142213 tx_left: 1
>>>>>> 1954472:          hping3-2871  [001] ....  3255.596700: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 15 cur_tx: 2142223 frags: 0 dirty_tx: 2142223 tx_left: 0
>>>>>> 1954490:          <idle>-0     [000] ..s.  3255.596763: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 16 cur_tx: 2142224 frags: 0 dirty_tx: 2142223 tx_left: 1
>>>>>> //until it soon, again is slow..
>>>>>>
>>>>>>
>>>>>> BR.
>>>>>> Risto
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 7.12.2018 16:46, Risto Pajula wrote:
>>>>>>> Hello.
>>>>>>>
>>>>>>> I have been to track the poor forwarding latency to the TCP Window scale options. The Netgem device uses rather large windows scale options (x256) and I have been able to reproduce the routers poor forwarding latency also with linux box running in the internal network and changing the net.ipv4.tcp_rmem to a large value and thus changing the TCP window scaling options to larger ones. I still do not have clue why this causes the forwarfing in the linux kernel to block? Maybe something in the connection tracking....?
>>>>>>>
>>>>>>>
>>>>>>> With the ICMP timestamp messages I have been able to also pinpoint that the latency is caused in the eth1 sending side (the following hping3 example is run in the router toward the internal network...
>>>>>>>
>>>>>>>
>>>>>>> xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
>>>>>>> using eth1, addr: 192.168.0.1, MTU: 1500
>>>>>>> HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers + 0 data bytes
>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
>>>>>>> icmp_seq=0 rtt=7.9 ms
>>>>>>> ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
>>>>>>> ICMP timestamp RTT tsrtt=7
>>>>>>>
>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
>>>>>>> icmp_seq=1 rtt=235.9 ms
>>>>>>> ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
>>>>>>> ICMP timestamp RTT tsrtt=235
>>>>>>>
>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
>>>>>>> icmp_seq=2 rtt=3.8 ms
>>>>>>> ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
>>>>>>> ICMP timestamp RTT tsrtt=3
>>>>>>>
>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
>>>>>>> icmp_seq=3 rtt=47.8 ms
>>>>>>> ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
>>>>>>> ICMP timestamp RTT tsrtt=47
>>>>>>>
>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
>>>>>>> icmp_seq=4 rtt=7.7 ms
>>>>>>> ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
>>>>>>> ICMP timestamp RTT tsrtt=7
>>>>>>>
>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
>>>>>>> icmp_seq=5 rtt=3.7 ms
>>>>>>> ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
>>>>>>> ICMP timestamp RTT tsrtt=3
>>>>>>>
>>>>>>> ^C
>>>>>>> --- 192.168.0.112 hping statistic ---
>>>>>>> 6 packets tramitted, 6 packets received, 0% packet loss
>>>>>>> round-trip min/avg/max = 3.7/51.1/235.9 ms
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> BR.
>>>>>>> Risto
>>>>>>>
>>>>>>> On 2.12.2018 23:32, Risto Pajula wrote:
>>>>>>>> Hello.
>>>>>>>>
>>>>>>>> You can most likely ignore the "DF Bit, mtu bug when forwarding" case. There isn't actually big IP packets on the wire, instead there is burst of packets on the wire, which are combined by the GRO... And thus dropping them should not happen. Sorry about the invalid bug report.
>>>>>>>>
>>>>>>>> However the poor latency from intenal network to the internet still remain, both GRO enabled and disabled. I will try to study further...
>>>>>>>>
>>>>>>>>
>>>>>>>> BR.
>>>>>>>> Risto
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2.12.2018 14:01, Risto Pajula wrote:
>>>>>>>>> Hello.
>>>>>>>>>
>>>>>>>>> I have encountered a weird performance problem in Linux IP fragmentation when using video streaming services behind the NAT. Also I have studied a possible bug in the DF bit (don't fragment) handling when forwarding the IP packets.
>>>>>>>>>
>>>>>>>>> First the system setup description:
>>>>>>>>>
>>>>>>>>> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre router]-internet
>>>>>>>>>
>>>>>>>>> where:
>>>>>>>>> host1: is a Netgem N7800 "cable box" for online video streaming services provided by local telco (Can access Netflix, HBO nordic, "live TV", etc.)
>>>>>>>>> linux router: Linux computer with Dualcore Intel Celeron G1840, running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
>>>>>>>>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, mtu set to 1500, RTL8169sb/8110sb
>>>>>>>>> eth0: Linux Routers internet facing interface, public ip address, mtu set to 1500,  RTL8168evl/8111evl
>>>>>>>>> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly connected to the eth0 of the Linux router.
>>>>>>>>>
>>>>>>>>> And now when using the Netgem N7800 with online video services (Netflix, HBO nordic, etc) the Linux router will receive very BIG IP packets in the eth0 upto ~20kB, this seems to lead to the following problems in the Linux IP stack.
>>>>>>>>>
>>>>>>>>> IP fragmentation performance:
>>>>>>>>> When the Linux router receives these large IP packets in the eth0 everything works, but it seems that them cause very large performance degradation from internal network to the internet regarding the latency when the IP fragmentation is performed. The ping latency from internal network to the internel network increases from stable 15ms-20ms up to 700-800ms AND also the ping from the internal network to the linux router eth1 (192.168.0.). However up link works perfectly, the ping is still stable when streaming the online services (From linux router to the internet). It seems that the IP fragmentation is somehow blocking the eth1 reception or transmission for very long time (which it shouldn't). I'm able to test and debug the issue further, but advice regarding where to look would 
 be appreciated.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> DF Bit, mtu bug when forwarding:
>>>>>>>>> I have started to study the above mentioned problem and have found a possible bug in the DF bit and mtu handling in IP forwarding. The BIG packets received from streaming services all have the "DF bit" set and the question is that should we be forwarding them at all as that would result them being fragmented? Apparently we currently are... I have traced this down to the ip_forward.c function ip_exceeds_mtu(), and the following patch seems to fix that.
>>>>>>>>>
>>>>>>>>> --- net/ipv4/ip_forward.c.orig  2018-12-02 11:09:32.764320780 +0200
>>>>>>>>> +++ net/ipv4/ip_forward.c       2018-12-02 12:53:25.031232347 +0200
>>>>>>>>> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
>>>>>>>>>                   return false;
>>>>>>>>>
>>>>>>>>>           /* original fragment exceeds mtu and DF is set */
>>>>>>>>> -       if (unlikely(IPCB(skb)->frag_max_size > mtu))
>>>>>>>>> +        if (unlikely(skb->len > mtu))
>>>>>>>>>                   return true;
>>>>>>>>>
>>>>>>>>>           if (skb->ignore_df)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This seems to work (in some ways) - after the change IP packets that are too large to the internal network get dropped and we are sending "ICMP Destination unreachable, The datagram is too big" messages to the originator (as we should?). However it seems that not all services really like this... Netflix behaves as expected and ping is stable from internal network to the internet, but for example HBO nordic will not work anymore (too little buffering? Retransimissions not working?). So it seems the original issue should be also fixed (And the fragmention should be allowed?).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Any advice would be appreciated. Thanks!
>>>>>>>>>
>>>>>>>>> PS. Watching TV was not this intensive 20 years ago :)
>>>>>>>>>
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: IP (rtl8169) forwarding bug (performance)
  2018-12-12  6:23                 ` Heiner Kallweit
@ 2018-12-12 23:20                   ` Risto Pajula
  2018-12-13  4:52                     ` Stephen Hemminger
  0 siblings, 1 reply; 14+ messages in thread
From: Risto Pajula @ 2018-12-12 23:20 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: David S. Miller, Alexey Kuznetsov, Realtek linux nic maintainers, netdev

Hello.

I'm no longer able to reproduce the actual problem, which was the high 
ping latency from the internal network.

This starts to sound like some sort of voodoo, but...

I tested replacing the switch that the internal-network rtl8169 is 
connected to with one from another brand. This did not have any effect 
on the behavior; the latency remained high.

Then I disconnected every device and all the network cables from the 
internal network and connected everything again. After that, 
reproducing the problem seems very difficult; the likelihood of it 
happening has decreased dramatically.

Watching the ping output for about 30 minutes, I once got this:

Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Reply from 192.168.0.1: bytes=32 time=18ms TTL=64
Reply from 192.168.0.1: bytes=32 time=24ms TTL=64
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64

I also looked at the kernel traces; it seems the rtl8169 send FIFO 
still stalls, but it does not happen as often and in practice is hard 
for the user to notice.
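
One way to cross-check whether the transmit queue really sits full 
between rtl_tx runs is to sample the BQL counters from userspace while 
trying to reproduce the stall. A minimal sketch, assuming the BQL sysfs 
attributes are exposed for eth1 and that tx-0 is the queue in use:

#include <stdio.h>
#include <unistd.h>

/* Read a single integer from a sysfs file, or -1 on failure. */
static long read_val(const char *path)
{
	FILE *f = fopen(path, "r");
	long v = -1;

	if (f) {
		if (fscanf(f, "%ld", &v) != 1)
			v = -1;
		fclose(f);
	}
	return v;
}

int main(void)
{
	const char *base = "/sys/class/net/eth1/queues/tx-0/byte_queue_limits";
	char limit[256], inflight[256];

	snprintf(limit, sizeof(limit), "%s/limit", base);
	snprintf(inflight, sizeof(inflight), "%s/inflight", base);

	for (;;) {
		/* inflight staying close to (or above) limit for a long
		 * time would point at the driver/hardware not completing
		 * transmissions rather than at the stack holding back. */
		printf("limit=%ld inflight=%ld\n",
		       read_val(limit), read_val(inflight));
		usleep(100 * 1000);	/* sample every 100 ms */
	}
	return 0;
}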

For example, I found this kind of sequence in the traces:

t0: rtl8169_start_xmit, the first 1506-byte frame is written
...9 more frames are transmitted
t0+350us: 2 frames are still not sent (transmission should have taken 
~120us plus interframe time)
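
If I read it right, the 120us figure is the wire time for the whole 
10-frame burst, assuming the internal link runs at 1 Gbit/s and 
ignoring preamble/interframe gap; a quick sanity-check sketch of that 
arithmetic (assumed link speed, not a measurement):

#include <stdio.h>

int main(void)
{
	const double frames = 10.0;		/* burst size seen in the trace */
	const double frame_bytes = 1506.0;	/* frame length from the trace */
	const double link_bps = 1e9;		/* assumed 1 Gbit/s link */

	/* prints ~120.5 us for the whole burst */
	printf("burst wire time: %.1f us\n",
	       frames * frame_bytes * 8.0 / link_bps * 1e6);
	return 0;
}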

BR.
Risto

On 12.12.2018 8:23, Heiner Kallweit wrote:
> According to your description of the issue it doesn't need a very exotic scenario to trigger it.
> And due to the fact that Realtek network chips are used on a lot of consumer mainboards, I would
> assume quite some people are using such a mainboard for a use case like yours.
> This makes it somewhat suspicious that at least I'm not aware of more such reports.
> Therefore I think it's a good idea to check the other components of the internal network.
>
> On 12.12.2018 02:28, Risto Pajula wrote:
>> Hello.
>>
>> I added some debug prints to diagnose the bug properly. I can send the patches if you are willing to debug/try... for example this output is produced:
>>
>> 96096:          <idle>-0     [000] ..s.   232.466703: rtl8169_start_xmit: RTLDBG221 eth1 rtl8169_start_xmit len: 1506 opts1: B0000000 txpol: 0 entry: 24 cur_tx: 5656 frags: 0 dirty_tx: 5648 tx_left: 8
>> 96097:          <idle>-0     [000] ..s.   232.466704: rtl8169_start_xmit: RTLDBG111 eth1 rtl_tx_slots_avail
>> 96099:          <idle>-0     [000] d.h.   232.468827: rtl8169_interrupt: RTLDBG111 eth1 rtl8169_interrupt
>> 96103:          <idle>-0     [000] ..s.   232.468833: rtl8169_poll: RTLDBG111 eth1 rtl8169_poll
>> 96104:          <idle>-0     [000] ..s.   232.468834: rtl8169_poll: RTLDBG111 eth1 rtl_rx
>> 96105:          <idle>-0     [000] ..s.   232.468835: rtl8169_poll: RTLDBG111 eth1 rtl8169_try_rx_copy
>> 96184:          <idle>-0     [000] ..s.   232.468914: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: B00005E2 txpol: 0 entry: 16 cur_tx: 5657 frags: 0 dirty_tx: 5648 tx_left: 9
>> 96185:          <idle>-0     [000] ..s.   232.468924: rtl_tx: RTLDBG223 eth1 rtl_tx entry opts1: 16, B00005E2 17, B00005E2 18, B00005E2 19, B00005E2 20, B00005E2 21, B00005E2 22, B00005E2 23, B00005E2 24, B00005E2
>> 96189:          <idle>-0     [000] d.h.   232.469006: rtl8169_interrupt: RTLDBG111 eth1 rtl8169_interrupt
>> 96193:          <idle>-0     [000] ..s.   232.469012: rtl8169_poll: RTLDBG111 eth1 rtl8169_poll
>> 96194:          <idle>-0     [000] ..s.   232.469014: rtl8169_poll: RTLDBG111 eth1 rtl_rx
>> 96196:          <idle>-0     [000] ..s.   232.469020: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: 30000000 txpol: 0 entry: 16 cur_tx: 5657 frags: 0 dirty_tx: 5648 tx_left: 9
>> 96204:          <idle>-0     [000] ..s.   232.469032: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: 30000000 txpol: 0 entry: 17 cur_tx: 5657 frags: 0 dirty_tx: 5649 tx_left: 8
>> 96212:          <idle>-0     [000] ..s.   232.469041: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: B00005E2 txpol: 0 entry: 18 cur_tx: 5657 frags: 0 dirty_tx: 5650 tx_left: 7
>> 96213:          <idle>-0     [000] ..s.   232.469050: rtl_tx: RTLDBG223 eth1 rtl_tx entry opts1: 18, B00005E2 19, B00005E2 20, B00005E2 21, B00005E2 22, B00005E2 23, B00005E2 24, B00005E2
>> 96235:          <idle>-0     [000] d.h.   232.471253: rtl8169_interrupt: RTLDBG111 eth1 rtl8169_interrupt
>>
>> Here we can clearly see that TX Fifo descriptors are properly setup and send is initiated, but then the rtl8169 stops sending after few frames. I think the bug is in the HW side (or in the rtl8169 firmware). Some PCI settings might affect this, but then again shouldn't the rtl8169 then report PCI error interrupt...
>>
>> Also this TX Fifo stalling happens with any traffic and no netflix/netgem/large tcp windows are required to trigger it. What the netgem/netflix does is that it causes this bug to be visible by probably due to timing of the traffic pattern, and thus causes also upper layers of the network stack to take actions..
>>
>> RTL8168evl indeed is a onboard Ethernet on a reasonable new mothorboard and the RTL8169sb is and old PCI card. But, the same issue happens with both ethernet adapters (I switched the interfaces).
>>
>> But then again - this got me thinking... If the issue (send fifo stalling) does not happen in the internet facing interface even with heavy upload traffic, then the problem must be in the internal network itself. Tomorrow I will change the Ethernet switch. There could be some sort of compatibility problem?
>>
>> BR.
>> Risto
>>
>>
>> On 11.12.2018 21:51, Heiner Kallweit wrote:
>>> OK, then another idea .. At the very beginning of the mail thread it
>>> was stated that the router has to network ports:
>>> linux router: Linux computer with Dualcore Intel Celeron G1840, running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
>>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, mtu set to 1500, RTL8169sb/8110sb
>>> eth0: Linux Routers internet facing interface, public ip address, mtu set to 1500,  RTL8168evl/8111evl
>>>
>>> r8169 supports about 50 members of the RTL8169/RTL8168/RTL8101 family
>>> and more or less every single member needs its own quirks.
>>> RTL8168evl (PCIe) is somewhat recent, RTL8169sb (PCI) is ancient.
>>> To rule out you triggered some hardware issue: Could you switch both
>>> interfaces and check whether you see a change in system behavior?
>>>
>>> On 11.12.2018 18:01, Risto Pajula wrote:
>>>> Hello.
>>>>
>>>> A freshly built 4.20.0-rc6-next-20181210-lp150.12.25-default waited me when I got back from work, but unfortunately it did not help at all, it behaved exactly in same manner.
>>>>
>>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>>> Reply from 192.168.0.1: bytes=32 time=73ms TTL=64
>>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>>> Reply from 192.168.0.1: bytes=32 time=83ms TTL=64
>>>> Reply from 192.168.0.1: bytes=32 time=307ms TTL=64
>>>> Reply from 192.168.0.1: bytes=32 time=115ms TTL=64
>>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
>>>>
>>>> BR.
>>>> Risto
>>>>
>>>>
>>>> On 11.12.2018 0:20, Risto Pajula wrote:
>>>>> Hello.
>>>>>
>>>>> I have not yet tested with linux-next but I will, thanks for pointing that out.
>>>>>
>>>>> ...But I have studied the problem a bit more, indeed it seems that the rtl8169 transmission queue gets stuck.
>>>>>
>>>>> Below is some trace log. Starting from 802026 a burst of frames is forwarded from eth0 to eth1 and when we get to the rtl_tx only two frames have been transmitted.. (Should have had 1,25 ms to transmit them).. Also then the sequence seems to repeat for very long time that only two massages get transmitted.
>>>>>
>>>>> 800541:          <idle>-0     [001] ..s.  7237.698143: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 34 cur_tx: 4406946 frags: 0 dirty_tx: 4406946 tx_left: 0
>>>>> 800671:          <idle>-0     [001] ..s.  7237.698189: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 35 cur_tx: 4406947 frags: 0 dirty_tx: 4406946 tx_left: 1
>>>>> 800835:          <idle>-0     [001] ..s.  7237.698235: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 36 cur_tx: 4406948 frags: 0 dirty_tx: 4406946 tx_left: 2
>>>>> 800905:          <idle>-0     [000] ..s.  7237.698255: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406946 tx_left: 3
>>>>> 800932:          <idle>-0     [000] ..s.  7237.698263: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406947 tx_left: 2
>>>>> 800960:          <idle>-0     [000] ..s.  7237.698270: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406948 tx_left: 1
>>>>> 800998:          <idle>-0     [001] ..s.  7237.698282: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406949 tx_left: 0
>>>>> 801085:          <idle>-0     [000] ..s.  7237.698312: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 38 cur_tx: 4406950 frags: 0 dirty_tx: 4406949 tx_left: 1
>>>>> 801137:          <idle>-0     [001] ..s.  7237.698329: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 38 cur_tx: 4406950 frags: 0 dirty_tx: 4406950 tx_left: 0
>>>>> 801229:          <idle>-0     [000] ..s.  7237.698363: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 39 cur_tx: 4406951 frags: 0 dirty_tx: 4406950 tx_left: 1
>>>>> 801264:          <idle>-0     [001] ..s.  7237.698375: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 39 cur_tx: 4406951 frags: 0 dirty_tx: 4406951 tx_left: 0
>>>>> 801356:          <idle>-0     [000] ..s.  7237.698409: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 40 cur_tx: 4406952 frags: 0 dirty_tx: 4406951 tx_left: 1
>>>>> 801391:          <idle>-0     [001] ..s.  7237.698421: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 40 cur_tx: 4406952 frags: 0 dirty_tx: 4406952 tx_left: 0
>>>>> 801482:          <idle>-0     [000] ..s.  7237.698455: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 41 cur_tx: 4406953 frags: 0 dirty_tx: 4406952 tx_left: 1
>>>>> 801518:          <idle>-0     [001] ..s.  7237.698468: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 41 cur_tx: 4406953 frags: 0 dirty_tx: 4406953 tx_left: 0
>>>>> 801610:          <idle>-0     [000] ..s.  7237.698502: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 42 cur_tx: 4406954 frags: 0 dirty_tx: 4406953 tx_left: 1
>>>>> 801645:          <idle>-0     [001] ..s.  7237.698514: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 42 cur_tx: 4406954 frags: 0 dirty_tx: 4406954 tx_left: 0
>>>>> 801737:          <idle>-0     [000] ..s.  7237.698548: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 43 cur_tx: 4406955 frags: 0 dirty_tx: 4406954 tx_left: 1
>>>>> 801772:          <idle>-0     [001] ..s.  7237.698560: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 43 cur_tx: 4406955 frags: 0 dirty_tx: 4406955 tx_left: 0
>>>>> 801864:          <idle>-0     [000] ..s.  7237.698594: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 44 cur_tx: 4406956 frags: 0 dirty_tx: 4406955 tx_left: 1
>>>>> 801899:          <idle>-0     [001] ..s.  7237.698607: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 44 cur_tx: 4406956 frags: 0 dirty_tx: 4406956 tx_left: 0
>>>>> 801984:          <idle>-0     [000] ..s.  7237.698640: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 45 cur_tx: 4406957 frags: 0 dirty_tx: 4406956 tx_left: 1
>>>>> 802026:          <idle>-0     [001] ..s.  7237.698655: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 45 cur_tx: 4406957 frags: 0 dirty_tx: 4406957 tx_left: 0
>>>>> 802129:          <idle>-0     [001] ..s.  7237.698701: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 46 cur_tx: 4406958 frags: 0 dirty_tx: 4406957 tx_left: 1
>>>>> 802232:          <idle>-0     [001] ..s.  7237.698747: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 47 cur_tx: 4406959 frags: 0 dirty_tx: 4406957 tx_left: 2
>>>>> 802335:          <idle>-0     [001] ..s.  7237.698793: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 48 cur_tx: 4406960 frags: 0 dirty_tx: 4406957 tx_left: 3
>>>>> 802438:          <idle>-0     [001] ..s.  7237.698840: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 49 cur_tx: 4406961 frags: 0 dirty_tx: 4406957 tx_left: 4
>>>>> 802541:          <idle>-0     [001] ..s.  7237.698885: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 50 cur_tx: 4406962 frags: 0 dirty_tx: 4406957 tx_left: 5
>>>>> 802644:          <idle>-0     [001] ..s.  7237.698932: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 51 cur_tx: 4406963 frags: 0 dirty_tx: 4406957 tx_left: 6
>>>>> 802747:          <idle>-0     [001] ..s.  7237.698978: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 52 cur_tx: 4406964 frags: 0 dirty_tx: 4406957 tx_left: 7
>>>>> 802851:          <idle>-0     [001] ..s.  7237.699025: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 53 cur_tx: 4406965 frags: 0 dirty_tx: 4406957 tx_left: 8
>>>>> 805094:          <idle>-0     [000] ..s.  7237.699871: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406957 tx_left: 9
>>>>> 805126:          <idle>-0     [000] ..s.  7237.699878: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406958 tx_left: 8
>>>>> 805197:          <idle>-0     [001] ..s.  7237.699905: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406959 tx_left: 7
>>>>> 805234:          <idle>-0     [001] ..s.  7237.699925: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 55 cur_tx: 4406967 frags: 0 dirty_tx: 4406959 tx_left: 8
>>>>> 811446:          <idle>-0     [000] ..s.  7237.702344: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406959 tx_left: 9
>>>>> 811479:          <idle>-0     [000] ..s.  7237.702354: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406960 tx_left: 8
>>>>> 811641:          <idle>-0     [000] ..s.  7237.702395: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406961 tx_left: 7
>>>>> 811733:          <idle>-0     [000] ..s.  7237.702416: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 57 cur_tx: 4406969 frags: 0 dirty_tx: 4406961 tx_left: 8
>>>>> 812050:          hping3-2871  [000] ..s.  7237.704770: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406961 tx_left: 9
>>>>> 812061:          hping3-2871  [000] ..s.  7237.704777: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406962 tx_left: 8
>>>>> 812124:          hping3-2871  [000] ..s.  7237.704811: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406963 tx_left: 7
>>>>> 812176:          hping3-2871  [000] ..s.  7237.704831: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 59 cur_tx: 4406971 frags: 0 dirty_tx: 4406963 tx_left: 8
>>>>> 812300:          <idle>-0     [000] ..s.  7237.707247: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406963 tx_left: 9
>>>>> 812312:          <idle>-0     [000] ..s.  7237.707257: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406964 tx_left: 8
>>>>> 812389:          <idle>-0     [000] ..s.  7237.707297: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406965 tx_left: 7
>>>>> 812426:          <idle>-0     [000] ..s.  7237.707318: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 61 cur_tx: 4406973 frags: 0 dirty_tx: 4406965 tx_left: 8
>>>>> 812827:          <idle>-0     [000] ..s.  7237.709690: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 4406965 tx_left: 9
>>>>> 812838:          <idle>-0     [000] ..s.  7237.709700: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 4406966 tx_left: 8
>>>>> .....
>>>>>
>>>>>
>>>>> BR.
>>>>> Risto
>>>>>
>>>>> On 10.12.2018 23:26, Heiner Kallweit wrote:
>>>>>> Did you test also with the latest linux-next kernel? Some recent changes like 2e6eedb4813e
>>>>>> "r8169: make use of xmit_more and __netdev_sent_queue" may have a positive impact.
>>>>>>
>>>>>> On 10.12.2018 00:28, Risto Pajula wrote:
>>>>>>> Hello.
>>>>>>>
>>>>>>> Old subject: "Re: IP fragmentation performance and don't fragment bug when forwarding
>>>>>>>
>>>>>>>
>>>>>>> I have now been tracing the kernel and finding the bug seems difficult. I think the bug is combination of several things, likely cause is that it only occurs with rtl8169 and how it is using the netdevapi/DQL api.
>>>>>>>
>>>>>>>    From my investigations seems that following happens:
>>>>>>>
>>>>>>> *Burst of frames is received from internet. (eth0)
>>>>>>> *These are accepted and forwarded to the (eth1)
>>>>>>> *DQL hits the limit, this causes the scheduling for the device to be stopped (__QUEUE_STATE_STACK_XOFF)
>>>>>>> *This and combination of some timing causes the  performance degradation, messages are only scheduled for transmission in soft Irq context.
>>>>>>>
>>>>>>> I still do now know, why the DQL hits the limits.. I think it should not,  maybe something undesirable first happens with rtl8169.
>>>>>>>
>>>>>>> I inserted following trace printk functions to the code..
>>>>>>>
>>>>>>> rtl8169_start_xmit:
>>>>>>> trace_printk("ORP33 %s rtl8169_start_xmit len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, skb->len, entry, tp->cur_tx, frags, tp->dirty_tx, (tp->cur_tx - tp->dirty_tx) );
>>>>>>>
>>>>>>>
>>>>>>> rtl_tx:
>>>>>>> trace_printk("ORP33 %s rtl_tx len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, tx_skb->skb->len, tp->cur_tx % NUM_TX_DESC, tp->cur_tx, 0, dirty_tx, tx_left );
>>>>>>>
>>>>>>>
>>>>>>> Here is some grepped output only filtering the ping packet in two different situations:
>>>>>>>
>>>>>>> //trace when downloading ubuntu (~20MB/s), only 3 secs fit to buffer. ping is all the time normal....
>>>>>>> //orig tracing.out3754
>>>>>>> 117493:          <idle>-0     [000] ..s.  5235.407116: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 57 cur_tx: 3747641 frags: 0 dirty_tx: 3747640 tx_left: 1
>>>>>>> 118704:          <idle>-0     [000] ..s.  5235.416057: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 20 cur_tx: 3747668 frags: 0 dirty_tx: 3747667 tx_left: 1
>>>>>>> 346319:          hping3-2871  [000] ....  5236.002456: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 8 cur_tx: 3754312 frags: 0 dirty_tx: 3754312 tx_left: 0
>>>>>>> 346337:          hping3-2871  [000] ..s.  5236.002475: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 9 cur_tx: 3754313 frags: 0 dirty_tx: 3754312 tx_left: 1
>>>>>>> 1277155:          hping3-2871  [001] ....  5237.002627: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 38 cur_tx: 3766502 frags: 0 dirty_tx: 3766502 tx_left: 0
>>>>>>> 1277173:          <idle>-0     [000] ..s.  5237.002646: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 39 cur_tx: 3766503 frags: 0 dirty_tx: 3766502 tx_left: 1
>>>>>>>
>>>>>>>
>>>>>>> //imcp messages from route to intenal network when TCP stream is on. tracing.out3236/eth1_ring.txt is the original
>>>>>>> //no idea why the first frame is tripled...
>>>>>>> 200792:          hping3-2871  [001] ....  3239.593384: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 34 cur_tx: 2132770 frags: 0 dirty_tx: 2132770 tx_left: 0
>>>>>>> 201578:          hping3-2871  [001] ....  3240.593625: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 40 cur_tx: 2132776 frags: 0 dirty_tx: 2132776 tx_left: 0
>>>>>>> 341061:          <idle>-0     [000] ..s.  3241.738011: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 52 cur_tx: 2133684 frags: 0 dirty_tx: 2133680 tx_left: 4
>>>>>>> 342112:          <idle>-0     [000] ..s.  3241.745297: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 59 cur_tx: 2133691 frags: 0 dirty_tx: 2133684 tx_left: 7
>>>>>>> 500126:          <idle>-0     [000] ..s.  3242.739500: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 49 cur_tx: 2134513 frags: 0 dirty_tx: 2134505 tx_left: 8
>>>>>>> 501739:          <idle>-0     [000] ..s.  3242.751654: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 58 cur_tx: 2134522 frags: 0 dirty_tx: 2134513 tx_left: 9
>>>>>>> 646220:          <idle>-0     [000] ..s.  3243.745761: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 41 cur_tx: 2135273 frags: 0 dirty_tx: 2135267 tx_left: 6
>>>>>>> 647633:          <idle>-0     [000] ..s.  3243.755486: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 49 cur_tx: 2135281 frags: 0 dirty_tx: 2135273 tx_left: 8
>>>>>>> 802878:          <idle>-0     [000] ..s.  3244.739947: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2136087 frags: 0 dirty_tx: 2136081 tx_left: 6
>>>>>>> 804298:          <idle>-0     [000] ..s.  3244.749677: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 31 cur_tx: 2136095 frags: 0 dirty_tx: 2136087 tx_left: 8
>>>>>>> 961190:          <idle>-0     [000] ..s.  3245.746217: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 16 cur_tx: 2136912 frags: 0 dirty_tx: 2136906 tx_left: 6
>>>>>>> 962610:          <idle>-0     [000] ..s.  3245.755946: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2136920 frags: 0 dirty_tx: 2136912 tx_left: 8
>>>>>>> 1118044:          <idle>-0     [000] ..s.  3246.740336: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 62 cur_tx: 2137726 frags: 0 dirty_tx: 2137720 tx_left: 6
>>>>>>> 1119443:          <idle>-0     [000] ..s.  3246.750047: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 5 cur_tx: 2137733 frags: 0 dirty_tx: 2137726 tx_left: 7
>>>>>>> 1264047:          <idle>-0     [000] ..s.  3247.709202: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 29 cur_tx: 2138525 frags: 0 dirty_tx: 2138518 tx_left: 7
>>>>>>> 1264740:          <idle>-0     [000] ..s.  3247.718918: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 33 cur_tx: 2138529 frags: 0 dirty_tx: 2138525 tx_left: 4
>>>>>>> 1419958:          <idle>-0     [000] ..s.  3248.736436: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 48 cur_tx: 2139312 frags: 0 dirty_tx: 2139305 tx_left: 7
>>>>>>> 1421551:          <idle>-0     [000] ..s.  3248.746189: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 56 cur_tx: 2139320 frags: 0 dirty_tx: 2139312 tx_left: 8
>>>>>>> 1578746:          <idle>-0     [000] ..s.  3249.742702: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 46 cur_tx: 2140142 frags: 0 dirty_tx: 2140135 tx_left: 7
>>>>>>> 1580194:          <idle>-0     [000] ..s.  3249.752457: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 54 cur_tx: 2140150 frags: 0 dirty_tx: 2140142 tx_left: 8
>>>>>>> 1729597:          <idle>-0     [000] ..s.  3250.746839: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 36 cur_tx: 2140964 frags: 0 dirty_tx: 2140957 tx_left: 7
>>>>>>> 1731015:          <idle>-0     [000] ..s.  3250.756594: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 43 cur_tx: 2140971 frags: 0 dirty_tx: 2140964 tx_left: 7
>>>>>>> 1881028:          <idle>-0     [000] ..s.  3251.740991: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 19 cur_tx: 2141779 frags: 0 dirty_tx: 2141772 tx_left: 7
>>>>>>> 1881381:          <idle>-0     [000] ..s.  3251.750718: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 26 cur_tx: 2141786 frags: 0 dirty_tx: 2141779 tx_left: 7
>>>>>>> //something happens, the output queue empties.. now it is fast, and rtl8169_start_xmit is not called from soft_irq
>>>>>>> 1896178:          hping3-2871  [001] ....  3252.595994: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2141911 frags: 0 dirty_tx: 2141911 tx_left: 0
>>>>>>> 1896196:          <idle>-0     [000] ..s.  3252.596055: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2141912 frags: 0 dirty_tx: 2141911 tx_left: 1
>>>>>>> 1923136:          hping3-2871  [001] ....  3253.596227: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 50 cur_tx: 2142066 frags: 0 dirty_tx: 2142066 tx_left: 0
>>>>>>> 1923154:          <idle>-0     [000] ..s.  3253.596289: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 51 cur_tx: 2142067 frags: 0 dirty_tx: 2142066 tx_left: 1
>>>>>>> 1951633:          hping3-2871  [001] ....  3254.596459: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 5 cur_tx: 2142213 frags: 0 dirty_tx: 2142213 tx_left: 0
>>>>>>> 1951651:          <idle>-0     [000] ..s.  3254.596521: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 6 cur_tx: 2142214 frags: 0 dirty_tx: 2142213 tx_left: 1
>>>>>>> 1954472:          hping3-2871  [001] ....  3255.596700: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 15 cur_tx: 2142223 frags: 0 dirty_tx: 2142223 tx_left: 0
>>>>>>> 1954490:          <idle>-0     [000] ..s.  3255.596763: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 16 cur_tx: 2142224 frags: 0 dirty_tx: 2142223 tx_left: 1
>>>>>>> //until it soon, again is slow..
>>>>>>>
>>>>>>>
>>>>>>> BR.
>>>>>>> Risto
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 7.12.2018 16:46, Risto Pajula wrote:
>>>>>>>> Hello.
>>>>>>>>
>>>>>>>> I have been to track the poor forwarding latency to the TCP Window scale options. The Netgem device uses rather large windows scale options (x256) and I have been able to reproduce the routers poor forwarding latency also with linux box running in the internal network and changing the net.ipv4.tcp_rmem to a large value and thus changing the TCP window scaling options to larger ones. I still do not have clue why this causes the forwarfing in the linux kernel to block? Maybe something in the connection tracking....?
>>>>>>>>
>>>>>>>>
>>>>>>>> With the ICMP timestamp messages I have been able to also pinpoint that the latency is caused in the eth1 sending side (the following hping3 example is run in the router toward the internal network...
>>>>>>>>
>>>>>>>>
>>>>>>>> xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
>>>>>>>> using eth1, addr: 192.168.0.1, MTU: 1500
>>>>>>>> HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers + 0 data bytes
>>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
>>>>>>>> icmp_seq=0 rtt=7.9 ms
>>>>>>>> ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
>>>>>>>> ICMP timestamp RTT tsrtt=7
>>>>>>>>
>>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
>>>>>>>> icmp_seq=1 rtt=235.9 ms
>>>>>>>> ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
>>>>>>>> ICMP timestamp RTT tsrtt=235
>>>>>>>>
>>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
>>>>>>>> icmp_seq=2 rtt=3.8 ms
>>>>>>>> ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
>>>>>>>> ICMP timestamp RTT tsrtt=3
>>>>>>>>
>>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
>>>>>>>> icmp_seq=3 rtt=47.8 ms
>>>>>>>> ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
>>>>>>>> ICMP timestamp RTT tsrtt=47
>>>>>>>>
>>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
>>>>>>>> icmp_seq=4 rtt=7.7 ms
>>>>>>>> ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
>>>>>>>> ICMP timestamp RTT tsrtt=7
>>>>>>>>
>>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
>>>>>>>> icmp_seq=5 rtt=3.7 ms
>>>>>>>> ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
>>>>>>>> ICMP timestamp RTT tsrtt=3
>>>>>>>>
>>>>>>>> ^C
>>>>>>>> --- 192.168.0.112 hping statistic ---
>>>>>>>> 6 packets tramitted, 6 packets received, 0% packet loss
>>>>>>>> round-trip min/avg/max = 3.7/51.1/235.9 ms
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> BR.
>>>>>>>> Risto
>>>>>>>>
>>>>>>>> On 2.12.2018 23:32, Risto Pajula wrote:
>>>>>>>>> Hello.
>>>>>>>>>
>>>>>>>>> You can most likely ignore the "DF Bit, mtu bug when forwarding" case. There isn't actually big IP packets on the wire, instead there is burst of packets on the wire, which are combined by the GRO... And thus dropping them should not happen. Sorry about the invalid bug report.
>>>>>>>>>
>>>>>>>>> However the poor latency from intenal network to the internet still remain, both GRO enabled and disabled. I will try to study further...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> BR.
>>>>>>>>> Risto
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2.12.2018 14:01, Risto Pajula wrote:
>>>>>>>>>> Hello.
>>>>>>>>>>
>>>>>>>>>> I have encountered a weird performance problem in Linux IP fragmentation when using video streaming services behind the NAT. Also I have studied a possible bug in the DF bit (don't fragment) handling when forwarding the IP packets.
>>>>>>>>>>
>>>>>>>>>> First the system setup description:
>>>>>>>>>>
>>>>>>>>>> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre router]-internet
>>>>>>>>>>
>>>>>>>>>> where:
>>>>>>>>>> host1: is a Netgem N7800 "cable box" for online video streaming services provided by local telco (Can access Netflix, HBO nordic, "live TV", etc.)
>>>>>>>>>> linux router: Linux computer with Dualcore Intel Celeron G1840, running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
>>>>>>>>>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, mtu set to 1500, RTL8169sb/8110sb
>>>>>>>>>> eth0: Linux Routers internet facing interface, public ip address, mtu set to 1500,  RTL8168evl/8111evl
>>>>>>>>>> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly connected to the eth0 of the Linux router.
>>>>>>>>>>
>>>>>>>>>> And now when using the Netgem N7800 with online video services (Netflix, HBO nordic, etc) the Linux router will receive very BIG IP packets in the eth0 upto ~20kB, this seems to lead to the following problems in the Linux IP stack.
>>>>>>>>>>
>>>>>>>>>> IP fragmentation performance:
>>>>>>>>>> When the Linux router receives these large IP packets in the eth0 everything works, but it seems that them cause very large performance degradation from internal network to the internet regarding the latency when the IP fragmentation is performed. The ping latency from internal network to the internel network increases from stable 15ms-20ms up to 700-800ms AND also the ping from the internal network to the linux router eth1 (192.168.0.). However up link works perfectly, the ping is still stable when streaming the online services (From linux router to the internet). It seems that the IP fragmentation is somehow blocking the eth1 reception or transmission for very long time (which it shouldn't). I'm able to test and debug the issue further, but advice regarding where to look would
  be appreciated.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> DF Bit, mtu bug when forwarding:
>>>>>>>>>> I have started to study the above mentioned problem and have found a possible bug in the DF bit and mtu handling in IP forwarding. The BIG packets received from streaming services all have the "DF bit" set and the question is that should we be forwarding them at all as that would result them being fragmented? Apparently we currently are... I have traced this down to the ip_forward.c function ip_exceeds_mtu(), and the following patch seems to fix that.
>>>>>>>>>>
>>>>>>>>>> --- net/ipv4/ip_forward.c.orig  2018-12-02 11:09:32.764320780 +0200
>>>>>>>>>> +++ net/ipv4/ip_forward.c       2018-12-02 12:53:25.031232347 +0200
>>>>>>>>>> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
>>>>>>>>>>                    return false;
>>>>>>>>>>
>>>>>>>>>>            /* original fragment exceeds mtu and DF is set */
>>>>>>>>>> -       if (unlikely(IPCB(skb)->frag_max_size > mtu))
>>>>>>>>>> +        if (unlikely(skb->len > mtu))
>>>>>>>>>>                    return true;
>>>>>>>>>>
>>>>>>>>>>            if (skb->ignore_df)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This seems to work (in some ways) - after the change IP packets that are too large to the internal network get dropped and we are sending "ICMP Destination unreachable, The datagram is too big" messages to the originator (as we should?). However it seems that not all services really like this... Netflix behaves as expected and ping is stable from internal network to the internet, but for example HBO nordic will not work anymore (too little buffering? Retransimissions not working?). So it seems the original issue should be also fixed (And the fragmention should be allowed?).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Any advice would be appreciated. Thanks!
>>>>>>>>>>
>>>>>>>>>> PS. Watching TV was not this intensive 20 years ago :)
>>>>>>>>>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: IP (rtl8169) forwarding bug (performance)
  2018-12-12 23:20                   ` Risto Pajula
@ 2018-12-13  4:52                     ` Stephen Hemminger
  2018-12-13 22:10                       ` Risto Pajula
  0 siblings, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2018-12-13  4:52 UTC (permalink / raw)
  To: Risto Pajula
  Cc: Heiner Kallweit, David S. Miller, Alexey Kuznetsov,
	Realtek linux nic maintainers, netdev

On Thu, 13 Dec 2018 01:20:48 +0200
Risto Pajula <or.pajula@gmail.com> wrote:

> Hello.
> 
> I'm not able reproduce the actual problem anymore which was the high 
> ping latency from the internal network.
> 
> This starts to sound like some sort of voodoo, but...
> 
> I tested replacing the switch to another brand where internal network 
> trtl8169 is connected. This did not have any effect on the behavior, the 
> latency remained high.
> 
> Then I disconnected every device and all the network cables from the 
> internal network. The I connected everything again. After that 
> reproducing the problem seems very difficult. Likelihood for this to 
> happen dramatically decreased.
> 
> Looking at pings 30min I once got this.
> 
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time=18ms TTL=64
> Reply from 192.168.0.1: bytes=32 time=24ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> 
> I did also look at the kernel traces, it seems the rtl8169 send FIFO 
> still stalls, but it does not happen as often and in practice is hard to 
> notice for the user.
> 
> For example I did find this kind sequence:
> 
> t0: rtl8169_start_xmit first 1506 byte frame is written
> ...9 more frames is transmitted
> t0+350us: 2 frames is still not sent.  (transmission should have taken 
> 120us + interframe time)
> 
> BR.
> Risto
> 
> On 12.12.2018 8:23, Heiner Kallweit wrote:
> > According to your description of the issue it doesn't need a very exotic scenario to trigger it.
> > And due to the fact that Realtek network chips are used on a lot of consumer mainboards, I would
> > assume quite some people are using such a mainboard for a use case like yours.
> > This makes it somewhat suspicious that at least I'm not aware of more such reports.
> > Therefore I think it's a good idea to check the other components of the internal network.
> >
> > On 12.12.2018 02:28, Risto Pajula wrote:  
> >> Hello.
> >>
> >> I added some debug prints to diagnose the bug properly. I can send the patches if you are willing to debug/try... for example this output is produced:
> >>
> >> 96096:          <idle>-0     [000] ..s.   232.466703: rtl8169_start_xmit: RTLDBG221 eth1 rtl8169_start_xmit len: 1506 opts1: B0000000 txpol: 0 entry: 24 cur_tx: 5656 frags: 0 dirty_tx: 5648 tx_left: 8
> >> 96097:          <idle>-0     [000] ..s.   232.466704: rtl8169_start_xmit: RTLDBG111 eth1 rtl_tx_slots_avail
> >> 96099:          <idle>-0     [000] d.h.   232.468827: rtl8169_interrupt: RTLDBG111 eth1 rtl8169_interrupt
> >> 96103:          <idle>-0     [000] ..s.   232.468833: rtl8169_poll: RTLDBG111 eth1 rtl8169_poll
> >> 96104:          <idle>-0     [000] ..s.   232.468834: rtl8169_poll: RTLDBG111 eth1 rtl_rx
> >> 96105:          <idle>-0     [000] ..s.   232.468835: rtl8169_poll: RTLDBG111 eth1 rtl8169_try_rx_copy
> >> 96184:          <idle>-0     [000] ..s.   232.468914: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: B00005E2 txpol: 0 entry: 16 cur_tx: 5657 frags: 0 dirty_tx: 5648 tx_left: 9
> >> 96185:          <idle>-0     [000] ..s.   232.468924: rtl_tx: RTLDBG223 eth1 rtl_tx entry opts1: 16, B00005E2 17, B00005E2 18, B00005E2 19, B00005E2 20, B00005E2 21, B00005E2 22, B00005E2 23, B00005E2 24, B00005E2
> >> 96189:          <idle>-0     [000] d.h.   232.469006: rtl8169_interrupt: RTLDBG111 eth1 rtl8169_interrupt
> >> 96193:          <idle>-0     [000] ..s.   232.469012: rtl8169_poll: RTLDBG111 eth1 rtl8169_poll
> >> 96194:          <idle>-0     [000] ..s.   232.469014: rtl8169_poll: RTLDBG111 eth1 rtl_rx
> >> 96196:          <idle>-0     [000] ..s.   232.469020: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: 30000000 txpol: 0 entry: 16 cur_tx: 5657 frags: 0 dirty_tx: 5648 tx_left: 9
> >> 96204:          <idle>-0     [000] ..s.   232.469032: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: 30000000 txpol: 0 entry: 17 cur_tx: 5657 frags: 0 dirty_tx: 5649 tx_left: 8
> >> 96212:          <idle>-0     [000] ..s.   232.469041: rtl_tx: RTLDBG222 eth1 rtl_tx len: 1506 opts1: B00005E2 txpol: 0 entry: 18 cur_tx: 5657 frags: 0 dirty_tx: 5650 tx_left: 7
> >> 96213:          <idle>-0     [000] ..s.   232.469050: rtl_tx: RTLDBG223 eth1 rtl_tx entry opts1: 18, B00005E2 19, B00005E2 20, B00005E2 21, B00005E2 22, B00005E2 23, B00005E2 24, B00005E2
> >> 96235:          <idle>-0     [000] d.h.   232.471253: rtl8169_interrupt: RTLDBG111 eth1 rtl8169_interrupt
> >>
> >> Here we can clearly see that the TX FIFO descriptors are properly set up and transmission is initiated, but then the rtl8169 stops sending after a few frames. (For reading the trace: opts1 bit 31 is the DescOwn bit, so the entries still showing B00005E2 are descriptors the NIC still owns and has not transmitted, while 30000000 means DescOwn has been cleared, i.e. the frame went out; the low bits are the frame length, 0x5E2 = 1506.) I think the bug is on the HW side (or in the rtl8169 firmware). Some PCI settings might affect this, but then again, shouldn't the rtl8169 report a PCI error interrupt in that case...
> >>
> >> Also, this TX FIFO stalling happens with any traffic; no netflix/netgem/large TCP windows are required to trigger it. What the netgem/netflix traffic does is make the bug visible, probably due to the timing of the traffic pattern, which in turn causes the upper layers of the network stack to take action.
> >>
> >> The RTL8168evl is indeed the onboard Ethernet on a reasonably new motherboard and the RTL8169sb is an old PCI card. But the same issue happens with both Ethernet adapters (I switched the interfaces).
> >>
> >> But then again - this got me thinking... If the issue (send FIFO stalling) does not happen on the internet-facing interface even with heavy upload traffic, then the problem must be in the internal network itself. Tomorrow I will change the Ethernet switch. Could there be some sort of compatibility problem?
> >>
> >> BR.
> >> Risto
> >>
> >>
> >> On 11.12.2018 21:51, Heiner Kallweit wrote:  
> >>> OK, then another idea... At the very beginning of the mail thread it
> >>> was stated that the router has two network ports:
> >>> linux router: Linux computer with Dualcore Intel Celeron G1840, running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
> >>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, mtu set to 1500, RTL8169sb/8110sb
> >>> eth0: Linux Routers internet facing interface, public ip address, mtu set to 1500,  RTL8168evl/8111evl
> >>>
> >>> r8169 supports about 50 members of the RTL8169/RTL8168/RTL8101 family
> >>> and more or less every single member needs its own quirks.
> >>> RTL8168evl (PCIe) is somewhat recent, RTL8169sb (PCI) is ancient.
> >>> To rule out that you triggered some hardware issue: could you swap the two
> >>> interfaces and check whether you see a change in system behavior?
> >>>
> >>> On 11.12.2018 18:01, Risto Pajula wrote:  
> >>>> Hello.
> >>>>
> >>>> A freshly built 4.20.0-rc6-next-20181210-lp150.12.25-default was waiting for me when I got back from work, but unfortunately it did not help at all; it behaved in exactly the same manner.
> >>>>
> >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> >>>> Reply from 192.168.0.1: bytes=32 time=73ms TTL=64
> >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> >>>> Reply from 192.168.0.1: bytes=32 time=83ms TTL=64
> >>>> Reply from 192.168.0.1: bytes=32 time=307ms TTL=64
> >>>> Reply from 192.168.0.1: bytes=32 time=115ms TTL=64
> >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> >>>> Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
> >>>>
> >>>> BR.
> >>>> Risto
> >>>>
> >>>>
> >>>> On 11.12.2018 0:20, Risto Pajula wrote:  
> >>>>> Hello.
> >>>>>
> >>>>> I have not yet tested with linux-next but I will, thanks for pointing that out.
> >>>>>
> >>>>> ...But I have studied the problem a bit more; indeed, it seems that the rtl8169 transmission queue gets stuck.
> >>>>>
> >>>>> Below is some trace log. Starting from 802026, a burst of frames is forwarded from eth0 to eth1, and by the time we get to rtl_tx only two frames have been transmitted (there should have been 1.25 ms to transmit them). The sequence then seems to repeat for a very long time, with only two messages getting transmitted at a time.
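> >>>>>
> >>>>> (As a rough yardstick, assuming a 1 Gbit/s link, each 1506-byte frame is only about 12 us on the wire, so 1.25 ms would have been enough for roughly a hundred of them.)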
> >>>>>
> >>>>> 800541:          <idle>-0     [001] ..s.  7237.698143: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 34 cur_tx: 4406946 frags: 0 dirty_tx: 4406946 tx_left: 0
> >>>>> 800671:          <idle>-0     [001] ..s.  7237.698189: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 35 cur_tx: 4406947 frags: 0 dirty_tx: 4406946 tx_left: 1
> >>>>> 800835:          <idle>-0     [001] ..s.  7237.698235: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 36 cur_tx: 4406948 frags: 0 dirty_tx: 4406946 tx_left: 2
> >>>>> 800905:          <idle>-0     [000] ..s.  7237.698255: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406946 tx_left: 3
> >>>>> 800932:          <idle>-0     [000] ..s.  7237.698263: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406947 tx_left: 2
> >>>>> 800960:          <idle>-0     [000] ..s.  7237.698270: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406948 tx_left: 1
> >>>>> 800998:          <idle>-0     [001] ..s.  7237.698282: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 4406949 tx_left: 0
> >>>>> 801085:          <idle>-0     [000] ..s.  7237.698312: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 38 cur_tx: 4406950 frags: 0 dirty_tx: 4406949 tx_left: 1
> >>>>> 801137:          <idle>-0     [001] ..s.  7237.698329: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 38 cur_tx: 4406950 frags: 0 dirty_tx: 4406950 tx_left: 0
> >>>>> 801229:          <idle>-0     [000] ..s.  7237.698363: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 39 cur_tx: 4406951 frags: 0 dirty_tx: 4406950 tx_left: 1
> >>>>> 801264:          <idle>-0     [001] ..s.  7237.698375: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 39 cur_tx: 4406951 frags: 0 dirty_tx: 4406951 tx_left: 0
> >>>>> 801356:          <idle>-0     [000] ..s.  7237.698409: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 40 cur_tx: 4406952 frags: 0 dirty_tx: 4406951 tx_left: 1
> >>>>> 801391:          <idle>-0     [001] ..s.  7237.698421: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 40 cur_tx: 4406952 frags: 0 dirty_tx: 4406952 tx_left: 0
> >>>>> 801482:          <idle>-0     [000] ..s.  7237.698455: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 41 cur_tx: 4406953 frags: 0 dirty_tx: 4406952 tx_left: 1
> >>>>> 801518:          <idle>-0     [001] ..s.  7237.698468: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 41 cur_tx: 4406953 frags: 0 dirty_tx: 4406953 tx_left: 0
> >>>>> 801610:          <idle>-0     [000] ..s.  7237.698502: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 42 cur_tx: 4406954 frags: 0 dirty_tx: 4406953 tx_left: 1
> >>>>> 801645:          <idle>-0     [001] ..s.  7237.698514: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 42 cur_tx: 4406954 frags: 0 dirty_tx: 4406954 tx_left: 0
> >>>>> 801737:          <idle>-0     [000] ..s.  7237.698548: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 43 cur_tx: 4406955 frags: 0 dirty_tx: 4406954 tx_left: 1
> >>>>> 801772:          <idle>-0     [001] ..s.  7237.698560: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 43 cur_tx: 4406955 frags: 0 dirty_tx: 4406955 tx_left: 0
> >>>>> 801864:          <idle>-0     [000] ..s.  7237.698594: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 44 cur_tx: 4406956 frags: 0 dirty_tx: 4406955 tx_left: 1
> >>>>> 801899:          <idle>-0     [001] ..s.  7237.698607: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 44 cur_tx: 4406956 frags: 0 dirty_tx: 4406956 tx_left: 0
> >>>>> 801984:          <idle>-0     [000] ..s.  7237.698640: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 45 cur_tx: 4406957 frags: 0 dirty_tx: 4406956 tx_left: 1
> >>>>> 802026:          <idle>-0     [001] ..s.  7237.698655: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 45 cur_tx: 4406957 frags: 0 dirty_tx: 4406957 tx_left: 0
> >>>>> 802129:          <idle>-0     [001] ..s.  7237.698701: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 46 cur_tx: 4406958 frags: 0 dirty_tx: 4406957 tx_left: 1
> >>>>> 802232:          <idle>-0     [001] ..s.  7237.698747: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 47 cur_tx: 4406959 frags: 0 dirty_tx: 4406957 tx_left: 2
> >>>>> 802335:          <idle>-0     [001] ..s.  7237.698793: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 48 cur_tx: 4406960 frags: 0 dirty_tx: 4406957 tx_left: 3
> >>>>> 802438:          <idle>-0     [001] ..s.  7237.698840: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 49 cur_tx: 4406961 frags: 0 dirty_tx: 4406957 tx_left: 4
> >>>>> 802541:          <idle>-0     [001] ..s.  7237.698885: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 50 cur_tx: 4406962 frags: 0 dirty_tx: 4406957 tx_left: 5
> >>>>> 802644:          <idle>-0     [001] ..s.  7237.698932: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 51 cur_tx: 4406963 frags: 0 dirty_tx: 4406957 tx_left: 6
> >>>>> 802747:          <idle>-0     [001] ..s.  7237.698978: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 52 cur_tx: 4406964 frags: 0 dirty_tx: 4406957 tx_left: 7
> >>>>> 802851:          <idle>-0     [001] ..s.  7237.699025: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 53 cur_tx: 4406965 frags: 0 dirty_tx: 4406957 tx_left: 8
> >>>>> 805094:          <idle>-0     [000] ..s.  7237.699871: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406957 tx_left: 9
> >>>>> 805126:          <idle>-0     [000] ..s.  7237.699878: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406958 tx_left: 8
> >>>>> 805197:          <idle>-0     [001] ..s.  7237.699905: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 4406959 tx_left: 7
> >>>>> 805234:          <idle>-0     [001] ..s.  7237.699925: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 55 cur_tx: 4406967 frags: 0 dirty_tx: 4406959 tx_left: 8
> >>>>> 811446:          <idle>-0     [000] ..s.  7237.702344: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406959 tx_left: 9
> >>>>> 811479:          <idle>-0     [000] ..s.  7237.702354: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406960 tx_left: 8
> >>>>> 811641:          <idle>-0     [000] ..s.  7237.702395: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 4406961 tx_left: 7
> >>>>> 811733:          <idle>-0     [000] ..s.  7237.702416: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 57 cur_tx: 4406969 frags: 0 dirty_tx: 4406961 tx_left: 8
> >>>>> 812050:          hping3-2871  [000] ..s.  7237.704770: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406961 tx_left: 9
> >>>>> 812061:          hping3-2871  [000] ..s.  7237.704777: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406962 tx_left: 8
> >>>>> 812124:          hping3-2871  [000] ..s.  7237.704811: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406963 tx_left: 7
> >>>>> 812176:          hping3-2871  [000] ..s.  7237.704831: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 59 cur_tx: 4406971 frags: 0 dirty_tx: 4406963 tx_left: 8
> >>>>> 812300:          <idle>-0     [000] ..s.  7237.707247: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406963 tx_left: 9
> >>>>> 812312:          <idle>-0     [000] ..s.  7237.707257: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406964 tx_left: 8
> >>>>> 812389:          <idle>-0     [000] ..s.  7237.707297: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 4406965 tx_left: 7
> >>>>> 812426:          <idle>-0     [000] ..s.  7237.707318: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 61 cur_tx: 4406973 frags: 0 dirty_tx: 4406965 tx_left: 8
> >>>>> 812827:          <idle>-0     [000] ..s.  7237.709690: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 4406965 tx_left: 9
> >>>>> 812838:          <idle>-0     [000] ..s.  7237.709700: rtl8169_poll: ORP33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 4406966 tx_left: 8
> >>>>> .....
> >>>>>
> >>>>>
> >>>>> BR.
> >>>>> Risto
> >>>>>
> >>>>> On 10.12.2018 23:26, Heiner Kallweit wrote:  
> >>>>>> Did you also test with the latest linux-next kernel? Some recent changes like 2e6eedb4813e
> >>>>>> "r8169: make use of xmit_more and __netdev_sent_queue" may have a positive impact.
> >>>>>>
> >>>>>> On 10.12.2018 00:28, Risto Pajula wrote:  
> >>>>>>> Hello.
> >>>>>>>
> >>>>>>> Old subject: "Re: IP fragmentation performance and don't fragment bug when forwarding"
> >>>>>>>
> >>>>>>>
> >>>>>>> I have now been tracing the kernel, and finding the bug seems difficult. I think the bug is a combination of several things; the likely reason it only occurs with the rtl8169 is how that driver uses the netdev/DQL API.
> >>>>>>>
> >>>>>>> From my investigations it seems that the following happens:
> >>>>>>>
> >>>>>>> *A burst of frames is received from the internet (eth0).
> >>>>>>> *These are accepted and forwarded to eth1.
> >>>>>>> *DQL hits its limit, which causes scheduling for the device to be stopped (__QUEUE_STATE_STACK_XOFF).
> >>>>>>> *This, in combination with some timing, causes the performance degradation; messages are only scheduled for transmission in soft IRQ context.
> >>>>>>>
> >>>>>>> I still do not know why DQL hits the limit... I think it should not; maybe something undesirable happens first in the rtl8169.
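> >>>>>>>
> >>>>>>> To spell out what I mean, here is a minimal sketch of the BQL calls a driver makes (not the actual r8169 code, just the generic pattern from the netdev API):
> >>>>>>>
> >>>>>>> #include <linux/skbuff.h>
> >>>>>>> #include <linux/netdevice.h>
> >>>>>>>
> >>>>>>> static netdev_tx_t sketch_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >>>>>>> {
> >>>>>>>         struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);
> >>>>>>>
> >>>>>>>         /* ...fill the TX descriptor and ring the doorbell... */
> >>>>>>>
> >>>>>>>         /* BQL: account the bytes now in flight; once they exceed the
> >>>>>>>          * dynamic limit this sets __QUEUE_STATE_STACK_XOFF and the
> >>>>>>>          * qdisc stops handing us packets. */
> >>>>>>>         netdev_tx_sent_queue(txq, skb->len);
> >>>>>>>         return NETDEV_TX_OK;
> >>>>>>> }
> >>>>>>>
> >>>>>>> static void sketch_tx_complete(struct net_device *dev, unsigned int pkts, unsigned int bytes)
> >>>>>>> {
> >>>>>>>         struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);
> >>>>>>>
> >>>>>>>         /* BQL: report the drained bytes; this clears STACK_XOFF and
> >>>>>>>          * wakes the queue.  If the NIC stops completing frames, this
> >>>>>>>          * never runs and everything queued behind the stall (pings
> >>>>>>>          * included) just waits. */
> >>>>>>>         netdev_tx_completed_queue(txq, pkts, bytes);
> >>>>>>> }
> >>>>>>>
> >>>>>>> So if TX completions stop arriving, BQL alone would explain the latency, but the root cause would still be the missing completions.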
> >>>>>>>
> >>>>>>> I inserted the following trace_printk calls into the code:
> >>>>>>>
> >>>>>>> rtl8169_start_xmit:
> >>>>>>> trace_printk("ORP33 %s rtl8169_start_xmit len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, skb->len, entry, tp->cur_tx, frags, tp->dirty_tx, (tp->cur_tx - tp->dirty_tx) );
> >>>>>>>
> >>>>>>>
> >>>>>>> rtl_tx:
> >>>>>>> trace_printk("ORP33 %s rtl_tx len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, tx_skb->skb->len, tp->cur_tx % NUM_TX_DESC, tp->cur_tx, 0, dirty_tx, tx_left );
> >>>>>>>
> >>>>>>>
> >>>>>>> Here is some grepped output, filtering only the ping packets, in two different situations:
> >>>>>>>
> >>>>>>> //trace when downloading ubuntu (~20MB/s), only 3 secs fit in the buffer. ping is normal the whole time....
> >>>>>>> //orig tracing.out3754
> >>>>>>> 117493:          <idle>-0     [000] ..s.  5235.407116: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 57 cur_tx: 3747641 frags: 0 dirty_tx: 3747640 tx_left: 1
> >>>>>>> 118704:          <idle>-0     [000] ..s.  5235.416057: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 20 cur_tx: 3747668 frags: 0 dirty_tx: 3747667 tx_left: 1
> >>>>>>> 346319:          hping3-2871  [000] ....  5236.002456: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 8 cur_tx: 3754312 frags: 0 dirty_tx: 3754312 tx_left: 0
> >>>>>>> 346337:          hping3-2871  [000] ..s.  5236.002475: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 9 cur_tx: 3754313 frags: 0 dirty_tx: 3754312 tx_left: 1
> >>>>>>> 1277155:          hping3-2871  [001] ....  5237.002627: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 38 cur_tx: 3766502 frags: 0 dirty_tx: 3766502 tx_left: 0
> >>>>>>> 1277173:          <idle>-0     [000] ..s.  5237.002646: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 39 cur_tx: 3766503 frags: 0 dirty_tx: 3766502 tx_left: 1
> >>>>>>>
> >>>>>>>
> >>>>>>> //icmp messages from the router to the internal network when a TCP stream is on. tracing.out3236/eth1_ring.txt is the original
> >>>>>>> //no idea why the first frame is tripled...
> >>>>>>> 200792:          hping3-2871  [001] ....  3239.593384: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 34 cur_tx: 2132770 frags: 0 dirty_tx: 2132770 tx_left: 0
> >>>>>>> 201578:          hping3-2871  [001] ....  3240.593625: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 40 cur_tx: 2132776 frags: 0 dirty_tx: 2132776 tx_left: 0
> >>>>>>> 341061:          <idle>-0     [000] ..s.  3241.738011: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 52 cur_tx: 2133684 frags: 0 dirty_tx: 2133680 tx_left: 4
> >>>>>>> 342112:          <idle>-0     [000] ..s.  3241.745297: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 59 cur_tx: 2133691 frags: 0 dirty_tx: 2133684 tx_left: 7
> >>>>>>> 500126:          <idle>-0     [000] ..s.  3242.739500: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 49 cur_tx: 2134513 frags: 0 dirty_tx: 2134505 tx_left: 8
> >>>>>>> 501739:          <idle>-0     [000] ..s.  3242.751654: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 58 cur_tx: 2134522 frags: 0 dirty_tx: 2134513 tx_left: 9
> >>>>>>> 646220:          <idle>-0     [000] ..s.  3243.745761: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 41 cur_tx: 2135273 frags: 0 dirty_tx: 2135267 tx_left: 6
> >>>>>>> 647633:          <idle>-0     [000] ..s.  3243.755486: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 49 cur_tx: 2135281 frags: 0 dirty_tx: 2135273 tx_left: 8
> >>>>>>> 802878:          <idle>-0     [000] ..s.  3244.739947: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2136087 frags: 0 dirty_tx: 2136081 tx_left: 6
> >>>>>>> 804298:          <idle>-0     [000] ..s.  3244.749677: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 31 cur_tx: 2136095 frags: 0 dirty_tx: 2136087 tx_left: 8
> >>>>>>> 961190:          <idle>-0     [000] ..s.  3245.746217: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 16 cur_tx: 2136912 frags: 0 dirty_tx: 2136906 tx_left: 6
> >>>>>>> 962610:          <idle>-0     [000] ..s.  3245.755946: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2136920 frags: 0 dirty_tx: 2136912 tx_left: 8
> >>>>>>> 1118044:          <idle>-0     [000] ..s.  3246.740336: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 62 cur_tx: 2137726 frags: 0 dirty_tx: 2137720 tx_left: 6
> >>>>>>> 1119443:          <idle>-0     [000] ..s.  3246.750047: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 5 cur_tx: 2137733 frags: 0 dirty_tx: 2137726 tx_left: 7
> >>>>>>> 1264047:          <idle>-0     [000] ..s.  3247.709202: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 29 cur_tx: 2138525 frags: 0 dirty_tx: 2138518 tx_left: 7
> >>>>>>> 1264740:          <idle>-0     [000] ..s.  3247.718918: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 33 cur_tx: 2138529 frags: 0 dirty_tx: 2138525 tx_left: 4
> >>>>>>> 1419958:          <idle>-0     [000] ..s.  3248.736436: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 48 cur_tx: 2139312 frags: 0 dirty_tx: 2139305 tx_left: 7
> >>>>>>> 1421551:          <idle>-0     [000] ..s.  3248.746189: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 56 cur_tx: 2139320 frags: 0 dirty_tx: 2139312 tx_left: 8
> >>>>>>> 1578746:          <idle>-0     [000] ..s.  3249.742702: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 46 cur_tx: 2140142 frags: 0 dirty_tx: 2140135 tx_left: 7
> >>>>>>> 1580194:          <idle>-0     [000] ..s.  3249.752457: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 54 cur_tx: 2140150 frags: 0 dirty_tx: 2140142 tx_left: 8
> >>>>>>> 1729597:          <idle>-0     [000] ..s.  3250.746839: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 36 cur_tx: 2140964 frags: 0 dirty_tx: 2140957 tx_left: 7
> >>>>>>> 1731015:          <idle>-0     [000] ..s.  3250.756594: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 43 cur_tx: 2140971 frags: 0 dirty_tx: 2140964 tx_left: 7
> >>>>>>> 1881028:          <idle>-0     [000] ..s.  3251.740991: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 19 cur_tx: 2141779 frags: 0 dirty_tx: 2141772 tx_left: 7
> >>>>>>> 1881381:          <idle>-0     [000] ..s.  3251.750718: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 26 cur_tx: 2141786 frags: 0 dirty_tx: 2141779 tx_left: 7
> >>>>>>> //something happens, the output queue empties.. now it is fast, and rtl8169_start_xmit is not called from soft_irq
> >>>>>>> 1896178:          hping3-2871  [001] ....  3252.595994: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2141911 frags: 0 dirty_tx: 2141911 tx_left: 0
> >>>>>>> 1896196:          <idle>-0     [000] ..s.  3252.596055: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2141912 frags: 0 dirty_tx: 2141911 tx_left: 1
> >>>>>>> 1923136:          hping3-2871  [001] ....  3253.596227: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 50 cur_tx: 2142066 frags: 0 dirty_tx: 2142066 tx_left: 0
> >>>>>>> 1923154:          <idle>-0     [000] ..s.  3253.596289: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 51 cur_tx: 2142067 frags: 0 dirty_tx: 2142066 tx_left: 1
> >>>>>>> 1951633:          hping3-2871  [001] ....  3254.596459: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 5 cur_tx: 2142213 frags: 0 dirty_tx: 2142213 tx_left: 0
> >>>>>>> 1951651:          <idle>-0     [000] ..s.  3254.596521: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 6 cur_tx: 2142214 frags: 0 dirty_tx: 2142213 tx_left: 1
> >>>>>>> 1954472:          hping3-2871  [001] ....  3255.596700: rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 15 cur_tx: 2142223 frags: 0 dirty_tx: 2142223 tx_left: 0
> >>>>>>> 1954490:          <idle>-0     [000] ..s.  3255.596763: rtl8169_poll: ORP33 eth1 rtl_tx len: 54 entry: 16 cur_tx: 2142224 frags: 0 dirty_tx: 2142223 tx_left: 1
> >>>>>>> //until soon it is slow again..
> >>>>>>>
> >>>>>>>
> >>>>>>> BR.
> >>>>>>> Risto
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 7.12.2018 16:46, Risto Pajula wrote:  
> >>>>>>>> Hello.
> >>>>>>>>
> >>>>>>>> I have been able to track the poor forwarding latency down to the TCP window scale option. The Netgem device uses a rather large window scale option (x256), and I have been able to reproduce the router's poor forwarding latency with a Linux box running in the internal network by changing net.ipv4.tcp_rmem to a large value, which changes the TCP window scale option to a larger one. I still have no clue why this causes forwarding in the Linux kernel to block. Maybe something in the connection tracking...?
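> >>>>>>>>
> >>>>>>>> (For reference, on the Linux test box something along the lines of "sysctl -w net.ipv4.tcp_rmem='4096 87380 33554432'" was enough to make it advertise a correspondingly large window scale; the exact value is not important, it just has to be big.)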
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> With the ICMP timestamp messages I have also been able to pinpoint that the latency is added on the eth1 sending side (the following hping3 example is run on the router toward the internal network):
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
> >>>>>>>> using eth1, addr: 192.168.0.1, MTU: 1500
> >>>>>>>> HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers + 0 data bytes
> >>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
> >>>>>>>> icmp_seq=0 rtt=7.9 ms
> >>>>>>>> ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
> >>>>>>>> ICMP timestamp RTT tsrtt=7
> >>>>>>>>
> >>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
> >>>>>>>> icmp_seq=1 rtt=235.9 ms
> >>>>>>>> ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
> >>>>>>>> ICMP timestamp RTT tsrtt=235
> >>>>>>>>
> >>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
> >>>>>>>> icmp_seq=2 rtt=3.8 ms
> >>>>>>>> ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
> >>>>>>>> ICMP timestamp RTT tsrtt=3
> >>>>>>>>
> >>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
> >>>>>>>> icmp_seq=3 rtt=47.8 ms
> >>>>>>>> ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
> >>>>>>>> ICMP timestamp RTT tsrtt=47
> >>>>>>>>
> >>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
> >>>>>>>> icmp_seq=4 rtt=7.7 ms
> >>>>>>>> ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
> >>>>>>>> ICMP timestamp RTT tsrtt=7
> >>>>>>>>
> >>>>>>>> len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
> >>>>>>>> icmp_seq=5 rtt=3.7 ms
> >>>>>>>> ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
> >>>>>>>> ICMP timestamp RTT tsrtt=3
> >>>>>>>>
> >>>>>>>> ^C
> >>>>>>>> --- 192.168.0.112 hping statistic ---
> >>>>>>>> 6 packets tramitted, 6 packets received, 0% packet loss
> >>>>>>>> round-trip min/avg/max = 3.7/51.1/235.9 ms
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> BR.
> >>>>>>>> Risto
> >>>>>>>>
> >>>>>>>> On 2.12.2018 23:32, Risto Pajula wrote:  
> >>>>>>>>> Hello.
> >>>>>>>>>
> >>>>>>>>> You can most likely ignore the "DF Bit, mtu bug when forwarding" case. There aren't actually big IP packets on the wire; instead there are bursts of packets, which are combined by GRO... And thus dropping them should not happen. Sorry about the invalid bug report.
> >>>>>>>>>
> >>>>>>>>> However, the poor latency from the internal network to the internet still remains, with GRO both enabled and disabled. I will try to study it further...
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> BR.
> >>>>>>>>> Risto
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 2.12.2018 14:01, Risto Pajula wrote:  
> >>>>>>>>>> Hello.
> >>>>>>>>>>
> >>>>>>>>>> I have encountered a weird performance problem in Linux IP fragmentation when using video streaming services behind the NAT. Also I have studied a possible bug in the DF bit (don't fragment) handling when forwarding the IP packets.
> >>>>>>>>>>
> >>>>>>>>>> First the system setup description:
> >>>>>>>>>>
> >>>>>>>>>> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre router]-internet
> >>>>>>>>>>
> >>>>>>>>>> where:
> >>>>>>>>>> host1: is a Netgem N7800 "cable box" for online video streaming services provided by local telco (Can access Netflix, HBO nordic, "live TV", etc.)
> >>>>>>>>>> linux router: Linux computer with Dualcore Intel Celeron G1840, running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
> >>>>>>>>>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network, mtu set to 1500, RTL8169sb/8110sb
> >>>>>>>>>> eth0: Linux Routers internet facing interface, public ip address, mtu set to 1500,  RTL8168evl/8111evl
> >>>>>>>>>> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly connected to the eth0 of the Linux router.
> >>>>>>>>>>
> >>>>>>>>>> And now when using the Netgem N7800 with online video services (Netflix, HBO nordic, etc) the Linux router will receive very BIG IP packets in the eth0 upto ~20kB, this seems to lead to the following problems in the Linux IP stack.
> >>>>>>>>>>
> >>>>>>>>>> IP fragmentation performance:
> >>>>>>>>>> When the Linux router receives these large IP packets in the eth0 everything works, but it seems that them cause very large performance degradation from internal network to the internet regarding the latency when the IP fragmentation is performed. The ping latency from internal network to the internel network increases from stable 15ms-20ms up to 700-800ms AND also the ping from the internal network to the linux router eth1 (192.168.0.). However up link works perfectly, the ping is still stable when streaming the online services (From linux router to the internet). It seems that the IP fragmentation is somehow blocking the eth1 reception or transmission for very long time (which it shouldn't). I'm able to test and debug the issue further, but advice regarding where to look would be appreciated.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> DF Bit, mtu bug when forwarding:
> >>>>>>>>>> I have started to study the above mentioned problem and have found a possible bug in the DF bit and mtu handling in IP forwarding. The BIG packets received from streaming services all have the "DF bit" set and the question is that should we be forwarding them at all as that would result them being fragmented? Apparently we currently are... I have traced this down to the ip_forward.c function ip_exceeds_mtu(), and the following patch seems to fix that.
> >>>>>>>>>>
> >>>>>>>>>> --- net/ipv4/ip_forward.c.orig  2018-12-02 11:09:32.764320780 +0200
> >>>>>>>>>> +++ net/ipv4/ip_forward.c       2018-12-02 12:53:25.031232347 +0200
> >>>>>>>>>> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
> >>>>>>>>>>                    return false;
> >>>>>>>>>>
> >>>>>>>>>>            /* original fragment exceeds mtu and DF is set */
> >>>>>>>>>> -       if (unlikely(IPCB(skb)->frag_max_size > mtu))
> >>>>>>>>>> +        if (unlikely(skb->len > mtu))
> >>>>>>>>>>                    return true;
> >>>>>>>>>>
> >>>>>>>>>>            if (skb->ignore_df)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> This seems to work (in some ways) - after the change IP packets that are too large to the internal network get dropped and we are sending "ICMP Destination unreachable, The datagram is too big" messages to the originator (as we should?). However it seems that not all services really like this... Netflix behaves as expected and ping is stable from internal network to the internet, but for example HBO nordic will not work anymore (too little buffering? Retransimissions not working?). So it seems the original issue should be also fixed (And the fragmention should be allowed?).
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Any advice would be appreciated. Thanks!
> >>>>>>>>>>
> >>>>>>>>>> PS. Watching TV was not this intensive 20 years ago :)
> >>>>>>>>>>  

Did you disable Ethernet flow control?  Ethernet flow control is
usually a bad idea; it can cause head-of-line blocking. Unfortunately,
most devices default to on.
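
If the driver exposes it, "ethtool -a ethX" shows the current pause
settings and something like "ethtool -A ethX autoneg off rx off tx off"
turns flow control off; whether r8169 supports that I don't recall
offhand.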

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: IP (rtl8169) forwarding bug (performance)
  2018-12-13  4:52                     ` Stephen Hemminger
@ 2018-12-13 22:10                       ` Risto Pajula
  2018-12-13 22:30                         ` Heiner Kallweit
  0 siblings, 1 reply; 14+ messages in thread
From: Risto Pajula @ 2018-12-13 22:10 UTC (permalink / raw)
  To: Stephen Hemminger, Heiner Kallweit
  Cc: David S. Miller, Alexey Kuznetsov, Realtek linux nic maintainers, netdev


On 13.12.2018 6:52, Stephen Hemminger wrote:
>
> Did you disable Ethernet flow control?  Ethernet flow control is
> usually a bad idea; it can cause head-of-line blocking. Unfortunately,
> most devices default to on.

Disable ethernet flow control from where? The rtl8169 driver does not 
support changing the ethernet flow control settings. My switches are 
also unmanaged.

ethtool -a eth0
Pause parameters for eth0:
Cannot get device pause settings: Operation not supported

According to the rtl8169 datasheet (rev 1.21 - from 2002):

"The RTL8169 enters backoff state for the specified period of time when 
it receives a valid PAUSE packet (with pause_time=n) in full duplex 
mode. If the PAUSE packet is received while the RTL8169 is transmitting, 
the RTL8169 starts to backoff after the current transmission is 
completed. The RTL8169 is free to transmit packets when it receives a 
valid PAUSE packet (with pause_time=0000h) or the backoff timer(=n*512 
bit time) elapses."

So the transmit FIFO stalling should not be caused by this... It should 
automatically resume...

Also I have not seen any PAUSE frames in my traffic captures.


One suspicious thing is in the datasheet:

Interrupt Status Register Bit3: "Transmit (Tx) Error: This bit set to 1 
indicates that a packet transmission was aborted, due to excessive 
collisions, according to the TXRR's setting in the TCR register."

However in the TCR register definition there is nothing about "TXRR".

I could not find a more recent datasheet... Could anyone provide me with 
one, or point me to where it could be obtained?

BR.
Risto

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: IP (rtl8169) forwarding bug (performance)
  2018-12-13 22:10                       ` Risto Pajula
@ 2018-12-13 22:30                         ` Heiner Kallweit
  0 siblings, 0 replies; 14+ messages in thread
From: Heiner Kallweit @ 2018-12-13 22:30 UTC (permalink / raw)
  To: Risto Pajula, Stephen Hemminger
  Cc: David S. Miller, Alexey Kuznetsov, Realtek linux nic maintainers, netdev

On 13.12.2018 23:10, Risto Pajula wrote:
> 
> On 13.12.2018 6:52, Stephen Hemminger wrote:
>>
>> Did you disable Ethernet flow control?  Ethernet flow control is
>> usually a bad idea; it can cause head-of-line blocking. Unfortunately,
>> most devices default to on.
> 
> Disable ethernet flow control from where? The rtl8169 driver does not support changing the ethernet flow control settings. My switches are also unmanaged.
> 
> ethtool -a eth0
> Pause parameters for eth0:
> Cannot get device pause settings: Operation not supported
> 
> According to the rtl8169 datasheet (rev 1.21 - from 2002):
> 
> "The RTL8169 enters backoff state for the specified period of time when it receives a valid PAUSE packet (with pause_time=n) in full duplex mode. If the PAUSE packet is received while the RTL8169 is transmitting, the RTL8169 starts to backoff after the current transmission is completed. The RTL8169 is free to transmit packets when it receives a valid PAUSE packet (with pause_time=0000h) or the backoff timer(=n*512 bit time) elapses."
> 
> So the transmit FIFO stalling should not be caused by this... It should automatically resume...
> 
> Also I have not seen any PAUSE frames in my traffic captures.
> 
> 
> One suspicious thing is in the datasheet:
> 
> Interrupt Status Register Bit3: "Transmit (Tx) Error: This bit set to 1 indicates that a packet transmission was aborted, due to excessive collisions, according to the TXRR's setting in the TCR register."
> 
> However in the TCR register definition there is nothing about "TXRR".
> 
A later version doesn't include the comment about TXRR.

> I could not find a more recent datasheet... Could anyone provide me with one, or point me to where it could be obtained?
> 
There are no public datasheets, just one or two leaked ones. You could also check against the
r8168 vendor driver whether it handles certain things differently.
But it is still strange that you seem to be the only one seeing this problem. Recently I fixed one
or two issues which occurred under heavy load only, and the affected users would most likely
have noticed if the issue you describe had occurred for them.

> BR.
> Risto
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-12-13 22:30 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-02 12:01 IP fragmentation performance and don't fragment bug when forwarding Risto Pajula
2018-12-02 21:32 ` Risto Pajula
2018-12-07 14:46   ` Risto Pajula
2018-12-09 23:28     ` IP (rtl8169) forwarding bug (performance) Risto Pajula
2018-12-10 21:26       ` Heiner Kallweit
2018-12-10 22:20         ` Risto Pajula
2018-12-11 17:01           ` Risto Pajula
2018-12-11 19:51             ` Heiner Kallweit
2018-12-12  1:28               ` Risto Pajula
2018-12-12  6:23                 ` Heiner Kallweit
2018-12-12 23:20                   ` Risto Pajula
2018-12-13  4:52                     ` Stephen Hemminger
2018-12-13 22:10                       ` Risto Pajula
2018-12-13 22:30                         ` Heiner Kallweit
