netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* r8169: Performance regression and latency instability
@ 2019-08-16 12:09 Juliana Rodrigueiro
  2019-08-16 12:35 ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Juliana Rodrigueiro @ 2019-08-16 12:09 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, hkallweit1

Greetings!

During migration from kernel 3.14 to 4.19, we noticed a regression on 
the network performance. Under the exact same circumstances, the 
standard deviation of the latency is more than double than before on the 
Realtek RTL8111/8168B (10ec:8168) using the r8169 driver.

Kernel 3.14:
     # netperf -v 2 -P 0 -H <netserver-IP>,4 -I 99,5 -t omni -l 1 -- -O 
STDDEV_LATENCY -m 64K -d Send
     313.37

Kernel 4.19:
     # netperf -v 2 -P 0 -H <netserver-IP>,4 -I 99,5 -t omni -l 1 -- -O 
STDDEV_LATENCY -m 64K -d Send
     632.96

In contrast, we noticed small improvements in performance with other 
non-Realtek network cards (igb, tg3). Which suggested a possible driver 
related bug.

However after bisecting the code, I ended up with the following patch, 
which was introduced in kernel 4.17 and modifies net/ipv4:

     commit 0a6b2a1dc2a2105f178255fe495eb914b09cb37a
     Author: Eric Dumazet <edumazet@google.com>
     Date:   Mon Feb 19 11:56:47 2018 -0800

         tcp: switch to GSO being always on

Could you please help me to clarify, should GSO be always on on my 
device? Or does it just affect TCP? According to ethtool it is always 
off, "ethtool -K eth0 gso on" has no effect, unless I switch SG on.

     # ethtool -k eth0
     Offload parameters for eth0:
     Cannot get device udp large send offload settings: Operation not 
supported
     rx-checksumming: on
     tx-checksumming: off
     scatter-gather: off
     tcp-segmentation-offload: off
     udp-fragmentation-offload: off
     generic-segmentation-offload: off
     generic-receive-offload: on
     large-receive-offload: off

I validated that reverting "tcp: switch to GSO being always on" 
successfully brings back the better performance for the r8169 driver.

I'm sure that reverting that commit is not the optimal solution, so I 
would like to kindly ask for help to shed some light in this issue.

Best regards,
Juliana.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: r8169: Performance regression and latency instability
  2019-08-16 12:09 r8169: Performance regression and latency instability Juliana Rodrigueiro
@ 2019-08-16 12:35 ` Eric Dumazet
  2019-08-16 13:59   ` Holger Hoffstätte
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2019-08-16 12:35 UTC (permalink / raw)
  To: Juliana Rodrigueiro, netdev; +Cc: edumazet, hkallweit1



On 8/16/19 2:09 PM, Juliana Rodrigueiro wrote:
> Greetings!
> 
> During migration from kernel 3.14 to 4.19, we noticed a regression on the network performance. Under the exact same circumstances, the standard deviation of the latency is more than double than before on the Realtek RTL8111/8168B (10ec:8168) using the r8169 driver.
> 
> Kernel 3.14:
>     # netperf -v 2 -P 0 -H <netserver-IP>,4 -I 99,5 -t omni -l 1 -- -O STDDEV_LATENCY -m 64K -d Send
>     313.37
> 
> Kernel 4.19:
>     # netperf -v 2 -P 0 -H <netserver-IP>,4 -I 99,5 -t omni -l 1 -- -O STDDEV_LATENCY -m 64K -d Send
>     632.96
> 
> In contrast, we noticed small improvements in performance with other non-Realtek network cards (igb, tg3). Which suggested a possible driver related bug.
> 
> However after bisecting the code, I ended up with the following patch, which was introduced in kernel 4.17 and modifies net/ipv4:
> 
>     commit 0a6b2a1dc2a2105f178255fe495eb914b09cb37a
>     Author: Eric Dumazet <edumazet@google.com>
>     Date:   Mon Feb 19 11:56:47 2018 -0800
> 
>         tcp: switch to GSO being always on
> 
> Could you please help me to clarify, should GSO be always on on my device? Or does it just affect TCP? According to ethtool it is always off, "ethtool -K eth0 gso on" has no effect, unless I switch SG on.
> 
>     # ethtool -k eth0
>     Offload parameters for eth0:
>     Cannot get device udp large send offload settings: Operation not supported
>     rx-checksumming: on
>     tx-checksumming: off
>     scatter-gather: off
>     tcp-segmentation-offload: off
>     udp-fragmentation-offload: off
>     generic-segmentation-offload: off
>     generic-receive-offload: on
>     large-receive-offload: off
> 
> I validated that reverting "tcp: switch to GSO being always on" successfully brings back the better performance for the r8169 driver.
> 
> I'm sure that reverting that commit is not the optimal solution, so I would like to kindly ask for help to shed some light in this issue.

Hi Juliana

I am sure that all commits done in TCP stack can show a regression on a particular
combination of packet sizes, MTU size, NIC, and measured metric.

Basically if your NIC does not support SG and TSO, there is a possibility
that the changes we did to enter the era of 100Gbit and 200Gbit NIC might
hurt a bit.

Lack of SG means that the lower stack might have to perform memory  allocations
to perform the segmentation and this might be slow (or even fail) under memory pressure.

I have no idea why you can even turn on SG, if it is turned off by default.

Please give us more information on the NIC

ethtool -i eth0 ; ifconfig eth0

Possibly try to use a recent ethtool, it seems yours is pretty old.

I also see this relevant commit : I have no idea why SG would have any relation with TSO.

commit a7eb6a4f2560d5ae64bfac98d79d11378ca2de6c
Author: Holger Hoffstätte <holger@applied-asynchrony.com>
Date:   Fri Aug 9 00:02:40 2019 +0200

    r8169: fix performance issue on RTL8168evl
    
    Disabling TSO but leaving SG active results is a significant
    performance drop. Therefore disable also SG on RTL8168evl.
    This restores the original performance.
    
    Fixes: 93681cd7d94f ("r8169: enable HW csum and TSO")
    Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com>
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index b2a275d8504c..912bd41eaa1b 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -6898,9 +6898,9 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
        /* RTL8168e-vl has a HW issue with TSO */
        if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
-               dev->vlan_features &= ~NETIF_F_ALL_TSO;
-               dev->hw_features &= ~NETIF_F_ALL_TSO;
-               dev->features &= ~NETIF_F_ALL_TSO;
+               dev->vlan_features &= ~(NETIF_F_ALL_TSO | NETIF_F_SG);
+               dev->hw_features &= ~(NETIF_F_ALL_TSO | NETIF_F_SG);
+               dev->features &= ~(NETIF_F_ALL_TSO | NETIF_F_SG);
        }
 
        dev->hw_features |= NETIF_F_RXALL;


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: r8169: Performance regression and latency instability
  2019-08-16 12:35 ` Eric Dumazet
@ 2019-08-16 13:59   ` Holger Hoffstätte
  2019-08-16 19:12     ` Heiner Kallweit
  0 siblings, 1 reply; 6+ messages in thread
From: Holger Hoffstätte @ 2019-08-16 13:59 UTC (permalink / raw)
  To: Eric Dumazet, Juliana Rodrigueiro, netdev; +Cc: hkallweit1

On 8/16/19 2:35 PM, Eric Dumazet wrote:
..snip..
> I also see this relevant commit : I have no idea why SG would have any relation with TSO.
> 
> commit a7eb6a4f2560d5ae64bfac98d79d11378ca2de6c
> Author: Holger Hoffstätte <holger@applied-asynchrony.com>
> Date:   Fri Aug 9 00:02:40 2019 +0200
> 
>      r8169: fix performance issue on RTL8168evl
>      
>      Disabling TSO but leaving SG active results is a significant
>      performance drop. Therefore disable also SG on RTL8168evl.
>      This restores the original performance.
>      
>      Fixes: 93681cd7d94f ("r8169: enable HW csum and TSO")
>      Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com>
>      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>      Signed-off-by: David S. Miller <davem@davemloft.net>

It does not - and admittedly none of this makes sense, but stay with me here.

The commit 93681cd7d94f to net-next enabled rx/tx HW checksumming and TSO
by default, but disabled TSO for one specific chip revision - the most popular
one, of course. Enabling rx/tx checksums by default while leaving SG on turned
out to be the performance issue (~780 MBit max) that I found & fixed in the
quoted commit. SG *can* be enabled when rx/tx checkusmming is *dis*abled
(I just verified again), we just had to sanitize the new default.

An alternative strategy could still be to (again?) disable everything by default
and just let people manually enable whatever settings work for their random
chip revision + BIOS combination. I'll let Heiner chime in here.

Basically these chips are dumpster fires and should not be used for anything
ever, which of course means they are everywhere.

AFAICT none of this has anything to do with Juliana's problem..

-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: r8169: Performance regression and latency instability
  2019-08-16 13:59   ` Holger Hoffstätte
@ 2019-08-16 19:12     ` Heiner Kallweit
  2019-08-19 16:04       ` Juliana Rodrigueiro
  0 siblings, 1 reply; 6+ messages in thread
From: Heiner Kallweit @ 2019-08-16 19:12 UTC (permalink / raw)
  To: Holger Hoffstätte, Eric Dumazet, Juliana Rodrigueiro, netdev

On 16.08.2019 15:59, Holger Hoffstätte wrote:
> On 8/16/19 2:35 PM, Eric Dumazet wrote:
> ..snip..
>> I also see this relevant commit : I have no idea why SG would have any relation with TSO.
>>
>> commit a7eb6a4f2560d5ae64bfac98d79d11378ca2de6c
>> Author: Holger Hoffstätte <holger@applied-asynchrony.com>
>> Date:   Fri Aug 9 00:02:40 2019 +0200
>>
>>      r8169: fix performance issue on RTL8168evl
>>           Disabling TSO but leaving SG active results is a significant
>>      performance drop. Therefore disable also SG on RTL8168evl.
>>      This restores the original performance.
>>           Fixes: 93681cd7d94f ("r8169: enable HW csum and TSO")
>>      Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com>
>>      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>      Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> It does not - and admittedly none of this makes sense, but stay with me here.
> 
> The commit 93681cd7d94f to net-next enabled rx/tx HW checksumming and TSO
> by default, but disabled TSO for one specific chip revision - the most popular
> one, of course. Enabling rx/tx checksums by default while leaving SG on turned
> out to be the performance issue (~780 MBit max) that I found & fixed in the
> quoted commit. SG *can* be enabled when rx/tx checkusmming is *dis*abled
> (I just verified again), we just had to sanitize the new default.
> 
> An alternative strategy could still be to (again?) disable everything by default
> and just let people manually enable whatever settings work for their random
> chip revision + BIOS combination. I'll let Heiner chime in here.
> 
> Basically these chips are dumpster fires and should not be used for anything
> ever, which of course means they are everywhere.
> 
> AFAICT none of this has anything to do with Juliana's problem..
> 
Indeed, here we're talking about changes in linux-next, and Juliana's issue is
with 4.19. However I'd appreciate if Juliana could test with linux-next and
different combinations of the NETIF_F_xxx features.

I have no immediate idea why the referenced GSO change affects r8169 but not
other chips / drivers.

> -h
> 
Heiner

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: r8169: Performance regression and latency instability
  2019-08-16 19:12     ` Heiner Kallweit
@ 2019-08-19 16:04       ` Juliana Rodrigueiro
  2019-09-06 11:25         ` Juliana Rodrigueiro
  0 siblings, 1 reply; 6+ messages in thread
From: Juliana Rodrigueiro @ 2019-08-19 16:04 UTC (permalink / raw)
  To: Heiner Kallweit, Holger Hoffstätte, Eric Dumazet, netdev

Hi!

First of all: Thank you everyone for the input.

Here is some more info about my NIC. (Using the latest ethtool)

# ethtool -i eth0 ; ifconfig eth0
driver: r8169
version:
firmware-version: rtl8168h-2_0.0.2 02/26/15
expansion-rom-version:
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
eth0      Link encap:Ethernet  HWaddr <hidden>
           inet addr:<hidden>  Bcast:<hidden>  Mask:255.255.0.0
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:27392501 errors:0 dropped:0 overruns:0 frame:0
           TX packets:24647212 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:33702173568 (31.3 GiB)  TX bytes:35865124147 (33.4 GiB)


On 8/16/19 9:12 PM, Heiner Kallweit wrote:

> Indeed, here we're talking about changes in linux-next, and Juliana's issue is
> with 4.19. However I'd appreciate if Juliana could test with linux-next and
> different combinations of the NETIF_F_xxx features.

I also tested the latest linux-next (20190819) and the results did not
improved for me, unfortunately. About the same as all the kernel
versions I tested from 4.17 onwards.

# netperf -v 2 -P 0 -H <netserver-ip>,4 -I 99,5 -t omni -l 1 -- -O 
STDDEV_LATENCY -m 64K -d Send
627.99

Running linux-next I have the following defaults (shortened for simplicity):

# ethtool -k eth0
Features for eth0:
rx-checksumming: on
tx-checksumming: on
         tx-checksum-ipv4: on
         tx-checksum-ip-generic: off [fixed]
         tx-checksum-ipv6: on
         tx-checksum-fcoe-crc: off [fixed]
         tx-checksum-sctp: off [fixed]
scatter-gather: on
         tx-scatter-gather: on
         tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
         tx-tcp-segmentation: on
         tx-tcp-ecn-segmentation: off [fixed]
         tx-tcp-mangleid-segmentation: off
         tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
... (all off from here)


There are quite a few possible combinations to go through. I executed my
test with SG, TSO, GSO, RX, TX individually disabled, but the results
were all the same or slightly worse.

Until I find the root cause, we will have to keep the "tcp: switch to
GSO being always on" patch reverted for production, which is not ideal.

Any other ideas how I could debug this issue?


Best regards,
Juliana.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: r8169: Performance regression and latency instability
  2019-08-19 16:04       ` Juliana Rodrigueiro
@ 2019-09-06 11:25         ` Juliana Rodrigueiro
  0 siblings, 0 replies; 6+ messages in thread
From: Juliana Rodrigueiro @ 2019-09-06 11:25 UTC (permalink / raw)
  To: Heiner Kallweit, Holger Hoffstätte, Eric Dumazet, netdev
  Cc: Thomas Jarosch, Gerd v. Egidy

Hi all!

I would like to kindly ask your help to understand this subject, since my
familiarity with the network stack is limited. I'm still
trying to find a solution for the latency problems with Realtek cards I
reported earlier.

Why does the GSO have to be forced on the socket level
if drivers for high performance chips will have it enabled by default?

A consequence of forcing GSO is that TSO is also forced [1] in
sk_setup_caps() via the NETIF_F_GSO_SOFTWARE flag (list of features with
software fallbacks). I'm sorry for my ignorance, but I wonder if TSO
will really be done in software in case the card does have NETIF_F_TSO
capability but "might not work properly" [2]. (Plus tx-checksum and SG
are all off)

Effectively for me, the following patch shows the same good performance
as the 3.14 kernel (or kernel 4.19 with reverted "tcp: switch to GSO 
being always on").

diff --git a/net/core/sock.c b/net/core/sock.c
index 9c32e8eb64da..d792d12e0f66 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1770,7 +1770,7 @@ void sk_setup_caps(struct sock *sk, struct 
dst_entry *dst)
         sk_dst_set(sk, dst);
         sk->sk_route_caps = dst->dev->features | sk->sk_route_forced_caps;
         if (sk->sk_route_caps & NETIF_F_GSO)
-               sk->sk_route_caps |= NETIF_F_GSO_SOFTWARE;
+               sk->sk_route_caps |= (NETIF_F_GSO_SOFTWARE & 
~NETIF_F_ALL_TSO);
         sk->sk_route_caps &= ~sk->sk_route_nocaps;
         if (sk_can_gso(sk)) {
                 if (dst->header_len && !xfrm_dst_offload_ok(dst)) {


Any help is highly appreciated.

Best regards,
Juliana.


[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/core/sock.c?h=linux-4.19.y#n1773
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/realtek/r8169.c?h=linux-4.19.y#n7590

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-09-06 11:26 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-16 12:09 r8169: Performance regression and latency instability Juliana Rodrigueiro
2019-08-16 12:35 ` Eric Dumazet
2019-08-16 13:59   ` Holger Hoffstätte
2019-08-16 19:12     ` Heiner Kallweit
2019-08-19 16:04       ` Juliana Rodrigueiro
2019-09-06 11:25         ` Juliana Rodrigueiro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).