* Re: [RPI 3B+ / TSO / lan78xx ]
       [not found] <5267da21-8f12-2750-c0c5-4ed31b03833b@gmail.com>
@ 2020-01-07 13:32 ` RENARD Pierre-Francois
  2020-01-07 17:04   ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: RENARD Pierre-Francois @ 2020-01-07 13:32 UTC (permalink / raw)
  To: nsaenzjulienne, woojung.huh, UNGLinuxDriver, netdev, linux-usb,
	stefan.wahren


Hello all

I am facing an issue with the Raspberry Pi 3B+ and its onboard Ethernet adapter.

When doing a large transfer (more than 1 GB) in one go, the transfer hangs and 
fails after a few minutes.


I have two ways to reproduce this issue:


Using NFS (v3 or v4)

     dd if=/dev/zero of=/NFSPATH/file bs=4M count=1000 status=progress


     At some point dd hangs and becomes uninterruptible (it cannot be 
stopped with Ctrl-C or killed).

     After a few minutes, dd dies and a series of "NFS server not 
responding" / "NFS server is OK" messages appear in the journal.


Using SCP

     dd if=/dev/zero of=/tmp/file bs=4M count=1000

     scp /tmp/file user@server:/directory


     scp hangs after 1 GB and, after a few minutes, fails with the 
message "client_loop: send disconnect: Broken pipe" followed by "lost connection".




It appears this is a known bug related to TCP Segmentation Offload (TSO) and 
Selective Acknowledgment (SACK).

Disabling TSO (ethtool -K eth0 tso off and ethtool -K eth0 gso off) 
solves the issue.

The Raspberry Pi team created a patch that disables the feature by default, 
and it is applied by default in Raspbian.

Comment from the patch:

/* TSO seems to be having some issue with Selective Acknowledge (SACK) that
  * results in lost data never being retransmitted.
  * Disable it by default now, but adds a module parameter to enable it for
  * debug purposes (the full cause is not currently understood).
  */


For reference:

The issue I created yesterday: 
https://github.com/raspberrypi/linux/issues/3395

Related issues handled by the Raspberry Pi developers: 
https://github.com/raspberrypi/linux/issues/2482 and 
https://github.com/raspberrypi/linux/issues/2449



If you need me to test anything or provide more information, I'll be 
pleased to help.



Fox


PS: this is a resend in plain text because vger rejected the first 
one with HTML formatting... :)



* Re: [RPI 3B+ / TSO / lan78xx ]
  2020-01-07 13:32 ` [RPI 3B+ / TSO / lan78xx ] RENARD Pierre-Francois
@ 2020-01-07 17:04   ` Eric Dumazet
  2020-01-07 17:21     ` Eric Dumazet
  2020-01-07 17:30     ` Stefan Wahren
  0 siblings, 2 replies; 5+ messages in thread
From: Eric Dumazet @ 2020-01-07 17:04 UTC (permalink / raw)
  To: RENARD Pierre-Francois, nsaenzjulienne, woojung.huh,
	UNGLinuxDriver, netdev, linux-usb, stefan.wahren



On 1/7/20 5:32 AM, RENARD Pierre-Francois wrote:
> It appears this is a known bug related to TCP Segmentation Offload (TSO) and Selective Acknowledgment (SACK).
> 
> Disabling TSO (ethtool -K eth0 tso off and ethtool -K eth0 gso off) solves the issue.
> 
> The Raspberry Pi team created a patch that disables the feature by default, and it is applied by default in Raspbian.
> 
> Comment from the patch:
> 
> /* TSO seems to be having some issue with Selective Acknowledge (SACK) that
>  * results in lost data never being retransmitted.
>  * Disable it by default now, but adds a module parameter to enable it for
>  * debug purposes (the full cause is not currently understood).
>  */


I doubt TSO and SACK have a serious generic bug like that.

Most likely the TSO implementation in the driver/NIC has a bug.

Anyway, you did not provide a kernel version, so I am not sure what you expect from us.


* Re: [RPI 3B+ / TSO / lan78xx ]
  2020-01-07 17:04   ` Eric Dumazet
@ 2020-01-07 17:21     ` Eric Dumazet
  2020-01-07 17:30     ` Stefan Wahren
  1 sibling, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2020-01-07 17:21 UTC (permalink / raw)
  To: RENARD Pierre-Francois, nsaenzjulienne, woojung.huh,
	UNGLinuxDriver, netdev, linux-usb, stefan.wahren



On 1/7/20 9:04 AM, Eric Dumazet wrote:
> I doubt TSO and SACK have a serious generic bug like that.
> 
> Most likely the TSO implementation in the driver/NIC has a bug.
> 
> Anyway, you did not provide a kernel version, so I am not sure what you expect from us.
> 

Oh well, drivers/net/usb/lan78xx.c is horribly buggy.

It wants linear skbs, and linearization is likely to fail with very large packets.

And if skb linearization fails, the skb is not freed, so a big memory leak happens.
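
The leak described here comes down to buffer ownership on the error path: once the transmit-prepare step has been handed a buffer and gives up on it, it has to free that buffer itself, because the caller only sees NULL. The following is a minimal userspace sketch of that pattern in plain C. It is not driver code; struct buf, buf_linearize(), tx_prep_leaky(), tx_prep_fixed() and the live_bufs counter are hypothetical stand-ins for an skb, skb_linearize() and lan78xx_tx_prep().

```c
#include <stdlib.h>

/* Hypothetical stand-in for an skb: only what the sketch needs. */
struct buf {
    size_t len;
};

/* Number of buffers allocated and not yet freed. */
int live_bufs;

struct buf *buf_alloc(size_t len)
{
    struct buf *b = malloc(sizeof(*b));

    if (!b)
        return NULL;
    b->len = len;
    live_bufs++;
    return b;
}

void buf_free(struct buf *b)
{
    if (b) {
        free(b);
        live_bufs--;
    }
}

/* Pretend linearization fails for anything above max_linear bytes, the way
 * a large contiguous allocation may fail under memory pressure. */
int buf_linearize(struct buf *b, size_t max_linear)
{
    return b->len > max_linear ? -1 : 0;
}

/* Leaky variant: on failure the caller gets NULL and nobody frees b. */
struct buf *tx_prep_leaky(struct buf *b, size_t max_linear)
{
    if (buf_linearize(b, max_linear) < 0)
        return NULL;
    return b;
}

/* Fixed variant: this function owns b, so it frees b before returning NULL. */
struct buf *tx_prep_fixed(struct buf *b, size_t max_linear)
{
    if (buf_linearize(b, max_linear) < 0) {
        buf_free(b);
        return NULL;
    }
    return b;
}
```

With the leaky variant, every oversized buffer leaves live_bufs one higher; with the fixed variant the counter returns to its previous value after a failure.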

Please try this patch:

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index f940dc6485e56a7e8f905082ce920f5dd83232b0..5e2d3c8c34dc8d8ac6f2ab3fd8a59dba5b348882 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2724,11 +2724,6 @@ static int lan78xx_stop(struct net_device *net)
        return 0;
 }
 
-static int lan78xx_linearize(struct sk_buff *skb)
-{
-       return skb_linearize(skb);
-}
-
 static struct sk_buff *lan78xx_tx_prep(struct lan78xx_net *dev,
                                       struct sk_buff *skb, gfp_t flags)
 {
@@ -2740,8 +2735,10 @@ static struct sk_buff *lan78xx_tx_prep(struct lan78xx_net *dev,
                return NULL;
        }
 
-       if (lan78xx_linearize(skb) < 0)
+       if (skb_linearize(skb)) {
+               dev_kfree_skb_any(skb);
                return NULL;
+       }
 
        tx_cmd_a = (u32)(skb->len & TX_CMD_A_LEN_MASK_) | TX_CMD_A_FCS_;
 
@@ -3790,6 +3787,9 @@ static int lan78xx_probe(struct usb_interface *intf,
        if (ret < 0)
                goto out4;
 
+       /* since we want linear skb, avoid high-order allocations */
+       netif_set_gso_max_size(netdev, SKB_WITH_OVERHEAD(16000));
+
        ret = register_netdev(netdev);
        if (ret != 0) {
                netif_err(dev, probe, netdev, "couldn't register the device\n");
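
The comment in the last hunk ("avoid high-order allocations") can be made concrete with a little arithmetic. A linear skb needs one physically contiguous buffer, and the page allocator hands those out in power-of-two runs of pages ("orders"). The sketch below computes the order needed for a given size. The 4096-byte page size, the 320-byte shared-info size and the 64-byte alignment are assumptions chosen for illustration (the real values depend on architecture and kernel configuration), and alloc_order() and with_overhead() are hypothetical helper names, the latter only mimicking the idea behind SKB_WITH_OVERHEAD().

```c
#include <stddef.h>

#define PAGE_SIZE_ASSUMED   4096u /* assumption: 4 KiB pages */
#define SHINFO_SIZE_ASSUMED 320u  /* assumption: size of the skb shared info */
#define CACHE_LINE_ASSUMED  64u   /* assumption: alignment of that area */

/* Smallest order such that (page size << order) >= size. */
unsigned int alloc_order(size_t size)
{
    unsigned int order = 0;
    size_t span = PAGE_SIZE_ASSUMED;

    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}

/* Payload room left in a buffer of `size` bytes after reserving the aligned
 * shared-info area at its end. Assumes size is larger than that area. */
size_t with_overhead(size_t size)
{
    size_t overhead = (SHINFO_SIZE_ASSUMED + CACHE_LINE_ASSUMED - 1)
                      / CACHE_LINE_ASSUMED * CACHE_LINE_ASSUMED;

    return size - overhead;
}
```

Under these assumptions, linearizing a packet close to 64 KiB needs an order-4 allocation (16 contiguous pages), whereas a buffer capped at 16000 bytes fits in order 2 (4 pages), which is far more likely to succeed on a system with fragmented memory.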


* Re: [RPI 3B+ / TSO / lan78xx ]
  2020-01-07 17:04   ` Eric Dumazet
  2020-01-07 17:21     ` Eric Dumazet
@ 2020-01-07 17:30     ` Stefan Wahren
  2020-01-07 18:06       ` Eric Dumazet
  1 sibling, 1 reply; 5+ messages in thread
From: Stefan Wahren @ 2020-01-07 17:30 UTC (permalink / raw)
  To: Eric Dumazet, RENARD Pierre-Francois, nsaenzjulienne,
	woojung.huh, UNGLinuxDriver, netdev, linux-usb

Hi Eric,

On 07.01.20 at 18:04, Eric Dumazet wrote:
>
> I doubt TSO and SACK have a serious generic bug like that.
>
> Most likely the TSO implementation in the driver/NIC has a bug.

Yes, the issue isn't reproducible with the Raspberry Pi 3B (without the +)
and the same kernel. The main difference between the two boards is the
Ethernet USB chip:

Raspberry Pi 3B: smsc95xx
Raspberry Pi 3B+: lan78xx

>
> Anyway, you did not provide a kernel version, so I am not sure what you expect from us.

It's Linux 5.4.7 (arm64), as in the provided GitHub link. I asked
Pierre-Francois to report this issue here so that it gets addressed
properly. Currently this very old bug is not fixed in mainline, and the
Raspberry Pi vendor tree uses a workaround (disabling TSO).

Stefan




* Re: [RPI 3B+ / TSO / lan78xx ]
  2020-01-07 17:30     ` Stefan Wahren
@ 2020-01-07 18:06       ` Eric Dumazet
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2020-01-07 18:06 UTC (permalink / raw)
  To: Stefan Wahren, Eric Dumazet, RENARD Pierre-Francois,
	nsaenzjulienne, woojung.huh, UNGLinuxDriver, netdev, linux-usb



On 1/7/20 9:30 AM, Stefan Wahren wrote:
> Hi Eric,
> 
> On 07.01.20 at 18:04, Eric Dumazet wrote:
>>

>>
>> I doubt TSO and SACK have a serious generic bug like that.
>>
>> Most likely the TSO implementation in the driver/NIC has a bug.
> 
> Yes, the issue isn't reproducible with the Raspberry Pi 3B (without the +)
> and the same kernel. The main difference between the two boards is the
> Ethernet USB chip:
> 
> Raspberry Pi 3B: smsc95xx
> Raspberry Pi 3B+: lan78xx
> 
>>
>> Anyway, you did not provide a kernel version, so I am not sure what you expect from us.
> 
> It's Linux 5.4.7 (arm64), as in the provided GitHub link. I asked
> Pierre-Francois to report this issue here so that it gets addressed
> properly. Currently this very old bug is not fixed in mainline, and the
> Raspberry Pi vendor tree uses a workaround (disabling TSO).

This is puzzling.

Bug seems trivial enough :/



