linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] drivers/net/usb/r8152 fix broken rx checksums
@ 2016-10-26 22:36 Mark Lord
  2016-10-26 22:54 ` Mark Lord
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Mark Lord @ 2016-10-26 22:36 UTC (permalink / raw)
  To: nic_swsd, netdev, Linux Kernel; +Cc: stable

[-- Attachment #1: Type: text/plain, Size: 1384 bytes --]

The r8152 driver has been broken since (approx) 3.6.16,
when support was added for hardware rx checksum on newer chip versions.
Symptoms include random segfaults and silent data corruption over NFS.

This does not work on the VER_02 dongle I have here
when used with a slow embedded system CPU.
Google reveals others reporting similar issues on Raspberry Pi.

So, disable hardware rx checksum for VER_02, and fix
an obvious coding error for IPV6 checksums in the same function.

Because this bug results in silent data corruption,
it is a good candidate for back-porting to -stable >= 3.16.xx.
Patch attached (to deal with buggy mailer) and also below for review.

Signed-off-by: Mark Lord <mlord@pobox.com>

--- old/drivers/net/usb/r8152.c	2016-09-30 04:20:43.000000000 -0400
+++ linux/drivers/net/usb/r8152.c	2016-10-26 14:15:44.932517676 -0400
@@ -1645,7 +1645,7 @@
  	u8 checksum = CHECKSUM_NONE;
  	u32 opts2, opts3;

-	if (tp->version == RTL_VER_01)
+	if (tp->version == RTL_VER_01 || tp->version == RTL_VER_02)
  		goto return_result;

  	opts2 = le32_to_cpu(rx_desc->opts2);
@@ -1660,7 +1660,7 @@
  			checksum = CHECKSUM_NONE;
  		else
  			checksum = CHECKSUM_UNNECESSARY;
-	} else if (RD_IPV6_CS) {
+	} else if (opts2 & RD_IPV6_CS) {
  		if ((opts2 & RD_UDP_CS) && !(opts3 & UDPF))
  			checksum = CHECKSUM_UNNECESSARY;
  		else if ((opts2 & RD_TCP_CS) && !(opts3 & TCPF))

[-- Attachment #2: drivers_net_usb_r8152_checksums.patch --]
[-- Type: text/x-patch, Size: 1306 bytes --]

The r8152 driver has been broken since (approx) 3.6.16,
when support was added for hardware rx checksum on newer chip versions.
Symptoms include random segfaults and silent data corruption over NFS.

This does not work on the VER_02 dongle I have here
when used with a slow embedded system CPU.
Google reveals others reporting similar issues on Raspberry Pi.

So, disable hardware rx checksum for VER_02, and fix
an obvious coding error for IPV6 checksums in the same function.

Because this bug results in silent data corruption,
it is a good candidate for back-porting to -stable >= 3.16.xx.

Signed-off-by: Mark Lord <mlord@pobox.com>

--- old/drivers/net/usb/r8152.c	2016-09-30 04:20:43.000000000 -0400
+++ linux/drivers/net/usb/r8152.c	2016-10-26 14:15:44.932517676 -0400
@@ -1645,7 +1645,7 @@
 	u8 checksum = CHECKSUM_NONE;
 	u32 opts2, opts3;
 
-	if (tp->version == RTL_VER_01)
+	if (tp->version == RTL_VER_01 || tp->version == RTL_VER_02)
 		goto return_result;
 
 	opts2 = le32_to_cpu(rx_desc->opts2);
@@ -1660,7 +1660,7 @@
 			checksum = CHECKSUM_NONE;
 		else
 			checksum = CHECKSUM_UNNECESSARY;
-	} else if (RD_IPV6_CS) {
+	} else if (opts2 & RD_IPV6_CS) {
 		if ((opts2 & RD_UDP_CS) && !(opts3 & UDPF))
 			checksum = CHECKSUM_UNNECESSARY;
 		else if ((opts2 & RD_TCP_CS) && !(opts3 & TCPF))

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] drivers/net/usb/r8152 fix broken rx checksums
  2016-10-26 22:36 [PATCH] drivers/net/usb/r8152 fix broken rx checksums Mark Lord
@ 2016-10-26 22:54 ` Mark Lord
  2016-10-30 21:22 ` David Miller
  2016-10-30 23:28 ` [PATCH net] r8152: Fix broken RX checksums Mark Lord
  2 siblings, 0 replies; 18+ messages in thread
From: Mark Lord @ 2016-10-26 22:54 UTC (permalink / raw)
  To: nic_swsd, netdev, Linux Kernel; +Cc: stable

On 16-10-26 06:36 PM, Mark Lord wrote:
> The r8152 driver has been broken since (approx) 3.6.16,

Correction:  broken since 3.16.xx.

> when support was added for hardware rx checksum on newer chip versions.
> Symptoms include random segfaults and silent data corruption over NFS.
>
> This does not work on the VER_02 dongle I have here
> when used with a slow embedded system CPU.
> Google reveals others reporting similar issues on Raspberry Pi.
>
> So, disable hardware rx checksum for VER_02, and fix
> an obvious coding error for IPV6 checksums in the same function.
>
> Because this bug results in silent data corruption,
> it is a good candidate for back-porting to -stable >= 3.16.xx.
> Patch attached (to deal with buggy mailer) and also below for review.
>
> Signed-off-by: Mark Lord <mlord@pobox.com>
>
> --- old/drivers/net/usb/r8152.c    2016-09-30 04:20:43.000000000 -0400
> +++ linux/drivers/net/usb/r8152.c    2016-10-26 14:15:44.932517676 -0400
> @@ -1645,7 +1645,7 @@
>      u8 checksum = CHECKSUM_NONE;
>      u32 opts2, opts3;
>
> -    if (tp->version == RTL_VER_01)
> +    if (tp->version == RTL_VER_01 || tp->version == RTL_VER_02)
>          goto return_result;
>
>      opts2 = le32_to_cpu(rx_desc->opts2);
> @@ -1660,7 +1660,7 @@
>              checksum = CHECKSUM_NONE;
>          else
>              checksum = CHECKSUM_UNNECESSARY;
> -    } else if (RD_IPV6_CS) {
> +    } else if (opts2 & RD_IPV6_CS) {
>          if ((opts2 & RD_UDP_CS) && !(opts3 & UDPF))
>              checksum = CHECKSUM_UNNECESSARY;
>          else if ((opts2 & RD_TCP_CS) && !(opts3 & TCPF))


-- 
Mark Lord
Real-Time Remedies Inc.
mlord@pobox.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] drivers/net/usb/r8152 fix broken rx checksums
  2016-10-26 22:36 [PATCH] drivers/net/usb/r8152 fix broken rx checksums Mark Lord
  2016-10-26 22:54 ` Mark Lord
@ 2016-10-30 21:22 ` David Miller
  2016-10-30 23:48   ` Paul Bolle
  2016-10-30 23:28 ` [PATCH net] r8152: Fix broken RX checksums Mark Lord
  2 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2016-10-30 21:22 UTC (permalink / raw)
  To: mlord; +Cc: nic_swsd, netdev, linux-kernel, stable

From: Mark Lord <mlord@pobox.com>
Date: Wed, 26 Oct 2016 18:36:57 -0400

> Patch attached (to deal with buggy mailer) and also below for review.

Please make your mailer work properly so that you can submit
patches properly which work inline, just like every other developer
does for the kernel.

Also please format your Subject line properly, it must be of the
form:

	[PATCH net] r8152: Fix broken RX checksums.

The important parts are:

	1) "[PATCH net]"  This says that it is a patch, and that it
	   is targetting the 'net' GIT tree specifically.

	2) "r8152: "  This indicates the "subsystem" that the patch
	   specifically targets, in this case the r8152 driver.  It
	   must end with a colon character then a space.

	3) "Fix broken RX checksums."  Commit header lines and commit
	   messages are proper English, therefore sentences should
	   begin with a capitalized letter and end with a period.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net] r8152: Fix broken RX checksums.
  2016-10-26 22:36 [PATCH] drivers/net/usb/r8152 fix broken rx checksums Mark Lord
  2016-10-26 22:54 ` Mark Lord
  2016-10-30 21:22 ` David Miller
@ 2016-10-30 23:28 ` Mark Lord
  2016-10-31  0:57   ` David Miller
  2 siblings, 1 reply; 18+ messages in thread
From: Mark Lord @ 2016-10-30 23:28 UTC (permalink / raw)
  To: nic_swsd, netdev, Linux Kernel; +Cc: stable

The r8152 driver has been broken since (approx) 3.16.xx
when support was added for hardware RX checksums
on newer chip versions.  Symptoms include random
segfaults and silent data corruption over NFS.

The hardware checksum logig does not work on the VER_02
dongles I have here when used with a slow embedded system CPU.
Google reveals others reporting similar issues on Raspberry Pi.

So, disable hardware RX checksum support for VER_02, and fix
an obvious coding error for IPV6 checksums in the same function.

Because this bug results in silent data corruption,
it is a good candidate for back-porting to -stable >= 3.16.xx.

Signed-off-by: Mark Lord <mlord@pobox.com>

--- old/drivers/net/usb/r8152.c	2016-09-30 04:20:43.000000000 -0400
+++ linux/drivers/net/usb/r8152.c	2016-10-26 14:15:44.932517676 -0400
@@ -1645,7 +1645,7 @@
 	u8 checksum = CHECKSUM_NONE;
 	u32 opts2, opts3;

-	if (tp->version == RTL_VER_01)
+	if (tp->version == RTL_VER_01 || tp->version == RTL_VER_02)
 		goto return_result;

 	opts2 = le32_to_cpu(rx_desc->opts2);
@@ -1660,7 +1660,7 @@
 			checksum = CHECKSUM_NONE;
 		else
 			checksum = CHECKSUM_UNNECESSARY;
-	} else if (RD_IPV6_CS) {
+	} else if (opts2 & RD_IPV6_CS) {
 		if ((opts2 & RD_UDP_CS) && !(opts3 & UDPF))
 			checksum = CHECKSUM_UNNECESSARY;
 		else if ((opts2 & RD_TCP_CS) && !(opts3 & TCPF))

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] drivers/net/usb/r8152 fix broken rx checksums
  2016-10-30 21:22 ` David Miller
@ 2016-10-30 23:48   ` Paul Bolle
  0 siblings, 0 replies; 18+ messages in thread
From: Paul Bolle @ 2016-10-30 23:48 UTC (permalink / raw)
  To: David Miller, mlord; +Cc: nic_swsd, netdev, linux-kernel, stable

On Sun, 2016-10-30 at 17:22 -0400, David Miller wrote:
>         3) "Fix broken RX checksums."  Commit header lines and commit
>            messages are proper English, therefore sentences should
>            begin with a capitalized letter and end with a period.

Commit messages should be proper English. But commit header lines
should not end with a period. The vast majority doesn't. Yes, I've just
checked.

How many newspaper headlines end with a period?

Thanks,


Paul Bolle

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] r8152: Fix broken RX checksums.
  2016-10-30 23:28 ` [PATCH net] r8152: Fix broken RX checksums Mark Lord
@ 2016-10-31  0:57   ` David Miller
  2016-10-31  2:07     ` Mark Lord
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2016-10-31  0:57 UTC (permalink / raw)
  To: mlord; +Cc: nic_swsd, netdev, linux-kernel, stable

From: Mark Lord <mlord@pobox.com>
Date: Sun, 30 Oct 2016 19:28:27 -0400

> The r8152 driver has been broken since (approx) 3.16.xx
> when support was added for hardware RX checksums
> on newer chip versions.  Symptoms include random
> segfaults and silent data corruption over NFS.
> 
> The hardware checksum logig does not work on the VER_02
> dongles I have here when used with a slow embedded system CPU.
> Google reveals others reporting similar issues on Raspberry Pi.
> 
> So, disable hardware RX checksum support for VER_02, and fix
> an obvious coding error for IPV6 checksums in the same function.
> 
> Because this bug results in silent data corruption,
> it is a good candidate for back-porting to -stable >= 3.16.xx.
> 
> Signed-off-by: Mark Lord <mlord@pobox.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] r8152: Fix broken RX checksums.
  2016-10-31  0:57   ` David Miller
@ 2016-10-31  2:07     ` Mark Lord
  2016-10-31  3:53       ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Mark Lord @ 2016-10-31  2:07 UTC (permalink / raw)
  To: David Miller; +Cc: nic_swsd, netdev, linux-kernel

On 16-10-30 08:57 PM, David Miller wrote:
> From: Mark Lord <mlord@pobox.com>
> Date: Sun, 30 Oct 2016 19:28:27 -0400
> 
>> The r8152 driver has been broken since (approx) 3.16.xx
>> when support was added for hardware RX checksums
>> on newer chip versions.  Symptoms include random
>> segfaults and silent data corruption over NFS.
>>
>> The hardware checksum logig does not work on the VER_02
>> dongles I have here when used with a slow embedded system CPU.
>> Google reveals others reporting similar issues on Raspberry Pi.
>>
>> So, disable hardware RX checksum support for VER_02, and fix
>> an obvious coding error for IPV6 checksums in the same function.
>>
>> Because this bug results in silent data corruption,
>> it is a good candidate for back-porting to -stable >= 3.16.xx.
>>
>> Signed-off-by: Mark Lord <mlord@pobox.com>
> 
> Applied and queued up for -stable, thanks.

Thanks.  Now that this is taken care of, I do wonder if perhaps
RX checksums ought to be enabled at all for ANY versions of this chip?

My theory is that the checksums probably work okay most of the time,
except when the hardware RX buffer overflows.

In my case, and in the case of the Raspberry Pi, the receiving CPU
is quite a bit slower than mainstream x86, so it can quite easily
fall behind in emptying the RX buffer on the chip.
The only indication this has happened may be an incorrect RX checksum.

This is only a theory, but I otherwise have trouble explaining
why we are seeing invalid RX checksums -- direct cable connections
to a switch, shared only with the NFS server.  No reason for it
to have bad RX checksums in the first place.

Should we just blanket disable RX checksums for all versions here
unless proven otherwise/safe?

Anyone out there know better?

Cheers
Mark

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] r8152: Fix broken RX checksums.
  2016-10-31  2:07     ` Mark Lord
@ 2016-10-31  3:53       ` David Miller
  2016-10-31  8:14         ` Hayes Wang
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2016-10-31  3:53 UTC (permalink / raw)
  To: mlord; +Cc: nic_swsd, netdev, linux-kernel

From: Mark Lord <mlord@pobox.com>
Date: Sun, 30 Oct 2016 22:07:25 -0400

> On 16-10-30 08:57 PM, David Miller wrote:
>> From: Mark Lord <mlord@pobox.com>
>> Date: Sun, 30 Oct 2016 19:28:27 -0400
>> 
>>> The r8152 driver has been broken since (approx) 3.16.xx
>>> when support was added for hardware RX checksums
>>> on newer chip versions.  Symptoms include random
>>> segfaults and silent data corruption over NFS.
>>>
>>> The hardware checksum logig does not work on the VER_02
>>> dongles I have here when used with a slow embedded system CPU.
>>> Google reveals others reporting similar issues on Raspberry Pi.
>>>
>>> So, disable hardware RX checksum support for VER_02, and fix
>>> an obvious coding error for IPV6 checksums in the same function.
>>>
>>> Because this bug results in silent data corruption,
>>> it is a good candidate for back-porting to -stable >= 3.16.xx.
>>>
>>> Signed-off-by: Mark Lord <mlord@pobox.com>
>> 
>> Applied and queued up for -stable, thanks.
> 
> Thanks.  Now that this is taken care of, I do wonder if perhaps
> RX checksums ought to be enabled at all for ANY versions of this chip?

You should really start a dialogue with the developer who has been
making the most, if not all, of the major changes to this driver over
the past few years, Hayes Wang.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH net] r8152: Fix broken RX checksums.
  2016-10-31  3:53       ` David Miller
@ 2016-10-31  8:14         ` Hayes Wang
  2016-10-31 13:24           ` Mark Lord
                             ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Hayes Wang @ 2016-10-31  8:14 UTC (permalink / raw)
  To: David Miller, mlord; +Cc: nic_swsd, netdev, linux-kernel

> >>> The r8152 driver has been broken since (approx) 3.16.xx
> >>> when support was added for hardware RX checksums
> >>> on newer chip versions.  Symptoms include random
> >>> segfaults and silent data corruption over NFS.
> >>>
> >>> The hardware checksum logig does not work on the VER_02
> >>> dongles I have here when used with a slow embedded system CPU.
> >>> Google reveals others reporting similar issues on Raspberry Pi.
> >>>
> >>> So, disable hardware RX checksum support for VER_02, and fix
> >>> an obvious coding error for IPV6 checksums in the same function.
> >>>
> >>> Because this bug results in silent data corruption,
> >>> it is a good candidate for back-porting to -stable >= 3.16.xx.
> >>>
> >>> Signed-off-by: Mark Lord <mlord@pobox.com>
> >>
> >> Applied and queued up for -stable, thanks.
> >
> > Thanks.  Now that this is taken care of, I do wonder if perhaps
> > RX checksums ought to be enabled at all for ANY versions of this chip?
> 
> You should really start a dialogue with the developer who has been
> making the most, if not all, of the major changes to this driver over
> the past few years, Hayes Wang.

Our hw engineer says only VER_01 has the issue about rx checksum.
I need more information for checking it.

Best Regards,
Hayes

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] r8152: Fix broken RX checksums.
  2016-10-31  8:14         ` Hayes Wang
@ 2016-10-31 13:24           ` Mark Lord
  2016-11-02 18:29           ` Mark Lord
       [not found]           ` <201611030159.uA31x0np004648@rtits1.realtek.com>
  2 siblings, 0 replies; 18+ messages in thread
From: Mark Lord @ 2016-10-31 13:24 UTC (permalink / raw)
  To: Hayes Wang, David Miller; +Cc: nic_swsd, netdev, linux-kernel

On 16-10-31 04:14 AM, Hayes Wang wrote:
>
> Our hw engineer says only VER_01 has the issue about rx checksum.
> I need more information for checking it.

I've been doing driver work for Linux since 1991,
and learned long ago not to trust engineering specs 100%.

Get yourself a Raspberry Pi v1, set up an NFSROOT root filesystem for it,
and boot/run from that using the ethernet dongle to connect.

It should segfault like crazy when hardware RX checksums are enabled.

Definitely something wrong there, and whatever it is goes away
when RX checksums are disabled in the driver.

I have two theories as to why this happens:

1) The hardware buffer on the dongle overflows because the slow host CPU
does not empty it quickly enough.  This results in a bad checksum on the
final/truncated packet in the buffer.  The chip does not detect this.

2) Perhaps the device driver is looking at the wrong bits.

Either way, this results in data corruption and until otherwise fixed,
it is safest to just not enable RX checksums.

If it happens on a slow embedded CPU, then it can also happen on a heavily
loaded Intel/AMD CPU -- just a lot less frequently.

Cheers
-- 
Mark Lord
Real-Time Remedies Inc.
mlord@pobox.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] r8152: Fix broken RX checksums.
  2016-10-31  8:14         ` Hayes Wang
  2016-10-31 13:24           ` Mark Lord
@ 2016-11-02 18:29           ` Mark Lord
  2016-11-04 12:13             ` Mark Lord
       [not found]           ` <201611030159.uA31x0np004648@rtits1.realtek.com>
  2 siblings, 1 reply; 18+ messages in thread
From: Mark Lord @ 2016-11-02 18:29 UTC (permalink / raw)
  To: Hayes Wang, David Miller; +Cc: nic_swsd, netdev, linux-kernel

On 16-10-31 04:14 AM, Hayes Wang wrote:
>>>>> The r8152 driver has been broken since (approx) 3.16.xx
>>>>> when support was added for hardware RX checksums
>>>>> on newer chip versions.  Symptoms include random
>>>>> segfaults and silent data corruption over NFS.
>>>>>
>>>>> The hardware checksum logig does not work on the VER_02
>>>>> dongles I have here when used with a slow embedded system CPU.
>>>>> Google reveals others reporting similar issues on Raspberry Pi.
...
> Our hw engineer says only VER_01 has the issue about rx checksum.
> I need more information for checking it.

I have poked at it some more, and thus far it appears that it is
only necessary to disable TCP rx checksums.  The system doesn't crash
when only IP/UDP checksums are enabled, but does when TCP checksums are on.

This happens regardless of whether RX_AGG is disabled or enabled,
and increasing/decreasing the number of RX URBs (RTL8152_MAX_RX)
doesn't seem to affect it.

lsusb -vv (from an x86 system, not the failing embedded system) follows:

Bus 001 Device 004: ID 0bda:8152 Realtek Semiconductor Corp.
Device Descriptor:
   bLength                18
   bDescriptorType         1
   bcdUSB               2.10
   bDeviceClass            0 (Defined at Interface level)
   bDeviceSubClass         0
   bDeviceProtocol         0
   bMaxPacketSize0        64
   idVendor           0x0bda Realtek Semiconductor Corp.
   idProduct          0x8152
   bcdDevice           20.00
   iManufacturer           1 Realtek
   iProduct                2 USB 10/100 LAN
   iSerial                 3 84E71400257D
   bNumConfigurations      2
   Configuration Descriptor:
     bLength                 9
     bDescriptorType         2
     wTotalLength           39
     bNumInterfaces          1
     bConfigurationValue     1
     iConfiguration          0
     bmAttributes         0xa0
       (Bus Powered)
       Remote Wakeup
     MaxPower              100mA
     Interface Descriptor:
       bLength                 9
       bDescriptorType         4
       bInterfaceNumber        0
       bAlternateSetting       0
       bNumEndpoints           3
       bInterfaceClass       255 Vendor Specific Class
       bInterfaceSubClass    255 Vendor Specific Subclass
       bInterfaceProtocol      0
       iInterface              0
       Endpoint Descriptor:
         bLength                 7
         bDescriptorType         5
         bEndpointAddress     0x81  EP 1 IN
         bmAttributes            2
           Transfer Type            Bulk
           Synch Type               None
           Usage Type               Data
         wMaxPacketSize     0x0200  1x 512 bytes
         bInterval               0
       Endpoint Descriptor:
         bLength                 7
         bDescriptorType         5
         bEndpointAddress     0x02  EP 2 OUT
         bmAttributes            2
           Transfer Type            Bulk
           Synch Type               None
           Usage Type               Data
         wMaxPacketSize     0x0200  1x 512 bytes
         bInterval               0
       Endpoint Descriptor:
         bLength                 7
         bDescriptorType         5
         bEndpointAddress     0x83  EP 3 IN
         bmAttributes            3
           Transfer Type            Interrupt
           Synch Type               None
           Usage Type               Data
         wMaxPacketSize     0x0002  1x 2 bytes
         bInterval               8
   Configuration Descriptor:
     bLength                 9
     bDescriptorType         2
     wTotalLength           80
     bNumInterfaces          2
     bConfigurationValue     2
     iConfiguration          0
     bmAttributes         0xa0
       (Bus Powered)
       Remote Wakeup
     MaxPower              100mA
     Interface Descriptor:
       bLength                 9
       bDescriptorType         4
       bInterfaceNumber        0
       bAlternateSetting       0
       bNumEndpoints           1
       bInterfaceClass         2 Communications
       bInterfaceSubClass      6 Ethernet Networking
       bInterfaceProtocol      0
       iInterface              5 CDC Communications Control
       CDC Header:
         bcdCDC               1.10
       CDC Union:
         bMasterInterface        0
         bSlaveInterface         1
       CDC Ethernet:
         iMacAddress                      3 84E71400257D
         bmEthernetStatistics    0x00000000
         wMaxSegmentSize               1514
         wNumberMCFilters            0x0000
         bNumberPowerFilters              0
       Endpoint Descriptor:
         bLength                 7
         bDescriptorType         5
         bEndpointAddress     0x83  EP 3 IN
         bmAttributes            3
           Transfer Type            Interrupt
           Synch Type               None
           Usage Type               Data
         wMaxPacketSize     0x0010  1x 16 bytes
         bInterval               8
     Interface Descriptor:
       bLength                 9
       bDescriptorType         4
       bInterfaceNumber        1
       bAlternateSetting       0
       bNumEndpoints           0
       bInterfaceClass        10 CDC Data
       bInterfaceSubClass      0 Unused
       bInterfaceProtocol      0
       iInterface              0
     Interface Descriptor:
       bLength                 9
       bDescriptorType         4
       bInterfaceNumber        1
       bAlternateSetting       1
       bNumEndpoints           2
       bInterfaceClass        10 CDC Data
       bInterfaceSubClass      0 Unused
       bInterfaceProtocol      0
       iInterface              4 Ethernet Data
       Endpoint Descriptor:
         bLength                 7
         bDescriptorType         5
         bEndpointAddress     0x81  EP 1 IN
         bmAttributes            2
           Transfer Type            Bulk
           Synch Type               None
           Usage Type               Data
         wMaxPacketSize     0x0200  1x 512 bytes
         bInterval               0
       Endpoint Descriptor:
         bLength                 7
         bDescriptorType         5
         bEndpointAddress     0x02  EP 2 OUT
         bmAttributes            2
           Transfer Type            Bulk
           Synch Type               None
           Usage Type               Data
         wMaxPacketSize     0x0200  1x 512 bytes
         bInterval               0
Binary Object Store Descriptor:
   bLength                 5
   bDescriptorType        15
   wTotalLength           12
   bNumDeviceCaps          1
   USB 2.0 Extension Device Capability:
     bLength                 7
     bDescriptorType        16
     bDevCapabilityType      2
     bmAttributes   0x00000002
       Link Power Management (LPM) Supported
Device Status:     0x0000
   (Bus Powered)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH net] r8152: Fix broken RX checksums.
       [not found]           ` <201611030159.uA31x0np004648@rtits1.realtek.com>
@ 2016-11-03  8:56             ` Hayes Wang
  2016-11-03 11:43               ` Mark Lord
  0 siblings, 1 reply; 18+ messages in thread
From: Hayes Wang @ 2016-11-03  8:56 UTC (permalink / raw)
  To: Mark Lord, David Miller; +Cc: nic_swsd, netdev, linux-kernel

Mark Lord [mailto:mlord@pobox.com]
> Sent: Thursday, November 03, 2016 2:30 AM
> To: Hayes Wang; David Miller
[...]
> I have poked at it some more, and thus far it appears that it is
> only necessary to disable TCP rx checksums.  The system doesn't crash
> when only IP/UDP checksums are enabled, but does when TCP checksums are on.
> 
> This happens regardless of whether RX_AGG is disabled or enabled,
> and increasing/decreasing the number of RX URBs (RTL8152_MAX_RX)
> doesn't seem to affect it.

I test Raspberry Pi v1, but I couldn't boot with NFSROOT through
both onboard nic and RTL8152. I get following error.

   VFS: Unable to mount root fs via NFS, trying floppy.
   Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0)

However, if I start the system without NFSROOT, I could mount the nfs fs.
Any idea?

NFS server: Fedora 24
Raspberry Pi OS: 2016-09-23-raspbian-jessie

Content of /etc/exports:

   /nfsexport *(rw,sync,no_root_squash)

I change the cmdline.txt from

   dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1
   root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline
   fsck.repair=yes rootwait quiet splash plymouth.ignore-serial-consoles

to

   dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1
   root=/dev/nfs nfsroot=192.168.94.2:/nfsexport ip=192.168.94.22
   rw rootwait quiet splash plymouth.ignore-serial-consoles

Best Regards,
Hayes

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] r8152: Fix broken RX checksums.
  2016-11-03  8:56             ` Hayes Wang
@ 2016-11-03 11:43               ` Mark Lord
  2016-11-04 13:50                 ` Mark Lord
       [not found]                 ` <201611041425.uA4EPwCw018176@rtits1.realtek.com>
  0 siblings, 2 replies; 18+ messages in thread
From: Mark Lord @ 2016-11-03 11:43 UTC (permalink / raw)
  To: Hayes Wang, David Miller; +Cc: nic_swsd, netdev, linux-kernel

On 16-11-03 04:56 AM, Hayes Wang wrote:
> Mark Lord [mailto:mlord@pobox.com]
>> Sent: Thursday, November 03, 2016 2:30 AM
>> To: Hayes Wang; David Miller
> [...]
>> I have poked at it some more, and thus far it appears that it is
>> only necessary to disable TCP rx checksums.  The system doesn't crash
>> when only IP/UDP checksums are enabled, but does when TCP checksums are on.
>>
>> This happens regardless of whether RX_AGG is disabled or enabled,
>> and increasing/decreasing the number of RX URBs (RTL8152_MAX_RX)
>> doesn't seem to affect it.
> 
> I test Raspberry Pi v1, but I couldn't boot with NFSROOT through
> both onboard nic and RTL8152. I get following error.
> 
>    VFS: Unable to mount root fs via NFS, trying floppy.
>    Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0)
> 
> However, if I start the system without NFSROOT, I could mount the nfs fs.
> Any idea?

Rather than getting caught up in all of that,
you could then just chroot to the mounted nfs fs
at that point, and continue on from there.

Eg.  chroot /mnt/nfsxxx /bin/sh

Running from NFS is probably not necessary though.
Instead, perhaps just run md5sum on every file on the nfs fs
from the Raspberry Pi, and then repeat the md5sum's on the server,
and compare the results for errors.

The system I am using the dongle with is a custom embedded board,
but I think the important thing is that it has a slow-ish CPU,
which means it is more prone to having the on-chip RX FIFO overflow.
It is also big-endian rather than little-endian, though that seems
to be correctly handled already in the device driver.

I will try the md5sum test on an x86 box for comparison.

Cheers
-- 
Mark Lord

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] r8152: Fix broken RX checksums.
  2016-11-02 18:29           ` Mark Lord
@ 2016-11-04 12:13             ` Mark Lord
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Lord @ 2016-11-04 12:13 UTC (permalink / raw)
  To: Hayes Wang, David Miller; +Cc: nic_swsd, netdev, linux-kernel

On 16-11-02 02:29 PM, Mark Lord wrote:
>
> I have poked at it some more, and thus far it appears that it is
> only necessary to disable TCP rx checksums.  The system doesn't crash
> when only IP/UDP checksums are enabled, but does when TCP checksums are on.
>
> This happens regardless of whether RX_AGG is disabled or enabled,
> and increasing/decreasing the number of RX URBs (RTL8152_MAX_RX)
> doesn't seem to affect it.
>
..

I noticed that BIT(20) was not defined as anything for "opts3",
and so I added a line to rx_csum to check whether or not that bit
was ever set.

It triggered after a few thousand reset/reboot cycles with opts3
having the rather dubious looking value of an ASCII string: 0x5c7d7852.

So to me, it appears that the rx_desc's are getting mixed-up with data somewhere,
and when using hardware TCP checksums this doesn't get caught.  So perhaps the
checksums themselves are fine, but there's another bug (driver or hardware?)
that sneaks through when not doing software checksums.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] r8152: Fix broken RX checksums.
  2016-11-03 11:43               ` Mark Lord
@ 2016-11-04 13:50                 ` Mark Lord
  2016-11-04 20:13                   ` Mark Lord
       [not found]                 ` <201611041425.uA4EPwCw018176@rtits1.realtek.com>
  1 sibling, 1 reply; 18+ messages in thread
From: Mark Lord @ 2016-11-04 13:50 UTC (permalink / raw)
  To: Hayes Wang, David Miller; +Cc: nic_swsd, netdev, linux-kernel

Yeah, the device or driver is definitely getting confused with rx_desc structures.
I added code to check for unlikely rx_desc values, and it found this for starters:

rx_desc: 00480801 00480401 00480001 0048fc00 0048f800 0048f400 pkt_len=2045
rx_data: 00 f0 48 00 00 ec 48 00 00 e8 48 00 00 e4 48 00 00 e0 48 00 00 dc 48 00 00 d8 48 00 00 d4 48 00
rx_data: 00 d0 48 00 00 cc 48 00 00 c8 48 00 00 c4 48 00 00 c0 48 00 00 bc 48 00 00 b8 48 00 00 b4 48 00
rx_data: 00 b0 48 00 00 ac 48 00 00 01 00 00 81 ed 00 00 00 01 00 00 00 00 00 00 00 00 00 02 4d ac 00 00
rx_data: 10 00 ff ff ff ff 00 00 01 28 83 d6 ff 6d 00 20 25 b1 58 1b 68 ff 00 05 20 01 56 41 17 35 00 00
...

The MTU/MRU on this link is the standard 1500 bytes, so a pkt_len of 2045 isn't valid here.
And the rx_desc values look an awful lot like the rx_data values that follow it.

There's definitely more broken here than just TCP RX checksums.

-ml

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] r8152: Fix broken RX checksums.
  2016-11-04 13:50                 ` Mark Lord
@ 2016-11-04 20:13                   ` Mark Lord
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Lord @ 2016-11-04 20:13 UTC (permalink / raw)
  To: Hayes Wang, David Miller; +Cc: nic_swsd, netdev, linux-kernel

On 16-11-04 09:50 AM, Mark Lord wrote:
> Yeah, the device or driver is definitely getting confused with rx_desc structures.
> I added code to check for unlikely rx_desc values, and it found this for starters:
> 
> rx_desc: 00480801 00480401 00480001 0048fc00 0048f800 0048f400 pkt_len=2045
> rx_data: 00 f0 48 00 00 ec 48 00 00 e8 48 00 00 e4 48 00 00 e0 48 00 00 dc 48 00 00 d8 48 00 00 d4
> 48 00
> rx_data: 00 d0 48 00 00 cc 48 00 00 c8 48 00 00 c4 48 00 00 c0 48 00 00 bc 48 00 00 b8 48 00 00 b4
> 48 00
> rx_data: 00 b0 48 00 00 ac 48 00 00 01 00 00 81 ed 00 00 00 01 00 00 00 00 00 00 00 00 00 02 4d ac
> 00 00
> rx_data: 10 00 ff ff ff ff 00 00 01 28 83 d6 ff 6d 00 20 25 b1 58 1b 68 ff 00 05 20 01 56 41 17 35
> 00 00
> ...
> 
> The MTU/MRU on this link is the standard 1500 bytes, so a pkt_len of 2045 isn't valid here.
> And the rx_desc values look an awful lot like the rx_data values that follow it.
> 
> There's definitely more broken here than just TCP RX checksums.

I spent a bit more time on this again today, and made progress.
The issue seems to be stale rx buffers.
I'll discuss further offline with Hayes Wang.

-- 
Mark Lord
Real-Time Remedies Inc.
mlord@pobox.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH net] r8152: Fix broken RX checksums.
       [not found]                 ` <201611041425.uA4EPwCw018176@rtits1.realtek.com>
@ 2016-11-09 13:09                   ` Hayes Wang
  2016-11-09 13:19                     ` Mark Lord
  0 siblings, 1 reply; 18+ messages in thread
From: Hayes Wang @ 2016-11-09 13:09 UTC (permalink / raw)
  To: Mark Lord, David Miller; +Cc: nic_swsd, netdev, linux-kernel

Mark Lord [mailto:mlord@pobox.com]
> Sent: Friday, November 04, 2016 9:50 PM
[...]
> Yeah, the device or driver is definitely getting confused with rx_desc structures.
> I added code to check for unlikely rx_desc values, and it found this for starters:
> 
> rx_desc: 00480801 00480401 00480001 0048fc00 0048f800 0048f400
> pkt_len=2045
> rx_data: 00 f0 48 00 00 ec 48 00 00 e8 48 00 00 e4 48 00 00 e0 48 00 00 dc 48 00
> 00 d8 48 00 00 d4 48 00
> rx_data: 00 d0 48 00 00 cc 48 00 00 c8 48 00 00 c4 48 00 00 c0 48 00 00 bc 48 00
> 00 b8 48 00 00 b4 48 00
> rx_data: 00 b0 48 00 00 ac 48 00 00 01 00 00 81 ed 00 00 00 01 00 00 00 00 00 00
> 00 00 00 02 4d ac 00 00
> rx_data: 10 00 ff ff ff ff 00 00 01 28 83 d6 ff 6d 00 20 25 b1 58 1b 68 ff 00 05 20 01
> 56 41 17 35 00 00
> ...
> 
> The MTU/MRU on this link is the standard 1500 bytes, so a pkt_len of 2045 isn't
> valid here.
> And the rx_desc values look an awful lot like the rx_data values that follow it.
> 
> There's definitely more broken here than just TCP RX checksums.

I don't think it is the issue of our hw. If it happens, windows or
other OS may have problems, too. It is like the memory issue described
in commit 990c9b347245("Merge branch 'r8152-fixes'"). It seems that
the data in memory is not same with the one from the device.

Besides, I test the raspberry pi with RTL8152. However, I don't find
any checksum issue for TCP. I try to copy a large file and md5sum it
through NFS. It works fine.

Best Regards,
Hayes

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] r8152: Fix broken RX checksums.
  2016-11-09 13:09                   ` Hayes Wang
@ 2016-11-09 13:19                     ` Mark Lord
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Lord @ 2016-11-09 13:19 UTC (permalink / raw)
  To: Hayes Wang, David Miller; +Cc: nic_swsd, netdev, linux-kernel

On 16-11-09 08:09 AM, Hayes Wang wrote:
> Mark Lord [mailto:mlord@pobox.com]
..
>> The MTU/MRU on this link is the standard 1500 bytes, so a pkt_len of 2045 isn't
>> valid here.
>> And the rx_desc values look an awful lot like the rx_data values that follow it.
>>
>> There's definitely more broken here than just TCP RX checksums.
> 
> I don't think it is the issue of our hw. If it happens, windows or
> other OS may have problems, too. It is like the memory issue described
> in commit 990c9b347245("Merge branch 'r8152-fixes'"). It seems that
> the data in memory is not same with the one from the device.

I am still doing long-term testing of various tweaks to the driver,
and can now confirm that changing from kmalloc() to usb_alloc_coherent()
vastly improves reliability, and re-enabling RX checksums works fine
with that change.

However, even with coherent URB buffers, I still see the occasional bad rx_desc:
like, twice in 36 hours of continuous bashing at it.

So having code in the driver to sanitize the rx_desc is essential.
My current test code (shared with Hayes already) includes validation of various
key fields of the rx_desc, and detects when the chip/driver/whatever gets confused.

Hopefully r8152.c will get updated to take more care before trusting
what it sees in the rx_desc fields.

Cheers
-- 
Mark Lord
mlord@pobox.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-11-09 13:19 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-26 22:36 [PATCH] drivers/net/usb/r8152 fix broken rx checksums Mark Lord
2016-10-26 22:54 ` Mark Lord
2016-10-30 21:22 ` David Miller
2016-10-30 23:48   ` Paul Bolle
2016-10-30 23:28 ` [PATCH net] r8152: Fix broken RX checksums Mark Lord
2016-10-31  0:57   ` David Miller
2016-10-31  2:07     ` Mark Lord
2016-10-31  3:53       ` David Miller
2016-10-31  8:14         ` Hayes Wang
2016-10-31 13:24           ` Mark Lord
2016-11-02 18:29           ` Mark Lord
2016-11-04 12:13             ` Mark Lord
     [not found]           ` <201611030159.uA31x0np004648@rtits1.realtek.com>
2016-11-03  8:56             ` Hayes Wang
2016-11-03 11:43               ` Mark Lord
2016-11-04 13:50                 ` Mark Lord
2016-11-04 20:13                   ` Mark Lord
     [not found]                 ` <201611041425.uA4EPwCw018176@rtits1.realtek.com>
2016-11-09 13:09                   ` Hayes Wang
2016-11-09 13:19                     ` Mark Lord

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).