* net: mv643xx: interface does not transmit after some time
@ 2016-02-06 18:24 Thomas Schlöter
  2016-02-06 18:34 ` Andrew Lunn
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Schlöter @ 2016-02-06 18:24 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

I am using mainline kernel 4.4 on a QNAP TS-412 (Marvell Kirkwood Feroceon 6281). The mv643xx network interface loses its connection after anything from a few minutes to some hours. Only taking the interface down and bringing it up again helps.

Now I have a serial console for debugging. When I ping the dead interface and run "watch ifconfig eth0" on the serial console, I can see the RX packet count going up while the TX packet count stays unchanged. The log files only show messages from some services reporting a timeout, or "if_sendrawpacket: No buffer space available".
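The "watch ifconfig eth0" check can be automated by sampling the kernel's per-interface counters and flagging the moment RX keeps climbing while TX stays frozen. This is a minimal sketch, assuming the standard sysfs counters under /sys/class/net/<iface>/statistics/; the interface name, polling interval and window size are illustrative choices, not values from the report.

```python
import time
from pathlib import Path


def read_counters(iface: str) -> tuple[int, int]:
    """Return the cumulative (rx_packets, tx_packets) counters from sysfs."""
    stats = Path("/sys/class/net") / iface / "statistics"
    rx = int((stats / "rx_packets").read_text())
    tx = int((stats / "tx_packets").read_text())
    return rx, tx


def tx_stalled(samples: list[tuple[int, int]]) -> bool:
    """True when RX advanced over the window but TX never moved."""
    if len(samples) < 2:
        return False
    first_rx, first_tx = samples[0]
    rx_advanced = samples[-1][0] > first_rx
    tx_frozen = all(tx == first_tx for _, tx in samples)
    return rx_advanced and tx_frozen


def watch(iface: str = "eth0", interval: float = 2.0, window: int = 5) -> None:
    """Poll forever, printing a warning whenever the TX side looks stuck."""
    samples: list[tuple[int, int]] = []
    while True:
        samples.append(read_counters(iface))
        samples = samples[-window:]  # keep only the most recent samples
        if tx_stalled(samples):
            print(f"{iface}: RX still increasing, TX frozen at {samples[-1][1]} packets")
        time.sleep(interval)
```

Calling watch("eth0") on the serial console while pinging the box should print a warning once the interface dies. On an idle link both counters can legitimately stay flat, so the check is only meaningful while traffic is being sent to the NAS.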

How can I figure out what is wrong?

Regards,
Thomas

* net: mv643xx: interface does not transmit after some time
  2016-02-06 18:24 net: mv643xx: interface does not transmit after some time Thomas Schlöter
@ 2016-02-06 18:34 ` Andrew Lunn
  2016-02-06 23:19   ` Martin Michlmayr
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Lunn @ 2016-02-06 18:34 UTC (permalink / raw)
  To: linux-arm-kernel

> How can I figure out what is wrong?

Hi Thomas

Can you find an easy way to reproduce it? iperf, or scp a big file,
etc.

If you have a simple and reliable way to reproduce it, I would suggest
doing a git bisect to find out which change broke it.
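git bisect does not test every commit between the known-good and known-bad points; it performs a binary search, so a range of N commits needs only about log2(N) test builds. The sketch below models that search, with a hypothetical is_bad callback standing in for "build this commit, boot it, and run the reproducer". Like git bisect on a linear history, it assumes the first entry is good, the last is bad, and that the bug, once introduced, stays.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in an ordered, linear history.

    Assumes commits[0] is known good and commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):  # in practice: build, boot, try to reproduce
            bad = mid
        else:
            good = mid
    return commits[bad]
```

With real kernels the equivalent loop is "git bisect start <bad> <good>" followed by "git bisect good" or "git bisect bad" after each test. A reproducer that fails only after hours of traffic makes every step slow, which is why a quick, reliable trigger matters.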

      Andrew

* net: mv643xx: interface does not transmit after some time
  2016-02-06 18:34 ` Andrew Lunn
@ 2016-02-06 23:19   ` Martin Michlmayr
  2016-02-07 16:15     ` Adam Baker
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Martin Michlmayr @ 2016-02-06 23:19 UTC (permalink / raw)
  To: linux-arm-kernel

* Andrew Lunn <andrew@lunn.ch> [2016-02-06 19:34]:
> Can you find an easy way to reproduce it? iperf, or scp a big file
> etc.

FWIW, we had a similar bug report in Debian recently:
https://lists.debian.org/debian-arm/2016/01/msg00098.html

Turning off TCP RX and TX offloading makes it go away:
https://lists.debian.org/debian-arm/2016/01/msg00100.html

Hopefully Thomas can do a bisect.

-- 
Martin Michlmayr
http://www.cyrius.com/

* net: mv643xx: interface does not transmit after some time
  2016-02-06 23:19   ` Martin Michlmayr
@ 2016-02-07 16:15     ` Adam Baker
  2016-02-07 18:04       ` Andrew Lunn
       [not found]     ` <2ACB3A0B-DD51-43C1-A56E-E7C175645554@schloeter.net>
  2016-02-07 21:11     ` Thomas Schlöter
  2 siblings, 1 reply; 15+ messages in thread
From: Adam Baker @ 2016-02-07 16:15 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/02/16 23:19, Martin Michlmayr wrote:
> * Andrew Lunn <andrew@lunn.ch> [2016-02-06 19:34]:
>> Can you find an easy way to reproduce it? iperf, or scp a big file
>> etc.
>
> FWIW, we had a similar bug report in Debian recently:
> https://lists.debian.org/debian-arm/2016/01/msg00098.html
>
> Turning off TCP RX and TX offloading makes it go away:
> https://lists.debian.org/debian-arm/2016/01/msg00100.html
>
> Hopefully Thomas can do a bisect.
>

I observed a similar issue when performing a large NFSv4 transfer with a 
Kirkwood-based server. Unfortunately I'd just jumped from 3.14 to 4.4, so
I could do with a better starting point to bisect from.

It failed several times while transferring a 4GB file over NFS but is 
stable in normal use.

Regards

Adam

* net: mv643xx: interface does not transmit after some time
  2016-02-07 16:15     ` Adam Baker
@ 2016-02-07 18:04       ` Andrew Lunn
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Lunn @ 2016-02-07 18:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Feb 07, 2016 at 04:15:36PM +0000, Adam Baker wrote:
> On 06/02/16 23:19, Martin Michlmayr wrote:
> >* Andrew Lunn <andrew@lunn.ch> [2016-02-06 19:34]:
> >>Can you find an easy way to reproduce it? iperf, or scp a big file
> >>etc.
> >
> >FWIW, we had a similar bug report in Debian recently:
> >https://lists.debian.org/debian-arm/2016/01/msg00098.html
> >
> >Turning off TCP RX and TX offloading makes it go away:
> >https://lists.debian.org/debian-arm/2016/01/msg00100.html
> >
> >Hopefully Thomas can do a bisect.
> >
> 
> I observed a similar issue when performing a large NFSv4 transfer
> with a Kirkwood-based server. Unfortunately I'd just jumped from
> 3.14 to 4.4 so I could do with a better starting point to bisect
> from.
> 
> It failed several times while transferring a 4GB file over NFS but
> is stable in normal use.

I tried iperf with 6 GBytes, both RX and TX, and it worked. This is with
4.5-rc2 net-next.

I will try again with different window sizes, etc., to see if I can
trigger it.
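When iperf is not at hand, one long TCP stream with a chosen socket send-buffer size can act as a crude reproducer. This is only a sketch: the host, port and sizes are hypothetical, something on the other end must accept and discard the data, and SO_SNDBUF bounds only the sender's buffer (the kernel may adjust the requested value), so it approximates iperf's window-size option rather than replacing it.

```python
import socket


def stream_bytes(host: str, port: int, total: int, sndbuf: int,
                 chunk: int = 64 * 1024) -> int:
    """Push `total` zero bytes over one TCP connection and return the count sent."""
    payload = bytes(chunk)
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Set the send buffer before connecting so it applies from the start.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
        s.connect((host, port))
        while sent < total:
            n = min(chunk, total - sent)
            s.sendall(payload[:n])
            sent += n
    return sent
```

A sweep such as `for size in (16384, 65536, 262144): stream_bytes("192.0.2.10", 5001, 6 * 2**30, size)` (hypothetical address and port) would cover several buffer sizes in a single run.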

	Andrew

* net: mv643xx: interface does not transmit after some time
       [not found]     ` <2ACB3A0B-DD51-43C1-A56E-E7C175645554@schloeter.net>
@ 2016-02-07 20:35       ` Andrew Lunn
  2016-02-07 21:07         ` Thomas Schlöter
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Lunn @ 2016-02-07 20:35 UTC (permalink / raw)
  To: linux-arm-kernel

> > FWIW, we had a similar bug report in Debian recently:
> > https://lists.debian.org/debian-arm/2016/01/msg00098.html

Hi Thomas

In this thread, Ian Campbell mentions a patch. Please could you try
that patch and see if it fixes your problem.

Thanks
	Andrew

* net: mv643xx: interface does not transmit after some time
  2016-02-07 20:35       ` Andrew Lunn
@ 2016-02-07 21:07         ` Thomas Schlöter
  2016-02-08 18:49           ` Thomas Schlöter
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Schlöter @ 2016-02-07 21:07 UTC (permalink / raw)
  To: linux-arm-kernel

Am 07.02.2016 um 21:35 schrieb Andrew Lunn <andrew@lunn.ch>:
> 
>>> FWIW, we had a similar bug report in Debian recently:
>>> https://lists.debian.org/debian-arm/2016/01/msg00098.html
> 
> Hi Thomas
> 
> In this thread, Ian Campbell mentions a patch. Please could you try
> that patch and see if it fixes your problem.
> 
> Thanks
> 	Andrew

Hi Andrew,

I just applied the patch and the NAS is now running it. I'll try to crash it tonight and keep you informed whether it worked.

Thanks
	Thomas

* net: mv643xx: interface does not transmit after some time
  2016-02-06 23:19   ` Martin Michlmayr
  2016-02-07 16:15     ` Adam Baker
       [not found]     ` <2ACB3A0B-DD51-43C1-A56E-E7C175645554@schloeter.net>
@ 2016-02-07 21:11     ` Thomas Schlöter
  2 siblings, 0 replies; 15+ messages in thread
From: Thomas Schlöter @ 2016-02-07 21:11 UTC (permalink / raw)
  To: linux-arm-kernel

Am 07.02.2016 um 00:19 schrieb Martin Michlmayr <tbm@cyrius.com>:
> 
> * Andrew Lunn <andrew@lunn.ch> [2016-02-06 19:34]:
>> Can you find an easy way to reproduce it? iperf, or scp a big file
>> etc.
> 
> FWIW, we had a similar bug report in Debian recently:
> https://lists.debian.org/debian-arm/2016/01/msg00098.html
> 
> Turning off TCP RX and TX offloading makes it go away:
> https://lists.debian.org/debian-arm/2016/01/msg00100.html

Sounds interesting. I will try that.

> Hopefully Thomas can do a bisect.

I have never used that feature before. As far as I understand, I have to mark the last known good revision as good and the most recent one as bad. Then I test every version in between that has changed related code. Right?

Actually, I don't have a last known good version, as I had the original QNAP firmware installed before.

4.3, which I installed on my first try, was bad, and 4.4 did not fix it. What can I do now? Should I provide my complete kernel config file?
At the moment I am trying Ian Campbell's patch as suggested by Andrew.

Thomas

@Martin: Sorry for reposting to your address; I did not manage to send plain-text mail from my iPhone.

* net: mv643xx: interface does not transmit after some time
  2016-02-07 21:07         ` Thomas Schlöter
@ 2016-02-08 18:49           ` Thomas Schlöter
  2016-02-10 18:40             ` Thomas Schlöter
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Schlöter @ 2016-02-08 18:49 UTC (permalink / raw)
  To: linux-arm-kernel


> Am 07.02.2016 um 22:07 schrieb Thomas Schlöter <thomas@schloeter.net>:
> 
> Am 07.02.2016 um 21:35 schrieb Andrew Lunn <andrew@lunn.ch>:
>> 
>>>> FWIW, we had a similar bug report in Debian recently:
>>>> https://lists.debian.org/debian-arm/2016/01/msg00098.html
>> 
>> Hi Thomas
>> 
>> In this thread, Ian Campbell mentions a patch. Please could you try
>> that patch and see if it fixes your problem.
>> 
>> Thanks
>> 	Andrew
> 
> Hi Andrew,
> 
> I just applied the patch and the NAS is now running it. I'll try to crash it tonight and keep you informed whether it worked.
> 
> Thanks
> 	Thomas

Hi Andrew,

the patch did not fix the problem. After 1.2 GiB RX and 950 MiB TX, the interface crashed again.

Now I switched off RX/TX offload just to make sure we are talking about the same problem. If we are, the interface should be stable without offload, right?

	Thomas

* net: mv643xx: interface does not transmit after some time
  2016-02-08 18:49           ` Thomas Schlöter
@ 2016-02-10 18:40             ` Thomas Schlöter
  2016-02-10 22:57               ` Andrew Lunn
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Schlöter @ 2016-02-10 18:40 UTC (permalink / raw)
  To: linux-arm-kernel


> Am 08.02.2016 um 19:49 schrieb Thomas Schlöter <thomas@schloeter.net>:
> 
> 
>> Am 07.02.2016 um 22:07 schrieb Thomas Schlöter <thomas@schloeter.net>:
>> 
>> Am 07.02.2016 um 21:35 schrieb Andrew Lunn <andrew@lunn.ch>:
>>> 
>>>>> FWIW, we had a similar bug report in Debian recently:
>>>>> https://lists.debian.org/debian-arm/2016/01/msg00098.html
>>> 
>>> Hi Thomas
>>> 
>>> In this thread, Ian Campbell mentions a patch. Please could you try
>>> that patch and see if it fixes your problem.
>>> 
>>> Thanks
>>> 	Andrew
>> 
>> Hi Andrew,
>> 
>> I just applied the patch and the NAS is now running it. I'll try to crash it tonight and keep you informed whether it worked.
>> 
>> Thanks
>> 	Thomas
> 
> Hi Andrew,
> 
> the patch did not fix the problem. After 1.2 GiB RX and 950 MiB TX, the interface crashed again.
> 
> Now I switched off RX/TX offload just to make sure we are talking about the same problem. If we are, the interface should be stable without offload, right?
> 
> 	Thomas

Okay, so I have installed ethtool and switched off all offload features available. Now the NAS is running rock solid for two days. I backed up my Mac using Time Machine / netatalk (450 GiB transferred) and some Linux machines via NFS (100 GiB total) without a problem.
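To narrow things down, it helps to script the offload toggles so they can be switched off together and then re-enabled in groups. The sketch below only wraps the "ethtool -K <iface> <feature> on|off" command line; the feature list is an assumption (the thread does not say exactly which features were disabled), and "ethtool -k eth0" shows which ones the mv643xx_eth driver actually allows to be changed.

```python
import subprocess

# Assumed feature set: standard ethtool short names. Which of these
# mv643xx_eth exposes as changeable should be checked with `ethtool -k`.
OFFLOADS = ("rx", "tx", "sg", "tso", "gso", "gro")


def offload_cmd(iface: str, enable: bool, features=OFFLOADS) -> list[str]:
    """Build an `ethtool -K` command line that turns `features` on or off."""
    state = "on" if enable else "off"
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, state]
    return cmd


def set_offloads(iface: str, enable: bool, features=OFFLOADS) -> None:
    """Run ethtool (needs root); raises CalledProcessError on a non-zero exit."""
    subprocess.run(offload_cmd(iface, enable, features), check=True)
```

For example, after set_offloads("eth0", False), calling set_offloads("eth0", True, ("tx", "sg", "tso")) brings back only the TSO path and its prerequisites. As far as I know, the kernel drops TSO on its own while scatter-gather or TX checksumming is off, which is why those two are re-enabled together with it.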

How much code is used for mv643xx offload functionality?
Is it possible to debug things in the driver and figure out what happens during the crash?
Is the hardware offload interface proprietary or reverse engineered or is it a well known API that can be analyzed?

	Thomas

* net: mv643xx: interface does not transmit after some time
  2016-02-10 18:40             ` Thomas Schlöter
@ 2016-02-10 22:57               ` Andrew Lunn
  2016-02-11 14:38                   ` Ezequiel Garcia
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Lunn @ 2016-02-10 22:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 10, 2016 at 07:40:54PM +0100, Thomas Schlöter wrote:
> 
> > Am 08.02.2016 um 19:49 schrieb Thomas Schlöter <thomas@schloeter.net>:
> > 
> > 
> >> Am 07.02.2016 um 22:07 schrieb Thomas Schlöter <thomas@schloeter.net>:
> >> 
> >> Am 07.02.2016 um 21:35 schrieb Andrew Lunn <andrew@lunn.ch>:
> >>> 
> >>>>> FWIW, we had a similar bug report in Debian recently:
> >>>>> https://lists.debian.org/debian-arm/2016/01/msg00098.html
> >>> 
> >>> Hi Thomas
> >>> 
> >>> In this thread, Ian Campbell mentions a patch. Please could you try
> >>> that patch and see if it fixes your problem.
> >>> 
> >>> Thanks
> >>> 	Andrew
> >> 
> >> Hi Andrew,
> >> 
> >> I just applied the patch and the NAS is now running it. I'll try to crash it tonight and keep you informed whether it worked.
> >> 
> >> Thanks
> >> 	Thomas
> > 
> > Hi Andrew,
> > 
> > the patch did not fix the problem. After 1.2 GiB RX and 950 MiB TX, the interface crashed again.
> > 
> > Now I switched off RX/TX offload just to make sure we are talking about the same problem. If we are, the interface should be stable without offload, right?
> > 
> > 	Thomas
> 
> Okay, so I have installed ethtool and switched off all offload features available. Now the NAS is running rock solid for two days. I backed up my Mac using Time Machine / netatalk (450 GiB transferred) and some Linux machines via NFS (100 GiB total) without a problem.
> 
> How much code is used for mv643xx offload functionality?
> Is it possible to debug things in the driver and figure out what happens during the crash?
> Is the hardware offload interface proprietary or reverse engineered or is it a well known API that can be analyzed?

Hi Thomas

Ezequiel Garcia probably knows this part of the driver and hardware
the best...

    Andrew

* Re: net: mv643xx: interface does not transmit after some time
  2016-02-10 22:57               ` Andrew Lunn
@ 2016-02-11 14:38                   ` Ezequiel Garcia
  0 siblings, 0 replies; 15+ messages in thread
From: Ezequiel Garcia @ 2016-02-11 14:38 UTC (permalink / raw)
  To: Andrew Lunn, Thomas Schlöter
  Cc: Martin Michlmayr, Linux ARM Kernel, philipp, Karl Beldan, netdev,
	Thomas Petazzoni

(let's expand the Cc a bit)

On 10 February 2016 at 19:57, Andrew Lunn <andrew@lunn.ch> wrote:
> On Wed, Feb 10, 2016 at 07:40:54PM +0100, Thomas Schlöter wrote:
>>
>> > Am 08.02.2016 um 19:49 schrieb Thomas Schlöter <thomas@schloeter.net>:
>> >
>> >
>> >> Am 07.02.2016 um 22:07 schrieb Thomas Schlöter <thomas@schloeter.net>:
>> >>
>> >> Am 07.02.2016 um 21:35 schrieb Andrew Lunn <andrew@lunn.ch>:
>> >>>
>> >>>>> FWIW, we had a similar bug report in Debian recently:
>> >>>>> https://lists.debian.org/debian-arm/2016/01/msg00098.html
>> >>>
>> >>> Hi Thomas
>> >>>
>> >>> In this thread, Ian Campbell mentions a patch. Please could you try
>> >>> that patch and see if it fixes your problem.
>> >>>
>> >>> Thanks
>> >>>   Andrew
>> >>
>> >> Hi Andrew,
>> >>
>> >> I just applied the patch and the NAS is now running it. I'll try to crash it tonight and keep you informed whether it worked.
>> >>
>> >> Thanks
>> >>    Thomas
>> >
>> > Hi Andrew,
>> >
>> > the patch did not fix the problem. After 1.2 GiB RX and 950 MiB TX, the interface crashed again.
>> >
>> > Now I switched off RX/TX offload just to make sure we are talking about the same problem. If we are, the interface should be stable without offload, right?
>> >
>> >     Thomas
>>
>> Okay, so I have installed ethtool and switched off all offload features available. Now the NAS is running rock solid for two days. I backed up my Mac using Time Machine / netatalk (450 GiB transferred) and some Linux machines via NFS (100 GiB total) without a problem.
>>
>> How much code is used for mv643xx offload functionality?
>> Is it possible to debug things in the driver and figure out what happens during the crash?
>> Is the hardware offload interface proprietary or reverse engineered or is it a well known API that can be analyzed?
>
> Hi Thomas
>
> Ezequiel Garcia probably knows this part of the driver and hardware
> the best...
>

The TCP segmentation offload (TSO) implemented in this driver is
mostly a software thing.

I'm CCing Karl and Philipp, who have fixed subtle issues in the TSO
path, and may be able to help figure this one out.
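To make "mostly a software thing" concrete: with software TSO, the driver receives one oversized TCP packet and itself cuts the payload into MSS-sized pieces, writing a fresh copy of the headers for each piece with the sequence number advanced by the bytes already sent and, for IPv4, the IP ID incremented. The sketch below models only that bookkeeping; it illustrates the idea and is not the mv643xx_eth code, and the 1448-byte MSS in the usage note is just a typical Ethernet value.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int      # TCP sequence number of this segment's first payload byte
    ip_id: int    # IPv4 identification field written into this header copy
    length: int   # payload bytes carried by this segment
    last: bool    # only the final segment keeps the original PSH/FIN flags


def tso_segments(payload_len: int, mss: int, seq: int, ip_id: int) -> list[Segment]:
    """Split one large TCP payload into MSS-sized segments, software-TSO style."""
    segments = []
    offset = 0
    while offset < payload_len:
        length = min(mss, payload_len - offset)
        segments.append(Segment(
            seq=(seq + offset) % 2**32,             # sequence numbers wrap at 2^32
            ip_id=(ip_id + len(segments)) % 2**16,  # 16-bit field, wraps too
            length=length,
            last=offset + length == payload_len,
        ))
        offset += length
    return segments
```

For instance, tso_segments(4000, 1448, 1000, 7) yields three segments of 1448, 1448 and 1104 bytes starting at sequence numbers 1000, 2448 and 3896, with IP IDs 7, 8 and 9.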

-- 
Ezequiel García, VanguardiaSur
www.vanguardiasur.com.ar

* Re: net: mv643xx: interface does not transmit after some time
  2016-02-11 14:38                   ` Ezequiel Garcia
@ 2016-02-27 20:06                     ` Adam Baker
  -1 siblings, 0 replies; 15+ messages in thread
From: Adam Baker @ 2016-02-27 20:06 UTC (permalink / raw)
  To: Ezequiel Garcia, Andrew Lunn, Thomas Schlöter
  Cc: Thomas Petazzoni, Karl Beldan, netdev, philipp, Martin Michlmayr,
	Linux ARM Kernel

On 11/02/16 14:38, Ezequiel Garcia wrote:
> (let's expand the Cc a bit)
>
> On 10 February 2016 at 19:57, Andrew Lunn <andrew@lunn.ch> wrote:
>> On Wed, Feb 10, 2016 at 07:40:54PM +0100, Thomas Schlöter wrote:
>>>
>>>> Am 08.02.2016 um 19:49 schrieb Thomas Schlöter <thomas@schloeter.net>:
>>>>
>>>>
>>>>> Am 07.02.2016 um 22:07 schrieb Thomas Schlöter <thomas@schloeter.net>:
>>>>>
>>>>> Am 07.02.2016 um 21:35 schrieb Andrew Lunn <andrew@lunn.ch>:
>>>>>>
>>>>>>>> FWIW, we had a similar bug report in Debian recently:
>>>>>>>> https://lists.debian.org/debian-arm/2016/01/msg00098.html
>>>>>>
>>>>>> Hi Thomas
>>>>>>
>>>>>> In this thread, Ian Campbell mentions a patch. Please could you try
>>>>>> that patch and see if it fixes your problem.
>>>>>>
>>>>>> Thanks
>>>>>>    Andrew
>>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> I just applied the patch and the NAS is now running it. I'll try to crash it tonight and keep you informed whether it worked.
>>>>>
>>>>> Thanks
>>>>>     Thomas
>>>>
>>>> Hi Andrew,
>>>>
>>>> the patch did not fix the problem. After 1.2 GiB RX and 950 MiB TX, the interface crashed again.
>>>>
>>>> Now I switched off RX/TX offload just to make sure we are talking about the same problem. If we are, the interface should be stable without offload, right?
>>>>
>>>>      Thomas
>>>
>>> Okay, so I have installed ethtool and switched off all offload features available. Now the NAS is running rock solid for two days. I backed up my Mac using Time Machine / netatalk (450 GiB transferred) and some Linux machines via NFS (100 GiB total) without a problem.
>>>
>>> How much code is used for mv643xx offload functionality?
>>> Is it possible to debug things in the driver and figure out what happens during the crash?
>>> Is the hardware offload interface proprietary or reverse engineered or is it a well known API that can be analyzed?
>>
>> Hi Thomas
>>
>> Ezequiel Garcia probably knows this part of the driver and hardware
>> the best...
>>
>
> The TCP segmentation offload (TSO) implemented in this driver is
> mostly a software thing.
>
> I'm CCing Karl and Philipp, who have fixed subtle issues in the TSO
> path, and may be able to help figure this one out.
>

Hi,

Had this issue occur again today. In my case it seems to be triggered by 
large NFSv4 transfers.

I'm running 4.4 plus Nicolas Schichan's patch at
https://patchwork.ozlabs.org/patch/573334/

There is a thread at http://forum.doozan.com/read.php?2,17404 suggesting
that this has been broken since at least 3.16.

I first spotted the issue when upgrading from 3.11 to 4.4.

Looking at 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/drivers/net/ethernet/marvell/mv643xx_eth.c 
I see 2014-05-22 as the date TSO support was first added, which is
shortly before the merge window opened for 3.16. I'm therefore guessing
that TSO has been problematic since its introduction.

Regards

Adam



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

end of thread, other threads:[~2016-02-27 20:06 UTC | newest]

Thread overview: 15+ messages
2016-02-06 18:24 net: mv643xx: interface does not transmit after some time Thomas Schlöter
2016-02-06 18:34 ` Andrew Lunn
2016-02-06 23:19   ` Martin Michlmayr
2016-02-07 16:15     ` Adam Baker
2016-02-07 18:04       ` Andrew Lunn
     [not found]     ` <2ACB3A0B-DD51-43C1-A56E-E7C175645554@schloeter.net>
2016-02-07 20:35       ` Andrew Lunn
2016-02-07 21:07         ` Thomas Schlöter
2016-02-08 18:49           ` Thomas Schlöter
2016-02-10 18:40             ` Thomas Schlöter
2016-02-10 22:57               ` Andrew Lunn
2016-02-11 14:38                 ` Ezequiel Garcia
2016-02-11 14:38                   ` Ezequiel Garcia
2016-02-27 20:06                   ` Adam Baker
2016-02-27 20:06                     ` Adam Baker
2016-02-07 21:11     ` Thomas Schlöter