All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: Problems with mvneta
       [not found] <20171018223425.42ce7a74@gmx.de>
@ 2017-10-18 20:55 ` Thomas Petazzoni
  2017-10-19 22:25   ` Sven Müller
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Petazzoni @ 2017-10-18 20:55 UTC (permalink / raw)
  To: Sven Müller
  Cc: Grégory Clement, Antoine Ténart, netdev, Marcin Wojtas

Hello,

I'm adding my colleagues Grégory Clement and Antoine Ténart in Cc, as
well as Marcin Wojtas, who also worked on mvneta, and the netdev
mailing list. I'm keeping your full message below so that others can
read the context.

On Wed, 18 Oct 2017 22:34:25 +0200, Sven Müller wrote:

> I've found your email address in the kernel sources of the mvneta driver. I didn't find a bug system on free-electrons.com. And on kernel.org searching for mvneta wasn't really helpful. 

There is a bug tracker for the Linux kernel at
https://bugzilla.kernel.org/. However, I indeed wouldn't be notified of
bug reports against mvneta.

> Some people including me hacked the Zyxel NSA-326 some time ago. The whole thread you can find here: 
> 
> https://forum.doozan.com/read.php?2,27108
> 
> Until kernel 4.10.10 everything worked great. I didn't test 4.11. But any higher kernel version (tested 4.12., 4.13) causes network problems with nfs. I described it here:
> 
> https://forum.doozan.com/read.php?2,27108,37699#msg-37699
> 
> Transfering files not with full speed but over a longer period of time, e.g. playing music files over nfs or reading a lot of smaller files causes the error: 
> 
> Sep 27 17:35:37 nas kernel: rpc-srv/tcp: nfsd: sent only 36488 when sending 65644 bytes - shutting down socket
> 
> After that message the network is down. I have to reboot the device in order to get any network connectivity again. And how I wrote: 4.10.10 works perfectly. 4.12 produced a lot of this errors, 4.13 seems to be a little bit better. 
> 
> Unfortunately I didn't find a way to reproduce this problem directly. It occurs after 5 minutes up to one hour of transferring files via nfs. 
> 
> If you are interested in fixing this bug, I would like to support you with providing you any information I can find on my system and testing. 
> 
> My kernel config, which is working in 4.10.10 and producing the nfs problem in 4.12 and 4.13: 
> 
> https://paste.pound-python.org/show/RCaG9J4yBy79K3NL5F1
> 
> and the device tree: 
> 
> https://paste.pound-python.org/show/UiLpMgUERuCddHOn6Vsp/

There have been a few changes in the mvneta code between 4.10 and 4.12,
but not many of them look potentially problematic.

f95936cca6a8410ebdaf164bc5d3ade9e1de5bdb net: mvneta: Adjust six checks for null pointers
d441b688a1bce8e2e1b43d8090738c306dd09131 net: mvneta: Use kmalloc_array() in mvneta_txq_init()
5d6312ed57a909c86bb9472b2bbc012539392e7d net: mvneta: Improve two size determinations in mvneta_init()
2911063011fc7adcb43c93e9c3e9dc7798f459f5 net: mvneta: Use devm_kmalloc_array() in mvneta_init()
82960fff09bc394e2a33d5369969410699c04861 net: mvneta: fix failed to suspend if WOL is enabled
d6956ac87b5ff6841b09c273a70de86200d82019 net: mvneta: set rx mode during resume if interface is running
a38d20d791fdcd79ebccda15a8308a6d8ada6e1c net: mvneta: add RGMII_RXID and RGMII_TXID support
9768b45ceb0bc7bdee61837afad331dd6bf7977f net: mvneta: support suspend and resume
4581be42fce5e1d208cbeb8e78df3f1b4673eff7 net: mvneta: make mvneta_eth_tool_ops static
9303ab2b3402b60f6c39abfdbfa4ce00fce8bee4 net: mvneta: fix build errors when linux/phy*.h is removed from net/dsa.h
b60a00f9c5f14695991cb77dce7e926623269d88 net: mvneta: implement .set_wol and .get_wol
6ad20165d376fa07919a70e4f43dfae564601829 drivers: net: generalize napi_complete_done()
a29b6235560a1ed10c8e1a73bfc616a66b802b90 net: mvneta: add BQL support
2a90f7e1d5d04e4f1060268e0b55a2c702bbd67a net: mvneta: add xmit_more support
bc1f44709cf27fb2a5766cadafe7e2ad5e9cb221 net: make ndo_get_stats64 a void function

The only ones that really could have an impact are:

6ad20165d376fa07919a70e4f43dfae564601829 drivers: net: generalize napi_complete_done()
a29b6235560a1ed10c8e1a73bfc616a66b802b90 net: mvneta: add BQL support
2a90f7e1d5d04e4f1060268e0b55a2c702bbd67a net: mvneta: add xmit_more support

Could you try to take mvneta* from Linux 4.10, put that in Linux 4.12,
and see if you can still produce the problem? I'd like to first make
sure the problem really is inside mvneta, and not in some other place.

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Problems with mvneta
  2017-10-18 20:55 ` Problems with mvneta Thomas Petazzoni
@ 2017-10-19 22:25   ` Sven Müller
  2017-10-20  7:09     ` Thomas Petazzoni
  0 siblings, 1 reply; 22+ messages in thread
From: Sven Müller @ 2017-10-19 22:25 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Grégory Clement, Antoine Ténart, netdev, Marcin Wojtas

Hi all.

First of all I'm not familiar with kernel programming at all, so please excuse me, if I don't understand everything at the first glance. 

I did as you told me, and moved the mvneta folder from 4.10.10 to 4.13.7: 

rm -rf linux-4.13.7-gentoo/drivers/net/ethernet/marvell
mv linux-4.10.10-gentoo/drivers/net/ethernet/marvell linux-4.13.7-gentoo/drivers/net/ethernet/marvell

This approach didn't compile. So I had to change a view lines: 

--- linux-4.10.10-gentoo/drivers/net/ethernet/marvell/mvneta.c	2017-04-17 01:55:25.126007316 +0200
+++ linux-4.13.7-gentoo/drivers/net/ethernet/marvell/mvneta.c	2017-10-19 22:27:02.685114689 +0200
@@ -28,6 +28,7 @@
 #include <linux/of_mdio.h>
 #include <linux/of_net.h>
 #include <linux/phy.h>
+#include <linux/phy_fixed.h>
 #include <linux/platform_device.h>
 #include <linux/skbuff.h>
 #include <net/hwbm.h>
@@ -652,7 +653,8 @@
 }
 
 /* Get System Network Statistics */
-static struct rtnl_link_stats64 *
+//static struct rtnl_link_stats64 *
+static void
 mvneta_get_stats64(struct net_device *dev,
 		   struct rtnl_link_stats64 *stats)
 {
@@ -687,7 +689,7 @@
 
 	stats->tx_dropped	= dev->stats.tx_dropped;
 
-	return stats;
+//	return stats;
 }
 
 /* Rx descriptors helper methods */


It compiles and runs fine. After a couple of hours and testing no issues were found. 

The changes with a lot of noise:
diff -ur linux-4.13.7-gentoo.orig/drivers/net/ethernet/marvell/ linux-4.13.7-gentoo/drivers/net/ethernet/marvell/
https://paste.pound-python.org/show/GoVNQqxqr2AK6abriwFH/

diff -ur linux-4.10.10-gentoo/drivers/net/ethernet/marvell/ linux-4.13.7-gentoo/drivers/net/ethernet/marvell/
https://paste.pound-python.org/show/LFkv81qeIGQTOvFDQfTZ/

Thanks a lot
Sven



Am Wed, 18 Oct 2017 22:55:57 +0200
schrieb Thomas Petazzoni <thomas.petazzoni@free-electrons.com>:

> Hello,
> 
> I'm adding my colleagues Grégory Clement and Antoine Ténart in Cc, as
> well as Marcin Wojtas, who also worked on mvneta, and the netdev
> mailing list. I'm keeping your full message below so that others can
> read the context.
> 
> On Wed, 18 Oct 2017 22:34:25 +0200, Sven Müller wrote:
> 
> > I've found your email address in the kernel sources of the mvneta
> > driver. I didn't find a bug system on free-electrons.com. And on
> > kernel.org searching for mvneta wasn't really helpful.   
> 
> There is a bug tracker for the Linux kernel at
> https://bugzilla.kernel.org/. However, I indeed wouldn't be notified
> of bug reports against mvneta.
> 
> > Some people including me hacked the Zyxel NSA-326 some time ago.
> > The whole thread you can find here: 
> > 
> > https://forum.doozan.com/read.php?2,27108
> > 
> > Until kernel 4.10.10 everything worked great. I didn't test 4.11.
> > But any higher kernel version (tested 4.12., 4.13) causes network
> > problems with nfs. I described it here:
> > 
> > https://forum.doozan.com/read.php?2,27108,37699#msg-37699
> > 
> > Transfering files not with full speed but over a longer period of
> > time, e.g. playing music files over nfs or reading a lot of smaller
> > files causes the error: 
> > 
> > Sep 27 17:35:37 nas kernel: rpc-srv/tcp: nfsd: sent only 36488 when
> > sending 65644 bytes - shutting down socket
> > 
> > After that message the network is down. I have to reboot the device
> > in order to get any network connectivity again. And how I wrote:
> > 4.10.10 works perfectly. 4.12 produced a lot of this errors, 4.13
> > seems to be a little bit better. 
> > 
> > Unfortunately I didn't find a way to reproduce this problem
> > directly. It occurs after 5 minutes up to one hour of transferring
> > files via nfs. 
> > 
> > If you are interested in fixing this bug, I would like to support
> > you with providing you any information I can find on my system and
> > testing. 
> > 
> > My kernel config, which is working in 4.10.10 and producing the nfs
> > problem in 4.12 and 4.13: 
> > 
> > https://paste.pound-python.org/show/RCaG9J4yBy79K3NL5F1
> > 
> > and the device tree: 
> > 
> > https://paste.pound-python.org/show/UiLpMgUERuCddHOn6Vsp/  
> 
> There have been a few changes in the mvneta code between 4.10 and
> 4.12, but not many of them look potentially problematic.
> 
> f95936cca6a8410ebdaf164bc5d3ade9e1de5bdb net: mvneta: Adjust six
> checks for null pointers d441b688a1bce8e2e1b43d8090738c306dd09131
> net: mvneta: Use kmalloc_array() in mvneta_txq_init()
> 5d6312ed57a909c86bb9472b2bbc012539392e7d net: mvneta: Improve two
> size determinations in mvneta_init()
> 2911063011fc7adcb43c93e9c3e9dc7798f459f5 net: mvneta: Use
> devm_kmalloc_array() in mvneta_init()
> 82960fff09bc394e2a33d5369969410699c04861 net: mvneta: fix failed to
> suspend if WOL is enabled d6956ac87b5ff6841b09c273a70de86200d82019
> net: mvneta: set rx mode during resume if interface is running
> a38d20d791fdcd79ebccda15a8308a6d8ada6e1c net: mvneta: add RGMII_RXID
> and RGMII_TXID support 9768b45ceb0bc7bdee61837afad331dd6bf7977f net:
> mvneta: support suspend and resume
> 4581be42fce5e1d208cbeb8e78df3f1b4673eff7 net: mvneta: make
> mvneta_eth_tool_ops static 9303ab2b3402b60f6c39abfdbfa4ce00fce8bee4
> net: mvneta: fix build errors when linux/phy*.h is removed from
> net/dsa.h b60a00f9c5f14695991cb77dce7e926623269d88 net: mvneta:
> implement .set_wol and .get_wol
> 6ad20165d376fa07919a70e4f43dfae564601829 drivers: net: generalize
> napi_complete_done() a29b6235560a1ed10c8e1a73bfc616a66b802b90 net:
> mvneta: add BQL support 2a90f7e1d5d04e4f1060268e0b55a2c702bbd67a net:
> mvneta: add xmit_more support
> bc1f44709cf27fb2a5766cadafe7e2ad5e9cb221 net: make ndo_get_stats64 a
> void function
> 
> The only ones that really could have an impact are:
> 
> 6ad20165d376fa07919a70e4f43dfae564601829 drivers: net: generalize
> napi_complete_done() a29b6235560a1ed10c8e1a73bfc616a66b802b90 net:
> mvneta: add BQL support 2a90f7e1d5d04e4f1060268e0b55a2c702bbd67a net:
> mvneta: add xmit_more support
> 
> Could you try to take mvneta* from Linux 4.10, put that in Linux 4.12,
> and see if you can still produce the problem? I'd like to first make
> sure the problem really is inside mvneta, and not in some other place.
> 
> Thanks!
> 
> Thomas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Problems with mvneta
  2017-10-19 22:25   ` Sven Müller
@ 2017-10-20  7:09     ` Thomas Petazzoni
  2017-10-23  6:29       ` Andreas Tobler
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Petazzoni @ 2017-10-20  7:09 UTC (permalink / raw)
  To: Sven Müller
  Cc: Grégory Clement, Antoine Ténart, netdev, Marcin Wojtas

Hello,

On Fri, 20 Oct 2017 00:25:24 +0200, Sven Müller wrote:

> First of all I'm not familiar with kernel programming at all, so
> please excuse me, if I don't understand everything at the first
> glance. 

No problem. Your bug report and effort to nail down the issue are much
appreciated!

> It compiles and runs fine. After a couple of hours and testing no
> issues were found. 

OK, so this really hints at a regression in the mvneta driver itself.

Could you try to revert just:

  6ad20165d376fa07919a70e4f43dfae564601829
  a29b6235560a1ed10c8e1a73bfc616a66b802b90
  2a90f7e1d5d04e4f1060268e0b55a2c702bbd67a

first all of them, and then each one by one, so that we can pin-point
the commit that causes the breakage ?

I looked at all the other changes in mvneta between 4.10 and 4.12, and
I don't see how any of the other changes can cause a functional
difference. So let's focus on those 3 commits for the moment.

Assuming you're using Git, to revert a commit just do: "git revert
<commitid>".

Thanks again!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Problems with mvneta
  2017-10-20  7:09     ` Thomas Petazzoni
@ 2017-10-23  6:29       ` Andreas Tobler
  2017-10-23  9:30         ` Sven Müller
  0 siblings, 1 reply; 22+ messages in thread
From: Andreas Tobler @ 2017-10-23  6:29 UTC (permalink / raw)
  To: Thomas Petazzoni, Sven Müller
  Cc: Grégory Clement, Antoine Ténart, netdev, Marcin Wojtas

Hi all,


On 20.10.17 09:09, Thomas Petazzoni wrote:
> Hello,
> 
> On Fri, 20 Oct 2017 00:25:24 +0200, Sven Müller wrote:
> 
>> First of all I'm not familiar with kernel programming at all, so
>> please excuse me, if I don't understand everything at the first
>> glance.
> 
> No problem. Your bug report and effort to nail down the issue are much
> appreciated!
> 
>> It compiles and runs fine. After a couple of hours and testing no
>> issues were found.
> 
> OK, so this really hints at a regression in the mvneta driver itself.
> 
> Could you try to revert just:
> 
>    6ad20165d376fa07919a70e4f43dfae564601829
>    a29b6235560a1ed10c8e1a73bfc616a66b802b90
>    2a90f7e1d5d04e4f1060268e0b55a2c702bbd67a
> 
> first all of them, and then each one by one, so that we can pin-point
> the commit that causes the breakage ?
> 
> I looked at all the other changes in mvneta between 4.10 and 4.12, and
> I don't see how any of the other changes can cause a functional
> difference. So let's focus on those 3 commits for the moment.

We did also experience some issues with the mvneta driver.

I nailed it down to the BQL commit. 
(a29b6235560a1ed10c8e1a73bfc616a66b802b90 net: mvneta: add BQL support)

Here we did an upgrade from 4.10.13 to 4.13.5. Before it was stable and 
a 4.13.5 with the 4.10.13 driver was also ok.

Our scenario is the following, the board we use acts as router and 
forwards some traffic. To distribute the load to both cpu's we have, we 
enabled RPS (receive packet steering). Now as soon as we stress the 
router with iperf3 the eth links go down. The router sits between a 
client and a server where we blow load with iperf3.

If we disable RPS, the links seems stable.

Doing the iperf3 tests from/to the router directly, iperf3 client is 
started on the router, does not show any instability.

Maybe these observations help to find out what's going on.

Thx,
Andreas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Problems with mvneta
  2017-10-23  6:29       ` Andreas Tobler
@ 2017-10-23  9:30         ` Sven Müller
  2017-10-31 14:23           ` Sven Müller
  0 siblings, 1 reply; 22+ messages in thread
From: Sven Müller @ 2017-10-23  9:30 UTC (permalink / raw)
  To: Andreas Tobler
  Cc: Thomas Petazzoni, Grégory Clement, Antoine Ténart,
	netdev, Marcin Wojtas

I've tested a lot on weekend, but the results are still contradictory and therefore not reliable enough.

Kernel 4.13.7 with driver of that version: 

Reverted all 3 patches: 6ad2, a29b, 2a90: works fine. No issues.
Applied only 6ad2: Got the nfs socket shutdown error on the first test, but yesterday it has been worked perfectly for a lot of hours. 
Applied only a29b: No issues
Applied only 2a90: No issues. 
Applied a29b and 2a90: No issues. 

The main problem is to create a reproducible crash scenario. 

Regards
Sven

Am Mon, 23 Oct 2017 08:29:33 +0200
schrieb Andreas Tobler <andreas.tobler@cloudguard.ch>:

> Hi all,
> 
> 
> We did also experience some issues with the mvneta driver.
> 
> I nailed it down to the BQL commit. 
> (a29b6235560a1ed10c8e1a73bfc616a66b802b90 net: mvneta: add BQL
> support)
> 
> Here we did an upgrade from 4.10.13 to 4.13.5. Before it was stable
> and a 4.13.5 with the 4.10.13 driver was also ok.
> 
> Our scenario is the following, the board we use acts as router and 
> forwards some traffic. To distribute the load to both cpu's we have,
> we enabled RPS (receive packet steering). Now as soon as we stress
> the router with iperf3 the eth links go down. The router sits between
> a client and a server where we blow load with iperf3.
> 
> If we disable RPS, the links seems stable.
> 
> Doing the iperf3 tests from/to the router directly, iperf3 client is 
> started on the router, does not show any instability.
> 
> Maybe these observations help to find out what's going on.
> 
> Thx,
> Andreas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Problems with mvneta
  2017-10-23  9:30         ` Sven Müller
@ 2017-10-31 14:23           ` Sven Müller
  2017-10-31 14:27             ` Thomas Petazzoni
  2017-11-10 10:22             ` [PATCH] " Sven Müller
  0 siblings, 2 replies; 22+ messages in thread
From: Sven Müller @ 2017-10-31 14:23 UTC (permalink / raw)
  To: Andreas Tobler
  Cc: Thomas Petazzoni, Grégory Clement, Antoine Ténart,
	netdev, Marcin Wojtas

After quite a long time of trying to reproduce the issue without any success I got 3 network crashes today. And all errors occurred with a kernel including the patch: 

2a90f7e1d5d04e4f1060268e0b55a2c702bbd67a

At least according to Andreas' and my problems we can exclude the 6ad2 patch as the source of the errors. 

Thanks and regards
Sven

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Problems with mvneta
  2017-10-31 14:23           ` Sven Müller
@ 2017-10-31 14:27             ` Thomas Petazzoni
  2017-10-31 17:09               ` Simon Guinot
  2017-11-10 10:22             ` [PATCH] " Sven Müller
  1 sibling, 1 reply; 22+ messages in thread
From: Thomas Petazzoni @ 2017-10-31 14:27 UTC (permalink / raw)
  To: Sven Müller
  Cc: Andreas Tobler, Grégory Clement, Antoine Ténart,
	netdev, Marcin Wojtas, Simon Guinot

Hello,

Let's add Simon Guinot in the loop.

On Tue, 31 Oct 2017 15:23:22 +0100, Sven Müller wrote:
> After quite a long time of trying to reproduce the issue without any success I got 3 network crashes today. And all errors occurred with a kernel including the patch: 
> 
> 2a90f7e1d5d04e4f1060268e0b55a2c702bbd67a
> 
> At least according to Andreas' and my problems we can exclude the 6ad2 patch as the source of the errors. 

Simon, 2a90f7e1d5d04e4f1060268e0b55a2c702bbd67a is your commit, adding
xmit_more support, and a number of people are reporting stability
issues with this patch applied.

Do you think you will have some time to look into this ?

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Problems with mvneta
  2017-10-31 14:27             ` Thomas Petazzoni
@ 2017-10-31 17:09               ` Simon Guinot
  2017-10-31 20:23                 ` Thomas Petazzoni
  0 siblings, 1 reply; 22+ messages in thread
From: Simon Guinot @ 2017-10-31 17:09 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Sven Müller, Andreas Tobler, Grégory Clement,
	Antoine Ténart, netdev, Marcin Wojtas

[-- Attachment #1: Type: text/plain, Size: 1353 bytes --]

On Tue, Oct 31, 2017 at 03:27:40PM +0100, Thomas Petazzoni wrote:
> Hello,

Hi Thomas,

> 
> Let's add Simon Guinot in the loop.
> 
> On Tue, 31 Oct 2017 15:23:22 +0100, Sven Müller wrote:
> > After quite a long time of trying to reproduce the issue without any success I got 3 network crashes today. And all errors occurred with a kernel including the patch: 
> > 
> > 2a90f7e1d5d04e4f1060268e0b55a2c702bbd67a
> > 
> > At least according to Andreas' and my problems we can exclude the 6ad2 patch as the source of the errors. 
> 
> Simon, 2a90f7e1d5d04e4f1060268e0b55a2c702bbd67a is your commit, adding
> xmit_more support, and a number of people are reporting stability
> issues with this patch applied.

I wrote an earlier version of this patch. But I think this commit has
been modified by the submitter Marcin Wojtas because I don't remember
anything about the maximum number of descriptors allowed to be flush.

> 
> Do you think you will have some time to look into this ?

No I don't have time to look into that.

But after a quick look, I wonder what is happening if
"txq->pending + frags > MVNETA_TXQ_DEC_SENT_MASK" ? Because IIUC
mvneta_txq_pend_desc_add() is called anyway. And according to the
comment inside the function, it assumes there is less than 255
descriptors to send... It looks suspect.

Simon

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Problems with mvneta
  2017-10-31 17:09               ` Simon Guinot
@ 2017-10-31 20:23                 ` Thomas Petazzoni
  2017-11-01  8:10                   ` Marcin Wojtas
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Petazzoni @ 2017-10-31 20:23 UTC (permalink / raw)
  To: Simon Guinot
  Cc: Sven Müller, Andreas Tobler, Grégory Clement,
	Antoine Ténart, netdev, Marcin Wojtas

Hello,

On Tue, 31 Oct 2017 18:09:38 +0100, Simon Guinot wrote:

> > On Tue, 31 Oct 2017 15:23:22 +0100, Sven Müller wrote:  
> > > After quite a long time of trying to reproduce the issue without any success I got 3 network crashes today. And all errors occurred with a kernel including the patch: 
> > > 
> > > 2a90f7e1d5d04e4f1060268e0b55a2c702bbd67a
> > > 
> > > At least according to Andreas' and my problems we can exclude the 6ad2 patch as the source of the errors.   
> > 
> > Simon, 2a90f7e1d5d04e4f1060268e0b55a2c702bbd67a is your commit, adding
> > xmit_more support, and a number of people are reporting stability
> > issues with this patch applied.  
> 
> I wrote an earlier version of this patch. But I think this commit has
> been modified by the submitter Marcin Wojtas because I don't remember
> anything about the maximum number of descriptors allowed to be flush.
> 
> > 
> > Do you think you will have some time to look into this ?  
> 
> No I don't have time to look into that.
> 
> But after a quick look, I wonder what is happening if
> "txq->pending + frags > MVNETA_TXQ_DEC_SENT_MASK" ? Because IIUC
> mvneta_txq_pend_desc_add() is called anyway. And according to the
> comment inside the function, it assumes there is less than 255
> descriptors to send... It looks suspect.

Thanks for the feedback. Marcin, do you remember this xmit_more patch?

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Problems with mvneta
  2017-10-31 20:23                 ` Thomas Petazzoni
@ 2017-11-01  8:10                   ` Marcin Wojtas
  2017-11-08 16:58                     ` [PATCH] net: mvneta: fix handling of the Tx descriptor counter Simon Guinot
  0 siblings, 1 reply; 22+ messages in thread
From: Marcin Wojtas @ 2017-11-01  8:10 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Simon Guinot, Sven Müller, Andreas Tobler,
	Grégory Clement, Antoine Ténart, netdev

Hi Thomas,

2017-10-31 21:23 GMT+01:00 Thomas Petazzoni
<thomas.petazzoni@free-electrons.com>:
> Hello,
>
> On Tue, 31 Oct 2017 18:09:38 +0100, Simon Guinot wrote:
>
>> > On Tue, 31 Oct 2017 15:23:22 +0100, Sven Müller wrote:
>> > > After quite a long time of trying to reproduce the issue without any success I got 3 network crashes today. And all errors occurred with a kernel including the patch:
>> > >
>> > > 2a90f7e1d5d04e4f1060268e0b55a2c702bbd67a
>> > >
>> > > At least according to Andreas' and my problems we can exclude the 6ad2 patch as the source of the errors.
>> >
>> > Simon, 2a90f7e1d5d04e4f1060268e0b55a2c702bbd67a is your commit, adding
>> > xmit_more support, and a number of people are reporting stability
>> > issues with this patch applied.
>>
>> I wrote an earlier version of this patch. But I think this commit has
>> been modified by the submitter Marcin Wojtas because I don't remember
>> anything about the maximum number of descriptors allowed to be flush.
>>
>> >
>> > Do you think you will have some time to look into this ?
>>
>> No I don't have time to look into that.
>>
>> But after a quick look, I wonder what is happening if
>> "txq->pending + frags > MVNETA_TXQ_DEC_SENT_MASK" ? Because IIUC
>> mvneta_txq_pend_desc_add() is called anyway. And according to the
>> comment inside the function, it assumes there is less than 255
>> descriptors to send... It looks suspect.
>
> Thanks for the feedback. Marcin, do you remember this xmit_more patch?
>

Yes I do. It seems pretty simple and didn't show any issues durin very
long stress tests. I will check the mvneta_tx() routine if there's
anything suspicios/missed.

Marcin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] net: mvneta: fix handling of the Tx descriptor counter
  2017-11-01  8:10                   ` Marcin Wojtas
@ 2017-11-08 16:58                     ` Simon Guinot
  2017-11-08 17:03                       ` David Laight
                                         ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Simon Guinot @ 2017-11-08 16:58 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: netdev, Sven Müller, Andreas Tobler, Grégory Clement,
	Antoine Ténart, Marcin Wojtas, Simon Guinot, stable

The mvneta controller provides a 8-bit register to update the pending
Tx descriptor counter. Then, a maximum of 255 Tx descriptors can be
added at once. In the current code the mvneta_txq_pend_desc_add function
assumes the caller takes care of this limit. But it is not the case. In
some situations (xmit_more flag), more than 255 descriptors are added.
When this happens, the Tx descriptor counter register is updated with a
wrong value, which breaks the whole Tx queue management.

This patch fixes the issue by allowing the mvneta_txq_pend_desc_add
function to process more than 255 Tx descriptors.

Fixes: 2a90f7e1d5d0 ("net: mvneta: add xmit_more support")
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 64a04975bcf8..027c08ce4e5d 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -816,11 +816,14 @@ static void mvneta_txq_pend_desc_add(struct mvneta_port *pp,
 {
 	u32 val;
 
-	/* Only 255 descriptors can be added at once ; Assume caller
-	 * process TX desriptors in quanta less than 256
-	 */
-	val = pend_desc + txq->pending;
-	mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
+	pend_desc += txq->pending;
+
+	/* Only 255 Tx descriptors can be added at once */
+	while (pend_desc > 0) {
+		val = min(pend_desc, 255);
+		mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
+		pend_desc -= val;
+	}
 	txq->pending = 0;
 }
 
@@ -2413,8 +2416,7 @@ static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
 		if (txq->count >= txq->tx_stop_threshold)
 			netif_tx_stop_queue(nq);
 
-		if (!skb->xmit_more || netif_xmit_stopped(nq) ||
-		    txq->pending + frags > MVNETA_TXQ_DEC_SENT_MASK)
+		if (!skb->xmit_more || netif_xmit_stopped(nq))
 			mvneta_txq_pend_desc_add(pp, txq, frags);
 		else
 			txq->pending += frags;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* RE: [PATCH] net: mvneta: fix handling of the Tx descriptor counter
  2017-11-08 16:58                     ` [PATCH] net: mvneta: fix handling of the Tx descriptor counter Simon Guinot
@ 2017-11-08 17:03                       ` David Laight
  2017-11-08 17:17                       ` Simon Guinot
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: David Laight @ 2017-11-08 17:03 UTC (permalink / raw)
  To: 'Simon Guinot', Thomas Petazzoni
  Cc: netdev, Sven Müller, Andreas Tobler, Grégory Clement,
	Antoine Ténart, Marcin Wojtas, stable

From: Simon Guinot
> Sent: 08 November 2017 16:59
> 
> The mvneta controller provides a 8-bit register to update the pending
> Tx descriptor counter. Then, a maximum of 255 Tx descriptors can be
> added at once. In the current code the mvneta_txq_pend_desc_add function
> assumes the caller takes care of this limit. But it is not the case. In
> some situations (xmit_more flag), more than 255 descriptors are added.
> When this happens, the Tx descriptor counter register is updated with a
> wrong value, which breaks the whole Tx queue management.

Even with xmit_more it is usually best to limit the number
(in order to reduce tx latency).

OTOH there are probably paths where it 'just happens'.

> This patch fixes the issue by allowing the mvneta_txq_pend_desc_add
> function to process more than 255 Tx descriptors.
> 
> Fixes: 2a90f7e1d5d0 ("net: mvneta: add xmit_more support")
> Cc: stable@vger.kernel.org # 4.11+
> Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
> ---
>  drivers/net/ethernet/marvell/mvneta.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 64a04975bcf8..027c08ce4e5d 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -816,11 +816,14 @@ static void mvneta_txq_pend_desc_add(struct mvneta_port *pp,
>  {
>  	u32 val;
> 
> -	/* Only 255 descriptors can be added at once ; Assume caller
> -	 * process TX desriptors in quanta less than 256
> -	 */
> -	val = pend_desc + txq->pending;
> -	mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
> +	pend_desc += txq->pending;
> +
> +	/* Only 255 Tx descriptors can be added at once */
> +	while (pend_desc > 0) {
> +		val = min(pend_desc, 255);
> +		mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
> +		pend_desc -= val;
> +	}

do { ... } while () ??

	David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: mvneta: fix handling of the Tx descriptor counter
  2017-11-08 16:58                     ` [PATCH] net: mvneta: fix handling of the Tx descriptor counter Simon Guinot
  2017-11-08 17:03                       ` David Laight
@ 2017-11-08 17:17                       ` Simon Guinot
  2017-11-09 19:19                         ` Andreas Tobler
  2017-11-11  9:45                       ` David Miller
  2017-11-13 15:27                       ` [PATCH v2] " Simon Guinot
  3 siblings, 1 reply; 22+ messages in thread
From: Simon Guinot @ 2017-11-08 17:17 UTC (permalink / raw)
  To: Sven Müller, Andreas Tobler
  Cc: netdev, Thomas Petazzoni, Grégory Clement,
	Antoine Ténart, Marcin Wojtas, stable

[-- Attachment #1: Type: text/plain, Size: 2366 bytes --]

Hi Sven and Andreas,

Please, can you try with this patch ?

Thanks.

Simon

On Wed, Nov 08, 2017 at 05:58:35PM +0100, Simon Guinot wrote:
> The mvneta controller provides a 8-bit register to update the pending
> Tx descriptor counter. Then, a maximum of 255 Tx descriptors can be
> added at once. In the current code the mvneta_txq_pend_desc_add function
> assumes the caller takes care of this limit. But it is not the case. In
> some situations (xmit_more flag), more than 255 descriptors are added.
> When this happens, the Tx descriptor counter register is updated with a
> wrong value, which breaks the whole Tx queue management.
> 
> This patch fixes the issue by allowing the mvneta_txq_pend_desc_add
> function to process more than 255 Tx descriptors.
> 
> Fixes: 2a90f7e1d5d0 ("net: mvneta: add xmit_more support")
> Cc: stable@vger.kernel.org # 4.11+
> Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
> ---
>  drivers/net/ethernet/marvell/mvneta.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 64a04975bcf8..027c08ce4e5d 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -816,11 +816,14 @@ static void mvneta_txq_pend_desc_add(struct mvneta_port *pp,
>  {
>  	u32 val;
>  
> -	/* Only 255 descriptors can be added at once ; Assume caller
> -	 * process TX desriptors in quanta less than 256
> -	 */
> -	val = pend_desc + txq->pending;
> -	mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
> +	pend_desc += txq->pending;
> +
> +	/* Only 255 Tx descriptors can be added at once */
> +	while (pend_desc > 0) {
> +		val = min(pend_desc, 255);
> +		mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
> +		pend_desc -= val;
> +	}
>  	txq->pending = 0;
>  }
>  
> @@ -2413,8 +2416,7 @@ static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
>  		if (txq->count >= txq->tx_stop_threshold)
>  			netif_tx_stop_queue(nq);
>  
> -		if (!skb->xmit_more || netif_xmit_stopped(nq) ||
> -		    txq->pending + frags > MVNETA_TXQ_DEC_SENT_MASK)
> +		if (!skb->xmit_more || netif_xmit_stopped(nq))
>  			mvneta_txq_pend_desc_add(pp, txq, frags);
>  		else
>  			txq->pending += frags;
> -- 
> 2.9.3

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: mvneta: fix handling of the Tx descriptor counter
  2017-11-08 17:17                       ` Simon Guinot
@ 2017-11-09 19:19                         ` Andreas Tobler
  0 siblings, 0 replies; 22+ messages in thread
From: Andreas Tobler @ 2017-11-09 19:19 UTC (permalink / raw)
  To: Simon Guinot, Sven Müller
  Cc: netdev, Thomas Petazzoni, Grégory Clement,
	Antoine Ténart, Marcin Wojtas, stable

Hi Simon,


On 08.11.17 18:17, Simon Guinot wrote:
> Hi Sven and Andreas,
> 
> Please, can you try with this patch ?

Today we, my son and I, repeated the failing scenario and we were able 
to show that our scenario behaves stable after you patch being applied.

Thanks for taking care of this issue. If you need further testing let me 
know.

Regards,
Andreas

> On Wed, Nov 08, 2017 at 05:58:35PM +0100, Simon Guinot wrote:
>> The mvneta controller provides a 8-bit register to update the pending
>> Tx descriptor counter. Then, a maximum of 255 Tx descriptors can be
>> added at once. In the current code the mvneta_txq_pend_desc_add function
>> assumes the caller takes care of this limit. But it is not the case. In
>> some situations (xmit_more flag), more than 255 descriptors are added.
>> When this happens, the Tx descriptor counter register is updated with a
>> wrong value, which breaks the whole Tx queue management.
>>
>> This patch fixes the issue by allowing the mvneta_txq_pend_desc_add
>> function to process more than 255 Tx descriptors.
>>
>> Fixes: 2a90f7e1d5d0 ("net: mvneta: add xmit_more support")
>> Cc: stable@vger.kernel.org # 4.11+
>> Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
>> ---
>>   drivers/net/ethernet/marvell/mvneta.c | 16 +++++++++-------
>>   1 file changed, 9 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
>> index 64a04975bcf8..027c08ce4e5d 100644
>> --- a/drivers/net/ethernet/marvell/mvneta.c
>> +++ b/drivers/net/ethernet/marvell/mvneta.c
>> @@ -816,11 +816,14 @@ static void mvneta_txq_pend_desc_add(struct mvneta_port *pp,
>>   {
>>   	u32 val;
>>   
>> -	/* Only 255 descriptors can be added at once ; Assume caller
>> -	 * process TX desriptors in quanta less than 256
>> -	 */
>> -	val = pend_desc + txq->pending;
>> -	mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
>> +	pend_desc += txq->pending;
>> +
>> +	/* Only 255 Tx descriptors can be added at once */
>> +	while (pend_desc > 0) {
>> +		val = min(pend_desc, 255);
>> +		mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
>> +		pend_desc -= val;
>> +	}
>>   	txq->pending = 0;
>>   }
>>   
>> @@ -2413,8 +2416,7 @@ static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
>>   		if (txq->count >= txq->tx_stop_threshold)
>>   			netif_tx_stop_queue(nq);
>>   
>> -		if (!skb->xmit_more || netif_xmit_stopped(nq) ||
>> -		    txq->pending + frags > MVNETA_TXQ_DEC_SENT_MASK)
>> +		if (!skb->xmit_more || netif_xmit_stopped(nq))
>>   			mvneta_txq_pend_desc_add(pp, txq, frags);
>>   		else
>>   			txq->pending += frags;
>> -- 
>> 2.9.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: mvneta: fix handling of the Tx descriptor counter
  2017-10-31 14:23           ` Sven Müller
  2017-10-31 14:27             ` Thomas Petazzoni
@ 2017-11-10 10:22             ` Sven Müller
  1 sibling, 0 replies; 22+ messages in thread
From: Sven Müller @ 2017-11-10 10:22 UTC (permalink / raw)
  Cc: Thomas Petazzoni, Grégory Clement, Antoine Ténart,
	netdev, Marcin Wojtas


Hi Simon, 

Until now it seems to work. No issues so far.

Regards and Thanks
Sven

Am Thu, 9 Nov 2017 20:19:42 +0100
schrieb Andreas Tobler <andreas.tobler@cloudguard.ch>:

> Hi Simon,
> 
> 
> On 08.11.17 18:17, Simon Guinot wrote:  
> > Hi Sven and Andreas,
> > 
> > Please, can you try with this patch ?    
> 
> Today we, my son and I, repeated the failing scenario and we were
> able to show that our scenario behaves stable after you patch being
> applied.
> 
> Thanks for taking care of this issue. If you need further testing let
> me know.
> 
> Regards,
> Andreas
>   
> > On Wed, Nov 08, 2017 at 05:58:35PM +0100, Simon Guinot wrote:    
> >> The mvneta controller provides a 8-bit register to update the
> >> pending Tx descriptor counter. Then, a maximum of 255 Tx
> >> descriptors can be added at once. In the current code the
> >> mvneta_txq_pend_desc_add function assumes the caller takes care of
> >> this limit. But it is not the case. In some situations (xmit_more
> >> flag), more than 255 descriptors are added. When this happens, the
> >> Tx descriptor counter register is updated with a wrong value,
> >> which breaks the whole Tx queue management.
> >>
> >> This patch fixes the issue by allowing the mvneta_txq_pend_desc_add
> >> function to process more than 255 Tx descriptors.
> >>
> >> Fixes: 2a90f7e1d5d0 ("net: mvneta: add xmit_more support")
> >> Cc: stable@vger.kernel.org # 4.11+
> >> Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
> >> ---
> >>   drivers/net/ethernet/marvell/mvneta.c | 16 +++++++++-------
> >>   1 file changed, 9 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/marvell/mvneta.c
> >> b/drivers/net/ethernet/marvell/mvneta.c index
> >> 64a04975bcf8..027c08ce4e5d 100644 ---
> >> a/drivers/net/ethernet/marvell/mvneta.c +++
> >> b/drivers/net/ethernet/marvell/mvneta.c @@ -816,11 +816,14 @@
> >> static void mvneta_txq_pend_desc_add(struct mvneta_port *pp, {
> >>   	u32 val;
> >>   
> >> -	/* Only 255 descriptors can be added at once ; Assume
> >> caller
> >> -	 * process TX desriptors in quanta less than 256
> >> -	 */
> >> -	val = pend_desc + txq->pending;
> >> -	mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
> >> +	pend_desc += txq->pending;
> >> +
> >> +	/* Only 255 Tx descriptors can be added at once */
> >> +	while (pend_desc > 0) {
> >> +		val = min(pend_desc, 255);
> >> +		mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id),
> >> val);
> >> +		pend_desc -= val;
> >> +	}
> >>   	txq->pending = 0;
> >>   }
> >>   
> >> @@ -2413,8 +2416,7 @@ static int mvneta_tx(struct sk_buff *skb,
> >> struct net_device *dev) if (txq->count >= txq->tx_stop_threshold)
> >>   			netif_tx_stop_queue(nq);
> >>   
> >> -		if (!skb->xmit_more || netif_xmit_stopped(nq) ||
> >> -		    txq->pending + frags >
> >> MVNETA_TXQ_DEC_SENT_MASK)
> >> +		if (!skb->xmit_more || netif_xmit_stopped(nq))
> >>   			mvneta_txq_pend_desc_add(pp, txq, frags);
> >>   		else
> >>   			txq->pending += frags;
> >> -- 
> >> 2.9.3    

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: mvneta: fix handling of the Tx descriptor counter
  2017-11-08 16:58                     ` [PATCH] net: mvneta: fix handling of the Tx descriptor counter Simon Guinot
  2017-11-08 17:03                       ` David Laight
  2017-11-08 17:17                       ` Simon Guinot
@ 2017-11-11  9:45                       ` David Miller
  2017-11-13 14:51                         ` Simon Guinot
  2017-11-13 15:27                       ` [PATCH v2] " Simon Guinot
  3 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2017-11-11  9:45 UTC (permalink / raw)
  To: simon.guinot
  Cc: thomas.petazzoni, netdev, musv, andreas.tobler, gregory.clement,
	antoine.tenart, mw, stable

From: Simon Guinot <simon.guinot@sequanux.org>
Date: Wed,  8 Nov 2017 17:58:35 +0100

> @@ -2413,8 +2416,7 @@ static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
>  		if (txq->count >= txq->tx_stop_threshold)
>  			netif_tx_stop_queue(nq);
>  
> -		if (!skb->xmit_more || netif_xmit_stopped(nq) ||
> -		    txq->pending + frags > MVNETA_TXQ_DEC_SENT_MASK)
> +		if (!skb->xmit_more || netif_xmit_stopped(nq))
>  			mvneta_txq_pend_desc_add(pp, txq, frags);
>  		else
>  			txq->pending += frags;

As David Laight said, you should not allow unlimited amounts of
->xmit_more frames to be queued without a TX doorbell update.

Therefore, please keep some kind of limit here otherwise latency
will spike in some circumstances.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: mvneta: fix handling of the Tx descriptor counter
  2017-11-11  9:45                       ` David Miller
@ 2017-11-13 14:51                         ` Simon Guinot
  2017-11-13 14:54                           ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Simon Guinot @ 2017-11-13 14:51 UTC (permalink / raw)
  To: David Miller
  Cc: thomas.petazzoni, netdev, musv, andreas.tobler, gregory.clement,
	antoine.tenart, mw, stable

[-- Attachment #1: Type: text/plain, Size: 1081 bytes --]

On Sat, Nov 11, 2017 at 06:45:04PM +0900, David Miller wrote:
> From: Simon Guinot <simon.guinot@sequanux.org>
> Date: Wed,  8 Nov 2017 17:58:35 +0100
> 
> > @@ -2413,8 +2416,7 @@ static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
> >  		if (txq->count >= txq->tx_stop_threshold)
> >  			netif_tx_stop_queue(nq);
> >  
> > -		if (!skb->xmit_more || netif_xmit_stopped(nq) ||
> > -		    txq->pending + frags > MVNETA_TXQ_DEC_SENT_MASK)
> > +		if (!skb->xmit_more || netif_xmit_stopped(nq))
> >  			mvneta_txq_pend_desc_add(pp, txq, frags);
> >  		else
> >  			txq->pending += frags;
> 
> As David Laight said, you should not allow unlimited amounts of
> ->xmit_more frames to be queued without a TX doorbell update.
> 
> Therefore, please keep some kind of limit here otherwise latency
> will spike in some circumstances.

Hi David,

IIUC the driver stops the queue if a threshold of 316 Tx descriptors is
reached (default and worst value).

Is that a "good enough" kind of limit ? Or do you want me to keep the
condition above ?

Simon

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: mvneta: fix handling of the Tx descriptor counter
  2017-11-13 14:51                         ` Simon Guinot
@ 2017-11-13 14:54                           ` David Miller
  2017-11-13 15:36                             ` Simon Guinot
  0 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2017-11-13 14:54 UTC (permalink / raw)
  To: simon.guinot
  Cc: thomas.petazzoni, netdev, musv, andreas.tobler, gregory.clement,
	antoine.tenart, mw, stable

From: Simon Guinot <simon.guinot@sequanux.org>
Date: Mon, 13 Nov 2017 15:51:15 +0100

> IIUC the driver stops the queue if a threshold of 316 Tx descriptors is
> reached (default and worst value).

That's a lot of latency.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2] net: mvneta: fix handling of the Tx descriptor counter
  2017-11-08 16:58                     ` [PATCH] net: mvneta: fix handling of the Tx descriptor counter Simon Guinot
                                         ` (2 preceding siblings ...)
  2017-11-11  9:45                       ` David Miller
@ 2017-11-13 15:27                       ` Simon Guinot
  2017-11-14 12:53                         ` David Miller
  3 siblings, 1 reply; 22+ messages in thread
From: Simon Guinot @ 2017-11-13 15:27 UTC (permalink / raw)
  To: thomas.petazzoni
  Cc: netdev, musv, andreas.tobler, gregory.clement, antoine.tenart,
	mw, stable, David Miller, David Laight, Simon Guinot

The mvneta controller provides a 8-bit register to update the pending
Tx descriptor counter. Then, a maximum of 255 Tx descriptors can be
added at once. In the current code the mvneta_txq_pend_desc_add function
assumes the caller takes care of this limit. But it is not the case. In
some situations (xmit_more flag), more than 255 descriptors are added.
When this happens, the Tx descriptor counter register is updated with a
wrong value, which breaks the whole Tx queue management.

This patch fixes the issue by allowing the mvneta_txq_pend_desc_add
function to process more than 255 Tx descriptors.

Fixes: 2a90f7e1d5d0 ("net: mvneta: add xmit_more support")
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Changes for v2:

- Use do {} while instead of while {}.
- Keep "txq->pending + frags > MVNETA_TXQ_DEC_SENT_MASK" as a flushing
  condition for the Tx queue.

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 64a04975bcf8..bc93b69cfd1e 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -816,11 +816,14 @@ static void mvneta_txq_pend_desc_add(struct mvneta_port *pp,
 {
 	u32 val;
 
-	/* Only 255 descriptors can be added at once ; Assume caller
-	 * process TX desriptors in quanta less than 256
-	 */
-	val = pend_desc + txq->pending;
-	mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
+	pend_desc += txq->pending;
+
+	/* Only 255 Tx descriptors can be added at once */
+	do {
+		val = min(pend_desc, 255);
+		mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
+		pend_desc -= val;
+	} while (pend_desc > 0);
 	txq->pending = 0;
 }
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: mvneta: fix handling of the Tx descriptor counter
  2017-11-13 14:54                           ` David Miller
@ 2017-11-13 15:36                             ` Simon Guinot
  2017-11-20 14:58                               ` David Laight
  0 siblings, 1 reply; 22+ messages in thread
From: Simon Guinot @ 2017-11-13 15:36 UTC (permalink / raw)
  To: David Miller
  Cc: thomas.petazzoni, netdev, musv, andreas.tobler, gregory.clement,
	antoine.tenart, mw, stable

[-- Attachment #1: Type: text/plain, Size: 607 bytes --]

On Mon, Nov 13, 2017 at 11:54:14PM +0900, David Miller wrote:
> From: Simon Guinot <simon.guinot@sequanux.org>
> Date: Mon, 13 Nov 2017 15:51:15 +0100
> 
> > IIUC the driver stops the queue if a threshold of 316 Tx descriptors is
> > reached (default and worst value).
> 
> That's a lot of latency.

OK, then I'll keep the "tx_pending > 255" flushing condition. But note
there is no other software mechanism to limit the Tx latency inside the
mvneta driver. Should we add something ? And is that not rather the job
of the network stack to keep track of the latency and to limit the txq
size ?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] net: mvneta: fix handling of the Tx descriptor counter
  2017-11-13 15:27                       ` [PATCH v2] " Simon Guinot
@ 2017-11-14 12:53                         ` David Miller
  0 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2017-11-14 12:53 UTC (permalink / raw)
  To: simon.guinot
  Cc: thomas.petazzoni, netdev, musv, andreas.tobler, gregory.clement,
	antoine.tenart, mw, stable, David.Laight

From: Simon Guinot <simon.guinot@sequanux.org>
Date: Mon, 13 Nov 2017 16:27:02 +0100

> The mvneta controller provides a 8-bit register to update the pending
> Tx descriptor counter. Then, a maximum of 255 Tx descriptors can be
> added at once. In the current code the mvneta_txq_pend_desc_add function
> assumes the caller takes care of this limit. But it is not the case. In
> some situations (xmit_more flag), more than 255 descriptors are added.
> When this happens, the Tx descriptor counter register is updated with a
> wrong value, which breaks the whole Tx queue management.
> 
> This patch fixes the issue by allowing the mvneta_txq_pend_desc_add
> function to process more than 255 Tx descriptors.
> 
> Fixes: 2a90f7e1d5d0 ("net: mvneta: add xmit_more support")
> Cc: stable@vger.kernel.org # 4.11+
> Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>

Applied.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH] net: mvneta: fix handling of the Tx descriptor counter
  2017-11-13 15:36                             ` Simon Guinot
@ 2017-11-20 14:58                               ` David Laight
  0 siblings, 0 replies; 22+ messages in thread
From: David Laight @ 2017-11-20 14:58 UTC (permalink / raw)
  To: 'Simon Guinot', David Miller
  Cc: thomas.petazzoni, netdev, musv, andreas.tobler, gregory.clement,
	antoine.tenart, mw, stable

From: Simon Guinot
> Sent: 13 November 2017 15:36
> To: David Miller
> Cc: thomas.petazzoni@free-electrons.com; netdev@vger.kernel.org; musv@gmx.de;
> andreas.tobler@cloudguard.ch; gregory.clement@free-electrons.com; antoine.tenart@free-electrons.com;
> mw@semihalf.com; stable@vger.kernel.org
> Subject: Re: [PATCH] net: mvneta: fix handling of the Tx descriptor counter
> 
> On Mon, Nov 13, 2017 at 11:54:14PM +0900, David Miller wrote:
> > From: Simon Guinot <simon.guinot@sequanux.org>
> > Date: Mon, 13 Nov 2017 15:51:15 +0100
> >
> > > IIUC the driver stops the queue if a threshold of 316 Tx descriptors is
> > > reached (default and worst value).
> >
> > That's a lot of latency.
> 
> OK, then I'll keep the "tx_pending > 255" flushing condition. But note
> there is no other software mechanism to limit the Tx latency inside the
> mvneta driver. Should we add something ? And is that not rather the job
> of the network stack to keep track of the latency and to limit the txq
> size ?

This is 'first packet transmit latency'.

If the 'doorbell write' is just a PCIe write then, on most systems,
that is cheap and pipelined/posted.
I'd almost be surprised if you see any 'improvement' from not doing
it every packet.

The overall tx queue size is a different issue - usually needs
limiting by BQL if TSO is done.

	David

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-11-20 14:58 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20171018223425.42ce7a74@gmx.de>
2017-10-18 20:55 ` Problems with mvneta Thomas Petazzoni
2017-10-19 22:25   ` Sven Müller
2017-10-20  7:09     ` Thomas Petazzoni
2017-10-23  6:29       ` Andreas Tobler
2017-10-23  9:30         ` Sven Müller
2017-10-31 14:23           ` Sven Müller
2017-10-31 14:27             ` Thomas Petazzoni
2017-10-31 17:09               ` Simon Guinot
2017-10-31 20:23                 ` Thomas Petazzoni
2017-11-01  8:10                   ` Marcin Wojtas
2017-11-08 16:58                     ` [PATCH] net: mvneta: fix handling of the Tx descriptor counter Simon Guinot
2017-11-08 17:03                       ` David Laight
2017-11-08 17:17                       ` Simon Guinot
2017-11-09 19:19                         ` Andreas Tobler
2017-11-11  9:45                       ` David Miller
2017-11-13 14:51                         ` Simon Guinot
2017-11-13 14:54                           ` David Miller
2017-11-13 15:36                             ` Simon Guinot
2017-11-20 14:58                               ` David Laight
2017-11-13 15:27                       ` [PATCH v2] " Simon Guinot
2017-11-14 12:53                         ` David Miller
2017-11-10 10:22             ` [PATCH] " Sven Müller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.