Netdev Archive on lore.kernel.org
* [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load
@ 2020-06-25  8:57 Tobias Waldekranz
  2020-06-25 19:19 ` David Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Tobias Waldekranz @ 2020-06-25  8:57 UTC (permalink / raw)
  To: davem; +Cc: netdev, fugang.duan

In the ISR, we poll the event register for the queues in need of
service and then enter polled mode. After this point, the event
register will never be read again until we exit polled mode.

In a scenario where a UDP flow is routed back out through the same
interface, i.e. "router-on-a-stick", we'll typically only see an rx
queue event initially. Once we start to process the incoming flow
we'll be locked in polled mode, but we'll never clean the tx rings since
that event is never caught.

Eventually the netdev watchdog will trip, causing all buffers to be
dropped and then the process starts over again.

By adding a poll of the active events at each NAPI call, we avoid the
starvation.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 9f80a33c5b16..328fb12ef8db 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1616,8 +1616,17 @@ fec_enet_rx(struct net_device *ndev, int budget)
 }
 
 static bool
-fec_enet_collect_events(struct fec_enet_private *fep, uint int_events)
+fec_enet_collect_events(struct fec_enet_private *fep)
 {
+	uint int_events;
+
+	int_events = readl(fep->hwp + FEC_IEVENT);
+
+	/* Don't clear MDIO events, we poll for those */
+	int_events &= ~FEC_ENET_MII;
+
+	writel(int_events, fep->hwp + FEC_IEVENT);
+
 	if (int_events == 0)
 		return false;
 
@@ -1643,16 +1652,9 @@ fec_enet_interrupt(int irq, void *dev_id)
 {
 	struct net_device *ndev = dev_id;
 	struct fec_enet_private *fep = netdev_priv(ndev);
-	uint int_events;
 	irqreturn_t ret = IRQ_NONE;
 
-	int_events = readl(fep->hwp + FEC_IEVENT);
-
-	/* Don't clear MDIO events, we poll for those */
-	int_events &= ~FEC_ENET_MII;
-
-	writel(int_events, fep->hwp + FEC_IEVENT);
-	fec_enet_collect_events(fep, int_events);
+	fec_enet_collect_events(fep);
 
 	if ((fep->work_tx || fep->work_rx) && fep->link) {
 		ret = IRQ_HANDLED;
@@ -1673,6 +1675,8 @@ static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
 	struct fec_enet_private *fep = netdev_priv(ndev);
 	int pkts;
 
+	fec_enet_collect_events(fep);
+
 	pkts = fec_enet_rx(ndev, budget);
 
 	fec_enet_tx(ndev);
-- 
2.17.1
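The starvation described in the commit message can be reproduced in a toy model. What follows is a minimal C sketch, not driver code. `struct fec_sim`, `sim_collect_events()`, `sim_run()`, the `EV_RX`/`EV_TX` bits and the 64-entry tx ring are hypothetical stand-ins that loosely mirror `fec_enet_collect_events()`, `fec_enet_rx_napi()` and `fec_enet_tx()`. The model assumes that every received frame is forwarded back out of the same port and that each transmitted frame completes one tick after it was queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the FEC RXF/TXF event bits. */
#define EV_RX 0x1u
#define EV_TX 0x2u
#define TX_RING_SIZE 64 /* hypothetical tx ring size */

struct fec_sim {
	unsigned int ievent; /* latched "hardware" events, until acked */
	unsigned int work;   /* driver's record of rings needing service */
	bool irq_enabled;    /* false while NAPI owns the device */
	bool napi_scheduled;
	int rx_backlog;      /* frames waiting in the rx ring */
	int tx_queued;       /* tx frames handed to the "MAC" */
	int tx_done;         /* tx frames sent but not yet reclaimed */
};

/* Loosely modeled on fec_enet_collect_events(): read, ack, record. */
static void sim_collect_events(struct fec_sim *s)
{
	s->work |= s->ievent;
	s->ievent = 0;
}

/*
 * Forwards rx_per_tick frames per tick back out of the same port.
 * Returns the number of occupied tx descriptors after `ticks` ticks.
 */
static int sim_run(bool collect_in_napi, int rx_per_tick, int budget,
		   int ticks)
{
	struct fec_sim s = { .irq_enabled = true };

	assert(budget > 0);
	for (int t = 0; t < ticks; t++) {
		/* "Hardware": last tick's tx frames complete, rx arrives. */
		s.tx_done += s.tx_queued;
		s.tx_queued = 0;
		if (s.tx_done > 0)
			s.ievent |= EV_TX;
		s.rx_backlog += rx_per_tick;
		if (rx_per_tick > 0)
			s.ievent |= EV_RX;

		/* ISR: can only run while interrupts are unmasked. */
		if (s.irq_enabled && s.ievent) {
			sim_collect_events(&s);
			s.irq_enabled = false; /* mask irqs, enter polled mode */
			s.napi_scheduled = true;
		}
		if (!s.napi_scheduled)
			continue;

		/* NAPI poll. The patch adds this re-read of the events. */
		if (collect_in_napi)
			sim_collect_events(&s);

		int pkts = 0;
		if (s.work & EV_RX) {
			pkts = s.rx_backlog < budget ? s.rx_backlog : budget;
			s.rx_backlog -= pkts;
			if (pkts < budget)
				s.work &= ~EV_RX;
			/* Forward what fits in the tx ring; drop the rest. */
			int room = TX_RING_SIZE - s.tx_queued - s.tx_done;
			s.tx_queued += pkts < room ? pkts : room;
		}
		/* Like fec_enet_tx(): only clean a ring whose bit is set. */
		if (s.work & EV_TX) {
			s.work &= ~EV_TX;
			s.tx_done = 0;
		}
		assert(s.tx_queued + s.tx_done <= TX_RING_SIZE);

		/* Under budget: napi_complete_done() and unmask irqs. */
		if (pkts < budget) {
			s.napi_scheduled = false;
			s.irq_enabled = true;
		}
	}
	return s.tx_queued + s.tx_done;
}
```

Under sustained load (`rx_per_tick == budget`), `sim_run(false, 16, 16, 100)` ends with all 64 descriptors occupied, because the tx event latched after the first interrupt is never read. `sim_run(true, 16, 16, 100)` re-reads the events at the start of every poll, as the patch does, and settles at 16 occupied descriptors.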


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load
  2020-06-25  8:57 [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load Tobias Waldekranz
@ 2020-06-25 19:19 ` David Miller
  2020-06-28  6:23   ` [EXT] " Andy Duan
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2020-06-25 19:19 UTC (permalink / raw)
  To: tobias; +Cc: netdev, fugang.duan

From: Tobias Waldekranz <tobias@waldekranz.com>
Date: Thu, 25 Jun 2020 10:57:28 +0200

> In the ISR, we poll the event register for the queues in need of
> service and then enter polled mode. After this point, the event
> register will never be read again until we exit polled mode.
> 
> In a scenario where a UDP flow is routed back out through the same
> interface, i.e. "router-on-a-stick", we'll typically only see an rx
> queue event initially. Once we start to process the incoming flow
> we'll be locked in polled mode, but we'll never clean the tx rings since
> that event is never caught.
> 
> Eventually the netdev watchdog will trip, causing all buffers to be
> dropped and then the process starts over again.
> 
> By adding a poll of the active events at each NAPI call, we avoid the
> starvation.
> 
> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>

You're losing events, which is a bug.  Therefore this is a bug fix
which should be submitted to 'net' and an appropriate "Fixes: "
tag must be added to your commit message.

Thank you.


* RE: [EXT] Re: [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load
  2020-06-25 19:19 ` David Miller
@ 2020-06-28  6:23   ` Andy Duan
  2020-06-29 16:29     ` Tobias Waldekranz
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Duan @ 2020-06-28  6:23 UTC (permalink / raw)
  To: David Miller, tobias; +Cc: netdev

From: David Miller <davem@davemloft.net> Sent: Friday, June 26, 2020 3:20 AM
> You're losing events, which is a bug.  Therefore this is a bug fix which should
> be submitted to 'net' and an appropriate "Fixes: "
> tag must be added to your commit message.
> 
> Thank you.

I have never seen a bandwidth test cause the netdev watchdog to trip.
Can you describe the steps to reproduce in the commit message, so that
we can reproduce it locally? Thanks.

That said, the logic seems fine.


* RE: [EXT] Re: [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load
  2020-06-28  6:23   ` [EXT] " Andy Duan
@ 2020-06-29 16:29     ` Tobias Waldekranz
  2020-06-30  6:27       ` Andy Duan
  0 siblings, 1 reply; 14+ messages in thread
From: Tobias Waldekranz @ 2020-06-29 16:29 UTC (permalink / raw)
  To: Andy Duan, David Miller; +Cc: netdev

On Sun Jun 28, 2020 at 8:23 AM CEST, Andy Duan wrote:
> I have never seen a bandwidth test cause the netdev watchdog to trip.
> Can you describe the steps to reproduce in the commit message, so that
> we can reproduce it locally? Thanks.

My setup uses an i.MX8M Nano EVK connected to an Ethernet switch, but
I can get the same results with a direct connection to a PC.

On the iMX, configure two VLANs on top of the FEC and enable IPv4
forwarding.

On the PC, configure two VLANs and put them in different
namespaces. From one namespace, use trafgen to generate a flow that
the iMX will route from the first VLAN to the second and then back
towards the second namespace on the PC.

Something like:

    {
        eth(sa=PC_MAC, da=IMX_MAC),
        ipv4(saddr=10.0.2.2, daddr=10.0.3.2, ttl=2),
        udp(sp=1, dp=2),
        "Hello world"
    }

Wait a couple of seconds and then you'll see the output from fec_dump.

In the same setup I also see a weird issue when running a TCP flow
using iperf3. Most of the time (~70%) when I start the iperf3 client
I'll see ~450Mbps of throughput. In the other case (~30%) I'll see
~790Mbps. The system is "stably bi-modal", i.e. whichever rate is
reached in the beginning is then sustained for as long as the session
is kept alive.

I've inserted some tracepoints in the driver to try to understand
what's going on: https://svgshare.com/i/MVp.svg

What I can't figure out is why the Tx buffers seem to be collected at
a much slower rate in the slow case (top in the picture). If we fall
behind in one NAPI poll, we should catch up at the next call (which we
can see in the fast case). But in the slow case we keep falling
further and further behind until we freeze the queue. Is this
something you've ever observed? Any ideas?

Thank you


* RE: [EXT] Re: [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load
  2020-06-29 16:29     ` Tobias Waldekranz
@ 2020-06-30  6:27       ` Andy Duan
  2020-06-30  7:30         ` Tobias Waldekranz
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Duan @ 2020-06-30  6:27 UTC (permalink / raw)
  To: Tobias Waldekranz, David Miller; +Cc: netdev

From: Tobias Waldekranz <tobias@waldekranz.com> Sent: Tuesday, June 30, 2020 12:29 AM
> What I can't figure out is why the Tx buffers seem to be collected at a much
> slower rate in the slow case (top in the picture). If we fall behind in one NAPI
> poll, we should catch up at the next call (which we can see in the fast case).
> But in the slow case we keep falling further and further behind until we freeze
> the queue. Is this something you've ever observed? Any ideas?

In our earlier tests we did not reproduce the issue: the CPU had more
bandwidth than the Ethernet uDMA, so each NAPI poll had a chance to
complete. On the next interrupt, work_tx gets updated, so we never hit
the issue.
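The explanation above hinges on whether a NAPI poll ever finishes under budget. In the unpatched driver, finishing under budget is the only point at which interrupts are unmasked, so it is the only point at which the ISR can read IEVENT again and notice a pending tx event. A minimal C sketch of that condition follows. `isr_reads()` is a hypothetical helper, and the model assumes that new frames arrive every round and that an unmasked interrupt fires immediately.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: counts how many times the unpatched ISR gets to
 * read IEVENT over `rounds` NAPI rounds, given `rx_per_round` newly
 * arrived frames per round and a NAPI `budget`. Assumes frames arrive
 * every round and that an unmasked interrupt fires immediately.
 */
static int isr_reads(int rx_per_round, int budget, int rounds)
{
	int backlog = 0, reads = 0;
	bool polled_mode = false;

	assert(budget > 0);
	for (int i = 0; i < rounds; i++) {
		backlog += rx_per_round;
		if (!polled_mode) {
			reads++;            /* ISR reads IEVENT, incl. any TXF */
			polled_mode = true; /* irqs masked, NAPI scheduled */
		}
		int pkts = backlog < budget ? backlog : budget;
		backlog -= pkts;
		if (pkts < budget)           /* napi_complete_done(): */
			polled_mode = false; /* irqs unmasked again */
	}
	return reads;
}
```

With a budget of 16, `isr_reads(8, 16, 100)` returns 100, because every round finishes early and re-arms the interrupt. `isr_reads(16, 16, 100)` returns 1: after the first interrupt the device never leaves polled mode, which is the router-on-a-stick case from the patch description.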


* RE: [EXT] Re: [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load
  2020-06-30  6:27       ` Andy Duan
@ 2020-06-30  7:30         ` Tobias Waldekranz
  2020-06-30  8:26           ` Andy Duan
  0 siblings, 1 reply; 14+ messages in thread
From: Tobias Waldekranz @ 2020-06-30  7:30 UTC (permalink / raw)
  To: Andy Duan, David Miller; +Cc: netdev

On Tue Jun 30, 2020 at 8:27 AM CEST, Andy Duan wrote:
> In our earlier tests we did not reproduce the issue: the CPU had more
> bandwidth than the Ethernet uDMA, so each NAPI poll had a chance to
> complete. On the next interrupt, work_tx gets updated, so we never hit
> the issue.

It appears it has nothing to do with routing back out through the same
interface.

I get the same bi-modal behavior if I just run the iperf3 server on the
iMX and then have it be the transmitting part, i.e. on the PC I run:

    iperf3 -c $IMX_IP -R

It would be very interesting to see what numbers you see in this
scenario.


* RE: [EXT] Re: [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load
  2020-06-30  7:30         ` Tobias Waldekranz
@ 2020-06-30  8:26           ` Andy Duan
  2020-06-30  8:55             ` Tobias Waldekranz
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Duan @ 2020-06-30  8:26 UTC (permalink / raw)
  To: Tobias Waldekranz, David Miller; +Cc: netdev

From: Tobias Waldekranz <tobias@waldekranz.com> Sent: Tuesday, June 30, 2020 3:31 PM
> It appears it has nothing to do with routing back out through the same
> interface.
> 
> I get the same bi-modal behavior if I just run the iperf3 server on the iMX and
> then have it be the transmitting part, i.e. on the PC I run:
> 
>     iperf3 -c $IMX_IP -R
> 
> It would be very interesting to see what numbers you see in this scenario.

I only have an imx8mn EVK at hand. I ran the case and got ~940Mbps,
as shown below.

root@imx8mnevk:~# iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.192.242.132, port 43402
[  5] local 10.192.242.96 port 5201 connected to 10.192.242.132 port 43404
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   109 MBytes   913 Mbits/sec    0    428 KBytes
[  5]   1.00-2.00   sec   112 MBytes   943 Mbits/sec    0    447 KBytes
[  5]   2.00-3.00   sec   112 MBytes   941 Mbits/sec    0    472 KBytes
[  5]   3.00-4.00   sec   113 MBytes   944 Mbits/sec    0    472 KBytes
[  5]   4.00-5.00   sec   112 MBytes   942 Mbits/sec    0    472 KBytes
[  5]   5.00-6.00   sec   112 MBytes   936 Mbits/sec    0    472 KBytes
[  5]   6.00-7.00   sec   113 MBytes   945 Mbits/sec    0    472 KBytes
[  5]   7.00-8.00   sec   112 MBytes   944 Mbits/sec    0    472 KBytes
[  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec    0    472 KBytes
[  5]   9.00-10.00  sec   112 MBytes   940 Mbits/sec    0    472 KBytes
[  5]  10.00-10.04  sec  4.16 MBytes   873 Mbits/sec    0    472 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.04  sec  1.10 GBytes   939 Mbits/sec    0             sender


* RE: [EXT] Re: [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load
  2020-06-30  8:26           ` Andy Duan
@ 2020-06-30  8:55             ` Tobias Waldekranz
  2020-06-30  9:02               ` Andy Duan
  0 siblings, 1 reply; 14+ messages in thread
From: Tobias Waldekranz @ 2020-06-30  8:55 UTC (permalink / raw)
  To: Andy Duan, David Miller; +Cc: netdev

On Tue Jun 30, 2020 at 10:26 AM CEST, Andy Duan wrote:
> I only have an imx8mn EVK at hand. I ran the case and got ~940Mbps,
> as shown below.
>
> root@imx8mnevk:~# iperf3 -s
> -----------------------------------------------------------
> Server listening on 5201
> -----------------------------------------------------------
> Accepted connection from 10.192.242.132, port 43402
> [  5] local 10.192.242.96 port 5201 connected to 10.192.242.132 port 43404
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec   109 MBytes   913 Mbits/sec    0    428 KBytes
> [  5]   1.00-2.00   sec   112 MBytes   943 Mbits/sec    0    447 KBytes
> [  5]   2.00-3.00   sec   112 MBytes   941 Mbits/sec    0    472 KBytes
> [  5]   3.00-4.00   sec   113 MBytes   944 Mbits/sec    0    472 KBytes
> [  5]   4.00-5.00   sec   112 MBytes   942 Mbits/sec    0    472 KBytes
> [  5]   5.00-6.00   sec   112 MBytes   936 Mbits/sec    0    472 KBytes
> [  5]   6.00-7.00   sec   113 MBytes   945 Mbits/sec    0    472 KBytes
> [  5]   7.00-8.00   sec   112 MBytes   944 Mbits/sec    0    472 KBytes
> [  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec    0    472 KBytes
> [  5]   9.00-10.00  sec   112 MBytes   940 Mbits/sec    0    472 KBytes
> [  5]  10.00-10.04  sec  4.16 MBytes   873 Mbits/sec    0    472 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.04  sec  1.10 GBytes   939 Mbits/sec    0             sender

Are you running the client with -R so that the iMX is the transmitter?
What if you run the test multiple times, do you get the same result
each time?


* RE: [EXT] Re: [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load
  2020-06-30  8:55             ` Tobias Waldekranz
@ 2020-06-30  9:02               ` Andy Duan
  2020-06-30  9:12                 ` Tobias Waldekranz
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Duan @ 2020-06-30  9:02 UTC (permalink / raw)
  To: Tobias Waldekranz, David Miller; +Cc: netdev

From: Tobias Waldekranz <tobias@waldekranz.com> Sent: Tuesday, June 30, 2020 4:56 PM
> Are you running the client with -R so that the iMX is the transmitter?
> What if you run the test multiple times, do you get the same result each time?

Of course. The PC command is: iperf3 -c 10.192.242.96 -R
Yes, I get the same result each time.


* RE: [EXT] Re: [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load
  2020-06-30  9:02               ` Andy Duan
@ 2020-06-30  9:12                 ` Tobias Waldekranz
  2020-06-30  9:47                   ` Andy Duan
  0 siblings, 1 reply; 14+ messages in thread
From: Tobias Waldekranz @ 2020-06-30  9:12 UTC (permalink / raw)
  To: Andy Duan, David Miller; +Cc: netdev

On Tue Jun 30, 2020 at 11:02 AM CEST, Andy Duan wrote:
> From: Tobias Waldekranz <tobias@waldekranz.com> Sent: Tuesday, June 30,
> 2020 4:56 PM
> > On Tue Jun 30, 2020 at 10:26 AM CEST, Andy Duan wrote:
> > > From: Tobias Waldekranz <tobias@waldekranz.com> Sent: Tuesday, June
> > > 30,
> > > 2020 3:31 PM
> > > > On Tue Jun 30, 2020 at 8:27 AM CEST, Andy Duan wrote:
> > > > > From: Tobias Waldekranz <tobias@waldekranz.com> Sent: Tuesday,
> > > > > June 30,
> > > > > 2020 12:29 AM
> > > > > > On Sun Jun 28, 2020 at 8:23 AM CEST, Andy Duan wrote:
> > > > > > > I never seem bandwidth test cause netdev watchdog trip.
> > > > > > > Can you describe the reproduce steps on the commit, then we
> > > > > > > can reproduce it on my local. Thanks.
> > > > > >
> > > > > > My setup uses a i.MX8M Nano EVK connected to an ethernet switch,
> > > > > > but can get the same results with a direct connection to a PC.
> > > > > >
> > > > > > On the iMX, configure two VLANs on top of the FEC and enable
> > > > > > IPv4 forwarding.
> > > > > >
> > > > > > On the PC, configure two VLANs and put them in different
> > namespaces.
> > > > > > From one namespace, use trafgen to generate a flow that the iMX
> > > > > > will route from the first VLAN to the second and then back
> > > > > > towards the second namespace on the PC.
> > > > > >
> > > > > > Something like:
> > > > > >
> > > > > >     {
> > > > > >         eth(sa=PC_MAC, da=IMX_MAC),
> > > > > >         ipv4(saddr=10.0.2.2, daddr=10.0.3.2, ttl=2),
> > > > > >         udp(sp=1, dp=2),
> > > > > >         "Hello world"
> > > > > >     }
> > > > > >
> > > > > > Wait a couple of seconds and then you'll see the output from fec_dump.
> > > > > >
> > > > > > In the same setup I also see a weird issue when running a TCP
> > > > > > flow using iperf3. Most of the time (~70%) when I start the
> > > > > > iperf3 client I'll see ~450Mbps of throughput. In the other case
> > > > > > (~30%) I'll see ~790Mbps. The system is "stably bi-modal", i.e.
> > > > > > whichever rate is reached in the beginning is then sustained for
> > > > > > as long as the session is kept alive.
> > > > > >
> > > > > > I've inserted some tracepoints in the driver to try to
> > > > > > understand what's going on:
> > > > > > https://svgshare.com/i/MVp.svg
> > > > > >
> > > > > > What I can't figure out is why the Tx buffers seem to be
> > > > > > collected at a much slower rate in the slow case (top in the
> > > > > > picture). If we fall behind in one NAPI poll, we should catch up
> > > > > > at the next call (which we can see in the fast case).
> > > > > > But in the slow case we keep falling further and further behind
> > > > > > until we freeze the queue. Is this something you've ever
> > > > > > observed? Any ideas?
> > > > >
> > > > > Our test cases have not reproduced the issue before: the CPU has
> > > > > more bandwidth than the Ethernet uDMA, so there is a chance to
> > > > > complete the current NAPI poll. On the next poll, work_tx gets the
> > > > > update, so we never hit the issue.
> > > >
> > > > It appears it has nothing to do with routing back out through the
> > > > same interface.
> > > >
> > > > I get the same bi-modal behavior if I just run the iperf3 server on
> > > > the iMX and have it be the transmitting side, i.e. on the PC I run:
> > > >
> > > >     iperf3 -c $IMX_IP -R
> > > >
> > > > It would be very interesting to see what numbers you get in this scenario.
> > > I only have an imx8mn EVK at hand. Running the case, I get ~940Mbps,
> > > as shown below.
> > >
> > > root@imx8mnevk:~# iperf3 -s
> > > -----------------------------------------------------------
> > > Server listening on 5201
> > > -----------------------------------------------------------
> > > Accepted connection from 10.192.242.132, port 43402
> > > [  5] local 10.192.242.96 port 5201 connected to 10.192.242.132 port 43404
> > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > [  5]   0.00-1.00   sec   109 MBytes   913 Mbits/sec    0    428 KBytes
> > > [  5]   1.00-2.00   sec   112 MBytes   943 Mbits/sec    0    447 KBytes
> > > [  5]   2.00-3.00   sec   112 MBytes   941 Mbits/sec    0    472 KBytes
> > > [  5]   3.00-4.00   sec   113 MBytes   944 Mbits/sec    0    472 KBytes
> > > [  5]   4.00-5.00   sec   112 MBytes   942 Mbits/sec    0    472 KBytes
> > > [  5]   5.00-6.00   sec   112 MBytes   936 Mbits/sec    0    472 KBytes
> > > [  5]   6.00-7.00   sec   113 MBytes   945 Mbits/sec    0    472 KBytes
> > > [  5]   7.00-8.00   sec   112 MBytes   944 Mbits/sec    0    472 KBytes
> > > [  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec    0    472 KBytes
> > > [  5]   9.00-10.00  sec   112 MBytes   940 Mbits/sec    0    472 KBytes
> > > [  5]  10.00-10.04  sec  4.16 MBytes   873 Mbits/sec    0    472 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.04  sec  1.10 GBytes   939 Mbits/sec    0             sender
> > 
> > Are you running the client with -R so that the iMX is the transmitter?
> > What if you run the test multiple times, do you get the same result each time?
>
> Of course. The command on the PC is: iperf3 -c 10.192.242.96 -R
> Yes, I get the same result every time.

Very strange. I've now reduced my setup to a simple direct connection
between the iMX and the PC, and I still see the same issue:

for i in $(seq 5); do iperf3 -c 10.0.2.1 -R -t2; sleep 1; done
Connecting to host 10.0.2.1, port 5201
Reverse mode, remote host 10.0.2.1 is sending
[  5] local 10.0.2.2 port 53978 connected to 10.0.2.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   110 MBytes   919 Mbits/sec
[  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec    0   0.00 Bytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-2.04   sec   223 MBytes   918 Mbits/sec    0             sender
[  5]   0.00-2.00   sec   222 MBytes   930 Mbits/sec                  receiver

iperf Done.
Connecting to host 10.0.2.1, port 5201
Reverse mode, remote host 10.0.2.1 is sending
[  5] local 10.0.2.2 port 53982 connected to 10.0.2.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  55.8 MBytes   468 Mbits/sec
[  5]   1.00-2.00   sec  56.3 MBytes   472 Mbits/sec    0   0.00 Bytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-2.04   sec   113 MBytes   464 Mbits/sec    0             sender
[  5]   0.00-2.00   sec   112 MBytes   470 Mbits/sec                  receiver

iperf Done.
Connecting to host 10.0.2.1, port 5201
Reverse mode, remote host 10.0.2.1 is sending
[  5] local 10.0.2.2 port 53986 connected to 10.0.2.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  55.7 MBytes   467 Mbits/sec
[  5]   1.00-2.00   sec  56.3 MBytes   472 Mbits/sec    0   0.00 Bytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-2.04   sec   113 MBytes   464 Mbits/sec    0             sender
[  5]   0.00-2.00   sec   112 MBytes   470 Mbits/sec                  receiver

iperf Done.
Connecting to host 10.0.2.1, port 5201
Reverse mode, remote host 10.0.2.1 is sending
[  5] local 10.0.2.2 port 53990 connected to 10.0.2.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   110 MBytes   920 Mbits/sec
[  5]   1.00-2.00   sec   112 MBytes   942 Mbits/sec    0   0.00 Bytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-2.04   sec   223 MBytes   919 Mbits/sec    0             sender
[  5]   0.00-2.00   sec   222 MBytes   931 Mbits/sec                  receiver

iperf Done.
Connecting to host 10.0.2.1, port 5201
Reverse mode, remote host 10.0.2.1 is sending
[  5] local 10.0.2.2 port 53994 connected to 10.0.2.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   110 MBytes   920 Mbits/sec
[  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec    0   0.00 Bytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-2.04   sec   223 MBytes   918 Mbits/sec    0             sender
[  5]   0.00-2.00   sec   222 MBytes   931 Mbits/sec                  receiver

iperf Done.

Which kernel version are you running? I'm on be74294ffa24 plus the
starvation fix in this patch.

* RE: [EXT] Re: [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load
  2020-06-30  9:12                 ` Tobias Waldekranz
@ 2020-06-30  9:47                   ` Andy Duan
  2020-06-30 11:01                     ` Tobias Waldekranz
  2020-06-30 13:45                     ` Tobias Waldekranz
  0 siblings, 2 replies; 14+ messages in thread
From: Andy Duan @ 2020-06-30  9:47 UTC (permalink / raw)
  To: Tobias Waldekranz, David Miller; +Cc: netdev

From: Tobias Waldekranz <tobias@waldekranz.com> Sent: Tuesday, June 30, 2020 5:13 PM
> Which kernel version are you running? I'm on be74294ffa24 plus the
> starvation fix in this patch.

Tobias, sorry, I am not running the net tree; I run the linux-imx tree:
https://source.codeaurora.org/external/imx/linux-imx/refs/heads
branch: imx_5.4.24_2.1.0
But the data flow is the same as in the net tree.

Log on the PC (iMX running as the server):
$ for i in $(seq 5); do iperf3 -c 10.192.242.96 -R -t2; sleep 1; done
Connecting to host 10.192.242.96, port 5201
Reverse mode, remote host 10.192.242.96 is sending
[  4] local 10.192.242.132 port 46504 connected to 10.192.242.96 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   112 MBytes   939 Mbits/sec
[  4]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-2.00   sec   226 MBytes   949 Mbits/sec    0             sender
[  4]   0.00-2.00   sec   225 MBytes   942 Mbits/sec                  receiver

iperf Done.
Connecting to host 10.192.242.96, port 5201
Reverse mode, remote host 10.192.242.96 is sending
[  4] local 10.192.242.132 port 46510 connected to 10.192.242.96 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   111 MBytes   933 Mbits/sec
[  4]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-2.00   sec   226 MBytes   949 Mbits/sec    0             sender
[  4]   0.00-2.00   sec   224 MBytes   939 Mbits/sec                  receiver

iperf Done.
Connecting to host 10.192.242.96, port 5201
Reverse mode, remote host 10.192.242.96 is sending
[  4] local 10.192.242.132 port 46516 connected to 10.192.242.96 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   112 MBytes   936 Mbits/sec
[  4]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-2.00   sec   226 MBytes   949 Mbits/sec    0             sender
[  4]   0.00-2.00   sec   224 MBytes   940 Mbits/sec                  receiver

iperf Done.
Connecting to host 10.192.242.96, port 5201
Reverse mode, remote host 10.192.242.96 is sending
[  4] local 10.192.242.132 port 46522 connected to 10.192.242.96 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   111 MBytes   934 Mbits/sec
[  4]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-2.00   sec   226 MBytes   946 Mbits/sec    0             sender
[  4]   0.00-2.00   sec   224 MBytes   939 Mbits/sec                  receiver

iperf Done.
Connecting to host 10.192.242.96, port 5201
Reverse mode, remote host 10.192.242.96 is sending
[  4] local 10.192.242.132 port 46528 connected to 10.192.242.96 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   112 MBytes   936 Mbits/sec
[  4]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-2.00   sec   226 MBytes   947 Mbits/sec    0             sender
[  4]   0.00-2.00   sec   224 MBytes   940 Mbits/sec                  receiver

* RE: [EXT] Re: [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load
  2020-06-30  9:47                   ` Andy Duan
@ 2020-06-30 11:01                     ` Tobias Waldekranz
  2020-07-01  1:27                       ` Andy Duan
  2020-06-30 13:45                     ` Tobias Waldekranz
  1 sibling, 1 reply; 14+ messages in thread
From: Tobias Waldekranz @ 2020-06-30 11:01 UTC (permalink / raw)
  To: Andy Duan, David Miller; +Cc: netdev

On Tue Jun 30, 2020 at 11:47 AM CEST, Andy Duan wrote:
> Tobias, sorry, I am not running the net tree, I run the linux-imx tree:
> https://source.codeaurora.org/external/imx/linux-imx/refs/heads
> branch:imx_5.4.24_2.1.0
> But the data follow is the same as net tree.

Ok, I'll build that kernel and see if I get different results. Would
you mind sharing your kernel config?

* RE: [EXT] Re: [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load
  2020-06-30  9:47                   ` Andy Duan
  2020-06-30 11:01                     ` Tobias Waldekranz
@ 2020-06-30 13:45                     ` Tobias Waldekranz
  1 sibling, 0 replies; 14+ messages in thread
From: Tobias Waldekranz @ 2020-06-30 13:45 UTC (permalink / raw)
  To: Andy Duan, David Miller; +Cc: netdev

On Tue Jun 30, 2020 at 11:47 AM CEST, Andy Duan wrote:
> Tobias, sorry, I am not running the net tree, I run the linux-imx tree:
> https://source.codeaurora.org/external/imx/linux-imx/refs/heads
> branch:imx_5.4.24_2.1.0
> But the data follow is the same as net tree.

I've now built the same kernel. On this one the issue does not occur:
I get a consistent throughput of ~940Mbps, just like you're seeing.

Now moving to mainline 5.4 to rule out any NXP changes first, then
start bisecting.

* RE: [EXT] Re: [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load
  2020-06-30 11:01                     ` Tobias Waldekranz
@ 2020-07-01  1:27                       ` Andy Duan
  0 siblings, 0 replies; 14+ messages in thread
From: Andy Duan @ 2020-07-01  1:27 UTC (permalink / raw)
  To: Tobias Waldekranz, David Miller; +Cc: netdev

From: Tobias Waldekranz <tobias@waldekranz.com> Sent: Tuesday, June 30, 2020 7:02 PM
> On Tue Jun 30, 2020 at 11:47 AM CEST, Andy Duan wrote:
> > Tobias, sorry, I am not running the net tree, I run the linux-imx tree:
> > https://source.codeaurora.org/external/imx/linux-imx/refs/heads
> > branch:imx_5.4.24_2.1.0
> > But the data follow is the same as net tree.
> 
> Ok, I'll build that kernel and see if I get different results. Would you mind
> sharing your kernel config?

The config: imx_v8_defconfig

Thread overview: 14+ messages
2020-06-25  8:57 [PATCH net-next] net: ethernet: fec: prevent tx starvation under high rx load Tobias Waldekranz
2020-06-25 19:19 ` David Miller
2020-06-28  6:23   ` [EXT] " Andy Duan
2020-06-29 16:29     ` Tobias Waldekranz
2020-06-30  6:27       ` Andy Duan
2020-06-30  7:30         ` Tobias Waldekranz
2020-06-30  8:26           ` Andy Duan
2020-06-30  8:55             ` Tobias Waldekranz
2020-06-30  9:02               ` Andy Duan
2020-06-30  9:12                 ` Tobias Waldekranz
2020-06-30  9:47                   ` Andy Duan
2020-06-30 11:01                     ` Tobias Waldekranz
2020-07-01  1:27                       ` Andy Duan
2020-06-30 13:45                     ` Tobias Waldekranz
