* DPAA TX Issues
From: Jacob S. Moroni @ 2018-04-08 23:46 UTC
  To: madalin.bucur; +Cc: netdev

Hello Madalin,

I've been experiencing some issues with the DPAA Ethernet driver,
specifically related to frame transmission. Hopefully you can point
me in the right direction.

TL;DR: Transmitting faster than a few frames per second causes the TX FQ
CGR to enter the congested state and stay there indefinitely, even after
transmission stops.

The hardware is a T2080RDB, running from the tip of net-next, using
the standard t2080rdb device tree and corenet64_smp_defconfig kernel
config. No changes were made to any of the files. The issue occurs
with 4.16.1 stable as well. In fact, the only time I've been able
to achieve reliable frame transmission was with the SDK 4.1 kernel.

For my tests, I'm running iperf3 both with and without the -R
option (send/receive). When using a USB Ethernet adapter, there
are no issues.

The TX frame queues seem to get "stuck" when transmitting at rates
greater than a few frames per second. Ping works fine, but anything that
could cause multiple TX frames to be enqueued at once triggers the problem.

If I run iperf3 in reverse mode (with the T2080RDB receiving), then
I can achieve ~940 Mbps, but this is also somewhat unreliable.

If I run it with the T2080RDB transmitting, the test will never
complete. Sometimes it starts transmitting for a few seconds then stops,
and other times it never even starts. This also seems to force the
interface into a bad state.

The ethtool stats show that the interface has entered congestion a few
times and is currently congested. The fact that it remains congested even
after transmission stops suggests that the FQ is no longer being drained.
I've also noticed that whenever this issue occurs, the TX confirmation
counters are always less than the TX packet counters.

When it gets into this state, I can see the memory usage climbing until
it reaches roughly the CGR threshold (about 100 MB).

Any idea what could prevent the TX FQ from being drained? My first
guess was flow control, but it's completely disabled.

I tried messing with the egress congestion threshold, workqueue
assignments, etc., but nothing seemed to have any effect.

If you need any more information or want me to run any tests,
please let me know.

Thanks,
-- 
  Jacob S. Moroni
  mail@jakemoroni.com


* Re: DPAA TX Issues
From: Jacob S. Moroni @ 2018-04-09  3:20 UTC
  To: madalin.bucur; +Cc: netdev

On Sun, Apr 8, 2018, at 7:46 PM, Jacob S. Moroni wrote:
> TL;DR: Transmitting faster than a few frames per second causes the TX FQ
> CGR to enter the congested state and stay there indefinitely, even after
> transmission stops.
> [...]

It turns out that irqbalance was causing all of the issues. After
disabling it and rebooting, the interfaces worked perfectly.

Perhaps there's an issue with how the qman/bman portals are defined
as per-cpu variables.

During the portal probe, CPUs are assigned one by one, and the
corresponding per-CPU portal data is then passed to request_irq() as its
argument. However, if the IRQ affinity later changes, the ISR could be
handed a reference to a per-CPU variable belonging to a CPU other than
the one actually servicing the interrupt.
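
To make the hazard concrete, here's a minimal sketch of the pattern I
have in mind. This is not the actual qman/bman portal code; the structure
and function names are invented for illustration. The point is that the
per-CPU object captured at probe time is frozen into dev_id, and nothing
re-resolves it if the IRQ later migrates to another CPU:

#include <linux/interrupt.h>
#include <linux/percpu.h>
#include <linux/smp.h>

struct fake_portal {
	int cpu;		/* CPU this portal was bound to at probe time */
	void __iomem *regs;	/* portal register window (unused in this sketch) */
};

static DEFINE_PER_CPU(struct fake_portal, fake_portals);

static irqreturn_t fake_portal_isr(int irq, void *dev_id)
{
	struct fake_portal *p = dev_id;

	/*
	 * dev_id is whatever was captured at probe time. If the IRQ has
	 * since been rebalanced, the CPU running this ISR no longer matches
	 * the CPU whose per-CPU portal state we were handed.
	 */
	WARN_ON_ONCE(p->cpu != smp_processor_id());

	/* ... drain the DQRR, run TX confirmation, etc. on the wrong portal ... */
	return IRQ_HANDLED;
}

static int fake_portal_probe_one(int cpu, int irq)
{
	struct fake_portal *p = per_cpu_ptr(&fake_portals, cpu);

	p->cpu = cpu;
	/* The per-CPU pointer is frozen into dev_id here. */
	return request_irq(irq, fake_portal_isr, 0, "fake-portal", p);
}

If that's really what's going on, one thing I might try (again just a
sketch built on the generic kernel IRQ APIs, not something I know the
driver to do today) is pinning each portal IRQ back to its probe-time
CPU, so rebalancing can't separate the ISR from its per-CPU state. That
would also be consistent with everything working once irqbalance is
disabled:

#include <linux/cpumask.h>
#include <linux/interrupt.h>

/*
 * Sketch only: route the portal IRQ to the CPU whose per-CPU state was
 * registered with request_irq(), and publish that preference through the
 * affinity hint so userspace balancers can honour it.
 */
static int fake_portal_pin_irq(int irq, int cpu)
{
	int err = irq_set_affinity(irq, cpumask_of(cpu));

	if (!err)
		irq_set_affinity_hint(irq, cpumask_of(cpu));
	return err;
}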

At least I know where to look now.

- Jake

