* net latency, long wait for softirq_entry
@ 2020-12-05 18:30 Paul Thomas
  2020-12-05 18:53 ` Paul Thomas
  2020-12-18 13:52 ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 5+ messages in thread
From: Paul Thomas @ 2020-12-05 18:30 UTC (permalink / raw)
  To: linux-rt-users

(this time with a proper subject)
Hello,

I've been testing out the 5.10 rt kernel[1], and now I'm trying to
track down a networking latency issue. Generally this works well: I
have a user thread (priority 99) and the driver's (macb) rx irq thread
(priority 98). With tracing (kernelshark is very cool, btw) I can see
the normal flow; the irq thread does the netif_receive_skb:
softirq_raise
softirq_entry
napi_gro_receive_entry
napi_gro_receive_exit
netif_receive_skb
sched_waking
sched_wakeup
softirq_exit
sched_switch

Then the user thread does:
sys_exit_recvmsg
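
For context, the receiving side of the test is essentially a SCHED_FIFO
thread blocking in recvmsg(). A minimal sketch of such a receiver
(hypothetical; the port and payload size are made up, this is not the
actual test code):

/*
 * Minimal sketch of the receiving side (hypothetical, not the actual
 * test code): a SCHED_FIFO-99 thread blocking in recvmsg() on a UDP
 * socket. Port 5000 is made up. Needs root for SCHED_FIFO.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sched.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	struct sched_param sp = { .sched_priority = 99 };
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port = htons(5000),
		.sin_addr.s_addr = htonl(INADDR_ANY),
	};
	char payload[2048];
	struct iovec iov = { .iov_base = payload, .iov_len = sizeof(payload) };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
	int fd;

	if (sched_setscheduler(0, SCHED_FIFO, &sp))
		perror("sched_setscheduler");

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		return 1;

	for (;;) {
		/* each return shows up as sys_exit_recvmsg in the trace */
		if (recvmsg(fd, &msg, 0) < 0)
			break;
	}
	close(fd);
	return 0;
}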

But in the long-latency case, instead of going straight to
softirq_entry, the softirq processing is delayed and the softirq_entry
happens in the ksoftirqd task instead of the irq/30-eth%d task. The
problem is that before ksoftirqd runs, another lower-priority task
runs (in this case it's the macb tx interrupt thread at priority 80,
but I've seen this be user threads in slightly different tests). The
paths seem to diverge at a call to __kthread_should_park().
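
For reference, the gate where that decision is made appears to be in
kernel/softirq.c; paraphrased from mainline around v5.10 (the RT tree
may differ in detail), this is where __kthread_should_park() shows up:

/*
 * Paraphrased from mainline kernel/softirq.c around v5.10 (the RT tree
 * may differ): if ksoftirqd is already runnable on this CPU, pending
 * softirqs are left for it instead of being processed inline.
 */
#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))

static bool ksoftirqd_running(unsigned long pending)
{
	struct task_struct *tsk = __this_cpu_read(ksoftirqd);

	if (pending & SOFTIRQ_NOW_MASK)
		return false;
	return tsk && (tsk->state == TASK_RUNNING) &&
		!__kthread_should_park(tsk);
}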

This is on an arm64 zynqmp platform.

Any thoughts on how this could be improved would be appreciated.

Cyclictest seems OK; a 10-minute run with stress-ng running shows a
maximum latency of 47 us:
# ./cyclictest -S -m -p 99 -d 0 -i 200 -D 10m
WARN: stat /dev/cpu_dma_latency failed: No such file or directory
policy: fifo: loadavg: 15.99 15.74 13.41 13/152 2605

T: 0 ( 5687) P:99 I:200 C:2999965 Min:      6 Act:   12 Avg:   11 Max:      39
T: 1 ( 5701) P:99 I:200 C:2999647 Min:      8 Act:   12 Avg:   10 Max:      40
T: 2 ( 5720) P:99 I:200 C:2999343 Min:      8 Act:   11 Avg:   10 Max:      41
T: 3 ( 5740) P:99 I:200 C:2999056 Min:      7 Act:   10 Avg:   11 Max:      47

-Paul

[1] I saw the v5.10-rc6-rt14 announcement yesterday, and rebased to
that for the tracing mentioned here.


* Re: net latency, long wait for softirq_entry
  2020-12-05 18:30 net latency, long wait for softirq_entry Paul Thomas
@ 2020-12-05 18:53 ` Paul Thomas
  2020-12-18 13:52 ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 5+ messages in thread
From: Paul Thomas @ 2020-12-05 18:53 UTC (permalink / raw)
  To: linux-rt-users

The trace can be found here:
https://bugzilla.kernel.org/attachment.cgi?id=293951
The softirq_raise with a long wait before the softirq_entry starts at
time 515.509590.


* Re: net latency, long wait for softirq_entry
  2020-12-05 18:30 net latency, long wait for softirq_entry Paul Thomas
  2020-12-05 18:53 ` Paul Thomas
@ 2020-12-18 13:52 ` Sebastian Andrzej Siewior
       [not found]   ` <CAFzL-7ujcJ4KDbgVoq-N6GRTWfemCA6hfqOzBGwcaDzt=0rGzQ@mail.gmail.com>
  1 sibling, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-12-18 13:52 UTC (permalink / raw)
  To: Paul Thomas, Thomas Gleixner; +Cc: linux-rt-users

On 2020-12-05 13:30:21 [-0500], Paul Thomas wrote:
> (this time with a proper subject)
> Hello,
Hi, 

> I've been testing out the 5.10 rt kernel[1], and now I'm trying to
> track down a networking latency issue.
…
> But in the long-latency case, instead of going straight to
> softirq_entry, the softirq processing is delayed and the softirq_entry
> happens in the ksoftirqd task instead of the irq/30-eth%d task. The
> problem is that before ksoftirqd runs, another lower-priority task
> runs (in this case it's the macb tx interrupt thread at priority 80,
> but I've seen this be user threads in slightly different tests). The
> paths seem to diverge at a call to __kthread_should_park().
> 
> This is on an arm64 zynqmp platform.
> 
> Any thoughts on how this could be improved would be appreciated.

The usual flow is:
- wake IRQ thread
- IRQ thread raises NAPI (softirq NET_RX)
- NAPI is processed on the exit of the IRQ thread.

This flow was disturbed by the timer:
- stress-ng is running
- the HZ tick timer fires and raises the TIMER softirq. This leads to
  waking the ksoftirqd thread to process it (this is normal).
- stress-ng continues.
- the networking IRQ fires. The IRQ thread is woken up and raises the
  NET_RX softirq. Usually the softirq would be invoked on return from
  the interrupt thread, but since ksoftirqd has already been woken,
  nothing happens: the work is deferred to ksoftirqd. The IRQ thread
  goes idle.
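
In code terms this is the check on the return-from-interrupt path
(paraphrased and simplified from mainline kernel/softirq.c around
v5.10; the RT tree differs in detail):

/*
 * Paraphrased and simplified from mainline kernel/softirq.c around
 * v5.10: pending softirqs are only handled here if ksoftirqd is not
 * already runnable; otherwise everything is deferred to ksoftirqd.
 */
static inline void invoke_softirq(void)
{
	if (ksoftirqd_running(local_softirq_pending()))
		return;			/* defer to ksoftirqd */

	if (!force_irqthreads)
		__do_softirq();
	else
		wakeup_softirqd();
}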

Having a thread to run the tick-timer would avoid that scenario.

Side note: if the networking thread runs for a longer period of time,
then it will hand over to ksoftirqd.

> -Paul

Sebastian


* Re: net latency, long wait for softirq_entry
       [not found]   ` <CAFzL-7ujcJ4KDbgVoq-N6GRTWfemCA6hfqOzBGwcaDzt=0rGzQ@mail.gmail.com>
@ 2020-12-18 16:49     ` Sebastian Andrzej Siewior
  2020-12-21  0:08       ` Paul Thomas
  0 siblings, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-12-18 16:49 UTC (permalink / raw)
  To: Alison Chaiken; +Cc: Paul Thomas, Thomas Gleixner, linux-rt-users

On 2020-12-18 08:18:05 [-0800], Alison Chaiken wrote:
> > Having a thread to run the tick-timer would avoid that scenario.
> >
> 
> Didn't ktimersoftd use to be such a thread? It's still not entirely
> clear, to me at least, why it was removed.

Yes, ktimersoftd was such a thread; that is why I am suggesting it. It
would probably just be quick duct tape.

All of the reasons why it was introduced disappeared in the previous
softirq rework. The NAPI handover works, posixtimer needs no additional
love, and so on (the original motivations for having it).

The problem that Paul reported should also exist for !RT with the
`threadirqs' switch (untested, but it is the same code).
It is worth noting that in his report the latency was increased because
the timer tick woke the ksoftirqd thread. Rightfully you could say that
this would not have happened with the timer thread.
However, the usb-storage driver (just to pick an easy-to-trigger
scenario for my case) also wakes ksoftirqd when a transfer completes.
If the ethernet interrupt fires before ksoftirqd completes its task,
then we have the same situation without the involvement of the timer :)

> -- Alison Chaiken
>    Aurora Innovation

Sebastian


* Re: net latency, long wait for softirq_entry
  2020-12-18 16:49     ` Sebastian Andrzej Siewior
@ 2020-12-21  0:08       ` Paul Thomas
  0 siblings, 0 replies; 5+ messages in thread
From: Paul Thomas @ 2020-12-21  0:08 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Alison Chaiken, Thomas Gleixner, linux-rt-users

On Fri, Dec 18, 2020 at 11:49 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2020-12-18 08:18:05 [-0800], Alison Chaiken wrote:
> > > Having a thread to run the tick-timer would avoid that scenario.
> > >
> >
> > Didn't ktimersoftd use to be such a thread? It's still not entirely
> > clear, to me at least, why it was removed.
>
> Yes, ktimersoftd was such a thread; that is why I am suggesting it. It
> would probably just be quick duct tape.
>
> All of the reasons why it was introduced disappeared in the previous
> softirq rework. The NAPI handover works, posixtimer needs no additional
> love, and so on (the original motivations for having it).
>
> The problem that Paul reported should also exist for !RT with the
> `threadirqs' switch (untested, but it is the same code).
> It is worth noting that in his report the latency was increased because
> the timer tick woke the ksoftirqd thread. Rightfully you could say that
> this would not have happened with the timer thread.
> However, the usb-storage driver (just to pick an easy-to-trigger
> scenario for my case) also wakes ksoftirqd when a transfer completes.
> If the ethernet interrupt fires before ksoftirqd completes its task,
> then we have the same situation without the involvement of the timer :)
>
> > -- Alison Chaiken
> >    Aurora Innovation
>
> Sebastian

Hi Everyone,

Thanks for taking the time to look at this, it's appreciated! For now,
setting the ksoftirqd priority high on the same core as the interrupt
seems to greatly improve things; thanks for the suggestion, Grygorii.
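
In case it is useful to others, what I'm doing amounts to
chrt -f -p <prio> <pid-of-ksoftirqd/N>; a minimal C sketch of the same
call (the priority value 98 is only an example, not a recommendation):

/*
 * Minimal sketch: give a ksoftirqd/<cpu> thread a high SCHED_FIFO
 * priority, equivalent to `chrt -f -p 98 <pid>`. The value 98 is only
 * an example. Needs root.
 */
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
	struct sched_param sp = { .sched_priority = 98 };
	pid_t pid;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <pid-of-ksoftirqd>\n", argv[0]);
		return 1;
	}
	pid = (pid_t)atoi(argv[1]);

	if (sched_setscheduler(pid, SCHED_FIFO, &sp)) {
		perror("sched_setscheduler");
		return 1;
	}
	return 0;
}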

A few other notes on items that seem to affect latency in a related
use case. First, if a sibling thread of an application calls clone()
(e.g. via a system() call), this seems to temporarily prevent all of
the application's threads from being scheduled. Second, I saw a couple
of instances where one thread kept being migrated to another core,
alternating with the migration thread (~40 times), before ultimately
running on a different core; using taskset to set the CPU affinity of
the offending thread helped (see the sketch after this paragraph).
Third, the PHC tx timestamping (but not the rx) can cause latency
issues (using the macb driver), but this is the least investigated of
the group.
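
The affinity fix mentioned above is just taskset -pc <cpu> <pid>; a
minimal C equivalent (CPU number and PID are placeholders):

/*
 * Minimal sketch: pin a thread to a single CPU, equivalent to
 * `taskset -pc <cpu> <pid>`. CPU number and PID are placeholders.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
	cpu_set_t set;
	pid_t pid;

	if (argc < 3) {
		fprintf(stderr, "usage: %s <pid> <cpu>\n", argv[0]);
		return 1;
	}
	pid = (pid_t)atoi(argv[1]);

	CPU_ZERO(&set);
	CPU_SET(atoi(argv[2]), &set);
	if (sched_setaffinity(pid, sizeof(set), &set)) {
		perror("sched_setaffinity");
		return 1;
	}
	return 0;
}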

thanks,
Paul
