* irq thread latency caused by softirq_ctrl.lock contention
From: Gratian Crisan @ 2021-09-15 22:59 UTC (permalink / raw)
To: Thomas Gleixner, Sebastian Andrzej Siewior, Steven Rostedt; +Cc: linux-rt-users
Hi guys,
We are seeing some pretty significant degradation in some of our
benchmarks when using the 5.10.59-rt52 or 5.14.1-rt19 kernels versus
previous numbers on 4.14-rt, and I'm looking for thoughts/ideas on
whether anything can be done about it.
The synthetic benchmark in question measures the latency of an
"analog in" -> "analog out" loop. We have two versions of this
benchmark: one that is interrupt based and one that polls on the I/O.
The polling-mode benchmark is comparable between 5.x-rt and 4.14-rt.
However, in its most extreme case the interrupt-based one shows up to
60% latency degradation on the 5.x-rt kernels.
I was able to confirm that the issue is caused by contention on the
'softirq_ctrl.lock' local lock in the softirq code. This results in
event traces with the following pattern:
<idle>-0 [000] 582.023481: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=12 next_prio=91
ksoftirqd/0-12 [000] 582.023485: irq_handler_entry: irq=102 name=atomicchinchk
ksoftirqd/0-12 [000] 582.023485: irq_handler_exit: irq=102 ret=handled
ksoftirqd/0-12 [000] 582.023486: sched_waking: comm=irq/102-atomicc pid=248 prio=1 target_cpu=000
ksoftirqd/0-12 [000] 582.023488: sched_wakeup: comm=irq/102-atomicc pid=248 prio=1 target_cpu=000
ksoftirqd/0-12 [000] 582.023490: rcu_utilization: Start context switch
ksoftirqd/0-12 [000] 582.023490: rcu_utilization: End context switch
ksoftirqd/0-12 [000] 582.023493: sched_switch: prev_comm=ksoftirqd/0 prev_pid=12 prev_prio=91 prev_state=R+ ==> next_comm=irq/102-atomicc next_pid=248 next_prio=1
irq/102-atomicc-248 [000] 582.023496: sched_pi_setprio: comm=ksoftirqd/0 pid=12 oldprio=91 newprio=1
irq/102-atomicc-248 [000] 582.023499: rcu_utilization: Start context switch
irq/102-atomicc-248 [000] 582.023499: rcu_utilization: End context switch
irq/102-atomicc-248 [000] 582.023501: sched_switch: prev_comm=irq/102-atomicc prev_pid=248 prev_prio=1 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=12 next_prio=1
ksoftirqd/0-12 [000] 582.023503: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/0-12 [000] 582.023508: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/0-12 [000] 582.023510: sched_pi_setprio: comm=ksoftirqd/0 pid=12 oldprio=1 newprio=91
ksoftirqd/0-12 [000] 582.023513: sched_waking: comm=irq/102-atomicc pid=248 prio=1 target_cpu=000
ksoftirqd/0-12 [000] 582.023515: sched_wakeup: comm=irq/102-atomicc pid=248 prio=1 target_cpu=000
ksoftirqd/0-12 [000] 582.023519: rcu_utilization: Start context switch
ksoftirqd/0-12 [000] 582.023519: rcu_utilization: End context switch
ksoftirqd/0-12 [000] 582.023521: sched_switch: prev_comm=ksoftirqd/0 prev_pid=12 prev_prio=91 prev_state=S ==> next_comm=irq/102-atomicc next_pid=248 next_prio=1
irq/102-atomicc-248 [000] 582.023530: sched_waking: comm=S81039292 pid=2240 prio=40 target_cpu=001
irq/102-atomicc-248 [000] 582.023532: sched_wake_idle_without_ipi: cpu=1
irq/102-atomicc-248 [000] 582.023533: sched_wakeup: comm=S81039292 pid=2240 prio=40 target_cpu=001
irq/102-atomicc-248 [000] 582.023539: rcu_utilization: Start context switch
irq/102-atomicc-248 [000] 582.023540: rcu_utilization: End context switch
irq/102-atomicc-248 [000] 582.023542: sched_switch: prev_comm=irq/102-atomicc prev_pid=248 prev_prio=1 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
The 'irq/102-atomicc-248' irq thread is our high-priority data
acquisition thread. The additional PI boost and context switch seem to
account for the main differences I'm seeing versus 4.14-rt. This race +
PI boost happens with other lower-priority irq threads too, but the
ksoftirqd case is the most common one.
I would appreciate any thoughts on whether and how we could improve this.
Thanks,
Gratian
* Re: irq thread latency caused by softirq_ctrl.lock contention
From: Sebastian Andrzej Siewior @ 2021-09-20 9:18 UTC (permalink / raw)
To: Gratian Crisan; +Cc: Thomas Gleixner, Steven Rostedt, linux-rt-users
On 2021-09-15 17:59:50 [-0500], Gratian Crisan wrote:
> Hi guys,
Hi,
…
> The 'irq/102-atomicc-248' irq thread is our high-priority data
> acquisition thread. The additional PI boost and context switch seem to
> account for the main differences I'm seeing versus 4.14-rt. This race +
> PI boost happens with other lower-priority irq threads too, but the
> ksoftirqd case is the most common one.
>
> I would appreciate any thoughts on whether and how we could improve this.
It appears to be a consequence of the new softirq design/handling.
Earlier we could have multiple softirqs running in parallel on a single
CPU (e.g. NET_RX and NET_TX).
With the new design only one softirq can be handled at a time, resulting
in full synchronisation at local_bh_disable() time via the lock you
mention in the subject.
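Roughly, simplified from the PREEMPT_RT bits of kernel/softirq.c (a
sketch, not the verbatim code):

struct softirq_ctrl {
	local_lock_t	lock;
	int		cnt;
};

static DEFINE_PER_CPU(struct softirq_ctrl, softirq_ctrl) = {
	.lock = INIT_LOCAL_LOCK(softirq_ctrl.lock),
};

void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
{
	/* First entry of this task into a BH-disabled section? */
	if (!current->softirq_disable_cnt)
		/*
		 * A per-CPU local lock, which on RT is a sleeping,
		 * PI-aware lock. Softirq processing and every
		 * force-threaded IRQ handler on this CPU serialize
		 * here -- this is where your irq thread blocks behind
		 * ksoftirqd and boosts it.
		 */
		local_lock(&softirq_ctrl.lock);

	/* softirq_disable_cnt accounting etc. elided */
}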
In your case it appears that irq/102-atomicc is force-threaded and
therefore BH has to be disabled around its execution. This is just to
mimic what upstream does in terms of locking and to ensure that BH
invocation happens after the threaded interrupt has ended.
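The wrapper around a force-threaded handler looks roughly like this
(simplified from irq_forced_thread_fn() in kernel/irq/manage.c; the
real function also handles the !RT case and IRQ_HANDLED accounting):

static irqreturn_t irq_forced_thread_fn(struct irq_desc *desc,
					struct irqaction *action)
{
	irqreturn_t ret;

	local_bh_disable();	/* takes softirq_ctrl.lock on RT */
	ret = action->thread_fn(action->irq, action->dev_id);
	irq_finalize_oneshot(desc, action);
	local_bh_enable();	/* runs softirqs raised by the handler */
	return ret;
}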
If there is nothing special about this interrupt handler (in terms of
BH handling) you could request a threaded handler for the IRQ. Manually
threaded handlers do not get BH disabled before their invocation.
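Something along these lines -- handler and function names are made up,
just to show the shape of the request:

#include <linux/interrupt.h>

/* Illustrative only; names are invented. */
static irqreturn_t atomicc_irq_thread(int irq, void *dev_id)
{
	/*
	 * The acquisition work runs here, in the irq thread, without
	 * the BH-disable wrapper that forced threading would add.
	 */
	return IRQ_HANDLED;
}

static int atomicc_setup_irq(unsigned int irq, void *dev)
{
	/*
	 * NULL primary handler + IRQF_ONESHOT: the core keeps the
	 * interrupt line masked until the thread function has run.
	 */
	return request_threaded_irq(irq, NULL, atomicc_irq_thread,
				    IRQF_ONESHOT, "atomicchinchk", dev);
}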
> Thanks,
> Gratian
Sebastian
* Re: irq thread latency caused by softirq_ctrl.lock contention
From: Gratian Crisan @ 2021-09-22 19:16 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: Thomas Gleixner, Steven Rostedt, linux-rt-users
Sebastian Andrzej Siewior writes:
> On 2021-09-15 17:59:50 [-0500], Gratian Crisan wrote:
>> Hi guys,
> Hi,
>
> …
>> The 'irq/102-atomicc-248' irq thread is our high-priority data
>> acquisition thread. The additional PI boost and context switch seem to
>> account for the main differences I'm seeing versus 4.14-rt. This race +
>> PI boost happens with other lower-priority irq threads too, but the
>> ksoftirqd case is the most common one.
>>
>> I would appreciate any thoughts on whether and how we could improve this.
>
> It appears to be a consequence of the new softirq design/handling.
> Earlier we could have multiple softirqs running in parallel on a single
> CPU (e.g. NET_RX and NET_TX).
> With the new design only one softirq can be handled at a time, resulting
> in full synchronisation at local_bh_disable() time via the lock you
> mention in the subject.
Makes sense.
> In your case it appears that irq/102-atomicc is force-threaded and
> therefore BH has to be disabled around its execution. This is just to
> mimic what upstream does in terms of locking and to ensure that BH
> invocation happens after the threaded interrupt has ended.
Yes, good catch.
> If there is nothing special about this interrupt handler (in terms of
> BH handling) you could request a threaded handler for the IRQ. Manually
> threaded handlers do not get BH disabled before their invocation.
>
I appreciate the insight. I think this will solve our problem.
>
> Sebastian
Thanks again,
Gratian