* Re: About rtla osnoise and timerlat usage
       [not found] <CAE8KmOxedTiM8GJVp+-HuBW=jkuE=aSKFYrmaj8zHLmQP-1RCg@mail.gmail.com>
@ 2023-02-22 11:59 ` Daniel Bristot de Oliveira
       [not found]   ` <CAE8KmOzuCqp5w4FBVd6GjPg_znQhumcsA=PKozZbQWxXPdZYXg@mail.gmail.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Bristot de Oliveira @ 2023-02-22 11:59 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: linux-trace-users

Hi Prasad

On 2/22/23 05:35, Prasad Pandit wrote:
> Hello Daniel,
> 
> * I'm debugging a kernel-rt latency spike of ~55us. Both trace-cmd + oslat(1) and rtla-timerlat-top(1) hint at a TIMER interrupt possibly causing the spike.
> 
>     # tail -n 30 timerlat_trace.txt
>           timerlat/3-925543 [003] ....... 26505.527002: #396   context thread timer_latency     18805 ns
>               <idle>-0     [004] d..h1.. 26505.527046: #396   context    irq timer_latency      8995 ns
>               <idle>-0     [004] dN.h2.. 26505.527052: irq_noise: local_timer:236 start 26505.527045606 duration 6544 ns
>               <idle>-0     [004] d...3.. 26505.527055: thread_noise: swapper/4:0 start 26505.527046097 duration 2430 ns
>           timerlat/4-925544 [004] ....... 26505.527056: #396   context thread timer_latency     18716 ns
>               <idle>-0     [005] d..h1.. 26505.527095: #396   context    irq timer_latency      9199 ns
>               <idle>-0     [005] dN.h2.. 26505.527102: irq_noise: local_timer:236 start 26505.527094732 duration 6586 ns
>               <idle>-0     [005] d...3.. 26505.527104: thread_noise: swapper/5:0 start 26505.527095248 duration 2325 ns
>           timerlat/5-925545 [005] ....... 26505.527105: #396   context thread timer_latency     18735 ns
>               <idle>-0     [006] d..h1.. 26505.527144: #396   context    irq timer_latency      8246 ns
>               <idle>-0     [006] dN.h2.. 26505.527150: irq_noise: local_timer:236 start 26505.527142853 duration 6699 ns
>               <idle>-0     [006] d...3.. 26505.527152: thread_noise: swapper/6:0 start 26505.527143344 duration 2399 ns
>           timerlat/6-925546 [006] ....... 26505.527153: #396   context thread timer_latency     18021 ns
>               <idle>-0     [007] d..h1.. 26505.527195: #396   context    irq timer_latency     10236 ns
>               <idle>-0     [007] dN.h2.. 26505.527201: irq_noise: local_timer:236 start 26505.527194172 duration 6808 ns
>               <idle>-0     [007] d...3.. 26505.527204: thread_noise: swapper/7:0 start 26505.527194702 duration 2401 ns
>           timerlat/7-925547 [007] ....... 26505.527205: #396   context thread timer_latency     20115 ns
>           timerlat/7-925547 [007] ....1.. 26505.527205: <stack trace>
>      => timerlat_irq
>      => __hrtimer_run_queues
>      => hrtimer_interrupt
>      => smp_apic_timer_interrupt
>      => apic_timer_interrupt
>      => native_safe_halt
>      => default_idle
>      => default_idle_call
>      => do_idle
>      => cpu_startup_entry
>      => start_secondary
>      => secondary_startup_64_no_verify

This is the timerlat's own timer, so it is expected. What this trace points to is
a possible exit-from-idle latency... so idle tuning is required for this system
and *this metric*... but

> 
> * rtla-timerlat-top with a threshold (-T) of 20us promptly terminates with the above trace. But with a threshold (-T) of 30us it completes the full duration (-d) run. To confirm:
> 
>     - A 20us threshold is used with the oslat(1) tool; is it the right threshold value for rtla-timerlat as well?

timerlat does not measure the same thing as oslat. oslat is similar to rtla osnoise,
so you need to run rtla osnoise, not rtla timerlat.
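
For example, a starting point (hypothetical values - adjust the CPU list and
duration to your setup):

    # rtla osnoise top -c 1-10 -d 10m -s 20 -T 1 -q

Here, -s 20 stops the measurement on a single noise occurrence >= 20 us, and
-T 1 accounts anything >= 1 us as noise.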

>     - From the trace it looks like an isolated CPU is going into an idle loop, and the timer interrupt fires from there.

Yes, that is expected on timerlat in an isolated CPU. But not with osnoise/oslat kind of tool,
as they keep running, while timerlat/cyclictest go to sleep.

Let me know how rtla osnoise results are, so I can help more.

-- Daniel
>     - Not sure why an isolated CPU is going into an idle state and firing a TIMER interrupt. Does the trace look reasonable or is something amiss?
> 
> * I'd appreciate it if you could share any inputs/suggestions for further clues/debugging/checks.
> 
> Thank you.
> ---
>    - P J P


* Re: About rtla osnoise and timerlat usage
       [not found]   ` <CAE8KmOzuCqp5w4FBVd6GjPg_znQhumcsA=PKozZbQWxXPdZYXg@mail.gmail.com>
@ 2023-02-22 13:15     ` Daniel Bristot de Oliveira
       [not found]       ` <CAE8KmOxV8u3v4ALVvqOUO+zvnd99d6iSXw0RiSLondvdX_JJSA@mail.gmail.com>
  2023-02-22 18:06     ` Daniel Bristot de Oliveira
  1 sibling, 1 reply; 10+ messages in thread
From: Daniel Bristot de Oliveira @ 2023-02-22 13:15 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: linux-trace-users

On 2/22/23 09:39, Prasad Pandit wrote:
> Hello Daniel,
> 
> Thank you so much for your reply, I appreciate it.
> 
> On Wed, 22 Feb 2023 at 17:30, Daniel Bristot de Oliveira <bristot@kernel.org> wrote:
> 
>     This is the timerlat's own timer, so it is expected. What this trace points to is
>     a possible exit-from-idle latency... so idle tuning is required for this system
>     and *this metric*... but
> 
> 
> * Idle tune?
>  
> 
>     Yes, that is expected on timerlat in an isolated CPU. But not with osnoise/oslat kind of tool,
>     as they keep running, while timerlat/cyclictest go to sleep.
> 
> 
> * I see, okay.
> 
>     Let me know how rtla osnoise results are, so I can help more. 
> 
> 
> * Yes, I've been running oslat(1) and rtla-osnoise(1) too.
>    Please see:
>     oslat(1) log -> https://0bin.net/paste/T0PDXHz5#AnNEzkTRxQVT1gvAqKM43jW+yhqilbNbFqHIHHpy4MY
>     rtla-osnoise-top(1) log -> https://0bin.net/paste/8qwjebnZ#22sfTYTv68JAAMHZJhnCBTP-uvP7Mxj8ipAVbuQVsiy


The problem in the oslat case is that trace-cmd is awakened on the isolated CPU.

That is probably because trace-cmd once ran and armed a timer there.

I recommend you restrict the affinity of trace-cmd to the non-isolated CPUs before
starting it and run the experiment again.
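
For example (assuming CPUs 0 and 11 are the housekeeping CPUs, as in your
setup; adjust to your topology):

    # taskset -c 0,11 trace-cmd record ...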

However, a busy loop at FIFO:95 is not a good setup, because it forces you to
raise the priority of other things, like the ktimer threads. In your example,
ktimer runs as FIFO:97... it is hard to justify this as a sane setup.

In a properly isolated CPU, SCHED_OTHER should be enough. I understand that
people use FIFO because it gives the impression that the busy loop will
receive more CPU time, but this is biased by tools that only measure the
single latency occurrence - and not overall latency.

See this article: https://research.redhat.com/blog/article/osnoise-for-fine-tuning-operating-system-noise-in-linux-kernel/

While running with FIFO reduces the "max single noise" by two us (from 7 to 5 us)
relative to SCHED_OTHER, the total amount of noise seen by the tool running with
FIFO is larger, because the starvation of tasks requires further checks from the OS
side, generating further noise. So SCHED_OTHER is better for total noise.

In properly isolated systems, the solution is to try to avoid things on the CPUs,
not to starve them. If the system has a job that is pinned to a CPU that cannot
be avoided, just let it run. Keeping the system in the starving condition is
keeping the system in a faulty state, and the work to take the system out of
this situation (like using throttling or stalld) will only cause more noise.

-- Daniel

* Re: About rtla osnoise and timerlat usage
       [not found]   ` <CAE8KmOzuCqp5w4FBVd6GjPg_znQhumcsA=PKozZbQWxXPdZYXg@mail.gmail.com>
  2023-02-22 13:15     ` Daniel Bristot de Oliveira
@ 2023-02-22 18:06     ` Daniel Bristot de Oliveira
  2023-02-22 19:13       ` Prasad Pandit
  1 sibling, 1 reply; 10+ messages in thread
From: Daniel Bristot de Oliveira @ 2023-02-22 18:06 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: linux-trace-users

On 2/22/23 09:39, Prasad Pandit wrote:
> Hello Daniel,
> 
> Thank you so much for your reply, I appreciate it.
> 
> On Wed, 22 Feb 2023 at 17:30, Daniel Bristot de Oliveira <bristot@kernel.org> wrote:
> 
>     This is the timerlat's own timer, so it is expected. What this trace points to is
>     a possible exit-from-idle latency... so idle tuning is required for this system
>     and *this metric*... but
> 
> 
> * Idle tune?

Oops, I did not reply to this: idle tuning is configuring the system to avoid deep idle states.

An easy way to do it is either setting idle=poll or passing "--dma-latency 0" to timerlat.
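
For the latter, a sketch using the same CPU list as your runs:

    # rtla timerlat top -c 1-10 --dma-latency 0

"--dma-latency 0" holds /dev/cpu_dma_latency at 0 while the tool runs, keeping
the CPUs out of deep idle states.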

But this is only a "problem" for timerlat/cyclictest, not for oslat/osnoise, as they measure
the noise as they run - so the CPU does not go idle.

> 
>     Yes, that is expected on timerlat in an isolated CPU. But not with osnoise/oslat kind of tool,
>     as they keep running, while timerlat/cyclictest go to sleep.
> 
> 
> * I see, okay.
> 
>     Let me know how rtla osnoise results are, so I can help more. 
> 
> 
> * Yes, I've been running oslat(1) and rtla-osnoise(1) too.
>    Please see:
>     oslat(1) log -> https://0bin.net/paste/T0PDXHz5#AnNEzkTRxQVT1gvAqKM43jW+yhqilbNbFqHIHHpy4MY
>     rtla-osnoise-top(1) log -> https://0bin.net/paste/8qwjebnZ#22sfTYTv68JAAMHZJhnCBTP-uvP7Mxj8ipAVbuQVsiy
> 
> Thank you.
> ---
>   - P J P


* Re: About rtla osnoise and timerlat usage
  2023-02-22 18:06     ` Daniel Bristot de Oliveira
@ 2023-02-22 19:13       ` Prasad Pandit
  0 siblings, 0 replies; 10+ messages in thread
From: Prasad Pandit @ 2023-02-22 19:13 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira; +Cc: linux-trace-users

On Wed, 22 Feb 2023 at 23:36, Daniel Bristot de Oliveira
<bristot@kernel.org> wrote:
> > * Idle tune?
> idle tuning is configuring the system to avoid deep idle states.
> An easy way to do it is either setting idle=poll or passing "--dma-latency 0" to timerlat.
>
> But this is only a "problem" for timerlat/cyclictest, not for oslat/osnoise, as they measure
> the noise as they run - so the CPU does not go idle.

* I see, got it. Thank you.
---
  - P J P


* Re: About rtla osnoise and timerlat usage
       [not found]       ` <CAE8KmOxV8u3v4ALVvqOUO+zvnd99d6iSXw0RiSLondvdX_JJSA@mail.gmail.com>
@ 2023-02-23 12:12         ` Prasad Pandit
  2023-02-23 14:38           ` Daniel Bristot de Oliveira
  2023-02-23 14:17         ` Daniel Bristot de Oliveira
  1 sibling, 1 reply; 10+ messages in thread
From: Prasad Pandit @ 2023-02-23 12:12 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira; +Cc: linux-trace-users

Hello Daniel,

On Thu, 23 Feb 2023 at 00:41, Prasad Pandit <ppandit@redhat.com> wrote:
>      # trace-cmd record -p nop -e all -M 0x7FE -m 32000 --poll ~test/rt-tests/oslat --cpu-list 1-10 --duration 1h -w memmove -m 4K -T20
>      ....
>       Maximum:    14 11 13 12 12 13 12 11 12 10 (us)
>       Max-Min:    13 10 12 11 11 12 11 10 11 9 (us)
>       Duration:    3599.986 3599.986 3599.987 3599.987 3599.986 3599.987 3599.986 3599.987 3599.986 3599.986 (sec)
>
> * Running oslat(1) with SCHED_OTHER priority via the 'trace-cmd --poll' option did not show the spike. Nonetheless, trace-cmd(1) logs show <idle>, ktimers/ and kworker/ threads running on isolated CPUs.
> * Now running rtla-osnoise(1) test with SCHED_OTHER:
>       # rtla osnoise top -c 1-10 -d 6h -s 20 -T 20 -Po:0 -q -t

Please see:
    -> https://0bin.net/paste/ShXHmdvu#D0XY-WxTKCzWTxgQ+lTFbx1nB2TP2w+T0Mp8PBXt9gu

* rtla-osnoise(1) ran for 6 hours with SCHED_OTHER and with SCHED_FIFO;
neither run reported any spike above 20us.
  It's reporting occurrences of IRQs only; all other noise columns are
zero (0). That's a little surprising!?

Thank you.
---
   - Prasad


* Re: About rtla osnoise and timerlat usage
       [not found]       ` <CAE8KmOxV8u3v4ALVvqOUO+zvnd99d6iSXw0RiSLondvdX_JJSA@mail.gmail.com>
  2023-02-23 12:12         ` Prasad Pandit
@ 2023-02-23 14:17         ` Daniel Bristot de Oliveira
  2023-02-23 14:39           ` Steven Rostedt
  1 sibling, 1 reply; 10+ messages in thread
From: Daniel Bristot de Oliveira @ 2023-02-23 14:17 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: linux-trace-users

On 2/22/23 16:11, Prasad Pandit wrote:
> Hello Daniel,
> 
> On Wed, 22 Feb 2023 at 18:45, Daniel Bristot de Oliveira <bristot@kernel.org> wrote:
> 
>     The problem in the oslat case is that trace-cmd is awakened on the isolated CPU.
>     That is probably because trace-cmd once ran and armed a timer there.
> 
>     I recommend you restrict the affinity of trace-cmd to the non-isolated CPUs before
>     starting it and run the experiment again.
> 
> 
> * Yes, I invoked trace-cmd(1) with the '-M 0x7FE' cpumask to specify the CPUs to trace. That leaves only the housekeeping CPUs for the trace-cmd(1) process, IIUC.
> ===
> $ for i in `pidof trace-cmd`; do taskset -p -c $i; done
> pid 4835's current affinity list: 0,11
> pid 4834's current affinity list: 0,11
> pid 4833's current affinity list: 0,11
> pid 4832's current affinity list: 0,11
> pid 4831's current affinity list: 0,11
> pid 4830's current affinity list: 0,11
> pid 4829's current affinity list: 0,11
> pid 4828's current affinity list: 0,11
> pid 4827's current affinity list: 0,11
> pid 4826's current affinity list: 0,11
> pid 4825's current affinity list: 0,11
> pid 4824's current affinity list: 0,11
> pid 4823's current affinity list: 0,11
> ===
> 
> * taskset(1) appears to confirm it. Not sure why the 'ktimers/6' thread was scheduled on isolated CPU#6 to sched_wakeup the trace-cmd process.
> 
> ktimers/6-73 [006] 12793.382812: sched_wakeup: trace-cmd:385311 [120] success=1 CPU:011
> 
>    Maybe because I did not use the 'trace-cmd --poll' option. Running a test with '--poll' now.
>  
> 
>     In a properly isolated CPU, SCHED_OTHER should be enough. I understand that
>     people use FIFO because it gives the impression that the busy loop will
>     receive more CPU time, but this is biased by tools that only measure the
>     single latency occurrence - and not overall latency.
> 
> 
> * I see.
>  
> 
>     See this article: https://research.redhat.com/blog/article/osnoise-for-fine-tuning-operating-system-noise-in-linux-kernel/
> 
> 
> * Yes, I read this and 3 other articles by you, and am reading them again. :)
>  
> 
>     While running with FIFO reduces the "max single noise" by two us (from 7 to 5 us)
>     relative to SCHED_OTHER, the total amount of noise seen by the tool running with
>     FIFO is larger, because the starvation of tasks requires further checks from the OS
>     side, generating further noise. So SCHED_OTHER is better for total noise.
> 
> 
> * Doesn't running -rt tasks at a higher priority (FIFO:95) than the kworker/[120] and ktimer/[97] threads help keep them running on isolated CPUs, rather than getting sched_switched out by kernel threads?

I am not sure if I understood what you mean but...

kworker/[120] <--- this 120 is likely not the same as
ktimer/[97] <---- this 97

The kworker is likely a SCHED_OTHER 0 nice, and ktimer a FIFO:97.

You are placing your load in between them.

That would not be bad if we ran a traditional periodic/sporadic real-time
workload. That is, a task that waits for an event, wakes up, runs, and goes
to sleep waiting for the next event.

The problem is that oslat/osnoise run non-stop.

Then a kworker awakened on the CPU will... starve. You will not see it
causing a sched_switch, but if the kworker is pinned to that CPU, it will
not make progress.

The process waiting for its execution will not make progress either...
And the process waiting for that waiting process will not make progress
either... and so on...

In other words, you are avoiding a context switch (a performance problem), but
creating a potential starvation that can lead to a system crash*.

Some people use FIFO:1 for the busy loop (instead of the 95)... and that is
**less bad** because then you can avoid some types of starvation of
threaded IRQs via PI, as threaded IRQs run as FIFO:50... so the PI breaks
the starvation chain... at the price of causing a sched_switch...
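
For example, a hypothetical way to run the measurement at FIFO:1 instead of
FIFO:95 (using rtla's -P priority syntax, as in your -Po:0 run):

    # rtla osnoise top -c 1-10 -d 1h -Pf:1 -q

or starting oslat under chrt -f 1.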

So, by running a busy loop at FIFO:95 (or 1), the user is not avoiding
context switches on an isolated CPU, they are postponing them (given proper
isolation). Still, it is better to keep it at a lower FIFO prio to avoid some
further problems.

That is why it is not that safe.

One can bypass that limitation using things like stalld, but in the end, it
is just another way to let the starving process run. Under a proper setup,
that is the same as just running the busy loop as SCHED_OTHER without the
drawbacks and risks of starvation.

*assuming that you are disabling rt throttling. Otherwise, your system will
have latencies because of it.
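
For reference, rt throttling is disabled via the standard sysctl knob (not
specific to this setup):

    # echo -1 > /proc/sys/kernel/sched_rt_runtime_us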


>      # trace-cmd record -p nop -e all -M 0x7FE -m 32000 --poll ~test/rt-tests/oslat --cpu-list 1-10 --duration 1h -w memmove -m 4K -T20
>      ....
>       Maximum:    14 11 13 12 12 13 12 11 12 10 (us)
>       Max-Min:    13 10 12 11 11 12 11 10 11 9 (us)
>       Duration:    3599.986 3599.986 3599.987 3599.987 3599.986 3599.987 3599.986 3599.987 3599.986 3599.986 (sec)
> 
> * Running oslat(1) with SCHED_OTHER priority via the 'trace-cmd --poll' option did not show the spike. Nonetheless, trace-cmd(1) logs show <idle>, ktimers/ and kworker/ threads running on isolated CPUs.
> * Now running rtla-osnoise(1) test with SCHED_OTHER:
>       # rtla osnoise top -c 1-10 -d 6h -s 20 -T 20 -Po:0 -q -t

I will reply to the next email on this... I saw you have results.

> Thank you.
> ---
>   - P J P


* Re: About rtla osnoise and timerlat usage
  2023-02-23 12:12         ` Prasad Pandit
@ 2023-02-23 14:38           ` Daniel Bristot de Oliveira
  0 siblings, 0 replies; 10+ messages in thread
From: Daniel Bristot de Oliveira @ 2023-02-23 14:38 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: linux-trace-users

On 2/23/23 09:12, Prasad Pandit wrote:
> Hello Daniel,
> 
> On Thu, 23 Feb 2023 at 00:41, Prasad Pandit <ppandit@redhat.com> wrote:
>>      # trace-cmd record -p nop -e all -M 0x7FE -m 32000 --poll ~test/rt-tests/oslat --cpu-list 1-10 --duration 1h -w memmove -m 4K -T20
>>      ....
>>       Maximum:    14 11 13 12 12 13 12 11 12 10 (us)
>>       Max-Min:    13 10 12 11 11 12 11 10 11 9 (us)
>>       Duration:    3599.986 3599.986 3599.987 3599.987 3599.986 3599.987 3599.986 3599.987 3599.986 3599.986 (sec)
>>
>> * Running oslat(1) with SCHED_OTHER priority via the 'trace-cmd --poll' option did not show the spike. Nonetheless, trace-cmd(1) logs show <idle>, ktimers/ and kworker/ threads running on isolated CPUs.
>> * Now running rtla-osnoise(1) test with SCHED_OTHER:
>>       # rtla osnoise top -c 1-10 -d 6h -s 20 -T 20 -Po:0 -q -t
> 
> Please see:
>     -> https://0bin.net/paste/ShXHmdvu#D0XY-WxTKCzWTxgQ+lTFbx1nB2TP2w+T0Mp8PBXt9gu
> 
> * rtla-osnoise(1) ran for 6 hours with SCHED_OTHER and with SCHED_FIFO;
> neither run reported any spike above 20us.

Good!

>   It's reporting occurrences of IRQs only; all other noise columns are
> zero (0). That's a little surprising!?

Regarding the IRQs only: that is a good sign! Your isolation is good!

Regarding the numbers: -T 20 tells the tracer to account as noise only
events >= 20 us. As you did not hit a single such case - your -s 20 did not stop the
trace - that is expected.

But I think that what you want is -T 1. That is, consider as noise anything that is > 1 us.

This will show you the noise added by those IRQs.
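
That is, something like this (your own command, with only the -T value changed):

    # rtla osnoise top -c 1-10 -d 6h -s 20 -T 1 -Po:0 -q -t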

Anyways, the isolation is good enough to get good numbers with SCHED_OTHER... without
the risks of starvation.

> Thank you.
> ---
>    - Prasad
> 


* Re: About rtla osnoise and timerlat usage
  2023-02-23 14:17         ` Daniel Bristot de Oliveira
@ 2023-02-23 14:39           ` Steven Rostedt
  2023-02-23 14:54             ` Daniel Bristot de Oliveira
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2023-02-23 14:39 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira; +Cc: Prasad Pandit, linux-trace-users

On Thu, 23 Feb 2023 11:17:03 -0300
Daniel Bristot de Oliveira <bristot@kernel.org> wrote:

> I am not sure if I understood what you mean but...
> 
> kworker/[120] <--- this 120 is likely not the same as
> ktimer/[97] <---- this 97
> 
> The kworker is likely a SCHED_OTHER 0 nice, and ktimer a FIFO:97.
> 
> You are placing your load in between them.
> 
> That would not be bad if we ran a traditional periodic/sporadic real-time
> workload. That is, a task that waits for an event, wakes up, runs, and goes
> to sleep waiting for the next event.
> 
> The problem is that oslat/osnoise run non-stop.
> 
> Then a kworker awakened on the CPU will... starve. You will not see it
> causing a sched_switch, but if the kworker is pinned to that CPU, it will
> not make progress.

Note, the kworker and other kernel threads that are pinned to a CPU are
ones that service requests that were triggered on that CPU. It is possible
to run a task at FIFO 99 on an isolated CPU non stop without causing any
issue (you may also need to enable NO_HZ_FULL and make sure RCU has
no-callbacks enabled where the RCU for that isolated CPU gets its work done
on other CPUs).
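
For example, a typical boot-parameter combination for that (assuming CPUs 1-10
are the isolated set, as in this thread):

    isolcpus=1-10 nohz_full=1-10 rcu_nocbs=1-10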

If your FIFO task calls into the kernel and does something that triggers a
worker, then you may have an issue. You will need to make sure that
worker gets time to run.

The point I'm making is that it is possible to get something working where
you have a FIFO task running 100%, but you need to set up the system where
it will not cause issues. That requires knowing which system calls are
done on that CPU that may require workers.

Oh, and there's another issue that can cause problems. Even if you have figured
out everything your task does, and made sure that it doesn't trigger any
pinned kworkers, and you are using NO_CB_RCU and NO_HZ_FULL, there's still
an issue that needs to be taken care of. That is, if there was some task
running on that CPU just before your FIFO task runs, it could have
triggered a kworker. And even though that task may be done, or even migrated to
another CPU, the kworker will still need to execute. I've seen this cause
days of debugging to find out why the system crashed.

-- Steve

* Re: About rtla osnoise and timerlat usage
  2023-02-23 14:39           ` Steven Rostedt
@ 2023-02-23 14:54             ` Daniel Bristot de Oliveira
  2023-02-27  7:10               ` Prasad Pandit
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Bristot de Oliveira @ 2023-02-23 14:54 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Prasad Pandit, linux-trace-users

On 2/23/23 11:39, Steven Rostedt wrote:
> On Thu, 23 Feb 2023 11:17:03 -0300
> Daniel Bristot de Oliveira <bristot@kernel.org> wrote:
> 
>> I am not sure if I understood what you mean but...
>>
>> kworker/[120] <--- this 120 is likely not the same as
>> ktimer/[97] <---- this 97
>>
>> The kworker is likely a SCHED_OTHER 0 nice, and ktimer a FIFO:97.
>>
>> You are placing your load in between them.
>>
>> That would not be bad if we ran a traditional periodic/sporadic real-time
>> workload. That is, a task that waits for an event, wakes up, runs, and goes
>> to sleep waiting for the next event.
>>
>> The problem is that oslat/osnoise run non-stop.
>>
>> Then a kworker awakened on the CPU will... starve. You will not see it
>> causing a sched_switch, but if the kworker is pinned to that CPU, it will
>> not make progress.
> 
> Note, the kworker and other kernel threads that are pinned to a CPU are
> ones that service requests that were triggered on that CPU. It is possible
> to run a task at FIFO 99 on an isolated CPU non stop without causing any
> issue (you may also need to enable NO_HZ_FULL and make sure RCU has
> no-callbacks enabled where the RCU for that isolated CPU gets its work done
> on other CPUs).

Yes, but in the perfect isolation case, where no other task is scheduled there, being
FIFO, OTHER, or even IDLE is... equivalent, as no scheduling is needed :-).

> If your FIFO task calls into the kernel and does something that triggers a
> worker, then you may have an issue. You will need to make sure that
> worker gets time to run.
> 
> The point I'm making is that it is possible to get something working where
> you have a FIFO task running 100%, but you need to set up the system where
> it will not cause issues. That requires knowing which system calls are
> done on that CPU that may require workers.
> 
> Oh, and there's another issue that can cause problems. Even if you have figured
> out everything your task does, and made sure that it doesn't trigger any
> pinned kworkers, and you are using NO_CB_RCU and NO_HZ_FULL, there's still
> an issue that needs to be taken care of. That is, if there was some task
> running on that CPU just before your FIFO task runs, it could have
> triggered a kworker. And even though that task may be done, or even migrated to
> another CPU, the kworker will still need to execute. I've seen this cause
> days of debugging to find out why the system crashed.

There are also cases where kworkers are dispatched to all CPUs, from a non-isolated CPU,
to do some housekeeping work. E.g., I think that ftrace used to do that to allocate buffers.
Ideally, all these cases should be reworked to avoid dispatching kworkers where they are
not needed. But kworkers are added to the code as part of development, and bad
3rd-party drivers can also do it... and... who knows?

That is why the safest path is this: assuming that the CPU isolation is done to perfection,
no scheduling will happen, and so all the scheduling policies are equivalent.

In the exceptional case of something happening on that CPU, it is likely short-lived
kernel work that is just easier to let run; one monitors those cases and tries
to fix the code to avoid them.

> -- Steve


* Re: About rtla osnoise and timerlat usage
  2023-02-23 14:54             ` Daniel Bristot de Oliveira
@ 2023-02-27  7:10               ` Prasad Pandit
  0 siblings, 0 replies; 10+ messages in thread
From: Prasad Pandit @ 2023-02-27  7:10 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira; +Cc: Steven Rostedt, linux-trace-users

Hello Daniel, Steve,

On Thu, 23 Feb 2023 at 20:24, Daniel Bristot de Oliveira
<bristot@kernel.org> wrote:
> On 2/23/23 11:39, Steven Rostedt wrote:
>>> kworker/[120] <--- this 120 is likely not the same as
>>> ktimer/[97] <---- this 97
>>>
>>> The kworker is likely a SCHED_OTHER 0 nice, and ktimer a FIFO:97.
>>> You are placing your load in between them.

* Oh right, even those threads have different priorities.

>>> That would not be bad if we ran a traditional periodic/sporadic real-time
>>> workload. That is, a task that waits for an event, wakes up, runs, and goes
>>> to sleep waiting for the next event.
>>>
>>> The problem is that oslat/osnoise run non-stop.
>>>
>>> Then a kworker awakened on the CPU will... starve. You will not see it
>>> causing a sched_switch, but if the kworker is pinned to that CPU, it will
>>> not make progress.
>>
>> Note, the kworker and other kernel threads that are pinned to a CPU are
>> ones that service requests that were triggered on that CPU. It is possible
>> to run a task at FIFO 99 on an isolated CPU non stop without causing any
>> issue (you may also need to enable NO_HZ_FULL and make sure RCU has
>> no-callbacks enabled where the RCU for that isolated CPU gets its work done
>> on other CPUs).
>
> Yes, but in the perfect isolation case, where no other task is scheduled there, being
> FIFO, OTHER, or even IDLE is... equivalent, as no scheduling is needed :-).
>
>> If your FIFO task calls into the kernel and does something that triggers a
>> worker, then you may have an issue. You will need to make sure that
>> worker gets time to run.
>>
>> The point I'm making is that it is possible to get something working where
>> you have a FIFO task running 100%, but you need to set up the system where
>> it will not cause issues. That requires knowing which system calls are
>> done on that CPU that may require workers.
>>
>> Oh, and there's another issue that can cause problems. Even if you have figured
>> out everything your task does, and made sure that it doesn't trigger any
>> pinned kworkers, and you are using NO_CB_RCU and NO_HZ_FULL, there's still
>> an issue that needs to be taken care of. That is, if there was some task
>> running on that CPU just before your FIFO task runs, it could have
>> triggered a kworker. And even though that task may be done, or even migrated to
>> another CPU, the kworker will still need to execute. I've seen this cause
>> days of debugging to find out why the system crashed.
>
> There are also cases where kworkers are dispatched to all CPUs, from a non-isolated CPU,
> to do some housekeeping work. E.g., I think that ftrace used to do that to allocate buffers.
> Ideally, all these cases should be reworked to avoid dispatching kworkers where they are
> not needed. But kworkers are added to the code as part of development, and bad
> 3rd-party drivers can also do it... and... who knows?
>
> In the exceptional case of something happening on that CPU, it is likely short-lived
> kernel work that is just easier to let run; one monitors those cases and tries
> to fix the code to avoid them.
>
> That is why the safest path is this: assuming that the CPU isolation is done to perfection,
> no scheduling will happen, and so all the scheduling policies are equivalent.
>

* I see, got it. Thank you so much for your kind replies and detailed
explanations, I appreciate it.

Thank you.
---
  - P J P

