Re: About rtla osnoise and timerlat usage

* Re: About rtla osnoise and timerlat usage
       [not found] <CAE8KmOxedTiM8GJVp+-HuBW=jkuE=aSKFYrmaj8zHLmQP-1RCg@mail.gmail.com>
@ 2023-02-22 11:59 ` Daniel Bristot de Oliveira
       [not found]   ` <CAE8KmOzuCqp5w4FBVd6GjPg_znQhumcsA=PKozZbQWxXPdZYXg@mail.gmail.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Bristot de Oliveira @ 2023-02-22 11:59 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: linux-trace-users

Hi Prasad

On 2/22/23 05:35, Prasad Pandit wrote:
> Hello Daniel,
> 
> * I'm debugging a kernel-rt latency spike of about ~55us. Both trace-cmd + oslat(1) and rtla-timerlat-top(1) hint at a TIMER interrupt possibly causing the spike.
> 
>     # tail -n 30 timerlat_trace.txt
>           timerlat/3-925543 [003] ....... 26505.527002: #396   context thread timer_latency     18805 ns
>               <idle>-0     [004] d..h1.. 26505.527046: #396   context    irq timer_latency      8995 ns
>               <idle>-0     [004] dN.h2.. 26505.527052: irq_noise: local_timer:236 start 26505.527045606 duration 6544 ns
>               <idle>-0     [004] d...3.. 26505.527055: thread_noise: swapper/4:0 start 26505.527046097 duration 2430 ns
>           timerlat/4-925544 [004] ....... 26505.527056: #396   context thread timer_latency     18716 ns
>               <idle>-0     [005] d..h1.. 26505.527095: #396   context    irq timer_latency      9199 ns
>               <idle>-0     [005] dN.h2.. 26505.527102: irq_noise: local_timer:236 start 26505.527094732 duration 6586 ns
>               <idle>-0     [005] d...3.. 26505.527104: thread_noise: swapper/5:0 start 26505.527095248 duration 2325 ns
>           timerlat/5-925545 [005] ....... 26505.527105: #396   context thread timer_latency     18735 ns
>               <idle>-0     [006] d..h1.. 26505.527144: #396   context    irq timer_latency      8246 ns
>               <idle>-0     [006] dN.h2.. 26505.527150: irq_noise: local_timer:236 start 26505.527142853 duration 6699 ns
>               <idle>-0     [006] d...3.. 26505.527152: thread_noise: swapper/6:0 start 26505.527143344 duration 2399 ns
>           timerlat/6-925546 [006] ....... 26505.527153: #396   context thread timer_latency     18021 ns
>               <idle>-0     [007] d..h1.. 26505.527195: #396   context    irq timer_latency     10236 ns
>               <idle>-0     [007] dN.h2.. 26505.527201: irq_noise: local_timer:236 start 26505.527194172 duration 6808 ns
>               <idle>-0     [007] d...3.. 26505.527204: thread_noise: swapper/7:0 start 26505.527194702 duration 2401 ns
>           timerlat/7-925547 [007] ....... 26505.527205: #396   context thread timer_latency     20115 ns
>           timerlat/7-925547 [007] ....1.. 26505.527205: <stack trace>
>      => timerlat_irq
>      => __hrtimer_run_queues
>      => hrtimer_interrupt
>      => smp_apic_timer_interrupt
>      => apic_timer_interrupt
>      => native_safe_halt
>      => default_idle
>      => default_idle_call
>      => do_idle
>      => cpu_startup_entry
>      => start_secondary
>      => secondary_startup_64_no_verify

This is the timerlat's timer, so it is expected. What this trace is pointing is to
a possible exit from idle latency... so idle tune is required for this system
and *this metric*... but

> 
> * rtla-timerlat-top with threshold (-T) of 20us promptly terminates with the above trace. But with a threshold(-T) of 30us it completes the full duration(-d) run. To confirm:
> 
>     - 20us threshold is used with oslat(1) tool;  Is it the right threshold value for rtla-timerlat as well?

timerlat does not measure the same thing that oslat measures. oslat is similar to rtla osnoise,
so you need to run rtla osnoise, not rtla timerlat.

>     - From the trace it looks like an isolated CPU is going into an idle loop and from there leads to the timer interrupt.

Yes, that is expected on timerlat in an isolated CPU. But not with osnoise/oslat kind of tool,
as they keep running, while timerlat/cyclictest go to sleep.

Let me know how rtla osnoise results are, so I can help more.

-- Daniel
>     - Not sure why an isolated CPU is going into idle state and firing TIMER interrupt. Does the trace look reasonable or is something amiss?
> 
> * I'd appreciate it if you could share any inputs/suggestions for further clues/debugging/checks.
> 
> Thank you.
> ---
>    - P J P

^ permalink raw reply	[flat|nested] 10+ messages in thread