linux-trace-users.vger.kernel.org archive mirror
* Trying to trace when RT FF & RR threads hit RT Throttle limit (kernel.sched_rt_runtime_us)
@ 2023-05-01  4:42 Brian Hutchinson
  2023-05-01 14:17 ` Steven Rostedt
  0 siblings, 1 reply; 6+ messages in thread
From: Brian Hutchinson @ 2023-05-01  4:42 UTC
  To: linux-trace-users

Hi,

I'm using a 5.10.69 kernel on an i.MX8 platform with isolated CPUs
(isolcpus on the command line) and RT_RUNTIME_SHARE, and I want to know
if (and which threads are) hitting the RT throttle limit
(kernel.sched_rt_runtime_us).  Since my kernel does not have a
tracepoint for this, I used gdb to disassemble kernel/sched/rt.c, found
the address where throttling gets set, and added a kprobe for it.
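
For reference, this is roughly the sequence I used (a sketch from
memory; the raw address is from my particular vmlinux and will differ
on any other build):

# find the instruction that sets rt_throttled in a matching vmlinux
gdb vmlinux
(gdb) disassemble /s sched_rt_period_timer
(gdb) quit

# register a kprobe on that raw address and start tracing
cd /sys/kernel/debug/tracing
echo 'p 0xffff8000100a8e38 %x0' > kprobe_events
echo 1 > events/kprobes/enable
cat trace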

When I look at the trace, I'm seeing what looks like "idle" being
throttled in addition to other things.  I "think" my probe is working:
the processes that show up look like they have run over the 95% limit
of one period (which defaults to 1 second), but I'm confused as to why
idle shows up.

I've not yet run an lttng trace capturing context switches to see what
other processes might be involved around the time the throttle kicks
in; at the moment I'm just trying to validate that I'm going after the
right problem and setting this up the right way.
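
When I do, I'll probably also mirror it with plain ftrace along these
lines (untested as of this writing; just the stock sched events):

cd /sys/kernel/debug/tracing
echo 1 > events/sched/sched_switch/enable   # who ran around the throttle
echo 1 > events/sched/sched_wakeup/enable
cat trace_pipe > /tmp/sched.txt &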

I did identify a patch that enables a tracepoint for the RT throttle,
but it is for RT_GROUP_SCHED, which we aren't using at the moment.

I suspect the application I'm trying to debug has some misbehaving
realtime processes (we are not using an RT-patched kernel) that are
being throttled, so I'm trying to identify them so they can be studied
and made to behave better.

If anyone has a better idea or advice on how to go about this please
point me in the right direction.

Below is a sample of a trace I captured, using the kprobe mentioned
above, when rt_throttled=1.  I don't quite understand how idle can be
three levels deep in preemption, or why "idle" is even showing up.

CPUs 1, 2, and 3 have been isolated, and the application that appears
to be getting throttled is on core 2:

# tracer: nop
#
# entries-in-buffer/entries-written: 592/592   #P:4
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| /     delay
#           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
#              | |         |   ||||      |         |
         <idle>-0       [002] d.h3   508.964800: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   508.964815: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   508.964822: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   508.964831: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   509.964801: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   509.964817: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   509.964824: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   509.964832: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   510.964799: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   510.964814: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   510.964821: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   510.964832: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   511.964799: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   511.964816: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   511.964823: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   511.964830: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   512.964800: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   512.964815: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   512.964822: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   512.964829: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   513.964801: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   513.964817: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   513.964826: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   513.964835: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   514.964803: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   514.964818: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   514.964825: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   514.964832: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   515.964801: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   515.964818: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   515.964825: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   515.964834: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   516.964797: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   516.964809: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   516.964815: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   516.964820: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [000] d.h3   517.964804: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [000] d.h3   517.964820: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [000] d.h3   517.964829: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [000] d.h3   517.964839: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   518.964799: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   518.964815: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   518.964823: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   518.964831: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   519.964801: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   519.964817: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   519.964824: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   519.964833: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   520.964800: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   520.964815: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   520.964822: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   520.964831: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] dNh3   521.964811: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] dNh3   521.964828: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] dNh3   521.964835: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] dNh3   521.964847: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] dNh3   522.964795: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] dNh3   522.964811: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] dNh3   522.964819: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] dNh3   522.964826: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    irq/33-sdma-369     [002] dNh3   523.964811: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    irq/33-sdma-369     [002] dNh3   523.964826: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    irq/33-sdma-369     [002] dNh3   523.964833: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    irq/33-sdma-369     [002] dNh3   523.964842: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    some-thread-758     [002] d.h4   524.964799: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    some-thread-758     [002] d.h4   524.964816: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    some-thread-758     [002] d.h4   524.964825: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    some-thread-758     [002] d.h4   524.964834: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    some-thread-758     [002] d.h3   525.964794: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    some-thread-758     [002] d.h3   525.964810: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    some-thread-758     [002] d.h3   525.964817: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    some-thread-758     [002] d.h3   525.964825: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    some-thread-758     [002] d.h2   526.964793: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    some-thread-758     [002] d.h2   526.964808: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    some-thread-758     [002] d.h2   526.964815: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    some-thread-758     [002] d.h2   526.964823: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    irq/33-sdma-369     [002] d.h4   527.964802: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    irq/33-sdma-369     [002] d.h4   527.964818: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    irq/33-sdma-369     [002] d.h4   527.964826: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
    irq/33-sdma-369     [002] d.h4   527.964835: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h4   528.964803: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h4   528.964818: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h4   528.964827: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h4   528.964837: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   529.964797: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   529.964812: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   529.964820: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] d.h3   529.964830: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] dNh3   530.964793: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] dNh3   530.964808: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] dNh3   530.964815: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
         <idle>-0       [002] dNh3   530.964824: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0

Thanks for any guidance/pointers.

Regards,

Brian


* Re: Trying to trace when RT FF & RR threads hit RT Throttle limit (kernel.sched_rt_runtime_us)
  2023-05-01  4:42 Trying to trace when RT FF & RR threads hit RT Throttle limit (kernel.sched_rt_runtime_us) Brian Hutchinson
@ 2023-05-01 14:17 ` Steven Rostedt
  2023-05-01 15:15   ` Brian Hutchinson
  0 siblings, 1 reply; 6+ messages in thread
From: Steven Rostedt @ 2023-05-01 14:17 UTC
  To: Brian Hutchinson; +Cc: linux-trace-users

On Mon, 1 May 2023 00:42:03 -0400
Brian Hutchinson <b.hutchman@gmail.com> wrote:

> Hi,
> 
> Using 5.10.69 kernel on a i.MX8 platform with isolated cpu's (isolcpus
> on cmd line) with RT_RUNTIME_SHARE and wanting to know if (and who) is
> hitting the RT Throttle limit.  Since my kernel does not have a
> tracepoint for this, I used gdb and disassembled kernel/sched/rt.c to
> find the address where throttling gets set and then added a kprobe for
> it.

Where exactly did you add this kprobe?

> 
> When I look at the trace I'm seeing what looks like "idle" being
> throttled in addition to other things.  I "think" my probe is working
> as when I look at the trace, the processes that show up look like they
> have run over the 95% limit of one period (defaulted to 1 second) but
> I'm confused as to why Idle shows up.

Are you saying this because it's happening while idle is running?

> 
> Below is a sample of a trace I captured using the kprobe I mentioned
> above when rt_throttled=1.  I don't quite understand how Idle can be 3
> levels deep in preemption or why "idle" is even showing up.
> 
> CPU's 1, 2 and 3 have been isolated and the application that appears
> to be getting throttled is on core 2:
> 
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 592/592   #P:4
> #
> #                                _-----=> irqs-off
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| /     delay
> #           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
> #              | |         |   ||||      |         |
>          <idle>-0       [002] d.h3   508.964800: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0


Note the "d.h3": as the heading states, 'd' means interrupts are
disabled, and 'h' means this is running in hard interrupt context. That
is, even though idle is the current task, an interrupt triggered, and
this code is running in interrupt context.

I see it's running sched_rt_period_timer(), which calls
do_sched_rt_period_timer(), and if you look at that function, it does
for_each_cpu(i, span), checking each CPU in the span to see whether its
RT runqueue should be throttled or not.

It's not throttling idle, but the interrupt running on the idle CPU noticed
that an RT task needs to be throttled.

-- Steve





* Re: Trying to trace when RT FF & RR threads hit RT Throttle limit (kernel.sched_rt_runtime_us)
  2023-05-01 14:17 ` Steven Rostedt
@ 2023-05-01 15:15   ` Brian Hutchinson
  2023-05-01 15:21     ` Steven Rostedt
  0 siblings, 1 reply; 6+ messages in thread
From: Brian Hutchinson @ 2023-05-01 15:15 UTC
  To: Steven Rostedt; +Cc: linux-trace-users

Hey Steve, good to hear from you.

On Mon, May 1, 2023 at 10:17 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 1 May 2023 00:42:03 -0400
> Brian Hutchinson <b.hutchman@gmail.com> wrote:
>
> > Hi,
> >
> > Using 5.10.69 kernel on a i.MX8 platform with isolated cpu's (isolcpus
> > on cmd line) with RT_RUNTIME_SHARE and wanting to know if (and who) is
> > hitting the RT Throttle limit.  Since my kernel does not have a
> > tracepoint for this, I used gdb and disassembled kernel/sched/rt.c to
> > find the address where throttling gets set and then added a kprobe for
> > it.
>
> Where exactly did you add this kprobe?

It was added from /sys/kernel/debug/tracing:

echo 'p 0xffff8000100a8e38 %x0' > kprobe_events
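
I believe the same probe could also be written as symbol+offset, using
the 0x1f0 offset the trace output reports, so it wouldn't depend on the
address in one particular vmlinux ... I haven't verified this form
though:

echo 'p sched_rt_period_timer+0x1f0 %x0' > kprobe_events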

Your "Fun with Dynamic Kernel Tracing Events" talk in 2018 was cool
with all those examples but you never gave examples on how to do this
kind of stuff wink, wink, ha, ha.

>
> >
> > When I look at the trace I'm seeing what looks like "idle" being
> > throttled in addition to other things.  I "think" my probe is working
> > as when I look at the trace, the processes that show up look like they
> > have run over the 95% limit of one period (defaulted to 1 second) but
> > I'm confused as to why Idle shows up.
>
> Are you saying this because it's happening while idle is running?

I don't know what I'm saying.  Part of the reason I posted was that I
was questioning whether what I'm seeing is even valid ... but it makes
more sense now that you point out that it's a hard interrupt happening
while idle.  Now I just have to figure out which interrupt.
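
I'm thinking I can have the kprobe event dump a stack trace to catch
the context it fires from ... something like this, assuming I have the
trigger syntax right (EVENT_NAME stands for whatever auto-generated
name shows up):

cd /sys/kernel/debug/tracing
ls events/kprobes/            # find the auto-generated event name
echo stacktrace > events/kprobes/EVENT_NAME/trigger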

> >          <idle>-0       [002] d.h3   508.964800: p_0x00000000223c0e95: (sched_rt_period_timer+0x1f0/0x328) arg1=0x0
>
>
> Note the "d.h3", which means as the heading states, 'd' is interrupts are
> disabled, 'h' means it is running in hard interrupt context. That means,
> even though idle is the current task, an interrupt triggered, and this is
> running in interrupt context.

As I said above, it makes more sense why idle is showing up now that
you've pointed that out.  This is an app that was ported from a
different OS on a single-core architecture, and now that it's on a
multicore architecture (quad-core A53s) with SMP Linux, I think there
are some old-school disable-interrupts/preempt critical sections that
are going to need to be reworked ... which is one of the reasons I
believe it was necessary to pin the apps to an isolated core on the
i.MX8 to get them to even run.

>
> I see it's running sched_rt_period_timer() which calls
> do_sched_rt_period_timer(), and if you look at that function, it does
> for_each_cpu(i, span), checking other CPUs to see if it should be throttled
> or not.

... about that.  I screwed up: in the original post I meant to say the
scheduler feature is set to NO_RT_RUNTIME_SHARE.  I've read there is
some ambiguity about what sched_rt_runtime_us means ... a limit on the
system as a whole (all CPUs) or on each individual CPU.  I "believe"
the way I'm running it, each individual CPU gets sched_rt_runtime_us
for realtime work.
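
To be concrete, this is what I'm checking on the target (paths as on my
debug-enabled kernel):

grep -o NO_RT_RUNTIME_SHARE /sys/kernel/debug/sched_features
# per-CPU RT budget: 950000us of every 1000000us period by default
sysctl kernel.sched_rt_runtime_us kernel.sched_rt_period_us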

>
> It's not throttling idle, but the interrupt running on the idle CPU noticed
> that an RT task needs to be throttled.

The trace all appears to be CPU 2 context at the moment, so now I just
need to figure out which interrupt it is.
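
I'll probably start by enabling the irq and timer events alongside the
kprobe, something like this (untested):

cd /sys/kernel/debug/tracing
echo 1 > events/irq/irq_handler_entry/enable
# sched_rt_period_timer is an hrtimer callback, so this should show it firing
echo 1 > events/timer/hrtimer_expire_entry/enable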

Thanks Steve.

Regards,

Brian


* Re: Trying to trace when RT FF & RR threads hit RT Throttle limit (kernel.sched_rt_runtime_us)
  2023-05-01 15:15   ` Brian Hutchinson
@ 2023-05-01 15:21     ` Steven Rostedt
  2023-05-01 19:35       ` Brian Hutchinson
  0 siblings, 1 reply; 6+ messages in thread
From: Steven Rostedt @ 2023-05-01 15:21 UTC
  To: Brian Hutchinson; +Cc: linux-trace-users

On Mon, 1 May 2023 11:15:45 -0400
Brian Hutchinson <b.hutchman@gmail.com> wrote:

> Hey Steve, good to hear from you.
> 
> On Mon, May 1, 2023 at 10:17 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Mon, 1 May 2023 00:42:03 -0400
> > Brian Hutchinson <b.hutchman@gmail.com> wrote:
> >  
> > > Hi,
> > >
> > > Using 5.10.69 kernel on a i.MX8 platform with isolated cpu's (isolcpus
> > > on cmd line) with RT_RUNTIME_SHARE and wanting to know if (and who) is
> > > hitting the RT Throttle limit.  Since my kernel does not have a
> > > tracepoint for this, I used gdb and disassembled kernel/sched/rt.c to
> > > find the address where throttling gets set and then added a kprobe for
> > > it.  
> >
> > Where exactly did you add this kprobe?  
> 
> It was:
> /sys/kernel/debug/tracing
> 
> echo 'p 0xffff8000100a8e38 %x0' > kprobe_events

I meant, where was the exact location that address pointed to in the code,
not where in the user interface did you add it ;-)

> 
> >
> > I see it's running sched_rt_period_timer() which calls
> > do_sched_rt_period_timer(), and if you look at that function, it does
> > for_each_cpu(i, span), checking other CPUs to see if it should be throttled
> > or not.  
> 
> ... about that.  I screwed up and meant to say scheduler features is
> set to NO_RT_RUNTIME_SHARE in the original post as I've read there is
> some weirdness on "what" sched_rt_runtime_us means ... a limit on the
> system as a whole (all cpu's) or each individual cpu. I "believe" the
> way I'm running it is each individual cpu gets sched_rt_runtime_us for
> realtime stuff.

Correct. I've run some tests on RT spinners: if their affinity is
broad, they bounce around CPUs and still consume 100% of a CPU (just on
different CPUs), but if you pin one to a single CPU, it will get
throttled.
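
A contrived version of that kind of spinner, if you want to reproduce
it (not the exact test I ran):

# pinned FIFO spinner: expect "sched: RT throttling activated" in dmesg
taskset -c 2 chrt -f 50 sh -c 'while :; do :; done' &

# same spinner with a wide affinity mask can migrate between CPUs and
# keep running at 100% when RT_RUNTIME_SHARE is set
chrt -f 50 sh -c 'while :; do :; done' &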

> 
> >
> > It's not throttling idle, but the interrupt running on the idle CPU noticed
> > that an RT task needs to be throttled.  
> 
> The trace all appears to be cpu2 context at the moment.  So now I just
> need to figure out how to tell which interrupt it is.
> 
> Thanks Steve.

No problem.

-- Steve



* Re: Trying to trace when RT FF & RR threads hit RT Throttle limit (kernel.sched_rt_runtime_us)
  2023-05-01 15:21     ` Steven Rostedt
@ 2023-05-01 19:35       ` Brian Hutchinson
  2023-05-02 20:43         ` Steven Rostedt
  0 siblings, 1 reply; 6+ messages in thread
From: Brian Hutchinson @ 2023-05-01 19:35 UTC
  To: Steven Rostedt; +Cc: linux-trace-users

On Mon, May 1, 2023 at 11:21 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 1 May 2023 11:15:45 -0400
> Brian Hutchinson <b.hutchman@gmail.com> wrote:
>
> > Hey Steve, good to hear from you.
> >
> > On Mon, May 1, 2023 at 10:17 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Mon, 1 May 2023 00:42:03 -0400
> > > Brian Hutchinson <b.hutchman@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Using 5.10.69 kernel on a i.MX8 platform with isolated cpu's (isolcpus
> > > > on cmd line) with RT_RUNTIME_SHARE and wanting to know if (and who) is
> > > > hitting the RT Throttle limit.  Since my kernel does not have a
> > > > tracepoint for this, I used gdb and disassembled kernel/sched/rt.c to
> > > > find the address where throttling gets set and then added a kprobe for
> > > > it.
> > >
> > > Where exactly did you add this kprobe?
> >
> > It was:
> > /sys/kernel/debug/tracing
> >
> > echo 'p 0xffff8000100a8e38 %x0' > kprobe_events
>
> I meant, where was the exact location that address pointed to in the code,
> not where in the user interface did you add it ;-)

D'oh!  OK, sorry, my bad.

On my first attempt I put it here:

   915                         if (rt_rq->rt_time || rt_rq->rt_nr_running)
   916                                 idle = 0;
   917                         raw_spin_unlock(&rt_rq->rt_runtime_lock);
   918                 } else if (rt_rq->rt_nr_running) {
   919                         idle = 0;
   920                         if (!rt_rq_throttled(rt_rq))
   921                                 enqueue = 1;
   922                 }
   923                 if (rt_rq->rt_throttled)
   924                         throttled = 1;
   925
   926                 if (enqueue)
   927                         sched_rt_rq_enqueue(rt_rq);
   928                 raw_spin_unlock(&rq->lock);
   929         }

on line 924.  I'm not the most versed in A53 assembly, so I guessed
which instruction to use (it's always tricky with pipelining, etc.),
then cheated: I put a breakpoint on line 924 to confirm, and used that
address.
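
Concretely, the gdb side looked something like this against the
matching vmlinux (from memory, output elided):

gdb vmlinux
(gdb) info line kernel/sched/rt.c:924   # address range for that source line
(gdb) break kernel/sched/rt.c:924       # prints the resolved address
(gdb) disassemble /s sched_rt_period_timer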

Then later I did the same thing:
   964         if (rt_rq->rt_time > runtime) {
   965                 struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
   966
   967                 /*
   968                  * Don't actually throttle groups that have no runtime assigned
   969                  * but accrue some time due to boosting.
   970                  */
   971                 if (likely(rt_b->rt_runtime)) {
   972                         rt_rq->rt_throttled = 1;
   973                         printk_deferred_once("sched: RT throttling activated\n");
   974                 } else {
   975                         /*
   976                          * In case we did anyway, make it go away,
   977                          * replenishment is a joke, since it
   978                          * will replenish us with exactly 0 ns.
   979                          */
   980                         rt_rq->rt_time = 0;
   981                 }
   982
   983                 if (rt_rq_throttled(rt_rq)) {
   984                         sched_rt_rq_dequeue(rt_rq);
   985                         return 1;
   986                 }
   987         }

... and put a breakpoint on line 972 and used that address for the kprobe.

When I did this I got pretty much the same results as before, but I got
a lot of trace records more quickly (most were "idle"), and a kworker
thread showed up in the mix too.

I'm also not a scheduler expert, so I don't quite know the difference
between the two places rt_throttled gets set (although it should be
easy to tell from the copious amount of comments ... not!).  I just
suspect the throttle is kicking in, which is choking off SCHED_OTHER
work, and I'm trying to figure out who's doing it and why.

> >
> > ... about that.  I screwed up and meant to say scheduler features is
> > set to NO_RT_RUNTIME_SHARE in the original post as I've read there is
> > some weirdness on "what" sched_rt_runtime_us means ... a limit on the
> > system as a whole (all cpu's) or each individual cpu. I "believe" the
> > way I'm running it is each individual cpu gets sched_rt_runtime_us for
> > realtime stuff.
>
> Correct. I've ran some tests on rt spinners where if their affinity is
> broad they bounce around CPUs and still maintain 100% CPU resource (but on
> different CPUs). But if you pin it to a single CPU, it will get throttled.

I don't "think" anything is moving around here unless the OS is going
against what it's been told to do.  The cpu's are isolated with
cmdline at boot and the affinity of these processes has been selected
when they startup so they should stay where they are put unless there
is something going on I don't know about.  I think I've seen talk
before that the affinity is just a "suggestion" making me think there
are cases where something could move but if that's true I don't know
what those scenarios are.
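
I can at least double-check that nothing moved while the app runs, e.g.
(APP and TID are placeholders for our process name and one of its
thread ids):

# psr = CPU the thread last ran on, policy = scheduling class
ps -eLo pid,tid,psr,rtprio,policy,comm | grep APP
taskset -pc TID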

Regards,

Brian


* Re: Trying to trace when RT FF & RR threads hit RT Throttle limit (kernel.sched_rt_runtime_us)
  2023-05-01 19:35       ` Brian Hutchinson
@ 2023-05-02 20:43         ` Steven Rostedt
  0 siblings, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2023-05-02 20:43 UTC
  To: Brian Hutchinson; +Cc: linux-trace-users

On Mon, 1 May 2023 15:35:43 -0400
Brian Hutchinson <b.hutchman@gmail.com> wrote:

> I don't "think" anything is moving around here unless the OS is going
> against what it's been told to do.  The cpu's are isolated with
> cmdline at boot and the affinity of these processes has been selected
> when they startup so they should stay where they are put unless there
> is something going on I don't know about.  I think I've seen talk
> before that the affinity is just a "suggestion" making me think there
> are cases where something could move but if that's true I don't know
> what those scenarios are.

BTW, you may want to email the relevant maintainers of the scheduler (yes,
I'm one of them), because it's getting off topic for linux-trace-users.

-- Steve

