linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched/tracing: Reset critical timings on scheduling
@ 2021-01-26 18:57 Steven Rostedt
  2021-01-27 11:37 ` Peter Zijlstra
  0 siblings, 1 reply; 3+ messages in thread
From: Steven Rostedt @ 2021-01-26 18:57 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, Viktor Rosendahl

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

There are some paths that can call into the scheduler with interrupts or
preemption disabled, specifically from the idle thread. The problem is that
the idle thread can call the scheduler, stay idle, and continue, so the
critical section never appears to end. The preempt and irq disabled latency
tracers consider this one very long latency, which hides the real latencies
that we care about.

For example, this is from a preemptirqsoff trace:

  <idle>-0         2dN.1   16us : tick_nohz_account_idle_ticks.isra.0 <-tick_nohz_idle_exit
  <idle>-0         2.N.1   17us : flush_smp_call_function_from_idle <-do_idle
  <idle>-0         2dN.1   17us : flush_smp_call_function_queue <-flush_smp_call_function_from_idle
  <idle>-0         2dN.1   17us : nohz_csd_func <-flush_smp_call_function_queue
  <idle>-0         2.N.1   18us : schedule_idle <-do_idle
  <idle>-0         2dN.1   18us : rcu_note_context_switch <-__schedule
  <idle>-0         2dN.1   18us : rcu_preempt_deferred_qs <-rcu_note_context_switch
  <idle>-0         2dN.1   19us : rcu_preempt_need_deferred_qs <-rcu_preempt_deferred_qs
  <idle>-0         2dN.1   19us : rcu_qs <-rcu_note_context_switch
  <idle>-0         2dN.1   19us : _raw_spin_lock <-__schedule
  <idle>-0         2dN.1   19us : preempt_count_add <-_raw_spin_lock
  <idle>-0         2dN.2   20us : do_raw_spin_trylock <-_raw_spin_lock

do_idle() calls schedule_idle(), which calls __schedule(), but the reported
latency continues on for 1.4 milliseconds.

To handle this case, create a new function called reset_critical_timings(),
which simply calls stop_critical_timings() followed by
start_critical_timings(), and place it in the scheduler. There is no reason
to keep measuring timings once the scheduler has been called, as that
should allow everything to move forward.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/irqflags.h | 6 ++++++
 kernel/sched/core.c      | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 8de0e1373de7..8649f7dddb04 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -149,6 +149,12 @@ do {						\
 # define start_critical_timings() do { } while (0)
 #endif
 
+static inline void reset_critical_timings(void)
+{
+	stop_critical_timings();
+	start_critical_timings();
+}
+
 /*
  * Wrap the arch provided IRQ routines to provide appropriate checks.
  */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ff74fca39ed2..dd1c57308f0b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4977,6 +4977,8 @@ static void __sched notrace __schedule(bool preempt)
 	local_irq_disable();
 	rcu_note_context_switch(preempt);
 
+	reset_critical_timings();
+
 	/*
 	 * Make sure that signal_pending_state()->signal_pending() below
 	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
-- 
2.25.4



* Re: [PATCH] sched/tracing: Reset critical timings on scheduling
  2021-01-26 18:57 [PATCH] sched/tracing: Reset critical timings on scheduling Steven Rostedt
@ 2021-01-27 11:37 ` Peter Zijlstra
  2021-01-27 16:15   ` Steven Rostedt
  0 siblings, 1 reply; 3+ messages in thread
From: Peter Zijlstra @ 2021-01-27 11:37 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Ingo Molnar, Viktor Rosendahl

On Tue, Jan 26, 2021 at 01:57:18PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> There are some paths that can call into the scheduler with interrupts or
> preemption disabled, specifically from the idle thread. The problem is that
> the idle thread can call the scheduler, stay idle, and continue, so the
> critical section never appears to end. The preempt and irq disabled latency
> tracers consider this one very long latency, which hides the real latencies
> that we care about.
> 
> For example, this is from a preemptirqsoff trace:
> 
>   <idle>-0         2dN.1   16us : tick_nohz_account_idle_ticks.isra.0 <-tick_nohz_idle_exit
>   <idle>-0         2.N.1   17us : flush_smp_call_function_from_idle <-do_idle
>   <idle>-0         2dN.1   17us : flush_smp_call_function_queue <-flush_smp_call_function_from_idle
>   <idle>-0         2dN.1   17us : nohz_csd_func <-flush_smp_call_function_queue
>   <idle>-0         2.N.1   18us : schedule_idle <-do_idle
>   <idle>-0         2dN.1   18us : rcu_note_context_switch <-__schedule
>   <idle>-0         2dN.1   18us : rcu_preempt_deferred_qs <-rcu_note_context_switch
>   <idle>-0         2dN.1   19us : rcu_preempt_need_deferred_qs <-rcu_preempt_deferred_qs
>   <idle>-0         2dN.1   19us : rcu_qs <-rcu_note_context_switch
>   <idle>-0         2dN.1   19us : _raw_spin_lock <-__schedule
>   <idle>-0         2dN.1   19us : preempt_count_add <-_raw_spin_lock
>   <idle>-0         2dN.2   20us : do_raw_spin_trylock <-_raw_spin_lock
> 
> do_idle() calls schedule_idle(), which calls __schedule(), but the reported
> latency continues on for 1.4 milliseconds.

I'm not sure I understand the problem from this... what?

> To handle this case, create a new function called reset_critical_timings(),
> which simply calls stop_critical_timings() followed by
> start_critical_timings(), and place it in the scheduler. There is no reason
> to keep measuring timings once the scheduler has been called, as that
> should allow everything to move forward.

And that's just really daft.. why are you adding two unconditional
function calls to __schedule() that are a complete waste of time
99.999999% of the time?

If anything, this should be fixed in schedule_idle().


* Re: [PATCH] sched/tracing: Reset critical timings on scheduling
  2021-01-27 11:37 ` Peter Zijlstra
@ 2021-01-27 16:15   ` Steven Rostedt
  0 siblings, 0 replies; 3+ messages in thread
From: Steven Rostedt @ 2021-01-27 16:15 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Ingo Molnar, Viktor Rosendahl

On Wed, 27 Jan 2021 12:37:16 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> And that's just really daft.. why are you adding two unconditional
> function calls to __schedule() that are a complete waste of time
> 99.999999% of the time?
> 
> If anything, this should be fixed in schedule_idle().

Note, those two calls only do anything when you have the preempt or irqs
off latency tracers compiled in (which add overhead to every
preempt_enable()/preempt_disable() and every irqs enable/disable as well,
which is why we tell people not to compile them in if they care about
performance).

When the irq and preempt latency tracers are not compiled in, we have this:

# define stop_critical_timings() do { } while (0)
# define start_critical_timings() do { } while (0)

static inline void reset_critical_timings(void)
{
	stop_critical_timings();
	start_critical_timings();
}

which is basically a nop.

If you still care about the overhead added to __schedule() when these
tracers are enabled (which is not much, considering the overhead they cause
everywhere else), we can make it more specific to cpu idle.

I was worried that this could happen in more places than just cpu idle, and
added it to the scheduler directly to make sure I covered any other case.
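
For what it's worth, restricting this to the idle path as suggested would
look roughly like the following. This is an untested sketch against
kernel/sched/core.c; the body of schedule_idle() is approximated and may
not match the tree exactly:

```c
void __sched schedule_idle(void)
{
	/*
	 * Sketch: end the irqsoff/preemptoff measurement only on the
	 * idle path, instead of unconditionally in __schedule().
	 */
	reset_critical_timings();

	do {
		__schedule(false);
	} while (need_resched());
}
```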

-- Steve

