Re: [PATCH] sched/tracing: Reset critical timings on scheduling

From: Peter Zijlstra <peterz@infradead.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Viktor Rosendahl <Viktor.Rosendahl@bmw.de>
Subject: Re: [PATCH] sched/tracing: Reset critical timings on scheduling
Date: Wed, 27 Jan 2021 12:37:16 +0100
Message-ID: <YBFQbF/BqmjXFAd0@hirez.programming.kicks-ass.net>
In-Reply-To: <20210126135718.5bf8d273@gandalf.local.home>

On Tue, Jan 26, 2021 at 01:57:18PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> There's some paths that can call into the scheduler from interrupt disabled
> or preempt disabled state. Specifically from the idle thread. The problem is
> that it can call the scheduler, still stay idle, and continue. The preempt
> and irq disabled tracer considers this a very long latency, and hides real
> latencies that we care about.
> 
> For example, this is from a preemptirqsoff trace:
> 
>   <idle>-0         2dN.1   16us : tick_nohz_account_idle_ticks.isra.0 <-tick_nohz_idle_exit
>   <idle>-0         2.N.1   17us : flush_smp_call_function_from_idle <-do_idle
>   <idle>-0         2dN.1   17us : flush_smp_call_function_queue <-flush_smp_call_function_from_idle
>   <idle>-0         2dN.1   17us : nohz_csd_func <-flush_smp_call_function_queue
>   <idle>-0         2.N.1   18us : schedule_idle <-do_idle
>   <idle>-0         2dN.1   18us : rcu_note_context_switch <-__schedule
>   <idle>-0         2dN.1   18us : rcu_preempt_deferred_qs <-rcu_note_context_switch
>   <idle>-0         2dN.1   19us : rcu_preempt_need_deferred_qs <-rcu_preempt_deferred_qs
>   <idle>-0         2dN.1   19us : rcu_qs <-rcu_note_context_switch
>   <idle>-0         2dN.1   19us : _raw_spin_lock <-__schedule
>   <idle>-0         2dN.1   19us : preempt_count_add <-_raw_spin_lock
>   <idle>-0         2dN.2   20us : do_raw_spin_trylock <-_raw_spin_lock
> 
> do_idle() calls schedule_idle() which calls __schedule, but the latency
> continues on for 1.4 milliseconds.

I'm not sure I understand the problem from this... what?

> To handle this case, create a new function called
> "reset_critical_timings()" which just calls stop_critical_timings() followed
> by start_critical_timings() and place this in the scheduler. There's no
> reason to worry about timings when the scheduler is called, as that should
> allow everything to move forward.

And that's just really daft... why are you adding two unconditional
function calls to __schedule() that are a complete waste of time
99.999999% of the time?

If anything, this should be fixed in schedule_idle().
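
Roughly what that could look like, as a userspace sketch rather than a
kernel patch. Assumptions: stop_critical_timings() and
start_critical_timings() are replaced by counting stubs,
reset_critical_timings() is the helper described in the quoted patch,
mock_schedule() is a hypothetical stand-in for __schedule(), and the
real schedule_idle() is simplified down to a single call:

```c
#include <assert.h>
#include <stdbool.h>

/* Counting stubs standing in for the kernel's tracer hooks (assumption:
 * the real ones live in the irqsoff/preemptoff tracer). */
static int stop_calls, start_calls;
static bool timing_active = true;

static void stop_critical_timings(void)
{
	timing_active = false;
	stop_calls++;
}

static void start_critical_timings(void)
{
	timing_active = true;
	start_calls++;
}

/* Helper as described in the quoted patch: stop followed by start. */
static void reset_critical_timings(void)
{
	stop_critical_timings();
	start_critical_timings();
	assert(timing_active);	/* timing is running again afterwards */
}

/* Hypothetical stand-in for __schedule(); it carries no tracer calls,
 * so the common scheduling path pays nothing extra. */
static int schedule_calls;

static void mock_schedule(bool preempt)
{
	(void)preempt;
	schedule_calls++;
}

/* Common path: unchanged. */
static void schedule(void)
{
	mock_schedule(false);
}

/* Idle path: the only place the reset is done in this sketch. */
static void schedule_idle(void)
{
	reset_critical_timings();
	mock_schedule(false);
}
```

With this shape a plain schedule() never touches the tracer hooks, and
only schedule_idle() pays for the two extra calls.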