Re: [SchedulerWakeupLatency] Skipping Idle Cores and CPU Search

From: chris hyser <chris.hyser@oracle.com>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Parth Shah <parth@linux.ibm.com>,
	Patrick Bellasi <patrick.bellasi@matbug.net>,
	LKML <linux-kernel@vger.kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>, Paul Turner <pjt@google.com>,
	Ben Segall <bsegall@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Jonathan Corbet <corbet@lwn.net>,
	Dhaval Giani <dhaval.giani@oracle.com>,
	Josef Bacik <jbacik@fb.com>
Subject: Re: [SchedulerWakeupLatency] Skipping Idle Cores and CPU Search
Date: Wed, 22 Jul 2020 14:56:17 -0400
Message-ID: <e8ebbc84-bd7a-5d44-6229-a9712f4a1482@oracle.com>
In-Reply-To: <c1b24dd5-dce9-61ed-baba-a70f08276bf5@arm.com>

On 7/20/20 4:47 AM, Dietmar Eggemann wrote:
> On 10/07/2020 01:08, chris hyser wrote:
> 
> [...]
> 
>>> D) Desired behavior:
>>
>> Reduce the maximum wake-up latency of designated CFS tasks by skipping
>> some or all of the idle CPU and core searches by setting a maximum idle
>> CPU search value (maximum loop iterations).
>>
>> Searching 'ALL' as the maximum would be the default and implies the
>> current code path which may or may not search up to ALL. Searching 0
>> would result in the least latency (shown with experimental results to be
>> included if/when patchset goes up). One of the considerations is that
>> the maximum length of the search is a function of the size of the LLC
>> scheduling domain and this is platform dependent. Whether 'some', i.e. a
>> numerical value limiting the search can be used to "normalize" this
>> latency across differing scheduling domain sizes is under investigation.
>> Clearly differing hardware will have many other significant differences,
>> but in different sized and dynamically sized VMs running on fleets of
>> common HW this may be interesting.
> 
> I assume that this task-specific feature could coexist in
> select_idle_core() and select_idle_cpu() with the already existing
> runtime heuristics (test_idle_cores() and the two sched features
> mentioned under E/F) to reduce the idle CPU search space on a busy system.

Yes. Perhaps a more general summary of the feature is that it simply places a per-task maximum on the number of 
iterations of the various 'for_each_cpu' loops (whose maximum is platform dependent) in this path. Any other technique 
that short-circuits the loop below this maximum is fine, including the case where the very first 'idle' check in a loop 
succeeds, which is perfectly OK in terms of minimizing search latency. This really only kicks in on busy systems. 
System-wide or scheduling-domain-wide heuristics can reduce the cost to tasks of not having a per-task limit like this, 
but they can't drive the loop iteration count to 0, because that is BAD policy when applied to the wrong tasks or to too 
many tasks.
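
To make the idea concrete, here is a minimal user-space sketch of a capped idle search. It is not the patch and not 
kernel code: scan_for_idle_cpu(), max_search, SEARCH_ALL and the idle[] array are hypothetical names used only for 
illustration, and the plain array stands in for the cpumask that the real 'for_each_cpu' loops walk.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical default: "search ALL", represented here as INT_MAX. */
#define SEARCH_ALL INT_MAX

/*
 * Look for an idle CPU among nr_cpus candidates, visiting at most
 * max_search of them. idle[i] is true when CPU i is idle.
 *
 * Returns the index of the first idle CPU found within the budget,
 * or -1 if the budget ran out or no CPU was idle.
 */
static int scan_for_idle_cpu(const bool *idle, int nr_cpus, int max_search)
{
	int cpu;

	assert(nr_cpus >= 0 && max_search >= 0);

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu >= max_search)	/* per-task budget exhausted */
			break;
		if (idle[cpu])		/* an early hit ends the search */
			return cpu;
	}

	return -1;
}
```

With max_search set to SEARCH_ALL the loop behaves as if no limit existed, so any other heuristic alone decides when to 
stop. With max_search set to 0 the loop body never inspects a CPU, which is the "0 searches" case. A caller that gets 
-1 back would fall through to whatever the existing path does when no idle CPU is found.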

> 
>>> E/F) Existing knobs (and limitations):
>>
>> There are existing sched_feat: SIS_AVG_CPU, SIS_PROP that attempt to
>> short circuit the idle cpu search path in select_idle_cpu() based on
>> estimations of the current costs of searching. Neither provides a means
> 
> [...]
> 
>>> H) Range Analysis:
>>
>> The knob is a positive integer representing "max number of CPUs to
>> search". The default would be 'ALL' which could be translated as
>> INT_MAX. '0 searches' translates to 0. Other values represent a max
>> limit on the search, in this case iterations of a for loop.
> 
> IMHO the opposite use case for this feature (favour high throughput over
> short wakeup latency (Facebook)) is already cured by the changes
> introduced by commit 10e2f1acd010 ("sched/core: Rewrite and improve
> select_idle_siblings()"), i.e. with the current implementation of sis().
> 
> It seems that they don't need an additional per-task feature on top of
> the default system-wide runtime heuristics.

Agreed, and I hope I've clarified that the attribute in question should not affect that: its default is basically 
"no short cut because of this", and other heuristics may still apply.

-chrish