linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 00/10] Reduce time complexity of select_idle_sibling
@ 2020-12-03 14:11 Mel Gorman
  2020-12-03 14:11 ` [PATCH 01/10] sched/fair: Track efficiency " Mel Gorman
                   ` (10 more replies)
  0 siblings, 11 replies; 30+ messages in thread
From: Mel Gorman @ 2020-12-03 14:11 UTC (permalink / raw)
  To: LKML
  Cc: Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra, Juri Lelli,
	Vincent Guittot, Valentin Schneider, Linux-ARM, Mel Gorman

This is an early prototype that has not been tested heavily. While parts
of it may stand on their own, the motivation to release early is Aubrey
Li's series on using an idle cpumask to optimise the search and Barry
Song's series on representing clusters on die. The series is based on
tip/sched/core rebased to 5.10-rc6.

Patches 1-2 add schedstats to track the search efficiency of
	select_idle_sibling. They can be dropped from the final version but
	are useful when looking at select_idle_sibling in general. MMTests
	can already parse the stats and generate useful data including
	graphs over time.

Patch 3 kills SIS_AVG_CPU, although it is partially reintroduced later in
	the context of SIS_PROP.

Patch 4 notes that select_idle_core() can find an idle CPU that is
	not a free core yet it is ignored and a second search is conducted
	in select_idle_cpu() which is wasteful. Note that this patch
	will definitely change in the final version.

Patch 5 adjusts p->recent_used_cpu so that it has a higher success rate
	and avoids searching the domain in some cases.

Patch 6 notes that select_idle_* always starts with a CPU that is
	definitely not idle and fixes that.

Patch 7 notes that SIS_PROP is only partially accounting for search
	costs. While this might be accidentally beneficial, it makes it
	much harder to reason about the effectiveness of SIS_PROP.

Patch 8 uses similar logic to SIS_AVG_CPU but in the context of
	SIS_PROP to throttle the search depth.

Patches 9 and 10 are stupid in the context of this series. They
	are included even though it makes no sense to use SIS_PROP logic in
	select_idle_core() as it already has throttling logic. The point
	is to illustrate that the select_idle_mask can be initialised
	at the start of a domain search and used to mask out CPUs that have
	already been visited.

In the context of Aubrey's and Barry's work, select_idle_mask would
be initialised *after* select_idle_core as select_idle_core uses
select_idle_mask for its own purposes. In Aubrey's case, the next
step would be to scan idle_cpus_span as those CPUs may still be idle
and bias the search towards likely idle candidates. If that fails,
select_idle_mask clears all the bits set in idle_cpus_span and then
scans the remainder. Similar observations apply to Barry's work: scan the
local domain first, mask out those bits, then scan the remaining CPUs in
the cluster.
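
The two-phase scan described above can be modelled in userspace C roughly
as follows. This is only a sketch under stated assumptions: plain 64-bit
masks stand in for cpumasks, and first_idle_in(), two_phase_scan(), hint
and idle_now are hypothetical names for illustration, not kernel symbols.

```c
#include <stdint.h>

/*
 * Scan the CPUs set in 'mask' in ascending order and return the first one
 * that is also set in 'idle_now', or -1. '*scanned' counts every CPU
 * examined. Assumes nr_cpus <= 64.
 */
static int first_idle_in(uint64_t mask, uint64_t idle_now, int nr_cpus,
			 int *scanned)
{
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		uint64_t bit = UINT64_C(1) << cpu;

		if (!(mask & bit))
			continue;
		(*scanned)++;
		if (idle_now & bit)
			return cpu;
	}
	return -1;
}

/*
 * Phase 1 scans only the hinted CPUs (a stand-in for idle_cpus_span or a
 * local cluster). Phase 2 clears those bits from the search mask so that
 * no CPU is examined twice, then scans the remainder of the domain.
 */
static int two_phase_scan(uint64_t domain, uint64_t hint, uint64_t idle_now,
			  int nr_cpus, int *scanned)
{
	uint64_t search = domain;	/* stand-in for select_idle_mask */
	int cpu;

	cpu = first_idle_in(search & hint, idle_now, nr_cpus, scanned);
	if (cpu >= 0)
		return cpu;

	search &= ~hint;		/* mask out CPUs already visited */
	return first_idle_in(search, idle_now, nr_cpus, scanned);
}
```

With an 8-CPU domain, a hint covering CPUs 2-3 and only CPU 5 idle, the
model examines six CPUs in total and none of them more than once.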

The final version of this series will drop patches 1-2 unless there is
demand and definitely drop patches 9-10. However, all 4 patches may be
useful in the context of Aubrey's and Barry's work. Patches 1-2 would
give more precise results on exactly how much they are improving "SIS
Domain Search Efficiency" which may be more illustrative than just the
headline performance figures of a given workload. The final version of
this series will also adjust patch 4. If select_idle_core() runs at all
then it definitely should return a CPU -- either an idle CPU or the target
as it has already searched the entire domain and no further searching
should be conducted. Barry might change that back so that a cluster can
be scanned but it would be done in the context of the cluster series.

-- 
2.26.2



* [PATCH 01/10] sched/fair: Track efficiency of select_idle_sibling
  2020-12-03 14:11 [RFC PATCH 00/10] Reduce time complexity of select_idle_sibling Mel Gorman
@ 2020-12-03 14:11 ` Mel Gorman
  2020-12-03 14:11 ` [PATCH 02/10] sched/fair: Track efficiency of task recent_used_cpu Mel Gorman
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2020-12-03 14:11 UTC (permalink / raw)
  To: LKML
  Cc: Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra, Juri Lelli,
	Vincent Guittot, Valentin Schneider, Linux-ARM, Mel Gorman

select_idle_sibling is an important path that finds a nearby idle CPU on
wakeup. As it is examining other CPUs' state, it can be expensive in terms
of cache usage. This patch tracks the search efficiency if schedstats
are enabled. In general, this is only useful for kernel developers but
schedstats are typically disabled by default so it is convenient for
development and mostly free otherwise.

It is not required that this patch be merged with the series but if we
are looking at time or search complexity, the stats generate hard data
on what the search costs actually are.

SIS Search: Number of calls to select_idle_sibling

SIS Domain Search: Number of times the domain was searched because the
	fast path failed.

SIS Scanned: Generally the number of runqueues scanned but the fast
	path counts as 1 regardless of the values for target, prev
	and recent.

SIS Domain Scanned: Number of runqueues scanned during a search of the
	LLC domain.

SIS Failures: Number of SIS calls that failed to find an idle CPU

SIS Search Efficiency: A ratio expressed as a percentage of runqueues
	scanned versus idle CPUs found. A 100% efficiency indicates that
	the target, prev or recent CPU of a task was idle at wakeup. The
	lower the efficiency, the more runqueues were scanned before an
	idle CPU was found.

SIS Domain Search Efficiency: Similar, except only for the slower SIS
	path.

SIS Fast Success Rate: Percentage of SIS calls that used the target, prev
	or recent CPUs.

SIS Success rate: Percentage of scans that found an idle CPU.
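
As a rough illustration, the derived metrics can be computed from the four
raw counters as sketched below. The formulas are an assumption: they were
inferred from the totals reported in the patch 5 changelog of this thread,
which they reproduce, and are not taken from the MMTests source. The struct
and helper names are hypothetical.

```c
#include <stdint.h>

/* Raw per-runqueue counters, summed over all CPUs by the caller. */
struct sis_counters {
	uint64_t search;	/* sis_search */
	uint64_t domain_search;	/* sis_domain_search */
	uint64_t scanned;	/* sis_scanned */
	uint64_t failed;	/* sis_failed */
};

static double pct(uint64_t num, uint64_t den)
{
	return den ? 100.0 * (double)num / (double)den : 0.0;
}

/* Calls satisfied by target, prev or recent without a domain search. */
static uint64_t sis_fast_hits(const struct sis_counters *c)
{
	return c->search - c->domain_search;
}

/* Each fast-path hit contributed exactly one unit to sis_scanned. */
static uint64_t sis_domain_scanned(const struct sis_counters *c)
{
	return c->scanned - sis_fast_hits(c);
}

static double sis_search_efficiency(const struct sis_counters *c)
{
	return pct(c->search, c->scanned);
}

static double sis_domain_search_efficiency(const struct sis_counters *c)
{
	return pct(c->domain_search, sis_domain_scanned(c));
}

static double sis_fast_success_rate(const struct sis_counters *c)
{
	return pct(sis_fast_hits(c), c->search);
}

static double sis_success_rate(const struct sis_counters *c)
{
	return pct(c->search - c->failed, c->search);
}
```

Feeding in the baseline totals from the patch 5 changelog (5653107942
searches, 3365067916 domain searches, 112173512543 scanned, 2923185114
failures) yields the reported 109885472517 for SIS Domain Scanned.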

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/debug.c |  4 ++++
 kernel/sched/fair.c  | 14 ++++++++++++++
 kernel/sched/sched.h |  6 ++++++
 kernel/sched/stats.c |  8 +++++---
 4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 2357921580f9..2386cc5e79e5 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -714,6 +714,10 @@ do {									\
 		P(sched_goidle);
 		P(ttwu_count);
 		P(ttwu_local);
+		P(sis_search);
+		P(sis_domain_search);
+		P(sis_scanned);
+		P(sis_failed);
 	}
 #undef P
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 98075f9ea9a8..494ba01f3414 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6081,6 +6081,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 		bool idle = true;
 
 		for_each_cpu(cpu, cpu_smt_mask(core)) {
+			schedstat_inc(this_rq()->sis_scanned);
 			if (!available_idle_cpu(cpu)) {
 				idle = false;
 				break;
@@ -6112,6 +6113,7 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
 		return -1;
 
 	for_each_cpu(cpu, cpu_smt_mask(target)) {
+		schedstat_inc(this_rq()->sis_scanned);
 		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
 		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
 			continue;
@@ -6177,6 +6179,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 
 	for_each_cpu_wrap(cpu, cpus, target) {
+		schedstat_inc(this_rq()->sis_scanned);
 		if (!--nr)
 			return -1;
 		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
@@ -6240,6 +6243,15 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	unsigned long task_util;
 	int i, recent_used_cpu;
 
+	schedstat_inc(this_rq()->sis_search);
+
+	/*
+	 * Checking if prev, target and recent is treated as one scan. A
+	 * perfect hit on one of those is considered 100% efficiency.
+	 * Further scanning impairs efficiency.
+	 */
+	schedstat_inc(this_rq()->sis_scanned);
+
 	/*
 	 * On asymmetric system, update task utilization because we will check
 	 * that the task fits with cpu's capacity.
@@ -6315,6 +6327,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (!sd)
 		return target;
 
+	schedstat_inc(this_rq()->sis_domain_search);
 	i = select_idle_core(p, sd, target);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
@@ -6327,6 +6340,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 
+	schedstat_inc(this_rq()->sis_failed);
 	return target;
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f5acb6c5ce49..90a62dd9293d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1049,6 +1049,12 @@ struct rq {
 	/* try_to_wake_up() stats */
 	unsigned int		ttwu_count;
 	unsigned int		ttwu_local;
+
+	/* select_idle_sibling stats */
+	unsigned int		sis_search;
+	unsigned int		sis_domain_search;
+	unsigned int		sis_scanned;
+	unsigned int		sis_failed;
 #endif
 
 #ifdef CONFIG_CPU_IDLE
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 750fb3c67eed..390bfcc3842c 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -10,7 +10,7 @@
  * Bump this up when changing the output format or the meaning of an existing
  * format, so that tools can adapt (or abort)
  */
-#define SCHEDSTAT_VERSION 15
+#define SCHEDSTAT_VERSION 16
 
 static int show_schedstat(struct seq_file *seq, void *v)
 {
@@ -30,12 +30,14 @@ static int show_schedstat(struct seq_file *seq, void *v)
 
 		/* runqueue-specific stats */
 		seq_printf(seq,
-		    "cpu%d %u 0 %u %u %u %u %llu %llu %lu",
+		    "cpu%d %u 0 %u %u %u %u %llu %llu %lu %u %u %u %u",
 		    cpu, rq->yld_count,
 		    rq->sched_count, rq->sched_goidle,
 		    rq->ttwu_count, rq->ttwu_local,
 		    rq->rq_cpu_time,
-		    rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount);
+		    rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount,
+		    rq->sis_search, rq->sis_domain_search,
+		    rq->sis_scanned, rq->sis_failed);
 
 		seq_printf(seq, "\n");
 
-- 
2.26.2



* [PATCH 02/10] sched/fair: Track efficiency of task recent_used_cpu
  2020-12-03 14:11 [RFC PATCH 00/10] Reduce time complexity of select_idle_sibling Mel Gorman
  2020-12-03 14:11 ` [PATCH 01/10] sched/fair: Track efficiency " Mel Gorman
@ 2020-12-03 14:11 ` Mel Gorman
  2020-12-03 14:11 ` [PATCH 03/10] sched/fair: Remove SIS_AVG_CPU Mel Gorman
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2020-12-03 14:11 UTC (permalink / raw)
  To: LKML
  Cc: Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra, Juri Lelli,
	Vincent Guittot, Valentin Schneider, Linux-ARM, Mel Gorman

This simply tracks the efficiency of the recent_used_cpu. The hit rate
of this matters as it can avoid a domain search. Similarly, the miss
rate matters because each miss is a penalty to the fast path.

It is not required that this patch be merged with the series but if we
are looking at the usefulness of p->recent_used_cpu, the stats generate
hard data on what the hit rate is.

MMTests uses this to generate additional metrics.

SIS Recent Used Hit: A recent CPU was eligible and used. Each hit is
	a domain search avoided.

SIS Recent Used Miss: A recent CPU was eligible but unavailable. Each
	time this is a miss, there is a small penalty to the fast path
	before a domain search happens.

SIS Recent Success Rate: A percentage of the number of hits versus
	the total attempts to use the recent CPU.

SIS Recent Attempts: The total number of times the recent CPU was examined.
	A high number of Recent Attempts with a low Success Rate implies
	the fast path is being punished severely. This could have been
	presented as a weighting of hits and misses but calculating an
	appropriate weight for misses is problematic.
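
A minimal sketch of how the two derived values relate to the raw counters,
assuming Recent Attempts is simply hits plus misses (this matches the totals
quoted in the patch 5 changelog of this thread). The helper names are
hypothetical and not part of MMTests.

```c
#include <stdint.h>

/* Total number of times the recent CPU was examined. */
static uint64_t sis_recent_attempts(uint64_t hit, uint64_t miss)
{
	return hit + miss;
}

/* Percentage of examinations that ended with the recent CPU being used. */
static double sis_recent_success_rate(uint64_t hit, uint64_t miss)
{
	uint64_t attempts = sis_recent_attempts(hit, miss);

	return attempts ? 100.0 * (double)hit / (double)attempts : 0.0;
}
```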

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/debug.c |  2 ++
 kernel/sched/fair.c  | 23 +++++++++++++----------
 kernel/sched/sched.h |  2 ++
 kernel/sched/stats.c |  7 ++++---
 4 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 2386cc5e79e5..8f933a9e8c25 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -718,6 +718,8 @@ do {									\
 		P(sis_domain_search);
 		P(sis_scanned);
 		P(sis_failed);
+		P(sis_recent_hit);
+		P(sis_recent_miss);
 	}
 #undef P
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 494ba01f3414..d9acd55d309b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6291,16 +6291,19 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	recent_used_cpu = p->recent_used_cpu;
 	if (recent_used_cpu != prev &&
 	    recent_used_cpu != target &&
-	    cpus_share_cache(recent_used_cpu, target) &&
-	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
-	    cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr) &&
-	    asym_fits_capacity(task_util, recent_used_cpu)) {
-		/*
-		 * Replace recent_used_cpu with prev as it is a potential
-		 * candidate for the next wake:
-		 */
-		p->recent_used_cpu = prev;
-		return recent_used_cpu;
+	    cpus_share_cache(recent_used_cpu, target)) {
+		if ((available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
+		    cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr) &&
+		    asym_fits_capacity(task_util, recent_used_cpu)) {
+			/*
+			 * Replace recent_used_cpu with prev as it is a potential
+			 * candidate for the next wake:
+			 */
+			p->recent_used_cpu = prev;
+			schedstat_inc(this_rq()->sis_recent_hit);
+			return recent_used_cpu;
+		}
+		schedstat_inc(this_rq()->sis_recent_miss);
 	}
 
 	/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 90a62dd9293d..6a6578c4c24b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1055,6 +1055,8 @@ struct rq {
 	unsigned int		sis_domain_search;
 	unsigned int		sis_scanned;
 	unsigned int		sis_failed;
+	unsigned int		sis_recent_hit;
+	unsigned int		sis_recent_miss;
 #endif
 
 #ifdef CONFIG_CPU_IDLE
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 390bfcc3842c..402fab75aa14 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -10,7 +10,7 @@
  * Bump this up when changing the output format or the meaning of an existing
  * format, so that tools can adapt (or abort)
  */
-#define SCHEDSTAT_VERSION 16
+#define SCHEDSTAT_VERSION 17
 
 static int show_schedstat(struct seq_file *seq, void *v)
 {
@@ -30,14 +30,15 @@ static int show_schedstat(struct seq_file *seq, void *v)
 
 		/* runqueue-specific stats */
 		seq_printf(seq,
-		    "cpu%d %u 0 %u %u %u %u %llu %llu %lu %u %u %u %u",
+		    "cpu%d %u 0 %u %u %u %u %llu %llu %lu %u %u %u %u %u %u",
 		    cpu, rq->yld_count,
 		    rq->sched_count, rq->sched_goidle,
 		    rq->ttwu_count, rq->ttwu_local,
 		    rq->rq_cpu_time,
 		    rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount,
 		    rq->sis_search, rq->sis_domain_search,
-		    rq->sis_scanned, rq->sis_failed);
+		    rq->sis_scanned, rq->sis_failed,
+		    rq->sis_recent_hit, rq->sis_recent_miss);
 
 		seq_printf(seq, "\n");
 
-- 
2.26.2



* [PATCH 03/10] sched/fair: Remove SIS_AVG_CPU
  2020-12-03 14:11 [RFC PATCH 00/10] Reduce time complexity of select_idle_sibling Mel Gorman
  2020-12-03 14:11 ` [PATCH 01/10] sched/fair: Track efficiency " Mel Gorman
  2020-12-03 14:11 ` [PATCH 02/10] sched/fair: Track efficiency of task recent_used_cpu Mel Gorman
@ 2020-12-03 14:11 ` Mel Gorman
  2020-12-03 14:11 ` [PATCH 04/10] sched/fair: Return an idle cpu if one is found after a failed search for an idle core Mel Gorman
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2020-12-03 14:11 UTC (permalink / raw)
  To: LKML
  Cc: Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra, Juri Lelli,
	Vincent Guittot, Valentin Schneider, Linux-ARM, Mel Gorman

SIS_AVG_CPU was introduced as a means of avoiding a search when the
average search cost indicated that the search would likely fail. It
was a blunt instrument and disabled by 4c77b18cf8b7 ("sched/fair: Make
select_idle_cpu() more aggressive") and later replaced with a proportional
search depth by 1ad3aaf3fcd2 ("sched/core: Implement new approach to
scale select_idle_cpu()").

While there are corner cases where SIS_AVG_CPU is better, it has now been
disabled for almost three years. As the intent of SIS_PROP is to reduce
the time complexity of select_idle_cpu(), let's drop SIS_AVG_CPU and focus
on SIS_PROP as a throttling mechanism.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c     | 3 ---
 kernel/sched/features.h | 1 -
 2 files changed, 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d9acd55d309b..fc48cc99b03d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6163,9 +6163,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	avg_idle = this_rq()->avg_idle / 512;
 	avg_cost = this_sd->avg_scan_cost + 1;
 
-	if (sched_feat(SIS_AVG_CPU) && avg_idle < avg_cost)
-		return -1;
-
 	if (sched_feat(SIS_PROP)) {
 		u64 span_avg = sd->span_weight * avg_idle;
 		if (span_avg > 4*avg_cost)
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 68d369cba9e4..e875eabb6600 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -54,7 +54,6 @@ SCHED_FEAT(TTWU_QUEUE, true)
 /*
  * When doing wakeups, attempt to limit superfluous scans of the LLC domain.
  */
-SCHED_FEAT(SIS_AVG_CPU, false)
 SCHED_FEAT(SIS_PROP, true)
 
 /*
-- 
2.26.2



* [PATCH 04/10] sched/fair: Return an idle cpu if one is found after a failed search for an idle core
  2020-12-03 14:11 [RFC PATCH 00/10] Reduce time complexity of select_idle_sibling Mel Gorman
                   ` (2 preceding siblings ...)
  2020-12-03 14:11 ` [PATCH 03/10] sched/fair: Remove SIS_AVG_CPU Mel Gorman
@ 2020-12-03 14:11 ` Mel Gorman
  2020-12-03 16:35   ` Vincent Guittot
  2020-12-03 14:11 ` [PATCH 05/10] sched/fair: Do not replace recent_used_cpu with the new target Mel Gorman
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Mel Gorman @ 2020-12-03 14:11 UTC (permalink / raw)
  To: LKML
  Cc: Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra, Juri Lelli,
	Vincent Guittot, Valentin Schneider, Linux-ARM, Mel Gorman

select_idle_core is called when SMT is active and there is likely a free
core available. It may find idle CPUs but this information is simply
discarded and the scan starts over again with select_idle_cpu.

This patch caches information on idle CPUs found during the search for
a core and uses one if no core is found. This is a tradeoff. There may
be a slight impact when utilisation is low and an idle core can be
found quickly. It provides improvements as the number of busy CPUs
approaches 50% of the domain size when SMT is enabled.
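
The intent described above can be modelled in userspace as follows. This is
a simplified sketch, not a copy of the diff: it assumes two SMT siblings per
core numbered consecutively, ignores task affinity, and uses a hypothetical
cpu_idle[] array and function name in place of the kernel's helpers.

```c
#include <stdbool.h>

#define SMT_WIDTH 2	/* assumed number of siblings per core */

/*
 * Return the first CPU of a fully idle core if one exists. Otherwise
 * return the first idle CPU seen during the scan, or -1 if every CPU
 * is busy.
 */
static int pick_idle_core_or_cpu(const bool *cpu_idle, int nr_cores)
{
	int idle_candidate = -1;

	for (int core = 0; core < nr_cores; core++) {
		bool core_idle = true;

		for (int t = 0; t < SMT_WIDTH; t++) {
			int cpu = core * SMT_WIDTH + t;

			if (!cpu_idle[cpu]) {
				core_idle = false;
				/* A fallback exists, stop scanning this core. */
				if (idle_candidate != -1)
					break;
				continue;
			}
			/* Remember the first idle CPU as a fallback. */
			if (idle_candidate == -1)
				idle_candidate = cpu;
		}

		if (core_idle)
			return core * SMT_WIDTH;
	}

	return idle_candidate;
}
```

The fallback only costs one extra integer while the core search runs, which
is why the tradeoff is expected to be small when an idle core is found
quickly.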

With tbench on a 2-socket CascadeLake machine, 80 logical CPUs, HT enabled

                          5.10.0-rc6             5.10.0-rc6
                           schedstat          idlecandidate
Hmean     1        500.06 (   0.00%)      505.67 *   1.12%*
Hmean     2        975.90 (   0.00%)      974.06 *  -0.19%*
Hmean     4       1902.95 (   0.00%)     1904.43 *   0.08%*
Hmean     8       3761.73 (   0.00%)     3721.02 *  -1.08%*
Hmean     16      6713.93 (   0.00%)     6769.17 *   0.82%*
Hmean     32     10435.31 (   0.00%)    10312.58 *  -1.18%*
Hmean     64     12325.51 (   0.00%)    13792.01 *  11.90%*
Hmean     128    21225.21 (   0.00%)    20963.44 *  -1.23%*
Hmean     256    20532.83 (   0.00%)    20335.62 *  -0.96%*
Hmean     320    20334.81 (   0.00%)    20147.25 *  -0.92%*

In this particular test, the cost/benefit is marginal except
for 64 clients, the point at which the machine was over 50% busy
but not fully utilised.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fc48cc99b03d..845bc0cd9158 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6066,6 +6066,7 @@ void __update_idle_core(struct rq *rq)
  */
 static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
 {
+	int idle_candidate = -1;
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	int core, cpu;
 
@@ -6084,7 +6085,13 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 			schedstat_inc(this_rq()->sis_scanned);
 			if (!available_idle_cpu(cpu)) {
 				idle = false;
-				break;
+				if (idle_candidate != -1)
+					break;
+			}
+
+			if (idle_candidate == -1 &&
+			    cpumask_test_cpu(cpu, p->cpus_ptr)) {
+				idle_candidate = cpu;
 			}
 		}
 
@@ -6099,7 +6106,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 	 */
 	set_idle_cores(target, 0);
 
-	return -1;
+	return idle_candidate;
 }
 
 /*
-- 
2.26.2



* [PATCH 05/10] sched/fair: Do not replace recent_used_cpu with the new target
  2020-12-03 14:11 [RFC PATCH 00/10] Reduce time complexity of select_idle_sibling Mel Gorman
                   ` (3 preceding siblings ...)
  2020-12-03 14:11 ` [PATCH 04/10] sched/fair: Return an idle cpu if one is found after a failed search for an idle core Mel Gorman
@ 2020-12-03 14:11 ` Mel Gorman
  2020-12-03 14:11 ` [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched Mel Gorman
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2020-12-03 14:11 UTC (permalink / raw)
  To: LKML
  Cc: Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra, Juri Lelli,
	Vincent Guittot, Valentin Schneider, Linux-ARM, Mel Gorman

After select_idle_sibling, p->recent_used_cpu is set to the
new target. However on the next wakeup, prev will be the same as
recent_used_cpu unless the load balancer has moved the task since the last
wakeup. It still works, but is less efficient than it could be given all
the changes that have gone in since, such as reductions in unnecessary
migrations and load balancer changes. This patch preserves recent_used_cpu
for longer.

With tbench on a 2-socket CascadeLake machine, 80 logical CPUs, HT enabled

                          5.10.0-rc6             5.10.0-rc6
                 idlecandidate-v1r10        altrecent-v1r10
Hmean     1        505.67 (   0.00%)      501.34 *  -0.86%*
Hmean     2        974.06 (   0.00%)      981.39 *   0.75%*
Hmean     4       1904.43 (   0.00%)     1926.13 *   1.14%*
Hmean     8       3721.02 (   0.00%)     3799.86 *   2.12%*
Hmean     16      6769.17 (   0.00%)     6938.40 *   2.50%*
Hmean     32     10312.58 (   0.00%)    10632.11 *   3.10%*
Hmean     64     13792.01 (   0.00%)    13670.17 *  -0.88%*
Hmean     128    20963.44 (   0.00%)    21456.33 *   2.35%*
Hmean     256    20335.62 (   0.00%)    21070.24 *   3.61%*
Hmean     320    20147.25 (   0.00%)    20624.92 *   2.37%*

The benefit is marginal; the main impact is on how it affects
p->recent_used_cpu and whether a domain search happens. From the schedstats
patches with schedstats enabled:

Ops SIS Search               5653107942.00  5726545742.00
Ops SIS Domain Search        3365067916.00  3319768543.00
Ops SIS Scanned            112173512543.00 99194352541.00
Ops SIS Domain Scanned     109885472517.00 96787575342.00
Ops SIS Failures             2923185114.00  2950166441.00
Ops SIS Recent Used Hit           56547.00   118064916.00
Ops SIS Recent Used Miss     1590899250.00   354942791.00
Ops SIS Recent Attempts      1590955797.00   473007707.00
Ops SIS Search Efficiency             5.04           5.77
Ops SIS Domain Search Eff             3.06           3.43
Ops SIS Fast Success Rate            40.47          42.03
Ops SIS Success Rate                 48.29          48.48
Ops SIS Recent Success Rate           0.00          24.96

(The first interesting point is the ridiculous number of times runqueues are
scanned -- almost 97 billion times over the course of 40 minutes)

Note "Recent Used Hit" is over 2000 times more likely to succeed. The
failure rate also increases by quite a lot but the cost is marginal
even if the "Fast Success Rate" only increases by 2% overall. What
cannot be observed from these stats is where the biggest impact is, as
these stats cover low utilisation to over saturation.

If graphed over time, the graphs show that the sched domain is only
scanned at negligible rates until the machine is fully busy. With
low utilisation, the "Fast Success Rate" is almost 100% until the
machine is fully busy. For 320 clients, the success rate is close to
0% which is unsurprising.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 845bc0cd9158..68dd9cd62fbd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6293,6 +6293,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 
 	/* Check a recently used CPU as a potential idle candidate: */
 	recent_used_cpu = p->recent_used_cpu;
+	p->recent_used_cpu = prev;
 	if (recent_used_cpu != prev &&
 	    recent_used_cpu != target &&
 	    cpus_share_cache(recent_used_cpu, target)) {
@@ -6789,9 +6790,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 	} else if (wake_flags & WF_TTWU) { /* XXX always ? */
 		/* Fast path */
 		new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
-
-		if (want_affine)
-			current->recent_used_cpu = cpu;
 	}
 	rcu_read_unlock();
 
-- 
2.26.2



* [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-03 14:11 [RFC PATCH 00/10] Reduce time complexity of select_idle_sibling Mel Gorman
                   ` (4 preceding siblings ...)
  2020-12-03 14:11 ` [PATCH 05/10] sched/fair: Do not replace recent_used_cpu with the new target Mel Gorman
@ 2020-12-03 14:11 ` Mel Gorman
  2020-12-03 16:38   ` Vincent Guittot
  2020-12-03 14:11 ` [PATCH 07/10] sched/fair: Account for the idle cpu/smt search cost Mel Gorman
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Mel Gorman @ 2020-12-03 14:11 UTC (permalink / raw)
  To: LKML
  Cc: Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra, Juri Lelli,
	Vincent Guittot, Valentin Schneider, Linux-ARM, Mel Gorman

The target CPU is definitely not idle in both select_idle_core and
select_idle_cpu. For select_idle_core(), the SMT siblings are potentially
checked unnecessarily as the core is definitely not idle if the
target is busy. For select_idle_cpu(), the first CPU checked is
simply a waste.
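
The effect can be sketched in userspace with a 64-bit mask standing in for
the cpumask. scan_order_model() below is a hypothetical model of a wrapping
iteration that starts at the target; it is not the kernel's
for_each_cpu_wrap() implementation and assumes at most 64 CPUs.

```c
#include <stdint.h>

/*
 * Record, in visiting order, the CPUs set in 'mask' when iterating from
 * 'start' and wrapping around at 'nr_cpus'. Returns the number of CPUs
 * visited.
 */
static int scan_order_model(uint64_t mask, int nr_cpus, int start, int *order)
{
	int n = 0;

	for (int i = 0; i < nr_cpus; i++) {
		int cpu = (start + i) % nr_cpus;

		if (mask & (UINT64_C(1) << cpu))
			order[n++] = cpu;
	}
	return n;
}
```

With four CPUs and target 2, the full mask is visited as 2, 3, 0, 1. After
clearing bit 2 the order becomes 3, 0, 1, so the first CPU examined is no
longer one that is already known to be busy.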

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 68dd9cd62fbd..1d8f5c4b4936 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6077,6 +6077,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 		return -1;
 
 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+	__cpumask_clear_cpu(target, cpus);
 
 	for_each_cpu_wrap(core, cpus, target) {
 		bool idle = true;
@@ -6181,6 +6182,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	time = cpu_clock(this);
 
 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+	__cpumask_clear_cpu(target, cpus);
 
 	for_each_cpu_wrap(cpu, cpus, target) {
 		schedstat_inc(this_rq()->sis_scanned);
-- 
2.26.2



* [PATCH 07/10] sched/fair: Account for the idle cpu/smt search cost
  2020-12-03 14:11 [RFC PATCH 00/10] Reduce time complexity of select_idle_sibling Mel Gorman
                   ` (5 preceding siblings ...)
  2020-12-03 14:11 ` [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched Mel Gorman
@ 2020-12-03 14:11 ` Mel Gorman
  2020-12-03 14:11 ` [PATCH 08/10] sched/fair: Reintroduce SIS_AVG_CPU but in the context of SIS_PROP to reduce search depth Mel Gorman
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2020-12-03 14:11 UTC (permalink / raw)
  To: LKML
  Cc: Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra, Juri Lelli,
	Vincent Guittot, Valentin Schneider, Linux-ARM, Mel Gorman

select_idle_cpu() accounts average search cost for the purposes of
conducting a limited proportional search if SIS_PROP is enabled. The issue
is that select_idle_cpu() does not account for the cost if a candidate
is found and select_idle_smt() is ignored.

This patch moves the accounting of avg_cost to cover the cpu/smt search
costs. select_idle_core() costs could be accounted for but it has its
own throttling mechanism that tracks whether idle cores are expected
to exist.

This patch is a bisection hazard because SIS_PROP and how it balances
avg_cost vs avg_idle was probably guided by the fact that avg_cost was
not always accounted for.
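
For reference, the proportional depth calculation that this patch moves into
sis_search_depth() can be exercised in userspace with the sketch below. The
arithmetic follows the diff; the function name, the flat parameter list and
the boolean standing in for sched_feat(SIS_PROP) are assumptions made so
that it compiles outside the kernel.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * rq_avg_idle:   stand-in for this_rq()->avg_idle
 * avg_scan_cost: stand-in for this_sd->avg_scan_cost
 * span_weight:   stand-in for sd->span_weight (CPUs in the LLC domain)
 */
static int sis_search_depth_model(bool sis_prop, uint64_t rq_avg_idle,
				  uint64_t avg_scan_cost,
				  unsigned int span_weight)
{
	uint64_t avg_idle, avg_cost, span_avg;

	if (!sis_prop)
		return INT_MAX;		/* unthrottled search */

	avg_idle = rq_avg_idle / 512;	/* large fuzz factor */
	avg_cost = avg_scan_cost + 1;	/* avoids division by zero */

	span_avg = (uint64_t)span_weight * avg_idle;
	if (span_avg > 4 * avg_cost)
		return (int)(span_avg / avg_cost);

	return 4;			/* minimum depth */
}
```

As a worked example with a 40-CPU domain: an avg_idle of 512000 and an
avg_scan_cost of 999 give avg_idle = 1000, avg_cost = 1000 and
span_avg = 40000, so the depth is 40. With an avg_idle of 5120 the
span_avg drops to 400, which does not exceed 4 * 1000, so the depth
falls back to 4.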

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 82 +++++++++++++++++++++++++--------------------
 1 file changed, 46 insertions(+), 36 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1d8f5c4b4936..185fc6e28f8e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6006,6 +6006,29 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
 	return new_cpu;
 }
 
+static int sis_search_depth(struct sched_domain *sd, struct sched_domain *this_sd)
+{
+	u64 avg_cost, avg_idle, span_avg;
+	int nr = INT_MAX;
+
+	if (sched_feat(SIS_PROP)) {
+		/*
+		 * Due to large variance we need a large fuzz factor; hackbench in
+		 * particularly is sensitive here.
+		 */
+		avg_idle = this_rq()->avg_idle / 512;
+		avg_cost = this_sd->avg_scan_cost + 1;
+
+		span_avg = sd->span_weight * avg_idle;
+		if (span_avg > 4*avg_cost)
+			nr = div_u64(span_avg, avg_cost);
+		else
+			nr = 4;
+	}
+
+	return nr;
+}
+
 #ifdef CONFIG_SCHED_SMT
 DEFINE_STATIC_KEY_FALSE(sched_smt_present);
 EXPORT_SYMBOL_GPL(sched_smt_present);
@@ -6151,35 +6174,11 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
  * average idle time for this rq (as found in rq->avg_idle).
  */
-static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd,
+							int target, int nr)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
-	struct sched_domain *this_sd;
-	u64 avg_cost, avg_idle;
-	u64 time;
-	int this = smp_processor_id();
-	int cpu, nr = INT_MAX;
-
-	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
-	if (!this_sd)
-		return -1;
-
-	/*
-	 * Due to large variance we need a large fuzz factor; hackbench in
-	 * particularly is sensitive here.
-	 */
-	avg_idle = this_rq()->avg_idle / 512;
-	avg_cost = this_sd->avg_scan_cost + 1;
-
-	if (sched_feat(SIS_PROP)) {
-		u64 span_avg = sd->span_weight * avg_idle;
-		if (span_avg > 4*avg_cost)
-			nr = div_u64(span_avg, avg_cost);
-		else
-			nr = 4;
-	}
-
-	time = cpu_clock(this);
+	int cpu;
 
 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 	__cpumask_clear_cpu(target, cpus);
@@ -6192,9 +6191,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 			break;
 	}
 
-	time = cpu_clock(this) - time;
-	update_avg(&this_sd->avg_scan_cost, time);
-
 	return cpu;
 }
 
@@ -6245,9 +6241,10 @@ static inline bool asym_fits_capacity(int task_util, int cpu)
  */
 static int select_idle_sibling(struct task_struct *p, int prev, int target)
 {
-	struct sched_domain *sd;
+	struct sched_domain *sd, *this_sd;
 	unsigned long task_util;
-	int i, recent_used_cpu;
+	int i, recent_used_cpu, depth;
+	u64 time;
 
 	schedstat_inc(this_rq()->sis_search);
 
@@ -6337,21 +6334,34 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (!sd)
 		return target;
 
+	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
+	if (!this_sd)
+		return target;
+
+	depth = sis_search_depth(sd, this_sd);
+
 	schedstat_inc(this_rq()->sis_domain_search);
 	i = select_idle_core(p, sd, target);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 
-	i = select_idle_cpu(p, sd, target);
+	time = cpu_clock(smp_processor_id());
+	i = select_idle_cpu(p, sd, target, depth);
 	if ((unsigned)i < nr_cpumask_bits)
-		return i;
+		goto acct_cost;
 
 	i = select_idle_smt(p, sd, target);
 	if ((unsigned)i < nr_cpumask_bits)
-		return i;
+		goto acct_cost;
 
 	schedstat_inc(this_rq()->sis_failed);
-	return target;
+	i = target;
+
+acct_cost:
+	time = cpu_clock(smp_processor_id()) - time;
+	update_avg(&this_sd->avg_scan_cost, time);
+
+	return i;
 }
 
 /**
-- 
2.26.2


^ permalink raw reply	[flat|nested] 30+ messages in thread
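
The accounting that patch 07 moves into select_idle_sibling() feeds each
measured scan time into a running average through update_avg(). The
following is a minimal userspace sketch of that averaging step. It assumes
a 1/8-weight moving average, which is not shown in the diff above, and
update_avg_sketch is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the running average (assumption:
 * the average moves one eighth of the way towards each new sample).
 */
static void update_avg_sketch(uint64_t *avg, uint64_t sample)
{
	int64_t diff;

	assert(avg != NULL);

	/* Signed difference so the average can move down as well as up. */
	diff = (int64_t)(sample - *avg);
	*avg += (uint64_t)(diff / 8);
}
```

Because the timestamps now bracket both select_idle_cpu() and
select_idle_smt(), a sample fed into such an average would cover the whole
fallback search rather than only the select_idle_cpu() loop.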

* [PATCH 08/10] sched/fair: Reintroduce SIS_AVG_CPU but in the context of SIS_PROP to reduce search depth
  2020-12-03 14:11 [RFC PATCH 00/10] Reduce time complexity of select_idle_sibling Mel Gorman
                   ` (6 preceding siblings ...)
  2020-12-03 14:11 ` [PATCH 07/10] sched/fair: Account for the idle cpu/smt search cost Mel Gorman
@ 2020-12-03 14:11 ` Mel Gorman
  2020-12-03 14:11 ` [PATCH 09/10] sched/fair: Limit the search for an idle core Mel Gorman
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2020-12-03 14:11 UTC (permalink / raw)
  To: LKML
  Cc: Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra, Juri Lelli,
	Vincent Guittot, Valentin Schneider, Linux-ARM, Mel Gorman

Subject says it all but no supporting data at this time. This might help
the hackbench case in isolation or throw other workloads under the bus.
Final version will have proper data.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 185fc6e28f8e..33ce65b67381 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6024,6 +6024,14 @@ static int sis_search_depth(struct sched_domain *sd, struct sched_domain *this_s
 			nr = div_u64(span_avg, avg_cost);
 		else
 			nr = 4;
+
+		/*
+		 * Throttle the depth search further if average idle time is
+		 * below the average cost. This is primarily to deal with
+		 * the saturated case where searches are likely to fail.
+		 */
+		if (avg_idle < avg_cost)
+			nr >>= 1;
 	}
 
 	return nr;
-- 
2.26.2


^ permalink raw reply	[flat|nested] 30+ messages in thread
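
The depth calculation after patch 08 can be sketched in userspace as
follows. This is an illustration under stated assumptions, not the kernel
code: sis_depth_sketch and its parameters are hypothetical stand-ins for
this_rq()->avg_idle, this_sd->avg_scan_cost, sd->span_weight and the
SIS_PROP feature check.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of sis_search_depth() with the patch 08 throttle.
 * All parameter names are hypothetical stand-ins for kernel state.
 */
static int sis_depth_sketch(bool sis_prop, uint64_t rq_avg_idle,
			    uint64_t sd_avg_scan_cost,
			    unsigned int span_weight)
{
	uint64_t avg_idle, avg_cost, span_avg;
	int nr = INT_MAX;

	if (!sis_prop)
		return nr;	/* unlimited search depth */

	avg_idle = rq_avg_idle / 512;		/* large fuzz factor */
	avg_cost = sd_avg_scan_cost + 1;	/* avoid dividing by zero */
	assert(avg_cost > 0);

	span_avg = span_weight * avg_idle;
	if (span_avg > 4 * avg_cost)
		nr = (int)(span_avg / avg_cost);
	else
		nr = 4;

	/* Patch 08: halve the depth when idle time is below scan cost. */
	if (avg_idle < avg_cost)
		nr >>= 1;

	return nr;
}
```

For example, with an 80-CPU span, an average idle time of 5120 and an
average scan cost of 99, the proportional depth is 8 and the extra throttle
halves it to 4. When the floor of 4 applies and the throttle also triggers,
the result is 2.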

* [PATCH 09/10] sched/fair: Limit the search for an idle core
  2020-12-03 14:11 [RFC PATCH 00/10] Reduce time complexity of select_idle_sibling Mel Gorman
                   ` (7 preceding siblings ...)
  2020-12-03 14:11 ` [PATCH 08/10] sched/fair: Reintroduce SIS_AVG_CPU but in the context of SIS_PROP to reduce search depth Mel Gorman
@ 2020-12-03 14:11 ` Mel Gorman
  2020-12-03 14:19 ` Mel Gorman
  2020-12-03 14:20 ` [PATCH 10/10] sched/fair: Avoid revisiting CPUs multiple times during select_idle_sibling Mel Gorman
  10 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2020-12-03 14:11 UTC (permalink / raw)
  To: LKML
  Cc: Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra, Juri Lelli,
	Vincent Guittot, Valentin Schneider, Linux-ARM, Mel Gorman

Note: This is a bad idea, it's for illustration only to show how the
	search space can be filtered at each stage. Searching an
	idle_cpu_mask would be a potential option. select_idle_core()
	would be left alone as it has its own throttling mechanism

select_idle_core() may search a full domain for an idle core even if idle
CPUs exist, resulting in an excessive search. This patch partially limits
the search for an idle core, similar to select_idle_cpu(), once an idle
candidate is found.

Note that this patch can *increase* the number of runqueues considered.
Any searching done by select_idle_core() is duplicated by select_idle_cpu()
if an idle candidate is not found. If there is an idle CPU then aborting
select_idle_core() can have a negative impact. This is addressed in the
next patch.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 33ce65b67381..cd95daf9f53e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6095,7 +6095,8 @@ void __update_idle_core(struct rq *rq)
  * there are no idle cores left in the system; tracked through
  * sd_llc->shared->has_idle_cores and enabled through update_idle_core() above.
  */
-static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_core(struct task_struct *p, struct sched_domain *sd,
+							int target, int nr)
 {
 	int idle_candidate = -1;
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
@@ -6115,6 +6116,11 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 
 		for_each_cpu(cpu, cpu_smt_mask(core)) {
 			schedstat_inc(this_rq()->sis_scanned);
+
+			/* Apply limits if there is an idle candidate */
+			if (idle_candidate != -1)
+				nr--;
+
 			if (!available_idle_cpu(cpu)) {
 				idle = false;
 				if (idle_candidate != -1)
@@ -6130,6 +6136,9 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 		if (idle)
 			return core;
 
+		if (!nr)
+			break;
+
 		cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
 	}
 
@@ -6165,7 +6174,8 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
 
 #else /* CONFIG_SCHED_SMT */
 
-static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
+static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd,
+							int target, int nr)
 {
 	return -1;
 }
@@ -6349,7 +6359,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	depth = sis_search_depth(sd, this_sd);
 
 	schedstat_inc(this_rq()->sis_domain_search);
-	i = select_idle_core(p, sd, target);
+	i = select_idle_core(p, sd, target, depth);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 10/10] sched/fair: Avoid revisiting CPUs multiple times during select_idle_sibling
  2020-12-03 14:11 [RFC PATCH 00/10] Reduce time complexity of select_idle_sibling Mel Gorman
                   ` (9 preceding siblings ...)
  2020-12-03 14:19 ` Mel Gorman
@ 2020-12-03 14:20 ` Mel Gorman
  10 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2020-12-03 14:20 UTC (permalink / raw)
  To: LKML
  Cc: Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra, Juri Lelli,
	Vincent Guittot, Valentin Schneider, Linux-ARM

Note: While this is done in the context of select_idle_core(), I would not
	expect it to be done like this. The intent is to illustrate how
	idle_cpu_mask could be filtered before select_idle_cpu() scans
	the rest of a domain or a wider scan is done across a cluster.

select_idle_core() potentially searches a number of CPUs for idle candidates
before select_idle_cpu() clears the mask and revisits the same CPUs. This
patch moves the initialisation of select_idle_mask to the top-level and
reuses the same mask across both select_idle_core and select_idle_cpu.
select_idle_smt() is left alone as the cost of checking one SMT sibling
is marginal relative to calling __cpumask_clear_cpu() for every CPU
visited by select_idle_core().

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cd95daf9f53e..af2e108c20c0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6096,10 +6096,9 @@ void __update_idle_core(struct rq *rq)
  * sd_llc->shared->has_idle_cores and enabled through update_idle_core() above.
  */
 static int select_idle_core(struct task_struct *p, struct sched_domain *sd,
-							int target, int nr)
+				int target, int nr, struct cpumask *cpus)
 {
 	int idle_candidate = -1;
-	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	int core, cpu;
 
 	if (!static_branch_likely(&sched_smt_present))
@@ -6108,9 +6107,6 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd,
 	if (!test_idle_cores(target, false))
 		return -1;
 
-	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
-	__cpumask_clear_cpu(target, cpus);
-
 	for_each_cpu_wrap(core, cpus, target) {
 		bool idle = true;
 
@@ -6175,7 +6171,7 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
 #else /* CONFIG_SCHED_SMT */
 
 static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd,
-							int target, int nr)
+					int target, int nr, struct cpumask *cpus)
 {
 	return -1;
 }
@@ -6193,14 +6189,10 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
  * average idle time for this rq (as found in rq->avg_idle).
  */
 static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd,
-							int target, int nr)
+				int target, int nr, struct cpumask *cpus)
 {
-	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	int cpu;
 
-	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
-	__cpumask_clear_cpu(target, cpus);
-
 	for_each_cpu_wrap(cpu, cpus, target) {
 		schedstat_inc(this_rq()->sis_scanned);
 		if (!--nr)
@@ -6260,6 +6252,7 @@ static inline bool asym_fits_capacity(int task_util, int cpu)
 static int select_idle_sibling(struct task_struct *p, int prev, int target)
 {
 	struct sched_domain *sd, *this_sd;
+	struct cpumask *cpus_visited;
 	unsigned long task_util;
 	int i, recent_used_cpu, depth;
 	u64 time;
@@ -6358,13 +6351,23 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 
 	depth = sis_search_depth(sd, this_sd);
 
+	/*
+	 * Init the select_idle_mask. select_idle_core() will mask
+	 * out the CPUs that have already been visited to limit the
+	 * search in select_idle_cpu(). Further clearing is not
+	 * done as select_idle_smt checks only one CPU.
+	 */
+	cpus_visited = this_cpu_cpumask_var_ptr(select_idle_mask);
+	cpumask_and(cpus_visited, sched_domain_span(sd), p->cpus_ptr);
+	__cpumask_clear_cpu(target, cpus_visited);
+
 	schedstat_inc(this_rq()->sis_domain_search);
-	i = select_idle_core(p, sd, target, depth);
+	i = select_idle_core(p, sd, target, depth, cpus_visited);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 
 	time = cpu_clock(smp_processor_id());
-	i = select_idle_cpu(p, sd, target, depth);
+	i = select_idle_cpu(p, sd, target, depth, cpus_visited);
 	if ((unsigned)i < nr_cpumask_bits)
 		goto acct_cost;
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 30+ messages in thread
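
The mask sharing described in patch 10 can be sketched with a plain 64-bit
bitmask. This is a minimal illustration, not the kernel cpumask API: the
helper names are hypothetical, and the layout in which SMT siblings have
adjacent CPU numbers is an assumption made only to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

#define SMT_WIDTH 2	/* assumption: two hardware threads per core */

/* Hypothetical layout: siblings of a core have adjacent CPU numbers. */
static uint64_t smt_mask_sketch(int cpu)
{
	assert(cpu >= 0 && cpu < 64);

	int first = cpu - (cpu % SMT_WIDTH);

	return ((UINT64_C(1) << SMT_WIDTH) - 1) << first;
}

/* Built once at the top level: domain span AND affinity, minus target. */
static uint64_t init_search_mask(uint64_t span, uint64_t allowed, int target)
{
	assert(target >= 0 && target < 64);

	return (span & allowed) & ~(UINT64_C(1) << target);
}

/* Core pass: drop all siblings of a visited core from the shared mask. */
static uint64_t mark_core_visited(uint64_t cpus, int core)
{
	return cpus & ~smt_mask_sketch(core);
}
```

Any pass that runs later on the same mask only walks the bits that are
still set, so CPUs already inspected during the core search are not
inspected a second time.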

* Re: [PATCH 04/10] sched/fair: Return an idle cpu if one is found after a failed search for an idle core
  2020-12-03 14:11 ` [PATCH 04/10] sched/fair: Return an idle cpu if one is found after a failed search for an idle core Mel Gorman
@ 2020-12-03 16:35   ` Vincent Guittot
  2020-12-03 17:50     ` Mel Gorman
  0 siblings, 1 reply; 30+ messages in thread
From: Vincent Guittot @ 2020-12-03 16:35 UTC (permalink / raw)
  To: Mel Gorman
  Cc: LKML, Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On Thu, 3 Dec 2020 at 15:11, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> select_idle_core is called when SMT is active and there is likely a free
> core available. It may find idle CPUs but this information is simply
> discarded and the scan starts over again with select_idle_cpu.
>
> This patch caches information on idle CPUs found during the search for
> a core and uses one if no core is found. This is a tradeoff. There may
> be a slight impact when utilisation is low and an idle core can be
> found quickly. It provides improvements as the number of busy CPUs
> approaches 50% of the domain size when SMT is enabled.
>
> With tbench on a 2-socket CascadeLake machine, 80 logical CPUs, HT enabled
>
>                           5.10.0-rc6             5.10.0-rc6
>                            schedstat          idlecandidate
> Hmean     1        500.06 (   0.00%)      505.67 *   1.12%*
> Hmean     2        975.90 (   0.00%)      974.06 *  -0.19%*
> Hmean     4       1902.95 (   0.00%)     1904.43 *   0.08%*
> Hmean     8       3761.73 (   0.00%)     3721.02 *  -1.08%*
> Hmean     16      6713.93 (   0.00%)     6769.17 *   0.82%*
> Hmean     32     10435.31 (   0.00%)    10312.58 *  -1.18%*
> Hmean     64     12325.51 (   0.00%)    13792.01 *  11.90%*
> Hmean     128    21225.21 (   0.00%)    20963.44 *  -1.23%*
> Hmean     256    20532.83 (   0.00%)    20335.62 *  -0.96%*
> Hmean     320    20334.81 (   0.00%)    20147.25 *  -0.92%*
>
> In this particular test, the cost/benefit is marginal except
> for 64 which was a point where the machine was over 50% busy
> but not fully utilised.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  kernel/sched/fair.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fc48cc99b03d..845bc0cd9158 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6066,6 +6066,7 @@ void __update_idle_core(struct rq *rq)
>   */
>  static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
>  {
> +       int idle_candidate = -1;
>         struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
>         int core, cpu;
>
> @@ -6084,7 +6085,13 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>                         schedstat_inc(this_rq()->sis_scanned);
>                         if (!available_idle_cpu(cpu)) {
>                                 idle = false;
> -                               break;
> +                               if (idle_candidate != -1)
> +                                       break;


If I get your changes correctly, it will now continue to loop on all
cpus of the smt mask to try to find an idle cpu whereas it was
breaking before as soon as a cpu was not idle. In fact, I thought that
you just wanted to be opportunistic and save a candidate but without
looping more cpus than currently.

With the change above you might end up looping all cpus of llc if
there is only one idle cpu in the llc whereas before we were looping
only 1 cpu per core at most. The bottom change makes sense but the
above one is in some way completely replacing select_idle_cpu and
bypassing SIS_PROP, and we should avoid that IMO

> +                       }
> +
> +                       if (idle_candidate == -1 &&
> +                           cpumask_test_cpu(cpu, p->cpus_ptr)) {
> +                               idle_candidate = cpu;
>                         }
>                 }
>
> @@ -6099,7 +6106,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>          */
>         set_idle_cores(target, 0);
>
> -       return -1;
> +       return idle_candidate;
>  }
>
>  /*
> --
> 2.26.2
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-03 14:11 ` [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched Mel Gorman
@ 2020-12-03 16:38   ` Vincent Guittot
  2020-12-03 17:52     ` Mel Gorman
  0 siblings, 1 reply; 30+ messages in thread
From: Vincent Guittot @ 2020-12-03 16:38 UTC (permalink / raw)
  To: Mel Gorman
  Cc: LKML, Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On Thu, 3 Dec 2020 at 15:11, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> The target CPU is definitely not idle in both select_idle_core and
> select_idle_cpu. For select_idle_core(), the SMT is potentially
> checked unnecessarily as the core is definitely not idle if the
> target is busy. For select_idle_cpu(), the first CPU checked is
> simply a waste.

>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  kernel/sched/fair.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 68dd9cd62fbd..1d8f5c4b4936 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6077,6 +6077,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>                 return -1;
>
>         cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> +       __cpumask_clear_cpu(target, cpus);

should clear cpu_smt_mask(target) as we are sure that the core will not be idle

>
>         for_each_cpu_wrap(core, cpus, target) {
>                 bool idle = true;
> @@ -6181,6 +6182,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>         time = cpu_clock(this);
>
>         cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> +       __cpumask_clear_cpu(target, cpus);
>
>         for_each_cpu_wrap(cpu, cpus, target) {
>                 schedstat_inc(this_rq()->sis_scanned);
> --
> 2.26.2
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 04/10] sched/fair: Return an idle cpu if one is found after a failed search for an idle core
  2020-12-03 16:35   ` Vincent Guittot
@ 2020-12-03 17:50     ` Mel Gorman
  0 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2020-12-03 17:50 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: LKML, Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On Thu, Dec 03, 2020 at 05:35:29PM +0100, Vincent Guittot wrote:
> > index fc48cc99b03d..845bc0cd9158 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6066,6 +6066,7 @@ void __update_idle_core(struct rq *rq)
> >   */
> >  static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
> >  {
> > +       int idle_candidate = -1;
> >         struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> >         int core, cpu;
> >
> > @@ -6084,7 +6085,13 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
> >                         schedstat_inc(this_rq()->sis_scanned);
> >                         if (!available_idle_cpu(cpu)) {
> >                                 idle = false;
> > -                               break;
> > +                               if (idle_candidate != -1)
> > +                                       break;
> 
> 
> If I get your changes correctly, it will now continue to loop on all
> cpus of the smt mask to try to find an idle cpu whereas it was

That was an oversight, the intent is that the SMT search breaks but
the search for an idle core continues. The patch was taken from a very
different series that unified all the select_idle_* functions as a single
function and I failed to fix it up properly. The unification series
didn't generate good results when I tried it 9 months ago and I never
finished it off. In the current context, it would not make sense to try
a unification again.

> With the change above you might end up looping all cpus of llc if
> there is only one idle cpu in the llc whereas before we were looping
> only 1 cpu per core at most. The bottom change makes sense but the
> above one is in some way completely replacing select_idle_cpu and
> bypassing SIS_PROP, and we should avoid that IMO
> 

You're right of course, it was never intended to behave like that.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0a3d338770c4..49b1590e60a9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6084,8 +6084,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 		for_each_cpu(cpu, cpu_smt_mask(core)) {
 			if (!available_idle_cpu(cpu)) {
 				idle = false;
-				if (idle_candidate != -1)
-					break;
+				break;
 			}
 
 			if (idle_candidate == -1 &&

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread
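
The corrected behaviour discussed above (stop scanning a core's siblings at
the first busy one, but remember the first idle CPU seen in case no idle
core turns up) can be modelled in userspace roughly as follows. This is a
sketch under simplifying assumptions: a fixed 4-core, 2-thread topology, no
affinity check, no wrap-around from the target, and hypothetical names
throughout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CORES  4	/* assumption: small fixed topology */
#define SMT_WIDTH 2

/*
 * Returns the first CPU of a fully idle core, otherwise an idle
 * candidate seen along the way, otherwise -1. *scanned counts how
 * many CPUs were inspected.
 */
static int idle_core_sketch(bool idle[NR_CORES][SMT_WIDTH], int *scanned)
{
	int idle_candidate = -1;

	assert(scanned != NULL);
	*scanned = 0;

	for (int core = 0; core < NR_CORES; core++) {
		bool core_idle = true;

		for (int t = 0; t < SMT_WIDTH; t++) {
			(*scanned)++;
			if (!idle[core][t]) {
				core_idle = false;
				break;	/* stop scanning this core */
			}
			if (idle_candidate == -1)
				idle_candidate = core * SMT_WIDTH + t;
		}

		if (core_idle)
			return core * SMT_WIDTH;
	}

	return idle_candidate;
}
```

With the unconditional break, a busy first sibling costs one inspection per
core, so the core search cannot turn into a full scan of the LLC that
bypasses the SIS_PROP limit.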

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-03 16:38   ` Vincent Guittot
@ 2020-12-03 17:52     ` Mel Gorman
  2020-12-04 10:56       ` Vincent Guittot
  0 siblings, 1 reply; 30+ messages in thread
From: Mel Gorman @ 2020-12-03 17:52 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: LKML, Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On Thu, Dec 03, 2020 at 05:38:03PM +0100, Vincent Guittot wrote:
> On Thu, 3 Dec 2020 at 15:11, Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > The target CPU is definitely not idle in both select_idle_core and
> > select_idle_cpu. For select_idle_core(), the SMT is potentially
> > checked unnecessarily as the core is definitely not idle if the
> > target is busy. For select_idle_cpu(), the first CPU checked is
> > simply a waste.
> 
> >
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > ---
> >  kernel/sched/fair.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 68dd9cd62fbd..1d8f5c4b4936 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6077,6 +6077,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
> >                 return -1;
> >
> >         cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > +       __cpumask_clear_cpu(target, cpus);
> 
> should clear cpu_smt_mask(target) as we are sure that the core will not be idle
> 

The intent was that the sibling might still be an idle candidate. In
the current draft of the series, I do not even clear this so that the
SMT sibling is considered as an idle candidate. The reasoning is that if
there are no idle cores then an SMT sibling of the target is as good an
idle CPU to select as any.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-03 17:52     ` Mel Gorman
@ 2020-12-04 10:56       ` Vincent Guittot
  2020-12-04 11:30         ` Mel Gorman
  0 siblings, 1 reply; 30+ messages in thread
From: Vincent Guittot @ 2020-12-04 10:56 UTC (permalink / raw)
  To: Mel Gorman
  Cc: LKML, Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On Thu, 3 Dec 2020 at 18:52, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> On Thu, Dec 03, 2020 at 05:38:03PM +0100, Vincent Guittot wrote:
> > On Thu, 3 Dec 2020 at 15:11, Mel Gorman <mgorman@techsingularity.net> wrote:
> > >
> > > The target CPU is definitely not idle in both select_idle_core and
> > > select_idle_cpu. For select_idle_core(), the SMT is potentially
> > > checked unnecessarily as the core is definitely not idle if the
> > > target is busy. For select_idle_cpu(), the first CPU checked is
> > > simply a waste.
> >
> > >
> > > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > > ---
> > >  kernel/sched/fair.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 68dd9cd62fbd..1d8f5c4b4936 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -6077,6 +6077,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
> > >                 return -1;
> > >
> > >         cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > > +       __cpumask_clear_cpu(target, cpus);
> >
> > should clear cpu_smt_mask(target) as we are sure that the core will not be idle
> >
>
> The intent was that the sibling might still be an idle candidate. In
> the current draft of the series, I do not even clear this so that the
> SMT sibling is considered as an idle candidate. The reasoning is that if
> there are no idle cores then an SMT sibling of the target is as good an
> idle CPU to select as any.

Isn't that the purpose of select_idle_smt?

select_idle_core() looks for an idle core and opportunistically saves
an idle CPU candidate to skip select_idle_cpu. In this case these are
useless loops for select_idle_core() because we are sure that the core
is not idle


>
> --
> Mel Gorman
> SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-04 10:56       ` Vincent Guittot
@ 2020-12-04 11:30         ` Mel Gorman
  2020-12-04 13:13           ` Vincent Guittot
  0 siblings, 1 reply; 30+ messages in thread
From: Mel Gorman @ 2020-12-04 11:30 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: LKML, Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
> > The intent was that the sibling might still be an idle candidate. In
> > the current draft of the series, I do not even clear this so that the
> > SMT sibling is considered as an idle candidate. The reasoning is that if
> > there are no idle cores then an SMT sibling of the target is as good an
> > idle CPU to select as any.
> 
> Isn't that the purpose of select_idle_smt?
> 

Only in part.

> select_idle_core() looks for an idle core and opportunistically saves
> an idle CPU candidate to skip select_idle_cpu. In this case these are
> useless loops for select_idle_core() because we are sure that the core
> is not idle
> 

If select_idle_core() finds an idle candidate other than the sibling,
it'll use it if there is no idle core -- it picks a busy sibling based
on a linear walk of the cpumask. Similarly, select_idle_cpu() is not
guaranteed to scan the sibling first (ordering) or even reach the sibling
(throttling). select_idle_smt() is a last-ditch effort.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread
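
The ordering and throttling argument above can be illustrated with a small
userspace model of a budgeted wrap-around scan. It is a sketch, not the
kernel loop: scan_wrap_sketch is a hypothetical name, CPUs are bits in a
64-bit mask, and the budget check mirrors the "if (!--nr)" test visible in
the select_idle_cpu() hunk of patch 10.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Walk the CPUs in `cpus` in wrap-around order starting at `target`
 * and return the first idle one, or -1 if none is found before the
 * budget `nr` runs out. `idle` has a bit set for every idle CPU.
 */
static int scan_wrap_sketch(uint64_t cpus, uint64_t idle, int nr_cpus,
			    int target, int nr)
{
	assert(nr_cpus > 0 && nr_cpus <= 64);
	assert(target >= 0 && target < nr_cpus);

	for (int i = 0; i < nr_cpus; i++) {
		int cpu = (target + i) % nr_cpus;
		uint64_t bit = UINT64_C(1) << cpu;

		if (!(cpus & bit))
			continue;	/* not in the search mask */

		if (!--nr)
			return -1;	/* search budget exhausted */

		if (idle & bit)
			return cpu;
	}

	return -1;
}
```

Assume, purely for illustration, 8 CPUs where CPU 4 is the SMT sibling of
target CPU 0 and is the only idle CPU. With a depth of 4 the scan inspects
CPUs 1 to 3 and gives up before reaching the sibling, while a depth of 5 is
enough to find it. That gap is the case select_idle_smt() is left to cover
as a last resort.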

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-04 11:30         ` Mel Gorman
@ 2020-12-04 13:13           ` Vincent Guittot
  2020-12-04 13:17             ` Vincent Guittot
  0 siblings, 1 reply; 30+ messages in thread
From: Vincent Guittot @ 2020-12-04 13:13 UTC (permalink / raw)
  To: Mel Gorman
  Cc: LKML, Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On Fri, 4 Dec 2020 at 12:30, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
> > > The intent was that the sibling might still be an idle candidate. In
> > > the current draft of the series, I do not even clear this so that the
> > > SMT sibling is considered as an idle candidate. The reasoning is that if
> > > there are no idle cores then an SMT sibling of the target is as good an
> > > idle CPU to select as any.
> >
> > Isn't that the purpose of select_idle_smt?
> >
>
> Only in part.
>
> > select_idle_core() looks for an idle core and opportunistically saves
> > an idle CPU candidate to skip select_idle_cpu. In this case these are
> > useless loops for select_idle_core() because we are sure that the core
> > is not idle
> >
>
> If select_idle_core() finds an idle candidate other than the sibling,
> it'll use it if there is no idle core -- it picks a busy sibling based
> on a linear walk of the cpumask. Similarly, select_idle_cpu() is not

My point is that it's a waste of time to loop the sibling cpus of
target in select_idle_core because it will not help to find an idle
core. The sibling cpus will then be checked either by select_idle_cpu
or select_idle_smt

> guaranteed to scan the sibling first (ordering) or even reach the sibling
> (throttling). select_idle_smt() is a last-ditch effort.
>
> --
> Mel Gorman
> SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-04 13:13           ` Vincent Guittot
@ 2020-12-04 13:17             ` Vincent Guittot
  2020-12-04 13:40               ` Li, Aubrey
  2020-12-04 14:27               ` Mel Gorman
  0 siblings, 2 replies; 30+ messages in thread
From: Vincent Guittot @ 2020-12-04 13:17 UTC (permalink / raw)
  To: Mel Gorman
  Cc: LKML, Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On Fri, 4 Dec 2020 at 14:13, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>
> On Fri, 4 Dec 2020 at 12:30, Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
> > > > The intent was that the sibling might still be an idle candidate. In
> > > > the current draft of the series, I do not even clear this so that the
> > > > SMT sibling is considered as an idle candidate. The reasoning is that if
> > > > there are no idle cores then an SMT sibling of the target is as good an
> > > > idle CPU to select as any.
> > >
> > > Isn't the purpose of select_idle_smt ?
> > >
> >
> > Only in part.
> >
> > > select_idle_core() looks for an idle core and opportunistically saves
> > > an idle CPU candidate to skip select_idle_cpu. In this case this is
> > > useless loops for select_idle_core() because we are sure that the core
> > > is not idle
> > >
> >
> > If select_idle_core() finds an idle candidate other than the sibling,
> > it'll use it if there is no idle core -- it picks a busy sibling based
> > on a linear walk of the cpumask. Similarly, select_idle_cpu() is not
>
> My point is that it's a waste of time to loop the sibling cpus of
> target in select_idle_core because it will not help to find an idle
> core. The sibling  cpus will then be check either by select_idle_cpu
> of select_idle_smt

Also, while looping over the cpumask, the sibling CPUs of a non-idle CPU
are removed and will not be checked.

>
> > guaranteed to scan the sibling first (ordering) or even reach the sibling
> > (throttling). select_idle_smt() is a last-ditch effort.
> >
> > --
> > Mel Gorman
> > SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-04 13:17             ` Vincent Guittot
@ 2020-12-04 13:40               ` Li, Aubrey
  2020-12-04 13:47                 ` Li, Aubrey
  2020-12-04 13:47                 ` Vincent Guittot
  2020-12-04 14:27               ` Mel Gorman
  1 sibling, 2 replies; 30+ messages in thread
From: Li, Aubrey @ 2020-12-04 13:40 UTC (permalink / raw)
  To: Vincent Guittot, Mel Gorman
  Cc: LKML, Barry Song, Ingo Molnar, Peter Ziljstra, Juri Lelli,
	Valentin Schneider, Linux-ARM

On 2020/12/4 21:17, Vincent Guittot wrote:
> On Fri, 4 Dec 2020 at 14:13, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>>
>> On Fri, 4 Dec 2020 at 12:30, Mel Gorman <mgorman@techsingularity.net> wrote:
>>>
>>> On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
>>>>> The intent was that the sibling might still be an idle candidate. In
>>>>> the current draft of the series, I do not even clear this so that the
>>>>> SMT sibling is considered as an idle candidate. The reasoning is that if
>>>>> there are no idle cores then an SMT sibling of the target is as good an
>>>>> idle CPU to select as any.
>>>>
>>>> Isn't the purpose of select_idle_smt ?
>>>>
>>>
>>> Only in part.
>>>
>>>> select_idle_core() looks for an idle core and opportunistically saves
>>>> an idle CPU candidate to skip select_idle_cpu. In this case this is
>>>> useless loops for select_idle_core() because we are sure that the core
>>>> is not idle
>>>>
>>>
>>> If select_idle_core() finds an idle candidate other than the sibling,
>>> it'll use it if there is no idle core -- it picks a busy sibling based
>>> on a linear walk of the cpumask. Similarly, select_idle_cpu() is not
>>
>> My point is that it's a waste of time to loop the sibling cpus of
>> target in select_idle_core because it will not help to find an idle
>> core. The sibling  cpus will then be check either by select_idle_cpu
>> of select_idle_smt
> 
> also, while looping the cpumask, the sibling cpus of not idle cpu are
> removed and will not be check
>

IIUC, select_idle_core and select_idle_cpu share the same cpumask (select_idle_mask)?
If the target's sibling is removed from select_idle_mask in select_idle_core(),
select_idle_cpu() will lose the chance to pick it up?

Thanks,
-Aubrey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-04 13:40               ` Li, Aubrey
@ 2020-12-04 13:47                 ` Li, Aubrey
  2020-12-04 13:47                 ` Vincent Guittot
  1 sibling, 0 replies; 30+ messages in thread
From: Li, Aubrey @ 2020-12-04 13:47 UTC (permalink / raw)
  To: Vincent Guittot, Mel Gorman
  Cc: LKML, Barry Song, Ingo Molnar, Peter Ziljstra, Juri Lelli,
	Valentin Schneider, Linux-ARM

On 2020/12/4 21:40, Li, Aubrey wrote:
> On 2020/12/4 21:17, Vincent Guittot wrote:
>> On Fri, 4 Dec 2020 at 14:13, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>>>
>>> On Fri, 4 Dec 2020 at 12:30, Mel Gorman <mgorman@techsingularity.net> wrote:
>>>>
>>>> On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
>>>>>> The intent was that the sibling might still be an idle candidate. In
>>>>>> the current draft of the series, I do not even clear this so that the
>>>>>> SMT sibling is considered as an idle candidate. The reasoning is that if
>>>>>> there are no idle cores then an SMT sibling of the target is as good an
>>>>>> idle CPU to select as any.
>>>>>
>>>>> Isn't the purpose of select_idle_smt ?
>>>>>
>>>>
>>>> Only in part.
>>>>
>>>>> select_idle_core() looks for an idle core and opportunistically saves
>>>>> an idle CPU candidate to skip select_idle_cpu. In this case this is
>>>>> useless loops for select_idle_core() because we are sure that the core
>>>>> is not idle
>>>>>
>>>>
>>>> If select_idle_core() finds an idle candidate other than the sibling,
>>>> it'll use it if there is no idle core -- it picks a busy sibling based
>>>> on a linear walk of the cpumask. Similarly, select_idle_cpu() is not
>>>
>>> My point is that it's a waste of time to loop the sibling cpus of
>>> target in select_idle_core because it will not help to find an idle
>>> core. The sibling  cpus will then be check either by select_idle_cpu
>>> of select_idle_smt
>>
>> also, while looping the cpumask, the sibling cpus of not idle cpu are
>> removed and will not be check
>>
> 
> IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
> If the target's sibling is removed from select_idle_mask from select_idle_core(),
> select_idle_cpu() will lose the chance to pick it up?

aha, no, select_idle_mask will be re-assigned in select_idle_cpu() by:

	cpumask_and(cpus, sds_idle_cpus(sd->shared), p->cpus_ptr);

So, yes, I guess we can remove the cpu_smt_mask(target) from select_idle_core() safely.
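
A minimal userspace sketch of the point above, under stated assumptions:
one bit per CPU, two SMT threads per core (CPU n pairs with CPU n^1), and
a single scratch mask shared by both phases. The names mask_t, smt_core(),
toy_idle_core() and toy_idle_cpu() are hypothetical and are not kernel
APIs. It only illustrates that clearing the target's siblings in the first
phase is harmless when the second phase rebuilds the mask before scanning.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not kernel code: one bit per CPU, CPUs n and n^1 share a core. */
typedef uint64_t mask_t;

static mask_t cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (mask_t)1 << cpu;
}

static mask_t smt_core(int cpu)
{
	return cpu_bit(cpu) | cpu_bit(cpu ^ 1);
}

/* Phase 1: search for a fully idle core, skipping the target's own core. */
static int toy_idle_core(mask_t *scratch, mask_t allowed, mask_t idle, int target)
{
	*scratch = allowed & ~smt_core(target);
	for (int cpu = 0; cpu < 64; cpu++) {
		if (!(*scratch & cpu_bit(cpu)))
			continue;
		if ((idle & smt_core(cpu)) == smt_core(cpu))
			return cpu;
		*scratch &= ~smt_core(cpu);	/* core is busy, drop all its threads */
	}
	return -1;
}

/* Phase 2: rebuild the scratch mask, so phase 1's clearing is forgotten. */
static int toy_idle_cpu(mask_t *scratch, mask_t allowed, mask_t idle)
{
	*scratch = allowed;
	for (int cpu = 0; cpu < 64; cpu++)
		if (*scratch & idle & cpu_bit(cpu))
			return cpu;
	return -1;
}
```

With CPUs 0-3 allowed, target 0 and only CPU 1 idle, the first phase finds
no idle core and leaves the scratch mask empty, yet the second phase can
still pick CPU 1 because it starts from a freshly assigned mask.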

> 
> Thanks,
> -Aubrey
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-04 13:40               ` Li, Aubrey
  2020-12-04 13:47                 ` Li, Aubrey
@ 2020-12-04 13:47                 ` Vincent Guittot
  2020-12-04 14:07                   ` Li, Aubrey
  2020-12-04 14:31                   ` Mel Gorman
  1 sibling, 2 replies; 30+ messages in thread
From: Vincent Guittot @ 2020-12-04 13:47 UTC (permalink / raw)
  To: Li, Aubrey
  Cc: Mel Gorman, LKML, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On Fri, 4 Dec 2020 at 14:40, Li, Aubrey <aubrey.li@linux.intel.com> wrote:
>
> On 2020/12/4 21:17, Vincent Guittot wrote:
> > On Fri, 4 Dec 2020 at 14:13, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> >>
> >> On Fri, 4 Dec 2020 at 12:30, Mel Gorman <mgorman@techsingularity.net> wrote:
> >>>
> >>> On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
> >>>>> The intent was that the sibling might still be an idle candidate. In
> >>>>> the current draft of the series, I do not even clear this so that the
> >>>>> SMT sibling is considered as an idle candidate. The reasoning is that if
> >>>>> there are no idle cores then an SMT sibling of the target is as good an
> >>>>> idle CPU to select as any.
> >>>>
> >>>> Isn't the purpose of select_idle_smt ?
> >>>>
> >>>
> >>> Only in part.
> >>>
> >>>> select_idle_core() looks for an idle core and opportunistically saves
> >>>> an idle CPU candidate to skip select_idle_cpu. In this case this is
> >>>> useless loops for select_idle_core() because we are sure that the core
> >>>> is not idle
> >>>>
> >>>
> >>> If select_idle_core() finds an idle candidate other than the sibling,
> >>> it'll use it if there is no idle core -- it picks a busy sibling based
> >>> on a linear walk of the cpumask. Similarly, select_idle_cpu() is not
> >>
> >> My point is that it's a waste of time to loop the sibling cpus of
> >> target in select_idle_core because it will not help to find an idle
> >> core. The sibling  cpus will then be check either by select_idle_cpu
> >> of select_idle_smt
> >
> > also, while looping the cpumask, the sibling cpus of not idle cpu are
> > removed and will not be check
> >
>
> IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
> If the target's sibling is removed from select_idle_mask from select_idle_core(),
> select_idle_cpu() will lose the chance to pick it up?

This is only relevant for patch 10, which is not to be included if I
understand correctly what Mel said in the cover letter: "Patches 9 and
10 are stupid in the context of this series."

>
> Thanks,
> -Aubrey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-04 13:47                 ` Vincent Guittot
@ 2020-12-04 14:07                   ` Li, Aubrey
  2020-12-04 14:31                   ` Mel Gorman
  1 sibling, 0 replies; 30+ messages in thread
From: Li, Aubrey @ 2020-12-04 14:07 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Mel Gorman, LKML, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On 2020/12/4 21:47, Vincent Guittot wrote:
> On Fri, 4 Dec 2020 at 14:40, Li, Aubrey <aubrey.li@linux.intel.com> wrote:
>>
>> On 2020/12/4 21:17, Vincent Guittot wrote:
>>> On Fri, 4 Dec 2020 at 14:13, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>>>>
>>>> On Fri, 4 Dec 2020 at 12:30, Mel Gorman <mgorman@techsingularity.net> wrote:
>>>>>
>>>>> On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
>>>>>>> The intent was that the sibling might still be an idle candidate. In
>>>>>>> the current draft of the series, I do not even clear this so that the
>>>>>>> SMT sibling is considered as an idle candidate. The reasoning is that if
>>>>>>> there are no idle cores then an SMT sibling of the target is as good an
>>>>>>> idle CPU to select as any.
>>>>>>
>>>>>> Isn't the purpose of select_idle_smt ?
>>>>>>
>>>>>
>>>>> Only in part.
>>>>>
>>>>>> select_idle_core() looks for an idle core and opportunistically saves
>>>>>> an idle CPU candidate to skip select_idle_cpu. In this case this is
>>>>>> useless loops for select_idle_core() because we are sure that the core
>>>>>> is not idle
>>>>>>
>>>>>
>>>>> If select_idle_core() finds an idle candidate other than the sibling,
>>>>> it'll use it if there is no idle core -- it picks a busy sibling based
>>>>> on a linear walk of the cpumask. Similarly, select_idle_cpu() is not
>>>>
>>>> My point is that it's a waste of time to loop the sibling cpus of
>>>> target in select_idle_core because it will not help to find an idle
>>>> core. The sibling  cpus will then be check either by select_idle_cpu
>>>> of select_idle_smt
>>>
>>> also, while looping the cpumask, the sibling cpus of not idle cpu are
>>> removed and will not be check
>>>
>>
>> IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
>> If the target's sibling is removed from select_idle_mask from select_idle_core(),
>> select_idle_cpu() will lose the chance to pick it up?
> 
> This is only relevant for patch 10 which is not to be included IIUC
> what mel said in cover letter : "Patches 9 and 10 are stupid in the
> context of this series."

So the target's sibling can be removed from the cpumask in select_idle_core
in patch 6, and needs to be added back in select_idle_core in patch 10, :)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-04 13:17             ` Vincent Guittot
  2020-12-04 13:40               ` Li, Aubrey
@ 2020-12-04 14:27               ` Mel Gorman
  1 sibling, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2020-12-04 14:27 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: LKML, Aubrey Li, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On Fri, Dec 04, 2020 at 02:17:20PM +0100, Vincent Guittot wrote:
> On Fri, 4 Dec 2020 at 14:13, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> >
> > On Fri, 4 Dec 2020 at 12:30, Mel Gorman <mgorman@techsingularity.net> wrote:
> > >
> > > On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
> > > > > The intent was that the sibling might still be an idle candidate. In
> > > > > the current draft of the series, I do not even clear this so that the
> > > > > SMT sibling is considered as an idle candidate. The reasoning is that if
> > > > > there are no idle cores then an SMT sibling of the target is as good an
> > > > > idle CPU to select as any.
> > > >
> > > > Isn't the purpose of select_idle_smt ?
> > > >
> > >
> > > Only in part.
> > >
> > > > select_idle_core() looks for an idle core and opportunistically saves
> > > > an idle CPU candidate to skip select_idle_cpu. In this case this is
> > > > useless loops for select_idle_core() because we are sure that the core
> > > > is not idle
> > > >
> > >
> > > If select_idle_core() finds an idle candidate other than the sibling,
> > > it'll use it if there is no idle core -- it picks a busy sibling based
> > > on a linear walk of the cpumask. Similarly, select_idle_cpu() is not
> >
> > My point is that it's a waste of time to loop the sibling cpus of
> > target in select_idle_core because it will not help to find an idle
> > core. The sibling  cpus will then be check either by select_idle_cpu
> > of select_idle_smt
> 

I understand and you're right, the full loop was in the context of a series
that unified select_idle_* where it made sense. The version I'm currently
testing aborts the SMT search if a !idle sibling is encountered. That
means that select_idle_core() will no longer scan the entire domain if
there are no idle cores.

https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git/commit/?h=sched-sissearch-v2r6&id=eb04a344cf7d7ca64c0c8fc0bcade261fa08c19e

With the patch on its own, it does mean that select_idle_sibling
starts over because SMT siblings might have been cleared. As an aside,
select_idle_core() has its own problems even then. It can start a scan
for an idle sibling when cpu_rq(target)->nr_running is very large --
100+ running tasks -- which is almost certainly a useless scan for
cores. However, I haven't done anything with that in this series as it
seemed like it would be follow-up work.

> also, while looping the cpumask, the sibling cpus of not idle cpu are
> removed and will not be check
> 

True and I spotted this. I think the load_balance_mask can be abused to
clear siblings during select_idle_core() while using select_idle_mask to
track CPUs that have not been scanned yet so select_idle_cpu only scans
CPUs that have not already been visited.

https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git/commit/?h=sched-sissearch-v2r6&id=a6e986dae38855e3be26dfde86bbef1617431dd1
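
A rough userspace sketch of the two-mask idea, not the code in the linked
commit: one mask tracks cores still worth checking, the other tracks CPUs
that have not been visited, and the core scan stops looking at a core as
soon as a non-idle sibling is seen. The struct sis_scan, scan_idle_core()
and scan_remaining() names are hypothetical, and the model assumes two
SMT threads per core with CPU n paired with CPU n^1.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not kernel code: one bit per CPU, CPUs n and n^1 share a core. */
typedef uint64_t mask_t;

struct sis_scan {
	mask_t cores;		/* cores still worth checking for full idleness */
	mask_t unvisited;	/* CPUs whose idle state has not been checked yet */
};

static mask_t cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (mask_t)1 << cpu;
}

/* Core search: abort a core at the first busy thread, record visited CPUs. */
static int scan_idle_core(struct sis_scan *s, mask_t idle)
{
	for (int cpu = 0; cpu < 64; cpu++) {
		int first = cpu & ~1;
		int all_idle = 1;

		if (!(s->cores & cpu_bit(cpu)))
			continue;
		for (int t = first; t <= first + 1; t++) {
			s->unvisited &= ~cpu_bit(t);
			if (!(idle & cpu_bit(t))) {
				all_idle = 0;
				break;	/* remaining siblings stay unvisited */
			}
		}
		s->cores &= ~(cpu_bit(first) | cpu_bit(first + 1));
		if (all_idle)
			return cpu;
	}
	return -1;
}

/* CPU search: only look at CPUs the core search never checked. */
static int scan_remaining(struct sis_scan *s, mask_t idle)
{
	for (int cpu = 0; cpu < 64; cpu++) {
		if (!(s->unvisited & cpu_bit(cpu)))
			continue;
		s->unvisited &= ~cpu_bit(cpu);
		if (idle & cpu_bit(cpu))
			return cpu;
	}
	return -1;
}
```

In this model, with CPUs 0-3 and only CPU 3 idle, the core search checks
CPUs 0 and 2, finds no idle core, and leaves CPUs 1 and 3 for the second
pass, which then returns CPU 3 without rechecking CPUs 0 or 2.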

As both the idle candidate and the load_balance_mask abuse are likely to
be controversial, I shuffled the series so that it's ordered from least
controversial to most controversial.

This
https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git/log/?h=sched-sissearch-v2r6
is what is currently being tested. It'll take most of the weekend and I'll
post them properly if they pass tests and do not throw up nasty surprises.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-04 13:47                 ` Vincent Guittot
  2020-12-04 14:07                   ` Li, Aubrey
@ 2020-12-04 14:31                   ` Mel Gorman
  2020-12-04 15:23                     ` Vincent Guittot
  1 sibling, 1 reply; 30+ messages in thread
From: Mel Gorman @ 2020-12-04 14:31 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Li, Aubrey, LKML, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On Fri, Dec 04, 2020 at 02:47:48PM +0100, Vincent Guittot wrote:
> > IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
> > If the target's sibling is removed from select_idle_mask from select_idle_core(),
> > select_idle_cpu() will lose the chance to pick it up?
> 
> This is only relevant for patch 10 which is not to be included IIUC
> what mel said in cover letter : "Patches 9 and 10 are stupid in the
> context of this series."
> 

Patch 10 was stupid in the context of the prototype because
select_idle_core always returned a CPU. A variation ended up being
reintroduced at the end of the Series Yet To Be Posted so that SMT siblings
are cleared during select_idle_core() but select_idle_cpu() still has a
mask with unvisited CPUs to consider if no idle cores are found.

As far as I know, this would still be compatible with Aubrey's idle
cpu mask as long as it's visited and cleared between select_idle_core
and select_idle_cpu. It relaxes the constraints on Aubrey to some extent
because the idle cpu mask would be a hint so if the information is out
of date, an idle cpu may still be found the normal way.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-04 14:31                   ` Mel Gorman
@ 2020-12-04 15:23                     ` Vincent Guittot
  2020-12-04 15:40                       ` Mel Gorman
  0 siblings, 1 reply; 30+ messages in thread
From: Vincent Guittot @ 2020-12-04 15:23 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Li, Aubrey, LKML, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On Fri, 4 Dec 2020 at 15:31, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> On Fri, Dec 04, 2020 at 02:47:48PM +0100, Vincent Guittot wrote:
> > > IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
> > > If the target's sibling is removed from select_idle_mask from select_idle_core(),
> > > select_idle_cpu() will lose the chance to pick it up?
> >
> > This is only relevant for patch 10 which is not to be included IIUC
> > what mel said in cover letter : "Patches 9 and 10 are stupid in the
> > context of this series."
> >
>
> Patch 10 was stupid in the context of the prototype because
> select_idle_core always returned a CPU. A variation ended up being
> reintroduced at the end of the Series Yet To Be Posted so that SMT siblings
> are cleared during select_idle_core() but select_idle_cpu() still has a
> mask with unvisited CPUs to consider if no idle cores are found.
>
> As far as I know, this would still be compatible with Aubrey's idle
> cpu mask as long as it's visited and cleared between select_idle_core
> and select_idle_cpu. It relaxes the contraints on Aubrey to some extent
> because the idle cpu mask would be a hint so if the information is out
> of date, an idle cpu may still be found the normal way.

But even without patch 10, just replacing sched_domain_span(sd) by
sds_idle_cpus(sd->shared) will ensure that sis loops only on cpus that
get a chance to be idle so select_idle_core is likely to return an
idle_candidate

>
> --
> Mel Gorman
> SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-04 15:23                     ` Vincent Guittot
@ 2020-12-04 15:40                       ` Mel Gorman
  2020-12-04 15:43                         ` Vincent Guittot
  0 siblings, 1 reply; 30+ messages in thread
From: Mel Gorman @ 2020-12-04 15:40 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Li, Aubrey, LKML, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On Fri, Dec 04, 2020 at 04:23:48PM +0100, Vincent Guittot wrote:
> On Fri, 4 Dec 2020 at 15:31, Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > On Fri, Dec 04, 2020 at 02:47:48PM +0100, Vincent Guittot wrote:
> > > > IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
> > > > If the target's sibling is removed from select_idle_mask from select_idle_core(),
> > > > select_idle_cpu() will lose the chance to pick it up?
> > >
> > > This is only relevant for patch 10 which is not to be included IIUC
> > > what mel said in cover letter : "Patches 9 and 10 are stupid in the
> > > context of this series."
> > >
> >
> > Patch 10 was stupid in the context of the prototype because
> > select_idle_core always returned a CPU. A variation ended up being
> > reintroduced at the end of the Series Yet To Be Posted so that SMT siblings
> > are cleared during select_idle_core() but select_idle_cpu() still has a
> > mask with unvisited CPUs to consider if no idle cores are found.
> >
> > As far as I know, this would still be compatible with Aubrey's idle
> > cpu mask as long as it's visited and cleared between select_idle_core
> > and select_idle_cpu. It relaxes the contraints on Aubrey to some extent
> > because the idle cpu mask would be a hint so if the information is out
> > of date, an idle cpu may still be found the normal way.
> 
> But even without patch 10, just replacing sched_domain_span(sd) by
> sds_idle_cpus(sd->shared) will ensure that sis loops only on cpus that
> get a chance to be idle so select_idle_core is likely to return an
> idle_candidate
> 

Yes but if the idle mask is out of date for any reason then idle CPUs might
be missed -- hence the intent to maintain a mask of CPUs visited and use
the idle cpu mask as a hint to prioritise CPUs that are likely idle but
fall back to a normal scan if none of the "idle cpu mask" CPUs are
actually idle.
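
As a hedged illustration of "hint first, normal scan as fallback", here is
a small userspace sketch. It is not taken from either series; mask_t,
first_idle_in() and hinted_idle_cpu() are hypothetical names, and the
"truly idle" mask stands in for whatever per-CPU idle check the real code
would perform.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not kernel code: one bit per CPU. */
typedef uint64_t mask_t;

static mask_t cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (mask_t)1 << cpu;
}

/* Check each CPU in 'scan' once, recording every CPU that was looked at. */
static int first_idle_in(mask_t scan, mask_t truly_idle, mask_t *visited)
{
	for (int cpu = 0; cpu < 64; cpu++) {
		if (!(scan & cpu_bit(cpu)))
			continue;
		*visited |= cpu_bit(cpu);
		if (truly_idle & cpu_bit(cpu))
			return cpu;
	}
	return -1;
}

/* Try the possibly stale hint first, then the CPUs not yet visited. */
static int hinted_idle_cpu(mask_t allowed, mask_t hint, mask_t truly_idle)
{
	mask_t visited = 0;
	int cpu = first_idle_in(allowed & hint, truly_idle, &visited);

	if (cpu >= 0)
		return cpu;
	return first_idle_in(allowed & ~visited, truly_idle, &visited);
}
```

If the hint says CPUs 0-1 are idle but only CPU 2 really is, the first
pass fails and the fallback still finds CPU 2 without rechecking CPUs 0
and 1. The trade-off is that a stale hint costs one wasted pass.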

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-04 15:40                       ` Mel Gorman
@ 2020-12-04 15:43                         ` Vincent Guittot
  2020-12-04 18:41                           ` Mel Gorman
  0 siblings, 1 reply; 30+ messages in thread
From: Vincent Guittot @ 2020-12-04 15:43 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Li, Aubrey, LKML, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On Fri, 4 Dec 2020 at 16:40, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> On Fri, Dec 04, 2020 at 04:23:48PM +0100, Vincent Guittot wrote:
> > On Fri, 4 Dec 2020 at 15:31, Mel Gorman <mgorman@techsingularity.net> wrote:
> > >
> > > On Fri, Dec 04, 2020 at 02:47:48PM +0100, Vincent Guittot wrote:
> > > > > IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
> > > > > If the target's sibling is removed from select_idle_mask from select_idle_core(),
> > > > > select_idle_cpu() will lose the chance to pick it up?
> > > >
> > > > This is only relevant for patch 10 which is not to be included IIUC
> > > > what mel said in cover letter : "Patches 9 and 10 are stupid in the
> > > > context of this series."
> > > >
> > >
> > > Patch 10 was stupid in the context of the prototype because
> > > select_idle_core always returned a CPU. A variation ended up being
> > > reintroduced at the end of the Series Yet To Be Posted so that SMT siblings
> > > are cleared during select_idle_core() but select_idle_cpu() still has a
> > > mask with unvisited CPUs to consider if no idle cores are found.
> > >
> > > As far as I know, this would still be compatible with Aubrey's idle
> > > cpu mask as long as it's visited and cleared between select_idle_core
> > > and select_idle_cpu. It relaxes the contraints on Aubrey to some extent
> > > because the idle cpu mask would be a hint so if the information is out
> > > of date, an idle cpu may still be found the normal way.
> >
> > But even without patch 10, just replacing sched_domain_span(sd) by
> > sds_idle_cpus(sd->shared) will ensure that sis loops only on cpus that
> > get a chance to be idle so select_idle_core is likely to return an
> > idle_candidate
> >
>
> Yes but if the idle mask is out of date for any reason then idle CPUs might

In fact it's the opposite: a CPU in the idle mask might not be idle, but
all CPUs that enter idle will be set.

> be missed -- hence the intent to maintain a mask of CPUs visited and use
> the idle cpu mask as a hint to prioritise CPUs that are likely idle but
> fall back to a normal scan if none of the "idle cpu mask" CPUs are
> actually idle.
>
> --
> Mel Gorman
> SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched
  2020-12-04 15:43                         ` Vincent Guittot
@ 2020-12-04 18:41                           ` Mel Gorman
  0 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2020-12-04 18:41 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Li, Aubrey, LKML, Barry Song, Ingo Molnar, Peter Ziljstra,
	Juri Lelli, Valentin Schneider, Linux-ARM

On Fri, Dec 04, 2020 at 04:43:05PM +0100, Vincent Guittot wrote:
> On Fri, 4 Dec 2020 at 16:40, Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > On Fri, Dec 04, 2020 at 04:23:48PM +0100, Vincent Guittot wrote:
> > > On Fri, 4 Dec 2020 at 15:31, Mel Gorman <mgorman@techsingularity.net> wrote:
> > > >
> > > > On Fri, Dec 04, 2020 at 02:47:48PM +0100, Vincent Guittot wrote:
> > > > > > IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
> > > > > > If the target's sibling is removed from select_idle_mask from select_idle_core(),
> > > > > > select_idle_cpu() will lose the chance to pick it up?
> > > > >
> > > > > This is only relevant for patch 10 which is not to be included IIUC
> > > > > what mel said in cover letter : "Patches 9 and 10 are stupid in the
> > > > > context of this series."
> > > > >
> > > >
> > > > Patch 10 was stupid in the context of the prototype because
> > > > select_idle_core always returned a CPU. A variation ended up being
> > > > reintroduced at the end of the Series Yet To Be Posted so that SMT siblings
> > > > are cleared during select_idle_core() but select_idle_cpu() still has a
> > > > mask with unvisited CPUs to consider if no idle cores are found.
> > > >
> > > > As far as I know, this would still be compatible with Aubrey's idle
> > > > cpu mask as long as it's visited and cleared between select_idle_core
> > > > and select_idle_cpu. It relaxes the contraints on Aubrey to some extent
> > > > because the idle cpu mask would be a hint so if the information is out
> > > > of date, an idle cpu may still be found the normal way.
> > >
> > > But even without patch 10, just replacing sched_domain_span(sd) by
> > > sds_idle_cpus(sd->shared) will ensure that sis loops only on cpus that
> > > get a chance to be idle so select_idle_core is likely to return an
> > > idle_candidate
> > >
> >
> > Yes but if the idle mask is out of date for any reason then idle CPUs might
> 
> In fact it's the opposite, a cpu in idle mask might not be idle but
> all cpus that enter idle will be set
> 

When I first checked, the information was based on the tick or a CPU
stopping the tick. That was not guaranteed to be up to date so I considered
the best option would be to treat idle cpu mask as advisory. It would
not necessarily cover a CPU that was entering idle and polling before
entering an idle state for example or a rq that would pass sched_idle_cpu()
depending on the timing of the update_idle_cpumask call.

I know you reviewed that patch and v6 may be very different but the more
up to date that information is, the greater the cache conflicts will be
on sched_domain_shared so maintaining the up-to-date information may cost
enough to offset any benefit from reduced searching at wakeup.

If this turns out to be wrong, then great, the idle cpu mask can be used
as both the basis for an idle core search and a fast find of an individual
CPU. If the cost of keeping up to date information is too high then the
idle_cpu_mask can be treated as advisory to start the search and track
CPUs visited.

The series are not either/or. Chunks of the series I posted are orthogonal
(e.g. changes to p->recent_used_cpu), and the latter parts could either work
with idle cpu mask or be replaced by idle cpu mask depending on which
performs better.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2020-12-04 18:42 UTC | newest]

Thread overview: 30+ messages
2020-12-03 14:11 [RFC PATCH 00/10] Reduce time complexity of select_idle_sibling Mel Gorman
2020-12-03 14:11 ` [PATCH 01/10] sched/fair: Track efficiency " Mel Gorman
2020-12-03 14:11 ` [PATCH 02/10] sched/fair: Track efficiency of task recent_used_cpu Mel Gorman
2020-12-03 14:11 ` [PATCH 03/10] sched/fair: Remove SIS_AVG_CPU Mel Gorman
2020-12-03 14:11 ` [PATCH 04/10] sched/fair: Return an idle cpu if one is found after a failed search for an idle core Mel Gorman
2020-12-03 16:35   ` Vincent Guittot
2020-12-03 17:50     ` Mel Gorman
2020-12-03 14:11 ` [PATCH 05/10] sched/fair: Do not replace recent_used_cpu with the new target Mel Gorman
2020-12-03 14:11 ` [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched Mel Gorman
2020-12-03 16:38   ` Vincent Guittot
2020-12-03 17:52     ` Mel Gorman
2020-12-04 10:56       ` Vincent Guittot
2020-12-04 11:30         ` Mel Gorman
2020-12-04 13:13           ` Vincent Guittot
2020-12-04 13:17             ` Vincent Guittot
2020-12-04 13:40               ` Li, Aubrey
2020-12-04 13:47                 ` Li, Aubrey
2020-12-04 13:47                 ` Vincent Guittot
2020-12-04 14:07                   ` Li, Aubrey
2020-12-04 14:31                   ` Mel Gorman
2020-12-04 15:23                     ` Vincent Guittot
2020-12-04 15:40                       ` Mel Gorman
2020-12-04 15:43                         ` Vincent Guittot
2020-12-04 18:41                           ` Mel Gorman
2020-12-04 14:27               ` Mel Gorman
2020-12-03 14:11 ` [PATCH 07/10] sched/fair: Account for the idle cpu/smt search cost Mel Gorman
2020-12-03 14:11 ` [PATCH 08/10] sched/fair: Reintroduce SIS_AVG_CPU but in the context of SIS_PROP to reduce search depth Mel Gorman
2020-12-03 14:11 ` [PATCH 09/10] sched/fair: Limit the search for an idle core Mel Gorman
2020-12-03 14:19 ` Mel Gorman
2020-12-03 14:20 ` [PATCH 10/10] sched/fair: Avoid revisiting CPUs multiple times during select_idle_sibling Mel Gorman
