* [PATCH 0/5] Scan for an idle sibling in a single pass
@ 2021-01-11 15:50 Mel Gorman
  2021-01-11 15:50 ` [PATCH 1/5] sched/fair: Remove SIS_AVG_CPU Mel Gorman
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Mel Gorman @ 2021-01-11 15:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman

This series of 5 patches reposts three patches from Peter entitled
"select_idle_sibling() wreckage". With the series applied, the runqueues
are scanned in a single pass when searching for an idle sibling.

Two patches from Peter were dropped. The first patch altered how the scan
depth was calculated. Scan depth selection is effectively a random number
generator due to two major limitations. First, the avg_idle time is based
on the time between a CPU going idle and being woken up, clamped to
approximately 2*sysctl_sched_migration_cost, which is difficult to compare
in a sensible fashion to avg_scan_cost. Second, only the avg_scan_cost of
scan failures is recorded and it does not decay. Fixing that requires deeper
surgery that would justify a patch on its own, although Peter notes that
https://lkml.kernel.org/r/20180530143105.977759909@infradead.org is
potentially useful for an alternative avg_idle metric.
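
For reference, the SIS_PROP depth calculation being discussed looks roughly
like the following, condensed from select_idle_cpu() as it stands after
patch 1 below (not a verbatim copy; comments summarise the limitations
described above):

	avg_idle = this_rq()->avg_idle / 512;	/* idle-to-wakeup time, clamped, fuzzed */
	avg_cost = this_sd->avg_scan_cost + 1;	/* recorded scan cost, no decay */
	span_avg = sd->span_weight * avg_idle;
	if (span_avg > 4*avg_cost)
		nr = div_u64(span_avg, avg_cost);	/* number of CPUs to scan */
	else
		nr = 4;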

The second dropped patch converted the idle core scan throttling mechanism
to SIS_PROP. While this would unify the throttling of core and CPU scanning,
it was not free of regressions, and has_idle_cores is a fairly effective
throttling mechanism with the caveat that it can have a lot of false
positives for workloads like hackbench.
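
For context, the has_idle_cores throttle is a per-LLC hint consulted before
the core scan; the hint can be stale, which is where the false positives
come from. A rough sketch based on the existing select_idle_core() (see the
hunks touched in patch 5 below):

	if (!test_idle_cores(target, false))
		return -1;	/* no idle core believed to exist, skip the scan */

	/* ... scan for a fully idle core ... */

	/* Failed to find one; stop other wakeups from scanning for one. */
	set_idle_cores(target, 0);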

Peter's series tried to solve three problems at once; this subset addresses
one of them. As with anything touching select_idle_sibling(), it is a mix of
wins and losses, but it won more than it lost across a range of workloads
and machines.

 kernel/sched/core.c     |  19 +++--
 kernel/sched/fair.c     | 157 ++++++++++++++++++++--------------------
 kernel/sched/features.h |   1 -
 3 files changed, 92 insertions(+), 85 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/5] sched/fair: Remove SIS_AVG_CPU
  2021-01-11 15:50 [PATCH 0/5] Scan for an idle sibling in a single pass Mel Gorman
@ 2021-01-11 15:50 ` Mel Gorman
  2021-01-11 15:50 ` [PATCH 2/5] sched/fair: Move avg_scan_cost calculations under SIS_PROP Mel Gorman
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Mel Gorman @ 2021-01-11 15:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman

SIS_AVG_CPU was introduced as a means of avoiding a search when the
average search cost indicated that the search would likely fail. It was
a blunt instrument and disabled by commit 4c77b18cf8b7 ("sched/fair: Make
select_idle_cpu() more aggressive") and later replaced with a proportional
search depth by commit 1ad3aaf3fcd2 ("sched/core: Implement new approach
to scale select_idle_cpu()").

While there are corner cases where SIS_AVG_CPU is better, it has now been
disabled for almost three years. As the intent of SIS_PROP is to reduce
the time complexity of select_idle_cpu(), let's drop SIS_AVG_CPU and focus
on SIS_PROP as a throttling mechanism.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
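
For context, the disabled check being dropped is the early bail-out visible
in the hunk below; SIS_PROP instead limits the depth of the scan rather than
skipping it outright:

	if (sched_feat(SIS_AVG_CPU) && avg_idle < avg_cost)
		return -1;	/* predicted too expensive, skip the scan entirely */
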
 kernel/sched/fair.c     | 20 +++++++++-----------
 kernel/sched/features.h |  1 -
 2 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04a3ce20da67..9f5682aeda2e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6145,7 +6145,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	struct sched_domain *this_sd;
-	u64 avg_cost, avg_idle;
 	u64 time;
 	int this = smp_processor_id();
 	int cpu, nr = INT_MAX;
@@ -6154,18 +6153,17 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	if (!this_sd)
 		return -1;
 
-	/*
-	 * Due to large variance we need a large fuzz factor; hackbench in
-	 * particularly is sensitive here.
-	 */
-	avg_idle = this_rq()->avg_idle / 512;
-	avg_cost = this_sd->avg_scan_cost + 1;
+	if (sched_feat(SIS_PROP)) {
+		u64 avg_cost, avg_idle, span_avg;
 
-	if (sched_feat(SIS_AVG_CPU) && avg_idle < avg_cost)
-		return -1;
+		/*
+		 * Due to large variance we need a large fuzz factor;
+		 * hackbench in particularly is sensitive here.
+		 */
+		avg_idle = this_rq()->avg_idle / 512;
+		avg_cost = this_sd->avg_scan_cost + 1;
 
-	if (sched_feat(SIS_PROP)) {
-		u64 span_avg = sd->span_weight * avg_idle;
+		span_avg = sd->span_weight * avg_idle;
 		if (span_avg > 4*avg_cost)
 			nr = div_u64(span_avg, avg_cost);
 		else
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 68d369cba9e4..e875eabb6600 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -54,7 +54,6 @@ SCHED_FEAT(TTWU_QUEUE, true)
 /*
  * When doing wakeups, attempt to limit superfluous scans of the LLC domain.
  */
-SCHED_FEAT(SIS_AVG_CPU, false)
 SCHED_FEAT(SIS_PROP, true)
 
 /*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/5] sched/fair: Move avg_scan_cost calculations under SIS_PROP
  2021-01-11 15:50 [PATCH 0/5] Scan for an idle sibling in a single pass Mel Gorman
  2021-01-11 15:50 ` [PATCH 1/5] sched/fair: Remove SIS_AVG_CPU Mel Gorman
@ 2021-01-11 15:50 ` Mel Gorman
  2021-01-11 15:50 ` [PATCH 3/5] sched/fair: Make select_idle_cpu() proportional to cores Mel Gorman
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Mel Gorman @ 2021-01-11 15:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman

As noted by Vincent Guittot, avg_scan_cost is calculated for SIS_PROP
even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP
check and, while we are at it, exclude the cost of initialising the CPU
mask from the average scan cost.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
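
Condensed, the resulting flow in select_idle_cpu() is as follows (depth
calculation and scan loop body elided, see the diff below):

	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);	/* no longer timed */

	if (sched_feat(SIS_PROP)) {
		/* ... calculate nr ... */
		time = cpu_clock(this);
	}

	for_each_cpu_wrap(cpu, cpus, target) {
		/* ... scan ... */
	}

	if (sched_feat(SIS_PROP)) {
		time = cpu_clock(this) - time;
		update_avg(&this_sd->avg_scan_cost, time);
	}
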
 kernel/sched/fair.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9f5682aeda2e..c8d8e185cf3b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	if (!this_sd)
 		return -1;
 
+	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+
 	if (sched_feat(SIS_PROP)) {
 		u64 avg_cost, avg_idle, span_avg;
 
@@ -6168,11 +6170,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 			nr = div_u64(span_avg, avg_cost);
 		else
 			nr = 4;
-	}
-
-	time = cpu_clock(this);
 
-	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+		time = cpu_clock(this);
+	}
 
 	for_each_cpu_wrap(cpu, cpus, target) {
 		if (!--nr)
@@ -6181,8 +6181,10 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 			break;
 	}
 
-	time = cpu_clock(this) - time;
-	update_avg(&this_sd->avg_scan_cost, time);
+	if (sched_feat(SIS_PROP)) {
+		time = cpu_clock(this) - time;
+		update_avg(&this_sd->avg_scan_cost, time);
+	}
 
 	return cpu;
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/5] sched/fair: Make select_idle_cpu() proportional to cores
  2021-01-11 15:50 [PATCH 0/5] Scan for an idle sibling in a single pass Mel Gorman
  2021-01-11 15:50 ` [PATCH 1/5] sched/fair: Remove SIS_AVG_CPU Mel Gorman
  2021-01-11 15:50 ` [PATCH 2/5] sched/fair: Move avg_scan_cost calculations under SIS_PROP Mel Gorman
@ 2021-01-11 15:50 ` Mel Gorman
  2021-01-13 16:49   ` Vincent Guittot
  2021-01-11 15:50 ` [PATCH 4/5] sched/fair: Remove select_idle_smt() Mel Gorman
  2021-01-11 15:50 ` [PATCH 5/5] sched/fair: Merge select_idle_core/cpu() Mel Gorman
  4 siblings, 1 reply; 18+ messages in thread
From: Mel Gorman @ 2021-01-11 15:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman

From: Peter Zijlstra (Intel) <peterz@infradead.org>

Instead of calculating how many (logical) CPUs to scan, compute how
many cores to scan.

This changes behaviour for anything !SMT2.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
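
To illustrate, the depth is now computed in cores and then scaled back to
logical CPUs (condensed from the diff below; the numbers in the final
comment are a hypothetical example only):

	span_avg = sd->span_weight * avg_idle;
	if (span_avg > sis_min_cores*avg_cost)
		nr = div_u64(span_avg, avg_cost);	/* depth in cores */
	else
		nr = sis_min_cores;			/* minimum of two cores */

	nr *= sched_smt_weight;				/* back to logical CPUs */
	/* e.g. SMT4: a depth of 2 cores means 8 logical CPUs are visited */
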
 kernel/sched/core.c | 19 ++++++++++++++-----
 kernel/sched/fair.c | 12 ++++++++++--
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 15d2562118d1..7bfa73de6a8d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7444,11 +7444,20 @@ int sched_cpu_activate(unsigned int cpu)
 	balance_push_set(cpu, false);
 
 #ifdef CONFIG_SCHED_SMT
-	/*
-	 * When going up, increment the number of cores with SMT present.
-	 */
-	if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
-		static_branch_inc_cpuslocked(&sched_smt_present);
+	do {
+		int weight = cpumask_weight(cpu_smt_mask(cpu));
+		extern int sched_smt_weight;
+
+		if (weight > sched_smt_weight)
+			sched_smt_weight = weight;
+
+		/*
+		 * When going up, increment the number of cores with SMT present.
+		 */
+		if (weight == 2)
+			static_branch_inc_cpuslocked(&sched_smt_present);
+
+	} while (0);
 #endif
 	set_cpu_active(cpu, true);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c8d8e185cf3b..0811e2fe4f19 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6010,6 +6010,8 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
 DEFINE_STATIC_KEY_FALSE(sched_smt_present);
 EXPORT_SYMBOL_GPL(sched_smt_present);
 
+int sched_smt_weight __read_mostly = 1;
+
 static inline void set_idle_cores(int cpu, int val)
 {
 	struct sched_domain_shared *sds;
@@ -6124,6 +6126,8 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
 
 #else /* CONFIG_SCHED_SMT */
 
+#define sched_smt_weight	1
+
 static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
 {
 	return -1;
@@ -6136,6 +6140,8 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
 
 #endif /* CONFIG_SCHED_SMT */
 
+#define sis_min_cores		2
+
 /*
  * Scan the LLC domain for idle CPUs; this is dynamically regulated by
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
@@ -6166,10 +6172,12 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 		avg_cost = this_sd->avg_scan_cost + 1;
 
 		span_avg = sd->span_weight * avg_idle;
-		if (span_avg > 4*avg_cost)
+		if (span_avg > sis_min_cores*avg_cost)
 			nr = div_u64(span_avg, avg_cost);
 		else
-			nr = 4;
+			nr = sis_min_cores;
+
+		nr *= sched_smt_weight;
 
 		time = cpu_clock(this);
 	}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/5] sched/fair: Remove select_idle_smt()
  2021-01-11 15:50 [PATCH 0/5] Scan for an idle sibling in a single pass Mel Gorman
                   ` (2 preceding siblings ...)
  2021-01-11 15:50 ` [PATCH 3/5] sched/fair: Make select_idle_cpu() proportional to cores Mel Gorman
@ 2021-01-11 15:50 ` Mel Gorman
  2021-01-11 15:50 ` [PATCH 5/5] sched/fair: Merge select_idle_core/cpu() Mel Gorman
  4 siblings, 0 replies; 18+ messages in thread
From: Mel Gorman @ 2021-01-11 15:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman

From: Peter Zijlstra (Intel) <peterz@infradead.org>

In order to make the next patch more readable, and to quantify the
actual effectiveness of this pass, start by removing it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 30 ------------------------------
 1 file changed, 30 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0811e2fe4f19..12e08da90024 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6103,27 +6103,6 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 	return -1;
 }
 
-/*
- * Scan the local SMT mask for idle CPUs.
- */
-static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
-{
-	int cpu;
-
-	if (!static_branch_likely(&sched_smt_present))
-		return -1;
-
-	for_each_cpu(cpu, cpu_smt_mask(target)) {
-		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
-		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
-			continue;
-		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
-			return cpu;
-	}
-
-	return -1;
-}
-
 #else /* CONFIG_SCHED_SMT */
 
 #define sched_smt_weight	1
@@ -6133,11 +6112,6 @@ static inline int select_idle_core(struct task_struct *p, struct sched_domain *s
 	return -1;
 }
 
-static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
-{
-	return -1;
-}
-
 #endif /* CONFIG_SCHED_SMT */
 
 #define sis_min_cores		2
@@ -6331,10 +6305,6 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 
-	i = select_idle_smt(p, sd, target);
-	if ((unsigned)i < nr_cpumask_bits)
-		return i;
-
 	return target;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/5] sched/fair: Merge select_idle_core/cpu()
  2021-01-11 15:50 [PATCH 0/5] Scan for an idle sibling in a single pass Mel Gorman
                   ` (3 preceding siblings ...)
  2021-01-11 15:50 ` [PATCH 4/5] sched/fair: Remove select_idle_smt() Mel Gorman
@ 2021-01-11 15:50 ` Mel Gorman
  2021-01-13 17:03   ` Vincent Guittot
  4 siblings, 1 reply; 18+ messages in thread
From: Mel Gorman @ 2021-01-11 15:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman

Both select_idle_core() and select_idle_cpu() do a loop over the same
cpumask. Observe that by clearing the already visited CPUs, we can
fold the iteration and iterate a core at a time.

All we need to do is remember any non-idle CPU we encountered while
scanning for an idle core. This way we'll only iterate every CPU once.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
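
The shape of the merged scan, condensed from the hunks below (SIS_PROP
bookkeeping elided): a single pass over the LLC mask where, when looking
for idle cores, select_idle_core() checks a whole core at a time, clears
its siblings from the mask and remembers the first idle CPU it sees as a
fallback:

	for_each_cpu_wrap(cpu, cpus, target) {
		if (!--nr)
			return -1;
		if (smt) {
			i = select_idle_core(p, cpu, cpus, &idle_cpu);
			if ((unsigned int)i < nr_cpumask_bits)
				return i;		/* fully idle core */
		} else {
			i = __select_idle_cpu(p, cpu, cpus, &idle_cpu);
			if ((unsigned int)i < nr_cpumask_bits) {
				idle_cpu = i;		/* an idle CPU is enough */
				break;
			}
		}
	}

	return idle_cpu;	/* best idle CPU seen, or -1 */
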
 kernel/sched/fair.c | 97 +++++++++++++++++++++++++++------------------
 1 file changed, 59 insertions(+), 38 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 12e08da90024..84f02abb29e3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6006,6 +6006,14 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
 	return new_cpu;
 }
 
+static inline int __select_idle_cpu(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
+{
+	if (available_idle_cpu(core) || sched_idle_cpu(core))
+		return core;
+
+	return -1;
+}
+
 #ifdef CONFIG_SCHED_SMT
 DEFINE_STATIC_KEY_FALSE(sched_smt_present);
 EXPORT_SYMBOL_GPL(sched_smt_present);
@@ -6066,40 +6074,34 @@ void __update_idle_core(struct rq *rq)
  * there are no idle cores left in the system; tracked through
  * sd_llc->shared->has_idle_cores and enabled through update_idle_core() above.
  */
-static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
 {
-	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
-	int core, cpu;
+	bool idle = true;
+	int cpu;
 
 	if (!static_branch_likely(&sched_smt_present))
-		return -1;
-
-	if (!test_idle_cores(target, false))
-		return -1;
-
-	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+		return __select_idle_cpu(p, core, cpus, idle_cpu);
 
-	for_each_cpu_wrap(core, cpus, target) {
-		bool idle = true;
-
-		for_each_cpu(cpu, cpu_smt_mask(core)) {
-			if (!available_idle_cpu(cpu)) {
-				idle = false;
-				break;
+	for_each_cpu(cpu, cpu_smt_mask(core)) {
+		if (!available_idle_cpu(cpu)) {
+			idle = false;
+			if (*idle_cpu == -1) {
+				if (sched_idle_cpu(cpu) && cpumask_test_cpu(cpu, p->cpus_ptr)) {
+					*idle_cpu = cpu;
+					break;
+				}
+				continue;
 			}
+			break;
 		}
-
-		if (idle)
-			return core;
-
-		cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
+		if (*idle_cpu == -1 && cpumask_test_cpu(cpu, p->cpus_ptr))
+			*idle_cpu = cpu;
 	}
 
-	/*
-	 * Failed to find an idle core; stop looking for one.
-	 */
-	set_idle_cores(target, 0);
+	if (idle)
+		return core;
 
+	cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
 	return -1;
 }
 
@@ -6107,9 +6109,18 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 
 #define sched_smt_weight	1
 
-static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
+static inline void set_idle_cores(int cpu, int val)
 {
-	return -1;
+}
+
+static inline bool test_idle_cores(int cpu, bool def)
+{
+	return def;
+}
+
+static inline int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
+{
+	return __select_idle_cpu(p, core, cpus, idle_cpu);
 }
 
 #endif /* CONFIG_SCHED_SMT */
@@ -6124,10 +6135,11 @@ static inline int select_idle_core(struct task_struct *p, struct sched_domain *s
 static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
+	int i, cpu, idle_cpu = -1, nr = INT_MAX;
+	bool smt = test_idle_cores(target, false);
+	int this = smp_processor_id();
 	struct sched_domain *this_sd;
 	u64 time;
-	int this = smp_processor_id();
-	int cpu, nr = INT_MAX;
 
 	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
 	if (!this_sd)
@@ -6135,7 +6147,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 
 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 
-	if (sched_feat(SIS_PROP)) {
+	if (sched_feat(SIS_PROP) && !smt) {
 		u64 avg_cost, avg_idle, span_avg;
 
 		/*
@@ -6159,16 +6171,29 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	for_each_cpu_wrap(cpu, cpus, target) {
 		if (!--nr)
 			return -1;
-		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
-			break;
+		if (smt) {
+			i = select_idle_core(p, cpu, cpus, &idle_cpu);
+			if ((unsigned int)i < nr_cpumask_bits)
+				return i;
+
+		} else {
+			i = __select_idle_cpu(p, cpu, cpus, &idle_cpu);
+			if ((unsigned int)i < nr_cpumask_bits) {
+				idle_cpu = i;
+				break;
+			}
+		}
 	}
 
-	if (sched_feat(SIS_PROP)) {
+	if (smt)
+		set_idle_cores(this, false);
+
+	if (sched_feat(SIS_PROP) && !smt) {
 		time = cpu_clock(this) - time;
 		update_avg(&this_sd->avg_scan_cost, time);
 	}
 
-	return cpu;
+	return idle_cpu;
 }
 
 /*
@@ -6297,10 +6322,6 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (!sd)
 		return target;
 
-	i = select_idle_core(p, sd, target);
-	if ((unsigned)i < nr_cpumask_bits)
-		return i;
-
 	i = select_idle_cpu(p, sd, target);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] sched/fair: Make select_idle_cpu() proportional to cores
  2021-01-11 15:50 ` [PATCH 3/5] sched/fair: Make select_idle_cpu() proportional to cores Mel Gorman
@ 2021-01-13 16:49   ` Vincent Guittot
  2021-01-14  9:28     ` Mel Gorman
  0 siblings, 1 reply; 18+ messages in thread
From: Vincent Guittot @ 2021-01-13 16:49 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Peter Zijlstra, Ingo Molnar, Li Aubrey, Qais Yousef, LKML

On Mon, 11 Jan 2021 at 16:50, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> From: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> Instead of calculating how many (logical) CPUs to scan, compute how
> many cores to scan.
>
> This changes behaviour for anything !SMT2.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  kernel/sched/core.c | 19 ++++++++++++++-----
>  kernel/sched/fair.c | 12 ++++++++++--
>  2 files changed, 24 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 15d2562118d1..7bfa73de6a8d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7444,11 +7444,20 @@ int sched_cpu_activate(unsigned int cpu)
>         balance_push_set(cpu, false);
>
>  #ifdef CONFIG_SCHED_SMT
> -       /*
> -        * When going up, increment the number of cores with SMT present.
> -        */
> -       if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
> -               static_branch_inc_cpuslocked(&sched_smt_present);
> +       do {
> +               int weight = cpumask_weight(cpu_smt_mask(cpu));
> +               extern int sched_smt_weight;

coding style problem

> +
> +               if (weight > sched_smt_weight)
> +                       sched_smt_weight = weight;
> +
> +               /*
> +                * When going up, increment the number of cores with SMT present.
> +                */
> +               if (weight == 2)
> +                       static_branch_inc_cpuslocked(&sched_smt_present);
> +
> +       } while (0);
>  #endif
>         set_cpu_active(cpu, true);
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c8d8e185cf3b..0811e2fe4f19 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6010,6 +6010,8 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
>  DEFINE_STATIC_KEY_FALSE(sched_smt_present);
>  EXPORT_SYMBOL_GPL(sched_smt_present);
>
> +int sched_smt_weight __read_mostly = 1;
> +
>  static inline void set_idle_cores(int cpu, int val)
>  {
>         struct sched_domain_shared *sds;
> @@ -6124,6 +6126,8 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
>
>  #else /* CONFIG_SCHED_SMT */
>
> +#define sched_smt_weight       1
> +
>  static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
>  {
>         return -1;
> @@ -6136,6 +6140,8 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
>
>  #endif /* CONFIG_SCHED_SMT */
>
> +#define sis_min_cores          2
> +
>  /*
>   * Scan the LLC domain for idle CPUs; this is dynamically regulated by
>   * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
> @@ -6166,10 +6172,12 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>                 avg_cost = this_sd->avg_scan_cost + 1;
>
>                 span_avg = sd->span_weight * avg_idle;
> -               if (span_avg > 4*avg_cost)
> +               if (span_avg > sis_min_cores*avg_cost)
>                         nr = div_u64(span_avg, avg_cost);
>                 else
> -                       nr = 4;
> +                       nr = sis_min_cores;
> +
> +               nr *= sched_smt_weight;

Also, patch 5 will look at all CPUs of a core in select_idle_core so
nr will decrement by 1 per core, so I don't see the need to multiply by
sched_smt_weight once patch 5 is applied

>
>                 time = cpu_clock(this);
>         }
> --
> 2.26.2
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] sched/fair: Merge select_idle_core/cpu()
  2021-01-11 15:50 ` [PATCH 5/5] sched/fair: Merge select_idle_core/cpu() Mel Gorman
@ 2021-01-13 17:03   ` Vincent Guittot
  2021-01-14  9:35     ` Mel Gorman
  0 siblings, 1 reply; 18+ messages in thread
From: Vincent Guittot @ 2021-01-13 17:03 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Peter Zijlstra, Ingo Molnar, Li Aubrey, Qais Yousef, LKML

On Mon, 11 Jan 2021 at 16:50, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> Both select_idle_core() and select_idle_cpu() do a loop over the same
> cpumask. Observe that by clearing the already visited CPUs, we can
> fold the iteration and iterate a core at a time.
>
> All we need to do is remember any non-idle CPU we encountered while
> scanning for an idle core. This way we'll only iterate every CPU once.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  kernel/sched/fair.c | 97 +++++++++++++++++++++++++++------------------
>  1 file changed, 59 insertions(+), 38 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 12e08da90024..84f02abb29e3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6006,6 +6006,14 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
>         return new_cpu;
>  }
>
> +static inline int __select_idle_cpu(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
> +{
> +       if (available_idle_cpu(core) || sched_idle_cpu(core))
> +               return core;
> +
> +       return -1;
> +}
> +
>  #ifdef CONFIG_SCHED_SMT
>  DEFINE_STATIC_KEY_FALSE(sched_smt_present);
>  EXPORT_SYMBOL_GPL(sched_smt_present);
> @@ -6066,40 +6074,34 @@ void __update_idle_core(struct rq *rq)
>   * there are no idle cores left in the system; tracked through
>   * sd_llc->shared->has_idle_cores and enabled through update_idle_core() above.
>   */
> -static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
> +static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
>  {
> -       struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> -       int core, cpu;
> +       bool idle = true;
> +       int cpu;
>
>         if (!static_branch_likely(&sched_smt_present))
> -               return -1;
> -
> -       if (!test_idle_cores(target, false))
> -               return -1;
> -
> -       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> +               return __select_idle_cpu(p, core, cpus, idle_cpu);
>
> -       for_each_cpu_wrap(core, cpus, target) {
> -               bool idle = true;
> -
> -               for_each_cpu(cpu, cpu_smt_mask(core)) {
> -                       if (!available_idle_cpu(cpu)) {
> -                               idle = false;
> -                               break;
> +       for_each_cpu(cpu, cpu_smt_mask(core)) {
> +               if (!available_idle_cpu(cpu)) {
> +                       idle = false;
> +                       if (*idle_cpu == -1) {
> +                               if (sched_idle_cpu(cpu) && cpumask_test_cpu(cpu, p->cpus_ptr)) {
> +                                       *idle_cpu = cpu;
> +                                       break;
> +                               }
> +                               continue;
>                         }
> +                       break;
>                 }
> -
> -               if (idle)
> -                       return core;
> -
> -               cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
> +               if (*idle_cpu == -1 && cpumask_test_cpu(cpu, p->cpus_ptr))
> +                       *idle_cpu = cpu;
>         }
>
> -       /*
> -        * Failed to find an idle core; stop looking for one.
> -        */
> -       set_idle_cores(target, 0);
> +       if (idle)
> +               return core;
>
> +       cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
>         return -1;
>  }
>
> @@ -6107,9 +6109,18 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>
>  #define sched_smt_weight       1
>
> -static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
> +static inline void set_idle_cores(int cpu, int val)
>  {
> -       return -1;
> +}
> +
> +static inline bool test_idle_cores(int cpu, bool def)
> +{
> +       return def;
> +}
> +
> +static inline int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
> +{
> +       return __select_idle_cpu(p, core, cpus, idle_cpu);
>  }
>
>  #endif /* CONFIG_SCHED_SMT */
> @@ -6124,10 +6135,11 @@ static inline int select_idle_core(struct task_struct *p, struct sched_domain *s
>  static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
>  {
>         struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> +       int i, cpu, idle_cpu = -1, nr = INT_MAX;
> +       bool smt = test_idle_cores(target, false);
> +       int this = smp_processor_id();
>         struct sched_domain *this_sd;
>         u64 time;
> -       int this = smp_processor_id();
> -       int cpu, nr = INT_MAX;
>
>         this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
>         if (!this_sd)
> @@ -6135,7 +6147,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>
>         cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>
> -       if (sched_feat(SIS_PROP)) {
> +       if (sched_feat(SIS_PROP) && !smt) {
>                 u64 avg_cost, avg_idle, span_avg;
>
>                 /*
> @@ -6159,16 +6171,29 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>         for_each_cpu_wrap(cpu, cpus, target) {
>                 if (!--nr)
>                         return -1;
> -               if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> -                       break;
> +               if (smt) {

If we want to stay on something similar to the previous behavior, we
want to check on all cores if test_idle_cores is true so nr should be
set to number of cores

and if there is no idle core, nr should be divided by sched_smt_weight
so we will only scan a limited number of cores

and you can always call select_idle_core()

> +                       i = select_idle_core(p, cpu, cpus, &idle_cpu);
> +                       if ((unsigned int)i < nr_cpumask_bits)
> +                               return i;
> +
> +               } else {
> +                       i = __select_idle_cpu(p, cpu, cpus, &idle_cpu);
> +                       if ((unsigned int)i < nr_cpumask_bits) {
> +                               idle_cpu = i;
> +                               break;
> +                       }
> +               }
>         }
>
> -       if (sched_feat(SIS_PROP)) {
> +       if (smt)
> +               set_idle_cores(this, false);
> +
> +       if (sched_feat(SIS_PROP) && !smt) {
>                 time = cpu_clock(this) - time;
>                 update_avg(&this_sd->avg_scan_cost, time);
>         }
>
> -       return cpu;
> +       return idle_cpu;
>  }
>
>  /*
> @@ -6297,10 +6322,6 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>         if (!sd)
>                 return target;
>
> -       i = select_idle_core(p, sd, target);
> -       if ((unsigned)i < nr_cpumask_bits)
> -               return i;
> -
>         i = select_idle_cpu(p, sd, target);
>         if ((unsigned)i < nr_cpumask_bits)
>                 return i;
> --
> 2.26.2
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] sched/fair: Make select_idle_cpu() proportional to cores
  2021-01-13 16:49   ` Vincent Guittot
@ 2021-01-14  9:28     ` Mel Gorman
  0 siblings, 0 replies; 18+ messages in thread
From: Mel Gorman @ 2021-01-14  9:28 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: Peter Zijlstra, Ingo Molnar, Li Aubrey, Qais Yousef, LKML

On Wed, Jan 13, 2021 at 05:49:58PM +0100, Vincent Guittot wrote:
> > @@ -7444,11 +7444,20 @@ int sched_cpu_activate(unsigned int cpu)
> >         balance_push_set(cpu, false);
> >
> >  #ifdef CONFIG_SCHED_SMT
> > -       /*
> > -        * When going up, increment the number of cores with SMT present.
> > -        */
> > -       if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
> > -               static_branch_inc_cpuslocked(&sched_smt_present);
> > +       do {
> > +               int weight = cpumask_weight(cpu_smt_mask(cpu));
> > +               extern int sched_smt_weight;
> 
> coding style problem
> 

Presumably you are referring to an extern defined in a C file. That can
move to kernel/sched/sched.h in this patch.

> > <SNIP>
> >  /*
> >   * Scan the LLC domain for idle CPUs; this is dynamically regulated by
> >   * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
> > @@ -6166,10 +6172,12 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> >                 avg_cost = this_sd->avg_scan_cost + 1;
> >
> >                 span_avg = sd->span_weight * avg_idle;
> > -               if (span_avg > 4*avg_cost)
> > +               if (span_avg > sis_min_cores*avg_cost)
> >                         nr = div_u64(span_avg, avg_cost);
> >                 else
> > -                       nr = 4;
> > +                       nr = sis_min_cores;
> > +
> > +               nr *= sched_smt_weight;
> 
> Also, patch 5 will look at all CPUs of a core in select_idle_core so
> nr will decrement by 1 per core, so I don't see the need to multiply by
> sched_smt_weight once patch 5 is applied
> 

It makes sense in the context of this patch but can be removed again in
the last patch and then I think sched_smt_weight only exists in core.c

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] sched/fair: Merge select_idle_core/cpu()
  2021-01-13 17:03   ` Vincent Guittot
@ 2021-01-14  9:35     ` Mel Gorman
  2021-01-14 13:25       ` Vincent Guittot
  0 siblings, 1 reply; 18+ messages in thread
From: Mel Gorman @ 2021-01-14  9:35 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: Peter Zijlstra, Ingo Molnar, Li Aubrey, Qais Yousef, LKML

On Wed, Jan 13, 2021 at 06:03:00PM +0100, Vincent Guittot wrote:
> > @@ -6159,16 +6171,29 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> >         for_each_cpu_wrap(cpu, cpus, target) {
> >                 if (!--nr)
> >                         return -1;
> > -               if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> > -                       break;
> > +               if (smt) {
> 
> If we want to stay on something similar to the previous behavior, we
> want to check on all cores if test_idle_cores is true so nr should be
> set to number of cores
> 

I don't think we necessarily want to do that. has_idle_cores is an
effective throttling mechanism but it's not perfect. If the full domain
is always scanned for a core then there can be excessive scanning in
workloads like hackbench which tends to have has_idle_cores return false
positives. It becomes important once average busy CPUs is over half of
the domain for SMT2.

At least with the patch if that change was made, we still would not scan
twice going over the same runqueues so it would still be an improvement
but it would be nice to avoid excessive deep scanning when there are a
lot of busy CPUs but individual tasks are rapidly idling.

However, in the event regressions are found, changing to your suggested
behaviour would be an obvious starting point.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] sched/fair: Merge select_idle_core/cpu()
  2021-01-14  9:35     ` Mel Gorman
@ 2021-01-14 13:25       ` Vincent Guittot
  2021-01-14 13:53         ` Mel Gorman
  0 siblings, 1 reply; 18+ messages in thread
From: Vincent Guittot @ 2021-01-14 13:25 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Peter Zijlstra, Ingo Molnar, Li Aubrey, Qais Yousef, LKML

On Thu, 14 Jan 2021 at 10:35, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> On Wed, Jan 13, 2021 at 06:03:00PM +0100, Vincent Guittot wrote:
> > > @@ -6159,16 +6171,29 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> > >         for_each_cpu_wrap(cpu, cpus, target) {
> > >                 if (!--nr)
> > >                         return -1;
> > > -               if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> > > -                       break;
> > > +               if (smt) {
> >
> > If we want to stay on something similar to the previous behavior, we
> > want to check on all cores if test_idle_cores is true so nr should be
> > set to number of cores
> >
>
> I don't think we necessarily want to do that. has_idle_cores is an
> effective throttling mechanism but it's not perfect. If the full domain
> is always scanned for a core then there can be excessive scanning in

But that's what the code is currently doing. Can this change be done
in another patch so we can check the impact of each change more
easily?
This patch 5 should focus on merging select_idle_core and
select_idle_cpu so we keep (almost) the same behavior but each CPU is
checked only once.

> workloads like hackbench which tends to have has_idle_cores return false
> positives. It becomes important once average busy CPUs is over half of
> the domain for SMT2.
>
> At least with the patch if that change was made, we still would not scan
> twice going over the same runqueues so it would still be an improvement

yeah, it's for me the main goal of this patchset with the calculation
of avg_scan_cost being done only when SIS_PROP is true and the removal
of SIS_AVG

any changes in the number of cpu/core to loop on is sensitive to
regression and should be done in a separate patch IMHO

> but it would be nice to avoid excessive deep scanning when there are a
> lot of busy CPUs but individual tasks are rapidly idling.
>
> However, in the event regressions are found, changing to your suggested
> behaviour would be an obvious starting point.
>
> --
> Mel Gorman
> SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] sched/fair: Merge select_idle_core/cpu()
  2021-01-14 13:25       ` Vincent Guittot
@ 2021-01-14 13:53         ` Mel Gorman
  2021-01-14 15:44           ` Vincent Guittot
  0 siblings, 1 reply; 18+ messages in thread
From: Mel Gorman @ 2021-01-14 13:53 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: Peter Zijlstra, Ingo Molnar, Li Aubrey, Qais Yousef, LKML

On Thu, Jan 14, 2021 at 02:25:32PM +0100, Vincent Guittot wrote:
> On Thu, 14 Jan 2021 at 10:35, Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > On Wed, Jan 13, 2021 at 06:03:00PM +0100, Vincent Guittot wrote:
> > > > @@ -6159,16 +6171,29 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> > > >         for_each_cpu_wrap(cpu, cpus, target) {
> > > >                 if (!--nr)
> > > >                         return -1;
> > > > -               if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> > > > -                       break;
> > > > +               if (smt) {
> > >
> > > If we want to stay on something similar to the previous behavior, we
> > > want to check on all cores if test_idle_cores is true so nr should be
> > > set to number of cores
> > >
> >
> > I don't think we necessarily want to do that. has_idle_cores is an
> > effective throttling mechanism but it's not perfect. If the full domain
> > is always scanned for a core then there can be excessive scanning in
> 
> But that's what the code is currently doing. Can this change be done
> in another patch so we can check the impact of each change more
> easily?

Ok, when looking at this again instead of just the mail, the flow is;

        int i, cpu, idle_cpu = -1, nr = INT_MAX;
	...
        if (sched_feat(SIS_PROP) && !smt) {
		/* recalculate nr */
	}

The !smt check should mean that core scanning is still scanning the entire
domain. There is no need to make it specific to the core account and we
are already doing the full scan. Throttling that would be a separate patch.

> This patch 5 should focus on merging select_idle_core and
> select_idle_cpu so we keep (almost) the same behavior but each CPU is
> checked only once.
> 

Which I think it's already doing. Main glitch really is that
__select_idle_cpu() shouldn't be taking *idle_cpu as it does not consume
the information.

> > workloads like hackbench which tends to have has_idle_cores return false
> > positives. It becomes important once average busy CPUs is over half of
> > the domain for SMT2.
> >
> > At least with the patch if that change was made, we still would not scan
> > twice going over the same runqueues so it would still be an improvement
> 
> yeah, it's for me the main goal of this patchset with the calculation
> of avg_scan_cost being done only when SIS_PROP is true and the removal
> of SIS_AVG
> 
> any changes in the number of cpu/core to loop on is sensitive to
> regression and should be done in a separate patch IMHO
> 

Understood.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] sched/fair: Merge select_idle_core/cpu()
  2021-01-14 13:53         ` Mel Gorman
@ 2021-01-14 15:44           ` Vincent Guittot
  2021-01-14 17:18             ` Mel Gorman
  0 siblings, 1 reply; 18+ messages in thread
From: Vincent Guittot @ 2021-01-14 15:44 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Peter Zijlstra, Ingo Molnar, Li Aubrey, Qais Yousef, LKML

On Thu, 14 Jan 2021 at 14:53, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> On Thu, Jan 14, 2021 at 02:25:32PM +0100, Vincent Guittot wrote:
> > On Thu, 14 Jan 2021 at 10:35, Mel Gorman <mgorman@techsingularity.net> wrote:
> > >
> > > On Wed, Jan 13, 2021 at 06:03:00PM +0100, Vincent Guittot wrote:
> > > > > @@ -6159,16 +6171,29 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> > > > >         for_each_cpu_wrap(cpu, cpus, target) {
> > > > >                 if (!--nr)
> > > > >                         return -1;
> > > > > -               if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> > > > > -                       break;
> > > > > +               if (smt) {
> > > >
> > > > If we want to stay on something similar to the previous behavior, we
> > > > want to check on all cores if test_idle_cores is true so nr should be
> > > > set to number of cores
> > > >
> > >
> > > I don't think we necessarily want to do that. has_idle_cores is an
> > > effective throttling mechanism but it's not perfect. If the full domain
> > > is always scanned for a core then there can be excessive scanning in
> >
> > But that's what the code is currently doing. Can this change be done
> > in another patch so we can check the impact of each change more
> > easily?
>
> Ok, when looking at this again instead of just the mail, the flow is;
>
>         int i, cpu, idle_cpu = -1, nr = INT_MAX;
>         ...
>         if (sched_feat(SIS_PROP) && !smt) {
>                 /* recalculate nr */
>         }
>
> The !smt check should mean that core scanning is still scanning the entire

Yes, good point. I missed this change.

> domain. There is no need to make it specific to the core account and we
> are already doing the full scan. Throttling that would be a separate patch.
>
> > This patch 5 should focus on merging select_idle_core and
> > select_idle_cpu so we keep (almost) the same behavior but each CPU is
> > checked only once.
> >
>
> Which I think it's already doing. Main glitch really is that
> __select_idle_cpu() shouldn't be taking *idle_cpu as it does not consume
> the information.

I don't really like the if (smt) else in the for_each_cpu_wrap(cpu,
cpus, target) loop because it just looks like we fail to merge the idle
core and idle cpu search loops at the end.

But there is probably not much we can do without changing what is
accounted as idle core search in the avg_scan_cost.


>
> > > workloads like hackbench which tends to have has_idle_cores return false
> > > positives. It becomes important once average busy CPUs is over half of
> > > the domain for SMT2.
> > >
> > > At least with the patch if that change was made, we still would not scan
> > > twice going over the same runqueues so it would still be an improvement
> >
> > yeah, it's for me the main goal of this patchset with the calculation
> > of avg_scan_cost being done only when SIS_PROP is true and the removal
> > of SIS_AVG
> >
> > any changes in the number of cpu/core to loop on is sensitive to
> > regression and should be done in a separate patch IMHO
> >
>
> Understood.
>
> --
> Mel Gorman
> SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] sched/fair: Merge select_idle_core/cpu()
  2021-01-14 15:44           ` Vincent Guittot
@ 2021-01-14 17:18             ` Mel Gorman
  0 siblings, 0 replies; 18+ messages in thread
From: Mel Gorman @ 2021-01-14 17:18 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: Peter Zijlstra, Ingo Molnar, Li Aubrey, Qais Yousef, LKML

On Thu, Jan 14, 2021 at 04:44:46PM +0100, Vincent Guittot wrote:
> > domain. There is no need to make it specific to the core account and we
> > are already doing the full scan. Throttling that would be a separate patch.
> >
> > > This patch 5 should focus on merging select_idle_core and
> > > select_idle_cpu so we keep (almost) the same behavior but each CPU is
> > > checked only once.
> > >
> >
> > Which I think it's already doing. Main glitch really is that
> > __select_idle_cpu() shouldn't be taking *idle_cpu as it does not consume
> > the information.
> 
> I don't really like the if (smt) else in the for_each_cpu_wrap(cpu,
> cpus, target) loop because it just looks like we fail to merge the idle
> core and idle cpu search loops at the end.
> 

While it's not the best, I did at one point have a series that fully
unified this function and it wasn't pretty.

> But there is probably not much we can do without changing what is
> accounted as idle core search in the avg_scan_cost.
> 

Indeed. Maybe in the future it'll make more sense to consolidate it
further but, between the depth search and possibly using SIS_PROP for
core searches, we've bigger fish to fry.

Current delta between this series and what is being tested is simply

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7bfa73de6a8d..ada8faac2e4d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7446,7 +7446,6 @@ int sched_cpu_activate(unsigned int cpu)
 #ifdef CONFIG_SCHED_SMT
 	do {
 		int weight = cpumask_weight(cpu_smt_mask(cpu));
-		extern int sched_smt_weight;
 
 		if (weight > sched_smt_weight)
 			sched_smt_weight = weight;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 84f02abb29e3..6c0f841e9e75 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6006,7 +6006,7 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
 	return new_cpu;
 }
 
-static inline int __select_idle_cpu(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
+static inline int __select_idle_cpu(struct task_struct *p, int core, struct cpumask *cpus)
 {
 	if (available_idle_cpu(core) || sched_idle_cpu(core))
 		return core;
@@ -6080,7 +6080,7 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpu
 	int cpu;
 
 	if (!static_branch_likely(&sched_smt_present))
-		return __select_idle_cpu(p, core, cpus, idle_cpu);
+		return __select_idle_cpu(p, core, cpus);
 
 	for_each_cpu(cpu, cpu_smt_mask(core)) {
 		if (!available_idle_cpu(cpu)) {
@@ -6120,7 +6120,7 @@ static inline bool test_idle_cores(int cpu, bool def)
 
 static inline int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
 {
-	return __select_idle_cpu(p, core, cpus, idle_cpu);
+	return __select_idle_cpu(p, core, cpus);
 }
 
 #endif /* CONFIG_SCHED_SMT */
@@ -6177,7 +6177,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 				return i;
 
 		} else {
-			i = __select_idle_cpu(p, cpu, cpus, &idle_cpu);
+			i = __select_idle_cpu(p, cpu, cpus);
 			if ((unsigned int)i < nr_cpumask_bits) {
 				idle_cpu = i;
 				break;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 12ada79d40f3..29aabe98dd1d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1107,6 +1107,8 @@ static inline void update_idle_core(struct rq *rq)
 		__update_idle_core(rq);
 }
 
+extern int sched_smt_weight;
+
 #else
 static inline void update_idle_core(struct rq *rq) { }
 #endif
-- 
Mel Gorman
SUSE Labs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/5] sched/fair: Make select_idle_cpu() proportional to cores
  2021-01-19 11:22 [PATCH v3 0/5] Scan for an idle sibling in a single pass Mel Gorman
@ 2021-01-19 11:22 ` Mel Gorman
  0 siblings, 0 replies; 18+ messages in thread
From: Mel Gorman @ 2021-01-19 11:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman

From: Peter Zijlstra (Intel) <peterz@infradead.org>

Instead of calculating how many (logical) CPUs to scan, compute how
many cores to scan.

This changes behaviour for anything !SMT2.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/core.c  | 18 +++++++++++++-----
 kernel/sched/fair.c  | 12 ++++++++++--
 kernel/sched/sched.h |  2 ++
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 15d2562118d1..ada8faac2e4d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7444,11 +7444,19 @@ int sched_cpu_activate(unsigned int cpu)
 	balance_push_set(cpu, false);
 
 #ifdef CONFIG_SCHED_SMT
-	/*
-	 * When going up, increment the number of cores with SMT present.
-	 */
-	if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
-		static_branch_inc_cpuslocked(&sched_smt_present);
+	do {
+		int weight = cpumask_weight(cpu_smt_mask(cpu));
+
+		if (weight > sched_smt_weight)
+			sched_smt_weight = weight;
+
+		/*
+		 * When going up, increment the number of cores with SMT present.
+		 */
+		if (weight == 2)
+			static_branch_inc_cpuslocked(&sched_smt_present);
+
+	} while (0);
 #endif
 	set_cpu_active(cpu, true);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c8d8e185cf3b..0811e2fe4f19 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6010,6 +6010,8 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
 DEFINE_STATIC_KEY_FALSE(sched_smt_present);
 EXPORT_SYMBOL_GPL(sched_smt_present);
 
+int sched_smt_weight __read_mostly = 1;
+
 static inline void set_idle_cores(int cpu, int val)
 {
 	struct sched_domain_shared *sds;
@@ -6124,6 +6126,8 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
 
 #else /* CONFIG_SCHED_SMT */
 
+#define sched_smt_weight	1
+
 static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
 {
 	return -1;
@@ -6136,6 +6140,8 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
 
 #endif /* CONFIG_SCHED_SMT */
 
+#define sis_min_cores		2
+
 /*
  * Scan the LLC domain for idle CPUs; this is dynamically regulated by
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
@@ -6166,10 +6172,12 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 		avg_cost = this_sd->avg_scan_cost + 1;
 
 		span_avg = sd->span_weight * avg_idle;
-		if (span_avg > 4*avg_cost)
+		if (span_avg > sis_min_cores*avg_cost)
 			nr = div_u64(span_avg, avg_cost);
 		else
-			nr = 4;
+			nr = sis_min_cores;
+
+		nr *= sched_smt_weight;
 
 		time = cpu_clock(this);
 	}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 12ada79d40f3..29aabe98dd1d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1107,6 +1107,8 @@ static inline void update_idle_core(struct rq *rq)
 		__update_idle_core(rq);
 }
 
+extern int sched_smt_weight;
+
 #else
 static inline void update_idle_core(struct rq *rq) { }
 #endif
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] sched/fair: Make select_idle_cpu() proportional to cores
  2021-01-18  8:14   ` Li, Aubrey
@ 2021-01-18  9:27     ` Mel Gorman
  0 siblings, 0 replies; 18+ messages in thread
From: Mel Gorman @ 2021-01-18  9:27 UTC (permalink / raw)
  To: Li, Aubrey
  Cc: Peter Zijlstra, Ingo Molnar, Vincent Guittot, Qais Yousef, LKML

On Mon, Jan 18, 2021 at 04:14:36PM +0800, Li, Aubrey wrote:
> > <SNIP>
> > @@ -6124,6 +6126,8 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
> >  
> >  #else /* CONFIG_SCHED_SMT */
> >  
> > +#define sched_smt_weight	1
> > +
> >  static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
> >  {
> >  	return -1;
> >
> > <SNIP>
> >
> > @@ -6166,10 +6172,12 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> >  		avg_cost = this_sd->avg_scan_cost + 1;
> >  
> >  		span_avg = sd->span_weight * avg_idle;
> > -		if (span_avg > 4*avg_cost)
> > +		if (span_avg > sis_min_cores*avg_cost)
> >  			nr = div_u64(span_avg, avg_cost);
> >  		else
> > -			nr = 4;
> > +			nr = sis_min_cores;
> > +
> > +		nr *= sched_smt_weight;
> 
> Is it better to put this into an inline wrapper to hide sched_smt_weight if !CONFIG_SCHED_SMT?
> 

There already is a #define sched_smt_weight for !CONFIG_SCHED_SMT and I
do not think an inline wrapper would make it more readable or maintainable.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] sched/fair: Make select_idle_cpu() proportional to cores
  2021-01-15 10:08 ` [PATCH 3/5] sched/fair: Make select_idle_cpu() proportional to cores Mel Gorman
@ 2021-01-18  8:14   ` Li, Aubrey
  2021-01-18  9:27     ` Mel Gorman
  0 siblings, 1 reply; 18+ messages in thread
From: Li, Aubrey @ 2021-01-18  8:14 UTC (permalink / raw)
  To: Mel Gorman, Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Qais Yousef, LKML

On 2021/1/15 18:08, Mel Gorman wrote:
> From: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> Instead of calculating how many (logical) CPUs to scan, compute how
> many cores to scan.
> 
> This changes behaviour for anything !SMT2.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  kernel/sched/core.c  | 18 +++++++++++++-----
>  kernel/sched/fair.c  | 12 ++++++++++--
>  kernel/sched/sched.h |  2 ++
>  3 files changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 15d2562118d1..ada8faac2e4d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7444,11 +7444,19 @@ int sched_cpu_activate(unsigned int cpu)
>  	balance_push_set(cpu, false);
>  
>  #ifdef CONFIG_SCHED_SMT
> -	/*
> -	 * When going up, increment the number of cores with SMT present.
> -	 */
> -	if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
> -		static_branch_inc_cpuslocked(&sched_smt_present);
> +	do {
> +		int weight = cpumask_weight(cpu_smt_mask(cpu));
> +
> +		if (weight > sched_smt_weight)
> +			sched_smt_weight = weight;
> +
> +		/*
> +		 * When going up, increment the number of cores with SMT present.
> +		 */
> +		if (weight == 2)
> +			static_branch_inc_cpuslocked(&sched_smt_present);
> +
> +	} while (0);
>  #endif
>  	set_cpu_active(cpu, true);
>  
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c8d8e185cf3b..0811e2fe4f19 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6010,6 +6010,8 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
>  DEFINE_STATIC_KEY_FALSE(sched_smt_present);
>  EXPORT_SYMBOL_GPL(sched_smt_present);
>  
> +int sched_smt_weight __read_mostly = 1;
> +
>  static inline void set_idle_cores(int cpu, int val)
>  {
>  	struct sched_domain_shared *sds;
> @@ -6124,6 +6126,8 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
>  
>  #else /* CONFIG_SCHED_SMT */
>  
> +#define sched_smt_weight	1
> +
>  static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
>  {
>  	return -1;
> @@ -6136,6 +6140,8 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
>  
>  #endif /* CONFIG_SCHED_SMT */
>  
> +#define sis_min_cores		2
> +
>  /*
>   * Scan the LLC domain for idle CPUs; this is dynamically regulated by
>   * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
> @@ -6166,10 +6172,12 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>  		avg_cost = this_sd->avg_scan_cost + 1;
>  
>  		span_avg = sd->span_weight * avg_idle;
> -		if (span_avg > 4*avg_cost)
> +		if (span_avg > sis_min_cores*avg_cost)
>  			nr = div_u64(span_avg, avg_cost);
>  		else
> -			nr = 4;
> +			nr = sis_min_cores;
> +
> +		nr *= sched_smt_weight;

Is it better to put this into an inline wrapper to hide sched_smt_weight if !CONFIG_SCHED_SMT?

Thanks,
-Aubrey

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 3/5] sched/fair: Make select_idle_cpu() proportional to cores
  2021-01-15 10:08 [PATCH v2 0/5] Scan for an idle sibling in a single pass Mel Gorman
@ 2021-01-15 10:08 ` Mel Gorman
  2021-01-18  8:14   ` Li, Aubrey
  0 siblings, 1 reply; 18+ messages in thread
From: Mel Gorman @ 2021-01-15 10:08 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman

From: Peter Zijlstra (Intel) <peterz@infradead.org>

Instead of calculating how many (logical) CPUs to scan, compute how
many cores to scan.

This changes behaviour for anything !SMT2.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/core.c  | 18 +++++++++++++-----
 kernel/sched/fair.c  | 12 ++++++++++--
 kernel/sched/sched.h |  2 ++
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 15d2562118d1..ada8faac2e4d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7444,11 +7444,19 @@ int sched_cpu_activate(unsigned int cpu)
 	balance_push_set(cpu, false);
 
 #ifdef CONFIG_SCHED_SMT
-	/*
-	 * When going up, increment the number of cores with SMT present.
-	 */
-	if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
-		static_branch_inc_cpuslocked(&sched_smt_present);
+	do {
+		int weight = cpumask_weight(cpu_smt_mask(cpu));
+
+		if (weight > sched_smt_weight)
+			sched_smt_weight = weight;
+
+		/*
+		 * When going up, increment the number of cores with SMT present.
+		 */
+		if (weight == 2)
+			static_branch_inc_cpuslocked(&sched_smt_present);
+
+	} while (0);
 #endif
 	set_cpu_active(cpu, true);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c8d8e185cf3b..0811e2fe4f19 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6010,6 +6010,8 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
 DEFINE_STATIC_KEY_FALSE(sched_smt_present);
 EXPORT_SYMBOL_GPL(sched_smt_present);
 
+int sched_smt_weight __read_mostly = 1;
+
 static inline void set_idle_cores(int cpu, int val)
 {
 	struct sched_domain_shared *sds;
@@ -6124,6 +6126,8 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
 
 #else /* CONFIG_SCHED_SMT */
 
+#define sched_smt_weight	1
+
 static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
 {
 	return -1;
@@ -6136,6 +6140,8 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
 
 #endif /* CONFIG_SCHED_SMT */
 
+#define sis_min_cores		2
+
 /*
  * Scan the LLC domain for idle CPUs; this is dynamically regulated by
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
@@ -6166,10 +6172,12 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 		avg_cost = this_sd->avg_scan_cost + 1;
 
 		span_avg = sd->span_weight * avg_idle;
-		if (span_avg > 4*avg_cost)
+		if (span_avg > sis_min_cores*avg_cost)
 			nr = div_u64(span_avg, avg_cost);
 		else
-			nr = 4;
+			nr = sis_min_cores;
+
+		nr *= sched_smt_weight;
 
 		time = cpu_clock(this);
 	}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 12ada79d40f3..29aabe98dd1d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1107,6 +1107,8 @@ static inline void update_idle_core(struct rq *rq)
 		__update_idle_core(rq);
 }
 
+extern int sched_smt_weight;
+
 #else
 static inline void update_idle_core(struct rq *rq) { }
 #endif
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

Thread overview: 18+ messages
2021-01-11 15:50 [PATCH 0/5] Scan for an idle sibling in a single pass Mel Gorman
2021-01-11 15:50 ` [PATCH 1/5] sched/fair: Remove SIS_AVG_CPU Mel Gorman
2021-01-11 15:50 ` [PATCH 2/5] sched/fair: Move avg_scan_cost calculations under SIS_PROP Mel Gorman
2021-01-11 15:50 ` [PATCH 3/5] sched/fair: Make select_idle_cpu() proportional to cores Mel Gorman
2021-01-13 16:49   ` Vincent Guittot
2021-01-14  9:28     ` Mel Gorman
2021-01-11 15:50 ` [PATCH 4/5] sched/fair: Remove select_idle_smt() Mel Gorman
2021-01-11 15:50 ` [PATCH 5/5] sched/fair: Merge select_idle_core/cpu() Mel Gorman
2021-01-13 17:03   ` Vincent Guittot
2021-01-14  9:35     ` Mel Gorman
2021-01-14 13:25       ` Vincent Guittot
2021-01-14 13:53         ` Mel Gorman
2021-01-14 15:44           ` Vincent Guittot
2021-01-14 17:18             ` Mel Gorman
2021-01-15 10:08 [PATCH v2 0/5] Scan for an idle sibling in a single pass Mel Gorman
2021-01-15 10:08 ` [PATCH 3/5] sched/fair: Make select_idle_cpu() proportional to cores Mel Gorman
2021-01-18  8:14   ` Li, Aubrey
2021-01-18  9:27     ` Mel Gorman
2021-01-19 11:22 [PATCH v3 0/5] Scan for an idle sibling in a single pass Mel Gorman
2021-01-19 11:22 ` [PATCH 3/5] sched/fair: Make select_idle_cpu() proportional to cores Mel Gorman
