* [PATCH v5 0/4] Scan for an idle sibling in a single pass
@ 2021-01-27 13:51 Mel Gorman
  2021-01-27 13:52 ` [PATCH 1/4] sched/fair: Remove SIS_AVG_CPU Mel Gorman
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Mel Gorman @ 2021-01-27 13:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman

Changelog since v4
o Avoid use of intermediate variable during select_idle_cpu

Changelog since v3
o Drop scanning based on cores, SMT4 results showed problems

Changelog since v2
o Remove unnecessary parameters
o Update nr during scan only when scanning for cpus

Changelog since v1
o Move extern declaration to header for coding style
o Remove unnecessary parameter from __select_idle_cpu

This series of 4 patches reposts three patches from Peter entitled
"select_idle_sibling() wreckage". It only scans the runqueues in a single
pass when searching for an idle sibling.

Three patches from Peter were dropped. The first patch altered how scan
depth was calculated. That depth selection is effectively a random number
generator with two major limitations. The avg_idle time is based on the
time between a CPU going idle and being woken up, clamped approximately
by 2*sysctl_sched_migration_cost. This is difficult to compare in a
sensible fashion to avg_scan_cost. The second issue is that only the
avg_scan_cost of scan failures is recorded and it does not decay. Fixing
this requires deeper surgery that would justify a patch on its own,
although Peter notes that
https://lkml.kernel.org/r/20180530143105.977759909@infradead.org is
potentially useful for an alternative avg_idle metric.
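
For reference, the SIS_PROP depth calculation that the dropped patch
would have altered looks roughly like this (simplified from the
select_idle_cpu() hunks later in this series; the 512 divisor and the
floor of 4 are the existing fuzz factors, not values introduced here):

	avg_idle = this_rq()->avg_idle / 512;	/* clamped idle estimate */
	avg_cost = this_sd->avg_scan_cost + 1;
	span_avg = sd->span_weight * avg_idle;
	if (span_avg > 4*avg_cost)
		nr = div_u64(span_avg, avg_cost);	/* scan at most nr CPUs */
	else
		nr = 4;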

The second dropped patch scanned based on cores instead of CPUs and
rationalised the difference between core scanning and CPU scanning.
Unfortunately, Vincent reported problems with SMT4, so it is dropped
for now until depth searching can be fixed.

The third dropped patch converted the idle core scan throttling mechanism
to SIS_PROP. While this would unify the throttling of core and CPU
scanning, it was not free of regressions, and has_idle_cores is a fairly
effective throttling mechanism with the caveat that it can have a lot of
false positives for workloads like hackbench.
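
For background, has_idle_cores is a per-LLC hint rather than an exact
count. A simplified sketch of how it is maintained, paraphrased from the
existing kernel/sched/fair.c with locking and the early-exit test elided:

	/* Called when a CPU enters idle; set the hint if its whole core is idle. */
	void __update_idle_core(struct rq *rq)
	{
		int core = cpu_of(rq);
		int cpu;

		for_each_cpu(cpu, cpu_smt_mask(core)) {
			if (cpu != core && !available_idle_cpu(cpu))
				return;			/* a sibling is still busy */
		}
		set_idle_cores(core, 1);		/* an idle core may exist */
	}

The hint is only cleared when a scan fails to find an idle core, which is
why rapidly idling workloads such as hackbench can keep it set (a false
positive) even though any idle core disappears almost immediately.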

Peter's series tried to solve three problems at once; this subset
addresses one of them.

Mel Gorman (4):
  sched/fair: Remove SIS_AVG_CPU
  sched/fair: Move avg_scan_cost calculations under SIS_PROP
  sched/fair: Remove select_idle_smt()
  sched/fair: Merge select_idle_core/cpu()

 kernel/sched/fair.c     | 151 +++++++++++++++++++---------------------
 kernel/sched/features.h |   1 -
 2 files changed, 70 insertions(+), 82 deletions(-)

-- 
2.26.2



* [PATCH 1/4] sched/fair: Remove SIS_AVG_CPU
  2021-01-27 13:51 [PATCH v5 0/4] Scan for an idle sibling in a single pass Mel Gorman
@ 2021-01-27 13:52 ` Mel Gorman
  2021-01-27 13:52 ` [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP Mel Gorman
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2021-01-27 13:52 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman

SIS_AVG_CPU was introduced as a means of avoiding a search when the
average search cost indicated that the search would likely fail. It was
a blunt instrument and disabled by commit 4c77b18cf8b7 ("sched/fair: Make
select_idle_cpu() more aggressive") and later replaced with a proportional
search depth by commit 1ad3aaf3fcd2 ("sched/core: Implement new approach
to scale select_idle_cpu()").

While there are corner cases where SIS_AVG_CPU is better, it has now been
disabled for almost three years. As the intent of SIS_PROP is to reduce
the time complexity of select_idle_cpu(), let's drop SIS_AVG_CPU and focus
on SIS_PROP as a throttling mechanism.
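
As an illustration with made-up numbers: on a 40-CPU LLC with an
avg_scan_cost of ~400ns and rq->avg_idle of 51200ns, avg_idle/512 is 100,
which is below avg_cost, so SIS_AVG_CPU would abandon the scan entirely.
SIS_PROP instead computes span_avg = 40 * 100 = 4000 and scans up to
4000/401, roughly 9 CPUs, a proportional depth rather than an
all-or-nothing cutoff.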

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c     | 20 +++++++++-----------
 kernel/sched/features.h |  1 -
 2 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04a3ce20da67..9f5682aeda2e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6145,7 +6145,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	struct sched_domain *this_sd;
-	u64 avg_cost, avg_idle;
 	u64 time;
 	int this = smp_processor_id();
 	int cpu, nr = INT_MAX;
@@ -6154,18 +6153,17 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	if (!this_sd)
 		return -1;
 
-	/*
-	 * Due to large variance we need a large fuzz factor; hackbench in
-	 * particularly is sensitive here.
-	 */
-	avg_idle = this_rq()->avg_idle / 512;
-	avg_cost = this_sd->avg_scan_cost + 1;
+	if (sched_feat(SIS_PROP)) {
+		u64 avg_cost, avg_idle, span_avg;
 
-	if (sched_feat(SIS_AVG_CPU) && avg_idle < avg_cost)
-		return -1;
+		/*
+		 * Due to large variance we need a large fuzz factor;
+		 * hackbench in particularly is sensitive here.
+		 */
+		avg_idle = this_rq()->avg_idle / 512;
+		avg_cost = this_sd->avg_scan_cost + 1;
 
-	if (sched_feat(SIS_PROP)) {
-		u64 span_avg = sd->span_weight * avg_idle;
+		span_avg = sd->span_weight * avg_idle;
 		if (span_avg > 4*avg_cost)
 			nr = div_u64(span_avg, avg_cost);
 		else
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 68d369cba9e4..e875eabb6600 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -54,7 +54,6 @@ SCHED_FEAT(TTWU_QUEUE, true)
 /*
  * When doing wakeups, attempt to limit superfluous scans of the LLC domain.
  */
-SCHED_FEAT(SIS_AVG_CPU, false)
 SCHED_FEAT(SIS_PROP, true)
 
 /*
-- 
2.26.2



* [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP
  2021-01-27 13:51 [PATCH v5 0/4] Scan for an idle sibling in a single pass Mel Gorman
  2021-01-27 13:52 ` [PATCH 1/4] sched/fair: Remove SIS_AVG_CPU Mel Gorman
@ 2021-01-27 13:52 ` Mel Gorman
  2021-01-27 13:52 ` [PATCH 3/4] sched/fair: Remove select_idle_smt() Mel Gorman
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2021-01-27 13:52 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman

As noted by Vincent Guittot, avg_scan_cost is calculated for SIS_PROP
even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP
check and, while we are at it, exclude the cost of initialising the CPU
mask from the average scan cost.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9f5682aeda2e..c8d8e185cf3b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	if (!this_sd)
 		return -1;
 
+	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+
 	if (sched_feat(SIS_PROP)) {
 		u64 avg_cost, avg_idle, span_avg;
 
@@ -6168,11 +6170,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 			nr = div_u64(span_avg, avg_cost);
 		else
 			nr = 4;
-	}
-
-	time = cpu_clock(this);
 
-	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+		time = cpu_clock(this);
+	}
 
 	for_each_cpu_wrap(cpu, cpus, target) {
 		if (!--nr)
@@ -6181,8 +6181,10 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 			break;
 	}
 
-	time = cpu_clock(this) - time;
-	update_avg(&this_sd->avg_scan_cost, time);
+	if (sched_feat(SIS_PROP)) {
+		time = cpu_clock(this) - time;
+		update_avg(&this_sd->avg_scan_cost, time);
+	}
 
 	return cpu;
 }
-- 
2.26.2



* [PATCH 3/4] sched/fair: Remove select_idle_smt()
  2021-01-27 13:51 [PATCH v5 0/4] Scan for an idle sibling in a single pass Mel Gorman
  2021-01-27 13:52 ` [PATCH 1/4] sched/fair: Remove SIS_AVG_CPU Mel Gorman
  2021-01-27 13:52 ` [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP Mel Gorman
@ 2021-01-27 13:52 ` Mel Gorman
  2021-01-27 13:52 ` [PATCH 4/4] sched/fair: Merge select_idle_core/cpu() Mel Gorman
  2021-02-01  1:13 ` [PATCH v5 0/4] Scan for an idle sibling in a single pass Li, Aubrey
  4 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2021-01-27 13:52 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman

From: Peter Zijlstra (Intel) <peterz@infradead.org>

In order to make the next patch more readable, and to quantify the
actual effectiveness of this pass, start by removing it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 30 ------------------------------
 1 file changed, 30 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c8d8e185cf3b..fe587350ea14 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6101,27 +6101,6 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 	return -1;
 }
 
-/*
- * Scan the local SMT mask for idle CPUs.
- */
-static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
-{
-	int cpu;
-
-	if (!static_branch_likely(&sched_smt_present))
-		return -1;
-
-	for_each_cpu(cpu, cpu_smt_mask(target)) {
-		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
-		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
-			continue;
-		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
-			return cpu;
-	}
-
-	return -1;
-}
-
 #else /* CONFIG_SCHED_SMT */
 
 static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
@@ -6129,11 +6108,6 @@ static inline int select_idle_core(struct task_struct *p, struct sched_domain *s
 	return -1;
 }
 
-static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
-{
-	return -1;
-}
-
 #endif /* CONFIG_SCHED_SMT */
 
 /*
@@ -6323,10 +6297,6 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 
-	i = select_idle_smt(p, sd, target);
-	if ((unsigned)i < nr_cpumask_bits)
-		return i;
-
 	return target;
 }
 
-- 
2.26.2



* [PATCH 4/4] sched/fair: Merge select_idle_core/cpu()
  2021-01-27 13:51 [PATCH v5 0/4] Scan for an idle sibling in a single pass Mel Gorman
                   ` (2 preceding siblings ...)
  2021-01-27 13:52 ` [PATCH 3/4] sched/fair: Remove select_idle_smt() Mel Gorman
@ 2021-01-27 13:52 ` Mel Gorman
  2021-02-17 13:17   ` [tip: sched/core] " tip-bot2 for Mel Gorman
  2021-02-01  1:13 ` [PATCH v5 0/4] Scan for an idle sibling in a single pass Li, Aubrey
  4 siblings, 1 reply; 10+ messages in thread
From: Mel Gorman @ 2021-01-27 13:52 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman

From: Peter Zijlstra (Intel) <peterz@infradead.org>

Both select_idle_core() and select_idle_cpu() do a loop over the same
cpumask. Observe that by clearing the already visited CPUs, we can
fold the iteration and iterate a core at a time.

All we need to do is remember any non-idle CPU we encountered while
scanning for an idle core. This way we'll only iterate every CPU once.
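
In outline, the merged scan in select_idle_cpu() becomes a single loop.
A simplified sketch of the hunk below, with SIS_PROP throttling and the
exact nr_cpumask_bits checks elided; smt here is the test_idle_cores()
result:

	for_each_cpu_wrap(cpu, cpus, target) {
		if (smt) {
			/* Check a whole core; remember any idle sibling in idle_cpu. */
			i = select_idle_core(p, cpu, cpus, &idle_cpu);
			if (i >= 0)
				return i;	/* found a fully idle core */
			/* the core's siblings were cleared from cpus */
		} else {
			idle_cpu = __select_idle_cpu(cpu);
			if (idle_cpu >= 0)
				break;
		}
	}
	return idle_cpu;	/* an idle CPU, an idle sibling seen en route, or -1 */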

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 99 +++++++++++++++++++++++++++------------------
 1 file changed, 59 insertions(+), 40 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fe587350ea14..01591e24a95f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6006,6 +6006,14 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
 	return new_cpu;
 }
 
+static inline int __select_idle_cpu(int cpu)
+{
+	if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+		return cpu;
+
+	return -1;
+}
+
 #ifdef CONFIG_SCHED_SMT
 DEFINE_STATIC_KEY_FALSE(sched_smt_present);
 EXPORT_SYMBOL_GPL(sched_smt_present);
@@ -6064,48 +6072,51 @@ void __update_idle_core(struct rq *rq)
  * there are no idle cores left in the system; tracked through
  * sd_llc->shared->has_idle_cores and enabled through update_idle_core() above.
  */
-static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
 {
-	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
-	int core, cpu;
+	bool idle = true;
+	int cpu;
 
 	if (!static_branch_likely(&sched_smt_present))
-		return -1;
-
-	if (!test_idle_cores(target, false))
-		return -1;
-
-	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+		return __select_idle_cpu(core);
 
-	for_each_cpu_wrap(core, cpus, target) {
-		bool idle = true;
-
-		for_each_cpu(cpu, cpu_smt_mask(core)) {
-			if (!available_idle_cpu(cpu)) {
-				idle = false;
-				break;
+	for_each_cpu(cpu, cpu_smt_mask(core)) {
+		if (!available_idle_cpu(cpu)) {
+			idle = false;
+			if (*idle_cpu == -1) {
+				if (sched_idle_cpu(cpu) && cpumask_test_cpu(cpu, p->cpus_ptr)) {
+					*idle_cpu = cpu;
+					break;
+				}
+				continue;
 			}
+			break;
 		}
-
-		if (idle)
-			return core;
-
-		cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
+		if (*idle_cpu == -1 && cpumask_test_cpu(cpu, p->cpus_ptr))
+			*idle_cpu = cpu;
 	}
 
-	/*
-	 * Failed to find an idle core; stop looking for one.
-	 */
-	set_idle_cores(target, 0);
+	if (idle)
+		return core;
 
+	cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
 	return -1;
 }
 
 #else /* CONFIG_SCHED_SMT */
 
-static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
+static inline void set_idle_cores(int cpu, int val)
 {
-	return -1;
+}
+
+static inline bool test_idle_cores(int cpu, bool def)
+{
+	return def;
+}
+
+static inline int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
+{
+	return __select_idle_cpu(core);
 }
 
 #endif /* CONFIG_SCHED_SMT */
@@ -6118,10 +6129,11 @@ static inline int select_idle_core(struct task_struct *p, struct sched_domain *s
 static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
+	int i, cpu, idle_cpu = -1, nr = INT_MAX;
+	bool smt = test_idle_cores(target, false);
+	int this = smp_processor_id();
 	struct sched_domain *this_sd;
 	u64 time;
-	int this = smp_processor_id();
-	int cpu, nr = INT_MAX;
 
 	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
 	if (!this_sd)
@@ -6129,7 +6141,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 
 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 
-	if (sched_feat(SIS_PROP)) {
+	if (sched_feat(SIS_PROP) && !smt) {
 		u64 avg_cost, avg_idle, span_avg;
 
 		/*
@@ -6149,18 +6161,29 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	}
 
 	for_each_cpu_wrap(cpu, cpus, target) {
-		if (!--nr)
-			return -1;
-		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
-			break;
+		if (smt) {
+			i = select_idle_core(p, cpu, cpus, &idle_cpu);
+			if ((unsigned int)i < nr_cpumask_bits)
+				return i;
+
+		} else {
+			if (!--nr)
+				return -1;
+			idle_cpu = __select_idle_cpu(cpu);
+			if ((unsigned int)idle_cpu < nr_cpumask_bits)
+				break;
+		}
 	}
 
-	if (sched_feat(SIS_PROP)) {
+	if (smt)
+		set_idle_cores(this, false);
+
+	if (sched_feat(SIS_PROP) && !smt) {
 		time = cpu_clock(this) - time;
 		update_avg(&this_sd->avg_scan_cost, time);
 	}
 
-	return cpu;
+	return idle_cpu;
 }
 
 /*
@@ -6289,10 +6312,6 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (!sd)
 		return target;
 
-	i = select_idle_core(p, sd, target);
-	if ((unsigned)i < nr_cpumask_bits)
-		return i;
-
 	i = select_idle_cpu(p, sd, target);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
-- 
2.26.2



* Re: [PATCH v5 0/4] Scan for an idle sibling in a single pass
  2021-01-27 13:51 [PATCH v5 0/4] Scan for an idle sibling in a single pass Mel Gorman
                   ` (3 preceding siblings ...)
  2021-01-27 13:52 ` [PATCH 4/4] sched/fair: Merge select_idle_core/cpu() Mel Gorman
@ 2021-02-01  1:13 ` Li, Aubrey
  2021-02-01 13:06   ` Mel Gorman
  4 siblings, 1 reply; 10+ messages in thread
From: Li, Aubrey @ 2021-02-01  1:13 UTC (permalink / raw)
  To: Mel Gorman, Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Qais Yousef, LKML

On 2021/1/27 21:51, Mel Gorman wrote:
> Changelog since v4
> o Avoid use of intermediate variable during select_idle_cpu
> 
> Changelog since v3
> o Drop scanning based on cores, SMT4 results showed problems
> 
> Changelog since v2
> o Remove unnecessary parameters
> o Update nr during scan only when scanning for cpus
> 
> Changlog since v1
> o Move extern declaration to header for coding style
> o Remove unnecessary parameter from __select_idle_cpu
> 
> This series of 4 patches reposts three patches from Peter entitled
> "select_idle_sibling() wreckage". It only scans the runqueues in a single
> pass when searching for an idle sibling.
> 
> Three patches from Peter were dropped. The first patch altered how scan
> depth was calculated. Scan depth deletion is a random number generator
> with two major limitations. The avg_idle time is based on the time
> between a CPU going idle and being woken up clamped approximately by
> 2*sysctl_sched_migration_cost.  This is difficult to compare in a sensible
> fashion to avg_scan_cost. The second issue is that only the avg_scan_cost
> of scan failures is recorded and it does not decay.  This requires deeper
> surgery that would justify a patch on its own although Peter notes that
> https://lkml.kernel.org/r/20180530143105.977759909@infradead.org is
> potentially useful for an alternative avg_idle metric.
> 
> The second patch dropped scanned based on cores instead of CPUs as it
> rationalised the difference between core scanning and CPU scanning.
> Unfortunately, Vincent reported problems with SMT4 so it's dropped
> for now until depth searching can be fixed.
> 
> The third patch dropped converted the idle core scan throttling mechanism
> to SIS_PROP. While this would unify the throttling of core and CPU
> scanning, it was not free of regressions and has_idle_cores is a fairly
> effective throttling mechanism with the caveat that it can have a lot of
> false positives for workloads like hackbench.
> 
> Peter's series tried to solve three problems at once, this subset addresses
> one problem.
> 
>  kernel/sched/fair.c     | 151 +++++++++++++++++++---------------------
>  kernel/sched/features.h |   1 -
>  2 files changed, 70 insertions(+), 82 deletions(-)
> 

Four benchmarks were measured on an x86 4-socket system with 24 cores
per socket and 2 HTs per core, 192 CPUs in total.

The load level is [25%, 50%, 75%, 100%].

- hackbench has an almost universal win.
- netperf shows notable changes at high load, as does tbench at 50% load.

Details below:

hackbench: 10 iterations, 10000 loops, 40 fds per group
======================================================

- pipe process

	group	base	%std	v5	%std
	3	1	19.18	1.0266	9.06
	6	1	9.17	0.987	13.03
	9	1	7.11	1.0195	4.61
	12	1	1.07	0.9927	1.43

- pipe thread

	group	base	%std	v5	%std
	3	1	11.14	0.9742	7.27
	6	1	9.15	0.9572	7.48
	9	1	2.95	0.986	4.05
	12	1	1.75	0.9992	1.68

- socket process

	group	base	%std	v5	%std
	3	1	2.9	0.9586	2.39
	6	1	0.68	0.9641	1.3
	9	1	0.64	0.9388	0.76
	12	1	0.56	0.9375	0.55

- socket thread

	group	base	%std	v5	%std
	3	1	3.82	0.9686	2.97
	6	1	2.06	0.9667	1.91
	9	1	0.44	0.9354	1.25
	12	1	0.54	0.9362	0.6

netperf: 10 iterations x 100 seconds, transactions rate / sec
=============================================================

- tcp request/response performance

	thread	base	%std	v4	%std
	25%	1	5.34	1.0039	5.13
	50%	1	4.97	1.0115	6.3
	75%	1	5.09	0.9257	6.75
	100%	1	4.53	0.908	4.83



- udp request/response performance

	thread	base	%std	v4	%std
	25%	1	6.18	0.9896	6.09
	50%	1	5.88	1.0198	8.92
	75%	1	24.38	0.9236	29.14
	100%	1	26.16	0.9063	22.16

tbench: 10 iterations x 100 seconds, throughput / sec
=====================================================

	thread	base	%std	v4	%std
	25%	1	0.45	1.003	1.48
	50%	1	1.71	0.9286	0.82
	75%	1	0.84	0.9928	0.94
	100%	1	0.76	0.9762	0.59

schbench: 10 iterations x 100 seconds, 99th percentile latency
==============================================================

	mthread	base	%std	v4	%std
	25%	1	2.89	0.9884	7.34
	50%	1	40.38	1.0055	38.37
	75%	1	4.76	1.0095	4.62
	100%	1	10.09	1.0083	8.03

Thanks,
-Aubrey


* Re: [PATCH v5 0/4] Scan for an idle sibling in a single pass
  2021-02-01  1:13 ` [PATCH v5 0/4] Scan for an idle sibling in a single pass Li, Aubrey
@ 2021-02-01 13:06   ` Mel Gorman
  0 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2021-02-01 13:06 UTC (permalink / raw)
  To: Li, Aubrey
  Cc: Peter Zijlstra, Ingo Molnar, Vincent Guittot, Qais Yousef, LKML

On Mon, Feb 01, 2021 at 09:13:16AM +0800, Li, Aubrey wrote:
> > Peter's series tried to solve three problems at once, this subset addresses
> > one problem.
> > 
> >  kernel/sched/fair.c     | 151 +++++++++++++++++++---------------------
> >  kernel/sched/features.h |   1 -
> >  2 files changed, 70 insertions(+), 82 deletions(-)
> > 
> 
> 4 benchmarks measured on a x86 4s system with 24 cores per socket and
> 2 HTs per core, total 192 CPUs. 
> 
> The load level is [25%, 50%, 75%, 100%].
> 
> - hackbench almost has a universal win.
> - netperf high load has notable changes, as well as tbench 50% load.
> 

Ok, both netperf and tbench results are somewhat expected as those loads
are rapidly idling. Previously I observed that rapidly idling loads can
allow the has_idle_cores test to pass for short durations and the double
scanning means there is a greater chance of finding an idle CPU over the
two passes. I think overall it's better to avoid double scanning, even if
there are counter-examples, as it's possible we'll get that back through
better depth selection in the future.
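
For context, the double scan being avoided is the old caller-side
sequence in select_idle_sibling(), roughly as follows (paraphrased from
the patch 4 diff):

	/* Before the series: two passes over the LLC mask per wakeup. */
	i = select_idle_core(p, sd, target);	/* pass 1: look for an idle core */
	if ((unsigned)i < nr_cpumask_bits)
		return i;

	i = select_idle_cpu(p, sd, target);	/* pass 2: look for any idle CPU */
	if ((unsigned)i < nr_cpumask_bits)
		return i;

	/* After patch 4, select_idle_cpu() covers both in a single pass. */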

Thanks.

-- 
Mel Gorman
SUSE Labs


* [tip: sched/core] sched/fair: Merge select_idle_core/cpu()
  2021-01-27 13:52 ` [PATCH 4/4] sched/fair: Merge select_idle_core/cpu() Mel Gorman
@ 2021-02-17 13:17   ` tip-bot2 for Mel Gorman
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot2 for Mel Gorman @ 2021-02-17 13:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Mel Gorman, Peter Zijlstra (Intel),
	Ingo Molnar, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     9fe1f127b913318c631d0041ecf71486e38c2c2d
Gitweb:        https://git.kernel.org/tip/9fe1f127b913318c631d0041ecf71486e38c2c2d
Author:        Mel Gorman <mgorman@techsingularity.net>
AuthorDate:    Wed, 27 Jan 2021 13:52:03 
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 17 Feb 2021 14:07:25 +01:00

sched/fair: Merge select_idle_core/cpu()

Both select_idle_core() and select_idle_cpu() do a loop over the same
cpumask. Observe that by clearing the already visited CPUs, we can
fold the iteration and iterate a core at a time.

All we need to do is remember any non-idle CPU we encountered while
scanning for an idle core. This way we'll only iterate every CPU once.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210127135203.19633-5-mgorman@techsingularity.net
---
 kernel/sched/fair.c |  99 +++++++++++++++++++++++++------------------
 1 file changed, 59 insertions(+), 40 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6a0fc8a..c73d588 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6019,6 +6019,14 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
 	return new_cpu;
 }
 
+static inline int __select_idle_cpu(int cpu)
+{
+	if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+		return cpu;
+
+	return -1;
+}
+
 #ifdef CONFIG_SCHED_SMT
 DEFINE_STATIC_KEY_FALSE(sched_smt_present);
 EXPORT_SYMBOL_GPL(sched_smt_present);
@@ -6077,48 +6085,51 @@ unlock:
  * there are no idle cores left in the system; tracked through
  * sd_llc->shared->has_idle_cores and enabled through update_idle_core() above.
  */
-static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
 {
-	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
-	int core, cpu;
+	bool idle = true;
+	int cpu;
 
 	if (!static_branch_likely(&sched_smt_present))
-		return -1;
-
-	if (!test_idle_cores(target, false))
-		return -1;
-
-	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+		return __select_idle_cpu(core);
 
-	for_each_cpu_wrap(core, cpus, target) {
-		bool idle = true;
-
-		for_each_cpu(cpu, cpu_smt_mask(core)) {
-			if (!available_idle_cpu(cpu)) {
-				idle = false;
-				break;
+	for_each_cpu(cpu, cpu_smt_mask(core)) {
+		if (!available_idle_cpu(cpu)) {
+			idle = false;
+			if (*idle_cpu == -1) {
+				if (sched_idle_cpu(cpu) && cpumask_test_cpu(cpu, p->cpus_ptr)) {
+					*idle_cpu = cpu;
+					break;
+				}
+				continue;
 			}
+			break;
 		}
-
-		if (idle)
-			return core;
-
-		cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
+		if (*idle_cpu == -1 && cpumask_test_cpu(cpu, p->cpus_ptr))
+			*idle_cpu = cpu;
 	}
 
-	/*
-	 * Failed to find an idle core; stop looking for one.
-	 */
-	set_idle_cores(target, 0);
+	if (idle)
+		return core;
 
+	cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
 	return -1;
 }
 
 #else /* CONFIG_SCHED_SMT */
 
-static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
+static inline void set_idle_cores(int cpu, int val)
 {
-	return -1;
+}
+
+static inline bool test_idle_cores(int cpu, bool def)
+{
+	return def;
+}
+
+static inline int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
+{
+	return __select_idle_cpu(core);
 }
 
 #endif /* CONFIG_SCHED_SMT */
@@ -6131,10 +6142,11 @@ static inline int select_idle_core(struct task_struct *p, struct sched_domain *s
 static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
+	int i, cpu, idle_cpu = -1, nr = INT_MAX;
+	bool smt = test_idle_cores(target, false);
+	int this = smp_processor_id();
 	struct sched_domain *this_sd;
 	u64 time;
-	int this = smp_processor_id();
-	int cpu, nr = INT_MAX;
 
 	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
 	if (!this_sd)
@@ -6142,7 +6154,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 
 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 
-	if (sched_feat(SIS_PROP)) {
+	if (sched_feat(SIS_PROP) && !smt) {
 		u64 avg_cost, avg_idle, span_avg;
 
 		/*
@@ -6162,18 +6174,29 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	}
 
 	for_each_cpu_wrap(cpu, cpus, target) {
-		if (!--nr)
-			return -1;
-		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
-			break;
+		if (smt) {
+			i = select_idle_core(p, cpu, cpus, &idle_cpu);
+			if ((unsigned int)i < nr_cpumask_bits)
+				return i;
+
+		} else {
+			if (!--nr)
+				return -1;
+			idle_cpu = __select_idle_cpu(cpu);
+			if ((unsigned int)idle_cpu < nr_cpumask_bits)
+				break;
+		}
 	}
 
-	if (sched_feat(SIS_PROP)) {
+	if (smt)
+		set_idle_cores(this, false);
+
+	if (sched_feat(SIS_PROP) && !smt) {
 		time = cpu_clock(this) - time;
 		update_avg(&this_sd->avg_scan_cost, time);
 	}
 
-	return cpu;
+	return idle_cpu;
 }
 
 /*
@@ -6302,10 +6325,6 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (!sd)
 		return target;
 
-	i = select_idle_core(p, sd, target);
-	if ((unsigned)i < nr_cpumask_bits)
-		return i;
-
 	i = select_idle_cpu(p, sd, target);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;


* Re: [PATCH 3/4] sched/fair: Remove select_idle_smt()
  2021-01-25  8:59 ` [PATCH 3/4] sched/fair: Remove select_idle_smt() Mel Gorman
@ 2021-01-27 10:39   ` Vincent Guittot
  0 siblings, 0 replies; 10+ messages in thread
From: Vincent Guittot @ 2021-01-27 10:39 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Peter Zijlstra, Ingo Molnar, Li Aubrey, Qais Yousef, LKML

On Mon, 25 Jan 2021 at 09:59, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> From: Peter Zijlstra <peterz@infradead.org>
>
> From: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> In order to make the next patch more readable, and to quantify the
> actual effectiveness of this pass, start by removing it.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
>  kernel/sched/fair.c | 30 ------------------------------
>  1 file changed, 30 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c8d8e185cf3b..fe587350ea14 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6101,27 +6101,6 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>         return -1;
>  }
>
> -/*
> - * Scan the local SMT mask for idle CPUs.
> - */
> -static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> -{
> -       int cpu;
> -
> -       if (!static_branch_likely(&sched_smt_present))
> -               return -1;
> -
> -       for_each_cpu(cpu, cpu_smt_mask(target)) {
> -               if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
> -                   !cpumask_test_cpu(cpu, sched_domain_span(sd)))
> -                       continue;
> -               if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> -                       return cpu;
> -       }
> -
> -       return -1;
> -}
> -
>  #else /* CONFIG_SCHED_SMT */
>
>  static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
> @@ -6129,11 +6108,6 @@ static inline int select_idle_core(struct task_struct *p, struct sched_domain *s
>         return -1;
>  }
>
> -static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> -{
> -       return -1;
> -}
> -
>  #endif /* CONFIG_SCHED_SMT */
>
>  /*
> @@ -6323,10 +6297,6 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>         if ((unsigned)i < nr_cpumask_bits)
>                 return i;
>
> -       i = select_idle_smt(p, sd, target);
> -       if ((unsigned)i < nr_cpumask_bits)
> -               return i;
> -
>         return target;
>  }
>
> --
> 2.26.2
>


* [PATCH 3/4] sched/fair: Remove select_idle_smt()
  2021-01-25  8:59 [PATCH v4 " Mel Gorman
@ 2021-01-25  8:59 ` Mel Gorman
  2021-01-27 10:39   ` Vincent Guittot
  0 siblings, 1 reply; 10+ messages in thread
From: Mel Gorman @ 2021-01-25  8:59 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman

From: Peter Zijlstra <peterz@infradead.org>

From: Peter Zijlstra (Intel) <peterz@infradead.org>

In order to make the next patch more readable, and to quantify the
actual effectiveness of this pass, start by removing it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 30 ------------------------------
 1 file changed, 30 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c8d8e185cf3b..fe587350ea14 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6101,27 +6101,6 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 	return -1;
 }
 
-/*
- * Scan the local SMT mask for idle CPUs.
- */
-static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
-{
-	int cpu;
-
-	if (!static_branch_likely(&sched_smt_present))
-		return -1;
-
-	for_each_cpu(cpu, cpu_smt_mask(target)) {
-		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
-		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
-			continue;
-		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
-			return cpu;
-	}
-
-	return -1;
-}
-
 #else /* CONFIG_SCHED_SMT */
 
 static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
@@ -6129,11 +6108,6 @@ static inline int select_idle_core(struct task_struct *p, struct sched_domain *s
 	return -1;
 }
 
-static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
-{
-	return -1;
-}
-
 #endif /* CONFIG_SCHED_SMT */
 
 /*
@@ -6323,10 +6297,6 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 
-	i = select_idle_smt(p, sd, target);
-	if ((unsigned)i < nr_cpumask_bits)
-		return i;
-
 	return target;
 }
 
-- 
2.26.2


