* [PATCH 0/6] Enable Cluster Scheduling for x86 Hybrid CPUs
@ 2023-05-04 16:09 Tim Chen
  2023-05-04 16:09 ` [PATCH 1/6] sched/topology: Propagate SMT flags when removing degenerate domain Tim Chen
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Tim Chen @ 2023-05-04 16:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tim Chen, Juri Lelli, Vincent Guittot, Ricardo Neri,
	Ravi V . Shankar, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Len Brown, Mel Gorman, Rafael J . Wysocki,
	Srinivas Pandruvada, Steven Rostedt, Valentin Schneider,
	Ionela Voinescu, x86, linux-kernel, Shrikanth Hegde,
	Srikar Dronamraju, naveen.n.rao, Yicong Yang, Barry Song

The cluster scheduling domain is not enabled on x86 hybrid CPUs, as the logic
is missing to do proper load balancing between a cluster consisting of the SMT
CPUs of a single Big core and a cluster consisting of multiple Atom CPUs.

When cluster scheduling was first introduced to x86, it was noticed
that with cluster scheduling on a hybrid CPU, a single threaded task often
ended up on an Atom core (or E-core) instead of an idle Big core (or P-core),
resulting in lower performance.  Hence cluster scheduling was disabled
on x86 hybrid CPUs.  (See: https://www.phoronix.com/review/linux-516-regress)

Ricardo recently introduced a patch series that greatly improved
the load balancing logic between P-cores and E-cores on x86 hybrid
CPUs.
https://lore.kernel.org/lkml/20230429153219.GC1495785@hirez.programming.kicks-ass.net/T/#m16ebc8de64dbf4c54adebab701b42b47805105f4

However, that patch series is not enough to allow the enabling of cluster
scheduling on hybrid x86 CPUs.  This patch series provides some additional
fixes needed for load balancing between a cluster sched group consisting of
the SMT CPUs of a Big core and a cluster sched group consisting of Atom CPUs.
With these patches applied on top of Ricardo's patch series, load is
properly balanced between the P-core and E-core clusters.  Idle CPUs
are used in the proper order:

1) SMT CPU on an idle P-core, 
2) idle E-core,
3) unused SMT CPU with a busy sibling.

On x86, CPUs in a cluster share the L2 cache.  With cluster scheduling
enabled, load is now balanced between the clusters, for potentially less
L2 cache contention.

I did some experiments on an Alder Lake with 6 P-cores and 8 E-cores,
organized in two clusters of 4 E-cores each.
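
A quick way to double-check that layout from user space is to read the
cluster topology the kernel already exports.  The snippet below is only an
illustration and not part of this series; it assumes a kernel that exposes
topology/cluster_cpus_list (v5.16 or later) and simply stops at the first
CPU directory that does not have the file.

/* Print each CPU's cluster siblings, as reported by sysfs. */
#include <stdio.h>

int main(void)
{
	char path[128], buf[256];
	FILE *f;

	for (int cpu = 0; ; cpu++) {
		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/topology/cluster_cpus_list",
			 cpu);
		f = fopen(path, "r");
		if (!f)
			break;	/* no more CPUs, or no cluster info exported */
		if (fgets(buf, sizeof(buf), f))
			printf("cpu%-2d cluster: %s", cpu, buf);
		fclose(f);
	}
	return 0;
}

On such a system one would expect the SMT siblings of each P-core to report
a two-CPU cluster (each P-core has a private L2), and each group of 4
E-cores to report the same four-CPU cluster.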

I tested some single threaded benchmarks in the Phoronix suite that previously
have shown regressions when cluster scheduling was first enabled.  Cluster
scheduling using this patch series performs as well as the vanilla kernel.

Single Threaded    6.3-rc5                with cluster           Improvement
Benchmark          (run-run deviation)    scheduling             in Performance
                                          (run-run deviation)
--------------------------------------------------------------------------------
tjbench            (+/- 0.08%)            (+/- 0.23%)            -0.23%
PhPbench           (+/- 0.31%)            (+/- 0.89%)            -0.39%
flac               (+/- 0.58%)            (+/- 0.22%)            +0.17%
pybench            (+/- 3.16%)            (+/- 0.27%)            +2.55%

For multi-threaded benchmarks, I tried kernel build and TensorFlow Lite.
Cluster scheduling did best in the 10 thread case, where 6 threads run on
the P-cores, 2 threads on one Atom cluster and 2 threads on the other Atom
cluster, whereas the vanilla kernel puts 6 threads on the P-cores and the
remaining 4 threads on one Atom cluster.  The differences are small, though,
and fall within run-to-run variations.

Multi Threaded     6.3-rc5                with cluster           Improvement
Benchmark          (run-run deviation)    scheduling             in Performance
(-#threads)                               (run-run deviation)
--------------------------------------------------------------------------------
Kbuild-8           (+/- 2.90%)            (+/- 1.16%)            -0.76%
Kbuild-10          (+/- 3.08%)            (+/- 3.09%)            +0.64%
Kbuild-12          (+/- 3.28%)            (+/- 3.55%)            +0.91%
Tensor Lite-8      (+/- 4.84%)            (+/- 4.61%)            -0.23%
Tensor Lite-10     (+/- 0.87%)            (+/- 1.45%)            +0.47%
Tensor Lite-12     (+/- 1.37%)            (+/- 1.04%)            -0.12%

Thanks for reviewing these patches.

Tim Chen

Ricardo Neri (1):
  sched/fair: Consider the idle state of the whole core for load balance

Tim C Chen (5):
  sched/topology: Propagate SMT flags when removing degenerate domain
  sched/fair: Check whether active load balance is needed in busiest
    group
  sched/fair: Fix busiest group selection for asym groups
  sched/fair: Skip prefer sibling move between SMT group and non-SMT
    group
  sched/x86: Add cluster topology to hybrid CPU

 arch/x86/kernel/smpboot.c |  3 ++
 kernel/sched/fair.c       | 78 ++++++++++++++++++++++++++++++++++++++-
 kernel/sched/topology.c   |  7 +++-
 3 files changed, 86 insertions(+), 2 deletions(-)

-- 
2.32.0



* [PATCH 1/6] sched/topology: Propagate SMT flags when removing degenerate domain
  2023-05-04 16:09 [PATCH 0/6] Enable Cluster Scheduling for x86 Hybrid CPUs Tim Chen
@ 2023-05-04 16:09 ` Tim Chen
  2023-05-04 16:09 ` [PATCH 2/6] sched/fair: Check whether active load balance is needed in busiest group Tim Chen
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Tim Chen @ 2023-05-04 16:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tim C Chen, Juri Lelli, Vincent Guittot, Ricardo Neri,
	Ravi V . Shankar, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Len Brown, Mel Gorman, Rafael J . Wysocki,
	Srinivas Pandruvada, Steven Rostedt, Valentin Schneider,
	Ionela Voinescu, x86, linux-kernel, Shrikanth Hegde,
	Srikar Dronamraju, naveen.n.rao, Yicong Yang, Barry Song,
	Ricardo Neri

From: Tim C Chen <tim.c.chen@linux.intel.com>

When a degenerate cluster domain for a core with SMT CPUs is removed,
the SD_SHARE_CPUCAPACITY flag in the local child sched group is not
propagated to the new parent.  We need this flag to properly determine
whether the local sched group is SMT.  Set the flag in the local
child sched group of the new parent sched domain.

Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/topology.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 051aaf65c749..6d5628fcebcf 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -719,8 +719,13 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 
 		if (sd_parent_degenerate(tmp, parent)) {
 			tmp->parent = parent->parent;
-			if (parent->parent)
+
+			if (parent->parent) {
 				parent->parent->child = tmp;
+				if (tmp->flags & SD_SHARE_CPUCAPACITY)
+					parent->parent->groups->flags |= SD_SHARE_CPUCAPACITY;
+			}
+
 			/*
 			 * Transfer SD_PREFER_SIBLING down in case of a
 			 * degenerate parent; the spans match for this
-- 
2.32.0



* [PATCH 2/6] sched/fair: Check whether active load balance is needed in busiest group
  2023-05-04 16:09 [PATCH 0/6] Enable Cluster Scheduling for x86 Hybrid CPUs Tim Chen
  2023-05-04 16:09 ` [PATCH 1/6] sched/topology: Propagate SMT flags when removing degenerate domain Tim Chen
@ 2023-05-04 16:09 ` Tim Chen
  2023-05-05 12:16   ` Peter Zijlstra
  2023-05-09 13:31   ` Vincent Guittot
  2023-05-04 16:09 ` [PATCH 3/6] sched/fair: Fix busiest group selection for asym groups Tim Chen
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 22+ messages in thread
From: Tim Chen @ 2023-05-04 16:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tim C Chen, Juri Lelli, Vincent Guittot, Ricardo Neri,
	Ravi V . Shankar, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Len Brown, Mel Gorman, Rafael J . Wysocki,
	Srinivas Pandruvada, Steven Rostedt, Valentin Schneider,
	Ionela Voinescu, x86, linux-kernel, Shrikanth Hegde,
	Srikar Dronamraju, naveen.n.rao, Yicong Yang, Barry Song,
	Ricardo Neri

From: Tim C Chen <tim.c.chen@linux.intel.com>

In the busiest group, we need to consider whether active load balance
to the local group is needed even when the busiest group is not overloaded.
For example, the busiest group may be an SMT group that's fully busy while
the destination group is a cluster group with an idle CPU.  Such a condition
is considered by asym_active_balance() when doing load balancing, but not
when looking for the busiest group and calculating the load imbalance.  Add
this consideration to find_busiest_group() and calculate_imbalance().

Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/fair.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 87317634fab2..bde962aa160a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9433,6 +9433,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 				sgs->group_capacity;
 }
 
+/* One group is SMT while the other group is not */
+static inline bool asymmetric_groups(struct sched_group *sg1,
+				    struct sched_group *sg2)
+{
+	if (!sg1 || !sg2)
+		return false;
+
+	return (sg1->flags & SD_SHARE_CPUCAPACITY) !=
+		(sg2->flags & SD_SHARE_CPUCAPACITY);
+}
+
 /**
  * update_sd_pick_busiest - return 1 on busiest group
  * @env: The load balancing environment.
@@ -10079,6 +10090,31 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 	update_idle_cpu_scan(env, sum_util);
 }
 
+static inline bool asym_active_balance_busiest(struct lb_env *env, struct sd_lb_stats *sds)
+{
+	/*
+	 * Don't balance to a group without spare capacity.
+	 *
+	 * Skip non asymmetric sched group balancing. That check
+	 * is handled by code path handling imbalanced load between
+	 * similar groups.
+	 */
+	if (env->idle == CPU_NOT_IDLE ||
+	    sds->local_stat.group_type != group_has_spare ||
+	    !asymmetric_groups(sds->local, sds->busiest))
+		return false;
+
+	/*
+	 * For SMT source group, pull when there are two or more
+	 * tasks over-utilizing a core.
+	 */
+	if (sds->busiest->flags & SD_SHARE_CPUCAPACITY &&
+	    sds->busiest_stat.sum_h_nr_running > 1)
+		return true;
+
+	return false;
+}
+
 /**
  * calculate_imbalance - Calculate the amount of imbalance present within the
  *			 groups of a given sched_domain during load balance.
@@ -10164,6 +10200,12 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 			return;
 		}
 
+		if (asym_active_balance_busiest(env, sds)) {
+			env->migration_type = migrate_task;
+			env->imbalance = 1;
+			return;
+		}
+
 		if (busiest->group_weight == 1 || sds->prefer_sibling) {
 			unsigned int nr_diff = busiest->sum_nr_running;
 			/*
@@ -10371,6 +10413,9 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 			 */
 			goto out_balanced;
 
+		if (asym_active_balance_busiest(env, &sds))
+			goto force_balance;
+
 		if (busiest->group_weight > 1 &&
 		    local->idle_cpus <= (busiest->idle_cpus + 1))
 			/*
-- 
2.32.0



* [PATCH 3/6] sched/fair: Fix busiest group selection for asym groups
  2023-05-04 16:09 [PATCH 0/6] Enable Cluster Scheduling for x86 Hybrid CPUs Tim Chen
  2023-05-04 16:09 ` [PATCH 1/6] sched/topology: Propagate SMT flags when removing degenerate domain Tim Chen
  2023-05-04 16:09 ` [PATCH 2/6] sched/fair: Check whether active load balance is needed in busiest group Tim Chen
@ 2023-05-04 16:09 ` Tim Chen
  2023-05-05 13:19   ` Peter Zijlstra
  2023-05-04 16:09 ` [PATCH 4/6] sched/fair: Skip prefer sibling move between SMT group and non-SMT group Tim Chen
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Tim Chen @ 2023-05-04 16:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tim C Chen, Juri Lelli, Vincent Guittot, Ricardo Neri,
	Ravi V . Shankar, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Len Brown, Mel Gorman, Rafael J . Wysocki,
	Srinivas Pandruvada, Steven Rostedt, Valentin Schneider,
	Ionela Voinescu, x86, linux-kernel, Shrikanth Hegde,
	Srikar Dronamraju, naveen.n.rao, Yicong Yang, Barry Song,
	Ricardo Neri

From: Tim C Chen <tim.c.chen@linux.intel.com>

For two groups that have spare capacity and are partially busy, the
busier group should be the group with pure (non-SMT) CPUs rather than the
group with SMT CPUs.  We do not want to make the SMT group the busiest one,
pull a task off the SMT CPU and make the whole core empty.

Otherwise, suppose that in the search for the busiest group we first
encounter an SMT group with 1 task and set it as the busiest.  The local
group is an Atom cluster with 1 task, and we then encounter an Atom
cluster group with 3 tasks; we will not pick this Atom cluster group over
the SMT group, even though we should.  As a result, we do not load balance
the busier Atom cluster (with 3 tasks) towards the local Atom cluster
(with 1 task).  And it doesn't make sense to pick the 1-task SMT group
as the busier group, as we also should not pull the task off the SMT CPU
towards the 1-task Atom cluster and make the SMT core completely empty.

Fix this case.

Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/fair.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bde962aa160a..8a325db34b02 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9548,6 +9548,18 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		break;
 
 	case group_has_spare:
+		/*
+		 * Do not pick sg with SMT CPUs over sg with pure CPUs,
+		 * as we do not want to pull task off half empty SMT core
+		 * and make the core idle.
+		 */
+		if (asymmetric_groups(sds->busiest, sg)) {
+			if (sds->busiest->flags & SD_SHARE_CPUCAPACITY)
+				return true;
+			else
+				return false;
+		}
+
 		/*
 		 * Select not overloaded group with lowest number of idle cpus
 		 * and highest number of running tasks. We could also compare
-- 
2.32.0



* [PATCH 4/6] sched/fair: Skip prefer sibling move between SMT group and non-SMT group
  2023-05-04 16:09 [PATCH 0/6] Enable Cluster Scheduling for x86 Hybrid CPUs Tim Chen
                   ` (2 preceding siblings ...)
  2023-05-04 16:09 ` [PATCH 3/6] sched/fair: Fix busiest group selection for asym groups Tim Chen
@ 2023-05-04 16:09 ` Tim Chen
  2023-05-05 13:22   ` Peter Zijlstra
  2023-05-09 13:36   ` Vincent Guittot
  2023-05-04 16:09 ` [PATCH 5/6] sched/fair: Consider the idle state of the whole core for load balance Tim Chen
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 22+ messages in thread
From: Tim Chen @ 2023-05-04 16:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tim C Chen, Juri Lelli, Vincent Guittot, Ricardo Neri,
	Ravi V . Shankar, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Len Brown, Mel Gorman, Rafael J . Wysocki,
	Srinivas Pandruvada, Steven Rostedt, Valentin Schneider,
	Ionela Voinescu, x86, linux-kernel, Shrikanth Hegde,
	Srikar Dronamraju, naveen.n.rao, Yicong Yang, Barry Song,
	Ricardo Neri

From: Tim C Chen <tim.c.chen@linux.intel.com>

Do not try to move tasks between a non-SMT sched group and an SMT sched
group for "prefer sibling" load balance.
Let asym_active_balance_busiest() handle that case properly.
Otherwise we could get tasks bouncing back and forth between
the SMT sched group and the non-SMT sched group.

Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/fair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8a325db34b02..58ef7d529731 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10411,8 +10411,12 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	/*
 	 * Try to move all excess tasks to a sibling domain of the busiest
 	 * group's child domain.
+	 *
+	 * Do not try to move between non smt sched group and smt sched
+	 * group. Let asym active balance properly handle that case.
 	 */
 	if (sds.prefer_sibling && local->group_type == group_has_spare &&
+	    !asymmetric_groups(sds.busiest, sds.local) &&
 	    busiest->sum_nr_running > local->sum_nr_running + 1)
 		goto force_balance;
 
-- 
2.32.0



* [PATCH 5/6] sched/fair: Consider the idle state of the whole core for load balance
  2023-05-04 16:09 [PATCH 0/6] Enable Cluster Scheduling for x86 Hybrid CPUs Tim Chen
                   ` (3 preceding siblings ...)
  2023-05-04 16:09 ` [PATCH 4/6] sched/fair: Skip prefer sibling move between SMT group and non-SMT group Tim Chen
@ 2023-05-04 16:09 ` Tim Chen
  2023-05-05 13:23   ` Peter Zijlstra
  2023-05-04 16:09 ` [PATCH 6/6] sched/x86: Add cluster topology to hybrid CPU Tim Chen
       [not found] ` <20230505071735.4083-1-hdanton@sina.com>
  6 siblings, 1 reply; 22+ messages in thread
From: Tim Chen @ 2023-05-04 16:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ricardo Neri, Juri Lelli, Vincent Guittot, Ricardo Neri,
	Ravi V . Shankar, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Len Brown, Mel Gorman, Rafael J . Wysocki,
	Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, Ionela Voinescu, x86, linux-kernel,
	Shrikanth Hegde, Srikar Dronamraju, naveen.n.rao, Yicong Yang,
	Barry Song

From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

should_we_balance() traverses the group_balance_mask (AND'ed with lb_env::
cpus) starting from lower numbered CPUs looking for the first idle CPU.

In hybrid x86 systems, the siblings of SMT cores get CPU numbers before
non-SMT cores:

	[0, 1] [2, 3] [4, 5] 6 7 8 9
	 b  i   b  i   b  i  b i i i

In the figure above, CPUs in brackets are siblings of an SMT core. The
rest are non-SMT cores. 'b' indicates a busy CPU, 'i' indicates an
idle CPU.

We should let a CPU on a fully idle core get the first chance to do idle
load balancing, as it has more CPU capacity than an idle SMT CPU with a
busy sibling.  So for the figure above, if we are running
should_we_balance() for CPU 1, we should return false to let CPU 7, which
is on a fully idle core, have the first chance to do idle load balancing.

A partially busy (i.e., of type group_has_spare) local group with SMT
cores will often have only one SMT sibling busy.  If the destination CPU
is a non-SMT core, partially busy, lower-numbered SMT cores should not
be considered when finding the first idle CPU.

However, in should_we_balance(), when we encounter an idle SMT CPU first
in a partially busy core, we prematurely break the search for the first
idle CPU.

Higher-numbered, non-SMT cores are not given the chance to have
idle balance done on their behalf.  Those CPUs will only be considered
for idle balancing by chance via CPU_NEWLY_IDLE.

Instead, consider the idle state of the whole SMT core.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/fair.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 58ef7d529731..c77fadba063d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10683,7 +10683,7 @@ static int active_load_balance_cpu_stop(void *data);
 static int should_we_balance(struct lb_env *env)
 {
 	struct sched_group *sg = env->sd->groups;
-	int cpu;
+	int cpu, idle_smt = -1;
 
 	/*
 	 * Ensure the balancing environment is consistent; can happen
@@ -10709,11 +10709,26 @@ static int should_we_balance(struct lb_env *env)
 	for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
 		if (!idle_cpu(cpu))
 			continue;
+		else {
+			/*
+			 * Don't balance to idle SMT in busy core right away when
+			 * balancing cores, but remember the first idle SMT CPU for
+			 * later consideration.  Find CPU on an idle core first.
+			 */
+			if (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && !is_core_idle(cpu)) {
+				if (idle_smt == -1)
+					idle_smt = cpu;
+				continue;
+			}
+		}
 
 		/* Are we the first idle CPU? */
 		return cpu == env->dst_cpu;
 	}
 
+	if (idle_smt == env->dst_cpu)
+		return true;
+
 	/* Are we the first CPU of this group ? */
 	return group_balance_cpu(sg) == env->dst_cpu;
 }
-- 
2.32.0



* [PATCH 6/6] sched/x86: Add cluster topology to hybrid CPU
  2023-05-04 16:09 [PATCH 0/6] Enable Cluster Scheduling for x86 Hybrid CPUs Tim Chen
                   ` (4 preceding siblings ...)
  2023-05-04 16:09 ` [PATCH 5/6] sched/fair: Consider the idle state of the whole core for load balance Tim Chen
@ 2023-05-04 16:09 ` Tim Chen
       [not found] ` <20230505071735.4083-1-hdanton@sina.com>
  6 siblings, 0 replies; 22+ messages in thread
From: Tim Chen @ 2023-05-04 16:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tim C Chen, Juri Lelli, Vincent Guittot, Ricardo Neri,
	Ravi V . Shankar, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Len Brown, Mel Gorman, Rafael J . Wysocki,
	Srinivas Pandruvada, Steven Rostedt, Valentin Schneider,
	Ionela Voinescu, x86, linux-kernel, Shrikanth Hegde,
	Srikar Dronamraju, naveen.n.rao, Yicong Yang, Barry Song,
	Ricardo Neri

From: Tim C Chen <tim.c.chen@linux.intel.com>

Cluster topology was not enabled on hybrid x86 CPUs, as load balancing
was not working properly for the cluster domain.  That has been fixed, and
the cluster domain can now be enabled for hybrid CPUs.

Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 arch/x86/kernel/smpboot.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index cea297d97034..2489d767c398 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -575,6 +575,9 @@ static struct sched_domain_topology_level x86_hybrid_topology[] = {
 #ifdef CONFIG_SCHED_SMT
 	{ cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
 #endif
+#ifdef CONFIG_SCHED_CLUSTER
+	{ cpu_clustergroup_mask, x86_cluster_flags, SD_INIT_NAME(CLS) },
+#endif
 #ifdef CONFIG_SCHED_MC
 	{ cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
 #endif
-- 
2.32.0



* Re: [PATCH 2/6] sched/fair: Check whether active load balance is needed in busiest group
  2023-05-04 16:09 ` [PATCH 2/6] sched/fair: Check whether active load balance is needed in busiest group Tim Chen
@ 2023-05-05 12:16   ` Peter Zijlstra
  2023-05-05 22:29     ` Tim Chen
  2023-05-09 13:31   ` Vincent Guittot
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2023-05-05 12:16 UTC (permalink / raw)
  To: Tim Chen
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V . Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J . Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Valentin Schneider, Ionela Voinescu, x86,
	linux-kernel, Shrikanth Hegde, Srikar Dronamraju, naveen.n.rao,
	Yicong Yang, Barry Song, Ricardo Neri

On Thu, May 04, 2023 at 09:09:52AM -0700, Tim Chen wrote:
> From: Tim C Chen <tim.c.chen@linux.intel.com>
> 
> In the busiest group, we need to consider whether active load balance
> to a local group is needed even when it is not overloaded.  For example,
> when the busiest group is a SMT group that's fully busy and the destination group
> is a cluster group with idle CPU.  Such condition is considered by
> asym_active_balance() in load balancing but not when looking for busiest
> group and load imbalance.  Add this consideration in find_busiest_group()
> and calculate_imbalance().
> 
> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  kernel/sched/fair.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 87317634fab2..bde962aa160a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9433,6 +9433,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  				sgs->group_capacity;
>  }
>  
> +/* One group is SMT while the other group is not */
> +static inline bool asymmetric_groups(struct sched_group *sg1,
> +				    struct sched_group *sg2)
> +{
> +	if (!sg1 || !sg2)
> +		return false;
> +
> +	return (sg1->flags & SD_SHARE_CPUCAPACITY) !=
> +		(sg2->flags & SD_SHARE_CPUCAPACITY);
> +}
> +
>  /**
>   * update_sd_pick_busiest - return 1 on busiest group
>   * @env: The load balancing environment.
> @@ -10079,6 +10090,31 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>  	update_idle_cpu_scan(env, sum_util);
>  }
>  
> +static inline bool asym_active_balance_busiest(struct lb_env *env, struct sd_lb_stats *sds)
> +{
> +	/*
> +	 * Don't balance to a group without spare capacity.
> +	 *
> +	 * Skip non asymmetric sched group balancing. That check
> +	 * is handled by code path handling imbalanced load between
> +	 * similar groups.
> +	 */
> +	if (env->idle == CPU_NOT_IDLE ||
> +	    sds->local_stat.group_type != group_has_spare ||
> +	    !asymmetric_groups(sds->local, sds->busiest))
> +		return false;
> +
> +	/*
> +	 * For SMT source group, pull when there are two or more
> +	 * tasks over-utilizing a core.
> +	 */
> +	if (sds->busiest->flags & SD_SHARE_CPUCAPACITY &&
> +	    sds->busiest_stat.sum_h_nr_running > 1)
> +		return true;
> +
> +	return false;
> +}

This all seems to be mixing two 'asymmetric' things in the 'asym'
namespace :/ One being the SD_ASYM_PACKING and then the above SMT/no-SMT
core thing.

> +
>  /**
>   * calculate_imbalance - Calculate the amount of imbalance present within the
>   *			 groups of a given sched_domain during load balance.
> @@ -10164,6 +10200,12 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>  			return;
>  		}
>  
> +		if (asym_active_balance_busiest(env, sds)) {
> +			env->migration_type = migrate_task;
> +			env->imbalance = 1;
> +			return;
> +		}
> +
>  		if (busiest->group_weight == 1 || sds->prefer_sibling) {
>  			unsigned int nr_diff = busiest->sum_nr_running;
>  			/*
> @@ -10371,6 +10413,9 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>  			 */
>  			goto out_balanced;
>  
> +		if (asym_active_balance_busiest(env, &sds))
> +			goto force_balance;
> +
>  		if (busiest->group_weight > 1 &&
>  		    local->idle_cpus <= (busiest->idle_cpus + 1))
>  			/*

All the cases here have a nice (CodingStyle busting) comment, perhaps
add the missing {} when you add the comment?


* Re: [PATCH 3/6] sched/fair: Fix busiest group selection for asym groups
  2023-05-04 16:09 ` [PATCH 3/6] sched/fair: Fix busiest group selection for asym groups Tim Chen
@ 2023-05-05 13:19   ` Peter Zijlstra
  2023-05-05 22:36     ` Tim Chen
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2023-05-05 13:19 UTC (permalink / raw)
  To: Tim Chen
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V . Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J . Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Valentin Schneider, Ionela Voinescu, x86,
	linux-kernel, Shrikanth Hegde, Srikar Dronamraju, naveen.n.rao,
	Yicong Yang, Barry Song, Ricardo Neri

On Thu, May 04, 2023 at 09:09:53AM -0700, Tim Chen wrote:
> From: Tim C Chen <tim.c.chen@linux.intel.com>
> 
> For two groups that have spare capacity and are partially busy, the
> busier group should be the group with pure CPUs rather than the group
> with SMT CPUs.  We do not want to make the SMT group the busiest one to
> pull task off the SMT and makes the whole core empty.
> 
> Otherwise suppose in the search for busiest group,
> we first encounter an SMT group with 1 task and set it as the busiest.
> The local group is an atom cluster with 1 task and we then encounter an atom
> cluster group with 3 tasks, we will not pick this atom cluster group over the
> SMT group, even though we should.  As a result, we do not load balance
> the busier Atom cluster (with 3 tasks) towards the local Atom cluster
> (with 1 task).  And it doesn't make sense to pick the 1 task SMT group
> as the busier group as we also should not pull task off the SMT towards
> the 1 task atom cluster and make the SMT core completely empty.
> 
> Fix this case.
> 
> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  kernel/sched/fair.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bde962aa160a..8a325db34b02 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9548,6 +9548,18 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>  		break;
>  
>  	case group_has_spare:
> +		/*
> +		 * Do not pick sg with SMT CPUs over sg with pure CPUs,
> +		 * as we do not want to pull task off half empty SMT core
> +		 * and make the core idle.
> +		 */

Comment says what the code does; not why.

> +		if (asymmetric_groups(sds->busiest, sg)) {
> +			if (sds->busiest->flags & SD_SHARE_CPUCAPACITY)
> +				return true;
> +			else
> +				return false;

			return (sds->busiest->flags & SD_SHARE_CPUCAPACITY)
> +		}

Also, should this not be part of the previous patch?


* Re: [PATCH 4/6] sched/fair: Skip prefer sibling move between SMT group and non-SMT group
  2023-05-04 16:09 ` [PATCH 4/6] sched/fair: Skip prefer sibling move between SMT group and non-SMT group Tim Chen
@ 2023-05-05 13:22   ` Peter Zijlstra
  2023-05-05 23:07     ` Tim Chen
  2023-05-09 13:36   ` Vincent Guittot
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2023-05-05 13:22 UTC (permalink / raw)
  To: Tim Chen
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V . Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J . Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Valentin Schneider, Ionela Voinescu, x86,
	linux-kernel, Shrikanth Hegde, Srikar Dronamraju, naveen.n.rao,
	Yicong Yang, Barry Song, Ricardo Neri

On Thu, May 04, 2023 at 09:09:54AM -0700, Tim Chen wrote:
> From: Tim C Chen <tim.c.chen@linux.intel.com>
> 
> Do not try to move tasks between non SMT sched group and SMT sched
> group for "prefer sibling" load balance.
> Let asym_active_balance_busiest() handle that case properly.
> Otherwise we could get task bouncing back and forth between
> the SMT sched group and non SMT sched group.
> 
> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  kernel/sched/fair.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a325db34b02..58ef7d529731 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10411,8 +10411,12 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>  	/*
>  	 * Try to move all excess tasks to a sibling domain of the busiest
>  	 * group's child domain.
> +	 *
> +	 * Do not try to move between non smt sched group and smt sched
> +	 * group. Let asym active balance properly handle that case.
>  	 */
>  	if (sds.prefer_sibling && local->group_type == group_has_spare &&
> +	    !asymmetric_groups(sds.busiest, sds.local) &&
>  	    busiest->sum_nr_running > local->sum_nr_running + 1)
>  		goto force_balance;

This seems to have the hidden assumption that a !SMT core is somehow
'less' than an SMT core. Should this not also look at
sched_asym_prefer() to establish this is so?

I mean, imagine I have a regular system and just offline one smt sibling
for giggles.


* Re: [PATCH 5/6] sched/fair: Consider the idle state of the whole core for load balance
  2023-05-04 16:09 ` [PATCH 5/6] sched/fair: Consider the idle state of the whole core for load balance Tim Chen
@ 2023-05-05 13:23   ` Peter Zijlstra
  2023-05-05 22:51     ` Tim Chen
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2023-05-05 13:23 UTC (permalink / raw)
  To: Tim Chen
  Cc: Ricardo Neri, Juri Lelli, Vincent Guittot, Ricardo Neri,
	Ravi V . Shankar, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Len Brown, Mel Gorman, Rafael J . Wysocki,
	Srinivas Pandruvada, Steven Rostedt, Valentin Schneider,
	Ionela Voinescu, x86, linux-kernel, Shrikanth Hegde,
	Srikar Dronamraju, naveen.n.rao, Yicong Yang, Barry Song

On Thu, May 04, 2023 at 09:09:55AM -0700, Tim Chen wrote:

> @@ -10709,11 +10709,26 @@ static int should_we_balance(struct lb_env *env)
>  	for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
>  		if (!idle_cpu(cpu))
>  			continue;
> +		else {
> +			/*
> +			 * Don't balance to idle SMT in busy core right away when
> +			 * balancing cores, but remember the first idle SMT CPU for
> +			 * later consideration.  Find CPU on an idle core first.
> +			 */
> +			if (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && !is_core_idle(cpu)) {
> +				if (idle_smt == -1)
> +					idle_smt = cpu;
> +				continue;
> +			}
> +		}

Not only does that bust CodingStyle, it's also entirely daft. What
exactly is the purpose of that else statement?

>  
>  		/* Are we the first idle CPU? */
>  		return cpu == env->dst_cpu;
>  	}


* Re: [PATCH 2/6] sched/fair: Check whether active load balance is needed in busiest group
  2023-05-05 12:16   ` Peter Zijlstra
@ 2023-05-05 22:29     ` Tim Chen
  2023-05-05 23:44       ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Tim Chen @ 2023-05-05 22:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V . Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J . Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Valentin Schneider, Ionela Voinescu, x86,
	linux-kernel, Shrikanth Hegde, Srikar Dronamraju, naveen.n.rao,
	Yicong Yang, Barry Song, Ricardo Neri

On Fri, 2023-05-05 at 14:16 +0200, Peter Zijlstra wrote:
> 
> > +static inline bool asym_active_balance_busiest(struct lb_env *env, struct sd_lb_stats *sds)
> > +{
> > +	/*
> > +	 * Don't balance to a group without spare capacity.
> > +	 *
> > +	 * Skip non asymmetric sched group balancing. That check
> > +	 * is handled by code path handling imbalanced load between
> > +	 * similar groups.
> > +	 */
> > +	if (env->idle == CPU_NOT_IDLE ||
> > +	    sds->local_stat.group_type != group_has_spare ||
> > +	    !asymmetric_groups(sds->local, sds->busiest))
> > +		return false;
> > +
> > +	/*
> > +	 * For SMT source group, pull when there are two or more
> > +	 * tasks over-utilizing a core.
> > +	 */
> > +	if (sds->busiest->flags & SD_SHARE_CPUCAPACITY &&
> > +	    sds->busiest_stat.sum_h_nr_running > 1)
> > +		return true;
> > +
> > +	return false;
> > +}
> 
> This all seems to be mixing two 'asymmetric' things in the 'asym'
> namespace :/ One being the SD_ASYM_PACKING and then the above SMT/no-SMT
> core thing.

Yeah, I am kind of abusing the "asymmetric" word.  However, the above
code does try to set things up for the asym_active_balance() code
later. Any suggestion on better names for "asymmetric_groups()"
and "asym_active_balance_busiest()"?

Perhaps "hybrid_groups()" and "hybrid_active_balance_busiest()"?

> 
> > +
> >  /**
> >   * calculate_imbalance - Calculate the amount of imbalance present within the
> >   *			 groups of a given sched_domain during load balance.
> > @@ -10164,6 +10200,12 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
> >  			return;
> >  		}
> >  
> > +		if (asym_active_balance_busiest(env, sds)) {
> > +			env->migration_type = migrate_task;
> > +			env->imbalance = 1;
> > +			return;
> > +		}
> > +
> >  		if (busiest->group_weight == 1 || sds->prefer_sibling) {
> >  			unsigned int nr_diff = busiest->sum_nr_running;
> >  			/*
> > @@ -10371,6 +10413,9 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> >  			 */
> >  			goto out_balanced;
> >  
> > +		if (asym_active_balance_busiest(env, &sds))
> > +			goto force_balance;
> > +
> >  		if (busiest->group_weight > 1 &&
> >  		    local->idle_cpus <= (busiest->idle_cpus + 1))
> >  			/*
> 
> All the cases here have a nice (CodingStyle busting) comment, perhaps
> add the missing {} when hou add the comment?

Sure, will add a comment here.

Tim



* Re: [PATCH 3/6] sched/fair: Fix busiest group selection for asym groups
  2023-05-05 13:19   ` Peter Zijlstra
@ 2023-05-05 22:36     ` Tim Chen
  0 siblings, 0 replies; 22+ messages in thread
From: Tim Chen @ 2023-05-05 22:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V . Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J . Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Valentin Schneider, Ionela Voinescu, x86,
	linux-kernel, Shrikanth Hegde, Srikar Dronamraju, naveen.n.rao,
	Yicong Yang, Barry Song, Ricardo Neri

On Fri, 2023-05-05 at 15:19 +0200, Peter Zijlstra wrote:
> 
> >  
> >  	case group_has_spare:
> > +		/*
> > +		 * Do not pick sg with SMT CPUs over sg with pure CPUs,
> > +		 * as we do not want to pull task off half empty SMT core
> > +		 * and make the core idle.
> > +		 */
> 
> Comment says what the code does; not why.

Good point, will make the comment better.

> 
> > +		if (asymmetric_groups(sds->busiest, sg)) {
> > +			if (sds->busiest->flags & SD_SHARE_CPUCAPACITY)
> > +				return true;
> > +			else
> > +				return false;
> 
> 			return (sds->busiest->flags & SD_SHARE_CPUCAPACITY)
> > +		}
> 
> Also, should this not be part of the previous patch?

Sure, I can merge it with the previous patch.

Tim



* Re: [PATCH 5/6] sched/fair: Consider the idle state of the whole core for load balance
       [not found] ` <20230505071735.4083-1-hdanton@sina.com>
@ 2023-05-05 22:49   ` Tim Chen
  0 siblings, 0 replies; 22+ messages in thread
From: Tim Chen @ 2023-05-05 22:49 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Peter Zijlstra, Ricardo Neri, Mel Gorman, linux-kernel

On Fri, 2023-05-05 at 15:17 +0800, Hillf Danton wrote:
> On 4 May 2023 09:09:55 -0700 Tim Chen <tim.c.chen@linux.intel.com>
> > From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > 
> > should_we_balance() traverses the group_balance_mask (AND'ed with lb_env::
> > cpus) starting from lower numbered CPUs looking for the first idle CPU.
> > 
> > In hybrid x86 systems, the siblings of SMT cores get CPU numbers, before
> > non-SMT cores:
> > 
> > 	[0, 1] [2, 3] [4, 5] 5 6 7 8
> >        b  i   b  i   b  i  b i i i
> > 
> > In the figure above, CPUs in brackets are siblings of an SMT core. The
> > rest are non-SMT cores. 'b' indicates a busy CPU, 'i' indicates an
> > idle CPU.
> 
> Better if l2-cache affinity is added in the diagram above. And better
> again if the diagram is available upon introducing hybrid x86 in the
> cover letter.

Good suggestion.  Will update the diagram and add it to cover letter.

Tim


* Re: [PATCH 5/6] sched/fair: Consider the idle state of the whole core for load balance
  2023-05-05 13:23   ` Peter Zijlstra
@ 2023-05-05 22:51     ` Tim Chen
  0 siblings, 0 replies; 22+ messages in thread
From: Tim Chen @ 2023-05-05 22:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ricardo Neri, Juri Lelli, Vincent Guittot, Ricardo Neri,
	Ravi V . Shankar, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Len Brown, Mel Gorman, Rafael J . Wysocki,
	Srinivas Pandruvada, Steven Rostedt, Valentin Schneider,
	Ionela Voinescu, x86, linux-kernel, Shrikanth Hegde,
	Srikar Dronamraju, naveen.n.rao, Yicong Yang, Barry Song

On Fri, 2023-05-05 at 15:23 +0200, Peter Zijlstra wrote:
> On Thu, May 04, 2023 at 09:09:55AM -0700, Tim Chen wrote:
> 
> > @@ -10709,11 +10709,26 @@ static int should_we_balance(struct lb_env *env)
> >  	for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
> >  		if (!idle_cpu(cpu))
> >  			continue;
> > +		else {
> > +			/*
> > +			 * Don't balance to idle SMT in busy core right away when
> > +			 * balancing cores, but remember the first idle SMT CPU for
> > +			 * later consideration.  Find CPU on an idle core first.
> > +			 */
> > +			if (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && !is_core_idle(cpu)) {
> > +				if (idle_smt == -1)
> > +					idle_smt = cpu;
> > +				continue;
> > +			}
> > +		}
> 
> Not only does that bust CodingStyle, it's also entirely daft. What
> exactly is the purpose of that else statement?
> 
> 

Yeah, that's a dumb "else" statement.  Will remove that.

Tim
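
For reference, with the else dropped, the loop would read roughly as below
(a sketch of the cleanup being agreed on here, not a posted revision):

	for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
		if (!idle_cpu(cpu))
			continue;

		/*
		 * Don't balance to an idle SMT CPU in a busy core right away
		 * when balancing cores; remember the first such CPU and keep
		 * looking for a CPU on a fully idle core.
		 */
		if (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && !is_core_idle(cpu)) {
			if (idle_smt == -1)
				idle_smt = cpu;
			continue;
		}

		/* Are we the first idle CPU? */
		return cpu == env->dst_cpu;
	}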


* Re: [PATCH 4/6] sched/fair: Skip prefer sibling move between SMT group and non-SMT group
  2023-05-05 13:22   ` Peter Zijlstra
@ 2023-05-05 23:07     ` Tim Chen
  2023-05-05 23:38       ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Tim Chen @ 2023-05-05 23:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V . Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J . Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Valentin Schneider, Ionela Voinescu, x86,
	linux-kernel, Shrikanth Hegde, Srikar Dronamraju, naveen.n.rao,
	Yicong Yang, Barry Song, Ricardo Neri

On Fri, 2023-05-05 at 15:22 +0200, Peter Zijlstra wrote:
> On Thu, May 04, 2023 at 09:09:54AM -0700, Tim Chen wrote:
> > From: Tim C Chen <tim.c.chen@linux.intel.com>
> > 
> > Do not try to move tasks between non SMT sched group and SMT sched
> > group for "prefer sibling" load balance.
> > Let asym_active_balance_busiest() handle that case properly.
> > Otherwise we could get task bouncing back and forth between
> > the SMT sched group and non SMT sched group.
> > 
> > Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > ---
> >  kernel/sched/fair.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 8a325db34b02..58ef7d529731 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10411,8 +10411,12 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> >  	/*
> >  	 * Try to move all excess tasks to a sibling domain of the busiest
> >  	 * group's child domain.
> > +	 *
> > +	 * Do not try to move between non smt sched group and smt sched
> > +	 * group. Let asym active balance properly handle that case.
> >  	 */
> >  	if (sds.prefer_sibling && local->group_type == group_has_spare &&
> > +	    !asymmetric_groups(sds.busiest, sds.local) &&
> >  	    busiest->sum_nr_running > local->sum_nr_running + 1)
> >  		goto force_balance;
> 
> This seems to have the hidden assumption that a !SMT core is somehow
> 'less' that an SMT code. Should this not also look at
> sched_asym_prefer() to establush this is so?
> 
> I mean, imagine I have a regular system and just offline one smt sibling
> for giggles.

I don't quite follow your point, as asymmetric_groups() returns false even
when one SMT sibling is offlined.

Even if, say, sds.busiest has 1 SMT CPU and sds.local has 2 SMT CPUs, both
sched groups still have the SD_SHARE_CPUCAPACITY flag turned on.  So
asymmetric_groups() returns false and the load balancing logic is not
changed for a regular non-hybrid system.

I may be missing something.

Tim


* Re: [PATCH 4/6] sched/fair: Skip prefer sibling move between SMT group and non-SMT group
  2023-05-05 23:07     ` Tim Chen
@ 2023-05-05 23:38       ` Peter Zijlstra
  2023-05-06  0:08         ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2023-05-05 23:38 UTC (permalink / raw)
  To: Tim Chen
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V . Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J . Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Valentin Schneider, Ionela Voinescu, x86,
	linux-kernel, Shrikanth Hegde, Srikar Dronamraju, naveen.n.rao,
	Yicong Yang, Barry Song, Ricardo Neri

On Fri, May 05, 2023 at 04:07:39PM -0700, Tim Chen wrote:
> On Fri, 2023-05-05 at 15:22 +0200, Peter Zijlstra wrote:
> > On Thu, May 04, 2023 at 09:09:54AM -0700, Tim Chen wrote:
> > > From: Tim C Chen <tim.c.chen@linux.intel.com>
> > > 
> > > Do not try to move tasks between non SMT sched group and SMT sched
> > > group for "prefer sibling" load balance.
> > > Let asym_active_balance_busiest() handle that case properly.
> > > Otherwise we could get task bouncing back and forth between
> > > the SMT sched group and non SMT sched group.
> > > 
> > > Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > > ---
> > >  kernel/sched/fair.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 8a325db34b02..58ef7d529731 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -10411,8 +10411,12 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> > >  	/*
> > >  	 * Try to move all excess tasks to a sibling domain of the busiest
> > >  	 * group's child domain.
> > > +	 *
> > > +	 * Do not try to move between non smt sched group and smt sched
> > > +	 * group. Let asym active balance properly handle that case.
> > >  	 */
> > >  	if (sds.prefer_sibling && local->group_type == group_has_spare &&
> > > +	    !asymmetric_groups(sds.busiest, sds.local) &&
> > >  	    busiest->sum_nr_running > local->sum_nr_running + 1)
> > >  		goto force_balance;
> > 
> > This seems to have the hidden assumption that a !SMT core is somehow
> > 'less' that an SMT code. Should this not also look at
> > sched_asym_prefer() to establush this is so?
> > 
> > I mean, imagine I have a regular system and just offline one smt sibling
> > for giggles.
> 
> I don't quite follow your point as asymmetric_groups() returns false even
> one smt sibling is offlined.
> 
> Even say sds.busiest has 1 SMT and sds.local has 2 SMT, both sched groups still
> have SD_SHARE_CPUCAPACITY flag turned on.  So asymmetric_groups() return
> false and the load balancing logic is not changed for regular non-hybrid system.
> 
> I may be missing something.

What's the difference between the two cases? That is, if the remaining
sibling will have SD_SHARE_CPUCAPACITY from the degenerate SMT domain
that's been reaped, then why doesn't the same thing apply to the atoms
in the hybrid muck?

Those two cases *should* be identical; in both cases you have cores with
and cores without SMT.


* Re: [PATCH 2/6] sched/fair: Check whether active load balance is needed in busiest group
  2023-05-05 22:29     ` Tim Chen
@ 2023-05-05 23:44       ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2023-05-05 23:44 UTC (permalink / raw)
  To: Tim Chen
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V . Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J . Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Valentin Schneider, Ionela Voinescu, x86,
	linux-kernel, Shrikanth Hegde, Srikar Dronamraju, naveen.n.rao,
	Yicong Yang, Barry Song, Ricardo Neri

On Fri, May 05, 2023 at 03:29:45PM -0700, Tim Chen wrote:
> On Fri, 2023-05-05 at 14:16 +0200, Peter Zijlstra wrote:
> > 
> > > +static inline bool asym_active_balance_busiest(struct lb_env *env, struct sd_lb_stats *sds)
> > > +{
> > > +	/*
> > > +	 * Don't balance to a group without spare capacity.
> > > +	 *
> > > +	 * Skip non asymmetric sched group balancing. That check
> > > +	 * is handled by code path handling imbalanced load between
> > > +	 * similar groups.
> > > +	 */
> > > +	if (env->idle == CPU_NOT_IDLE ||
> > > +	    sds->local_stat.group_type != group_has_spare ||
> > > +	    !asymmetric_groups(sds->local, sds->busiest))
> > > +		return false;
> > > +
> > > +	/*
> > > +	 * For SMT source group, pull when there are two or more
> > > +	 * tasks over-utilizing a core.
> > > +	 */
> > > +	if (sds->busiest->flags & SD_SHARE_CPUCAPACITY &&
> > > +	    sds->busiest_stat.sum_h_nr_running > 1)
> > > +		return true;
> > > +
> > > +	return false;
> > > +}
> > 
> > This all seems to be mixing two 'asymmetric' things in the 'asym'
> > namespace :/ One being the SD_ASYM_PACKING and then the above SMT/no-SMT
> > core thing.
> 
> Yeah, I am kind of abusing the "asymmetric" word.  However, the above
> code does try to set things up for the aysm_active_balance() code
> later. Any suggestion on better names for "asymmetric_groups()" and
> and "asym_active_balance_busiest()"? 
> 
> Perhaps "hybrid_groups()" and "hybrid_active_balance_busiest()"?

As per the other subthread; I really don't think this whole SMT vs
non-SMT thing should in any way, shape or form be related to hybrid.
Offlining siblings should really get you the same topology.

(and if that currently is not the case, that should be fixed)

(and also; we should probably add group_flags to
 /debug/sched/domains/cpuN/domainM/ so we can actually see what's what,
 because this seems to be a bit of a blind spot).

That also suggests the hybrid naming suggestion is not a very good one.

And I'll blame it being nearly 2am for not being able to come up with a
good suggestion, but I'm thinking it ought to have SMT in the name
somehow.


* Re: [PATCH 4/6] sched/fair: Skip prefer sibling move between SMT group and non-SMT group
  2023-05-05 23:38       ` Peter Zijlstra
@ 2023-05-06  0:08         ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2023-05-06  0:08 UTC (permalink / raw)
  To: Tim Chen
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V . Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J . Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Valentin Schneider, Ionela Voinescu, x86,
	linux-kernel, Shrikanth Hegde, Srikar Dronamraju, naveen.n.rao,
	Yicong Yang, Barry Song, Ricardo Neri

On Sat, May 06, 2023 at 01:38:36AM +0200, Peter Zijlstra wrote:
> On Fri, May 05, 2023 at 04:07:39PM -0700, Tim Chen wrote:
> > On Fri, 2023-05-05 at 15:22 +0200, Peter Zijlstra wrote:
> > > On Thu, May 04, 2023 at 09:09:54AM -0700, Tim Chen wrote:
> > > > From: Tim C Chen <tim.c.chen@linux.intel.com>
> > > > 
> > > > Do not try to move tasks between non SMT sched group and SMT sched
> > > > group for "prefer sibling" load balance.
> > > > Let asym_active_balance_busiest() handle that case properly.
> > > > Otherwise we could get task bouncing back and forth between
> > > > the SMT sched group and non SMT sched group.
> > > > 
> > > > Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > > > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > > > ---
> > > >  kernel/sched/fair.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > > 
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index 8a325db34b02..58ef7d529731 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -10411,8 +10411,12 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> > > >  	/*
> > > >  	 * Try to move all excess tasks to a sibling domain of the busiest
> > > >  	 * group's child domain.
> > > > +	 *
> > > > +	 * Do not try to move between non smt sched group and smt sched
> > > > +	 * group. Let asym active balance properly handle that case.
> > > >  	 */
> > > >  	if (sds.prefer_sibling && local->group_type == group_has_spare &&
> > > > +	    !asymmetric_groups(sds.busiest, sds.local) &&
> > > >  	    busiest->sum_nr_running > local->sum_nr_running + 1)
> > > >  		goto force_balance;
> > > 
> > > This seems to have the hidden assumption that a !SMT core is somehow
> > > 'less' that an SMT code. Should this not also look at
> > > sched_asym_prefer() to establush this is so?
> > > 
> > > I mean, imagine I have a regular system and just offline one smt sibling
> > > for giggles.
> > 
> > I don't quite follow your point as asymmetric_groups() returns false even
> > one smt sibling is offlined.
> > 
> > Even say sds.busiest has 1 SMT and sds.local has 2 SMT, both sched groups still
> > have SD_SHARE_CPUCAPACITY flag turned on.  So asymmetric_groups() return
> > false and the load balancing logic is not changed for regular non-hybrid system.
> > 
> > I may be missing something.
> 
> What's the difference between the two cases? That is, if the remaining
> sibling will have SD_SHARE_CPUCAPACIY from the degenerate SMT domain
> that's been reaped, then why doesn't the same thing apply to the atoms
> in the hybrid muck?
> 
> Those two cases *should* be identical, both cases you have cores with
> and cores without SMT.

On my alderlake:

[  202.222019] CPU0 attaching sched-domain(s):
[  202.222509]  domain-0: span=0-1 level=SMT
[  202.222707]   groups: 0:{ span=0 }, 1:{ span=1 }
[  202.222945]   domain-1: span=0-23 level=MC
[  202.223148]    groups: 0:{ span=0-1 cap=2048 }, 2:{ span=2-3 cap=2048 }, 4:{ span=4-5 cap=2048 }, 6:{ span=6-7 cap=2048 }, 8:{ span=8-9 cap=2048 }, 10:{ span=10-11 cap=2048 },12:{ span=12-13 cap=2048 }, 14:{ span=14-15 cap=2048 }, 16:{ span=16 }, 17:{ span=17 }, 18:{ span=18 }, 19:{ span=19 }, 20:{ span=20 }, 21:{ span=21 }, 22:{ span=22 }, 23:{ span=23 }
...
[  202.249979] CPU23 attaching sched-domain(s):
[  202.250127]  domain-0: span=0-23 level=MC
[  202.250198]   groups: 23:{ span=23 }, 0:{ span=0-1 cap=2048 }, 2:{ span=2-3 cap=2048 }, 4:{ span=4-5 cap=2048 }, 6:{ span=6-7 cap=2048 }, 8:{ span=8-9 cap=2048 }, 10:{ span=10-11 cap=2048 }, 12:{ span=12-13 cap=2048 }, 14:{ span=14-15 cap=2048 }, 16:{ span=16 }, 17:{ span=17 }, 18:{ span=18 }, 19:{ span=19 }, 20:{ span=20 }, 21:{ span=21 }, 22:{ span=22 }

$ echo 0 > /sys/devices/system/cpu/cpu1/online
[  251.213848] CPU0 attaching sched-domain(s):
[  251.214376]  domain-0: span=0,2-23 level=MC
[  251.214580]   groups: 0:{ span=0 }, 2:{ span=2-3 cap=2048 }, 4:{ span=4-5 cap=2048 }, 6:{ span=6-7 cap=2048 }, 8:{ span=8-9 cap=2048 }, 10:{ span=10-11 cap=2048 }, 12:{ span=12-13 cap=2048 }, 14:{ span=14-15 cap=2048 }, 16:{ span=16 }, 17:{ span=17 }, 18:{ span=18 }, 19:{ span=19 }, 20:{ span=20 }, 21:{ span=21 }, 22:{ span=22 }, 23:{ span=23 }
...
[  251.239511] CPU23 attaching sched-domain(s):
[  251.239656]  domain-0: span=0,2-23 level=MC
[  251.239727]   groups: 23:{ span=23 }, 0:{ span=0 }, 2:{ span=2-3 cap=2048 }, 4:{ span=4-5 cap=2048 }, 6:{ span=6-7 cap=2048 }, 8:{ span=8-9 cap=2048 }, 10:{ span=10-11 cap=2048 }, 12:{ span=12-13 cap=2048 }, 14:{ span=14-15 cap=2048 }, 16:{ span=16 }, 17:{ span=17 }, 18:{ span=18 }, 19:{ span=19 }, 20:{ span=20 }, 21:{ span=21 }, 22:{ span=22 }

$ cat /debug/sched/domains/cpu0/domain0/groups_flags

$ cat /debug/sched/domains/cpu23/domain0/groups_flags


IOW, neither the big core with SMT (with one sibling offline) nor the
little core with no SMT at all has the relevant flags set on its
domain0 groups.



---
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 98bfc0f4ec94..e408b2889186 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -427,6 +427,7 @@ static void register_sd(struct sched_domain *sd, struct dentry *parent)
 #undef SDM
 
 	debugfs_create_file("flags", 0444, parent, &sd->flags, &sd_flags_fops);
+	debugfs_create_file("groups_flags", 0444, parent, &sd->groups->flags, &sd_flags_fops);
 }
 
 void update_sched_domain_debugfs(void)


* Re: [PATCH 2/6] sched/fair: Check whether active load balance is needed in busiest group
  2023-05-04 16:09 ` [PATCH 2/6] sched/fair: Check whether active load balance is needed in busiest group Tim Chen
  2023-05-05 12:16   ` Peter Zijlstra
@ 2023-05-09 13:31   ` Vincent Guittot
  1 sibling, 0 replies; 22+ messages in thread
From: Vincent Guittot @ 2023-05-09 13:31 UTC (permalink / raw)
  To: Tim Chen
  Cc: Peter Zijlstra, Juri Lelli, Ricardo Neri, Ravi V . Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J . Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Valentin Schneider, Ionela Voinescu, x86,
	linux-kernel, Shrikanth Hegde, Srikar Dronamraju, naveen.n.rao,
	Yicong Yang, Barry Song, Ricardo Neri

On Thu, 4 May 2023 at 18:11, Tim Chen <tim.c.chen@linux.intel.com> wrote:
>
> From: Tim C Chen <tim.c.chen@linux.intel.com>
>
> In the busiest group, we need to consider whether active load balancing
> to a local group is needed even when the busiest group is not overloaded.
> For example, the busiest group may be an SMT group that's fully busy while
> the destination group is a cluster group with an idle CPU.  Such a condition
> is considered by asym_active_balance() during load balancing, but not when
> looking for the busiest group and calculating the load imbalance.  Add this
> consideration to find_busiest_group() and calculate_imbalance().
>
> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>

Could you have a look at what we did with misfit?

We already handle SD_ASYM_CPUCAPACITY when busiest->group_type ==
group_misfit_task.  group_misfit_task sits between group_overloaded and
group_has_spare precisely to handle such a situation.  Please detect this
while updating the statistics and tag the group accordingly; we should
not start adding exceptions everywhere again.
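
Purely to make that concrete, a rough sketch of what tagging the group during
the statistics update could look like.  The type name, its place in the enum,
and the sgs->group_smt_balance field are all made up for illustration; nothing
below is from this series:

enum group_type {
	group_has_spare = 0,
	group_fully_busy,
	group_misfit_task,
	group_smt_balance,	/* hypothetical: SMT group that should be pulled from */
	group_asym_packing,
	group_imbalanced,
	group_overloaded,
};

static inline enum group_type
group_classify(unsigned int imbalance_pct, struct sched_group *group,
	       struct sg_lb_stats *sgs)
{
	if (group_is_overloaded(imbalance_pct, sgs))
		return group_overloaded;

	if (sg_imbalanced(group))
		return group_imbalanced;

	if (sgs->group_asym_packing)
		return group_asym_packing;

	/*
	 * Hypothetical flag, set in update_sg_lb_stats() when this is a
	 * fully busy SMT group and the destination is a non-SMT group with
	 * spare capacity.
	 */
	if (sgs->group_smt_balance)
		return group_smt_balance;

	if (sgs->group_misfit_task_load)
		return group_misfit_task;

	if (!group_has_capacity(imbalance_pct, sgs))
		return group_fully_busy;

	return group_has_spare;
}

With the group tagged this way, find_busiest_group() and calculate_imbalance()
can act on busiest->group_type directly instead of calling a separate
asym_active_balance_busiest() helper in both places.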

> ---
>  kernel/sched/fair.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 87317634fab2..bde962aa160a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9433,6 +9433,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>                                 sgs->group_capacity;
>  }
>
> +/* One group is SMT while the other group is not */
> +static inline bool asymmetric_groups(struct sched_group *sg1,
> +                                   struct sched_group *sg2)
> +{
> +       if (!sg1 || !sg2)
> +               return false;
> +
> +       return (sg1->flags & SD_SHARE_CPUCAPACITY) !=
> +               (sg2->flags & SD_SHARE_CPUCAPACITY);
> +}
> +
>  /**
>   * update_sd_pick_busiest - return 1 on busiest group
>   * @env: The load balancing environment.
> @@ -10079,6 +10090,31 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>         update_idle_cpu_scan(env, sum_util);
>  }
>
> +static inline bool asym_active_balance_busiest(struct lb_env *env, struct sd_lb_stats *sds)
> +{
> +       /*
> +        * Don't balance to a group without spare capacity.
> +        *
> +        * Skip non asymmetric sched group balancing. That check
> +        * is handled by code path handling imbalanced load between
> +        * similar groups.
> +        */
> +       if (env->idle == CPU_NOT_IDLE ||
> +           sds->local_stat.group_type != group_has_spare ||
> +           !asymmetric_groups(sds->local, sds->busiest))
> +               return false;
> +
> +       /*
> +        * For SMT source group, pull when there are two or more
> +        * tasks over-utilizing a core.
> +        */
> +       if (sds->busiest->flags & SD_SHARE_CPUCAPACITY &&
> +           sds->busiest_stat.sum_h_nr_running > 1)
> +               return true;
> +
> +       return false;
> +}
> +
>  /**
>   * calculate_imbalance - Calculate the amount of imbalance present within the
>   *                      groups of a given sched_domain during load balance.
> @@ -10164,6 +10200,12 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>                         return;
>                 }
>
> +               if (asym_active_balance_busiest(env, sds)) {
> +                       env->migration_type = migrate_task;
> +                       env->imbalance = 1;
> +                       return;
> +               }
> +
>                 if (busiest->group_weight == 1 || sds->prefer_sibling) {
>                         unsigned int nr_diff = busiest->sum_nr_running;
>                         /*
> @@ -10371,6 +10413,9 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>                          */
>                         goto out_balanced;
>
> +               if (asym_active_balance_busiest(env, &sds))
> +                       goto force_balance;
> +
>                 if (busiest->group_weight > 1 &&
>                     local->idle_cpus <= (busiest->idle_cpus + 1))
>                         /*
> --
> 2.32.0
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/6] sched/fair: Skip prefer sibling move between SMT group and non-SMT group
  2023-05-04 16:09 ` [PATCH 4/6] sched/fair: Skip prefer sibling move between SMT group and non-SMT group Tim Chen
  2023-05-05 13:22   ` Peter Zijlstra
@ 2023-05-09 13:36   ` Vincent Guittot
  2023-05-09 23:35     ` Tim Chen
  1 sibling, 1 reply; 22+ messages in thread
From: Vincent Guittot @ 2023-05-09 13:36 UTC (permalink / raw)
  To: Tim Chen
  Cc: Peter Zijlstra, Juri Lelli, Ricardo Neri, Ravi V . Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J . Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Valentin Schneider, Ionela Voinescu, x86,
	linux-kernel, Shrikanth Hegde, Srikar Dronamraju, naveen.n.rao,
	Yicong Yang, Barry Song, Ricardo Neri

On Thu, 4 May 2023 at 18:11, Tim Chen <tim.c.chen@linux.intel.com> wrote:
>
> From: Tim C Chen <tim.c.chen@linux.intel.com>
>
> Do not try to move tasks between a non-SMT sched group and an SMT sched
> group for "prefer sibling" load balancing.
> Let asym_active_balance_busiest() handle that case properly.
> Otherwise we could get tasks bouncing back and forth between
> the SMT sched group and the non-SMT sched group.
>
> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  kernel/sched/fair.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a325db34b02..58ef7d529731 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10411,8 +10411,12 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>         /*
>          * Try to move all excess tasks to a sibling domain of the busiest
>          * group's child domain.
> +        *
> +        * Do not try to move between non smt sched group and smt sched
> +        * group. Let asym active balance properly handle that case.
>          */
>         if (sds.prefer_sibling && local->group_type == group_has_spare &&
> +           !asymmetric_groups(sds.busiest, sds.local) &&

Can't you delete the SD_PREFER_SIBLING flag when building the topology,
like SD_ASYM_CPUCAPACITY does?

Generally speaking, SD_ASYM_CPUCAPACITY and SD_ASYM_PACKING are doing
quite similar things; it would be good to get one common solution
instead of two parallel paths.
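
For reference, the SD_ASYM_CPUCAPACITY precedent lives in sd_init(); an
analogous build-time rule keyed off SD_ASYM_PACKING might look roughly like
the untested sketch below (whether dropping the flag for the whole domain is
actually right on hybrid parts is what the reply to this mail gets into):

	/* Don't attempt to spread across CPUs of different capacities. */
	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
		sd->child->flags &= ~SD_PREFER_SIBLING;

	/* Hypothetical analogue for asymmetrically packed (hybrid) domains. */
	if ((sd->flags & SD_ASYM_PACKING) && sd->child)
		sd->child->flags &= ~SD_PREFER_SIBLING;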

>             busiest->sum_nr_running > local->sum_nr_running + 1)
>                 goto force_balance;
>
> --
> 2.32.0
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/6] sched/fair: Skip prefer sibling move between SMT group and non-SMT group
  2023-05-09 13:36   ` Vincent Guittot
@ 2023-05-09 23:35     ` Tim Chen
  0 siblings, 0 replies; 22+ messages in thread
From: Tim Chen @ 2023-05-09 23:35 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Juri Lelli, Ricardo Neri, Ravi V . Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J . Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Valentin Schneider, Ionela Voinescu, x86,
	linux-kernel, Shrikanth Hegde, Srikar Dronamraju, naveen.n.rao,
	Yicong Yang, Barry Song, Ricardo Neri

On Tue, 2023-05-09 at 15:36 +0200, Vincent Guittot wrote:
> On Thu, 4 May 2023 at 18:11, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > 
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 8a325db34b02..58ef7d529731 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10411,8 +10411,12 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> >         /*
> >          * Try to move all excess tasks to a sibling domain of the busiest
> >          * group's child domain.
> > +        *
> > +        * Do not try to move between non smt sched group and smt sched
> > +        * group. Let asym active balance properly handle that case.
> >          */
> >         if (sds.prefer_sibling && local->group_type == group_has_spare &&
> > +           !asymmetric_groups(sds.busiest, sds.local) &&
> 
> Can't you delete the SD_PREFER_SIBLING flag when building the topology,
> like SD_ASYM_CPUCAPACITY does?

The sched domain can actually have a mixture of sched groups made up of Atom
modules and sched groups made up of SMT cores.  When comparing one Atom core
cluster against another, or one SMT core against another, I think we do want
the prefer-sibling logic.  It is only when comparing an SMT core against an
Atom core cluster that we want to skip it.  Ricardo, please correct me if I
am wrong.
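
To spell out which pairings keep the prefer-sibling behaviour, here is a small
stand-alone toy model of the asymmetric_groups() check from the patch (the
flag value and group names are made up for the demo; this is not kernel code):

#include <stdbool.h>
#include <stdio.h>

#define SD_SHARE_CPUCAPACITY	0x1	/* stand-in value, illustration only */

struct group {
	const char *name;
	unsigned int flags;
};

/* Same test as in the patch: exactly one of the two groups is SMT. */
static bool asymmetric_groups(const struct group *a, const struct group *b)
{
	return (a->flags & SD_SHARE_CPUCAPACITY) !=
	       (b->flags & SD_SHARE_CPUCAPACITY);
}

int main(void)
{
	struct group smt0  = { "SMT core 0",     SD_SHARE_CPUCAPACITY };
	struct group smt1  = { "SMT core 1",     SD_SHARE_CPUCAPACITY };
	struct group atom0 = { "Atom cluster 0", 0 };
	struct group atom1 = { "Atom cluster 1", 0 };

	const struct { const struct group *a, *b; } pairs[] = {
		{ &atom0, &atom1 },	/* like groups: prefer-sibling applies */
		{ &smt0,  &smt1  },	/* like groups: prefer-sibling applies */
		{ &smt0,  &atom0 },	/* unlike: skip, leave it to asym balance */
	};

	for (unsigned int i = 0; i < sizeof(pairs) / sizeof(pairs[0]); i++)
		printf("%s vs %s: prefer-sibling %s\n",
		       pairs[i].a->name, pairs[i].b->name,
		       asymmetric_groups(pairs[i].a, pairs[i].b) ?
				"skipped" : "applies");

	return 0;
}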

> 
> Generally speaking, SD_ASYM_CPUCAPACITY and SD_ASYM_PACKING are doing
> quite similar things; it would be good to get one common solution
> instead of two parallel paths.

Okay. I'll see what I can do to merge the handling.

Tim


> 
> >             busiest->sum_nr_running > local->sum_nr_running + 1)
> >                 goto force_balance;
> > 
> > --
> > 2.32.0
> > 


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-05-09 23:36 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-04 16:09 [PATCH 0/6] Enable Cluster Scheduling for x86 Hybrid CPUs Tim Chen
2023-05-04 16:09 ` [PATCH 1/6] sched/topology: Propagate SMT flags when removing degenerate domain Tim Chen
2023-05-04 16:09 ` [PATCH 2/6] sched/fair: Check whether active load balance is needed in busiest group Tim Chen
2023-05-05 12:16   ` Peter Zijlstra
2023-05-05 22:29     ` Tim Chen
2023-05-05 23:44       ` Peter Zijlstra
2023-05-09 13:31   ` Vincent Guittot
2023-05-04 16:09 ` [PATCH 3/6] sched/fair: Fix busiest group selection for asym groups Tim Chen
2023-05-05 13:19   ` Peter Zijlstra
2023-05-05 22:36     ` Tim Chen
2023-05-04 16:09 ` [PATCH 4/6] sched/fair: Skip prefer sibling move between SMT group and non-SMT group Tim Chen
2023-05-05 13:22   ` Peter Zijlstra
2023-05-05 23:07     ` Tim Chen
2023-05-05 23:38       ` Peter Zijlstra
2023-05-06  0:08         ` Peter Zijlstra
2023-05-09 13:36   ` Vincent Guittot
2023-05-09 23:35     ` Tim Chen
2023-05-04 16:09 ` [PATCH 5/6] sched/fair: Consider the idle state of the whole core for load balance Tim Chen
2023-05-05 13:23   ` Peter Zijlstra
2023-05-05 22:51     ` Tim Chen
2023-05-04 16:09 ` [PATCH 6/6] sched/x86: Add cluster topology to hybrid CPU Tim Chen
     [not found] ` <20230505071735.4083-1-hdanton@sina.com>
2023-05-05 22:49   ` [PATCH 5/6] sched/fair: Consider the idle state of the whole core for load balance Tim Chen
