linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] sched/fair: Avoid unnecessary migrations within SMT domains
@ 2022-08-25 22:55 Ricardo Neri
  2022-08-25 22:55 ` [PATCH 1/4] sched/fair: Simplify asym_packing logic for SMT sched groups Ricardo Neri
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Ricardo Neri @ 2022-08-25 22:55 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri

Intel processors that support Intel Turbo Boost Max 3.0 use asym_packing
to assign higher priorities to CPUs with higher maximum frequencies. It
artificially assigns, however, a lower priority to the higher-numbered
SMT siblings to ensure that they are used last.

This results in unnecessary task migrations within the SMT domains.

On processors with a mixture of higher-frequency SMT cores and lower-
frequency non-SMT cores (such as Intel hybrid processors), a lower-
priority CPU pulls tasks from the higher-priority cores if more than one
SMT sibling is busy.

Do not use different priorities for each SMT sibling. Instead, tweak the
asym_packing load balancer to recognize SMT cores with more than one
busy sibling and let lower-priority CPUs pull tasks.

Removing these artificial priorities avoids superfluous migrations and
lets lower-priority cores inspect all SMT siblings for the busiest queue.
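
For reference, the artificial priorities come from this per-sibling
computation in arch/x86/kernel/itmt.c, which patch 4 removes (condensed
from the current code):

	for_each_cpu(cpu, topology_sibling_cpumask(core_cpu)) {
		/*
		 * Push successive siblings to the end of the priority
		 * chain: the i-th sibling of a core gets a priority of
		 * prio * smp_num_siblings / (i * i).
		 */
		smt_prio = prio * smp_num_siblings / (i * i);
		per_cpu(sched_core_priority, cpu) = smt_prio;
		i++;
	}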

Thanks and BR,
Ricardo

Ricardo Neri (4):
  sched/fair: Simplify asym_packing logic for SMT sched groups
  sched/fair: Do not disqualify either runqueues of SMT sched groups
  sched/fair: Let lower-priority CPUs do active balancing
  x86/sched: Avoid unnecessary migrations within SMT domains

 arch/x86/kernel/itmt.c | 23 +++++-----------------
 kernel/sched/fair.c    | 44 ++++++++++++++++++++----------------------
 2 files changed, 26 insertions(+), 41 deletions(-)

-- 
2.25.1



* [PATCH 1/4] sched/fair: Simplify asym_packing logic for SMT sched groups
  2022-08-25 22:55 [PATCH 0/4] sched/fair: Avoid unnecessary migrations within SMT domains Ricardo Neri
@ 2022-08-25 22:55 ` Ricardo Neri
  2022-10-18  9:34   ` Peter Zijlstra
  2022-08-25 22:55 ` [PATCH 2/4] sched/fair: Do not disqualify either runqueues of " Ricardo Neri
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Ricardo Neri @ 2022-08-25 22:55 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C. Chen

When the destination CPU is an SMT sibling and idle, it can only help the
busiest group if all of its other SMT siblings are also idle. Otherwise,
there is no increase in throughput.

It does not matter whether the busiest group has SMT siblings. Simply
check if there are any tasks running on the local group before proceeding.
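
As a worked example of the resulting check (the CPU numbering and the
hybrid topology below are hypothetical):

	/*
	 * core 0: CPUs 0, 1 (SMT, higher priority)
	 * core 1: CPU  2    (non-SMT, lower priority), running one task
	 *
	 * With dst_cpu = 1, the busiest group {2} has sg_busy_cpus == 1.
	 * If CPUs 0 and 1 are both idle, local_stat.sum_nr_running == 0
	 * and the task can be pulled, provided CPU 1 has the higher
	 * priority. If CPU 0 were busy instead, sum_nr_running != 0 and
	 * the pull is rejected: it would not increase throughput.
	 */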

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/sched/fair.c | 29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 77b2048a9326..91f271ea02d2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8603,12 +8603,10 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
 				    struct sched_group *sg)
 {
 #ifdef CONFIG_SCHED_SMT
-	bool local_is_smt, sg_is_smt;
+	bool local_is_smt;
 	int sg_busy_cpus;
 
 	local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
-	sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
-
 	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
 
 	if (!local_is_smt) {
@@ -8629,25 +8627,16 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
 		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
 	}
 
-	/* @dst_cpu has SMT siblings. */
-
-	if (sg_is_smt) {
-		int local_busy_cpus = sds->local->group_weight -
-				      sds->local_stat.idle_cpus;
-		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
-
-		if (busy_cpus_delta == 1)
-			return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
-
-		return false;
-	}
-
 	/*
-	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
-	 * up with more than one busy SMT sibling and only pull tasks if there
-	 * are not busy CPUs (i.e., no CPU has running tasks).
+	 * @dst_cpu has SMT siblings. When both @dst_cpu and the busiest core
+	 * have one or more busy siblings, moving tasks between them results
+	 * in the same throughput. Only if all the siblings of @dst_cpu are
+	 * idle can throughput increase.
+	 *
+	 * If the difference in the number of busy CPUs is two or more, let
+	 * find_busiest_group() take care of it.
 	 */
-	if (!sds->local_stat.sum_nr_running)
+	if (sg_busy_cpus == 1 && !sds->local_stat.sum_nr_running)
 		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
 
 	return false;
-- 
2.25.1



* [PATCH 2/4] sched/fair: Do not disqualify either runqueues of SMT sched groups
  2022-08-25 22:55 [PATCH 0/4] sched/fair: Avoid unnecessary migrations within SMT domains Ricardo Neri
  2022-08-25 22:55 ` [PATCH 1/4] sched/fair: Simplify asym_packing logic for SMT sched groups Ricardo Neri
@ 2022-08-25 22:55 ` Ricardo Neri
  2022-08-25 22:55 ` [PATCH 3/4] sched/fair: Let lower-priority CPUs do active balancing Ricardo Neri
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Ricardo Neri @ 2022-08-25 22:55 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C. Chen

We may be here because the busiest group is composed of SMT siblings and
more than one of them is busy.

An idle CPU with lower priority can help the higher-priority busiest
scheduling group by pulling tasks from it. The tasks that remain in the
busiest group will run with higher performance.

This scenario is observed, for instance, on Intel hybrid processors. PCores
have two SMT siblings and have higher priority than the ECores, which do
not have SMT siblings.
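
As a worked example (hypothetical hybrid topology, as in patch 1):

	/*
	 * core 0: CPUs 0, 1 (SMT, higher priority), each running one task
	 * core 1: CPU  2    (non-SMT, lower priority), idle
	 *
	 * When CPU 2 runs the load balancer, CPUs 0 and 1 each have
	 * nr_running == 1 and higher priority, so both runqueues used to
	 * be disqualified and CPU 2 stayed idle. Their group has
	 * SD_SHARE_CPUCAPACITY set, so with this patch neither runqueue
	 * is skipped and CPU 2 can pull one of the two tasks.
	 */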

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/sched/fair.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 91f271ea02d2..810645eb58ed 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9662,10 +9662,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		    nr_running == 1)
 			continue;
 
-		/* Make sure we only pull tasks from a CPU of lower priority */
+		/*
+		 * Make sure we only pull tasks from a CPU of lower priority.
+		 * Except for scheduling groups composed of SMT siblings.
+		 */
 		if ((env->sd->flags & SD_ASYM_PACKING) &&
 		    sched_asym_prefer(i, env->dst_cpu) &&
-		    nr_running == 1)
+		    nr_running == 1 &&
+		    !(group->flags & SD_SHARE_CPUCAPACITY))
 			continue;
 
 		switch (env->migration_type) {
-- 
2.25.1



* [PATCH 3/4] sched/fair: Let lower-priority CPUs do active balancing
  2022-08-25 22:55 [PATCH 0/4] sched/fair: Avoid unnecessary migrations within SMT domains Ricardo Neri
  2022-08-25 22:55 ` [PATCH 1/4] sched/fair: Simplify asym_packing logic for SMT sched groups Ricardo Neri
  2022-08-25 22:55 ` [PATCH 2/4] sched/fair: Do not disqualify either runqueues of " Ricardo Neri
@ 2022-08-25 22:55 ` Ricardo Neri
  2022-08-25 22:55 ` [PATCH 4/4] x86/sched: Avoid unnecessary migrations within SMT domains Ricardo Neri
  2022-10-18  2:35 ` [PATCH 0/4] sched/fair: " Ricardo Neri
  4 siblings, 0 replies; 10+ messages in thread
From: Ricardo Neri @ 2022-08-25 22:55 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C. Chen

When more than one SMT sibling of a physical core is busy, an idle CPU
of lower priority can help.

Indicate that the lower-priority CPU can do active balancing from the
higher-priority CPU only if they belong to separate cores.
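
Condensed from the hunk below, asym_active_balance() then reads:

	return env->idle != CPU_NOT_IDLE &&
	       (env->sd->flags & SD_ASYM_PACKING) &&
	       (sched_asym_prefer(env->dst_cpu, env->src_cpu) ||
		!(env->sd->flags & SD_SHARE_CPUCAPACITY));

That is, active balancing is also allowed when the destination CPU has
lower priority, provided the balancing domain is above the SMT level
(it does not have SD_SHARE_CPUCAPACITY), i.e., the CPUs are in separate
cores.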

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/sched/fair.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 810645eb58ed..9b608b31080f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9759,9 +9759,14 @@ asym_active_balance(struct lb_env *env)
 	 * ASYM_PACKING needs to force migrate tasks from busy but
 	 * lower priority CPUs in order to pack all tasks in the
 	 * highest priority CPUs.
+	 *
+	 * If the busy CPU has higher priority but is an SMT sibling
+	 * in which other SMT siblings are also busy, a lower-priority
+	 * CPU in a separate core can help.
 	 */
 	return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
-	       sched_asym_prefer(env->dst_cpu, env->src_cpu);
+	       (sched_asym_prefer(env->dst_cpu, env->src_cpu) ||
+		!(env->sd->flags & SD_SHARE_CPUCAPACITY));
 }
 
 static inline bool
-- 
2.25.1



* [PATCH 4/4] x86/sched: Avoid unnecessary migrations within SMT domains
  2022-08-25 22:55 [PATCH 0/4] sched/fair: Avoid unnecessary migrations within SMT domains Ricardo Neri
                   ` (2 preceding siblings ...)
  2022-08-25 22:55 ` [PATCH 3/4] sched/fair: Let lower-priority CPUs do active balancing Ricardo Neri
@ 2022-08-25 22:55 ` Ricardo Neri
  2022-10-18  2:35 ` [PATCH 0/4] sched/fair: " Ricardo Neri
  4 siblings, 0 replies; 10+ messages in thread
From: Ricardo Neri @ 2022-08-25 22:55 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C. Chen

Having different priorities for each SMT sibling triggers unnecessary
load balancing towards the higher-priority sibling.

The scheduler now has logic to allow lower-priority CPUs to relieve load
from scheduling groups composed of SMT siblings with more than one busy
sibling.

Hence, it is no longer necessary to give different priorities to each of
the SMT siblings of a physical core.
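
For illustration, with a hypothetical prio of 84 from the pstate driver
and two SMT siblings per core:

	/*
	 * Before: sched_set_itmt_core_prio(84, core_cpu)
	 *   sibling 0: 84 * 2 / (1 * 1) = 168
	 *   sibling 1: 84 * 2 / (2 * 2) =  42
	 *
	 * After: per_cpu(sched_core_priority, cpu) = 84 for each sibling,
	 * so neither sibling of a core is preferred over the other.
	 */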

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/itmt.c | 23 +++++------------------
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index 9ff480e94511..6510883c5e81 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -174,32 +174,19 @@ int arch_asym_cpu_priority(int cpu)
 
 /**
  * sched_set_itmt_core_prio() - Set CPU priority based on ITMT
- * @prio:	Priority of cpu core
- * @core_cpu:	The cpu number associated with the core
+ * @prio:	Priority of @cpu
+ * @cpu:	The CPU number
  *
  * The pstate driver will find out the max boost frequency
  * and call this function to set a priority proportional
- * to the max boost frequency. CPU with higher boost
+ * to the max boost frequency. CPUs with higher boost
  * frequency will receive higher priority.
  *
  * No need to rebuild sched domain after updating
  * the CPU priorities. The sched domains have no
  * dependency on CPU priorities.
  */
-void sched_set_itmt_core_prio(int prio, int core_cpu)
+void sched_set_itmt_core_prio(int prio, int cpu)
 {
-	int cpu, i = 1;
-
-	for_each_cpu(cpu, topology_sibling_cpumask(core_cpu)) {
-		int smt_prio;
-
-		/*
-		 * Ensure that the siblings are moved to the end
-		 * of the priority chain and only used when
-		 * all other high priority cpus are out of capacity.
-		 */
-		smt_prio = prio * smp_num_siblings / (i * i);
-		per_cpu(sched_core_priority, cpu) = smt_prio;
-		i++;
-	}
+	per_cpu(sched_core_priority, cpu) = prio;
 }
-- 
2.25.1



* Re: [PATCH 0/4] sched/fair: Avoid unnecessary migrations within SMT domains
  2022-08-25 22:55 [PATCH 0/4] sched/fair: Avoid unnecessary migrations within SMT domains Ricardo Neri
                   ` (3 preceding siblings ...)
  2022-08-25 22:55 ` [PATCH 4/4] x86/sched: Avoid unnecessary migrations within SMT domains Ricardo Neri
@ 2022-10-18  2:35 ` Ricardo Neri
  2022-10-18  9:40   ` Peter Zijlstra
  4 siblings, 1 reply; 10+ messages in thread
From: Ricardo Neri @ 2022-10-18  2:35 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel

On Thu, Aug 25, 2022 at 03:55:25PM -0700, Ricardo Neri wrote:
> Intel processors that support Intel Turbo Boost Max 3.0 use asym_packing
> to assign higher priorities to CPUs with higher maximum frequencies. It
> artificially assigns, however, a lower priority to the higher-numbered
> SMT siblings to ensure that they are used last.
> 
> This results in unnecessary task migrations within the SMT domains.
> 
> On processors with a mixture of higher-frequency SMT cores and lower-
> frequency non-SMT cores (such as Intel hybrid processors), a lower-
> priority CPU pulls tasks from the higher-priority cores if more than one
> SMT sibling is busy.
> 
> Do not use different priorities for each SMT sibling. Instead, tweak the
> asym_packing load balancer to recognize SMT cores with more than one
> busy sibling and let lower-priority CPUs pull tasks.
> 
> Removing these artificial priorities avoids superfluous migrations and
> lets lower-priority cores inspect all SMT siblings for the busiest queue.

Hello. I'd like to know if there are any comments on these patches. This
patchset is a prerequisite for the IPC classes of tasks patchset [1].

Thanks in advance!
Ricardo

[1]. https://lore.kernel.org/lkml/20220909231205.14009-5-ricardo.neri-calderon@linux.intel.com/T/


* Re: [PATCH 1/4] sched/fair: Simplify asym_packing logic for SMT sched groups
  2022-08-25 22:55 ` [PATCH 1/4] sched/fair: Simplify asym_packing logic for SMT sched groups Ricardo Neri
@ 2022-10-18  9:34   ` Peter Zijlstra
  2022-10-20  1:25     ` Ricardo Neri
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2022-10-18  9:34 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C. Chen

On Thu, Aug 25, 2022 at 03:55:26PM -0700, Ricardo Neri wrote:
> When the destination CPU is an SMT sibling and idle, it can only help the
> busiest group if all of its other SMT siblings are also idle. Otherwise,
> there is no increase in throughput.
> 
> It does not matter whether the busiest group has SMT siblings. Simply
> check if there are any tasks running on the local group before proceeding.

> Reviewed-by: Len Brown <len.brown@intel.com>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  kernel/sched/fair.c | 29 +++++++++--------------------
>  1 file changed, 9 insertions(+), 20 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 77b2048a9326..91f271ea02d2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8603,12 +8603,10 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
>  				    struct sched_group *sg)
>  {
>  #ifdef CONFIG_SCHED_SMT
> -	bool local_is_smt, sg_is_smt;
> +	bool local_is_smt;
>  	int sg_busy_cpus;
>  
>  	local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> -	sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> -
>  	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
>  
>  	if (!local_is_smt) {
> @@ -8629,25 +8627,16 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
>  		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
>  	}
>  
> -	/* @dst_cpu has SMT siblings. */
> -
> -	if (sg_is_smt) {
> -		int local_busy_cpus = sds->local->group_weight -
> -				      sds->local_stat.idle_cpus;
> -		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
> -
> -		if (busy_cpus_delta == 1)
> -			return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> -
> -		return false;
> -	}
> -
>  	/*
> -	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
> -	 * up with more than one busy SMT sibling and only pull tasks if there
> -	 * are not busy CPUs (i.e., no CPU has running tasks).
> +	 * @dst_cpu has SMT siblings. When both @dst_cpu and the busiest core
> +	 * have one or more busy siblings, moving tasks between them results
> +	 * in the same throughput. Only if all the siblings of @dst_cpu are
> +	 * idle can throughput increase.
> +	 *
> +	 * If the difference in the number of busy CPUs is two or more, let
> +	 * find_busiest_group() take care of it.
>  	 */
> -	if (!sds->local_stat.sum_nr_running)
> +	if (sg_busy_cpus == 1 && !sds->local_stat.sum_nr_running)
>  		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
>  

I can't follow this logic; doesn't this hard assume SMT2 at the very
least? The case for Power7 with SMT8 is that SMT1 is faster than SMT2 is
faster than SMT4, only once you have more than 4 threads active it no
longer matters.



* Re: [PATCH 0/4] sched/fair: Avoid unnecessary migrations within SMT domains
  2022-10-18  2:35 ` [PATCH 0/4] sched/fair: " Ricardo Neri
@ 2022-10-18  9:40   ` Peter Zijlstra
  2022-10-20  1:38     ` Ricardo Neri
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2022-10-18  9:40 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel

On Mon, Oct 17, 2022 at 07:35:27PM -0700, Ricardo Neri wrote:
> On Thu, Aug 25, 2022 at 03:55:25PM -0700, Ricardo Neri wrote:
> > Intel processors that support Intel Turbo Boost Max 3.0 use asym_packing
> > to assign higher priorities to CPUs with higher maximum frequencies. It
> > artificially assigns, however, a lower priority to the higher-numbered
> > SMT siblings to ensure that they are used last.
> > 
> > This results in unnecessary task migrations within the SMT domains.
> > 
> > On processors with a mixture of higher-frequency SMT cores and lower-
> > frequency non-SMT cores (such as Intel hybrid processors), a lower-
> > priority CPU pulls tasks from the higher-priority cores if more than one
> > SMT sibling is busy.
> > 
> > Do not use different priorities for each SMT sibling. Instead, tweak the
> > asym_packing load balancer to recognize SMT cores with more than one
> > busy sibling and let lower-priority CPUs pull tasks.
> > 
> > Removing these artificial priorities avoids superfluous migrations and
> > lets lower-priority cores inspect all SMT siblings for the busiest queue.
> 
> Hello. I'd like to know if there are any comments on these patches. This
> patchset is a prerequisite for the IPC classes of tasks patchset [1].

Urgh.. so I'm not liking this, afaict you're sprinkling SMT2
assumptions.

Why can't we make arch_asym_cpu_priority() depend on CPU state? Doesn't
it then magically work?
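
(A minimal sketch of such a state-dependent priority, purely
illustrative; the sibling scan and the scaling factor are assumptions,
not code from this thread:)

	int arch_asym_cpu_priority(int cpu)
	{
		int prio = per_cpu(sched_core_priority, cpu);
		int sibling;

		/* Deprioritize a CPU whose SMT sibling is already busy. */
		for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
			if (sibling != cpu && !idle_cpu(sibling))
				return prio / 4;
		}

		return prio;
	}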


* Re: [PATCH 1/4] sched/fair: Simplify asym_packing logic for SMT sched groups
  2022-10-18  9:34   ` Peter Zijlstra
@ 2022-10-20  1:25     ` Ricardo Neri
  0 siblings, 0 replies; 10+ messages in thread
From: Ricardo Neri @ 2022-10-20  1:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C. Chen

On Tue, Oct 18, 2022 at 11:34:04AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 25, 2022 at 03:55:26PM -0700, Ricardo Neri wrote:
> > When the destination CPU is an SMT sibling and idle, it can only help the
> > busiest group if all of its other SMT siblings are also idle. Otherwise,
> > there is no increase in throughput.
> > 
> > It does not matter whether the busiest group has SMT siblings. Simply
> > check if there are any tasks running on the local group before proceeding.
> 
> > Reviewed-by: Len Brown <len.brown@intel.com>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  kernel/sched/fair.c | 29 +++++++++--------------------
> >  1 file changed, 9 insertions(+), 20 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 77b2048a9326..91f271ea02d2 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8603,12 +8603,10 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> >  				    struct sched_group *sg)
> >  {
> >  #ifdef CONFIG_SCHED_SMT
> > -	bool local_is_smt, sg_is_smt;
> > +	bool local_is_smt;
> >  	int sg_busy_cpus;
> >  
> >  	local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> > -	sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> > -
> >  	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> >  
> >  	if (!local_is_smt) {
> > @@ -8629,25 +8627,16 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> >  		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> >  	}
> >  
> > -	/* @dst_cpu has SMT siblings. */
> > -
> > -	if (sg_is_smt) {
> > -		int local_busy_cpus = sds->local->group_weight -
> > -				      sds->local_stat.idle_cpus;
> > -		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
> > -
> > -		if (busy_cpus_delta == 1)
> > -			return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> > -
> > -		return false;
> > -	}
> > -
> >  	/*
> > -	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
> > -	 * up with more than one busy SMT sibling and only pull tasks if there
> > -	 * are not busy CPUs (i.e., no CPU has running tasks).
> > +	 * @dst_cpu has SMT siblings. When both @dst_cpu and the busiest core
> > +	 * have one or more busy siblings, moving tasks between them results
> > +	 * in the same throughput. Only if all the siblings of @dst_cpu are
> > +	 * idle can throughput increase.
> > +	 *
> > +	 * If the difference in the number of busy CPUs is two or more, let
> > +	 * find_busiest_group() take care of it.
> >  	 */
> > -	if (!sds->local_stat.sum_nr_running)
> > +	if (sg_busy_cpus == 1 && !sds->local_stat.sum_nr_running)
> >  		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> >  
> 
> I can't follow this logic; doesn't this hard assume SMT2 at the very
> least? The case for Power7 with SMT8 is that SMT1 is faster than SMT2 is
> faster than SMT4; only once you have more than 4 threads active does it
> no longer matter.

Now that I look at this again, the comment is correct, but the check is
implemented at the wrong sched domain level.

Consider, at the "MC" level, two sched groups composed of SMT siblings.
Both sched groups have one or more busy siblings, and the difference in
the number of busy siblings is exactly one. Pulling tasks to the sched
group with the higher asym_prefer_cpu makes no difference in throughput.
The totality of CPU resources continues to be shared among the same
number of tasks. This is true for both SMT2 and SMT8, no?

Only when all the siblings of an SMT core are idle do new CPU resources
come into use.
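
Concretely (a hypothetical SMT2 example): cores A and B each have two
siblings; A has two busy, B has one. Moving one task from A to B still
leaves three tasks sharing the same four SMT CPUs, so throughput is
unchanged regardless of which group holds asym_prefer_cpu. Only if B
were fully idle would the migration bring new core resources into use.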

The "MC" domain does not have the flag SD_SHARE_CPUCAPACITY. My patch
should check for this flag in the child domain.

At the "SMT" domain, yes, I agree that siblings with different priorities
should be accommodated.


* Re: [PATCH 0/4] sched/fair: Avoid unnecessary migrations within SMT domains
  2022-10-18  9:40   ` Peter Zijlstra
@ 2022-10-20  1:38     ` Ricardo Neri
  0 siblings, 0 replies; 10+ messages in thread
From: Ricardo Neri @ 2022-10-20  1:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel

On Tue, Oct 18, 2022 at 11:40:11AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 17, 2022 at 07:35:27PM -0700, Ricardo Neri wrote:
> > On Thu, Aug 25, 2022 at 03:55:25PM -0700, Ricardo Neri wrote:
> > > Intel processors that support Intel Turbo Boost Max 3.0 use asym_packing
> > > to assign higher priorities to CPUs with higher maximum frequencies. It
> > > artificially assigns, however, a lower priority to the higher-numbered
> > > SMT siblings to ensure that they are used last.
> > > 
> > > This results in unnecessary task migrations within the SMT domains.
> > > 
> > > On processors with a mixture of higher-frequency SMT cores and lower-
> > > frequency non-SMT cores (such as Intel hybrid processors), a lower-
> > > priority CPU pulls tasks from the higher-priority cores if more than one
> > > SMT sibling is busy.
> > > 
> > > Do not use different priorities for each SMT sibling. Instead, tweak the
> > > asym_packing load balancer to recognize SMT cores with more than one
> > > busy sibling and let lower-priority CPUs pull tasks.
> > > 
> > > Removing these artificial priorities avoids superfluous migrations and
> > > lets lower-priority cores inspect all SMT siblings for the busiest queue.
> > 
> > Hello. I'd like to know if there are any comments on these patches. This
> > patchset is a prerequisite for the IPC classes of tasks patchset [1].

Thank you very much for your feedback Peter!

> 
> Urgh.. so I'm not liking this, afaict you're sprinkling SMT2
> assumptions.

I was careful not to introduce this assumption. I always look for one or
more busy SMT siblings. The goal of the series is to use the same
priority for all SMT siblings when possible (on x86, but not on Power7)
so that lower-priority sched groups can pull tasks from any of the
siblings when needed (as in [1]).

> 
> Why can't we make arch_asym_cpu_priority() depend on CPU state? Doesn't
> it then magically work?

That is an interesting idea. I will experiment with it.

Thanks and BR,
Ricardo


Thread overview: 10 messages
2022-08-25 22:55 [PATCH 0/4] sched/fair: Avoid unnecessary migrations within SMT domains Ricardo Neri
2022-08-25 22:55 ` [PATCH 1/4] sched/fair: Simplify asym_packing logic for SMT sched groups Ricardo Neri
2022-10-18  9:34   ` Peter Zijlstra
2022-10-20  1:25     ` Ricardo Neri
2022-08-25 22:55 ` [PATCH 2/4] sched/fair: Do not disqualify either runqueues of " Ricardo Neri
2022-08-25 22:55 ` [PATCH 3/4] sched/fair: Let lower-priority CPUs do active balancing Ricardo Neri
2022-08-25 22:55 ` [PATCH 4/4] x86/sched: Avoid unnecessary migrations within SMT domains Ricardo Neri
2022-10-18  2:35 ` [PATCH 0/4] sched/fair: " Ricardo Neri
2022-10-18  9:40   ` Peter Zijlstra
2022-10-20  1:38     ` Ricardo Neri
