* [PATCH v2 0/7] x86/sched: Avoid unnecessary migrations within SMT domains
@ 2022-11-22 20:35 Ricardo Neri
  2022-11-22 20:35 ` [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group Ricardo Neri
                   ` (6 more replies)
  0 siblings, 7 replies; 42+ messages in thread
From: Ricardo Neri @ 2022-11-22 20:35 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri

Hi,

This is v2 of this patchset. V1 can be found here [1]. In this version I
took Peter's suggestion to teach arch_asym_cpu_priority() the CPU state.
I also reworded the cover letter to better explain the intent.

---

asym_packing load balancing is used to balance load among physical cores
with SMT (e.g., Intel processors that support Intel Turbo Boost Max 3.0 and
hybrid processors) and among the SMT siblings of a physical core (e.g.,
Power7).

The current implementation is sufficient for the latter case as it favors
higher-priority SMT siblings. In the former case, however, we must consider
the fact that the throughput of a CPU degrades if one or more of its SMT
siblings are busy. Hence, a lower-priority CPU that is fully idle is more
desirable than a high-priority CPU with busy SMT siblings. 

To fit the current implementation of asym_packing, x86 artificially assigns
a lower priority to the higher-numbered SMT siblings. In reality, there is
no difference between any of the SMT siblings of a core.

Do not use different priorities for each SMT sibling. Instead, tweak the
asym_packing load balancing logic to consider the idle state of the SMT
siblings of a CPU. 

Removing these artificial priorities avoids superfluous migrations and lets
lower-priority cores inspect all SMT siblings for the busiest queue. The
latter is also necessary to support IPC classes of tasks [2], as the
destination CPU will need to inspect the tasks running on CPUs of equal
priority.

This patchset should not break Power7 SMT8. Functionality does not change
for architectures that do not use the new check_smt parameter of
arch_asym_cpu_priority().

These patches apply cleanly on today's tip tree.

Changes since v1:
 * Tweaked arch_asym_cpu_priority() and sched_asym_prefer() to handle
   the idle state of the SMT siblings of a CPU. (PeterZ)
 * Expose functionality of the scheduler that determines the idle state
   of the SMT siblings of a CPU.
 * Addressed concerns from Peter about SMT2 assumptions and breaking
   Power7.
 * Removed the SD_ASYM_PACKING flag from the "SMT" domain in x86.
 * Reworked x86's arch_asym_cpu_priority() to consider the idle state
   of the SMT siblings of a CPU.

[1]. https://lore.kernel.org/lkml/20220825225529.26465-1-ricardo.neri-calderon@linux.intel.com/
[2]. https://lore.kernel.org/lkml/20220909231205.14009-1-ricardo.neri-calderon@linux.intel.com/

Ricardo Neri (7):
  sched/fair: Generalize asym_packing logic for SMT local sched group
  sched: Prepare sched_asym_prefer() to handle idle state of SMT
    siblings
  sched: Teach arch_asym_cpu_priority() the idle state of SMT siblings
  sched/fair: Introduce sched_smt_siblings_idle()
  x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain
  x86/sched/itmt: Give all SMT siblings of a core the same priority
  x86/sched/itmt: Consider the idle state of SMT siblings

 arch/x86/kernel/itmt.c         | 30 ++++--------
 arch/x86/kernel/smpboot.c      |  2 +-
 include/linux/sched.h          |  2 +
 include/linux/sched/topology.h |  2 +-
 kernel/sched/fair.c            | 90 +++++++++++++++++-----------------
 kernel/sched/sched.h           | 11 +++--
 kernel/sched/topology.c        |  6 ++-
 7 files changed, 72 insertions(+), 71 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group
  2022-11-22 20:35 [PATCH v2 0/7] x86/sched: Avoid unnecessary migrations within SMT domains Ricardo Neri
@ 2022-11-22 20:35 ` Ricardo Neri
  2022-12-06 17:22   ` Dietmar Eggemann
  2022-12-22 16:55   ` Valentin Schneider
  2022-11-22 20:35 ` [PATCH v2 2/7] sched: Prepare sched_asym_prefer() to handle idle state of SMT siblings Ricardo Neri
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 42+ messages in thread
From: Ricardo Neri @ 2022-11-22 20:35 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

When balancing load between two physical cores, an idle destination CPU can
help another core only if all of its SMT siblings are also idle. Otherwise,
there is no increase in throughput. It does not matter whether the other
core is composed of SMT siblings.

Simply check that no tasks are running on the local group and that the
other core has exactly one busy CPU before proceeding. Let
find_busiest_group() handle the case of more than one busy CPU. This
holds for SMT2, SMT4, SMT8, etc.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Reworded commit message and inline comments for clarity.
 * Stated that this changeset does not impact SMT4 or SMT8.
---
 kernel/sched/fair.c | 29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e4a0b8bd941c..18c672ff39ef 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8900,12 +8900,10 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
 				    struct sched_group *sg)
 {
 #ifdef CONFIG_SCHED_SMT
-	bool local_is_smt, sg_is_smt;
+	bool local_is_smt;
 	int sg_busy_cpus;
 
 	local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
-	sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
-
 	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
 
 	if (!local_is_smt) {
@@ -8926,25 +8924,16 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
 		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
 	}
 
-	/* @dst_cpu has SMT siblings. */
-
-	if (sg_is_smt) {
-		int local_busy_cpus = sds->local->group_weight -
-				      sds->local_stat.idle_cpus;
-		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
-
-		if (busy_cpus_delta == 1)
-			return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
-
-		return false;
-	}
-
 	/*
-	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
-	 * up with more than one busy SMT sibling and only pull tasks if there
-	 * are not busy CPUs (i.e., no CPU has running tasks).
+	 * @dst_cpu has SMT siblings. Do asym_packing load balancing only if
+	 * all its siblings are idle (moving tasks between physical cores in
+	 * which some SMT siblings are busy results in the same throughput).
+	 *
+	 * If the difference in the number of busy CPUs is two or more, let
+	 * find_busiest_group() take care of it. We only care if @sg has
+	 * exactly one busy CPU. This covers SMT and non-SMT sched groups.
 	 */
-	if (!sds->local_stat.sum_nr_running)
+	if (sg_busy_cpus == 1 && !sds->local_stat.sum_nr_running)
 		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
 
 	return false;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 2/7] sched: Prepare sched_asym_prefer() to handle idle state of SMT siblings
  2022-11-22 20:35 [PATCH v2 0/7] x86/sched: Avoid unnecessary migrations within SMT domains Ricardo Neri
  2022-11-22 20:35 ` [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group Ricardo Neri
@ 2022-11-22 20:35 ` Ricardo Neri
  2022-11-22 20:35 ` [PATCH v2 3/7] sched: Teach arch_asym_cpu_priority() the " Ricardo Neri
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 42+ messages in thread
From: Ricardo Neri @ 2022-11-22 20:35 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

The throughput of an SMT sibling decreases if one or more of its siblings
are also busy. Idle, lower-priority cores can help. Thus, it is necessary
to consider the idle state of a CPU's SMT siblings when selecting by
priority.

In some cases, sched_asym_prefer() does not care about the idle state
(when building sched domains or looking at the priority of the preferred
CPU in a sched group).

Add a new parameter to check the state of the SMT siblings of a CPU when
applicable.

While here, remove a spurious newline.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Introduced this patch
---
 kernel/sched/fair.c     | 17 ++++++++++-------
 kernel/sched/sched.h    |  8 ++++++--
 kernel/sched/topology.c |  6 +++++-
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 18c672ff39ef..d18947a9c03e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8921,7 +8921,7 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
 		 * can help if it has higher priority and is idle (i.e.,
 		 * it has no running tasks).
 		 */
-		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu, false);
 	}
 
 	/*
@@ -8934,7 +8934,7 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
 	 * exactly one busy CPU. This covers SMT and non-SMT sched groups.
 	 */
 	if (sg_busy_cpus == 1 && !sds->local_stat.sum_nr_running)
-		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu, false);
 
 	return false;
 #else
@@ -8952,7 +8952,8 @@ sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs
 	    (group->flags & SD_SHARE_CPUCAPACITY))
 		return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
 
-	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
+	/* Neither env::dst_cpu nor group::asym_prefer_cpu have SMT siblings. */
+	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu, false);
 }
 
 static inline bool
@@ -9118,7 +9119,9 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 
 	case group_asym_packing:
 		/* Prefer to move from lowest priority CPU's work */
-		if (sched_asym_prefer(sg->asym_prefer_cpu, sds->busiest->asym_prefer_cpu))
+		if (sched_asym_prefer(sg->asym_prefer_cpu,
+				      sds->busiest->asym_prefer_cpu,
+				      false))
 			return false;
 		break;
 
@@ -10060,7 +10063,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 
 		/* Make sure we only pull tasks from a CPU of lower priority */
 		if ((env->sd->flags & SD_ASYM_PACKING) &&
-		    sched_asym_prefer(i, env->dst_cpu) &&
+		    sched_asym_prefer(i, env->dst_cpu, true) &&
 		    nr_running == 1)
 			continue;
 
@@ -10153,7 +10156,7 @@ asym_active_balance(struct lb_env *env)
 	 * highest priority CPUs.
 	 */
 	return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
-	       sched_asym_prefer(env->dst_cpu, env->src_cpu);
+	       sched_asym_prefer(env->dst_cpu, env->src_cpu, true);
 }
 
 static inline bool
@@ -10889,7 +10892,7 @@ static void nohz_balancer_kick(struct rq *rq)
 		 * around.
 		 */
 		for_each_cpu_and(i, sched_domain_span(sd), nohz.idle_cpus_mask) {
-			if (sched_asym_prefer(i, cpu)) {
+			if (sched_asym_prefer(i, cpu, true)) {
 				flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
 				goto unlock;
 			}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a4a20046e586..0fc7c0130755 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -795,8 +795,12 @@ static inline long se_weight(struct sched_entity *se)
 	return scale_load_down(se->load.weight);
 }
 
-
-static inline bool sched_asym_prefer(int a, int b)
+/*
+ * Used to compare specific CPUs. Also when comparing the preferred CPU of a
+ * sched group or building the sched domains; in such cases checking the state
+ * of SMT siblings, if any, is not needed.
+ */
+static inline bool sched_asym_prefer(int a, int b, bool check_smt)
 {
 	return arch_asym_cpu_priority(a) > arch_asym_cpu_priority(b);
 }
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8739c2a5a54e..8154ef590b9f 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1282,7 +1282,11 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
 		for_each_cpu(cpu, sched_group_span(sg)) {
 			if (max_cpu < 0)
 				max_cpu = cpu;
-			else if (sched_asym_prefer(cpu, max_cpu))
+			/*
+			 * We want the CPU priorities unaffected by the idle
+			 * state of its SMT siblings, if any.
+			 */
+			else if (sched_asym_prefer(cpu, max_cpu, false))
 				max_cpu = cpu;
 		}
 		sg->asym_prefer_cpu = max_cpu;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 3/7] sched: Teach arch_asym_cpu_priority() the idle state of SMT siblings
  2022-11-22 20:35 [PATCH v2 0/7] x86/sched: Avoid unnecessary migrations within SMT domains Ricardo Neri
  2022-11-22 20:35 ` [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group Ricardo Neri
  2022-11-22 20:35 ` [PATCH v2 2/7] sched: Prepare sched_asym_prefer() to handle idle state of SMT siblings Ricardo Neri
@ 2022-11-22 20:35 ` Ricardo Neri
  2022-12-06 17:54   ` Dietmar Eggemann
  2022-11-22 20:35 ` [PATCH v2 4/7] sched/fair: Introduce sched_smt_siblings_idle() Ricardo Neri
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 42+ messages in thread
From: Ricardo Neri @ 2022-11-22 20:35 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Some processors (e.g., Intel processors with ITMT) use asym_packing to
balance load between physical cores with SMT. In such a case, a core with
all its SMT siblings idle is more desirable than another with one or more
busy siblings.

Other processors (e.g., Power7 with SMT8) use asym_packing to balance load
among SMT siblings of different priority, regardless of their idle state.

Add a new parameter, check_smt, that architectures can use as needed.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Introduced this patch.
---
 arch/x86/kernel/itmt.c         | 2 +-
 include/linux/sched/topology.h | 2 +-
 kernel/sched/fair.c            | 5 ++++-
 kernel/sched/sched.h           | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index 9ff480e94511..4cb5a5e4fa47 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -167,7 +167,7 @@ void sched_clear_itmt_support(void)
 	mutex_unlock(&itmt_update_mutex);
 }
 
-int arch_asym_cpu_priority(int cpu)
+int arch_asym_cpu_priority(int cpu, bool check_smt)
 {
 	return per_cpu(sched_core_priority, cpu);
 }
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 816df6cc444e..87b64b9776f6 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -63,7 +63,7 @@ static inline int cpu_numa_flags(void)
 }
 #endif
 
-extern int arch_asym_cpu_priority(int cpu);
+extern int arch_asym_cpu_priority(int cpu, bool check_smt);
 
 struct sched_domain_attr {
 	int relax_domain_level;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d18947a9c03e..0e4251f83807 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -142,8 +142,11 @@ __setup("sched_thermal_decay_shift=", setup_sched_thermal_decay_shift);
 #ifdef CONFIG_SMP
 /*
  * For asym packing, by default the lower numbered CPU has higher priority.
+ *
+ * When doing ASYM_PACKING at the "MC" or higher domains, architectures may
+ * want to check the idle state of the SMT siblings of @cpu.
  */
-int __weak arch_asym_cpu_priority(int cpu)
+int __weak arch_asym_cpu_priority(int cpu, bool check_smt)
 {
 	return -cpu;
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0fc7c0130755..e5e52c2e82de 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -802,7 +802,8 @@ static inline long se_weight(struct sched_entity *se)
  */
 static inline bool sched_asym_prefer(int a, int b, bool check_smt)
 {
-	return arch_asym_cpu_priority(a) > arch_asym_cpu_priority(b);
+	return arch_asym_cpu_priority(a, check_smt) >
+	       arch_asym_cpu_priority(b, check_smt);
 }
 
 struct perf_domain {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 4/7] sched/fair: Introduce sched_smt_siblings_idle()
  2022-11-22 20:35 [PATCH v2 0/7] x86/sched: Avoid unnecessary migrations within SMT domains Ricardo Neri
                   ` (2 preceding siblings ...)
  2022-11-22 20:35 ` [PATCH v2 3/7] sched: Teach arch_asym_cpu_priority() the " Ricardo Neri
@ 2022-11-22 20:35 ` Ricardo Neri
  2022-12-06 18:03   ` Dietmar Eggemann
  2022-12-22 16:56   ` Valentin Schneider
  2022-11-22 20:35 ` [PATCH v2 5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain Ricardo Neri
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 42+ messages in thread
From: Ricardo Neri @ 2022-11-22 20:35 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Architectures that implement arch_asym_cpu_priority() may need to know the
idle state of the SMT siblings of a CPU. The scheduler has this information
and functionality. Expose it.

Move the existing functionality outside of the NUMA code.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Introduced this patch.
---
 include/linux/sched.h |  2 ++
 kernel/sched/fair.c   | 39 ++++++++++++++++++++++-----------------
 2 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ffb6eb55cd13..0d01c64ac737 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2426,4 +2426,6 @@ static inline void sched_core_fork(struct task_struct *p) { }
 
 extern void sched_set_stop_task(int cpu, struct task_struct *stop);
 
+extern bool sched_smt_siblings_idle(int cpu);
+
 #endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0e4251f83807..9517c48df50e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1052,6 +1052,28 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
  * Scheduling class queueing methods:
  */
 
+static inline bool is_core_idle(int cpu)
+{
+#ifdef CONFIG_SCHED_SMT
+	int sibling;
+
+	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
+		if (cpu == sibling)
+			continue;
+
+		if (!idle_cpu(sibling))
+			return false;
+	}
+#endif
+
+	return true;
+}
+
+bool sched_smt_siblings_idle(int cpu)
+{
+	return is_core_idle(cpu);
+}
+
 #ifdef CONFIG_NUMA
 #define NUMA_IMBALANCE_MIN 2
 
@@ -1691,23 +1713,6 @@ struct numa_stats {
 	int idle_cpu;
 };
 
-static inline bool is_core_idle(int cpu)
-{
-#ifdef CONFIG_SCHED_SMT
-	int sibling;
-
-	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
-		if (cpu == sibling)
-			continue;
-
-		if (!idle_cpu(sibling))
-			return false;
-	}
-#endif
-
-	return true;
-}
-
 struct task_numa_env {
 	struct task_struct *p;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain
  2022-11-22 20:35 [PATCH v2 0/7] x86/sched: Avoid unnecessary migrations within SMT domains Ricardo Neri
                   ` (3 preceding siblings ...)
  2022-11-22 20:35 ` [PATCH v2 4/7] sched/fair: Introduce sched_smt_siblings_idle() Ricardo Neri
@ 2022-11-22 20:35 ` Ricardo Neri
  2022-12-08 16:03   ` Ionela Voinescu
  2022-11-22 20:35 ` [PATCH v2 6/7] x86/sched/itmt: Give all SMT siblings of a core the same priority Ricardo Neri
  2022-11-22 20:35 ` [PATCH v2 7/7] x86/sched/itmt: Consider the idle state of SMT siblings Ricardo Neri
  6 siblings, 1 reply; 42+ messages in thread
From: Ricardo Neri @ 2022-11-22 20:35 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

There is no difference between any of the SMT siblings of a physical core.
asym_packing load balancing is not needed among siblings.

When balancing load among physical cores, the scheduler now considers the
state of the siblings when checking the priority of a CPU.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Introduced this patch.
---
 arch/x86/kernel/smpboot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3f3ea0287f69..c3de98224cb4 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -545,7 +545,7 @@ static int x86_core_flags(void)
 #ifdef CONFIG_SCHED_SMT
 static int x86_smt_flags(void)
 {
-	return cpu_smt_flags() | x86_sched_itmt_flags();
+	return cpu_smt_flags();
 }
 #endif
 #ifdef CONFIG_SCHED_CLUSTER
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 6/7] x86/sched/itmt: Give all SMT siblings of a core the same priority
  2022-11-22 20:35 [PATCH v2 0/7] x86/sched: Avoid unnecessary migrations within SMT domains Ricardo Neri
                   ` (4 preceding siblings ...)
  2022-11-22 20:35 ` [PATCH v2 5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain Ricardo Neri
@ 2022-11-22 20:35 ` Ricardo Neri
  2022-11-22 20:35 ` [PATCH v2 7/7] x86/sched/itmt: Consider the idle state of SMT siblings Ricardo Neri
  6 siblings, 0 replies; 42+ messages in thread
From: Ricardo Neri @ 2022-11-22 20:35 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Currently, each SMT sibling is given a priority that ensures that, in
partially busy systems, load is evenly spread among cores.

There is, however, no difference in performance among the SMT siblings of
a core.

Having different priorities for each SMT sibling triggers unnecessary
load balancing towards the higher-priority sibling.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Reworded commit message for clarity.
---
 arch/x86/kernel/itmt.c | 23 +++++------------------
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index 4cb5a5e4fa47..2be49ce4f94a 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -174,32 +174,19 @@ int arch_asym_cpu_priority(int cpu, bool check_smt)
 
 /**
  * sched_set_itmt_core_prio() - Set CPU priority based on ITMT
- * @prio:	Priority of cpu core
- * @core_cpu:	The cpu number associated with the core
+ * @prio:	Priority of @cpu
+ * @cpu:	The CPU number
  *
  * The pstate driver will find out the max boost frequency
  * and call this function to set a priority proportional
- * to the max boost frequency. CPU with higher boost
+ * to the max boost frequency. CPUs with higher boost
  * frequency will receive higher priority.
  *
  * No need to rebuild sched domain after updating
  * the CPU priorities. The sched domains have no
  * dependency on CPU priorities.
  */
-void sched_set_itmt_core_prio(int prio, int core_cpu)
+void sched_set_itmt_core_prio(int prio, int cpu)
 {
-	int cpu, i = 1;
-
-	for_each_cpu(cpu, topology_sibling_cpumask(core_cpu)) {
-		int smt_prio;
-
-		/*
-		 * Ensure that the siblings are moved to the end
-		 * of the priority chain and only used when
-		 * all other high priority cpus are out of capacity.
-		 */
-		smt_prio = prio * smp_num_siblings / (i * i);
-		per_cpu(sched_core_priority, cpu) = smt_prio;
-		i++;
-	}
+	per_cpu(sched_core_priority, cpu) = prio;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 7/7] x86/sched/itmt: Consider the idle state of SMT siblings
  2022-11-22 20:35 [PATCH v2 0/7] x86/sched: Avoid unnecessary migrations within SMT domains Ricardo Neri
                   ` (5 preceding siblings ...)
  2022-11-22 20:35 ` [PATCH v2 6/7] x86/sched/itmt: Give all SMT siblings of a core the same priority Ricardo Neri
@ 2022-11-22 20:35 ` Ricardo Neri
  6 siblings, 0 replies; 42+ messages in thread
From: Ricardo Neri @ 2022-11-22 20:35 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Cores with more than one busy SMT sibling need help from fully idle cores,
even if they have lower priority. Return a priority of 0 if any of the SMT
siblings of a CPU are busy. This lets lower-priority idle cores pull tasks
from the CPU.

Only do this if the scheduler cares about the idle state of the siblings.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Introduced this patch.
---
 arch/x86/kernel/itmt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index 2be49ce4f94a..cb22944969a1 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -169,7 +169,10 @@ void sched_clear_itmt_support(void)
 
 int arch_asym_cpu_priority(int cpu, bool check_smt)
 {
-	return per_cpu(sched_core_priority, cpu);
+	if (!check_smt || sched_smt_siblings_idle(cpu))
+		return per_cpu(sched_core_priority, cpu);
+
+	return 0;
 }
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group
  2022-11-22 20:35 ` [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group Ricardo Neri
@ 2022-12-06 17:22   ` Dietmar Eggemann
  2022-12-12 17:53     ` Ricardo Neri
  2022-12-22 16:55   ` Valentin Schneider
  1 sibling, 1 reply; 42+ messages in thread
From: Dietmar Eggemann @ 2022-12-06 17:22 UTC (permalink / raw)
  To: Ricardo Neri, Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On 22/11/2022 21:35, Ricardo Neri wrote:
> When balancing load between two physical cores, an idle destination CPU can
> help another core only if all of its SMT siblings are also idle. Otherwise,
> there is no increase in throughput. It does not matter whether the other
> core is composed of SMT siblings.
> 
> Simply check that no tasks are running on the local group and that the
> other core has exactly one busy CPU before proceeding. Let
> find_busiest_group() handle the case of more than one busy CPU. This
> holds for SMT2, SMT4, SMT8, etc.

[...]

> Changes since v1:
>  * Reworded commit message and inline comments for clarity.
>  * Stated that this changeset does not impact SMT4 or SMT8.
> ---
>  kernel/sched/fair.c | 29 +++++++++--------------------
>  1 file changed, 9 insertions(+), 20 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e4a0b8bd941c..18c672ff39ef 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8900,12 +8900,10 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
>  				    struct sched_group *sg)

I'm not sure why you change asym_smt_can_pull_tasks() together with
removing SD_ASYM_PACKING from SMT level (patch 5/7)?

update_sg_lb_stats()

  ... && env->sd->flags & SD_ASYM_PACKING && .. && sched_asym()
                                                   ^^^^^^^^^^^^
    sched_asym()

      if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
          (group->flags & SD_SHARE_CPUCAPACITY))
        return asym_smt_can_pull_tasks()
               ^^^^^^^^^^^^^^^^^^^^^^^^^

So x86 won't have a sched domain with SD_SHARE_CPUCAPACITY and
SD_ASYM_PACKING anymore. So sched_asym() would call sched_asym_prefer()
directly on MC. What do I miss here?

[...]


* Re: [PATCH v2 3/7] sched: Teach arch_asym_cpu_priority() the idle state of SMT siblings
  2022-11-22 20:35 ` [PATCH v2 3/7] sched: Teach arch_asym_cpu_priority() the " Ricardo Neri
@ 2022-12-06 17:54   ` Dietmar Eggemann
  2022-12-12 17:54     ` Ricardo Neri
  0 siblings, 1 reply; 42+ messages in thread
From: Dietmar Eggemann @ 2022-12-06 17:54 UTC (permalink / raw)
  To: Ricardo Neri, Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On 22/11/2022 21:35, Ricardo Neri wrote:
> Some processors (e.g., Intel processors with ITMT) use asym_packing to
> balance load between physical cores with SMT. In such case, a core with all
> its SMT siblings idle is more desirable than another with one or more busy
> siblings.
> 
> Other processors (e.g., Power7 with SMT8) use asym_packing to balance load
> among SMT siblings of different priority, regardless of their idle state.
> 
> Add a new parameter, check_smt, that architectures can use as needed.

[...]

> Changes since v1:
>  * Introduced this patch.

[...]

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d18947a9c03e..0e4251f83807 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -142,8 +142,11 @@ __setup("sched_thermal_decay_shift=", setup_sched_thermal_decay_shift);
>  #ifdef CONFIG_SMP
>  /*
>   * For asym packing, by default the lower numbered CPU has higher priority.
> + *
> + * When doing ASYM_PACKING at the "MC" or higher domains, architectures may

There is this new CLUSTER level between SMT and MC.

> + * want to check the idle state of the SMT siblngs of @cpu.

s/siblngs/siblings

The scheduler calls sched_asym_prefer(..., true) from
find_busiest_queue(), asym_active_balance() and nohz_balancer_kick()
even from SMT layer on !x86. So I guess a `bool check_smt` wouldn't be
sufficient to distinguish whether sched_smt_siblings_idle() should be
called or not. To me this comment is a little bit misleading. Not an
issue currently, since there is only the x86 override right now.

>   */
> -int __weak arch_asym_cpu_priority(int cpu)
> +int __weak arch_asym_cpu_priority(int cpu, bool check_smt)
>  {
>  	return -cpu;
>  }

[...]



* Re: [PATCH v2 4/7] sched/fair: Introduce sched_smt_siblings_idle()
  2022-11-22 20:35 ` [PATCH v2 4/7] sched/fair: Introduce sched_smt_siblings_idle() Ricardo Neri
@ 2022-12-06 18:03   ` Dietmar Eggemann
  2022-12-12 17:54     ` Ricardo Neri
  2022-12-22 16:56   ` Valentin Schneider
  1 sibling, 1 reply; 42+ messages in thread
From: Dietmar Eggemann @ 2022-12-06 18:03 UTC (permalink / raw)
  To: Ricardo Neri, Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On 22/11/2022 21:35, Ricardo Neri wrote:
> Architectures that implement arch_asym_cpu_priority() may need to know the
> idle state of the SMT siblings of a CPU. The scheduler has this information
> and functionality. Expose it.
> 
> Move the existing functionality outside of the NUMA code.

[...]

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0e4251f83807..9517c48df50e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1052,6 +1052,28 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
>   * Scheduling class queueing methods:
>   */
>  
> +static inline bool is_core_idle(int cpu)
> +{
> +#ifdef CONFIG_SCHED_SMT
> +	int sibling;
> +
> +	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
> +		if (cpu == sibling)
> +			continue;
> +
> +		if (!idle_cpu(sibling))
> +			return false;
> +	}
> +#endif
> +
> +	return true;
> +}
> +
> +bool sched_smt_siblings_idle(int cpu)
> +{
> +	return is_core_idle(cpu);
> +}

Nitpick: Can we not just have one exported function for both use-cases:
NUMA and x86 ITMT?

[...]


* Re: [PATCH v2 5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain
  2022-11-22 20:35 ` [PATCH v2 5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain Ricardo Neri
@ 2022-12-08 16:03   ` Ionela Voinescu
  2022-12-14 16:59     ` Ricardo Neri
  0 siblings, 1 reply; 42+ messages in thread
From: Ionela Voinescu @ 2022-12-08 16:03 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

Hi Ricardo,

On Tuesday 22 Nov 2022 at 12:35:30 (-0800), Ricardo Neri wrote:
> There is no difference between any of the SMT siblings of a physical core.
> asym_packing load balancing is not needed among siblings.
> 
> When balancing load among physical cores, the scheduler now considers the
> state of the siblings when checking the priority of a CPU.
> 
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Len Brown <len.brown@intel.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim C. Chen <tim.c.chen@intel.com>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Changes since v1:
>  * Introduced this patch.
> ---
>  arch/x86/kernel/smpboot.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 3f3ea0287f69..c3de98224cb4 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -545,7 +545,7 @@ static int x86_core_flags(void)
>  #ifdef CONFIG_SCHED_SMT
>  static int x86_smt_flags(void)
>  {
> -	return cpu_smt_flags() | x86_sched_itmt_flags();
> +	return cpu_smt_flags();

Based on:

kernel/sched/topology.c:
sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);

and described at:

include/linux/sched/sd_flags.h:
/*
 * Place busy tasks earlier in the domain
 *
 * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
 *               up, but currently assumed to be set from the base domain
 *               upwards (see update_top_cache_domain()).
 * NEEDS_GROUPS: Load balancing flag.
 */
SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)

doesn't your change result in sd_asym_packing being NULL?

The SD_ASYM_PACKING flag requires all children of a domain to have it set
as well. So having SMT not set the flag, while CLUSTER and MC have it set,
would result in a broken topology, right?

Thanks,
Ionela.

>  }
>  #endif
>  #ifdef CONFIG_SCHED_CLUSTER
> -- 
> 2.25.1
> 
> 


* Re: [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group
  2022-12-06 17:22   ` Dietmar Eggemann
@ 2022-12-12 17:53     ` Ricardo Neri
  2022-12-21 13:03       ` Dietmar Eggemann
  0 siblings, 1 reply; 42+ messages in thread
From: Ricardo Neri @ 2022-12-12 17:53 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On Tue, Dec 06, 2022 at 06:22:41PM +0100, Dietmar Eggemann wrote:
> On 22/11/2022 21:35, Ricardo Neri wrote:
> > When balancing load between two physical cores, an idle destination CPU can
> > help another core only if all of its SMT siblings are also idle. Otherwise,
> > there is no increase in throughput. It does not matter whether the other
> > core is composed of SMT siblings.
> > 
> > Simply check if there are any tasks running on the local group and the
> > other core has exactly one busy CPU before proceeding. Let
> > find_busiest_group() handle the case of more than one busy CPU. This is
> > true for SMT2, SMT4, SMT8, etc.
> 
> [...]

Thank you very much for your feedback, Dietmar!

> 
> > Changes since v1:
> >  * Reworded commit message and inline comments for clarity.
> >  * Stated that this changeset does not impact SMT4 or SMT8.
> > ---
> >  kernel/sched/fair.c | 29 +++++++++--------------------
> >  1 file changed, 9 insertions(+), 20 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index e4a0b8bd941c..18c672ff39ef 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8900,12 +8900,10 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> >  				    struct sched_group *sg)
> 
> I'm not sure why you change asym_smt_can_pull_tasks() together with
> removing SD_ASYM_PACKING from SMT level (patch 5/7)?

On x86 we have SD_ASYM_PACKING at the MC, CLS* and, before my patches, SMT
sched domains.

> 
> update_sg_lb_stats()
> 
>   ... && env->sd->flags & SD_ASYM_PACKING && .. && sched_asym()
>                                                    ^^^^^^^^^^^^
>     sched_asym()
> 
>       if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
>           (group->flags & SD_SHARE_CPUCAPACITY))
>         return asym_smt_can_pull_tasks()
>                ^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> So x86 won't have a sched domain with SD_SHARE_CPUCAPACITY and
> SD_ASYM_PACKING anymore. So sched_asym() would call sched_asym_prefer()
> directly on MC. What do I miss here?

asym_smt_can_pull_tasks() is used above the SMT level *and* when either the
local or sg sched groups are composed of CPUs that are SMT siblings.

In fact, asym_smt_can_pull_tasks() can only be called above the SMT level.
This is because the flags of a sched_group in a sched_domain are equal to
the flags of the child sched_domain. Since SMT is the lowest sched_domain,
its groups' flags are 0.

sched_asym() calls sched_asym_prefer() directly if balancing at the
SMT level and, at higher domains, if the child domain is not SMT.

This meets the requirement of Power7, where SMT siblings have different
priorities; and of x86, where physical cores have different priorities.

Thanks and BR,
Ricardo

* The target of these patches is Intel hybrid processors, on which cluster
  scheduling is disabled - cabdc3a8475b ("sched,x86: Don't use cluster
  topology for x86 hybrid CPUs"). Also, I have not observed topologies in
  which CPUs of the same cluster have different priorities.

  We are also looking into re-enabling cluster scheduling on hybrid
  topologies.


* Re: [PATCH v2 3/7] sched: Teach arch_asym_cpu_priority() the idle state of SMT siblings
  2022-12-06 17:54   ` Dietmar Eggemann
@ 2022-12-12 17:54     ` Ricardo Neri
  2022-12-21 17:12       ` Dietmar Eggemann
  0 siblings, 1 reply; 42+ messages in thread
From: Ricardo Neri @ 2022-12-12 17:54 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On Tue, Dec 06, 2022 at 06:54:39PM +0100, Dietmar Eggemann wrote:
> On 22/11/2022 21:35, Ricardo Neri wrote:
> > Some processors (e.g., Intel processors with ITMT) use asym_packing to
> > balance load between physical cores with SMT. In such case, a core with all
> > its SMT siblings idle is more desirable than another with one or more busy
> > siblings.
> > 
> > Other processors (e.g., Power7 with SMT8) use asym_packing to balance load
> > among SMT siblings of different priority, regardless of their idle state.
> > 
> > Add a new parameter, check_smt, that architectures can use as needed.
> 
> [...]
> 
> > Changes since v1:
> >  * Introduced this patch.
> 
> [...]
> 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index d18947a9c03e..0e4251f83807 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -142,8 +142,11 @@ __setup("sched_thermal_decay_shift=", setup_sched_thermal_decay_shift);
> >  #ifdef CONFIG_SMP
> >  /*
> >   * For asym packing, by default the lower numbered CPU has higher priority.
> > + *
> > + * When doing ASYM_PACKING at the "MC" or higher domains, architectures may
> 
> There is this new CLUSTER level between SMT and MC.

This is a good catch! I will update the comment to say "domains above SMT".

> 
> > + * want to check the idle state of the SMT siblngs of @cpu.
> 
> s/siblngs/siblings
> 
> The scheduler calls sched_asym_prefer(..., true) from
> find_busiest_queue(), asym_active_balance() and nohz_balancer_kick()

In these places we are comparing two specific CPUs, whose siblings' idle
state impacts their throughput and, in consequence, the decision to
attempt load balancing.

In the places where sched_asym_prefer(..., false) is called, we compare the
destination CPU with a CPU that bears the priority of a sched group,
regardless of the idle state of its siblings.

> even from SMT layer on !x86.

This is true, but the default arch_asym_cpu_priority ignores check_smt.

>  So I guess a `bool check_smt` wouldn't be
> sufficient to distinguish whether sched_smt_siblings_idle() should be
> called or not.

But it is the caller who determines whether the idle state of the SMT
siblings of @cpu may be relevant.

> To me this comment is a little bit misleading. Not an
> issue currently, since there is only the x86 override right now.

If my justification makes sense to you, I can expand the comment to state
that the caller decides whether check_smt is needed, but arch-specific
implementations are free to ignore it.

Thanks and BR,
Ricardo


* Re: [PATCH v2 4/7] sched/fair: Introduce sched_smt_siblings_idle()
  2022-12-06 18:03   ` Dietmar Eggemann
@ 2022-12-12 17:54     ` Ricardo Neri
  2022-12-22 11:12       ` Dietmar Eggemann
  0 siblings, 1 reply; 42+ messages in thread
From: Ricardo Neri @ 2022-12-12 17:54 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On Tue, Dec 06, 2022 at 07:03:37PM +0100, Dietmar Eggemann wrote:
> On 22/11/2022 21:35, Ricardo Neri wrote:
> > Architectures that implement arch_asym_cpu_priority() may need to know the
> > idle state of the SMT siblings of a CPU. The scheduler has this information
> > and functionality. Expose it.
> > 
> > Move the existing functionality outside of the NUMA code.
> 
> [...]
> 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 0e4251f83807..9517c48df50e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -1052,6 +1052,28 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >   * Scheduling class queueing methods:
> >   */
> >  
> > +static inline bool is_core_idle(int cpu)
> > +{
> > +#ifdef CONFIG_SCHED_SMT
> > +	int sibling;
> > +
> > +	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
> > +		if (cpu == sibling)
> > +			continue;
> > +
> > +		if (!idle_cpu(sibling))
> > +			return false;
> > +	}
> > +#endif
> > +
> > +	return true;
> > +}
> > +
> > +bool sched_smt_siblings_idle(int cpu)
> > +{
> > +	return is_core_idle(cpu);
> > +}
> 
> Nitpick: Can we not just have one exported function for both use-cases:
> NUMA and x86 ITMT?

By adding a new function I intend to preserve the inlining of is_core_idle()
in update_numa_stats() (via numa_idle_core(), which is also inline). Do you
think there is no value in that?

A downside of having the new function is that now the code is duplicated
in update_numa_stats() and sched_smt_siblings_idle().

I can take your suggestion if losing the inline is OK.

Thanks and BR,
Ricardo


* Re: [PATCH v2 5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain
  2022-12-08 16:03   ` Ionela Voinescu
@ 2022-12-14 16:59     ` Ricardo Neri
  2022-12-15 16:48       ` Valentin Schneider
  0 siblings, 1 reply; 42+ messages in thread
From: Ricardo Neri @ 2022-12-14 16:59 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Thu, Dec 08, 2022 at 04:03:04PM +0000, Ionela Voinescu wrote:
> Hi Ricardo,

Hi Ionela,

Thank you very much for your feedback!

> 
> On Tuesday 22 Nov 2022 at 12:35:30 (-0800), Ricardo Neri wrote:
> > There is no difference between any of the SMT siblings of a physical core.
> > asym_packing load balancing is not needed among siblings.
> > 
> > When balancing load among physical cores, the scheduler now considers the
> > state of the siblings when checking the priority of a CPU.
> > 
> > Cc: Ben Segall <bsegall@google.com>
> > Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Len Brown <len.brown@intel.com>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Tim C. Chen <tim.c.chen@intel.com>
> > Cc: Valentin Schneider <vschneid@redhat.com>
> > Cc: x86@kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> > Changes since v1:
> >  * Introduced this patch.
> > ---
> >  arch/x86/kernel/smpboot.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index 3f3ea0287f69..c3de98224cb4 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -545,7 +545,7 @@ static int x86_core_flags(void)
> >  #ifdef CONFIG_SCHED_SMT
> >  static int x86_smt_flags(void)
> >  {
> > -	return cpu_smt_flags() | x86_sched_itmt_flags();
> > +	return cpu_smt_flags();
> 
> Based on:
> 
> kernel/sched/topology.c:
> sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
> rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
> 
> and described at:
> 
> include/linux/sched/sd_flags.h:
> /*
>  * Place busy tasks earlier in the domain
>  *
>  * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
>  *               up, but currently assumed to be set from the base domain
>  *               upwards (see update_top_cache_domain()).
>  * NEEDS_GROUPS: Load balancing flag.
>  */
> SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
> 
> doesn't your change result in sd_asym_packing being NULL?

Yes. This is a good catch. Thanks!

> 
> The SD_ASYM_PACKING flag requires all children of a domain to have it set
> as well. So having SMT not setting the flag, while CLUSTER and MC having
> set the flag would result in a broken topology, right?

I'd say that highest_flag_domain(..., flag) requires all children to have
`flag`, but clearly the comment you quote allows for SD_ASYM_PACKING to
be located in upper domains.

Perhaps this can be fixed with a variant of highest_flag_domain() that does
not require all children to have the flag?

Thanks and BR,
Ricardo


* Re: [PATCH v2 5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain
  2022-12-14 16:59     ` Ricardo Neri
@ 2022-12-15 16:48       ` Valentin Schneider
  2022-12-20  0:42         ` Ricardo Neri
  0 siblings, 1 reply; 42+ messages in thread
From: Valentin Schneider @ 2022-12-15 16:48 UTC (permalink / raw)
  To: Ricardo Neri, Ionela Voinescu
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On 14/12/22 08:59, Ricardo Neri wrote:
> On Thu, Dec 08, 2022 at 04:03:04PM +0000, Ionela Voinescu wrote:
>> Based on:
>>
>> kernel/sched/topology.c:
>> sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
>> rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
>>
>> and described at:
>>
>> include/linux/sched/sd_flags.h:
>> /*
>>  * Place busy tasks earlier in the domain
>>  *
>>  * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
>>  *               up, but currently assumed to be set from the base domain
>>  *               upwards (see update_top_cache_domain()).
>>  * NEEDS_GROUPS: Load balancing flag.
>>  */
>> SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
>>
>> doesn't your change result in sd_asym_packing being NULL?
>
> Yes. This is a good catch. Thanks!
>

Nice to see those being useful :-) FYI if you run your kernel with
CONFIG_SCHED_DEBUG=y and sched_debug on the cmdline, you should get a
warning at boot time from the topology debug code checking assertions
against those flags.

>>
>> The SD_ASYM_PACKING flag requires all children of a domain to have it set
>> as well. So having SMT not setting the flag, while CLUSTER and MC having
>> set the flag would result in a broken topology, right?
>
> I'd say that highest_flag_domain(..., flag) requires all children to have
> `flag`, but clearly the comment you quote allows for SD_ASYM_PACKING to
> be located in upper domains.
>
> Perhaps this can be fixed with a variant of highest_flag_domain() that does
> not require all children to have the flag?
>

So I gave that flag SDF_SHARED_CHILD because its cached SD pointer was set
up using highest_flag_domain(). Looking for the highest level where it is
set matches how it is used in nohz_balancer_kick(), so you might want a new
helper.

With that said, so far all but one flag (SD_PREFER_SIBLING, and that's
because of big.LITTLE woes) follow the SDF_SHARED_{CHILD, PARENT} pattern;
if SD_ASYM_PACKING no longer does, then we need to think about whether we're
trying to make it do funky things. I need to look at the rest of your
series to get an idea; that unfortunately won't be today, but it's now on my
todo list.

> Thanks and BR,
> Ricardo



* Re: [PATCH v2 5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain
  2022-12-15 16:48       ` Valentin Schneider
@ 2022-12-20  0:42         ` Ricardo Neri
  2022-12-22 16:56           ` Valentin Schneider
  0 siblings, 1 reply; 42+ messages in thread
From: Ricardo Neri @ 2022-12-20  0:42 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ionela Voinescu, Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On Thu, Dec 15, 2022 at 04:48:14PM +0000, Valentin Schneider wrote:
> On 14/12/22 08:59, Ricardo Neri wrote:
> > On Thu, Dec 08, 2022 at 04:03:04PM +0000, Ionela Voinescu wrote:
> >> Based on:
> >>
> >> kernel/sched/topology.c:
> >> sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
> >> rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
> >>
> >> and described at:
> >>
> >> include/linux/sched/sd_flags.h:
> >> /*
> >>  * Place busy tasks earlier in the domain
> >>  *
> >>  * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
> >>  *               up, but currently assumed to be set from the base domain
> >>  *               upwards (see update_top_cache_domain()).
> >>  * NEEDS_GROUPS: Load balancing flag.
> >>  */
> >> SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
> >>
> >> doesn't your change result in sd_asym_packing being NULL?
> >
> > Yes. This is a good catch. Thanks!
> >
> 
> Nice to see those being useful :-) FYI if you run your kernel with
> CONFIG_SCHED_DEBUG=y and sched_debug on the cmdline, you should get a
> warning at boot time from the topology debug code checking assertions
> against those flags.

Thanks! I missed this warning. Indeed, it is there.
> 
> >>
> >> The SD_ASYM_PACKING flag requires all children of a domain to have it set
> >> as well. So having SMT not setting the flag, while CLUSTER and MC having
> >> set the flag would result in a broken topology, right?
> >
> > I'd say that highest_flag_domain(..., flag) requires all children to have
> > `flag`, but clearly the comment you quote allows for SD_ASYM_PACKING to
> > be located in upper domains.
> >
> > Perhaps this can be fixed with a variant of highest_flag_domain() that does
> > not require all children to have the flag?
> >
> 
> So I gave that flag SDF_SHARED_CHILD because its cached SD pointer was set
> up using highest_flag_domain(). Looking for the highest level where it is
> set matches how it is used in nohz_balancer_kick(), so you might want a new
> helper.

Right. I was thinking of a highest_flag_domain_weak() or changing
highest_flag_domain(..., bool shared_child).

> 
> With that said, so far all but one flag (SD_PREFER_SIBLING, and that's
> because of big.LITTLE woes) follow the SDF_SHARED_{CHILD, PARENT} pattern,
> if SD_ASYM_PACKING no longer does then we need to think whether we're
> trying to make it do funky things.

My thesis is that x86 does not need the SD_ASYM_PACKING flag at the SMT
level because all SMT siblings are identical. There are cores of higher
priority at the "MC" level (maybe in the future at the "CLS" level).

Power7 is fine because it only uses SD_ASYM_PACKING at the SMT level.

> I need to look at the rest of your
> series to get an idea, that unfortunately won't be today but it's now in my
> todolist.

Thank you!

BR,
Ricardo


* Re: [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group
  2022-12-12 17:53     ` Ricardo Neri
@ 2022-12-21 13:03       ` Dietmar Eggemann
  2022-12-22  4:32         ` Ricardo Neri
  0 siblings, 1 reply; 42+ messages in thread
From: Dietmar Eggemann @ 2022-12-21 13:03 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On 12/12/2022 18:53, Ricardo Neri wrote:
> On Tue, Dec 06, 2022 at 06:22:41PM +0100, Dietmar Eggemann wrote:
>> On 22/11/2022 21:35, Ricardo Neri wrote:

[...]

>> I'm not sure why you change asym_smt_can_pull_tasks() together with
>> removing SD_ASYM_PACKING from SMT level (patch 5/7)?
> 
> In x86 we have SD_ASYM_PACKING at the MC, CLS* and, before my patches, SMT
> sched domains.
> 
>>
>> update_sg_lb_stats()
>>
>>   ... && env->sd->flags & SD_ASYM_PACKING && .. && sched_asym()
>>                                                    ^^^^^^^^^^^^
>>     sched_asym()
>>
>>       if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
>>           (group->flags & SD_SHARE_CPUCAPACITY))
>>         return asym_smt_can_pull_tasks()
>>                ^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> So x86 won't have a sched domain with SD_SHARE_CPUCAPACITY and
>> SD_ASYM_PACKING anymore. So sched_asym() would call sched_asym_prefer()
>> directly on MC. What do I miss here?
> 
> asym_smt_can_pull_tasks() is used above the SMT level *and* when either the
> local or sg sched groups are composed of CPUs that are SMT siblings.

OK.

> In fact, asym_smt_can_pull_tasks() can only be called above the SMT level.
> This is because the flags of a sched_group in a sched_domain are equal to
> the flags of the child sched_domain. Since SMT is the lowest sched_domain,
> its groups' flags are 0.

I see. I forgot about `[PATCH v5 0/6] sched/fair: Fix load balancing of
SMT siblings with ASYM_PACKING` from Sept 21 (specifically [PATCH v5
2/6] sched/topology: Introduce sched_group::flags).

> sched_asym() calls sched_asym_prefer() directly if balancing at the
> SMT level and, at higher domains, if the child domain is not SMT.

OK.

> This meets the requirement of Power7, where SMT siblings have different
> priorities; and of x86, where physical cores have different priorities.
> 
> Thanks and BR,
> Ricardo
> 
> * The target of these patches is Intel hybrid processors, on which cluster
>   scheduling is disabled - cabdc3a8475b ("sched,x86: Don't use cluster
>   topology for x86 hybrid CPUs"). Also, I have not observed topologies in
>   which CPUs of the same cluster have different priorities.

OK.

IMHO, the function header of asym_smt_can_pull_tasks() (3rd and 4th
paragraphs, `If both @dst_cpu and @sg have SMT siblings` and `If @sg
does not have SMT siblings`) still reflects the old code layout.


* Re: [PATCH v2 3/7] sched: Teach arch_asym_cpu_priority() the idle state of SMT siblings
  2022-12-12 17:54     ` Ricardo Neri
@ 2022-12-21 17:12       ` Dietmar Eggemann
  2022-12-22  4:55         ` Ricardo Neri
  0 siblings, 1 reply; 42+ messages in thread
From: Dietmar Eggemann @ 2022-12-21 17:12 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On 12/12/2022 18:54, Ricardo Neri wrote:
> On Tue, Dec 06, 2022 at 06:54:39PM +0100, Dietmar Eggemann wrote:
>> On 22/11/2022 21:35, Ricardo Neri wrote:

[...]

>>> + * want to check the idle state of the SMT siblngs of @cpu.
>>
>> s/siblngs/siblings
>>
>> The scheduler calls sched_asym_prefer(..., true) from
>> find_busiest_queue(), asym_active_balance() and nohz_balancer_kick()
> 
> In these places we are comparing two specific CPUs, whose siblings' idle
> state impacts their throughput and, in consequence, the decision to
> attempt to balance load.
> 
> In the places where sched_asym_prefer(..., false) is called we compare the
> destination CPU with a CPU that bears the priority of a sched group,
> regardless of the idle state of its siblings.

OK.

>> even from SMT layer on !x86.
> 
> This is true, but the default arch_asym_cpu_priority ignores check_smt.

True.

>>  So I guess a `bool check_smt` wouldn't be
>> sufficient to distinguish whether sched_smt_siblings_idle() should be
>> called or not.
> 
> But it is the caller who determines whether the idle state of the SMT
> siblings of @cpu may be relevant.

I assume the caller here is the task scheduler. Callers with
`check_smt=true` can be called from any SD level with SD_ASYM_PACKING.

Imagine an arch w/ SD_ASYM_PACKING on SMT & MC overwriting
arch_asym_cpu_priority(). `bool check_smt` wouldn't be sufficient to
know whether a call to something like sched_smt_siblings_idle()
(is_core_idle()) which iterates over cpu_smt_mask(cpu) would make sense.
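
For reference, the iteration in question can be modeled outside the
kernel. The arrays below are illustrative stand-ins for cpu_smt_mask()
and idle_cpu(), not kernel APIs:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the kernel's is_core_idle(): the core of @cpu counts as
 * idle when every SMT sibling of @cpu, other than @cpu itself, is idle.
 * smt_mask[] stands in for cpu_smt_mask(cpu), cpu_idle[] for idle_cpu().
 */
static bool core_is_idle(int cpu, const int *smt_mask, int nr_siblings,
			 const bool *cpu_idle)
{
	for (int i = 0; i < nr_siblings; i++) {
		int sibling = smt_mask[i];

		if (sibling == cpu)
			continue;

		if (!cpu_idle[sibling])
			return false;
	}

	return true;
}
```

Note that @cpu itself may be busy; only the idle state of the other
siblings matters.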

>> To me this comment is a little bit misleading. Not an
>> issue currently since there is only the x86 overwrite right now.
> 
> If my justification makes sense to you, I can expand the comment to state
> that the caller decides whether check_smt is needed but arch-specific
> implementations are free to ignore it.

Not a big issue but to me if the task scheduler asks for `bool
check_smt` then archs would have to check to guarantee common behaviour.
And the meaning of `bool check_smt` on SMT is unclear to me.
Since only x86 would use this so far it can be adapted later for others
if needed.

[...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group
  2022-12-21 13:03       ` Dietmar Eggemann
@ 2022-12-22  4:32         ` Ricardo Neri
  2022-12-22 11:12           ` Dietmar Eggemann
  0 siblings, 1 reply; 42+ messages in thread
From: Ricardo Neri @ 2022-12-22  4:32 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On Wed, Dec 21, 2022 at 02:03:15PM +0100, Dietmar Eggemann wrote:
> On 12/12/2022 18:53, Ricardo Neri wrote:
> > On Tue, Dec 06, 2022 at 06:22:41PM +0100, Dietmar Eggemann wrote:
> >> On 22/11/2022 21:35, Ricardo Neri wrote:
> 
> [...]
> 
> >> I'm not sure why you change asym_smt_can_pull_tasks() together with
> >> removing SD_ASYM_PACKING from SMT level (patch 5/7)?
> > 
> > In x86 we have SD_ASYM_PACKING at the MC, CLS* and, before my patches, SMT
> > sched domains.
> > 
> >>
> >> update_sg_lb_stats()
> >>
> >>   ... && env->sd->flags & SD_ASYM_PACKING && .. && sched_asym()
> >>                                                    ^^^^^^^^^^^^
> >>     sched_asym()
> >>
> >>       if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
> >>           (group->flags & SD_SHARE_CPUCAPACITY))
> >>         return asym_smt_can_pull_tasks()
> >>                ^^^^^^^^^^^^^^^^^^^^^^^^^
> >>
> >> So x86 won't have a sched domain with SD_SHARE_CPUCAPACITY and
> >> SD_ASYM_PACKING anymore. So sched_asym() would call sched_asym_prefer()
> >> directly on MC. What do I miss here?
> > 
> > asym_smt_can_pull_tasks() is used above the SMT level *and* when either the
> > local or sg sched groups are composed of CPUs that are SMT siblings.
> 
> OK.
> 
> > In fact, asym_smt_can_pull_tasks() can only be called above the SMT level.
> > This is because the flags of a sched_group in a sched_domain are equal to
> > the flags of the child sched_domain. Since SMT is the lowest sched_domain,
> > its groups' flags are 0.
> 
> I see. I forgot about `[PATCH v5 0/6] sched/fair: Fix load balancing of
> SMT siblings with ASYM_PACKING` from Sept 21 (specifically [PATCH v5
> 2/6] sched/topology: Introduce sched_group::flags).
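
As a reference for this point, the flag inheritance can be sketched with
a toy model; struct toy_domain and toy_group_flags() are illustrative
stand-ins, not the kernel's actual sched_domain/sched_group types:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SD_SHARE_CPUCAPACITY 0x1	/* stands in for SD_SHARE_CPUCAPACITY */

struct toy_domain {
	int flags;
	struct toy_domain *child;	/* NULL at the bottom (SMT) level */
};

/*
 * A sched_group's flags are copied from the child sched_domain, so the
 * groups of the lowest (SMT) domain carry no flags at all. This is why
 * a group with SD_SHARE_CPUCAPACITY set can only be seen above SMT.
 */
static int toy_group_flags(const struct toy_domain *sd)
{
	return sd->child ? sd->child->flags : 0;
}
```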
> 
> > sched_asym() calls sched_asym_prefer() directly if balancing at the
> > SMT level and, at higher domains, if the child domain is not SMT.
> 
> OK.
> 
> > This meets the requirement of Power7, where SMT siblings have different
> > priorities; and of x86, where physical cores have different priorities.
> > 
> > Thanks and BR,
> > Ricardo
> > 
> > * The target of these patches is Intel hybrid processors, on which cluster
> >   scheduling is disabled - cabdc3a8475b ("sched,x86: Don't use cluster
> >   topology for x86 hybrid CPUs"). Also, I have not observed topologies in
> >   which CPUs of the same cluster have different priorities.
> 
> OK.
> 
> IMHO, the function header of asym_smt_can_pull_tasks() (3rd and 4th
> paragraph ...  `If both @dst_cpu and @sg have SMT siblings` and

Agreed. I changed the behavior of the function. I will update the
description.

>`If @sg does not have SMT siblings` still reflect the old code layout.

But this behavior did not change. The check covers both SMT and non-SMT
cases:

	 /*
	  * non-SMT @sg can only have 1 busy CPU. We only care if an SMT @sg
	  * has exactly one busy sibling.
	  */
	if (sg_busy_cpus == 1 && 
	    /* local group is fully idle, SMT and non-SMT. */
	    !sds->local_stat.sum_nr_running)

Perhaps I can collapse the two paragraphs into one.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 3/7] sched: Teach arch_asym_cpu_priority() the idle state of SMT siblings
  2022-12-21 17:12       ` Dietmar Eggemann
@ 2022-12-22  4:55         ` Ricardo Neri
  2022-12-22 16:56           ` Valentin Schneider
  0 siblings, 1 reply; 42+ messages in thread
From: Ricardo Neri @ 2022-12-22  4:55 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On Wed, Dec 21, 2022 at 06:12:52PM +0100, Dietmar Eggemann wrote:
> On 12/12/2022 18:54, Ricardo Neri wrote:
> > On Tue, Dec 06, 2022 at 06:54:39PM +0100, Dietmar Eggemann wrote:
> >> On 22/11/2022 21:35, Ricardo Neri wrote:
> 
> [...]
> 
> >>> + * want to check the idle state of the SMT siblngs of @cpu.
> >>
> >> s/siblngs/siblings
> >>
> >> The scheduler calls sched_asym_prefer(..., true) from
> >> find_busiest_queue(), asym_active_balance() and nohz_balancer_kick()
> > 
> > In these places we are comparing two specific CPUs, whose siblings' idle
> > state impacts their throughput and, in consequence, the decision to
> > attempt to balance load.
> > 
> > In the places where sched_asym_prefer(..., false) is called we compare the
> > destination CPU with a CPU that bears the priority of a sched group,
> > regardless of the idle state of its siblings.
> 
> OK.
> 
> >> even from SMT layer on !x86.
> > 
> > This is true, but the default arch_asym_cpu_priority ignores check_smt.
> 
> True.
> 
> >>  So I guess a `bool check_smt` wouldn't be
> >> sufficient to distinguish whether sched_smt_siblings_idle() should be
> >> called or not.
> > 
> > But it is the caller who determines whether the idle state of the SMT
> > siblings of @cpu may be relevant.
> 
> I assume the caller here is the task scheduler.

Yes.

> Callers with `check_smt=true` can be called from any SD level with
> SD_ASYM_PACKING.

This is true.

> 
> Imagine an arch w/ SD_ASYM_PACKING on SMT & MC overwriting
> arch_asym_cpu_priority(). `bool check_smt` wouldn't be sufficient to
> know whether a call to something like sched_smt_siblings_idle()
> (is_core_idle()) which iterates over cpu_smt_mask(cpu) would make sense.

Agreed. I was hoping I could get away with this. x86 would not have
the SD_ASYM_PACKING flag at the SMT level and Power7 would ignore
`check_smt`. :)

Callers of sched_asym_prefer() could use the flags of the sched domain to
check if we are at the SMT level.

I rescanned the code again and it looks like the needed sched domain flags
are available in all the places sched_asym_prefer() is called. The only
exception is asym_smt_can_pull_tasks(), but we already know that we don't
need such a check. (We are looking for the sched group priority, regardless
of the idle state of the SMT siblings).

> 
> >> To me this comment is a little bit misleading. Not an
> >> issue currently since there is only the x86 overwrite right now.
> > 
> > If my justification makes sense to you, I can expand the comment to state
> > that the caller decides whether check_smt is needed but arch-specific
> > implementations are free to ignore it.
> 
> Not a big issue but to me if the task scheduler asks for `bool
> check_smt` then archs would have to check to guarantee common behaviour.
> And the meaning of `bool check_smt` on SMT is unclear to me.
> Since only x86 would use this so far it can be adapted later for others
> if needed.

What is proposed in my previous paragraph should solve this, IMO.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group
  2022-12-22  4:32         ` Ricardo Neri
@ 2022-12-22 11:12           ` Dietmar Eggemann
  2022-12-23 13:11             ` Ricardo Neri
  0 siblings, 1 reply; 42+ messages in thread
From: Dietmar Eggemann @ 2022-12-22 11:12 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On 22/12/2022 05:32, Ricardo Neri wrote:
> On Wed, Dec 21, 2022 at 02:03:15PM +0100, Dietmar Eggemann wrote:
>> On 12/12/2022 18:53, Ricardo Neri wrote:
>>> On Tue, Dec 06, 2022 at 06:22:41PM +0100, Dietmar Eggemann wrote:
>>>> On 22/11/2022 21:35, Ricardo Neri wrote:
>>
>> [...]
>>
>>>> I'm not sure why you change asym_smt_can_pull_tasks() together with
>>>> removing SD_ASYM_PACKING from SMT level (patch 5/7)?
>>>
>>> In x86 we have SD_ASYM_PACKING at the MC, CLS* and, before my patches, SMT
>>> sched domains.
>>>
>>>>
>>>> update_sg_lb_stats()
>>>>
>>>>   ... && env->sd->flags & SD_ASYM_PACKING && .. && sched_asym()
>>>>                                                    ^^^^^^^^^^^^
>>>>     sched_asym()
>>>>
>>>>       if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
>>>>           (group->flags & SD_SHARE_CPUCAPACITY))
>>>>         return asym_smt_can_pull_tasks()
>>>>                ^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>
>>>> So x86 won't have a sched domain with SD_SHARE_CPUCAPACITY and
>>>> SD_ASYM_PACKING anymore. So sched_asym() would call sched_asym_prefer()
>>>> directly on MC. What do I miss here?
>>>
>>> asym_smt_can_pull_tasks() is used above the SMT level *and* when either the
>>> local or sg sched groups are composed of CPUs that are SMT siblings.
>>
>> OK.
>>
>>> In fact, asym_smt_can_pull_tasks() can only be called above the SMT level.
>>> This is because the flags of a sched_group in a sched_domain are equal to
>>> the flags of the child sched_domain. Since SMT is the lowest sched_domain,
>>> its groups' flags are 0.
>>
>> I see. I forgot about `[PATCH v5 0/6] sched/fair: Fix load balancing of
>> SMT siblings with ASYM_PACKING` from Sept 21 (specifically [PATCH v5
>> 2/6] sched/topology: Introduce sched_group::flags).
>>
>>> sched_asym() calls sched_asym_prefer() directly if balancing at the
>>> SMT level and, at higher domains, if the child domain is not SMT.
>>
>> OK.
>>
>>> This meets the requirement of Power7, where SMT siblings have different
>>> priorities; and of x86, where physical cores have different priorities.
>>>
>>> Thanks and BR,
>>> Ricardo
>>>
>>> * The target of these patches is Intel hybrid processors, on which cluster
>>>   scheduling is disabled - cabdc3a8475b ("sched,x86: Don't use cluster
>>>   topology for x86 hybrid CPUs"). Also, I have not observed topologies in
>>>   which CPUs of the same cluster have different priorities.
>>
>> OK.
>>
>> IMHO, the function header of asym_smt_can_pull_tasks() (3rd and 4th
>> paragraph ...  `If both @dst_cpu and @sg have SMT siblings` and
> 
> Agreed. I changed the behavior of the function. I will update the
> description.
> 
>> `If @sg does not have SMT siblings` still reflect the old code layout.
> 
> But this behavior did not change. The check covers both SMT and non-SMT
> cases:

The condition to call sched_asym_prefer() seems to have changed slightly
though (including the explanation that busy_cpus_delta >= 2 handling
should be done by fbg()):

sds->local_stat.sum_nr_running (A)
busy_cpus_delta = sg_busy_cpus - local_busy_cpus (B)
sg_busy_cpus = sgs->group_weight - sgs->idle_cpus (C)

From ((smt && B == 1) || (!smt && !A)) to (C == 1 && !A)

> 
> 	 /*
> 	  * non-SMT @sg can only have 1 busy CPU. We only care SMT @sg
> 	  * has exactly one busy sibling
> 	  */
> 	if (sg_busy_cpus == 1 && 
> 	    /* local group is fully idle, SMT and non-SMT. */
> 	    !sds->local_stat.sum_nr_running)
> 
> Perhaps I can collapse the two paragraphs into one.

Sounds good to me.

[...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 4/7] sched/fair: Introduce sched_smt_siblings_idle()
  2022-12-12 17:54     ` Ricardo Neri
@ 2022-12-22 11:12       ` Dietmar Eggemann
  0 siblings, 0 replies; 42+ messages in thread
From: Dietmar Eggemann @ 2022-12-22 11:12 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On 12/12/2022 18:54, Ricardo Neri wrote:
> On Tue, Dec 06, 2022 at 07:03:37PM +0100, Dietmar Eggemann wrote:
>> On 22/11/2022 21:35, Ricardo Neri wrote:

[...]

>>> +bool sched_smt_siblings_idle(int cpu)
>>> +{
>>> +	return is_core_idle(cpu);
>>> +}
>>
>> Nitpick: Can we not just have one exported function for both use-cases:
>> NUMA and x86 ITMT?
> 
> By adding a new function I intend to preserve the inlining of is_core_idle()
> in update_numa_stats() (via numa_idle_core(), which is also inline). Do you
> think there is no value?

OK. It's only used in NUMA balancing (task_numa_fault() -> ... ->
update_numa_stats()). I can't see that this will have a noticeable perf
impact, but only a benchmark can really tell.

A `static inline bool sched_is_core_idle(int cpu)` via
include/linux/sched/topology.h might work? We have similar functions
(like `sched_core_cookie_match()`), though in the private scheduler header
file kernel/sched/sched.h.

> A downside of having the new function is that now the code is duplicated
> in update_numa_stats() and sched_smt_siblings_idle().
> 
> I can take your suggestion if losing the inline is OK.

I doubt that it will have an impact but can't be sure.

[...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group
  2022-11-22 20:35 ` [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group Ricardo Neri
  2022-12-06 17:22   ` Dietmar Eggemann
@ 2022-12-22 16:55   ` Valentin Schneider
  2022-12-29  4:00     ` Ricardo Neri
  1 sibling, 1 reply; 42+ messages in thread
From: Valentin Schneider @ 2022-12-22 16:55 UTC (permalink / raw)
  To: Ricardo Neri, Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Ricardo Neri,
	Tim C . Chen

On 22/11/22 12:35, Ricardo Neri wrote:
> @@ -8926,25 +8924,16 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
>               return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
>       }
>
> -	/* @dst_cpu has SMT siblings. */
> -
> -	if (sg_is_smt) {
> -		int local_busy_cpus = sds->local->group_weight -
> -				      sds->local_stat.idle_cpus;
> -		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
> -
> -		if (busy_cpus_delta == 1)
> -			return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> -
> -		return false;
> -	}
> -
>       /*
> -	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
> -	 * up with more than one busy SMT sibling and only pull tasks if there
> -	 * are not busy CPUs (i.e., no CPU has running tasks).
> +	 * @dst_cpu has SMT siblings. Do asym_packing load balancing only if
> +	 * all its siblings are idle (moving tasks between physical cores in
> +	 * which some SMT siblings are busy results in the same throughput).
> +	 *
> +	 * If the difference in the number of busy CPUs is two or more, let
> +	 * find_busiest_group() take care of it. We only care if @sg has
> +	 * exactly one busy CPU. This covers SMT and non-SMT sched groups.
>        */
> -	if (!sds->local_stat.sum_nr_running)
> +	if (sg_busy_cpus == 1 && !sds->local_stat.sum_nr_running)
>               return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
>

Some of this is new to me - I had missed the original series introducing
this. However ISTM that this is conflating two concepts: SMT occupancy
balancing, and asym packing.

Take the !local_is_smt :: sg_busy_cpus >= 2 :: return true; path. It does
not involve asym packing priorities at all. This can end up in an
ASYM_PACKING load balance, which per calculate_imbalance() tries to move
*all* tasks to the higher priority CPU - in the case of SMT balancing,
we don't want to totally empty the core, just its siblings.

Is there an ITMT/big.LITTLE (or however x86 calls it) case that invalidates
the above?

Say, what's not sufficient with the below? AFAICT the only thing that isn't
covered is the sg_busy_cpus >= 2 thing, but IMO that's SMT balancing, not
asym packing - if the current calculate_imbalance() doesn't do it, it
should be fixed to do it. Looking at the

  local->group_type == group_has_spare

case, it looks like it should DTRT.

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 224107278471f..15eb2d3cff186 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9176,12 +9176,15 @@ static inline bool
 sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
 	   struct sched_group *group)
 {
-	/* Only do SMT checks if either local or candidate have SMT siblings */
-	if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
-	    (group->flags & SD_SHARE_CPUCAPACITY))
-		return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
+	/*
+	 * For SMT, env->idle != CPU_NOT_IDLE isn't sufficient, we need to make
+	 * sure the whole core is idle.
+	 */
+	if (((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
+	     (group->flags & SD_SHARE_CPUCAPACITY)) &&
+	    !test_idle_cores(env->dst_cpu))
+		return false;
 
-	/* Neither env::dst_cpu nor group::asym_prefer_cpu have SMT siblings. */
 	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu, false);
 }
 
 


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 3/7] sched: Teach arch_asym_cpu_priority() the idle state of SMT siblings
  2022-12-22  4:55         ` Ricardo Neri
@ 2022-12-22 16:56           ` Valentin Schneider
  0 siblings, 0 replies; 42+ messages in thread
From: Valentin Schneider @ 2022-12-22 16:56 UTC (permalink / raw)
  To: Ricardo Neri, Dietmar Eggemann
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	x86, linux-kernel, Tim C . Chen

On 21/12/22 20:55, Ricardo Neri wrote:
> On Wed, Dec 21, 2022 at 06:12:52PM +0100, Dietmar Eggemann wrote:
>> Imagine an arch w/ SD_ASYM_PACKING on SMT & MC overwriting
>> arch_asym_cpu_priority(). `bool check_smt` wouldn't be sufficient to
>> know whether a call to something like sched_smt_siblings_idle()
>> (is_core_idle()) which iterates over cpu_smt_mask(cpu) would make sense.
>
> Agreed. I was hoping I could get away with this. x86 would not have
> the SD_ASYM_PACKING flag at the SMT level and Power7 would ignore
> `check_smt`. :)
>
> Callers of sched_asym_prefer() could use the flags of the sched domain to
> check if we are at the SMT level.
>
> I rescanned the code again and it looks like the needed sched domain flags
> are available in all the places sched_asym_prefer() is called. The only
> exception is asym_smt_can_pull_tasks(), but we already know that we don't
> need such a check. (We are looking for the sched group priority, regardless
> of the idle state of the SMT siblings).
>

Given this is fed back to arch code, another option here would be to feed
the topology level this is happening at. You get it via sd->level and it
maps back into the arch's sched_domain_topology_level array.

Though the SD flags are probably the more generic solution.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 4/7] sched/fair: Introduce sched_smt_siblings_idle()
  2022-11-22 20:35 ` [PATCH v2 4/7] sched/fair: Introduce sched_smt_siblings_idle() Ricardo Neri
  2022-12-06 18:03   ` Dietmar Eggemann
@ 2022-12-22 16:56   ` Valentin Schneider
  2022-12-24  5:28     ` Ricardo Neri
  1 sibling, 1 reply; 42+ messages in thread
From: Valentin Schneider @ 2022-12-22 16:56 UTC (permalink / raw)
  To: Ricardo Neri, Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Ricardo Neri,
	Tim C . Chen

On 22/11/22 12:35, Ricardo Neri wrote:
> Architectures that implement arch_asym_cpu_priority() may need to know the
> idle state of the SMT siblings of a CPU. The scheduler has this information
> and functionality. Expose it.
>
> Move the existing functionality outside of the NUMA code.
>

test_idle_cores() does something similar without an iteration; did you
consider using that instead?

> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Len Brown <len.brown@intel.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim C. Chen <tim.c.chen@intel.com>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Changes since v1:
>  * Introduced this patch.
> ---
>  include/linux/sched.h |  2 ++
>  kernel/sched/fair.c   | 39 ++++++++++++++++++++++-----------------
>  2 files changed, 24 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index ffb6eb55cd13..0d01c64ac737 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2426,4 +2426,6 @@ static inline void sched_core_fork(struct task_struct *p) { }
>  
>  extern void sched_set_stop_task(int cpu, struct task_struct *stop);
>  
> +extern bool sched_smt_siblings_idle(int cpu);
> +
>  #endif
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0e4251f83807..9517c48df50e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1052,6 +1052,28 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
>   * Scheduling class queueing methods:
>   */
>  
> +static inline bool is_core_idle(int cpu)
> +{
> +#ifdef CONFIG_SCHED_SMT
> +	int sibling;
> +
> +	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
> +		if (cpu == sibling)
> +			continue;
> +
> +		if (!idle_cpu(sibling))
> +			return false;
> +	}
> +#endif
> +
> +	return true;
> +}
> +
> +bool sched_smt_siblings_idle(int cpu)
> +{
> +	return is_core_idle(cpu);
> +}
> +
>  #ifdef CONFIG_NUMA
>  #define NUMA_IMBALANCE_MIN 2
>  
> @@ -1691,23 +1713,6 @@ struct numa_stats {
>  	int idle_cpu;
>  };
>  
> -static inline bool is_core_idle(int cpu)
> -{
> -#ifdef CONFIG_SCHED_SMT
> -	int sibling;
> -
> -	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
> -		if (cpu == sibling)
> -			continue;
> -
> -		if (!idle_cpu(sibling))
> -			return false;
> -	}
> -#endif
> -
> -	return true;
> -}
> -
>  struct task_numa_env {
>  	struct task_struct *p;
>  
> -- 
> 2.25.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain
  2022-12-20  0:42         ` Ricardo Neri
@ 2022-12-22 16:56           ` Valentin Schneider
  2022-12-29 19:02             ` Ricardo Neri
  0 siblings, 1 reply; 42+ messages in thread
From: Valentin Schneider @ 2022-12-22 16:56 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ionela Voinescu, Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On 19/12/22 16:42, Ricardo Neri wrote:
> On Thu, Dec 15, 2022 at 04:48:14PM +0000, Valentin Schneider wrote:
>> With that said, so far all but one flag (SD_PREFER_SIBLING, and that's
>> because of big.LITTLE woes) follow the SDF_SHARED_{CHILD, PARENT} pattern,
>> if SD_ASYM_PACKING no longer does then we need to think whether we're
>> trying to make it do funky things.
>
> My thesis is that x86 does not need the SD_ASYM_PACKING flag at the SMT
> level because all SMT siblings are identical. There are cores of higher
> priority at the "MC" level (maybe in the future at the "CLS" level).
>
> Power7 is fine because it only uses SD_ASYM_PACKING at the SMT level.
>

So with what I groked from your series, I agree with you, x86 shouldn't
need it at SMT level.

What about the below?

---

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 57bde66d95f7a..8dc16942135b4 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -132,12 +132,12 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
 /*
  * Place busy tasks earlier in the domain
  *
- * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
- *               up, but currently assumed to be set from the base domain
- *               upwards (see update_top_cache_domain()).
+ * SHARED_PARENT: Usually set on the SMT level. Can be set further up if all
+ *                siblings of an SMT core are identical, but SMT cores themselves
+ *                have different priorities.
  * NEEDS_GROUPS: Load balancing flag.
  */
-SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
+SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
 
 /*
  * Prefer to place tasks in a sibling domain
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b1d338a740e56..2d532e29373b1 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1772,6 +1772,19 @@ queue_balance_callback(struct rq *rq,
 	for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); \
 			__sd; __sd = __sd->parent)
 
+static inline struct sched_domain *__highest_flag_domain(struct sched_domain *sd, int flag)
+{
+	struct sched_domain *hsd = NULL;
+
+	for (; sd; sd = sd->parent) {
+		if (!(sd->flags & flag))
+			break;
+		hsd = sd;
+	}
+
+	return hsd;
+}
+
 /**
  * highest_flag_domain - Return highest sched_domain containing flag.
  * @cpu:	The CPU whose highest level of sched domain is to
@@ -1783,15 +1796,7 @@ queue_balance_callback(struct rq *rq,
  */
 static inline struct sched_domain *highest_flag_domain(int cpu, int flag)
 {
-	struct sched_domain *sd, *hsd = NULL;
-
-	for_each_domain(cpu, sd) {
-		if (!(sd->flags & flag))
-			break;
-		hsd = sd;
-	}
-
-	return hsd;
+	return __highest_flag_domain(rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd), flag);
 }
 
 static inline struct sched_domain *lowest_flag_domain(int cpu, int flag)
@@ -1806,6 +1811,16 @@ static inline struct sched_domain *lowest_flag_domain(int cpu, int flag)
 	return sd;
 }
 
+static inline struct sched_domain *highest_parent_flag_domain(int cpu, int flag)
+{
+	struct sched_domain *sd;
+
+	SCHED_WARN_ON(!(sd_flag_debug[ilog2(flag)].meta_flags & SDF_SHARED_PARENT));
+
+	sd = lowest_flag_domain(cpu, flag);
+	return __highest_flag_domain(sd, flag);
+}
+
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DECLARE_PER_CPU(int, sd_llc_size);
 DECLARE_PER_CPU(int, sd_llc_id);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8154ef590b9f8..4e0e5b27c331b 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -692,7 +692,7 @@ static void update_top_cache_domain(int cpu)
 	sd = lowest_flag_domain(cpu, SD_NUMA);
 	rcu_assign_pointer(per_cpu(sd_numa, cpu), sd);
 
-	sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
+	sd = highest_parent_flag_domain(cpu, SD_ASYM_PACKING);
 	rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
 
 	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group
  2022-12-22 11:12           ` Dietmar Eggemann
@ 2022-12-23 13:11             ` Ricardo Neri
  0 siblings, 0 replies; 42+ messages in thread
From: Ricardo Neri @ 2022-12-23 13:11 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On Thu, Dec 22, 2022 at 12:12:00PM +0100, Dietmar Eggemann wrote:
> On 22/12/2022 05:32, Ricardo Neri wrote:
> > On Wed, Dec 21, 2022 at 02:03:15PM +0100, Dietmar Eggemann wrote:
> >> On 12/12/2022 18:53, Ricardo Neri wrote:
> >>> On Tue, Dec 06, 2022 at 06:22:41PM +0100, Dietmar Eggemann wrote:
> >>>> On 22/11/2022 21:35, Ricardo Neri wrote:
> >>
> >> [...]
> >>
> >>>> I'm not sure why you change asym_smt_can_pull_tasks() together with
> >>>> removing SD_ASYM_PACKING from SMT level (patch 5/7)?
> >>>
> >>> In x86 we have SD_ASYM_PACKING at the MC, CLS* and, before my patches, SMT
> >>> sched domains.
> >>>
> >>>>
> >>>> update_sg_lb_stats()
> >>>>
> >>>>   ... && env->sd->flags & SD_ASYM_PACKING && .. && sched_asym()
> >>>>                                                    ^^^^^^^^^^^^
> >>>>     sched_asym()
> >>>>
> >>>>       if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
> >>>>           (group->flags & SD_SHARE_CPUCAPACITY))
> >>>>         return asym_smt_can_pull_tasks()
> >>>>                ^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>>
> >>>> So x86 won't have a sched domain with SD_SHARE_CPUCAPACITY and
> >>>> SD_ASYM_PACKING anymore. So sched_asym() would call sched_asym_prefer()
> >>>> directly on MC. What do I miss here?
> >>>
> >>> asym_smt_can_pull_tasks() is used above the SMT level *and* when either the
> >>> local or sg sched groups are composed of CPUs that are SMT siblings.
> >>
> >> OK.
> >>
> >>> In fact, asym_smt_can_pull_tasks() can only be called above the SMT level.
> >>> This is because the flags of a sched_group in a sched_domain are equal to
> >>> the flags of the child sched_domain. Since SMT is the lowest sched_domain,
> >>> its groups' flags are 0.
> >>
> >> I see. I forgot about `[PATCH v5 0/6] sched/fair: Fix load balancing of
> >> SMT siblings with ASYM_PACKING` from Sept 21 (specifically [PATCH v5
> >> 2/6] sched/topology: Introduce sched_group::flags).
> >>
> >>> sched_asym() calls sched_asym_prefer() directly if balancing at the
> >>> SMT level and, at higher domains, if the child domain is not SMT.
> >>
> >> OK.
> >>
> >>> This meets the requirement of Power7, where SMT siblings have different
> >>> priorities; and of x86, where physical cores have different priorities.
> >>>
> >>> Thanks and BR,
> >>> Ricardo
> >>>
> >>> * The target of these patches is Intel hybrid processors, on which cluster
> >>>   scheduling is disabled - cabdc3a8475b ("sched,x86: Don't use cluster
> >>>   topology for x86 hybrid CPUs"). Also, I have not observed topologies in
> >>>   which CPUs of the same cluster have different priorities.
> >>
> >> OK.
> >>
> >> IMHO, the function header of asym_smt_can_pull_tasks() (3rd and 4th
> >> paragraph ...  `If both @dst_cpu and @sg have SMT siblings` and
> > 
> > Agreed. I changed the behavior of the function. I will update the
> > description.
> > 
> >> `If @sg does not have SMT siblings` still reflect the old code layout.
> > 
> > But this behavior did not change. The check covers both SMT and non-SMT
> > cases:
> 
> The condition to call sched_asym_prefer() seems to have changed slightly
> though (including the explanation that busy_cpus_delta >= 2 handling
> should be done by fbg()):
> 
> sds->local_stat.sum_nr_running (A)
> busy_cpus_delta = sg_busy_cpus - local_busy_cpus (B)
> sg_busy_cpus = sgs->group_weight - sgs->idle_cpus (C)
> 
> From ((smt && B == 1) || (!smt && !A)) to (C == 1 && !A)

I agree that ((smt && B == 1) did change and I will update the comment.

My point is that (!smt && !A) is equivalent to (C == 1 && !A) if @sg has
only one CPU and is busy. The fourth paragraph still stands.
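For concreteness, a brute-force check of this equivalence as a standalone toy (old_cond()/new_cond() are hypothetical names, not kernel functions; @sg is modeled as a non-SMT group of weight 1):

```c
#include <assert.h>
#include <stdbool.h>

/* Old non-SMT branch: (!smt && !A), i.e. the local group is fully idle. */
static bool old_cond(bool sg_busy, unsigned int local_sum_nr_running)
{
	(void)sg_busy;		/* the old code did not look at @sg here */

	return local_sum_nr_running == 0;
}

/* New check: (C == 1 && !A), with C = group_weight - idle_cpus. */
static bool new_cond(bool sg_busy, unsigned int local_sum_nr_running)
{
	unsigned int sg_busy_cpus = sg_busy ? 1 : 0;	/* weight-1 group */

	return sg_busy_cpus == 1 && local_sum_nr_running == 0;
}
```

Whenever the single CPU of @sg is busy, the two conditions agree for any local occupancy; they only diverge when @sg is fully idle, in which case there is nothing to pull anyway.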

> 
> > 
> > 	 /*
> > 	  * non-SMT @sg can only have 1 busy CPU. We only care SMT @sg
> > 	  * has exactly one busy sibling
> > 	  */
> > 	if (sg_busy_cpus == 1 && 
> > 	    /* local group is fully idle, SMT and non-SMT. */
> > 	    !sds->local_stat.sum_nr_running)
> > 
> > Perhaps I can collapse the two paragraphs into one.
> 
> Sounds good to me.

Will do.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 4/7] sched/fair: Introduce sched_smt_siblings_idle()
  2022-12-22 16:56   ` Valentin Schneider
@ 2022-12-24  5:28     ` Ricardo Neri
  2022-12-28 15:29       ` Chen Yu
  2023-01-10 19:21       ` Valentin Schneider
  0 siblings, 2 replies; 42+ messages in thread
From: Ricardo Neri @ 2022-12-24  5:28 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On Thu, Dec 22, 2022 at 04:56:22PM +0000, Valentin Schneider wrote:
> On 22/11/22 12:35, Ricardo Neri wrote:
> > Architectures that implement arch_asym_cpu_priority() may need to know the
> > idle state of the SMT siblings of a CPU. The scheduler has this information
> > and functionality. Expose it.
> >
> > Move the existing functionality outside of the NUMA code.
> >
> 
> test_idle_cores() does something similar without an iteration, did you
> consider using that instead?

IIUC, test_idle_cores() returns true if there is at least one idle core in
the package. In my case, I need to know the idle state of only the SMT
siblings of a specific CPU. Am I missing something?
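To make the distinction concrete, here is a standalone toy model of the two checks (SMT2, busy-bitmask CPU state; all names are hypothetical and none of this is kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* CPU 2i and CPU 2i+1 are the SMT siblings of core i; bit set == busy. */
static bool cpu_is_busy(unsigned long busy_mask, int cpu)
{
	return busy_mask & (1UL << cpu);
}

/* What is needed here: all SMT siblings of @cpu (itself included) idle. */
static bool siblings_idle(unsigned long busy_mask, int cpu)
{
	int core = cpu / 2;

	return !cpu_is_busy(busy_mask, 2 * core) &&
	       !cpu_is_busy(busy_mask, 2 * core + 1);
}

/* What test_idle_cores() answers: any fully idle core in the package? */
static bool any_idle_core(unsigned long busy_mask, int ncores)
{
	for (int core = 0; core < ncores; core++)
		if (siblings_idle(busy_mask, 2 * core))
			return true;

	return false;
}
```

With busy_mask = 0x3 (core 0 fully busy, core 1 fully idle), any_idle_core() returns true while siblings_idle() for CPU 0 returns false, which is exactly why a package-wide test cannot substitute for a per-CPU sibling check.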

Thanks and BR,
Ricardo


* Re: [PATCH v2 4/7] sched/fair: Introduce sched_smt_siblings_idle()
  2022-12-24  5:28     ` Ricardo Neri
@ 2022-12-28 15:29       ` Chen Yu
  2022-12-30  0:17         ` Ricardo Neri
  2023-01-10 19:21       ` Valentin Schneider
  1 sibling, 1 reply; 42+ messages in thread
From: Chen Yu @ 2022-12-28 15:29 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Valentin Schneider, Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On 2022-12-23 at 21:28:50 -0800, Ricardo Neri wrote:
> On Thu, Dec 22, 2022 at 04:56:22PM +0000, Valentin Schneider wrote:
> > On 22/11/22 12:35, Ricardo Neri wrote:
> > > Architectures that implement arch_asym_cpu_priority() may need to know the
> > > idle state of the SMT siblings of a CPU. The scheduler has this information
> > > and functionality. Expose it.
> > >
> > > Move the existing functionality outside of the NUMA code.
> > >
> > 
> > test_idle_cores() does something similar without an iteration, did you
> > consider using that instead?
> 
> IIUC, test_idle_cores() returns true if there is at least one idle core in
> the package. In my case, I need to know the idle state of only the SMT
> siblings of a specific CPU. Am I missing something?
>
I guess a similar one is select_idle_core(), but it also considers a CPU
with a SCHED_IDLE task as idle. Is a CPU with a SCHED_IDLE task a
candidate in your scenario?

thanks,
Chenyu
> Thanks and BR,
> Ricardo


* Re: [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group
  2022-12-22 16:55   ` Valentin Schneider
@ 2022-12-29  4:00     ` Ricardo Neri
  2023-01-11 16:04       ` Valentin Schneider
  0 siblings, 1 reply; 42+ messages in thread
From: Ricardo Neri @ 2022-12-29  4:00 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On Thu, Dec 22, 2022 at 04:55:58PM +0000, Valentin Schneider wrote:
> On 22/11/22 12:35, Ricardo Neri wrote:
> > @@ -8926,25 +8924,16 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> >               return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> >       }
> >
> > -	/* @dst_cpu has SMT siblings. */
> > -
> > -	if (sg_is_smt) {
> > -		int local_busy_cpus = sds->local->group_weight -
> > -				      sds->local_stat.idle_cpus;
> > -		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
> > -
> > -		if (busy_cpus_delta == 1)
> > -			return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> > -
> > -		return false;
> > -	}
> > -
> >       /*
> > -	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
> > -	 * up with more than one busy SMT sibling and only pull tasks if there
> > -	 * are not busy CPUs (i.e., no CPU has running tasks).
> > +	 * @dst_cpu has SMT siblings. Do asym_packing load balancing only if
> > +	 * all its siblings are idle (moving tasks between physical cores in
> > +	 * which some SMT siblings are busy results in the same throughput).
> > +	 *
> > +	 * If the difference in the number of busy CPUs is two or more, let
> > +	 * find_busiest_group() take care of it. We only care if @sg has
> > +	 * exactly one busy CPU. This covers SMT and non-SMT sched groups.
> >        */
> > -	if (!sds->local_stat.sum_nr_running)
> > +	if (sg_busy_cpus == 1 && !sds->local_stat.sum_nr_running)
> >               return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> >
> 
> Some of this is new to me - I had missed the original series introducing
> this. However ISTM that this is conflating two concepts: SMT occupancy
> balancing, and asym packing.
> 
> Take the !local_is_smt :: sg_busy_cpus >= 2 :: return true; path. It does
> not involve asym packing priorities at all. This can end up in an
> ASYM_PACKING load balance,

Yes, this is the desired result: an idle low-priority CPU should help a high-
priority core with more than one busy SMT sibling. But yes, it does not
relate to priorities and can be implemented differently.

> which per calculate_imbalance() tries to move
> *all* tasks to the higher priority CPU - in the case of SMT balancing,
> we don't want to totally empty the core, just its siblings.

But it will not empty the core, only one of its SMT siblings. A single
sibling will be selected in find_busiest_queue(). The other siblings will
be unaffected.

> 
> Is there an ITMT/big.LITTLE (or however x86 calls it) case that invalidates
> the above?

Please see below.

> 
> Say, what's not sufficient with the below? AFAICT the only thing that isn't
> covered is the sg_busy_cpus >= 2 thing, but IMO that's SMT balancing, not
> asym packing - if the current calculate_imbalance() doesn't do it, it
> should be fixed to do it.

Agreed.

>Looking at the
> 
>   local->group_type == group_has_spare
> 
> case, it looks like it should DTRT.

I had tried (and failed) to have find_busiest_group() handle the
!local_is_smt :: sg_busy_cpus >= 2 case. Now I think I made it work.

When the busiest group is not overloaded, find_busiest_group() with
local->group_type == group_has_spare evens out the number of *idle* CPUs
during an idle load balance. However, this does not work if the local and
busiest groups have different weights. In SMT2, for instance, if busiest
has 2 busy CPUs (i.e., 0 idle CPUs) and the destination CPU is idle, the
difference in the number of idle CPUs is 1, so find_busiest_group() will
incorrectly goto out_balanced.

This issue is very visible on Intel hybrid processors because the big
cores have SMT but the small cores do not. It can, however, be reproduced
on non-hybrid processors by offlining the SMT siblings of some cores.

The problem can be fixed by balancing the number of *busy* CPUs instead,
which is, in general, what we want, IMO. (When two groups have the same
weight, this is equivalent to balancing the number of idle CPUs.)
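To make the SMT2 scenario concrete, here is a standalone sketch of the two balance criteria (the struct and helper names are hypothetical stand-ins for the sg_lb_stats fields, not kernel code; the real check is additionally gated on busiest->group_weight > 1):

```c
#include <assert.h>
#include <stdbool.h>

struct g_stats {			/* stand-in for sg_lb_stats */
	unsigned int group_weight;
	unsigned int idle_cpus;
};

/* Current criterion: balanced when the idle-CPU diff is at most 1. */
static bool balanced_by_idle(const struct g_stats *local,
			     const struct g_stats *busiest)
{
	return local->idle_cpus <= busiest->idle_cpus + 1;
}

/* Proposed criterion: balanced when the busy-CPU diff is at most 1. */
static bool balanced_by_busy(const struct g_stats *local,
			     const struct g_stats *busiest)
{
	unsigned int local_busy = local->group_weight - local->idle_cpus;
	unsigned int busiest_busy = busiest->group_weight - busiest->idle_cpus;

	return busiest_busy <= local_busy + 1;
}
```

For busiest = a fully busy SMT2 core (weight 2, 0 idle) and local = an idle non-SMT core (weight 1, 1 idle), the idle-CPU criterion declares balance (1 <= 0 + 1) and goes out_balanced, while the busy-CPU criterion correctly does not (2 > 0 + 1).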

This patch worked for me:

@@ -9787,14 +9787,18 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 			lsub_positive(&nr_diff, local->sum_nr_running);
 			env->imbalance = nr_diff;
 		} else {
+			unsigned int busiest_busy_cpus, local_busy_cpus;
+
+			busiest_busy_cpus = busiest->group_weight - busiest->idle_cpus;
+			local_busy_cpus = local->group_weight - local->idle_cpus;
 
 			/*
 			 * If there is no overload, we just want to even the number of
-			 * idle cpus.
+			 * busy cpus.
 			 */
 			env->migration_type = migrate_task;
-			env->imbalance = max_t(long, 0,
-					       (local->idle_cpus - busiest->idle_cpus));
+			env->imbalance = max_t(long, 0,
+					       (busiest_busy_cpus - local_busy_cpus));
 		}
 
 #ifdef CONFIG_NUMA
@@ -9981,18 +9985,24 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 			 */
 			goto out_balanced;
 
-		if (busiest->group_weight > 1 &&
-		    local->idle_cpus <= (busiest->idle_cpus + 1))
-			/*
-			 * If the busiest group is not overloaded
-			 * and there is no imbalance between this and busiest
-			 * group wrt idle CPUs, it is balanced. The imbalance
-			 * becomes significant if the diff is greater than 1
-			 * otherwise we might end up to just move the imbalance
-			 * on another group. Of course this applies only if
-			 * there is more than 1 CPU per group.
-			 */
-			goto out_balanced;
+		if (busiest->group_weight > 1) {
+			unsigned int local_busy_cpus, busiest_busy_cpus;
+
+			local_busy_cpus = local->group_weight - local->idle_cpus;
+			busiest_busy_cpus = busiest->group_weight - busiest->idle_cpus;
+
+			if (busiest_busy_cpus <= local_busy_cpus + 1)
+				/*
+				 * If the busiest group is not overloaded
+				 * and there is no imbalance between this and busiest
+				 * group wrt busy CPUs, it is balanced. The imbalance
+				 * becomes significant if the diff is greater than 1
+				 * otherwise we might end up to just move the imbalance
+				 * on another group. Of course this applies only if
+				 * there is more than 1 CPU per group.
+				 */
+				goto out_balanced;
+		}
 
 		if (busiest->sum_h_nr_running == 1)
 			/*

With this I can remove the sg_busy_cpus >= 2 special case from
asym_smt_can_pull_tasks().

> 
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 224107278471f..15eb2d3cff186 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9176,12 +9176,15 @@ static inline bool
>  sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
>  	   struct sched_group *group)
>  {
> -	/* Only do SMT checks if either local or candidate have SMT siblings */
> -	if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
> -	    (group->flags & SD_SHARE_CPUCAPACITY))
> -		return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
> +	/*
> +	 * For SMT, env->idle != CPU_NOT_IDLE isn't sufficient, we need to make
> +	 * sure the whole core is idle.
> +	 */
> +	if (((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
> +	     (group->flags & SD_SHARE_CPUCAPACITY)) &&
> +	    !test_idle_cores(env->dst_cpu))

But test_idle_cores() tests for *any* idle core in the highest domain with the
SD_SHARE_PKG_RESOURCES flag. Here we are only interested in the SMT siblings
of env->dst_cpu. If is_core_idle(env->dst_cpu) is used, then I agree with the
change.

Thanks and BR,
Ricardo


* Re: [PATCH v2 5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain
  2022-12-22 16:56           ` Valentin Schneider
@ 2022-12-29 19:02             ` Ricardo Neri
  2023-01-10 19:17               ` Valentin Schneider
  0 siblings, 1 reply; 42+ messages in thread
From: Ricardo Neri @ 2022-12-29 19:02 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ionela Voinescu, Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On Thu, Dec 22, 2022 at 04:56:51PM +0000, Valentin Schneider wrote:
> On 19/12/22 16:42, Ricardo Neri wrote:
> > On Thu, Dec 15, 2022 at 04:48:14PM +0000, Valentin Schneider wrote:
> >> With that said, so far all but one flag (SD_PREFER_SIBLING, and that's
> >> because of big.LITTLE woes) follow the SDF_SHARED_{CHILD, PARENT} pattern,
> >> if SD_ASYM_PACKING no longer does then we need to think whether we're
> >> trying to make it do funky things.
> >
> > My thesis is that x86 does not need the SD_ASYM_PACKING flag at the SMT
> > level because all SMT siblings are identical. There are cores of higher
> > priority at the "MC" level (maybe in the future at the "CLS" level).
> >
> > Power7 is fine because it only uses SD_ASYM_PACKING at the SMT level.
> >
> 
> So with what I groked from your series, I agree with you, x86 shouldn't
> need it at SMT level.
> 
> What about the below?
> 
> ---
> 
> diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
> index 57bde66d95f7a..8dc16942135b4 100644
> --- a/include/linux/sched/sd_flags.h
> +++ b/include/linux/sched/sd_flags.h
> @@ -132,12 +132,12 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
>  /*
>   * Place busy tasks earlier in the domain
>   *
> - * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
> - *               up, but currently assumed to be set from the base domain
> - *               upwards (see update_top_cache_domain()).
> + * SHARED_PARENT: Usually set on the SMT level. Can be set further up if all
> + *                siblings of an SMT core are identical, but SMT cores themselves
> + *                have different priorities.
>   * NEEDS_GROUPS: Load balancing flag.
>   */
> -SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
> +SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)

But this would not work for Power7. It only has SD_ASYM_PACKING in the SMT
sched domain. Must it have either of these flags?

In Power7, SMT siblings have different priorities but, IIUC, physical
cores are identical.

It seems to me that asym_packing is specific to a domain.

Thanks and BR,
Ricardo


* Re: [PATCH v2 4/7] sched/fair: Introduce sched_smt_siblings_idle()
  2022-12-28 15:29       ` Chen Yu
@ 2022-12-30  0:17         ` Ricardo Neri
  0 siblings, 0 replies; 42+ messages in thread
From: Ricardo Neri @ 2022-12-30  0:17 UTC (permalink / raw)
  To: Chen Yu
  Cc: Valentin Schneider, Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On Wed, Dec 28, 2022 at 11:29:52PM +0800, Chen Yu wrote:
> On 2022-12-23 at 21:28:50 -0800, Ricardo Neri wrote:
> > On Thu, Dec 22, 2022 at 04:56:22PM +0000, Valentin Schneider wrote:
> > > On 22/11/22 12:35, Ricardo Neri wrote:
> > > > Architectures that implement arch_asym_cpu_priority() may need to know the
> > > > idle state of the SMT siblings of a CPU. The scheduler has this information
> > > > and functionality. Expose it.
> > > >
> > > > Move the existing functionality outside of the NUMA code.
> > > >
> > > 
> > > test_idle_cores() does something similar without an iteration, did you
> > > consider using that instead?
> > 
> > IIUC, test_idle_cores() returns true if there is at least one idle core in
> > the package. In my case, I need to know the idle state of only the SMT
> > siblings of a specific CPU. Am I missing something?
> >
> I guess a similar one is select_idle_core(), but it also considers a CPU
> with a SCHED_IDLE task as idle. Is a CPU with a SCHED_IDLE task a
> candidate in your scenario?

However, we are not looking for an idle CPU. We want to know the idle state of
the siblings of a CPU. I see that select_idle_core() uses available_idle_cpu(),
which in turn uses idle_cpu(). is_core_idle() also uses it.

As per 943d355d7fee ("sched/core: Distinguish between idle_cpu() calls based on
desired effect, introduce available_idle_cpu()") the load balancer can just
call idle_cpu().

Thanks and BR,
Ricardo


* Re: [PATCH v2 5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain
  2022-12-29 19:02             ` Ricardo Neri
@ 2023-01-10 19:17               ` Valentin Schneider
  2023-01-13  1:31                 ` Ricardo Neri
  0 siblings, 1 reply; 42+ messages in thread
From: Valentin Schneider @ 2023-01-10 19:17 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ionela Voinescu, Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On 29/12/22 11:02, Ricardo Neri wrote:
> On Thu, Dec 22, 2022 at 04:56:51PM +0000, Valentin Schneider wrote:
>> diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
>> index 57bde66d95f7a..8dc16942135b4 100644
>> --- a/include/linux/sched/sd_flags.h
>> +++ b/include/linux/sched/sd_flags.h
>> @@ -132,12 +132,12 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
>>  /*
>>   * Place busy tasks earlier in the domain
>>   *
>> - * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
>> - *               up, but currently assumed to be set from the base domain
>> - *               upwards (see update_top_cache_domain()).
>> + * SHARED_PARENT: Usually set on the SMT level. Can be set further up if all
>> + *                siblings of an SMT core are identical, but SMT cores themselves
>> + *                have different priorities.
>>   * NEEDS_GROUPS: Load balancing flag.
>>   */
>> -SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
>> +SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
>
> But this would not work for Power7. It only has SD_ASYM_PACKING in the SMT
> sched domain. Must it have either of these flags?
>

It's not mandatory, but making sure SD flags conform to either of them
means the topology debugging infra can help spot misshapen topologies...

> In Power7, SMT siblings have different priorities but, IIUC, physical
> cores are identical.
>


...But you're right, this doesn't work with Power7 as it would need
SD_ASYM_PACKING all the way up the topology to conform with
SDF_SHARED_PARENT, which clearly doesn't work with how Power7 uses
asym_packing.

> It seems to me that asym_packing is specific to a domain.
>

For Power7 it is, since the asymmetry is only between siblings of a given
core. For other systems where the asymmetry is between cores, that could
theoretically affect several levels. Consider:

  DIE [                      ]
  MC  [          ][          ]
  SMT [    ][    ][    ][    ]
  CPU  0  1  2  3  4  5  6  7
  prio 3  3  2  2  1  1  0  0

As done in your patch, here asym_packing doesn't make sense for SMT, but it
does for MC and DIE.

Anywho, I think what this means is that we should drop the SDF_SHARED_*
metaflag for SD_ASYM_PACKING, unless we can think of a nice way to
programmatically describe how SD_ASYM_PACKING should be set.

> Thanks and BR,
> Ricardo



* Re: [PATCH v2 4/7] sched/fair: Introduce sched_smt_siblings_idle()
  2022-12-24  5:28     ` Ricardo Neri
  2022-12-28 15:29       ` Chen Yu
@ 2023-01-10 19:21       ` Valentin Schneider
  1 sibling, 0 replies; 42+ messages in thread
From: Valentin Schneider @ 2023-01-10 19:21 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On 23/12/22 21:28, Ricardo Neri wrote:
> On Thu, Dec 22, 2022 at 04:56:22PM +0000, Valentin Schneider wrote:
>> On 22/11/22 12:35, Ricardo Neri wrote:
>> > Architectures that implement arch_asym_cpu_priority() may need to know the
>> > idle state of the SMT siblings of a CPU. The scheduler has this information
>> > and functionality. Expose it.
>> >
>> > Move the existing functionality outside of the NUMA code.
>> >
>> 
>> test_idle_cores() does something similar without an iteration, did you
>> consider using that instead?
>
> IIUC, test_idle_cores() returns true if there is at least one idle core in
> the package. In my case, I need to know the idle state of only the SMT
> siblings of a specific CPU. Am I missing something?
>

No, you're right, clearly I needed that end of the year break - sorry for
the noise.

> Thanks and BR,
> Ricardo



* Re: [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group
  2022-12-29  4:00     ` Ricardo Neri
@ 2023-01-11 16:04       ` Valentin Schneider
  2023-01-13 19:02         ` Ricardo Neri
  0 siblings, 1 reply; 42+ messages in thread
From: Valentin Schneider @ 2023-01-11 16:04 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On 28/12/22 20:00, Ricardo Neri wrote:
> On Thu, Dec 22, 2022 at 04:55:58PM +0000, Valentin Schneider wrote:
>> Some of this is new to me - I had missed the original series introducing
>> this. However ISTM that this is conflating two concepts: SMT occupancy
>> balancing, and asym packing.
>> 
>> Take the !local_is_smt :: sg_busy_cpus >= 2 :: return true; path. It does
>> not involve asym packing priorities at all. This can end up in an
>> ASYM_PACKING load balance,
>
> Yes, this is the desired result: an idle low-priority CPU should help a high-
> priority core with more than one busy SMT sibling. But yes, it does not
> relate to priorities and can be implemented differently.
>
>> which per calculate_imbalance() tries to move
>> *all* tasks to the higher priority CPU - in the case of SMT balancing,
>> we don't want to totally empty the core, just its siblings.
>
> But it will not empty the core, only one of its SMT siblings. A single
> sibling will be selected in find_busiest_queue(). The other siblings will
> be unaffected.
>

Right

>> 
>> Is there an ITMT/big.LITTLE (or however x86 calls it) case that invalidates
>> the above?
>
> Please see below.
>
>> 
>> Say, what's not sufficient with the below? AFAICT the only thing that isn't
>> covered is the sg_busy_cpus >= 2 thing, but IMO that's SMT balancing, not
>> asym packing - if the current calculate_imbalance() doesn't do it, it
>> should be fixed to do it.
>
> Agreed.
>
>>Looking at the
>> 
>>   local->group_type == group_has_spare
>> 
>> case, it looks like it should DTRT.
>
> I had tried (and failed) to have find_busiest_group() handle the
> !local_is_smt :: sg_busy_cpus >= 2 case. Now I think I made it work.
>
> When the busiest group is not overloaded, find_busiest_group() with
> local->group_type == group_has_spare evens out the number of *idle* CPUs
> during an idle load balance. However, this does not work if the local and
> busiest groups have different weights. In SMT2, for instance, if busiest
> has 2 busy CPUs (i.e., 0 idle CPUs) and the destination CPU is idle, the
> difference in the number of idle CPUs is 1, so find_busiest_group() will
> incorrectly goto out_balanced.
> 
> This issue is very visible on Intel hybrid processors because the big
> cores have SMT but the small cores do not. It can, however, be reproduced
> on non-hybrid processors by offlining the SMT siblings of some cores.
>

I think I follow. If we're comparing two groups each spanning an SMT2 core,
then

  busiest->group_weight > 1 && local->idle_cpus <= (busiest->idle_cpus + 1)

is false if local is fully idle and busiest fully busy, but if local
becomes a non-SMT core, then that's true and we goto out_balanced.


With that said, shouldn't SD_PREFER_SIBLING help here? cf.

	if (sds.prefer_sibling && local->group_type == group_has_spare &&
	    busiest->sum_nr_running > local->sum_nr_running + 1)

It should be set on any topology level below the NUMA ones, we do remove it
on SD_ASYM_CPUCAPACITY levels because this used to interfere with misfit
balancing (it would override the group_type), things are a bit different
since Vincent's rewrite of load_balance() but I think we still want it off
there. I would expect it to be set in your system, though whether this is
playing nice with the asymmetry is another matter :-)

> The problem can be fixed by balancing the number of *busy* CPUs instead,
> which is, in general, what we want, IMO. (When two groups have the same
> weight, this is equivalent to balancing the number of idle CPUs.)
>
> This patch worked for me:
>
> @@ -9787,14 +9787,18 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>  			lsub_positive(&nr_diff, local->sum_nr_running);
>  			env->imbalance = nr_diff;
>  		} else {
> +			unsigned int busiest_busy_cpus, local_busy_cpus;
> +
> +			busiest_busy_cpus = busiest->group_weight - busiest->idle_cpus;
> +			local_busy_cpus = local->group_weight - local->idle_cpus;
>  
>  			/*
>  			 * If there is no overload, we just want to even the number of
> -			 * idle cpus.
> +			 * busy cpus.
>  			 */
>  			env->migration_type = migrate_task;
> -			env->imbalance = max_t(long, 0,
> -					       (local->idle_cpus - busiest->idle_cpus));
> +			env->imbalance = max_t(long, 0,
> +					       (busiest_busy_cpus - local_busy_cpus));
>  		}
>  
>  #ifdef CONFIG_NUMA
> @@ -9981,18 +9985,24 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>  			 */
>  			goto out_balanced;
>  
> -		if (busiest->group_weight > 1 &&
> -		    local->idle_cpus <= (busiest->idle_cpus + 1))
> -			/*
> -			 * If the busiest group is not overloaded
> -			 * and there is no imbalance between this and busiest
> -			 * group wrt idle CPUs, it is balanced. The imbalance
> -			 * becomes significant if the diff is greater than 1
> -			 * otherwise we might end up to just move the imbalance
> -			 * on another group. Of course this applies only if
> -			 * there is more than 1 CPU per group.
> -			 */
> -			goto out_balanced;
> +		if (busiest->group_weight > 1) {
> +			unsigned int local_busy_cpus, busiest_busy_cpus;
> +
> +			local_busy_cpus = local->group_weight - local->idle_cpus;
> +			busiest_busy_cpus = busiest->group_weight - busiest->idle_cpus;
> +
> +			if (busiest_busy_cpus <= local_busy_cpus + 1)
> +				/*
> +				 * If the busiest group is not overloaded
> +				 * and there is no imbalance between this and busiest
> +				 * group wrt busy CPUs, it is balanced. The imbalance
> +				 * becomes significant if the diff is greater than 1
> +				 * otherwise we might end up to just move the imbalance
> +				 * on another group. Of course this applies only if
> +				 * there is more than 1 CPU per group.
> +				 */
> +				goto out_balanced;
> +		}
>  
>  		if (busiest->sum_h_nr_running == 1)
>  			/*
>
> With this I can remove the sg_busy_cpus >=2 thing from asym_smt_can_pull_tasks().
>
>> 
>> ---
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 224107278471f..15eb2d3cff186 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -9176,12 +9176,15 @@ static inline bool
>>  sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
>>  	   struct sched_group *group)
>>  {
>> -	/* Only do SMT checks if either local or candidate have SMT siblings */
>> -	if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
>> -	    (group->flags & SD_SHARE_CPUCAPACITY))
>> -		return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
>> +	/*
>> +	 * For SMT, env->idle != CPU_NOT_IDLE isn't sufficient, we need to make
>> +	 * sure the whole core is idle.
>> +	 */
>> +	if (((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
>> +	     (group->flags & SD_SHARE_CPUCAPACITY)) &&
>> +	    !test_idle_cores(env->dst_cpu))
>
> But test_idle_cores() tests for *any* idle core in the highest domain with the
> SD_SHARE_PKG_RESOURCES flag. Here we are only interested in the SMT siblings
> of env->dst_cpu. If is_core_idle(env->dst_cpu) is used, then I agree with the
> change.
>

Right, I had gotten confused with test_idle_cores()

> Thanks and BR,
> Ricardo



* Re: [PATCH v2 5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain
  2023-01-10 19:17               ` Valentin Schneider
@ 2023-01-13  1:31                 ` Ricardo Neri
  0 siblings, 0 replies; 42+ messages in thread
From: Ricardo Neri @ 2023-01-13  1:31 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ionela Voinescu, Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On Tue, Jan 10, 2023 at 07:17:51PM +0000, Valentin Schneider wrote:
> On 29/12/22 11:02, Ricardo Neri wrote:
> > On Thu, Dec 22, 2022 at 04:56:51PM +0000, Valentin Schneider wrote:
> >> diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
> >> index 57bde66d95f7a..8dc16942135b4 100644
> >> --- a/include/linux/sched/sd_flags.h
> >> +++ b/include/linux/sched/sd_flags.h
> >> @@ -132,12 +132,12 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
> >>  /*
> >>   * Place busy tasks earlier in the domain
> >>   *
> >> - * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
> >> - *               up, but currently assumed to be set from the base domain
> >> - *               upwards (see update_top_cache_domain()).
> >> + * SHARED_PARENT: Usually set on the SMT level. Can be set further up if all
> >> + *                siblings of an SMT core are identical, but SMT cores themselves
> >> + *                have different priorities.
> >>   * NEEDS_GROUPS: Load balancing flag.
> >>   */
> >> -SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
> >> +SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
> >
> > But this would not work for Power7. It only has SD_ASYM_PACKING in the SMT
> > sched domain. Must it have either of these flags?
> >
> 
> It's not mandatory, but making sure SD flags conform to either of them
> means the topology debugging infra can help spot misshapen topologies...
> 
> > In Power7 SMT siblings have different priorities but, IIUC, physical
> > cores are identical.
> >
> 
> 
> ...But you're right, this doesn't work with Power7 as it would need
> SD_ASYM_PACKING all the way up the topology to conform with
> SDF_SHARED_PARENT, which clearly doesn't work with how Power7 uses
> asym_packing.
> 
> > It seems to me that asym_packing is specific to a domain.
> >
> 
> For Power7 it is, since the asymmetry is only between siblings of a given
> core. For other systems where the asymmetry is between cores, that could
> theoretically affect several levels. Consider:
> 
>   DIE [                      ]
>   MC  [          ][          ]
>   SMT [    ][    ][    ][    ]
>   CPU  0  1  2  3  4  5  6  7
>   prio 3  3  2  2  1  1  0  0
> 
> As done in your patch, here asym_packing doesn't make sense for SMT, but it
> does for MC and DIE.
> 
> Anywho, I think what this means if we should drop the SDF_SHARED_* metaflag
> for SD_ASYM_PACKING, unless we can think of a nice way to programmatically
> describe how SD_ASYM_PACKING should be set.

Perhaps it can be done by adding an extra check for sg::asym_prefer_cpu. In
the example you give, DIE does not need SD_ASYM_PACKING if the
asym_prefer_cpu of each of its MC sched groups has the same priority. This
would satisfy both Power7 and x86.

This assumes that priorities are available when checking the sanity of the
topology.

Thanks and BR,
Ricardo


* Re: [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group
  2023-01-11 16:04       ` Valentin Schneider
@ 2023-01-13 19:02         ` Ricardo Neri
  2023-01-16  4:05           ` Ricardo Neri
  0 siblings, 1 reply; 42+ messages in thread
From: Ricardo Neri @ 2023-01-13 19:02 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On Wed, Jan 11, 2023 at 04:04:23PM +0000, Valentin Schneider wrote:
> On 28/12/22 20:00, Ricardo Neri wrote:
> > On Thu, Dec 22, 2022 at 04:55:58PM +0000, Valentin Schneider wrote:
> >> Some of this is new to me - I had missed the original series introducing
> >> this. However ISTM that this is conflating two concepts: SMT occupancy
> >> balancing, and asym packing.
> >> 
> >> Take the !local_is_smt :: sg_busy_cpus >= 2 :: return true; path. It does
> >> not involve asym packing priorities at all. This can end up in an
> >> ASYM_PACKING load balance,
> >
> > Yes, this is the desired result: an idle low-priority CPU should help a high-
> > priority core with more than one busy SMT sibling. But yes, it does not
> > relate to priorities and can be implemented differently.
> >
> >> which per calculate_imbalance() tries to move
> >> *all* tasks to the higher priority CPU - in the case of SMT balancing,
> >> we don't want to totally empty the core, just its siblings.
> >
> > But it will not empty the core, only one of its SMT siblings. A single
> > sibling will be selected in find_busiest_queue(). The other siblings will
> > be unaffected.
> >
> 
> Right
> 
> >> 
> >> Is there an ITMT/big.LITTLE (or however x86 calls it) case that invalidates
> >> the above?
> >
> > Please see below.
> >
> >> 
> >> Say, what's not sufficient with the below? AFAICT the only thing that isn't
> >> covered is the sg_busy_cpus >= 2 thing, but IMO that's SMT balancing, not
> >> asym packing - if the current calculate_imbalance() doesn't do it, it
> >> should be fixed to do it.
> >
> > Agreed.
> >
> >>Looking at the
> >> 
> >>   local->group_type == group_has_spare
> >> 
> >> case, it looks like it should DTRT.
> >
> > I had tried (and failed) to have find_busiest_group() handle the
> > !local_is_smt :: sg_busy_cpus >= 2 case. Now I think I made it work.
> >
> > When the busiest group is not overloaded, find_busiest_group() and
> > local->group = group_has_spare during an idle load balance evens out the
> > number of *idle* CPUs. However, this does not work if the local and busiest
> > groups have different weights. In SMT2, for instance, if busiest has 2 busy
> > CPUs (i.e., 0 idle CPUs) and the destination CPU is idle, the difference in
> > the number of idle CPUs is 1. find_busiest_group() will incorrectly goto
> > out_balanced.
> >
> > This issue is very visible in Intel hybrid processors because the big cores
> > have SMT but the small cores do not. It can, however, be reproduced in non-
> > hybrid processors by offlining the SMT siblings of some cores.
> >
> 
> I think I follow. If we're comparing two groups each spanning an SMT2 core,
> then
> 
>   busiest->group_weight > 1 && local->idle_cpus <= (busiest->idle_cpus + 1)
> 
> is false if local is fully idle and busiest fully busy, but if local
> becomes a non-SMT core, then that's true and we goto out_balanced.

Exactly right.

> 
> 
> With that said, shouldn't SD_PREFER_SIBLING help here? cf.
> 
> 	if (sds.prefer_sibling && local->group_type == group_has_spare &&
> 	    busiest->sum_nr_running > local->sum_nr_running + 1)

It does not help because sds.prefer_sibling is false: a non-SMT core is
looking into offloading a fully_busy SMT core at the "MC" domain.
sds.prefer_sibling is set in update_sd_lb_stats() if the sched domain's child
has the SD_PREFER_SIBLING flag. Since the destination CPU is the non-SMT
core, there is no child.

> 
> It should be set on any topology level below the NUMA ones, we do remove it
> on SD_ASYM_CPUCAPACITY levels because this used to interfere with misfit
> balancing (it would override the group_type), things are a bit different
> since Vincent's rewrite of load_balance() but I think we still want it off
> there.

I see in find_busiest_group() that group_misfit_task is evaluated before
sds.prefer_sibling.

> I would expect it to be set in your system, though whether this is
> playing nice with the asymmetry is another matter :-)

I recall a few instances of SD_PREFER_SIBLING causing me trouble, but I
need to investigate more.

Thanks and BR,
Ricardo


* Re: [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group
  2023-01-13 19:02         ` Ricardo Neri
@ 2023-01-16  4:05           ` Ricardo Neri
  2023-01-16 19:07             ` Valentin Schneider
  0 siblings, 1 reply; 42+ messages in thread
From: Ricardo Neri @ 2023-01-16  4:05 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On Fri, Jan 13, 2023 at 11:02:26AM -0800, Ricardo Neri wrote:
> On Wed, Jan 11, 2023 at 04:04:23PM +0000, Valentin Schneider wrote:
> > On 28/12/22 20:00, Ricardo Neri wrote:
> > > On Thu, Dec 22, 2022 at 04:55:58PM +0000, Valentin Schneider wrote:
> > >> Some of this is new to me - I had missed the original series introducing
> > >> this. However ISTM that this is conflating two concepts: SMT occupancy
> > >> balancing, and asym packing.
> > >> 
> > >> Take the !local_is_smt :: sg_busy_cpus >= 2 :: return true; path. It does
> > >> not involve asym packing priorities at all. This can end up in an
> > >> ASYM_PACKING load balance,
> > >
> > > Yes, this is the desired result: an idle low-priority CPU should help a high-
> > > priority core with more than one busy SMT sibling. But yes, it does not
> > > relate to priorities and can be implemented differently.
> > >
> > >> which per calculate_imbalance() tries to move
> > >> *all* tasks to the higher priority CPU - in the case of SMT balancing,
> > >> we don't want to totally empty the core, just its siblings.
> > >
> > > But it will not empty the core, only one of its SMT siblings. A single
> > > sibling will be selected in find_busiest_queue(). The other siblings will
> > > be unaffected.
> > >
> > 
> > Right
> > 
> > >> 
> > >> Is there an ITMT/big.LITTLE (or however x86 calls it) case that invalidates
> > >> the above?
> > >
> > > Please see below.
> > >
> > >> 
> > >> Say, what's not sufficient with the below? AFAICT the only thing that isn't
> > >> covered is the sg_busy_cpus >= 2 thing, but IMO that's SMT balancing, not
> > >> asym packing - if the current calculate_imbalance() doesn't do it, it
> > >> should be fixed to do it.
> > >
> > > Agreed.
> > >
> > >>Looking at the
> > >> 
> > >>   local->group_type == group_has_spare
> > >> 
> > >> case, it looks like it should DTRT.
> > >
> > > I had tried (and failed) to have find_busiest_group() handle the
> > > !local_is_smt :: sg_busy_cpus >= 2 case. Now I think I made it work.
> > >
> > > When the busiest group is not overloaded, find_busiest_group() and
> > > local->group = group_has_spare during an idle load balance evens out the
> > > number of *idle* CPUs. However, this does not work if the local and busiest
> > > groups have different weights. In SMT2, for instance, if busiest has 2 busy
> > > CPUs (i.e., 0 idle CPUs) and the destination CPU is idle, the difference in
> > > the number of idle CPUs is 1. find_busiest_group() will incorrectly goto
> > > out_balanced.
> > >
> > > This issue is very visible in Intel hybrid processors because the big cores
> > > have SMT but the small cores do not. It can, however, be reproduced in non-
> > > hybrid processors by offlining the SMT siblings of some cores.
> > >
> > 
> > I think I follow. If we're comparing two groups each spanning an SMT2 core,
> > then
> > 
> >   busiest->group_weight > 1 && local->idle_cpus <= (busiest->idle_cpus + 1)
> > 
> > is false if local is fully idle and busiest fully busy, but if local
> > becomes a non-SMT core, then that's true and we goto out_balanced.
> 
> Exactly right.
> 
> > 
> > 
> > With that said, shouldn't SD_PREFER_SIBLING help here? cf.
> > 
> > 	if (sds.prefer_sibling && local->group_type == group_has_spare &&
> > 	    busiest->sum_nr_running > local->sum_nr_running + 1)
> 
> It does not help because sds.prefer_sibling is false: a non-SMT core is
> looking into offloading a fully_busy SMT core at the "MC" domain.
> sds.prefer_sibling is set in update_sd_lb_stats() if the sched domain's child
> has the SD_PREFER_SIBLING flag. Since the destination CPU is the non-SMT
> core, there is no child.
> 
> > 
> > It should be set on any topology level below the NUMA ones, we do remove it
> > on SD_ASYM_CPUCAPACITY levels because this used to interfere with misfit
> > balancing (it would override the group_type), things are a bit different
> > since Vincent's rewrite of load_balance() but I think we still want it off
> > there.

Your comment got me thinking. Whose child sched domain wants prefer_sibling?
It sounds to me that it is busiest's. I could not think of any reason for
*having* to use the flags of the local group.

We can use the flags of the sched group (as per 16d364ba6ef2 ("sched/topology:
Introduce sched_group::flags"), these are the flags of the child domain).

The patch below works for me and I don't have to even out the number of busy CPUs.
It should not interfere with misfit balancing either:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a2c70e1087d0..737bb3c8bfae 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9752,8 +9752,12 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		sg = sg->next;
 	} while (sg != env->sd->groups);
 
-	/* Tag domain that child domain prefers tasks go to siblings first */
-	sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING;
+	/*
+	 * Tag domain that child domain prefers tasks go to siblings first.
+	 * A sched group has the flags of the child domain, if any.
+	 */
+	if (sds->busiest)
+		sds->prefer_sibling = sds->busiest->flags & SD_PREFER_SIBLING;
 
 
 	if (env->sd->flags & SD_NUMA)

Thanks and BR,
Ricardo


* Re: [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group
  2023-01-16  4:05           ` Ricardo Neri
@ 2023-01-16 19:07             ` Valentin Schneider
  2023-01-17 12:49               ` Ricardo Neri
  0 siblings, 1 reply; 42+ messages in thread
From: Valentin Schneider @ 2023-01-16 19:07 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On 15/01/23 20:05, Ricardo Neri wrote:
>> >
>> > It should be set on any topology level below the NUMA ones, we do remove it
>> > on SD_ASYM_CPUCAPACITY levels because this used to interfere with misfit
>> > balancing (it would override the group_type), things are a bit different
>> > since Vincent's rewrite of load_balance() but I think we still want it off
>> > there.
>
> Your comment got me thinking. Whose child sched domain wants prefer_sibling?
> It sounds to me that it is busiest's. I could not think of any reason for
> *having* to use the flags of the local group.
>

Hm, given that on systems without SD_ASYM_CPUCAPACITY, SD_PREFER_SIBLING is
set all the way from SMT up to the last !NUMA domain, should we just get
rid of the child/parent weirdness of SD_PREFER_SIBLING and stick with the
flags we are given at the level we're balancing at?

i.e.

        sds->prefer_sibling = env->sd->flags & SD_PREFER_SIBLING;

Unless I'm reading this wrong, this also eliminates the effect of
SD_PREFER_SIBLING on the first NUMA level - DIE level has SD_PREFER_SIBLING
set, but we don't necessarily want to evenly spread things out across
NUMA nodes.


> We can use the flags of the sched group (as per 16d364ba6ef2 ("sched/topology:
> Introduce sched_group::flags"), these are the flags of the child domain).
>
> The patch below works for me and I don't have to even out the number of busy CPUs.
> It should not interfere with misfit balancing either:
>

We remove that flag on systems where misfit balancing happens anyway, so
that's safe vs. SD_PREFER_SIBLING.



* Re: [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group
  2023-01-16 19:07             ` Valentin Schneider
@ 2023-01-17 12:49               ` Ricardo Neri
  0 siblings, 0 replies; 42+ messages in thread
From: Ricardo Neri @ 2023-01-17 12:49 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, x86, linux-kernel, Tim C . Chen

On Mon, Jan 16, 2023 at 07:07:54PM +0000, Valentin Schneider wrote:
> On 15/01/23 20:05, Ricardo Neri wrote:
> >> >
> >> > It should be set on any topology level below the NUMA ones, we do remove it
> >> > on SD_ASYM_CPUCAPACITY levels because this used to interfere with misfit
> >> > balancing (it would override the group_type), things are a bit different
> >> > since Vincent's rewrite of load_balance() but I think we still want it off
> >> > there.
> >
> > Your comment got me thinking. Whose child sched domain wants prefer_sibling?
> > It sounds to me that it is busiest's. I could not think of any reason for
> > *having* to use the flags of the local group.
> >
> 
> Hm, given that on systems without SD_ASYM_CPUCAPACITY, SD_PREFER_SIBLING is
> set all the way from SMT up to the last !NUMA domain, should we just get
> rid of the child/parent weirdness of SD_PREFER_SIBLING and stick with the
> flags we are given at the level we're balancing at?
> 
> i.e.
> 
>         sds->prefer_sibling = env->sd->flags & SD_PREFER_SIBLING;

Agreed. This would also make the code easier to understand. It should not change
the current behavior either, except that it fixes the busiest->group_weight = 2
vs. local->group_weight = 1 case I raised.

> 
> Unless I'm reading this wrong, this also eliminates the effect of
> SD_PREFER_SIBLING on the first NUMA level - DIE level has SD_PREFER_SIBLING
> set, but we don't necessarily want to evenly spread things out across
> NUMA nodes.

Agreed.

> 
> 
> > We can use the flags of the sched group (as per 16d364ba6ef2 ("sched/topology:
> > Introduce sched_group::flags"), these are the flags of the child domain).
> >
> > The patch below works for me and I don't have to even out the number of busy CPUs.
> > It should not interfere with misfit balancing either:
> >
> 
> We remove that flag on systems where misfit balancing happens anyway, so
> that's safe vs. SD_PREFER_SIBLING.

Then all looks good with your suggestion. I'll include a patch in my series.

Thanks and BR,
Ricardo


end of thread, other threads:[~2023-01-17 12:40 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-22 20:35 [PATCH v2 0/7] x86/sched: Avoid unnecessary migrations within SMT domains Ricardo Neri
2022-11-22 20:35 ` [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group Ricardo Neri
2022-12-06 17:22   ` Dietmar Eggemann
2022-12-12 17:53     ` Ricardo Neri
2022-12-21 13:03       ` Dietmar Eggemann
2022-12-22  4:32         ` Ricardo Neri
2022-12-22 11:12           ` Dietmar Eggemann
2022-12-23 13:11             ` Ricardo Neri
2022-12-22 16:55   ` Valentin Schneider
2022-12-29  4:00     ` Ricardo Neri
2023-01-11 16:04       ` Valentin Schneider
2023-01-13 19:02         ` Ricardo Neri
2023-01-16  4:05           ` Ricardo Neri
2023-01-16 19:07             ` Valentin Schneider
2023-01-17 12:49               ` Ricardo Neri
2022-11-22 20:35 ` [PATCH v2 2/7] sched: Prepare sched_asym_prefer() to handle idle state of SMT siblings Ricardo Neri
2022-11-22 20:35 ` [PATCH v2 3/7] sched: Teach arch_asym_cpu_priority() the " Ricardo Neri
2022-12-06 17:54   ` Dietmar Eggemann
2022-12-12 17:54     ` Ricardo Neri
2022-12-21 17:12       ` Dietmar Eggemann
2022-12-22  4:55         ` Ricardo Neri
2022-12-22 16:56           ` Valentin Schneider
2022-11-22 20:35 ` [PATCH v2 4/7] sched/fair: Introduce sched_smt_siblings_idle() Ricardo Neri
2022-12-06 18:03   ` Dietmar Eggemann
2022-12-12 17:54     ` Ricardo Neri
2022-12-22 11:12       ` Dietmar Eggemann
2022-12-22 16:56   ` Valentin Schneider
2022-12-24  5:28     ` Ricardo Neri
2022-12-28 15:29       ` Chen Yu
2022-12-30  0:17         ` Ricardo Neri
2023-01-10 19:21       ` Valentin Schneider
2022-11-22 20:35 ` [PATCH v2 5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain Ricardo Neri
2022-12-08 16:03   ` Ionela Voinescu
2022-12-14 16:59     ` Ricardo Neri
2022-12-15 16:48       ` Valentin Schneider
2022-12-20  0:42         ` Ricardo Neri
2022-12-22 16:56           ` Valentin Schneider
2022-12-29 19:02             ` Ricardo Neri
2023-01-10 19:17               ` Valentin Schneider
2023-01-13  1:31                 ` Ricardo Neri
2022-11-22 20:35 ` [PATCH v2 6/7] x86/sched/itmt: Give all SMT siblings of a core the same priority Ricardo Neri
2022-11-22 20:35 ` [PATCH v2 7/7] x86/sched/itmt: Consider the idle state of SMT siblings Ricardo Neri
