linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING
@ 2021-04-06  4:11 Ricardo Neri
  2021-04-06  4:11 ` [PATCH 1/4] sched/fair: Optimize checking for group_asym_packing Ricardo Neri
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Ricardo Neri @ 2021-04-06  4:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linux-kernel, Ricardo Neri

=== Problem statement ===
On SMT-enabled hardware, ASYM_PACKING can cause the load balancer to choose
low priority CPUs over medium priority CPUs.

When balancing load in scheduling domains with the SD_ASYM_PACKING flag,
idle CPUs of higher priority pull tasks from CPUs of lower priority. This
balancing is done by comparing pairs of scheduling groups. There may also
be scheduling groups composed of CPUs that are SMT siblings.

When using SD_ASYM_PACKING on x86 with Intel Turbo Boost Max Technology 3.0
(ITMT), the priorities of the CPUs in a scheduling group with N SMT
siblings are assigned as N*P, N*P/2, N*P/3, ..., P, where P is the
priority assigned by the hardware to the physical core and N is the number
of SMT siblings.

Systems such as Intel Comet Lake can have some cores supporting SMT, while
others do not. As a result, it is possible to have medium non-SMT
priorities, Q, such that N*P > Q > P.

When comparing groups for load balancing, the priority of the CPU doing the
load balancing is only compared with the priority of the preferred CPU of
the candidate busiest group (N*P vs Q in the example above). Thus,
scheduling groups whose preferred CPU has priority N*P always pull tasks
from the scheduling group with priority Q, and such tasks are then spread
in the “SMT” domain. Conversely, since N*P > Q, CPUs with priority Q cannot
pull tasks from a group whose preferred CPU has priority N*P, even though
Q > P.
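
As a concrete illustration, assume P = 40 and N = 2, so the two siblings of
a high priority core get priorities 80 and 40, and a medium priority core
without SMT gets Q = 60. The small program below is only a sketch with
made-up numbers (they do not come from any particular processor); it shows
how the one-sided comparison plays out:

/* Hypothetical priorities illustrating the N*P > Q > P scenario. */
#include <stdio.h>

int main(void)
{
	int N = 2;	/* SMT siblings of the high priority core */
	int P = 40;	/* hardware priority of that physical core */
	int Q = 60;	/* priority of a medium priority, non-SMT core */
	int i;

	/* ITMT-style per-sibling priorities: N*P, N*P/2, ..., P */
	for (i = 1; i <= N; i++)
		printf("SMT sibling %d priority: %d\n", i, N * P / i);

	printf("non-SMT core priority: %d\n", Q);

	/*
	 * Only the preferred (highest priority) CPU of the candidate
	 * busiest group is compared against the CPU doing the load
	 * balance. Since N*P > Q, the CPU with priority Q never wins
	 * that comparison, even though Q > P.
	 */
	printf("N*P > Q: %d, Q > P: %d\n", N * P > Q, Q > P);
	return 0;
}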

Balancing based on load (i.e. when the busiest group is of type
group_overloaded) will not help either when the system is not fully busy:
the involved groups have only one task each, so load balancing is not
deemed necessary.

The behavior described above results in leaving CPUs with medium priority
idle, while CPUs with lower priority are busy. Furthermore, such CPUs of
lower priority are SMT siblings of high priority CPUs, which are also busy.

This patchset fixes this behavior by also checking the idle state of the
SMT siblings of both the CPU doing the load balance and the busiest
candidate group.

I ran a few benchmarks with and without this patchset on an Intel(R)
Core(TM) i9-7900X CPU. I kept online both SMT siblings of two high
priority cores. I offlined the lower priority SMT siblings of three low
priority cores. I offlined the rest of the cores. The resulting priority
configuration meets the N*P > Q > P condition described above.

The baseline for the results is an unmodified v5.12-rc3 kernel. Results
show a comparative percentage of improvement (positive) or degradation
(negative). Each test case is repeated three times, and the standard
deviation among repetitions is also documented.
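
For reference, a pair of entries such as "1.00 (  3.19)" and "+1.50 (  4.10)"
can be reproduced with post-processing along the following lines. This is a
sketch of the assumed normalization (the actual scripts are not part of this
posting, and the raw scores below are hypothetical):

/*
 * Assumed table normalization: the baseline mean is scaled to 1.00, std%
 * is the standard deviation relative to the mean, and compare% is the
 * percent change of the patched mean vs. the baseline mean.
 * Build with: gcc norm.c -lm
 */
#include <math.h>
#include <stdio.h>

static void mean_stdpct(const double *v, int n, double *mean, double *stdpct)
{
	double sum = 0.0, var = 0.0;
	int i;

	for (i = 0; i < n; i++)
		sum += v[i];
	*mean = sum / n;

	for (i = 0; i < n; i++)
		var += (v[i] - *mean) * (v[i] - *mean);
	*stdpct = 100.0 * sqrt(var / n) / *mean;
}

int main(void)
{
	double base[3] = { 98.0, 100.0, 102.0 };	/* three baseline runs */
	double test[3] = { 100.0, 101.5, 103.0 };	/* three patched runs */
	double mb, sb, mt, st;

	mean_stdpct(base, 3, &mb, &sb);
	mean_stdpct(test, 3, &mt, &st);

	printf("baseline: 1.00 (%5.2f)\n", sb);
	printf("compare: %+6.2f (%5.2f)\n", 100.0 * (mt / mb - 1.0), st);
	return 0;
}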

In order to judge only the improvements this patchset provides, Table 1
shows the results with the CPU frequency fixed at 800MHz. It can be
observed that the patches bring an overall positive impact, especially on
schbench. Regressions are mostly below 1%.

Table 2 shows the results when using hardware-controlled performance
states (HWP), a common use case. In this case, the impact of the patches
is overall positive, especially on schbench, although some regressions
are also observed.

Thanks and BR,
Ricardo

========       Table 1. Test results of patches at 800MHz      ========
=======================================================================
hackbench
=========
case                    load            baseline(std%)  compare%( std%)
process-pipe            group-1          1.00 (  3.19)   +1.50 (  4.10)
process-pipe            group-2          1.00 (  1.07)   -0.82 (  0.63)
process-pipe            group-4          1.00 (  2.61)   +1.35 (  4.88)
process-pipe            group-8          1.00 (  6.81)   +0.90 ( 17.64)
process-sockets         group-1          1.00 (  0.35)   +0.37 (  1.37)
process-sockets         group-2          1.00 (  0.69)   -0.42 (  0.81)
process-sockets         group-4          1.00 (  1.89)   -0.06 (  1.83)
process-sockets         group-8          1.00 (  3.10)   -6.92 (  8.11)
threads-pipe            group-1          1.00 (  5.89)   +6.42 (  2.68)
threads-pipe            group-2          1.00 (  2.26)   +2.81 (  3.36)
threads-pipe            group-4          1.00 (  4.01)   +0.51 (  5.17)
threads-pipe            group-8          1.00 ( 16.17)  +18.85 ( 11.37)
threads-sockets         group-1          1.00 (  1.00)   -3.96 (  2.34)
threads-sockets         group-2          1.00 (  1.22)   +0.67 (  1.26)
threads-sockets         group-4          1.00 (  0.51)   +0.39 (  0.32)
threads-sockets         group-8          1.00 (  3.39)   +2.19 (  3.16)

netperf
=======
case                    load            baseline(std%)  compare%( std%)
TCP_RR                  thread-1         1.00 (  0.41)   -0.47 (  0.25)
TCP_RR                  thread-3         1.00 (  0.41)   +2.74 (  0.29)
TCP_RR                  thread-7         1.00 ( 11.00)   +0.29 ( 16.64)
TCP_RR                  thread-14        1.00 ( 23.14)   +0.71 ( 22.57)
UDP_RR                  thread-1         1.00 (  0.13)   -0.24 (  0.20)
UDP_RR                  thread-3         1.00 (  0.32)   +3.82 (  0.46)
UDP_RR                  thread-7         1.00 (  8.85)   +0.43 ( 13.60)
UDP_RR                  thread-14        1.00 ( 20.71)   +0.58 ( 21.85)

tbench
======
case                    load            baseline(std%)  compare%( std%)
loopback                thread-1         1.00 (  0.10)   +0.06 (  0.27)
loopback                thread-3         1.00 (  0.27)   +3.87 (  0.11)
loopback                thread-7         1.00 (  0.28)   -0.27 (  1.29)
loopback                thread-14        1.00 (  0.07)   +0.52 (  0.24)

schbench
========
case                    load            baseline(std%)  compare%( std%)
normal                  mthread-1        1.00 (  2.33)  +32.38 (  1.99)
normal                  mthread-2        1.00 (  0.00)   +0.00 (  0.00)
normal                  mthread-4        1.00 (  0.00)   +0.00 (  0.00)
normal                  mthread-8        1.00 (  3.01)   -2.16 (  3.04)

========      Table 2. Test results of patches with HWP        ========
=======================================================================

hackbench
=========
case                    load            baseline(std%)  compare%( std%)
process-pipe            group-1          1.00 (  4.73)   -3.23 (  2.62)
process-pipe            group-2          1.00 (  2.32)   +0.13 (  0.36)
process-pipe            group-4          1.00 (  3.85)   +1.95 (  3.92)
process-pipe            group-8          1.00 ( 14.31)   +2.12 ( 13.44)
process-sockets         group-1          1.00 (  3.10)   -1.22 (  1.95)
process-sockets         group-2          1.00 (  0.91)   -1.30 (  1.20)
process-sockets         group-4          1.00 (  0.87)   -1.29 (  1.35)
process-sockets         group-8          1.00 (  4.98)   +1.21 (  4.57)
threads-pipe            group-1          1.00 (  0.76)   +2.87 (  0.89)
threads-pipe            group-2          1.00 (  1.41)   -3.19 (  1.18)
threads-pipe            group-4          1.00 (  2.37)   -1.22 (  1.41)
threads-pipe            group-8          1.00 ( 15.12)   +6.64 (  5.82)
threads-sockets         group-1          1.00 (  2.73)   +0.89 (  0.92)
threads-sockets         group-2          1.00 (  1.59)   -4.32 (  1.23)
threads-sockets         group-4          1.00 (  0.35)   -1.42 (  1.13)
threads-sockets         group-8          1.00 (  0.77)   +0.23 (  1.35)

netperf
=======
case                    load            baseline(std%)  compare%( std%)
TCP_RR                  thread-1         1.00 (  0.27)   +0.05 (  0.41)
TCP_RR                  thread-3         1.00 (  4.71)   +8.43 (  0.45)
TCP_RR                  thread-7         1.00 ( 10.15)   -1.16 ( 16.95)
TCP_RR                  thread-14        1.00 ( 23.46)   +0.21 ( 23.69)
UDP_RR                  thread-1         1.00 (  0.16)   +1.52 (  0.81)
UDP_RR                  thread-3         1.00 (  4.00)   +8.33 (  0.50)
UDP_RR                  thread-7         1.00 (  7.39)   -0.99 ( 11.34)
UDP_RR                  thread-14        1.00 ( 23.74)   -0.51 ( 21.87)

tbench
======
case                    load            baseline(std%)  compare%( std%)
loopback                thread-1         1.00 (  0.51)   -0.86 (  0.08)
loopback                thread-3         1.00 (  1.40)  +52.18 (  1.13)
loopback                thread-7         1.00 (  0.94)   +0.31 (  0.20)
loopback                thread-14        1.00 (  0.21)   -0.24 (  0.47)

schbench
========
case                    load            baseline(std%)  compare%( std%)
normal                  mthread-1        1.00 ( 20.14)  +61.19 (  5.44)
normal                  mthread-2        1.00 (  0.00)   +0.00 (  0.00)
normal                  mthread-4        1.00 (  0.00)   +0.00 (  0.00)
normal                  mthread-8        1.00 (  3.05)   -2.16 (  2.99)

Ricardo Neri (4):
  sched/fair: Optimize checking for group_asym_packing
  sched/fair: Introduce arch_sched_asym_prefer_early()
  sched/fair: Consider SMT in ASYM_PACKING load balance
  x86/sched: Enable checks of the state of SMT siblings in load
    balancing

 arch/x86/kernel/itmt.c         |  15 ++++
 include/linux/sched/topology.h |   2 +
 kernel/sched/fair.c            | 135 ++++++++++++++++++++++++++++++++-
 3 files changed, 150 insertions(+), 2 deletions(-)

-- 
2.17.1



* [PATCH 1/4] sched/fair: Optimize checking for group_asym_packing
  2021-04-06  4:11 [PATCH 0/4] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
@ 2021-04-06  4:11 ` Ricardo Neri
  2021-04-06  4:11 ` [PATCH 2/4] sched/fair: Introduce arch_sched_asym_prefer_early() Ricardo Neri
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Ricardo Neri @ 2021-04-06  4:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linux-kernel, Ricardo Neri, Aubrey Li, Ben Segall,
	Daniel Bristot de Oliveira

By checking local_group first, we can avoid additional checks and avoid
invoking sched_asym_prefer() when it is not needed.

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04a3ce20da67..4ef3fa0d5e8d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8455,7 +8455,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	}
 
 	/* Check if dst CPU is idle and preferred to this group */
-	if (env->sd->flags & SD_ASYM_PACKING &&
+	if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
 	    env->idle != CPU_NOT_IDLE &&
 	    sgs->sum_h_nr_running &&
 	    sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
-- 
2.17.1



* [PATCH 2/4] sched/fair: Introduce arch_sched_asym_prefer_early()
  2021-04-06  4:11 [PATCH 0/4] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
  2021-04-06  4:11 ` [PATCH 1/4] sched/fair: Optimize checking for group_asym_packing Ricardo Neri
@ 2021-04-06  4:11 ` Ricardo Neri
  2021-04-06 14:31   ` Vincent Guittot
  2021-04-06  4:11 ` [PATCH 3/4] sched/fair: Consider SMT in ASYM_PACKING load balance Ricardo Neri
  2021-04-06  4:11 ` [PATCH 4/4] x86/sched: Enable checks of the state of SMT siblings in load balancing Ricardo Neri
  3 siblings, 1 reply; 15+ messages in thread
From: Ricardo Neri @ 2021-04-06  4:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linux-kernel, Ricardo Neri, Aubrey Li, Ben Segall,
	Daniel Bristot de Oliveira

Introduce arch_sched_asym_prefer_early() so that architectures with SMT
can delay the decision to label a candidate busiest group as
group_asym_packing.

When using asymmetric packing, high priority idle CPUs pull tasks from
scheduling groups with low priority CPUs. The decision on using asymmetric
packing for load balancing is done after collecting the statistics of a
candidate busiest group. However, this decision needs to consider the
state of SMT siblings of dst_cpu.

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/sched/topology.h |  1 +
 kernel/sched/fair.c            | 11 ++++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 8f0f778b7c91..663b98959305 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -57,6 +57,7 @@ static inline int cpu_numa_flags(void)
 #endif
 
 extern int arch_asym_cpu_priority(int cpu);
+extern bool arch_sched_asym_prefer_early(int a, int b);
 
 struct sched_domain_attr {
 	int relax_domain_level;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4ef3fa0d5e8d..e74da853b046 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -106,6 +106,15 @@ int __weak arch_asym_cpu_priority(int cpu)
 	return -cpu;
 }
 
+/*
+ * For asym packing, early check if CPUs with higher priority should be
+ * preferred. On some architectures, more data is needed to make a decision.
+ */
+bool __weak arch_sched_asym_prefer_early(int a, int b)
+{
+	return sched_asym_prefer(a, b);
+}
+
 /*
  * The margin used when comparing utilization with CPU capacity.
  *
@@ -8458,7 +8467,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
 	    env->idle != CPU_NOT_IDLE &&
 	    sgs->sum_h_nr_running &&
-	    sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
+	    arch_sched_asym_prefer_early(env->dst_cpu, group->asym_prefer_cpu)) {
 		sgs->group_asym_packing = 1;
 	}
 
-- 
2.17.1



* [PATCH 3/4] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-04-06  4:11 [PATCH 0/4] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
  2021-04-06  4:11 ` [PATCH 1/4] sched/fair: Optimize checking for group_asym_packing Ricardo Neri
  2021-04-06  4:11 ` [PATCH 2/4] sched/fair: Introduce arch_sched_asym_prefer_early() Ricardo Neri
@ 2021-04-06  4:11 ` Ricardo Neri
  2021-04-06 11:17   ` Peter Zijlstra
  2021-04-06 11:18   ` Peter Zijlstra
  2021-04-06  4:11 ` [PATCH 4/4] x86/sched: Enable checks of the state of SMT siblings in load balancing Ricardo Neri
  3 siblings, 2 replies; 15+ messages in thread
From: Ricardo Neri @ 2021-04-06  4:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linux-kernel, Ricardo Neri, Aubrey Li, Ben Segall,
	Daniel Bristot de Oliveira

When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
check for the idle state of the CPU doing the load balancing, but also of
its SMT siblings.

If the CPU doing the balancing is idle but its SMT siblings are busy,
performance suffers if it pulls tasks from a medium priority CPU that does
not have SMT siblings: the pulled task ends up sharing core resources that
it did not have to share before. The decision to label a group for
asymmetric packing balancing is done in update_sg_lb_stats(). However, for
SMT, that code does not account for idle SMT siblings.

Implement asym_can_pull_tasks() to revisit the early decision on whether
the CPU doing the load balance can pull tasks once the needed information
is available. arch_sched_asym_prefer_early() and
arch_asym_check_smt_siblings() are used to preserve the legacy behavior.

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/sched/topology.h |   1 +
 kernel/sched/fair.c            | 122 +++++++++++++++++++++++++++++++++
 2 files changed, 123 insertions(+)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 663b98959305..6487953b24e8 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -58,6 +58,7 @@ static inline int cpu_numa_flags(void)
 
 extern int arch_asym_cpu_priority(int cpu);
 extern bool arch_sched_asym_prefer_early(int a, int b);
+extern bool arch_asym_check_smt_siblings(void);
 
 struct sched_domain_attr {
 	int relax_domain_level;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e74da853b046..14b18dfea5b1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -115,6 +115,14 @@ bool __weak arch_sched_asym_prefer_early(int a, int b)
 	return sched_asym_prefer(a, b);
 }
 
+/*
+ * For asym packing, first check the state of SMT siblings before deciding to
+ * pull tasks.
+ */
+bool __weak arch_asym_check_smt_siblings(void)
+{
+	return false;
+}
 /*
  * The margin used when comparing utilization with CPU capacity.
  *
@@ -8483,6 +8491,110 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 				sgs->group_capacity;
 }
 
+static bool cpu_group_is_smt(int cpu, struct sched_group *sg)
+{
+#ifdef CONFIG_SCHED_SMT
+	if (!static_branch_likely(&sched_smt_present))
+		return false;
+
+	if (sg->group_weight == 1)
+		return false;
+
+	if (cpumask_weight(cpu_smt_mask(cpu)) == 1)
+		return false;
+
+	return cpumask_equal(sched_group_span(sg), cpu_smt_mask(cpu));
+#else
+	return false;
+#endif
+}
+
+/**
+ * asym_can_pull_tasks - Check whether the load balancing CPU can pull tasks
+ * @dst_cpu:	CPU doing the load balancing
+ * @sds:	Load-balancing data with statistics of the local group
+ * @sgs:	Load-balancing statistics of the candidate busiest group
+ * @sg:		The candidate busiest group
+ *
+ * Check the state of the SMT siblings of both @sds::local and @sg and decide
+ * if @dst_cpu can pull tasks. If @dst_cpu does not have SMT siblings, it can
+ * pull tasks if two or more of the SMT siblings of @sg are busy. If only one
+ * CPU in @sg is busy, pull tasks only if @dst_cpu has higher priority.
+ *
+ * If both @dst_cpu and @sg have SMT siblings, even out the number of busy
+ * CPUs between @sds::local and @sg. Thus, pull tasks from @sg if the
+ * difference between the number of busy CPUs is 2 or more. If the difference
+ * is 1, only pull if @dst_cpu has higher priority. If @sg does not have SMT
+ * siblings, only pull tasks if all of the SMT siblings of @dst_cpu are idle
+ * and @sg has lower priority.
+ */
+static bool asym_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
+				struct sg_lb_stats *sgs, struct sched_group *sg)
+{
+#ifdef CONFIG_SCHED_SMT
+	int cpu, local_busy_cpus, sg_busy_cpus;
+	bool local_is_smt, sg_is_smt;
+
+	if (!arch_asym_check_smt_siblings())
+		return true;
+
+	cpu = group_first_cpu(sg);
+	local_is_smt = cpu_group_is_smt(dst_cpu, sds->local);
+	sg_is_smt = cpu_group_is_smt(cpu, sg);
+
+	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
+
+	if (!local_is_smt) {
+		/*
+		 * If we are here, @dst_cpu is idle and does not have SMT
+		 * siblings. Pull tasks if candidate group has two or more
+		 * busy CPUs.
+		 */
+		if (sg_is_smt && sg_busy_cpus >= 2)
+			return true;
+
+		/*
+		 * @dst_cpu does not have SMT siblings. @sg may have SMT
+		 * siblings and only one is busy. In such case, @dst_cpu
+		 * can help if it has higher priority and is idle.
+		 */
+		return !sds->local_stat.group_util &&
+		       sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+	}
+
+	/* @dst_cpu has SMT siblings. */
+
+	local_busy_cpus = sds->local->group_weight - sds->local_stat.idle_cpus;
+
+	if (sg_is_smt) {
+		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
+
+		/* Local can always help to even out the number of busy CPUs. */
+		if (busy_cpus_delta >= 2)
+			return true;
+
+		if (busy_cpus_delta == 1)
+			return sched_asym_prefer(dst_cpu,
+						 sg->asym_prefer_cpu);
+
+		return false;
+	}
+
+	/*
+	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
+	 * up with more than one busy SMT sibling and only pull tasks if there
+	 * are no busy CPUs. As CPUs move in and out of idle state frequently,
+	 * also check the group utilization to smooth the decision.
+	 */
+	if (!local_busy_cpus && !sds->local_stat.group_util)
+		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+
+	return false;
+#else
+	return true;
+#endif
+}
+
 /**
  * update_sd_pick_busiest - return 1 on busiest group
  * @env: The load balancing environment.
@@ -8507,6 +8619,10 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	if (!sgs->sum_h_nr_running)
 		return false;
 
+	if (sgs->group_type == group_asym_packing &&
+	    !asym_can_pull_tasks(env->dst_cpu, sds, sgs, sg))
+		return false;
+
 	/*
 	 * Don't try to pull misfit tasks we can't help.
 	 * We can use max_capacity here as reduction in capacity on some
@@ -9412,6 +9528,12 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		    nr_running == 1)
 			continue;
 
+		/* Make sure we only pull tasks from a CPU of lower priority */
+		if ((env->sd->flags & SD_ASYM_PACKING) &&
+		    sched_asym_prefer(i, env->dst_cpu) &&
+		    nr_running == 1)
+			continue;
+
 		switch (env->migration_type) {
 		case migrate_load:
 			/*
-- 
2.17.1



* [PATCH 4/4] x86/sched: Enable checks of the state of SMT siblings in load balancing
  2021-04-06  4:11 [PATCH 0/4] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
                   ` (2 preceding siblings ...)
  2021-04-06  4:11 ` [PATCH 3/4] sched/fair: Consider SMT in ASYM_PACKING load balance Ricardo Neri
@ 2021-04-06  4:11 ` Ricardo Neri
  3 siblings, 0 replies; 15+ messages in thread
From: Ricardo Neri @ 2021-04-06  4:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linux-kernel, Ricardo Neri, Aubrey Li, Ben Segall,
	Daniel Bristot de Oliveira

ITMT relies on asymmetric packing of tasks to ensure CPUs are populated in
priority order. When balancing load, the scheduler compares scheduling
groups in pairs, and considers only the priority of the highest priority
CPU in each group. This may result in CPUs with medium priority being
overlooked. A recent change introduced logic to also consider the idle
state of the SMT siblings of the CPU doing the load balance. Enable those
checks for x86 when using ITMT.

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/itmt.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index 1afbdd1dd777..1407120af82d 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -28,6 +28,8 @@ DEFINE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
 
 /* Boolean to track if system has ITMT capabilities */
 static bool __read_mostly sched_itmt_capable;
+/* Boolean to activate checks on the state of SMT siblings */
+static bool __read_mostly sched_itmt_smt_checks;
 
 /*
  * Boolean to control whether we want to move processes to cpu capable
@@ -124,6 +126,8 @@ int sched_set_itmt_support(void)
 
 	sysctl_sched_itmt_enabled = 1;
 
+	sched_itmt_smt_checks = true;
+
 	x86_topology_update = true;
 	rebuild_sched_domains();
 
@@ -160,6 +164,7 @@ void sched_clear_itmt_support(void)
 	if (sysctl_sched_itmt_enabled) {
 		/* disable sched_itmt if we are no longer ITMT capable */
 		sysctl_sched_itmt_enabled = 0;
+		sched_itmt_smt_checks = false;
 		x86_topology_update = true;
 		rebuild_sched_domains();
 	}
@@ -167,6 +172,16 @@ void sched_clear_itmt_support(void)
 	mutex_unlock(&itmt_update_mutex);
 }
 
+bool arch_asym_check_smt_siblings(void)
+{
+	return sched_itmt_smt_checks;
+}
+
+bool arch_sched_asym_prefer_early(int a, int b)
+{
+	return sched_itmt_smt_checks;
+}
+
 int arch_asym_cpu_priority(int cpu)
 {
 	return per_cpu(sched_core_priority, cpu);
-- 
2.17.1



* Re: [PATCH 3/4] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-04-06  4:11 ` [PATCH 3/4] sched/fair: Consider SMT in ASYM_PACKING load balance Ricardo Neri
@ 2021-04-06 11:17   ` Peter Zijlstra
  2021-04-06 23:17     ` Ricardo Neri
  2021-04-06 11:18   ` Peter Zijlstra
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2021-04-06 11:17 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linux-kernel, Aubrey Li, Ben Segall, Daniel Bristot de Oliveira

On Mon, Apr 05, 2021 at 09:11:07PM -0700, Ricardo Neri wrote:
> @@ -8507,6 +8619,10 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>  	if (!sgs->sum_h_nr_running)
>  		return false;
>  
> +	if (sgs->group_type == group_asym_packing &&
> +	    !asym_can_pull_tasks(env->dst_cpu, sds, sgs, sg))
> +		return false;

All of this makes my head hurt; but afaict this isn't right.

Your update_sg_lb_stats() change makes that we unconditionally set
sgs->group_asym_packing, and then this is to undo that. But it's not
clear this covers all cases right.

Even if !sched_asym_prefer(), we could end up selecting this sg as
busiest, but you're just bailing out here.



* Re: [PATCH 3/4] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-04-06  4:11 ` [PATCH 3/4] sched/fair: Consider SMT in ASYM_PACKING load balance Ricardo Neri
  2021-04-06 11:17   ` Peter Zijlstra
@ 2021-04-06 11:18   ` Peter Zijlstra
  2021-04-06 23:17     ` Ricardo Neri
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2021-04-06 11:18 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linux-kernel, Aubrey Li, Ben Segall, Daniel Bristot de Oliveira

On Mon, Apr 05, 2021 at 09:11:07PM -0700, Ricardo Neri wrote:
> +static bool cpu_group_is_smt(int cpu, struct sched_group *sg)
> +{
> +#ifdef CONFIG_SCHED_SMT
> +	if (!static_branch_likely(&sched_smt_present))
> +		return false;
> +
> +	if (sg->group_weight == 1)
> +		return false;
> +
> +	if (cpumask_weight(cpu_smt_mask(cpu)) == 1)
> +		return false;

Please explain this condition. Why is it required?

> +	return cpumask_equal(sched_group_span(sg), cpu_smt_mask(cpu));
> +#else
> +	return false;
> +#endif
> +}


* Re: [PATCH 2/4] sched/fair: Introduce arch_sched_asym_prefer_early()
  2021-04-06  4:11 ` [PATCH 2/4] sched/fair: Introduce arch_sched_asym_prefer_early() Ricardo Neri
@ 2021-04-06 14:31   ` Vincent Guittot
  2021-04-06 23:36     ` Ricardo Neri
  0 siblings, 1 reply; 15+ messages in thread
From: Vincent Guittot @ 2021-04-06 14:31 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Len Brown, Srinivas Pandruvada, Tim Chen,
	Aubrey Li, Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linux-kernel, Aubrey Li, Ben Segall, Daniel Bristot de Oliveira

On Tue, 6 Apr 2021 at 06:11, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> Introduce arch_sched_asym_prefer_early() so that architectures with SMT
> can delay the decision to label a candidate busiest group as
> group_asym_packing.
>
> When using asymmetric packing, high priority idle CPUs pull tasks from
> scheduling groups with low priority CPUs. The decision on using asymmetric
> packing for load balancing is done after collecting the statistics of a
> candidate busiest group. However, this decision needs to consider the
> state of SMT siblings of dst_cpu.
>
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  include/linux/sched/topology.h |  1 +
>  kernel/sched/fair.c            | 11 ++++++++++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index 8f0f778b7c91..663b98959305 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -57,6 +57,7 @@ static inline int cpu_numa_flags(void)
>  #endif
>
>  extern int arch_asym_cpu_priority(int cpu);
> +extern bool arch_sched_asym_prefer_early(int a, int b);
>
>  struct sched_domain_attr {
>         int relax_domain_level;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4ef3fa0d5e8d..e74da853b046 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -106,6 +106,15 @@ int __weak arch_asym_cpu_priority(int cpu)
>         return -cpu;
>  }
>
> +/*
> + * For asym packing, early check if CPUs with higher priority should be
> + * preferred. On some architectures, more data is needed to make a decision.
> + */
> +bool __weak arch_sched_asym_prefer_early(int a, int b)
> +{
> +       return sched_asym_prefer(a, b);
> +}
> +
>  /*
>   * The margin used when comparing utilization with CPU capacity.
>   *
> @@ -8458,7 +8467,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>         if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
>             env->idle != CPU_NOT_IDLE &&
>             sgs->sum_h_nr_running &&
> -           sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
> +           arch_sched_asym_prefer_early(env->dst_cpu, group->asym_prefer_cpu)) {

If itmt set arch_sched_asym_prefer_early to true all groups will be
set as group_asym_packing unconditionally which is wrong. The state
has to be set only when we want asym packing migration

>                 sgs->group_asym_packing = 1;
>         }
>
> --
> 2.17.1
>


* Re: [PATCH 3/4] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-04-06 11:17   ` Peter Zijlstra
@ 2021-04-06 23:17     ` Ricardo Neri
  2021-04-08 11:10       ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Ricardo Neri @ 2021-04-06 23:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linux-kernel, Aubrey Li, Ben Segall, Daniel Bristot de Oliveira

On Tue, Apr 06, 2021 at 01:17:28PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 05, 2021 at 09:11:07PM -0700, Ricardo Neri wrote:
> > @@ -8507,6 +8619,10 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >  	if (!sgs->sum_h_nr_running)
> >  		return false;
> >  
> > +	if (sgs->group_type == group_asym_packing &&
> > +	    !asym_can_pull_tasks(env->dst_cpu, sds, sgs, sg))
> > +		return false;
> 
> All of this makes my head hurt; but afaict this isn't right.
> 
> Your update_sg_lb_stats() change makes that we unconditionally set
> sgs->group_asym_packing, and then this is to undo that. But it's not
> clear this covers all cases right.

We could not make a decision to set sgs->group_asym_packing in
update_sg_lb_stats() because we don't have information about the dst_cpu
and its SMT siblings if any. That is the reason I proposed to delay the
decision to update_sd_pick_busiest(), where we can compare local and
sgs.
> 
> Even if !sched_asym_prefer(), we could end up selecting this sg as
> busiest, but you're just bailing out here.

Even if sgs->group_asym_packing is unconditionally set, sgs can still
be classified as group_overloaded and group_imbalanced. In such cases
we wouldn't bailout. sgs could not be classified as group_fully_busy
or group_has_spare and we would bailout, though. Is your concern about
these? I can fixup these two cases.
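
For context, the classification precedence being referred to looks roughly
like this in kernel/sched/fair.c around v5.12. This is a paraphrase from
memory, shown only to illustrate the ordering; consult the actual tree for
the authoritative definition:

enum group_type {
	group_has_spare = 0,	/* lowest precedence when picking busiest */
	group_fully_busy,
	group_misfit_task,
	group_asym_packing,	/* the type set by the hunk under discussion */
	group_imbalanced,
	group_overloaded,	/* highest precedence */
};

group_classify() checks these from group_overloaded downwards and returns
the first one that applies, which is why group_overloaded and
group_imbalanced can still be reported even when sgs->group_asym_packing is
set, while group_fully_busy and group_has_spare cannot.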

Thanks and BR,
Ricardo
> 


* Re: [PATCH 3/4] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-04-06 11:18   ` Peter Zijlstra
@ 2021-04-06 23:17     ` Ricardo Neri
  2021-04-08 11:21       ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Ricardo Neri @ 2021-04-06 23:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linux-kernel, Aubrey Li, Ben Segall, Daniel Bristot de Oliveira

On Tue, Apr 06, 2021 at 01:18:09PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 05, 2021 at 09:11:07PM -0700, Ricardo Neri wrote:
> > +static bool cpu_group_is_smt(int cpu, struct sched_group *sg)
> > +{
> > +#ifdef CONFIG_SCHED_SMT
> > +	if (!static_branch_likely(&sched_smt_present))
> > +		return false;
> > +
> > +	if (sg->group_weight == 1)
> > +		return false;
> > +
> > +	if (cpumask_weight(cpu_smt_mask(cpu)) == 1)
> > +		return false;
> 
> Please explain this condition. Why is it required?

Thank you for your quick review Peter!

Probably this is not required since the previous check verifies the
group weight, and the subsequent check makes sure that @sg matches the
SMT siblings of @cpu.

Thanks and BR,
Ricardo


* Re: [PATCH 2/4] sched/fair: Introduce arch_sched_asym_prefer_early()
  2021-04-06 14:31   ` Vincent Guittot
@ 2021-04-06 23:36     ` Ricardo Neri
  0 siblings, 0 replies; 15+ messages in thread
From: Ricardo Neri @ 2021-04-06 23:36 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Len Brown, Srinivas Pandruvada, Tim Chen,
	Aubrey Li, Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linux-kernel, Aubrey Li, Ben Segall, Daniel Bristot de Oliveira

On Tue, Apr 06, 2021 at 04:31:35PM +0200, Vincent Guittot wrote:
> On Tue, 6 Apr 2021 at 06:11, Ricardo Neri
> <ricardo.neri-calderon@linux.intel.com> wrote:
> >
> > Introduce arch_sched_asym_prefer_early() so that architectures with SMT
> > can delay the decision to label a candidate busiest group as
> > group_asym_packing.
> >
> > When using asymmetric packing, high priority idle CPUs pull tasks from
> > scheduling groups with low priority CPUs. The decision on using asymmetric
> > packing for load balancing is done after collecting the statistics of a
> > candidate busiest group. However, this decision needs to consider the
> > state of SMT siblings of dst_cpu.
> >
> > Cc: Aubrey Li <aubrey.li@intel.com>
> > Cc: Ben Segall <bsegall@google.com>
> > Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Quentin Perret <qperret@google.com>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Tim Chen <tim.c.chen@linux.intel.com>
> > Reviewed-by: Len Brown <len.brown@intel.com>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  include/linux/sched/topology.h |  1 +
> >  kernel/sched/fair.c            | 11 ++++++++++-
> >  2 files changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > index 8f0f778b7c91..663b98959305 100644
> > --- a/include/linux/sched/topology.h
> > +++ b/include/linux/sched/topology.h
> > @@ -57,6 +57,7 @@ static inline int cpu_numa_flags(void)
> >  #endif
> >
> >  extern int arch_asym_cpu_priority(int cpu);
> > +extern bool arch_sched_asym_prefer_early(int a, int b);
> >
> >  struct sched_domain_attr {
> >         int relax_domain_level;
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 4ef3fa0d5e8d..e74da853b046 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -106,6 +106,15 @@ int __weak arch_asym_cpu_priority(int cpu)
> >         return -cpu;
> >  }
> >
> > +/*
> > + * For asym packing, early check if CPUs with higher priority should be
> > + * preferred. On some architectures, more data is needed to make a decision.
> > + */
> > +bool __weak arch_sched_asym_prefer_early(int a, int b)
> > +{
> > +       return sched_asym_prefer(a, b);
> > +}
> > +
> >  /*
> >   * The margin used when comparing utilization with CPU capacity.
> >   *
> > @@ -8458,7 +8467,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
> >         if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
> >             env->idle != CPU_NOT_IDLE &&
> >             sgs->sum_h_nr_running &&
> > -           sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
> > +           arch_sched_asym_prefer_early(env->dst_cpu, group->asym_prefer_cpu)) {
> 
> If itmt set arch_sched_asym_prefer_early to true all groups will be
> set as group_asym_packing unconditionally which is wrong. The state
> has to be set only when we want asym packing migration

Thanks for your quick review Vincent! 

Indeed, sgs->group_asym_packing should only be set when we want
asym_packing migration. However, for ITMT we also need to know the state
of the SMT siblings of dst_cpu, if any. update_sg_lb_stats() does not
know anything about the local group. Thus, arch_sched_asym_prefer_early()
intends to give an early decision that can be revoked later when
statistics of the local group are known. I also thought about checking
the state of the SMT siblings from update_sg_lb_stats() but that would
duplicate the functionality of update_sg_lb_stats() when called on the
local group.

The group can still be classified as group_overloaded and
group_imbalanced as they take precedence over group_asym_packing. We
would miss the group_has_spare and group_fully_busy classifications,
though. I can look into fixing that.

Thanks and BR,
Ricardo


* Re: [PATCH 3/4] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-04-06 23:17     ` Ricardo Neri
@ 2021-04-08 11:10       ` Peter Zijlstra
  2021-04-09  5:12         ` Ricardo Neri
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2021-04-08 11:10 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linux-kernel, Aubrey Li, Ben Segall, Daniel Bristot de Oliveira

On Tue, Apr 06, 2021 at 04:17:10PM -0700, Ricardo Neri wrote:
> On Tue, Apr 06, 2021 at 01:17:28PM +0200, Peter Zijlstra wrote:
> > On Mon, Apr 05, 2021 at 09:11:07PM -0700, Ricardo Neri wrote:
> > > @@ -8507,6 +8619,10 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> > >  	if (!sgs->sum_h_nr_running)
> > >  		return false;
> > >  
> > > +	if (sgs->group_type == group_asym_packing &&
> > > +	    !asym_can_pull_tasks(env->dst_cpu, sds, sgs, sg))
> > > +		return false;
> > 
> > All of this makes my head hurt; but afaict this isn't right.
> > 
> > Your update_sg_lb_stats() change makes that we unconditionally set
> > sgs->group_asym_packing, and then this is to undo that. But it's not
> > clear this covers all cases right.
> 
> We could not make a decision to set sgs->group_asym_packing in
> update_sg_lb_stats() because we don't have information about the dst_cpu
> and its SMT siblings if any. That is the reason I proposed to delay the
> decision to update_sd_pick_busiest(), where we can compare local and
> sgs.

Yeah, I sorta got that.

> > Even if !sched_asym_prefer(), we could end up selecting this sg as
> > busiest, but you're just bailing out here.
> 
> Even if sgs->group_asym_packing is unconditionally set, sgs can still
> be classified as group_overloaded and group_imbalanced. In such cases
> we wouldn't bailout. sgs could not be classified as group_fully_busy
> or group_has_spare and we would bailout, though. Is your concern about
> these? I can fixup these two cases.

Yes. Either explain (in a comment) why those cases are not relevant, or
handle them properly.

Because when reading this, it wasn't at all obvious that this is correct
or as intended.


* Re: [PATCH 3/4] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-04-06 23:17     ` Ricardo Neri
@ 2021-04-08 11:21       ` Peter Zijlstra
  2021-04-09  5:13         ` Ricardo Neri
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2021-04-08 11:21 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linux-kernel, Aubrey Li, Ben Segall, Daniel Bristot de Oliveira

On Tue, Apr 06, 2021 at 04:17:51PM -0700, Ricardo Neri wrote:
> On Tue, Apr 06, 2021 at 01:18:09PM +0200, Peter Zijlstra wrote:
> > On Mon, Apr 05, 2021 at 09:11:07PM -0700, Ricardo Neri wrote:
> > > +static bool cpu_group_is_smt(int cpu, struct sched_group *sg)
> > > +{
> > > +#ifdef CONFIG_SCHED_SMT
> > > +	if (!static_branch_likely(&sched_smt_present))
> > > +		return false;
> > > +
> > > +	if (sg->group_weight == 1)
> > > +		return false;
> > > +
> > > +	if (cpumask_weight(cpu_smt_mask(cpu)) == 1)
> > > +		return false;
> > 
> > Please explain this condition. Why is it required?
> 
> Thank you for your quick review Peter!
> 
> Probably this is not required since the previous check verifies the
> group weight, and the subsequent check makes sure that @sg matches the
> SMT siblings of @cpu.

So the thing is that cpumask_weight() can be fairly expensive, depending
on how large the machine is.

Now I suppose this mixing of SMT and !SMT cores is typical for 'small'
machines (for now), but this is enabled for everything with ITMT on,
which might very well include large systems.

So yes, if it can go away, that'd be good.


* Re: [PATCH 3/4] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-04-08 11:10       ` Peter Zijlstra
@ 2021-04-09  5:12         ` Ricardo Neri
  0 siblings, 0 replies; 15+ messages in thread
From: Ricardo Neri @ 2021-04-09  5:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linux-kernel, Aubrey Li, Ben Segall, Daniel Bristot de Oliveira

On Thu, Apr 08, 2021 at 01:10:39PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 06, 2021 at 04:17:10PM -0700, Ricardo Neri wrote:
> > On Tue, Apr 06, 2021 at 01:17:28PM +0200, Peter Zijlstra wrote:
> > > On Mon, Apr 05, 2021 at 09:11:07PM -0700, Ricardo Neri wrote:
> > > > @@ -8507,6 +8619,10 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> > > >  	if (!sgs->sum_h_nr_running)
> > > >  		return false;
> > > >  
> > > > +	if (sgs->group_type == group_asym_packing &&
> > > > +	    !asym_can_pull_tasks(env->dst_cpu, sds, sgs, sg))
> > > > +		return false;
> > > 
> > > All of this makes my head hurt; but afaict this isn't right.
> > > 
> > > Your update_sg_lb_stats() change makes that we unconditionally set
> > > sgs->group_asym_packing, and then this is to undo that. But it's not
> > > clear this covers all cases right.
> > 
> > We could not make a decision to set sgs->group_asym_packing in
> > update_sg_lb_stats() because we don't have information about the dst_cpu
> > and its SMT siblings if any. That is the reason I proposed to delay the
> > decision to update_sd_pick_busiest(), where we can compare local and
> > sgs.
> 
> Yeah, I sorta got that.
> 
> > > Even if !sched_asym_prefer(), we could end up selecting this sg as
> > > busiest, but you're just bailing out here.
> > 
> > Even if sgs->group_asym_packing is unconditionally set, sgs can still
> > be classified as group_overloaded and group_imbalanced. In such cases
> > we wouldn't bailout. sgs could not be classified as group_fully_busy
> > or group_has_spare and we would bailout, though. Is your concern about
> > these? I can fixup these two cases.
> 
> Yes. Either explain (in a comment) why those cases are not relevant, or
> handle them properly.
> 
> Because when reading this, it wasn't at all obvious that this is correct
> or as intended.

Sure Peter, I will post a v2 handling the remaining cases properly.

Thanks and BR,
Ricardo


* Re: [PATCH 3/4] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-04-08 11:21       ` Peter Zijlstra
@ 2021-04-09  5:13         ` Ricardo Neri
  0 siblings, 0 replies; 15+ messages in thread
From: Ricardo Neri @ 2021-04-09  5:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linux-kernel, Aubrey Li, Ben Segall, Daniel Bristot de Oliveira

On Thu, Apr 08, 2021 at 01:21:22PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 06, 2021 at 04:17:51PM -0700, Ricardo Neri wrote:
> > On Tue, Apr 06, 2021 at 01:18:09PM +0200, Peter Zijlstra wrote:
> > > On Mon, Apr 05, 2021 at 09:11:07PM -0700, Ricardo Neri wrote:
> > > > +static bool cpu_group_is_smt(int cpu, struct sched_group *sg)
> > > > +{
> > > > +#ifdef CONFIG_SCHED_SMT
> > > > +	if (!static_branch_likely(&sched_smt_present))
> > > > +		return false;
> > > > +
> > > > +	if (sg->group_weight == 1)
> > > > +		return false;
> > > > +
> > > > +	if (cpumask_weight(cpu_smt_mask(cpu)) == 1)
> > > > +		return false;
> > > 
> > > Please explain this condition. Why is it required?
> > 
> > Thank you for your quick review Peter!
> > 
> > Probably this is not required since the previous check verifies the
> > group weight, and the subsequent check makes sure that @sg matches the
> > SMT siblings of @cpu.
> 
> So the thing is that cpumask_weight() can be fairly expensive, depending
> on how large the machine is.
> 
> Now I suppose this mixing of SMT and !SMT cores is typical for 'small'
> machines (for now), but this is enabled for everything with ITMT on,
> which might very well include large systems.
> 
> So yes, if it can go away, that'd be good.

Sure Peter, I think this check can be removed. I'll post a v2 with the
updates.

Thanks and BR,
Ricardo
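
Based on the exchange above, the simplified helper, with the
cpumask_weight() check dropped as Ricardo suggests, could look roughly like
the sketch below. This only illustrates the direction agreed on here; it is
not the code actually posted in v2:

static bool cpu_group_is_smt(int cpu, struct sched_group *sg)
{
#ifdef CONFIG_SCHED_SMT
	if (!static_branch_likely(&sched_smt_present))
		return false;

	/* A single-CPU group cannot be a full SMT core. */
	if (sg->group_weight == 1)
		return false;

	/* @sg is an SMT core iff its span is exactly the siblings of @cpu. */
	return cpumask_equal(sched_group_span(sg), cpu_smt_mask(cpu));
#else
	return false;
#endif
}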


