linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING
@ 2021-09-11  1:18 Ricardo Neri
  2021-09-11  1:18 ` [PATCH v5 1/6] x86/sched: Decrease further the priorities of SMT siblings Ricardo Neri
                   ` (5 more replies)
  0 siblings, 6 replies; 35+ messages in thread
From: Ricardo Neri @ 2021-09-11  1:18 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Ricardo Neri

This is v5 of the series. Patches and test results for v1, v2, v3, and v4
can be found in [1], [2], [3], and [4], respectively.

=== Problem statement ===
++ Load balancing ++

When using asymmetric packing, there exist CPU topologies with three
priority levels in which only a subset of the physical cores supports SMT.
An instance of such a topology is Intel Alderlake, a hybrid processor with
a mixture of Intel Core (with support for SMT) and Intel Atom CPUs.

On Alderlake, it is almost always beneficial to spread work by picking
first the Core CPUs, then the Atom CPUs and, lastly, the SMT siblings.

The current load balancer, however, does not behave as described when
using ASYM_PACKING. Instead, it will choose a higher-priority CPU (an
Intel Core) over a medium-priority CPU (an Intel Atom), and subsequently
overflow the load to a low-priority SMT sibling CPU. This leaves
medium-priority CPUs idle while low-priority CPUs are busy.

This patchset fixes this behavior by also checking the idle state of the
SMT siblings of both the CPU doing the load balancing and the busiest
candidate group when deciding whether the destination CPU can pull tasks
from the busiest CPU.
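
As a rough sketch, the new check boils down to the following (condensed
from asym_smt_can_pull_tasks() in patch 6; dst_has_smt, sg_has_smt and
local_busy_cpus are shorthand for checks the patch spells out via
sds->local_stat and the SD_SHARE_CPUCAPACITY flags, and dst_cpu is
already known to be idle at this point):

	/* Busy CPUs in the candidate busiest group sg */
	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;

	if (!dst_has_smt) {
		/* e.g., dst_cpu is an Atom: always help a core with two
		 * or more busy siblings; otherwise pull only if dst_cpu
		 * has higher priority.
		 */
		pull = sg_busy_cpus >= 2 ||
		       (!local_busy_cpus &&
			sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu));
	} else if (sg_has_smt) {
		/* Both are SMT cores: pull only if it evens out the
		 * number of busy siblings and dst_cpu has higher
		 * priority.
		 */
		pull = sg_busy_cpus - local_busy_cpus == 1 &&
		       sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
	} else {
		/* sg has no SMT siblings: pull only if all of dst_cpu's
		 * siblings are idle and sg has lower priority.
		 */
		pull = !local_busy_cpus &&
		       sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
	}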

++ Rework ASYM_PACKING priorities with ITMT ++
We also reworked the priority computation of the SMT siblings to ensure
that higher-numbered SMT siblings always have low priority. On some
processors, the current computation may give those higher-numbered SMT
siblings the same priority as the Intel Atom CPUs.
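
Illustrative numbers (the formula itself is reworked in patch 1): with
smp_num_siblings == 2, a Core with raw priority 60 gets priorities
[120, 60] from the current formula prio * smp_num_siblings / i, while
an Atom with raw priority 30 gets 30 * 2 = 60 and ties with the Core's
second sibling. Dividing by i * i instead gives the Core [120, 30],
keeping the Atom's 60 strictly above the sibling's 30.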

=== Testing ===
I ran a few benchmarks with and without this version of the patchset on
an Intel Alderlake system with 8 Intel Core (with SMT) and 8 Intel
Atom CPUs.

The baseline for the results is an unmodified v5.14 kernel. Results
show a comparative percentage of improvement (positive) or degradation
(negative). Each test case is repeated eight times, and the standard
deviation among repetitions is also documented.

Table 1 shows the results when using hardware-controlled performance
states (HWP), a common use case. The impact of the patches is overall
positive, with a few test cases showing slight degradation. hackbench is
especially difficult to assess, as it shows a high degree of
variability.

Thanks and BR,
Ricardo

ITMT: Intel Turbo Boost Max Technology 3.0

========
Changes since v4:
  * Use sg_lb_stats::sum_nr_running to determine the idle state of a
    scheduling group. (patch 6, Vincent, Peter)
  * Do not try to even out the number of idle CPUs in
    asym_smt_can_pull_tasks(). (patch 6, Vincent)
  * Unchanged patches: 1, 2, 3, 4, 5.

Changes since v3:
  * Reworked the ITMT priority computation to further reduce the priority
    of SMT siblings (patch 1).
  * Clear scheduling group flags when a child scheduling level
    degenerates (patch 2).
  * Removed arch-specific hooks. (patch 6, PeterZ)
  * Removed redundant checks for the local group. (patch 5, Dietmar)
  * Removed redundant check for local idle CPUs and local group
    utilization. (patch 6, Joel)
  * Reworded commit messages of patches 2, 3, 5, and 6 for clarity.
    (Len, PeterZ)
  * Added Joel's Reviewed-by tag.
  * Unchanged patches: 4

Changes since v2:
  * Removed arch_sched_asym_prefer_early() and re-evaluation of the
    candidate busiest group in update_sd_pick_busiest(). (PeterZ)
  * Introduced sched_group::flags to reflect the properties of CPUs
    in the scheduling group. This helps to identify scheduling groups
    whose CPUs are SMT siblings. (PeterZ)
  * Modified update_sg_lb_stats() to get statistics of the scheduling
    domain as an argument. This provides it with the statistics of the
    local domain. (PeterZ)
  * Introduced sched_asym() to decide if a busiest candidate group can
    be marked for asymmetric packing.
  * Reworded patch 1 for clarity. (PeterZ)

Changes since v1:
  * Don't bailout in update_sd_pick_busiest() if dst_cpu cannot pull
    tasks. Instead, reclassify the candidate busiest group, as it
    may still be selected. (PeterZ)
  * Avoid an expensive and unnecessary call to cpumask_weight() when
    determining if a sched_group is comprised of SMT siblings.
    (PeterZ).
  * Updated test results using the v2 patches.


========      Table 1. Test results of patches with HWP        ========
=======================================================================

hackbench
=========
case                    load            baseline(std%)  compare%( std%)
process-pipe            group-1          1.00 ( 18.21)   +4.95 ( 11.79)
process-pipe            group-2          1.00 ( 10.09)   -2.41 ( 14.12)
process-pipe            group-4          1.00 ( 29.09)   -4.04 ( 22.58)
process-pipe            group-8          1.00 (  6.76)   -1.61 (  8.13)
process-pipe            group-12         1.00 ( 12.39)   +3.90 (  7.59)
process-pipe            group-16         1.00 (  5.78)   -3.65 (  7.90)
process-pipe            group-20         1.00 (  4.71)   -2.70 (  5.17)
process-pipe            group-24         1.00 (  9.44)  -11.20 ( 10.22)
process-pipe            group-32         1.00 (  9.29)   -0.84 (  7.04)
process-pipe            group-48         1.00 (  7.47)   +0.66 (  5.90)
process-sockets         group-1          1.00 ( 13.53)   -9.60 (  7.91)
process-sockets         group-2          1.00 ( 21.92)   +7.48 (  9.23)
process-sockets         group-4          1.00 ( 14.59)   +9.43 ( 11.85)
process-sockets         group-8          1.00 (  9.43)   +4.67 (  6.23)
process-sockets         group-12         1.00 ( 12.80)   +7.44 ( 12.62)
process-sockets         group-16         1.00 (  8.47)   +2.12 (  9.45)
process-sockets         group-20         1.00 (  5.86)   +2.41 (  3.20)
process-sockets         group-24         1.00 (  4.47)   +4.56 (  3.14)
process-sockets         group-32         1.00 (  4.40)   +5.41 (  3.11)
process-sockets         group-48         1.00 ( 16.60)  +14.69 (  3.08)
threads-pipe            group-1          1.00 (  3.49)   -0.91 (  3.37)
threads-pipe            group-2          1.00 ( 17.48)   +7.36 (  8.81)
threads-pipe            group-4          1.00 ( 17.58)   -2.36 ( 18.80)
threads-pipe            group-8          1.00 (  9.58)   -1.26 (  6.24)
threads-pipe            group-12         1.00 (  6.49)   +2.34 (  4.37)
threads-pipe            group-16         1.00 ( 15.49)   -6.13 ( 14.84)
threads-pipe            group-20         1.00 (  2.76)   -7.87 (  7.93)
threads-pipe            group-24         1.00 ( 13.80)   +3.46 ( 11.91)
threads-pipe            group-32         1.00 (  8.12)   -2.91 (  8.20)
threads-pipe            group-48         1.00 (  5.79)   -3.95 (  5.44)
threads-sockets         group-1          1.00 ( 11.24)   -4.56 ( 10.01)
threads-sockets         group-2          1.00 (  6.81)   +0.60 (  4.95)
threads-sockets         group-4          1.00 (  8.78)   -0.79 (  7.86)
threads-sockets         group-8          1.00 (  6.51)  -15.64 ( 15.33)
threads-sockets         group-12         1.00 ( 12.30)   -7.09 ( 12.45)
threads-sockets         group-16         1.00 (  8.65)   +3.77 (  8.25)
threads-sockets         group-20         1.00 (  5.52)   +4.40 (  3.48)
threads-sockets         group-24         1.00 (  2.89)   +2.54 (  2.68)
threads-sockets         group-32         1.00 (  3.49)   +1.17 (  3.02)
threads-sockets         group-48         1.00 (  3.15)   -3.95 ( 10.64)

netperf
=======
case                    load            baseline(std%)  compare%( std%)
TCP_RR                  thread-1         1.00 (  0.55)   -0.12 (  0.47)
TCP_RR                  thread-2         1.00 (  0.77)   -0.44 (  0.65)
TCP_RR                  thread-4         1.00 (  0.73)   +0.26 (  0.61)
TCP_RR                  thread-8         1.00 (  1.21)   -0.18 (  1.18)
TCP_RR                  thread-12        1.00 (  1.91)   -0.29 (  2.25)
TCP_RR                  thread-16        1.00 (  4.18)   -0.45 (  3.78)
TCP_RR                  thread-20        1.00 (  2.09)   -0.83 (  1.75)
TCP_RR                  thread-24        1.00 (  1.23)   -0.42 (  1.35)
TCP_RR                  thread-32        1.00 ( 13.72)   +6.22 ( 16.10)
TCP_RR                  thread-48        1.00 ( 12.91)   -0.38 ( 13.37)
UDP_RR                  thread-1         1.00 (  0.85)   +0.04 (  0.75)
UDP_RR                  thread-2         1.00 (  0.57)   -0.56 (  0.62)
UDP_RR                  thread-4         1.00 (  0.65)   -0.04 (  0.78)
UDP_RR                  thread-8         1.00 (  1.24)   -0.46 (  8.31)
UDP_RR                  thread-12        1.00 (  6.87)   +0.01 (  1.27)
UDP_RR                  thread-16        1.00 (  6.07)   -0.30 (  1.51)
UDP_RR                  thread-20        1.00 (  1.00)   -0.97 (  0.87)
UDP_RR                  thread-24        1.00 (  0.67)   +0.65 (  4.39)
UDP_RR                  thread-32        1.00 ( 15.59)   +3.27 ( 17.34)
UDP_RR                  thread-48        1.00 ( 12.56)   -1.28 ( 13.43)

tbench
======
case                    load            baseline(std%)  compare%( std%)
loopback                thread-1         1.00 (  0.59)   +0.06 (  0.53)
loopback                thread-2         1.00 (  0.44)   -0.69 (  0.66)
loopback                thread-4         1.00 (  0.27)   +0.61 (  0.31)
loopback                thread-8         1.00 (  0.25)   -0.18 (  0.20)
loopback                thread-12        1.00 (  1.12)   -0.23 (  0.85)
loopback                thread-16        1.00 (  0.40)   -0.25 (  1.59)
loopback                thread-20        1.00 (  0.20)   +0.58 (  0.34)
loopback                thread-24        1.00 (  6.93)   +0.73 (  8.46)
loopback                thread-32        1.00 (  4.61)   +0.96 (  1.62)
loopback                thread-48        1.00 (  2.33)   +1.45 (  0.97)

schbench
========
case                    load            baseline(std%)  compare%( std%)
normal                  mthread-1        1.00 (  4.39)   -0.37 (  6.53)
normal                  mthread-2        1.00 (  2.44)   +0.96 (  4.31)
normal                  mthread-4        1.00 (  9.47)  +13.26 ( 19.52)
normal                  mthread-8        1.00 (  0.06)   -1.05 (  2.31)
normal                  mthread-12       1.00 (  2.62)   -0.66 (  2.21)
normal                  mthread-16       1.00 (  1.90)   +0.21 (  1.70)
normal                  mthread-20       1.00 (  2.25)   -0.44 (  2.41)
normal                  mthread-24       1.00 (  1.89)   -0.05 (  1.78)
normal                  mthread-32       1.00 (  2.04)   -0.28 (  1.92)
normal                  mthread-48       1.00 (  4.43)   +1.10 (  3.86)

[1]. https://lore.kernel.org/lkml/20210406041108.7416-1-ricardo.neri-calderon@linux.intel.com/
[2]. https://lore.kernel.org/lkml/20210414020436.12980-1-ricardo.neri-calderon@linux.intel.com/
[3]. https://lore.kernel.org/lkml/20210513154909.6385-1-ricardo.neri-calderon@linux.intel.com/
[4]. https://lore.kernel.org/lkml/20210810144145.18776-1-ricardo.neri-calderon@linux.intel.com/

Ricardo Neri (6):
  x86/sched: Decrease further the priorities of SMT siblings
  sched/topology: Introduce sched_group::flags
  sched/fair: Optimize checking for group_asym_packing
  sched/fair: Provide update_sg_lb_stats() with sched domain statistics
  sched/fair: Carve out logic to mark a group for asymmetric packing
  sched/fair: Consider SMT in ASYM_PACKING load balance

 arch/x86/kernel/itmt.c  |   2 +-
 kernel/sched/fair.c     | 121 ++++++++++++++++++++++++++++++++++++----
 kernel/sched/sched.h    |   1 +
 kernel/sched/topology.c |  21 ++++++-
 4 files changed, 131 insertions(+), 14 deletions(-)

-- 
2.17.1


* [PATCH v5 1/6] x86/sched: Decrease further the priorities of SMT siblings
  2021-09-11  1:18 [PATCH v5 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
@ 2021-09-11  1:18 ` Ricardo Neri
  2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
  2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  2021-09-11  1:18 ` [PATCH v5 2/6] sched/topology: Introduce sched_group::flags Ricardo Neri
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 35+ messages in thread
From: Ricardo Neri @ 2021-09-11  1:18 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Ricardo Neri, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

When scheduling, it is better to prefer a separate physical core rather
than the SMT sibling of a high-priority core. The existing formula to
compute priorities takes this fact into consideration. There may exist,
however, combinations of priorities (i.e., maximum frequencies) in which
the priority of high-numbered SMT siblings of high-priority cores
collides with the priority of low-numbered SMT siblings of low-priority
cores.

Consider for instance an SMT2 system with CPUs [0, 1] with priority 60
and [2, 3] with priority 30 (CPUs in brackets are SMT siblings). In such
a case, the resulting priorities would be [120, 60], [60, 30]. Thus, to
ensure that CPU2 has higher priority than CPU1, divide the raw priority
by the squared SMT iterator. The resulting priorities are [120, 30],
[60, 15].
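
A quick way to sanity-check the arithmetic outside the kernel (a
hypothetical user-space demo, not part of the patch; it merely replays
the SMT2 example above):

	#include <stdio.h>

	int main(void)
	{
		const int smp_num_siblings = 2;
		const int prio[] = { 60, 60, 30, 30 }; /* raw priority of CPUs 0-3 */

		for (int cpu = 0; cpu < 4; cpu++) {
			int i = (cpu % 2) + 1; /* SMT iterator within the core */
			int old_prio = prio[cpu] * smp_num_siblings / i;
			int new_prio = prio[cpu] * smp_num_siblings / (i * i);

			printf("CPU%d: old=%d new=%d\n", cpu, old_prio, new_prio);
		}
		/* old: [120, 60], [60, 30] -- CPU1 ties with CPU2
		 * new: [120, 30], [60, 15] -- CPU2 now outranks CPU1
		 */
		return 0;
	}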

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Originally-by: Len Brown <len.brown@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
  * None

Changes since v3:
  * Introduced this patch

Changes since v2:
  * N/A

Changes since v1:
  * N/A
---
 arch/x86/kernel/itmt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index 1afbdd1dd777..9ff480e94511 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -198,7 +198,7 @@ void sched_set_itmt_core_prio(int prio, int core_cpu)
 		 * of the priority chain and only used when
 		 * all other high priority cpus are out of capacity.
 		 */
-		smt_prio = prio * smp_num_siblings / i;
+		smt_prio = prio * smp_num_siblings / (i * i);
 		per_cpu(sched_core_priority, cpu) = smt_prio;
 		i++;
 	}
-- 
2.17.1


* [PATCH v5 2/6] sched/topology: Introduce sched_group::flags
  2021-09-11  1:18 [PATCH v5 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
  2021-09-11  1:18 ` [PATCH v5 1/6] x86/sched: Decrease further the priorities of SMT siblings Ricardo Neri
@ 2021-09-11  1:18 ` Ricardo Neri
  2021-09-17 15:26   ` Vincent Guittot
                     ` (2 more replies)
  2021-09-11  1:18 ` [PATCH v5 3/6] sched/fair: Optimize checking for group_asym_packing Ricardo Neri
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 35+ messages in thread
From: Ricardo Neri @ 2021-09-11  1:18 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Ricardo Neri, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

There exist situations in which the load balancer needs to know the
properties of the CPUs in a scheduling group. When using asymmetric
packing, for instance, the load balancer needs to know not only the
state of dst_cpu but also of its SMT siblings, if any.

Use the flags of the child scheduling domains to initialize the
scheduling group flags. These will reflect the properties of the CPUs
in the group.

A subsequent changeset will make use of these new flags. No functional
changes are introduced.
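
For example, patch 6 in this series ends up consuming the new flags in
sched_asym() like this:

	/* Only do SMT checks if either local or candidate have SMT siblings */
	if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
	    (group->flags & SD_SHARE_CPUCAPACITY))
		return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);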

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
  * None

Changes since v3:
  * Clear the flags of the scheduling groups of a domain if its child is
    destroyed.
  * Minor rewording of the commit message.

Changes since v2:
  * Introduced this patch.

Changes since v1:
  * N/A
---
 kernel/sched/sched.h    |  1 +
 kernel/sched/topology.c | 21 ++++++++++++++++++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3d3e5793e117..86ab33ce529d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1809,6 +1809,7 @@ struct sched_group {
 	unsigned int		group_weight;
 	struct sched_group_capacity *sgc;
 	int			asym_prefer_cpu;	/* CPU of highest priority in group */
+	int			flags;
 
 	/*
 	 * The CPUs this group covers.
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 4e8698e62f07..c56faae461d9 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -716,8 +716,20 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 		tmp = sd;
 		sd = sd->parent;
 		destroy_sched_domain(tmp);
-		if (sd)
+		if (sd) {
+			struct sched_group *sg = sd->groups;
+
+			/*
+			 * sched groups hold the flags of the child sched
+			 * domain for convenience. Clear such flags since
+			 * the child is being destroyed.
+			 */
+			do {
+				sg->flags = 0;
+			} while (sg != sd->groups);
+
 			sd->child = NULL;
+		}
 	}
 
 	for (tmp = sd; tmp; tmp = tmp->parent)
@@ -916,10 +928,12 @@ build_group_from_child_sched_domain(struct sched_domain *sd, int cpu)
 		return NULL;
 
 	sg_span = sched_group_span(sg);
-	if (sd->child)
+	if (sd->child) {
 		cpumask_copy(sg_span, sched_domain_span(sd->child));
-	else
+		sg->flags = sd->child->flags;
+	} else {
 		cpumask_copy(sg_span, sched_domain_span(sd));
+	}
 
 	atomic_inc(&sg->ref);
 	return sg;
@@ -1169,6 +1183,7 @@ static struct sched_group *get_group(int cpu, struct sd_data *sdd)
 	if (child) {
 		cpumask_copy(sched_group_span(sg), sched_domain_span(child));
 		cpumask_copy(group_balance_mask(sg), sched_group_span(sg));
+		sg->flags = child->flags;
 	} else {
 		cpumask_set_cpu(cpu, sched_group_span(sg));
 		cpumask_set_cpu(cpu, group_balance_mask(sg));
-- 
2.17.1


* [PATCH v5 3/6] sched/fair: Optimize checking for group_asym_packing
  2021-09-11  1:18 [PATCH v5 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
  2021-09-11  1:18 ` [PATCH v5 1/6] x86/sched: Decrease further the priorities of SMT siblings Ricardo Neri
  2021-09-11  1:18 ` [PATCH v5 2/6] sched/topology: Introduce sched_group::flags Ricardo Neri
@ 2021-09-11  1:18 ` Ricardo Neri
  2021-09-17 15:26   ` Vincent Guittot
                     ` (2 more replies)
  2021-09-11  1:18 ` [PATCH v5 4/6] sched/fair: Provide update_sg_lb_stats() with sched domain statistics Ricardo Neri
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 35+ messages in thread
From: Ricardo Neri @ 2021-09-11  1:18 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Ricardo Neri, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

sched_asym_prefer() always returns false when called on the local group:
dst_cpu belongs to the local group, and group->asym_prefer_cpu is, by
definition, the highest-priority CPU of that group, so dst_cpu can never
be strictly preferred over it. By checking local_group, we can avoid
additional checks and avoid invoking sched_asym_prefer() when it is not
needed. No functional changes are introduced.

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
  * None

Changes since v3:
  * Further rewording of the commit message. (Len)

Changes since v2:
  * Reworded the commit message for clarity. (Peter Z)

Changes since v1:
  * None
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ff69f245b939..7a054f528bcc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8657,7 +8657,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	}
 
 	/* Check if dst CPU is idle and preferred to this group */
-	if (env->sd->flags & SD_ASYM_PACKING &&
+	if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
 	    env->idle != CPU_NOT_IDLE &&
 	    sgs->sum_h_nr_running &&
 	    sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
-- 
2.17.1


* [PATCH v5 4/6] sched/fair: Provide update_sg_lb_stats() with sched domain statistics
  2021-09-11  1:18 [PATCH v5 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
                   ` (2 preceding siblings ...)
  2021-09-11  1:18 ` [PATCH v5 3/6] sched/fair: Optimize checking for group_asym_packing Ricardo Neri
@ 2021-09-11  1:18 ` Ricardo Neri
  2021-09-17 15:27   ` Vincent Guittot
                     ` (2 more replies)
  2021-09-11  1:18 ` [PATCH v5 5/6] sched/fair: Carve out logic to mark a group for asymmetric packing Ricardo Neri
  2021-09-11  1:18 ` [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance Ricardo Neri
  5 siblings, 3 replies; 35+ messages in thread
From: Ricardo Neri @ 2021-09-11  1:18 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Ricardo Neri, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

Before deciding to pull tasks when using asymmetric packing of tasks,
on some architectures (e.g., x86) it is necessary to know not only the
state of dst_cpu but also of its SMT siblings. The decision to classify
a candidate busiest group as group_asym_packing is done in
update_sg_lb_stats(). Give this function access to the scheduling domain
statistics, which contain the statistics of the local group.

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
  * None

Changes since v3:
  * None

Changes since v2:
  * Introduced this patch.

Changes since v1:
  * N/A
---
 kernel/sched/fair.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a054f528bcc..c5851260b4d8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8605,6 +8605,7 @@ group_type group_classify(unsigned int imbalance_pct,
  * @sg_status: Holds flag indicating the status of the sched_group
  */
 static inline void update_sg_lb_stats(struct lb_env *env,
+				      struct sd_lb_stats *sds,
 				      struct sched_group *group,
 				      struct sg_lb_stats *sgs,
 				      int *sg_status)
@@ -8613,7 +8614,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 	memset(sgs, 0, sizeof(*sgs));
 
-	local_group = cpumask_test_cpu(env->dst_cpu, sched_group_span(group));
+	local_group = group == sds->local;
 
 	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
 		struct rq *rq = cpu_rq(i);
@@ -9176,7 +9177,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 				update_group_capacity(env->sd, env->dst_cpu);
 		}
 
-		update_sg_lb_stats(env, sg, sgs, &sg_status);
+		update_sg_lb_stats(env, sds, sg, sgs, &sg_status);
 
 		if (local_group)
 			goto next_group;
-- 
2.17.1


* [PATCH v5 5/6] sched/fair: Carve out logic to mark a group for asymmetric packing
  2021-09-11  1:18 [PATCH v5 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
                   ` (3 preceding siblings ...)
  2021-09-11  1:18 ` [PATCH v5 4/6] sched/fair: Provide update_sg_lb_stats() with sched domain statistics Ricardo Neri
@ 2021-09-11  1:18 ` Ricardo Neri
  2021-09-17 15:27   ` Vincent Guittot
                     ` (2 more replies)
  2021-09-11  1:18 ` [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance Ricardo Neri
  5 siblings, 3 replies; 35+ messages in thread
From: Ricardo Neri @ 2021-09-11  1:18 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Ricardo Neri, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

Create a separate function, sched_asym(). A subsequent changeset will
introduce logic to deal with SMT in conjunction with asymmetric
packing. Such logic will need the statistics of the scheduling
group provided as an argument. Update them before calling sched_asym().

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
  * None

Changes since v3:
  * Remove a redundant check for the local group in sched_asym().
    (Dietmar)
  * Reworded commit message for clarity. (Len)

Changes since v2:
  * Introduced this patch.

Changes since v1:
  * N/A
---
 kernel/sched/fair.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c5851260b4d8..26db017c14a3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8597,6 +8597,13 @@ group_type group_classify(unsigned int imbalance_pct,
 	return group_has_spare;
 }
 
+static inline bool
+sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
+	   struct sched_group *group)
+{
+	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
+}
+
 /**
  * update_sg_lb_stats - Update sched_group's statistics for load balancing.
  * @env: The load balancing environment.
@@ -8657,18 +8664,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		}
 	}
 
+	sgs->group_capacity = group->sgc->capacity;
+
+	sgs->group_weight = group->group_weight;
+
 	/* Check if dst CPU is idle and preferred to this group */
 	if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
-	    env->idle != CPU_NOT_IDLE &&
-	    sgs->sum_h_nr_running &&
-	    sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
+	    env->idle != CPU_NOT_IDLE && sgs->sum_h_nr_running &&
+	    sched_asym(env, sds, sgs, group)) {
 		sgs->group_asym_packing = 1;
 	}
 
-	sgs->group_capacity = group->sgc->capacity;
-
-	sgs->group_weight = group->group_weight;
-
 	sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
 
 	/* Computing avg_load makes sense only when group is overloaded */
-- 
2.17.1


* [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-09-11  1:18 [PATCH v5 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
                   ` (4 preceding siblings ...)
  2021-09-11  1:18 ` [PATCH v5 5/6] sched/fair: Carve out logic to mark a group for asymmetric packing Ricardo Neri
@ 2021-09-11  1:18 ` Ricardo Neri
  2021-09-15 15:43   ` Vincent Guittot
                     ` (3 more replies)
  5 siblings, 4 replies; 35+ messages in thread
From: Ricardo Neri @ 2021-09-11  1:18 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Ricardo Neri, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
check for the idle state of the destination CPU, dst_cpu, but also of
its SMT siblings.

If dst_cpu is idle but its SMT siblings are busy, performance suffers
if it pulls tasks from a medium priority CPU that does not have SMT
siblings.

Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
siblings of both dst_cpu and the CPUs in the candidate busiest group.
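
To make the rules concrete (an illustrative walk-through using the names
from the patch below): if dst_cpu is an Atom CPU with no SMT siblings and
the candidate busiest group is a fully busy SMT2 core, then

	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus = 2 - 0 = 2

and dst_cpu may pull regardless of priorities. If only one sibling of
that core is busy (sg_busy_cpus == 1), dst_cpu pulls only when the local
group has no running tasks and it has higher priority than
sg->asym_prefer_cpu.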

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
  * Use sg_lb_stats::sum_nr_running to determine the idle state of a
    scheduling group. (Vincent, Peter)
  * Do not try to even out the number of idle CPUs in
    asym_smt_can_pull_tasks(). (Vincent)
  * Updated function documentation and corrected a typo.

Changes since v3:
  * Removed the arch_asym_check_smt_siblings() hook. Discussions with the
    powerpc folks showed that this patch should not impact them. Also, more
    recent powerpc processors no longer use asym_packing. (PeterZ)
  * Removed unnecessary local variable in asym_can_pull_tasks(). (Dietmar)
  * Removed unnecessary check for local CPUs when the local group has zero
    utilization. (Joel)
  * Renamed asym_can_pull_tasks() as asym_smt_can_pull_tasks() to reflect
    the fact that it deals with SMT cases.
  * Made asym_smt_can_pull_tasks() return false for !CONFIG_SCHED_SMT so
    that callers can deal with non-SMT cases.

Changes since v2:
  * Reworded the commit message to reflect updates in code.
  * Corrected misrepresentation of dst_cpu as the CPU doing the load
    balancing. (PeterZ)
  * Removed call to arch_asym_check_smt_siblings() as it is now called in
    sched_asym().

Changes since v1:
  * Don't bailout in update_sd_pick_busiest() if dst_cpu cannot pull
    tasks. Instead, reclassify the candidate busiest group, as it
    may still be selected. (PeterZ)
  * Avoid an expensive and unnecessary call to cpumask_weight() when
    determining if a sched_group is comprised of SMT siblings.
    (PeterZ).
---
 kernel/sched/fair.c | 94 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 26db017c14a3..8d763dd0174b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8597,10 +8597,98 @@ group_type group_classify(unsigned int imbalance_pct,
 	return group_has_spare;
 }
 
+/**
+ * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
+ * @dst_cpu:	Destination CPU of the load balancing
+ * @sds:	Load-balancing data with statistics of the local group
+ * @sgs:	Load-balancing statistics of the candidate busiest group
+ * @sg:		The candidate busiest group
+ *
+ * Check the state of the SMT siblings of both @sds::local and @sg and decide
+ * if @dst_cpu can pull tasks.
+ *
+ * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
+ * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
+ * only if @dst_cpu has higher priority.
+ *
+ * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
+ * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
+ * Bigger imbalances in the number of busy CPUs will be dealt with in
+ * update_sd_pick_busiest().
+ *
+ * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
+ * of @dst_cpu are idle and @sg has lower priority.
+ */
+static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
+				    struct sg_lb_stats *sgs,
+				    struct sched_group *sg)
+{
+#ifdef CONFIG_SCHED_SMT
+	bool local_is_smt, sg_is_smt;
+	int sg_busy_cpus;
+
+	local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
+	sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
+
+	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
+
+	if (!local_is_smt) {
+		/*
+		 * If we are here, @dst_cpu is idle and does not have SMT
+		 * siblings. Pull tasks if candidate group has two or more
+		 * busy CPUs.
+		 */
+		if (sg_is_smt && sg_busy_cpus >= 2)
+			return true;
+
+		/*
+		 * @dst_cpu does not have SMT siblings. @sg may have SMT
+		 * siblings and only one is busy. In such case, @dst_cpu
+		 * can help if it has higher priority and is idle (i.e.,
+		 * it has no running tasks).
+		 */
+		return !sds->local_stat.sum_nr_running &&
+		       sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+	}
+
+	/* @dst_cpu has SMT siblings. */
+
+	if (sg_is_smt) {
+		int local_busy_cpus = sds->local->group_weight -
+				      sds->local_stat.idle_cpus;
+		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
+
+		if (busy_cpus_delta == 1)
+			return sched_asym_prefer(dst_cpu,
+						 sg->asym_prefer_cpu);
+
+		return false;
+	}
+
+	/*
+	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
+	 * up with more than one busy SMT sibling and only pull tasks if there
+	 * are no busy CPUs (i.e., no CPU has running tasks).
+	 */
+	if (!sds->local_stat.sum_nr_running)
+		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+
+	return false;
+#else
+	/* Always return false so that callers deal with non-SMT cases. */
+	return false;
+#endif
+}
+
 static inline bool
 sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
 	   struct sched_group *group)
 {
+	/* Only do SMT checks if either local or candidate have SMT siblings */
+	if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
+	    (group->flags & SD_SHARE_CPUCAPACITY))
+		return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
+
 	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
 }
 
@@ -9606,6 +9694,12 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		    nr_running == 1)
 			continue;
 
+		/* Make sure we only pull tasks from a CPU of lower priority */
+		if ((env->sd->flags & SD_ASYM_PACKING) &&
+		    sched_asym_prefer(i, env->dst_cpu) &&
+		    nr_running == 1)
+			continue;
+
 		switch (env->migration_type) {
 		case migrate_load:
 			/*
-- 
2.17.1


* Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-09-11  1:18 ` [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance Ricardo Neri
@ 2021-09-15 15:43   ` Vincent Guittot
  2021-09-17  1:00     ` Ricardo Neri
  2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: Vincent Guittot @ 2021-09-15 15:43 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Srikar Dronamraju, Nicholas Piggin,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
> check for the idle state of the destination CPU, dst_cpu, but also of
> its SMT siblings.
>
> If dst_cpu is idle but its SMT siblings are busy, performance suffers
> if it pulls tasks from a medium priority CPU that does not have SMT
> siblings.
>
> Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
> siblings of both dst_cpu and the CPUs in the candidate busiest group.
>
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Changes since v4:
>   * Use sg_lb_stats::sum_nr_running to determine the idle state of a
>     scheduling group. (Vincent, Peter)
>   * Do not try to even out the number of idle CPUs in
>     asym_smt_can_pull_tasks(). (Vincent)
>   * Updated function documentation and corrected a typo.
>
> Changes since v3:
>   * Removed the arch_asym_check_smt_siblings() hook. Discussions with the
>     powerpc folks showed that this patch should not impact them. Also, more
>     recent powerpc processors no longer use asym_packing. (PeterZ)
>   * Removed unnecessary local variable in asym_can_pull_tasks(). (Dietmar)
>   * Removed unnecessary check for local CPUs when the local group has zero
>     utilization. (Joel)
>   * Renamed asym_can_pull_tasks() as asym_smt_can_pull_tasks() to reflect
>     the fact that it deals with SMT cases.
>   * Made asym_smt_can_pull_tasks() return false for !CONFIG_SCHED_SMT so
>     that callers can deal with non-SMT cases.
>
> Changes since v2:
>   * Reworded the commit message to reflect updates in code.
>   * Corrected misrepresentation of dst_cpu as the CPU doing the load
>     balancing. (PeterZ)
>   * Removed call to arch_asym_check_smt_siblings() as it is now called in
>     sched_asym().
>
> Changes since v1:
>   * Don't bailout in update_sd_pick_busiest() if dst_cpu cannot pull
>     tasks. Instead, reclassify the candidate busiest group, as it
>     may still be selected. (PeterZ)
>   * Avoid an expensive and unnecessary call to cpumask_weight() when
>     determining if a sched_group is comprised of SMT siblings.
>     (PeterZ).
> ---
>  kernel/sched/fair.c | 94 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 94 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 26db017c14a3..8d763dd0174b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8597,10 +8597,98 @@ group_type group_classify(unsigned int imbalance_pct,
>         return group_has_spare;
>  }
>
> +/**
> + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> + * @dst_cpu:   Destination CPU of the load balancing
> + * @sds:       Load-balancing data with statistics of the local group
> + * @sgs:       Load-balancing statistics of the candidate busiest group
> + * @sg:                The candidate busiest group
> + *
> + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> + * if @dst_cpu can pull tasks.
> + *
> + * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
> + * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
> + * only if @dst_cpu has higher priority.
> + *
> + * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
> + * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
> + * Bigger imbalances in the number of busy CPUs will be dealt with in
> + * update_sd_pick_busiest().
> + *
> + * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
> + * of @dst_cpu are idle and @sg has lower priority.
> + */
> +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> +                                   struct sg_lb_stats *sgs,
> +                                   struct sched_group *sg)
> +{
> +#ifdef CONFIG_SCHED_SMT
> +       bool local_is_smt, sg_is_smt;
> +       int sg_busy_cpus;
> +
> +       local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> +       sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> +
> +       sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> +
> +       if (!local_is_smt) {
> +               /*
> +                * If we are here, @dst_cpu is idle and does not have SMT
> +                * siblings. Pull tasks if candidate group has two or more
> +                * busy CPUs.
> +                */
> +               if (sg_is_smt && sg_busy_cpus >= 2)

Do you really need to test sg_is_smt? If sg_busy_cpus >= 2, then
sg_is_smt must be true?

Also, this is the default behavior, where we want to even out the
number of busy CPUs. Shouldn't you return false and fall back to the
default behavior?

That being said, the default behavior tries to even out the number of
idle CPUs, which is easier to compute and is equivalent to evening out
the number of busy CPUs on "normal" systems where all groups have the
same number of CPUs, but that is not the case here. It could be good to
change the default behavior to even out the number of busy CPUs and
then use the default behavior here. Additional conditions, like more
busy CPUs or a higher number of running tasks, would then be used to
select the busiest group.
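
(Illustrative numbers, to make the asymmetry concrete: an idle non-SMT
Atom group and an SMT2 group with one busy sibling both report one idle
CPU,

	local (Atom):  group_weight = 1, idle_cpus = 1 -> busy_cpus = 0
	busiest (SMT): group_weight = 2, idle_cpus = 1 -> busy_cpus = 1

so evening out idle CPUs and evening out busy CPUs stop being equivalent
as soon as group weights differ.)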

> +                       return true;
> +
> +               /*
> +                * @dst_cpu does not have SMT siblings. @sg may have SMT
> +                * siblings and only one is busy. In such case, @dst_cpu
> +                * can help if it has higher priority and is idle (i.e.,
> +                * it has no running tasks).

The previous comment above assumes that "@dst_cpu is idle", but now you
need to check that sds->local_stat.sum_nr_running == 0

> +                */
> +               return !sds->local_stat.sum_nr_running &&
> +                      sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> +       }
> +
> +       /* @dst_cpu has SMT siblings. */
> +
> +       if (sg_is_smt) {
> +               int local_busy_cpus = sds->local->group_weight -
> +                                     sds->local_stat.idle_cpus;
> +               int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
> +
> +               if (busy_cpus_delta == 1)
> +                       return sched_asym_prefer(dst_cpu,
> +                                                sg->asym_prefer_cpu);
> +
> +               return false;
> +       }
> +
> +       /*
> +        * @sg does not have SMT siblings. Ensure that @sds::local does not end
> +        * up with more than one busy SMT sibling and only pull tasks if there
> +        * are no busy CPUs (i.e., no CPU has running tasks).
> +        */
> +       if (!sds->local_stat.sum_nr_running)
> +               return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> +
> +       return false;
> +#else
> +       /* Always return false so that callers deal with non-SMT cases. */
> +       return false;
> +#endif
> +}
> +
>  static inline bool
>  sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
>            struct sched_group *group)
>  {
> +       /* Only do SMT checks if either local or candidate have SMT siblings */
> +       if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
> +           (group->flags & SD_SHARE_CPUCAPACITY))
> +               return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
> +
>         return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
>  }
>
> @@ -9606,6 +9694,12 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>                     nr_running == 1)
>                         continue;
>
> +               /* Make sure we only pull tasks from a CPU of lower priority */
> +               if ((env->sd->flags & SD_ASYM_PACKING) &&
> +                   sched_asym_prefer(i, env->dst_cpu) &&
> +                   nr_running == 1)
> +                       continue;
> +
>                 switch (env->migration_type) {
>                 case migrate_load:
>                         /*
> --
> 2.17.1
>

* Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-09-15 15:43   ` Vincent Guittot
@ 2021-09-17  1:00     ` Ricardo Neri
  2021-09-17  7:41       ` Vincent Guittot
  2021-09-17 15:25       ` Vincent Guittot
  0 siblings, 2 replies; 35+ messages in thread
From: Ricardo Neri @ 2021-09-17  1:00 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Srikar Dronamraju, Nicholas Piggin,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

On Wed, Sep 15, 2021 at 05:43:44PM +0200, Vincent Guittot wrote:
> On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
> <ricardo.neri-calderon@linux.intel.com> wrote:
> >
> > When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
> > check for the idle state of the destination CPU, dst_cpu, but also of
> > its SMT siblings.
> >
> > If dst_cpu is idle but its SMT siblings are busy, performance suffers
> > if it pulls tasks from a medium priority CPU that does not have SMT
> > siblings.
> >
> > Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
> > siblings of both dst_cpu and the CPUs in the candidate busiest group.
> >
> > Cc: Aubrey Li <aubrey.li@intel.com>
> > Cc: Ben Segall <bsegall@google.com>
> > Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Quentin Perret <qperret@google.com>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Tim Chen <tim.c.chen@linux.intel.com>
> > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Reviewed-by: Len Brown <len.brown@intel.com>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> > Changes since v4:
> >   * Use sg_lb_stats::sum_nr_running to determine the idle state of a
> >     scheduling group. (Vincent, Peter)
> >   * Do not try to even out the number of idle CPUs in
> >     asym_smt_can_pull_tasks(). (Vincent)
> >   * Updated function documentation and corrected a typo.
> >
> > Changes since v3:
> >   * Removed the arch_asym_check_smt_siblings() hook. Discussions with the
> >     powerpc folks showed that this patch should not impact them. Also, more
> >     recent powerpc processors no longer use asym_packing. (PeterZ)
> >   * Removed unnecessary local variable in asym_can_pull_tasks(). (Dietmar)
> >   * Removed unnecessary check for local CPUs when the local group has zero
> >     utilization. (Joel)
> >   * Renamed asym_can_pull_tasks() as asym_smt_can_pull_tasks() to reflect
> >     the fact that it deals with SMT cases.
> >   * Made asym_smt_can_pull_tasks() return false for !CONFIG_SCHED_SMT so
> >     that callers can deal with non-SMT cases.
> >
> > Changes since v2:
> >   * Reworded the commit message to reflect updates in code.
> >   * Corrected misrepresentation of dst_cpu as the CPU doing the load
> >     balancing. (PeterZ)
> >   * Removed call to arch_asym_check_smt_siblings() as it is now called in
> >     sched_asym().
> >
> > Changes since v1:
> >   * Don't bailout in update_sd_pick_busiest() if dst_cpu cannot pull
> >     tasks. Instead, reclassify the candidate busiest group, as it
> >     may still be selected. (PeterZ)
> >   * Avoid an expensive and unnecessary call to cpumask_weight() when
> >     determining if a sched_group is comprised of SMT siblings.
> >     (PeterZ).
> > ---
> >  kernel/sched/fair.c | 94 +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 94 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 26db017c14a3..8d763dd0174b 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8597,10 +8597,98 @@ group_type group_classify(unsigned int imbalance_pct,
> >         return group_has_spare;
> >  }
> >
> > +/**
> > + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> > + * @dst_cpu:   Destination CPU of the load balancing
> > + * @sds:       Load-balancing data with statistics of the local group
> > + * @sgs:       Load-balancing statistics of the candidate busiest group
> > + * @sg:                The candidate busiest group
> > + *
> > + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> > + * if @dst_cpu can pull tasks.
> > + *
> > + * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
> > + * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
> > + * only if @dst_cpu has higher priority.
> > + *
> > + * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
> > + * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
> > + * Bigger imbalances in the number of busy CPUs will be dealt with in
> > + * update_sd_pick_busiest().
> > + *
> > + * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
> > + * of @dst_cpu are idle and @sg has lower priority.
> > + */
> > +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> > +                                   struct sg_lb_stats *sgs,
> > +                                   struct sched_group *sg)
> > +{
> > +#ifdef CONFIG_SCHED_SMT
> > +       bool local_is_smt, sg_is_smt;
> > +       int sg_busy_cpus;
> > +
> > +       local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> > +       sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> > +
> > +       sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> > +
> > +       if (!local_is_smt) {
> > +               /*
> > +                * If we are here, @dst_cpu is idle and does not have SMT
> > +                * siblings. Pull tasks if candidate group has two or more
> > +                * busy CPUs.
> > +                */
> > +               if (sg_is_smt && sg_busy_cpus >= 2)
> 
> Do you really need to test sg_is_smt? If sg_busy_cpus >= 2, then
> sg_is_smt must be true?

Thank you very much for your feedback, Vincent!

Yes, it is true that sg_busy_cpus >=2 is only true if @sg is SMT. I will
remove this check.

> 
> Also, this is the default behavior, where we want to even out the
> number of busy CPUs. Shouldn't you return false and fall back to the
> default behavior?

This is also true.

> 
> That being said, the default behavior tries to even out the number of
> idle CPUs, which is easier to compute and is equivalent to evening out
> the number of busy CPUs on "normal" systems where all groups have the
> same number of CPUs, but that is not the case here. It could be good to
> change the default behavior to even out the number of busy CPUs and
> then use the default behavior here. Additional conditions, like more
> busy CPUs or a higher number of running tasks, would then be used to
> select the busiest group.

That is a very good observation. Checking the number of idle CPUs
assumes that both groups have the same number of CPUs. I'll look into
modifying the default behavior.

> 
> > +                       return true;
> > +
> > +               /*
> > +                * @dst_cpu does not have SMT siblings. @sg may have SMT
> > +                * siblings and only one is busy. In such case, @dst_cpu
> > +                * can help if it has higher priority and is idle (i.e.,
> > +                * it has no running tasks).
> 
> The previous comment above assumes that "@dst_cpu is idle", but now you
> need to check that sds->local_stat.sum_nr_running == 0

But we already know that, right? We are here because in
update_sg_lb_stats() we determine that dst CPU is idle (env->idle !=
CPU_NOT_IDLE).

Thanks and BR,
Ricardo

* Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-09-17  1:00     ` Ricardo Neri
@ 2021-09-17  7:41       ` Vincent Guittot
  2021-09-17 15:25       ` Vincent Guittot
  1 sibling, 0 replies; 35+ messages in thread
From: Vincent Guittot @ 2021-09-17  7:41 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Srikar Dronamraju, Nicholas Piggin,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

On Fri, 17 Sept 2021 at 03:01, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> On Wed, Sep 15, 2021 at 05:43:44PM +0200, Vincent Guittot wrote:
> > On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
> > <ricardo.neri-calderon@linux.intel.com> wrote:
> > >
> > > When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
> > > check for the idle state of the destination CPU, dst_cpu, but also of
> > > its SMT siblings.
> > >
> > > If dst_cpu is idle but its SMT siblings are busy, performance suffers
> > > if it pulls tasks from a medium priority CPU that does not have SMT
> > > siblings.
> > >
> > > Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
> > > siblings of both dst_cpu and the CPUs in the candidate busiest group.
> > >
> > > Cc: Aubrey Li <aubrey.li@intel.com>
> > > Cc: Ben Segall <bsegall@google.com>
> > > Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> > > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > > Cc: Mel Gorman <mgorman@suse.de>
> > > Cc: Quentin Perret <qperret@google.com>
> > > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Tim Chen <tim.c.chen@linux.intel.com>
> > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > Reviewed-by: Len Brown <len.brown@intel.com>
> > > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > > ---
> > > Changes since v4:
> > >   * Use sg_lb_stats::sum_nr_running to check the idle state of a
> > >     scheduling group. (Vincent, Peter)
> > >   * Do not even out the number of idle CPUs in
> > >     asym_smt_can_pull_tasks(). (Vincent)
> > >   * Updated function documentation and corrected a typo.
> > >
> > > Changes since v3:
> > >   * Removed the arch_asym_check_smt_siblings() hook. Discussions with the
> > >     powerpc folks showed that this patch should not impact them. Also, more
> > >     recent powerpc processors no longer use asym_packing. (PeterZ)
> > >   * Removed unnecessary local variable in asym_can_pull_tasks(). (Dietmar)
> > >   * Removed unnecessary check for local CPUs when the local group has zero
> > >     utilization. (Joel)
> > >   * Renamed asym_can_pull_tasks() as asym_smt_can_pull_tasks() to reflect
> > >     the fact that it deals with SMT cases.
> > >   * Made asym_smt_can_pull_tasks() return false for !CONFIG_SCHED_SMT so
> > >     that callers can deal with non-SMT cases.
> > >
> > > Changes since v2:
> > >   * Reworded the commit message to reflect updates in code.
> > >   * Corrected misrepresentation of dst_cpu as the CPU doing the load
> > >     balancing. (PeterZ)
> > >   * Removed call to arch_asym_check_smt_siblings() as it is now called in
> > >     sched_asym().
> > >
> > > Changes since v1:
> > >   * Don't bailout in update_sd_pick_busiest() if dst_cpu cannot pull
> > >     tasks. Instead, reclassify the candidate busiest group, as it
> > >     may still be selected. (PeterZ)
> > >   * Avoid an expensive and unnecessary call to cpumask_weight() when
> > >     determining if a sched_group is comprised of SMT siblings.
> > >     (PeterZ).
> > > ---
> > >  kernel/sched/fair.c | 94 +++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 94 insertions(+)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 26db017c14a3..8d763dd0174b 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8597,10 +8597,98 @@ group_type group_classify(unsigned int imbalance_pct,
> > >         return group_has_spare;
> > >  }
> > >
> > > +/**
> > > + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> > > + * @dst_cpu:   Destination CPU of the load balancing
> > > + * @sds:       Load-balancing data with statistics of the local group
> > > + * @sgs:       Load-balancing statistics of the candidate busiest group
> > > + * @sg:                The candidate busiest group
> > > + *
> > > + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> > > + * if @dst_cpu can pull tasks.
> > > + *
> > > + * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
> > > + * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
> > > + * only if @dst_cpu has higher priority.
> > > + *
> > > + * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
> > > + * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
> > > + * Bigger imbalances in the number of busy CPUs will be dealt with in
> > > + * update_sd_pick_busiest().
> > > + *
> > > + * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
> > > + * of @dst_cpu are idle and @sg has lower priority.
> > > + */
> > > +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> > > +                                   struct sg_lb_stats *sgs,
> > > +                                   struct sched_group *sg)
> > > +{
> > > +#ifdef CONFIG_SCHED_SMT
> > > +       bool local_is_smt, sg_is_smt;
> > > +       int sg_busy_cpus;
> > > +
> > > +       local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> > > +       sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> > > +
> > > +       sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> > > +
> > > +       if (!local_is_smt) {
> > > +               /*
> > > +                * If we are here, @dst_cpu is idle and does not have SMT
> > > +                * siblings. Pull tasks if candidate group has two or more
> > > +                * busy CPUs.
> > > +                */
> > > +               if (sg_is_smt && sg_busy_cpus >= 2)
> >
> > Do you really need to test sg_is_smt? If sg_busy_cpus >= 2, then
> > sg_is_smt must be true?
>
> Thank you very much for your feedback Vincent!
>
> Yes, it is true that sg_busy_cpus >= 2 can only hold if @sg is SMT. I will
> remove this check.
>
> >
> > Also, this is the default behavior, where we want to even out the number
> > of busy CPUs. Shouldn't you return false and fall back to the default
> > behavior?
>
> This is also true.
>
> >
> > That being said, the default behavior tries to even out the number of
> > idle CPUs, which is easier to compute and is equivalent to evening out
> > the number of busy CPUs on a "normal" system where groups have the same
> > number of CPUs, but that is not the case here. It could be good to
> > change the default behavior to even out the number of busy CPUs and to
> > rely on that default behavior here. Additional conditions, such as more
> > busy CPUs or a higher number of running tasks, would then be used to
> > select the busiest group.
>
> That is a very good observation. Checking the number of idle CPUs
> assumes that both groups have the same number of CPUs. I'll look into
> modifying the default behavior.
>
> >
> > > +                       return true;
> > > +
> > > +               /*
> > > +                * @dst_cpu does not have SMT siblings. @sg may have SMT
> > > +                * siblings and only one is busy. In such case, @dst_cpu
> > > +                * can help if it has higher priority and is idle (i.e.,
> > > +                * it has no running tasks).
> >
> > The previous comment above assumes that "@dst_cpu is idle", but now you
> > need to check that sds->local_stat.sum_nr_running == 0.
>
> But we already know that, right? We are here because in
> update_sg_lb_stats() we determine that dst CPU is idle (env->idle !=
> CPU_NOT_IDLE).

That's my point:
Why do you add the condition !sds->local_stat.sum_nr_running below? I
assume that it's there to check that the CPU is idle, isn't it?
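
(To spell out the difference: env->idle != CPU_NOT_IDLE only says that
dst_cpu itself is idle, whereas sds->local_stat.sum_nr_running == 0 says
that no CPU in the whole local group has running tasks. In the branch
below the local group is a single non-SMT CPU, so the two conditions
coincide and the extra check is indeed redundant.)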

> > > +                */
> > > +               return !sds->local_stat.sum_nr_running &&
> > > +                      sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> > > +       }

>
> Thanks and BR,
> Ricardo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-09-17  1:00     ` Ricardo Neri
  2021-09-17  7:41       ` Vincent Guittot
@ 2021-09-17 15:25       ` Vincent Guittot
  2021-09-17 18:46         ` Peter Zijlstra
  1 sibling, 1 reply; 35+ messages in thread
From: Vincent Guittot @ 2021-09-17 15:25 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Srikar Dronamraju, Nicholas Piggin,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

On Fri, 17 Sept 2021 at 03:01, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> On Wed, Sep 15, 2021 at 05:43:44PM +0200, Vincent Guittot wrote:
> > On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
> > <ricardo.neri-calderon@linux.intel.com> wrote:
> > >
> > > When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
> > > check for the idle state of the destination CPU, dst_cpu, but also of
> > > its SMT siblings.
> > >
> > > If dst_cpu is idle but its SMT siblings are busy, performance suffers
> > > if it pulls tasks from a medium priority CPU that does not have SMT
> > > siblings.
> > >
> > > Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
> > > siblings of both dst_cpu and the CPUs in the candidate busiest group.
> > >
> > > Cc: Aubrey Li <aubrey.li@intel.com>
> > > Cc: Ben Segall <bsegall@google.com>
> > > Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> > > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > > Cc: Mel Gorman <mgorman@suse.de>
> > > Cc: Quentin Perret <qperret@google.com>
> > > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Tim Chen <tim.c.chen@linux.intel.com>
> > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > Reviewed-by: Len Brown <len.brown@intel.com>
> > > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > > ---
> > > Changes since v4:
> > >   * Use sg_lb_stats::sum_nr_running to check the idle state of a
> > >     scheduling group. (Vincent, Peter)
> > >   * Do not even out the number of idle CPUs in
> > >     asym_smt_can_pull_tasks(). (Vincent)
> > >   * Updated function documentation and corrected a typo.
> > >
> > > Changes since v3:
> > >   * Removed the arch_asym_check_smt_siblings() hook. Discussions with the
> > >     powerpc folks showed that this patch should not impact them. Also, more
> > >     recent powerpc processors no longer use asym_packing. (PeterZ)
> > >   * Removed unnecessary local variable in asym_can_pull_tasks(). (Dietmar)
> > >   * Removed unnecessary check for local CPUs when the local group has zero
> > >     utilization. (Joel)
> > >   * Renamed asym_can_pull_tasks() as asym_smt_can_pull_tasks() to reflect
> > >     the fact that it deals with SMT cases.
> > >   * Made asym_smt_can_pull_tasks() return false for !CONFIG_SCHED_SMT so
> > >     that callers can deal with non-SMT cases.
> > >
> > > Changes since v2:
> > >   * Reworded the commit message to reflect updates in code.
> > >   * Corrected misrepresentation of dst_cpu as the CPU doing the load
> > >     balancing. (PeterZ)
> > >   * Removed call to arch_asym_check_smt_siblings() as it is now called in
> > >     sched_asym().
> > >
> > > Changes since v1:
> > >   * Don't bailout in update_sd_pick_busiest() if dst_cpu cannot pull
> > >     tasks. Instead, reclassify the candidate busiest group, as it
> > >     may still be selected. (PeterZ)
> > >   * Avoid an expensive and unnecessary call to cpumask_weight() when
> > >     determining if a sched_group is comprised of SMT siblings.
> > >     (PeterZ).
> > > ---
> > >  kernel/sched/fair.c | 94 +++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 94 insertions(+)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 26db017c14a3..8d763dd0174b 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8597,10 +8597,98 @@ group_type group_classify(unsigned int imbalance_pct,
> > >         return group_has_spare;
> > >  }
> > >
> > > +/**
> > > + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> > > + * @dst_cpu:   Destination CPU of the load balancing
> > > + * @sds:       Load-balancing data with statistics of the local group
> > > + * @sgs:       Load-balancing statistics of the candidate busiest group
> > > + * @sg:                The candidate busiest group
> > > + *
> > > + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> > > + * if @dst_cpu can pull tasks.
> > > + *
> > > + * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
> > > + * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
> > > + * only if @dst_cpu has higher priority.
> > > + *
> > > + * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
> > > + * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
> > > + * Bigger imbalances in the number of busy CPUs will be dealt with in
> > > + * update_sd_pick_busiest().
> > > + *
> > > + * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
> > > + * of @dst_cpu are idle and @sg has lower priority.
> > > + */
> > > +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> > > +                                   struct sg_lb_stats *sgs,
> > > +                                   struct sched_group *sg)
> > > +{
> > > +#ifdef CONFIG_SCHED_SMT
> > > +       bool local_is_smt, sg_is_smt;
> > > +       int sg_busy_cpus;
> > > +
> > > +       local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> > > +       sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> > > +
> > > +       sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> > > +
> > > +       if (!local_is_smt) {
> > > +               /*
> > > +                * If we are here, @dst_cpu is idle and does not have SMT
> > > +                * siblings. Pull tasks if candidate group has two or more
> > > +                * busy CPUs.
> > > +                */
> > > +               if (sg_is_smt && sg_busy_cpus >= 2)
> >
> > Do you really need to test sg_is_smt? If sg_busy_cpus >= 2, then
> > sg_is_smt must be true?
>
> Thank you very much for your feedback Vincent!
>
> Yes, it is true that sg_busy_cpus >= 2 can only hold if @sg is SMT. I will
> remove this check.
>
> >
> > Also, this is the default behavior, where we want to even out the number
> > of busy CPUs. Shouldn't you return false and fall back to the default
> > behavior?
>
> This is also true.
>
> >
> > That being said, the default behavior tries to even out the number of
> > idle CPUs, which is easier to compute and is equivalent to evening out
> > the number of busy CPUs on a "normal" system where groups have the same
> > number of CPUs, but that is not the case here. It could be good to
> > change the default behavior to even out the number of busy CPUs and to
> > rely on that default behavior here. Additional conditions, such as more
> > busy CPUs or a higher number of running tasks, would then be used to
> > select the busiest group.
>
> That is a very good observation. Checking the number of idle CPUs
> assumes that both groups have the same number of CPUs. I'll look into
> modifying the default behavior.

Because this change will impact default SMT/SMP systems, we might
prefer to do that in a separate step.

With the removal of the condition !sds->local_stat.sum_nr_running,
which seems useless because dst_cpu is idle and not SMT, this patch
looks good to me.
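
For reference, my reading of the resulting decision logic of
asym_smt_can_pull_tasks(), where "pull" means @dst_cpu may pull tasks
from the candidate group @sg:

  local SMT? | @sg SMT? | decision
  -----------+----------+---------------------------------------------
  no         | yes      | pull if @sg has two or more busy CPUs
  no         | any      | otherwise, pull if @dst_cpu has higher
             |          | priority than @sg
  yes        | yes      | pull if @sg has exactly one more busy CPU
             |          | than the local group and @dst_cpu has higher
             |          | priority
  yes        | no       | pull if the whole local group is idle and
             |          | @dst_cpu has higher priority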

>
> >
> > > +                       return true;
> > > +
> > > +               /*
> > > +                * @dst_cpu does not have SMT siblings. @sg may have SMT
> > > +                * siblings and only one is busy. In such case, @dst_cpu
> > > +                * can help if it has higher priority and is idle (i.e.,
> > > +                * it has no running tasks).
> >
> > The previous comment above assumes that "@dst_cpu is idle", but now you
> > need to check that sds->local_stat.sum_nr_running == 0.
>
> But we already know that, right? We are here because in
> update_sg_lb_stats() we determine that dst CPU is idle (env->idle !=
> CPU_NOT_IDLE).
>
> Thanks and BR,
> Ricardo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 2/6] sched/topology: Introduce sched_group::flags
  2021-09-11  1:18 ` [PATCH v5 2/6] sched/topology: Introduce sched_group::flags Ricardo Neri
@ 2021-09-17 15:26   ` Vincent Guittot
  2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
  2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  2 siblings, 0 replies; 35+ messages in thread
From: Vincent Guittot @ 2021-09-17 15:26 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Srikar Dronamraju, Nicholas Piggin,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> There exist situations in which the load balancer needs to know the
> properties of the CPUs in a scheduling group. When using asymmetric
> packing, for instance, the load balancer needs to know not only the
> state of dst_cpu but also of its SMT siblings, if any.
>
> Use the flags of the child scheduling domains to initialize scheduling
> group flags. This will reflect the properties of the CPUs in the
> group.
>
> A subsequent changeset will make use of these new flags. No functional
> changes are introduced.
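
As a usage sketch, this is how patch 6/6 of this series consumes the new
flags to decide whether the SMT-aware checks are needed at all:

	/* Only do SMT checks if either local or candidate have SMT siblings */
	if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
	    (group->flags & SD_SHARE_CPUCAPACITY))
		return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);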
>
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
> Changes since v4:
>   * None
>
> Changes since v3:
>   * Clear the flags of the scheduling groups of a domain if its child is
>     destroyed.
>   * Minor rewording of the commit message.
>
> Changes since v2:
>   * Introduced this patch.
>
> Changes since v1:
>   * N/A
> ---
>  kernel/sched/sched.h    |  1 +
>  kernel/sched/topology.c | 21 ++++++++++++++++++---
>  2 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3d3e5793e117..86ab33ce529d 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1809,6 +1809,7 @@ struct sched_group {
>         unsigned int            group_weight;
>         struct sched_group_capacity *sgc;
>         int                     asym_prefer_cpu;        /* CPU of highest priority in group */
> +       int                     flags;
>
>         /*
>          * The CPUs this group covers.
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 4e8698e62f07..c56faae461d9 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -716,8 +716,20 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
>                 tmp = sd;
>                 sd = sd->parent;
>                 destroy_sched_domain(tmp);
> -               if (sd)
> +               if (sd) {
> +                       struct sched_group *sg = sd->groups;
> +
> +                       /*
> +                        * sched groups hold the flags of the child sched
> +                        * domain for convenience. Clear such flags since
> +                        * the child is being destroyed.
> +                        */
> +                       do {
> +                               sg->flags = 0;
> +                       } while (sg != sd->groups);
> +
>                         sd->child = NULL;
> +               }
>         }
>
>         for (tmp = sd; tmp; tmp = tmp->parent)
> @@ -916,10 +928,12 @@ build_group_from_child_sched_domain(struct sched_domain *sd, int cpu)
>                 return NULL;
>
>         sg_span = sched_group_span(sg);
> -       if (sd->child)
> +       if (sd->child) {
>                 cpumask_copy(sg_span, sched_domain_span(sd->child));
> -       else
> +               sg->flags = sd->child->flags;
> +       } else {
>                 cpumask_copy(sg_span, sched_domain_span(sd));
> +       }
>
>         atomic_inc(&sg->ref);
>         return sg;
> @@ -1169,6 +1183,7 @@ static struct sched_group *get_group(int cpu, struct sd_data *sdd)
>         if (child) {
>                 cpumask_copy(sched_group_span(sg), sched_domain_span(child));
>                 cpumask_copy(group_balance_mask(sg), sched_group_span(sg));
> +               sg->flags = child->flags;
>         } else {
>                 cpumask_set_cpu(cpu, sched_group_span(sg));
>                 cpumask_set_cpu(cpu, group_balance_mask(sg));
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 3/6] sched/fair: Optimize checking for group_asym_packing
  2021-09-11  1:18 ` [PATCH v5 3/6] sched/fair: Optimize checking for group_asym_packing Ricardo Neri
@ 2021-09-17 15:26   ` Vincent Guittot
  2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
  2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  2 siblings, 0 replies; 35+ messages in thread
From: Vincent Guittot @ 2021-09-17 15:26 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Srikar Dronamraju, Nicholas Piggin,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> sched_asym_prefer() always returns false when called on the local group. By
> checking local_group, we can avoid additional checks and invoking
> sched_asym_prefer() when it is not needed. No functional changes are
> introduced.
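
Why it always returns false for the local group: @dst_cpu belongs to the
local group, and asym_prefer_cpu is the highest-priority CPU of that
same group, so the strict comparison can never hold. For reference, as
defined in kernel/sched/sched.h at the time of this series:

	static inline bool sched_asym_prefer(int a, int b)
	{
		return arch_asym_cpu_priority(a) > arch_asym_cpu_priority(b);
	}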
>
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
> Changes since v4:
>   * None
>
> Changes since v3:
>   * Further rewording of the commit message. (Len)
>
> Changes since v2:
>   * Reworded the commit message for clarity. (Peter Z)
>
> Changes since v1:
>   * None
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ff69f245b939..7a054f528bcc 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8657,7 +8657,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>         }
>
>         /* Check if dst CPU is idle and preferred to this group */
> -       if (env->sd->flags & SD_ASYM_PACKING &&
> +       if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
>             env->idle != CPU_NOT_IDLE &&
>             sgs->sum_h_nr_running &&
>             sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 4/6] sched/fair: Provide update_sg_lb_stats() with sched domain statistics
  2021-09-11  1:18 ` [PATCH v5 4/6] sched/fair: Provide update_sg_lb_stats() with sched domain statistics Ricardo Neri
@ 2021-09-17 15:27   ` Vincent Guittot
  2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
  2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  2 siblings, 0 replies; 35+ messages in thread
From: Vincent Guittot @ 2021-09-17 15:27 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Srikar Dronamraju, Nicholas Piggin,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> Before deciding to pull tasks when using asymmetric packing of tasks,
> on some architectures (e.g., x86) it is necessary to know not only the
> state of dst_cpu but also of its SMT siblings. The decision to classify
> a candidate busiest group as group_asym_packing is done in
> update_sg_lb_stats(). Give this function access to the scheduling domain
> statistics, which contains the statistics of the local group.
>
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
> Changes since v4:
>   * None
>
> Changes since v3:
>   * None
>
> Changes since v2:
>   * Introduced this patch.
>
> Changes since v1:
>   * N/A
> ---
>  kernel/sched/fair.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7a054f528bcc..c5851260b4d8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8605,6 +8605,7 @@ group_type group_classify(unsigned int imbalance_pct,
>   * @sg_status: Holds flag indicating the status of the sched_group
>   */
>  static inline void update_sg_lb_stats(struct lb_env *env,
> +                                     struct sd_lb_stats *sds,
>                                       struct sched_group *group,
>                                       struct sg_lb_stats *sgs,
>                                       int *sg_status)
> @@ -8613,7 +8614,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>
>         memset(sgs, 0, sizeof(*sgs));
>
> -       local_group = cpumask_test_cpu(env->dst_cpu, sched_group_span(group));
> +       local_group = group == sds->local;
>
>         for_each_cpu_and(i, sched_group_span(group), env->cpus) {
>                 struct rq *rq = cpu_rq(i);
> @@ -9176,7 +9177,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>                                 update_group_capacity(env->sd, env->dst_cpu);
>                 }
>
> -               update_sg_lb_stats(env, sg, sgs, &sg_status);
> +               update_sg_lb_stats(env, sds, sg, sgs, &sg_status);
>
>                 if (local_group)
>                         goto next_group;
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 5/6] sched/fair: Carve out logic to mark a group for asymmetric packing
  2021-09-11  1:18 ` [PATCH v5 5/6] sched/fair: Carve out logic to mark a group for asymmetric packing Ricardo Neri
@ 2021-09-17 15:27   ` Vincent Guittot
  2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
  2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  2 siblings, 0 replies; 35+ messages in thread
From: Vincent Guittot @ 2021-09-17 15:27 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Srikar Dronamraju, Nicholas Piggin,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> Create a separate function, sched_asym(). A subsequent changeset will
> introduce logic to deal with SMT in conjunction with asymmetric
> packing. Such logic will need the statistics of the scheduling
> group provided as argument. Update them before calling sched_asym().
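
The reason the diff below moves the group_capacity and group_weight
updates above the check is that the SMT logic added in patch 6/6 derives
the number of busy CPUs from these statistics, e.g.:

	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;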
>
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
> Changes since v4:
>   * None
>
> Changes since v3:
>   * Remove a redundant check for the local group in sched_asym().
>     (Dietmar)
>   * Reworded commit message for clarity. (Len)
>
> Changes since v2:
>   * Introduced this patch.
>
> Changes since v1:
>   * N/A
> ---
>  kernel/sched/fair.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c5851260b4d8..26db017c14a3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8597,6 +8597,13 @@ group_type group_classify(unsigned int imbalance_pct,
>         return group_has_spare;
>  }
>
> +static inline bool
> +sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
> +          struct sched_group *group)
> +{
> +       return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
> +}
> +
>  /**
>   * update_sg_lb_stats - Update sched_group's statistics for load balancing.
>   * @env: The load balancing environment.
> @@ -8657,18 +8664,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>                 }
>         }
>
> +       sgs->group_capacity = group->sgc->capacity;
> +
> +       sgs->group_weight = group->group_weight;
> +
>         /* Check if dst CPU is idle and preferred to this group */
>         if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
> -           env->idle != CPU_NOT_IDLE &&
> -           sgs->sum_h_nr_running &&
> -           sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
> +           env->idle != CPU_NOT_IDLE && sgs->sum_h_nr_running &&
> +           sched_asym(env, sds, sgs, group)) {
>                 sgs->group_asym_packing = 1;
>         }
>
> -       sgs->group_capacity = group->sgc->capacity;
> -
> -       sgs->group_weight = group->group_weight;
> -
>         sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
>
>         /* Computing avg_load makes sense only when group is overloaded */
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-09-17 15:25       ` Vincent Guittot
@ 2021-09-17 18:46         ` Peter Zijlstra
  2021-09-18  9:33           ` Vincent Guittot
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2021-09-17 18:46 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ricardo Neri, Ingo Molnar, Juri Lelli, Srikar Dronamraju,
	Nicholas Piggin, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

On Fri, Sep 17, 2021 at 05:25:02PM +0200, Vincent Guittot wrote:

> With the removal of the condition !sds->local_stat.sum_nr_running,
> which seems useless because dst_cpu is idle and not SMT, this patch
> looks good to me.

I've made it look like this. Thanks!

---
Subject: sched/fair: Consider SMT in ASYM_PACKING load balance
From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Date: Fri, 10 Sep 2021 18:18:19 -0700

From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
check for the idle state of the destination CPU, dst_cpu, but also of
its SMT siblings.

If dst_cpu is idle but its SMT siblings are busy, performance suffers
if it pulls tasks from a medium priority CPU that does not have SMT
siblings.

Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
siblings of both dst_cpu and the CPUs in the candidate busiest group.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Link: https://lkml.kernel.org/r/20210911011819.12184-7-ricardo.neri-calderon@linux.intel.com
---
 kernel/sched/fair.c |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8538,10 +8538,96 @@ group_type group_classify(unsigned int i
 	return group_has_spare;
 }
 
+/**
+ * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
+ * @dst_cpu:	Destination CPU of the load balancing
+ * @sds:	Load-balancing data with statistics of the local group
+ * @sgs:	Load-balancing statistics of the candidate busiest group
+ * @sg:		The candidate busiest group
+ *
+ * Check the state of the SMT siblings of both @sds::local and @sg and decide
+ * if @dst_cpu can pull tasks.
+ *
+ * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
+ * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
+ * only if @dst_cpu has higher priority.
+ *
+ * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
+ * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
+ * Bigger imbalances in the number of busy CPUs will be dealt with in
+ * update_sd_pick_busiest().
+ *
+ * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
+ * of @dst_cpu are idle and @sg has lower priority.
+ */
+static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
+				    struct sg_lb_stats *sgs,
+				    struct sched_group *sg)
+{
+#ifdef CONFIG_SCHED_SMT
+	bool local_is_smt, sg_is_smt;
+	int sg_busy_cpus;
+
+	local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
+	sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
+
+	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
+
+	if (!local_is_smt) {
+		/*
+		 * If we are here, @dst_cpu is idle and does not have SMT
+		 * siblings. Pull tasks if candidate group has two or more
+		 * busy CPUs.
+		 */
+		if (sg_busy_cpus >= 2) /* implies sg_is_smt */
+			return true;
+
+		/*
+		 * @dst_cpu does not have SMT siblings. @sg may have SMT
+		 * siblings and only one is busy. In such case, @dst_cpu
+		 * can help if it has higher priority and is idle (i.e.,
+		 * it has no running tasks).
+		 */
+		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+	}
+
+	/* @dst_cpu has SMT siblings. */
+
+	if (sg_is_smt) {
+		int local_busy_cpus = sds->local->group_weight -
+				      sds->local_stat.idle_cpus;
+		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
+
+		if (busy_cpus_delta == 1)
+			return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+
+		return false;
+	}
+
+	/*
+	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
+	 * up with more than one busy SMT sibling and only pull tasks if there
+	 * are no busy CPUs (i.e., no CPU has running tasks).
+	 */
+	if (!sds->local_stat.sum_nr_running)
+		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+
+	return false;
+#else
+	/* Always return false so that callers deal with non-SMT cases. */
+	return false;
+#endif
+}
+
 static inline bool
 sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
 	   struct sched_group *group)
 {
+	/* Only do SMT checks if either local or candidate have SMT siblings */
+	if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
+	    (group->flags & SD_SHARE_CPUCAPACITY))
+		return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
+
 	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
 }
 
@@ -9547,6 +9633,12 @@ static struct rq *find_busiest_queue(str
 		    nr_running == 1)
 			continue;
 
+		/* Make sure we only pull tasks from a CPU of lower priority */
+		if ((env->sd->flags & SD_ASYM_PACKING) &&
+		    sched_asym_prefer(i, env->dst_cpu) &&
+		    nr_running == 1)
+			continue;
+
 		switch (env->migration_type) {
 		case migrate_load:
 			/*
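
A worked example of the find_busiest_queue() hunk, with hypothetical
ITMT priorities (higher is better):

	/*
	 * dst_cpu = Atom CPU, priority 30
	 * i       = Core CPU, priority 60, nr_running == 1
	 *
	 * sched_asym_prefer(i, dst_cpu) is true, so i is skipped:
	 * migrating its only task would move it to a lower-priority
	 * CPU. A CPU with nr_running > 1 remains eligible, since
	 * shedding part of its load helps regardless of priority.
	 */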

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-09-17 18:46         ` Peter Zijlstra
@ 2021-09-18  9:33           ` Vincent Guittot
  0 siblings, 0 replies; 35+ messages in thread
From: Vincent Guittot @ 2021-09-18  9:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ricardo Neri, Ingo Molnar, Juri Lelli, Srikar Dronamraju,
	Nicholas Piggin, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki

On Fri, 17 Sept 2021 at 20:47, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Sep 17, 2021 at 05:25:02PM +0200, Vincent Guittot wrote:
>
> > With the removal of the condition !sds->local_stat.sum_nr_running,
> > which seems useless because dst_cpu is idle and not SMT, this patch
> > looks good to me.
>
> I've made it look like this. Thanks!

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

Thanks

>
> ---
> Subject: sched/fair: Consider SMT in ASYM_PACKING load balance
> From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> Date: Fri, 10 Sep 2021 18:18:19 -0700
>
> From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
>
> When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
> check for the idle state of the destination CPU, dst_cpu, but also of
> its SMT siblings.
>
> If dst_cpu is idle but its SMT siblings are busy, performance suffers
> if it pulls tasks from a medium priority CPU that does not have SMT
> siblings.
>
> Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
> siblings of both dst_cpu and the CPUs in the candidate busiest group.
>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Link: https://lkml.kernel.org/r/20210911011819.12184-7-ricardo.neri-calderon@linux.intel.com
> ---
>  kernel/sched/fair.c |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 92 insertions(+)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8538,10 +8538,96 @@ group_type group_classify(unsigned int i
>         return group_has_spare;
>  }
>
> +/**
> + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> + * @dst_cpu:   Destination CPU of the load balancing
> + * @sds:       Load-balancing data with statistics of the local group
> + * @sgs:       Load-balancing statistics of the candidate busiest group
> + * @sg:                The candidate busiest group
> + *
> + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> + * if @dst_cpu can pull tasks.
> + *
> + * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
> + * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
> + * only if @dst_cpu has higher priority.
> + *
> + * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
> + * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
> + * Bigger imbalances in the number of busy CPUs will be dealt with in
> + * update_sd_pick_busiest().
> + *
> + * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
> + * of @dst_cpu are idle and @sg has lower priority.
> + */
> +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> +                                   struct sg_lb_stats *sgs,
> +                                   struct sched_group *sg)
> +{
> +#ifdef CONFIG_SCHED_SMT
> +       bool local_is_smt, sg_is_smt;
> +       int sg_busy_cpus;
> +
> +       local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> +       sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> +
> +       sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> +
> +       if (!local_is_smt) {
> +               /*
> +                * If we are here, @dst_cpu is idle and does not have SMT
> +                * siblings. Pull tasks if candidate group has two or more
> +                * busy CPUs.
> +                */
> +               if (sg_busy_cpus >= 2) /* implies sg_is_smt */
> +                       return true;
> +
> +               /*
> +                * @dst_cpu does not have SMT siblings. @sg may have SMT
> +                * siblings and only one is busy. In such case, @dst_cpu
> +                * can help if it has higher priority and is idle (i.e.,
> +                * it has no running tasks).
> +                */
> +               return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> +       }
> +
> +       /* @dst_cpu has SMT siblings. */
> +
> +       if (sg_is_smt) {
> +               int local_busy_cpus = sds->local->group_weight -
> +                                     sds->local_stat.idle_cpus;
> +               int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
> +
> +               if (busy_cpus_delta == 1)
> +                       return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> +
> +               return false;
> +       }
> +
> +       /*
> +        * @sg does not have SMT siblings. Ensure that @sds::local does not end
> +        * up with more than one busy SMT sibling and only pull tasks if there
> +        * are no busy CPUs (i.e., no CPU has running tasks).
> +        */
> +       if (!sds->local_stat.sum_nr_running)
> +               return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> +
> +       return false;
> +#else
> +       /* Always return false so that callers deal with non-SMT cases. */
> +       return false;
> +#endif
> +}
> +
>  static inline bool
>  sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
>            struct sched_group *group)
>  {
> +       /* Only do SMT checks if either local or candidate have SMT siblings */
> +       if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
> +           (group->flags & SD_SHARE_CPUCAPACITY))
> +               return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
> +
>         return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
>  }
>
> @@ -9547,6 +9633,12 @@ static struct rq *find_busiest_queue(str
>                     nr_running == 1)
>                         continue;
>
> +               /* Make sure we only pull tasks from a CPU of lower priority */
> +               if ((env->sd->flags & SD_ASYM_PACKING) &&
> +                   sched_asym_prefer(i, env->dst_cpu) &&
> +                   nr_running == 1)
> +                       continue;
> +
>                 switch (env->migration_type) {
>                 case migrate_load:
>                         /*

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-09-11  1:18 ` [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance Ricardo Neri
  2021-09-15 15:43   ` Vincent Guittot
@ 2021-09-21  7:27   ` tip-bot2 for Ricardo Neri
  2021-10-01  9:33   ` [PATCH v5 6/6] " Guillaume Tucker
  2021-10-05 14:12   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
  3 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Ricardo Neri @ 2021-09-21  7:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ricardo Neri, Peter Zijlstra (Intel), Joel Fernandes (Google),
	Len Brown, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     eac6f3841f1dac7b6f43002056b63f44cc1f1543
Gitweb:        https://git.kernel.org/tip/eac6f3841f1dac7b6f43002056b63f44cc1f1543
Author:        Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
AuthorDate:    Fri, 10 Sep 2021 18:18:19 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 18 Sep 2021 12:18:40 +02:00

sched/fair: Consider SMT in ASYM_PACKING load balance

When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
check for the idle state of the destination CPU, dst_cpu, but also of
its SMT siblings.

If dst_cpu is idle but its SMT siblings are busy, performance suffers
if it pulls tasks from a medium priority CPU that does not have SMT
siblings.

Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
siblings of both dst_cpu and the CPUs in the candidate busiest group.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210911011819.12184-7-ricardo.neri-calderon@linux.intel.com
---
 kernel/sched/fair.c | 92 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 92 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6d27375..2ce015c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8538,10 +8538,96 @@ group_type group_classify(unsigned int imbalance_pct,
 	return group_has_spare;
 }
 
+/**
+ * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
+ * @dst_cpu:	Destination CPU of the load balancing
+ * @sds:	Load-balancing data with statistics of the local group
+ * @sgs:	Load-balancing statistics of the candidate busiest group
+ * @sg:		The candidate busiest group
+ *
+ * Check the state of the SMT siblings of both @sds::local and @sg and decide
+ * if @dst_cpu can pull tasks.
+ *
+ * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
+ * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
+ * only if @dst_cpu has higher priority.
+ *
+ * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
+ * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
+ * Bigger imbalances in the number of busy CPUs will be dealt with in
+ * update_sd_pick_busiest().
+ *
+ * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
+ * of @dst_cpu are idle and @sg has lower priority.
+ */
+static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
+				    struct sg_lb_stats *sgs,
+				    struct sched_group *sg)
+{
+#ifdef CONFIG_SCHED_SMT
+	bool local_is_smt, sg_is_smt;
+	int sg_busy_cpus;
+
+	local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
+	sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
+
+	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
+
+	if (!local_is_smt) {
+		/*
+		 * If we are here, @dst_cpu is idle and does not have SMT
+		 * siblings. Pull tasks if candidate group has two or more
+		 * busy CPUs.
+		 */
+		if (sg_busy_cpus >= 2) /* implies sg_is_smt */
+			return true;
+
+		/*
+		 * @dst_cpu does not have SMT siblings. @sg may have SMT
+		 * siblings and only one is busy. In such case, @dst_cpu
+		 * can help if it has higher priority and is idle (i.e.,
+		 * it has no running tasks).
+		 */
+		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+	}
+
+	/* @dst_cpu has SMT siblings. */
+
+	if (sg_is_smt) {
+		int local_busy_cpus = sds->local->group_weight -
+				      sds->local_stat.idle_cpus;
+		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
+
+		if (busy_cpus_delta == 1)
+			return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+
+		return false;
+	}
+
+	/*
+	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
+	 * up with more than one busy SMT sibling and only pull tasks if there
+	 * are no busy CPUs (i.e., no CPU has running tasks).
+	 */
+	if (!sds->local_stat.sum_nr_running)
+		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+
+	return false;
+#else
+	/* Always return false so that callers deal with non-SMT cases. */
+	return false;
+#endif
+}
+
 static inline bool
 sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
 	   struct sched_group *group)
 {
+	/* Only do SMT checks if either local or candidate have SMT siblings */
+	if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
+	    (group->flags & SD_SHARE_CPUCAPACITY))
+		return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
+
 	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
 }
 
@@ -9547,6 +9633,12 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		    nr_running == 1)
 			continue;
 
+		/* Make sure we only pull tasks from a CPU of lower priority */
+		if ((env->sd->flags & SD_ASYM_PACKING) &&
+		    sched_asym_prefer(i, env->dst_cpu) &&
+		    nr_running == 1)
+			continue;
+
 		switch (env->migration_type) {
 		case migrate_load:
 			/*

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/fair: Carve out logic to mark a group for asymmetric packing
  2021-09-11  1:18 ` [PATCH v5 5/6] sched/fair: Carve out logic to mark a group for asymmetric packing Ricardo Neri
  2021-09-17 15:27   ` Vincent Guittot
@ 2021-09-21  7:27   ` tip-bot2 for Ricardo Neri
  2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  2 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Ricardo Neri @ 2021-09-21  7:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ricardo Neri, Peter Zijlstra (Intel), Joel Fernandes (Google),
	Len Brown, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     f58215ed2ff917dc40e6fb7b2d9b7fd290ec5055
Gitweb:        https://git.kernel.org/tip/f58215ed2ff917dc40e6fb7b2d9b7fd290ec5055
Author:        Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
AuthorDate:    Fri, 10 Sep 2021 18:18:18 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 18 Sep 2021 12:18:40 +02:00

sched/fair: Carve out logic to mark a group for asymmetric packing

Create a separate function, sched_asym(). A subsequent changeset will
introduce logic to deal with SMT in conjunction with asymmetric
packing. Such logic will need the statistics of the scheduling
group provided as argument. Update them before calling sched_asym().

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210911011819.12184-6-ricardo.neri-calderon@linux.intel.com
---
 kernel/sched/fair.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d592de4..6d27375 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8538,6 +8538,13 @@ group_type group_classify(unsigned int imbalance_pct,
 	return group_has_spare;
 }
 
+static inline bool
+sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
+	   struct sched_group *group)
+{
+	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
+}
+
 /**
  * update_sg_lb_stats - Update sched_group's statistics for load balancing.
  * @env: The load balancing environment.
@@ -8598,18 +8605,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		}
 	}
 
+	sgs->group_capacity = group->sgc->capacity;
+
+	sgs->group_weight = group->group_weight;
+
 	/* Check if dst CPU is idle and preferred to this group */
 	if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
-	    env->idle != CPU_NOT_IDLE &&
-	    sgs->sum_h_nr_running &&
-	    sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
+	    env->idle != CPU_NOT_IDLE && sgs->sum_h_nr_running &&
+	    sched_asym(env, sds, sgs, group)) {
 		sgs->group_asym_packing = 1;
 	}
 
-	sgs->group_capacity = group->sgc->capacity;
-
-	sgs->group_weight = group->group_weight;
-
 	sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
 
 	/* Computing avg_load makes sense only when group is overloaded */

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/fair: Provide update_sg_lb_stats() with sched domain statistics
  2021-09-11  1:18 ` [PATCH v5 4/6] sched/fair: Provide update_sg_lb_stats() with sched domain statistics Ricardo Neri
  2021-09-17 15:27   ` Vincent Guittot
@ 2021-09-21  7:27   ` tip-bot2 for Ricardo Neri
  2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  2 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Ricardo Neri @ 2021-09-21  7:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Ricardo Neri, Joel Fernandes (Google),
	Len Brown, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     a7bd2ed2dc9e8bf6b69d26573cb6e80ef42d8e5b
Gitweb:        https://git.kernel.org/tip/a7bd2ed2dc9e8bf6b69d26573cb6e80ef42d8e5b
Author:        Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
AuthorDate:    Fri, 10 Sep 2021 18:18:17 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 18 Sep 2021 12:18:40 +02:00

sched/fair: Provide update_sg_lb_stats() with sched domain statistics

Before deciding to pull tasks when using asymmetric packing of tasks,
on some architectures (e.g., x86) it is necessary to know not only the
state of dst_cpu but also of its SMT siblings. The decision to classify
a candidate busiest group as group_asym_packing is done in
update_sg_lb_stats(). Give this function access to the scheduling domain
statistics, which contains the statistics of the local group.

Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210911011819.12184-5-ricardo.neri-calderon@linux.intel.com
---
 kernel/sched/fair.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d85c5a4..d592de4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8546,6 +8546,7 @@ group_type group_classify(unsigned int imbalance_pct,
  * @sg_status: Holds flag indicating the status of the sched_group
  */
 static inline void update_sg_lb_stats(struct lb_env *env,
+				      struct sd_lb_stats *sds,
 				      struct sched_group *group,
 				      struct sg_lb_stats *sgs,
 				      int *sg_status)
@@ -8554,7 +8555,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 	memset(sgs, 0, sizeof(*sgs));
 
-	local_group = cpumask_test_cpu(env->dst_cpu, sched_group_span(group));
+	local_group = group == sds->local;
 
 	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
 		struct rq *rq = cpu_rq(i);
@@ -9117,7 +9118,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 				update_group_capacity(env->sd, env->dst_cpu);
 		}
 
-		update_sg_lb_stats(env, sg, sgs, &sg_status);
+		update_sg_lb_stats(env, sds, sg, sgs, &sg_status);
 
 		if (local_group)
 			goto next_group;
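
The hunk above replaces a cpumask lookup with a pointer comparison. A
minimal sketch of why the two are equivalent once the local group has been
recorded; the types below are simplified stand-ins invented for
illustration, not the kernel's:

#include <stdbool.h>

/* Sketch only: simplified stand-ins for the kernel's sched_group and
 * sd_lb_stats; not the real structures. */
struct group {
	unsigned long span;		/* bitmask of CPUs in the group */
};

struct domain_stats {
	struct group *local;		/* recorded when the local group is found */
};

/* Before: test dst_cpu against the group's CPU mask on every call. */
static bool is_local_by_span(const struct group *g, int dst_cpu)
{
	return (g->span >> dst_cpu) & 1UL;
}

/* After: the caller already stored the local group in the domain
 * statistics, so a pointer comparison suffices. */
static bool is_local_by_ptr(const struct domain_stats *sds,
			    const struct group *g)
{
	return g == sds->local;
}

int main(void)
{
	struct group g = { .span = 1UL << 3 };		/* group holding CPU 3 */
	struct domain_stats sds = { .local = &g };

	/* Both tests agree once sds.local is initialized; exit code 0. */
	return !(is_local_by_span(&g, 3) && is_local_by_ptr(&sds, &g));
}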

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/fair: Optimize checking for group_asym_packing
  2021-09-11  1:18 ` [PATCH v5 3/6] sched/fair: Optimize checking for group_asym_packing Ricardo Neri
  2021-09-17 15:26   ` Vincent Guittot
@ 2021-09-21  7:27   ` tip-bot2 for Ricardo Neri
  2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  2 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Ricardo Neri @ 2021-09-21  7:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ricardo Neri, Peter Zijlstra (Intel), Joel Fernandes (Google),
	Len Brown, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     cb0e4ee938b1a08507ded179cec3a35b4a8d75b8
Gitweb:        https://git.kernel.org/tip/cb0e4ee938b1a08507ded179cec3a35b4a8d75b8
Author:        Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
AuthorDate:    Fri, 10 Sep 2021 18:18:16 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 18 Sep 2021 12:18:39 +02:00

sched/fair: Optimize checking for group_asym_packing

sched_asym_prefer() always returns false when called on the local group. By
checking local_group first, we can avoid the additional checks and the call
to sched_asym_prefer() when they are not needed. No functional changes are
introduced.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210911011819.12184-4-ricardo.neri-calderon@linux.intel.com
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e26d622..d85c5a4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8598,7 +8598,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	}
 
 	/* Check if dst CPU is idle and preferred to this group */
-	if (env->sd->flags & SD_ASYM_PACKING &&
+	if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
 	    env->idle != CPU_NOT_IDLE &&
 	    sgs->sum_h_nr_running &&
 	    sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
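
The change relies on C's left-to-right short-circuit evaluation of &&: once
!local_group is false, the remaining, costlier tests never run. A small
standalone sketch with an invented stub in place of sched_asym_prefer():

#include <stdio.h>

static int prefer_calls;

/* Stand-in for sched_asym_prefer(); counts how often it is reached. */
static int asym_prefer_stub(void)
{
	prefer_calls++;
	return 0;
}

int main(void)
{
	int local_group = 1;	/* the group being examined is the local one */
	int pulled = !local_group && asym_prefer_stub();

	/* Prints "pulled=0 stub_calls=0": the stub was never invoked. */
	printf("pulled=%d stub_calls=%d\n", pulled, prefer_calls);
	return 0;
}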

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/topology: Introduce sched_group::flags
  2021-09-11  1:18 ` [PATCH v5 2/6] sched/topology: Introduce sched_group::flags Ricardo Neri
  2021-09-17 15:26   ` Vincent Guittot
@ 2021-09-21  7:27   ` tip-bot2 for Ricardo Neri
  2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  2 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Ricardo Neri @ 2021-09-21  7:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Ricardo Neri, Joel Fernandes (Google),
	Len Brown, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     048679b6a675a83f6f54f2775e61fc0f647c9c2b
Gitweb:        https://git.kernel.org/tip/048679b6a675a83f6f54f2775e61fc0f647c9c2b
Author:        Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
AuthorDate:    Fri, 10 Sep 2021 18:18:15 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 18 Sep 2021 12:18:39 +02:00

sched/topology: Introduce sched_group::flags

There exist situations in which the load balancer needs to know the
properties of the CPUs in a scheduling group. When using asymmetric
packing, for instance, the load balancer needs to know not only the
state of dst_cpu but also of its SMT siblings, if any.

Use the flags of the child scheduling domains to initialize scheduling
group flags. This will reflect the properties of the CPUs in the
group.

A subsequent changeset will make use of these new flags. No functional
changes are introduced.

Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210911011819.12184-3-ricardo.neri-calderon@linux.intel.com
---
 kernel/sched/sched.h    |  1 +
 kernel/sched/topology.c | 21 ++++++++++++++++++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 148d5d3..71bc710 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1808,6 +1808,7 @@ struct sched_group {
 	unsigned int		group_weight;
 	struct sched_group_capacity *sgc;
 	int			asym_prefer_cpu;	/* CPU of highest priority in group */
+	int			flags;
 
 	/*
 	 * The CPUs this group covers.
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 4e8698e..c56faae 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -716,8 +716,20 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 		tmp = sd;
 		sd = sd->parent;
 		destroy_sched_domain(tmp);
-		if (sd)
+		if (sd) {
+			struct sched_group *sg = sd->groups;
+
+			/*
+			 * sched groups hold the flags of the child sched
+			 * domain for convenience. Clear such flags since
+			 * the child is being destroyed.
+			 */
+			do {
+				sg->flags = 0;
+			} while (sg != sd->groups);
+
 			sd->child = NULL;
+		}
 	}
 
 	for (tmp = sd; tmp; tmp = tmp->parent)
@@ -916,10 +928,12 @@ build_group_from_child_sched_domain(struct sched_domain *sd, int cpu)
 		return NULL;
 
 	sg_span = sched_group_span(sg);
-	if (sd->child)
+	if (sd->child) {
 		cpumask_copy(sg_span, sched_domain_span(sd->child));
-	else
+		sg->flags = sd->child->flags;
+	} else {
 		cpumask_copy(sg_span, sched_domain_span(sd));
+	}
 
 	atomic_inc(&sg->ref);
 	return sg;
@@ -1169,6 +1183,7 @@ static struct sched_group *get_group(int cpu, struct sd_data *sdd)
 	if (child) {
 		cpumask_copy(sched_group_span(sg), sched_domain_span(child));
 		cpumask_copy(group_balance_mask(sg), sched_group_span(sg));
+		sg->flags = child->flags;
 	} else {
 		cpumask_set_cpu(cpu, sched_group_span(sg));
 		cpumask_set_cpu(cpu, group_balance_mask(sg));

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: sched/core] x86/sched: Decrease further the priorities of SMT siblings
  2021-09-11  1:18 ` [PATCH v5 1/6] x86/sched: Decrease further the priorities of SMT siblings Ricardo Neri
@ 2021-09-21  7:27   ` tip-bot2 for Ricardo Neri
  2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  1 sibling, 0 replies; 35+ messages in thread
From: tip-bot2 for Ricardo Neri @ 2021-09-21  7:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Len Brown, Ricardo Neri, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     014bfe892220077b8c623b97e31a91378d204137
Gitweb:        https://git.kernel.org/tip/014bfe892220077b8c623b97e31a91378d204137
Author:        Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
AuthorDate:    Fri, 10 Sep 2021 18:18:14 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 18 Sep 2021 12:18:39 +02:00

x86/sched: Decrease further the priorities of SMT siblings

When scheduling, it is better to prefer a separate physical core rather
than the SMT sibling of a high priority core. The existing formula to
compute priorities takes this fact into consideration. There may exist,
however, combinations of priorities (i.e., maximum frequencies) in which
the priority of high-numbered SMT siblings of high-priority cores collides
with the priority of low-numbered SMT siblings of low-priority cores.

Consider for instance an SMT2 system with CPUs [0, 1] with priority 60 and
[2, 3] with priority 30 (CPUs in brackets are SMT siblings). In such a case,
the resulting priorities would be [120, 60], [60, 30]. Thus, to ensure
that CPU2 has higher priority than CPU1, divide the raw priority by the
squared SMT iterator. The resulting priorities are [120, 30], [60, 15].

Originally-by: Len Brown <len.brown@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210911011819.12184-2-ricardo.neri-calderon@linux.intel.com
---
 arch/x86/kernel/itmt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index 1afbdd1..9ff480e 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -198,7 +198,7 @@ void sched_set_itmt_core_prio(int prio, int core_cpu)
 		 * of the priority chain and only used when
 		 * all other high priority cpus are out of capacity.
 		 */
-		smt_prio = prio * smp_num_siblings / i;
+		smt_prio = prio * smp_num_siblings / (i * i);
 		per_cpu(sched_core_priority, cpu) = smt_prio;
 		i++;
 	}
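
To make the arithmetic concrete, the following standalone program (an
illustration only, not kernel code) reproduces the numbers from the commit
message for the SMT2 example, assuming smp_num_siblings == 2 and the
sibling iterator i starting at 1:

#include <stdio.h>

int main(void)
{
	int smp_num_siblings = 2;
	int core_prio[] = { 60, 30 };	/* per-core priorities from the example */

	for (int c = 0; c < 2; c++) {
		for (int i = 1; i <= smp_num_siblings; i++) {
			/* old formula: prio * siblings / i */
			int old_prio = core_prio[c] * smp_num_siblings / i;
			/* new formula: prio * siblings / (i * i) */
			int new_prio = core_prio[c] * smp_num_siblings / (i * i);

			printf("core %d, SMT sibling %d: old=%d new=%d\n",
			       c, i, old_prio, new_prio);
		}
	}
	return 0;
}

The old formula gives [120, 60] and [60, 30], so the second sibling of the
fast core ties with the first sibling of the slow core; dividing by i * i
gives [120, 30] and [60, 15] and breaks the tie.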

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-09-11  1:18 ` [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance Ricardo Neri
  2021-09-15 15:43   ` Vincent Guittot
  2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
@ 2021-10-01  9:33   ` Guillaume Tucker
  2021-10-01  9:40     ` Guillaume Tucker
  2021-10-01 10:25     ` Vincent Guittot
  2021-10-05 14:12   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
  3 siblings, 2 replies; 35+ messages in thread
From: Guillaume Tucker @ 2021-10-01  9:33 UTC (permalink / raw)
  To: Ricardo Neri, Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki, kernelci-results

Hi Ricardo,

Please see the bisection report below about a boot failure on
rk3288-rock64.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

Some more details can be found here:

  https://linux.kernelci.org/test/case/id/6155a4b0836c79a98f99a31d/

It looks like some i.MX arm64 platforms also regressed with
next-20210920 although it hasn't been verified yet whether that's
due to the same commit:

  https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20210930/plan/baseline/

The x86 platforms don't seem to be impacted though.

Please let us know if you need help debugging the issue or if you
have a fix to try.

Best wishes,
Guillaume


GitHub: https://github.com/kernelci/kernelci-project/issues/65

-------------------------------------------------------------------------------


* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* This automated bisection report was sent to you on the basis  *
* that you may be involved with the breaking commit it has      *
* found.  No manual investigation has been done to verify it,   *
* and the root cause of the problem may be somewhere else.      *
*                                                               *
* If you do send a fix, please include this trailer:            *
*   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *
*                                                               *
* Hope this helps!                                              *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

next/master bisection: baseline.login on rk3328-rock64

Summary:
  Start:      2d02a18f75fc Add linux-next specific files for 20210929
  Plain log:  https://storage.kernelci.org/next/master/next-20210929/arm64/defconfig+CONFIG_RANDOMIZE_BASE=y/gcc-8/lab-baylibre/baseline-rk3328-rock64.txt
  HTML log:   https://storage.kernelci.org/next/master/next-20210929/arm64/defconfig+CONFIG_RANDOMIZE_BASE=y/gcc-8/lab-baylibre/baseline-rk3328-rock64.html
  Result:     eac6f3841f1d sched/fair: Consider SMT in ASYM_PACKING load balance

Checks:
  revert:     PASS
  verify:     PASS

Parameters:
  Tree:       next
  URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
  Branch:     master
  Target:     rk3328-rock64
  CPU arch:   arm64
  Lab:        lab-baylibre
  Compiler:   gcc-8
  Config:     defconfig+CONFIG_RANDOMIZE_BASE=y
  Test case:  baseline.login

Breaking commit found:

-------------------------------------------------------------------------------
commit eac6f3841f1dac7b6f43002056b63f44cc1f1543
Author: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Date:   Fri Sep 10 18:18:19 2021 -0700

    sched/fair: Consider SMT in ASYM_PACKING load balance


On 11/09/2021 03:18, Ricardo Neri wrote:
> When deciding to pull tasks in ASYM_PACKING, it is necessary to check
> the idle state not only of the destination CPU, dst_cpu, but also that
> of its SMT siblings.
> 
> If dst_cpu is idle but its SMT siblings are busy, performance suffers
> if it pulls tasks from a medium priority CPU that does not have SMT
> siblings.
> 
> Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
> siblings of both dst_cpu and the CPUs in the candidate busiest group.
> 
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Changes since v4:
>   * Use sg_lb_stats::sum_nr_running to determine the idle state of a
>     scheduling group. (Vincent, Peter)
>   * Do not examine idle CPUs in asym_smt_can_pull_tasks(). (Vincent)
>   * Updated function documentation and corrected a typo.
> 
> Changes since v3:
>   * Removed the arch_asym_check_smt_siblings() hook. Discussions with the
>     powerpc folks showed that this patch should not impact them. Also, more
>     recent powerpc processors no longer use asym_packing. (PeterZ)
>   * Removed unnecessary local variable in asym_can_pull_tasks(). (Dietmar)
>   * Removed unnecessary check for local CPUs when the local group has zero
>     utilization. (Joel)
>   * Renamed asym_can_pull_tasks() as asym_smt_can_pull_tasks() to reflect
>     the fact that it deals with SMT cases.
>   * Made asym_smt_can_pull_tasks() return false for !CONFIG_SCHED_SMT so
>     that callers can deal with non-SMT cases.
> 
> Changes since v2:
>   * Reworded the commit message to reflect updates in code.
>   * Corrected misrepresentation of dst_cpu as the CPU doing the load
>     balancing. (PeterZ)
>   * Removed call to arch_asym_check_smt_siblings() as it is now called in
>     sched_asym().
> 
> Changes since v1:
>   * Don't bailout in update_sd_pick_busiest() if dst_cpu cannot pull
>     tasks. Instead, reclassify the candidate busiest group, as it
>     may still be selected. (PeterZ)
>   * Avoid an expensive and unnecessary call to cpumask_weight() when
>     determining if a sched_group is comprised of SMT siblings.
>     (PeterZ).
> ---
>  kernel/sched/fair.c | 94 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 94 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 26db017c14a3..8d763dd0174b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8597,10 +8597,98 @@ group_type group_classify(unsigned int imbalance_pct,
>  	return group_has_spare;
>  }
>  
> +/**
> + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> + * @dst_cpu:	Destination CPU of the load balancing
> + * @sds:	Load-balancing data with statistics of the local group
> + * @sgs:	Load-balancing statistics of the candidate busiest group
> + * @sg:		The candidate busiest group
> + *
> + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> + * if @dst_cpu can pull tasks.
> + *
> + * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
> + * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
> + * only if @dst_cpu has higher priority.
> + *
> + * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
> + * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
> + * Bigger imbalances in the number of busy CPUs will be dealt with in
> + * update_sd_pick_busiest().
> + *
> + * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
> + * of @dst_cpu are idle and @sg has lower priority.
> + */
> +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> +				    struct sg_lb_stats *sgs,
> +				    struct sched_group *sg)
> +{
> +#ifdef CONFIG_SCHED_SMT
> +	bool local_is_smt, sg_is_smt;
> +	int sg_busy_cpus;
> +
> +	local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> +	sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> +
> +	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> +
> +	if (!local_is_smt) {
> +		/*
> +		 * If we are here, @dst_cpu is idle and does not have SMT
> +		 * siblings. Pull tasks if candidate group has two or more
> +		 * busy CPUs.
> +		 */
> +		if (sg_is_smt && sg_busy_cpus >= 2)
> +			return true;
> +
> +		/*
> +		 * @dst_cpu does not have SMT siblings. @sg may have SMT
> +		 * siblings and only one is busy. In such case, @dst_cpu
> +		 * can help if it has higher priority and is idle (i.e.,
> +		 * it has no running tasks).
> +		 */
> +		return !sds->local_stat.sum_nr_running &&
> +		       sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> +	}
> +
> +	/* @dst_cpu has SMT siblings. */
> +
> +	if (sg_is_smt) {
> +		int local_busy_cpus = sds->local->group_weight -
> +				      sds->local_stat.idle_cpus;
> +		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
> +
> +		if (busy_cpus_delta == 1)
> +			return sched_asym_prefer(dst_cpu,
> +						 sg->asym_prefer_cpu);
> +
> +		return false;
> +	}
> +
> +	/*
> +	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
> +	 * up with more than one busy SMT sibling and only pull tasks if there
> +	 * are no busy CPUs (i.e., no CPU has running tasks).
> +	 */
> +	if (!sds->local_stat.sum_nr_running)
> +		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> +
> +	return false;
> +#else
> +	/* Always return false so that callers deal with non-SMT cases. */
> +	return false;
> +#endif
> +}
> +
>  static inline bool
>  sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
>  	   struct sched_group *group)
>  {
> +	/* Only do SMT checks if either local or candidate have SMT siblings */
> +	if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
> +	    (group->flags & SD_SHARE_CPUCAPACITY))
> +		return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
> +
>  	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
>  }
>  
> @@ -9606,6 +9694,12 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>  		    nr_running == 1)
>  			continue;
>  
> +		/* Make sure we only pull tasks from a CPU of lower priority */
> +		if ((env->sd->flags & SD_ASYM_PACKING) &&
> +		    sched_asym_prefer(i, env->dst_cpu) &&
> +		    nr_running == 1)
> +			continue;
> +
>  		switch (env->migration_type) {
>  		case migrate_load:
>  			/*
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-10-01  9:33   ` [PATCH v5 6/6] " Guillaume Tucker
@ 2021-10-01  9:40     ` Guillaume Tucker
  2021-10-01 10:25     ` Vincent Guittot
  1 sibling, 0 replies; 35+ messages in thread
From: Guillaume Tucker @ 2021-10-01  9:40 UTC (permalink / raw)
  To: Ricardo Neri, Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Quentin Perret, Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki, kernelci-results

On 01/10/2021 11:33, Guillaume Tucker wrote:
> Please see the bisection report below about a boot failure on
> rk3288-rock64.

Sorry, I meant rk3328-rock64.

Guillaume

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-10-01  9:33   ` [PATCH v5 6/6] " Guillaume Tucker
  2021-10-01  9:40     ` Guillaume Tucker
@ 2021-10-01 10:25     ` Vincent Guittot
  2021-10-01 17:43       ` Ricardo Neri
  1 sibling, 1 reply; 35+ messages in thread
From: Vincent Guittot @ 2021-10-01 10:25 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Ricardo Neri, Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Srikar Dronamraju, Nicholas Piggin,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki, kernelci-results

Hi Guillaume,

This patch, and the patchset that includes it, only impacts systems
with hyperthreading, which is not the case for rk3328-rock64 AFAICT.
So there is no reason for this code to be used by the board. The only
impact should be an increase in the size of the binary for this
platform. Could it be that it has reached a size limit?

Regards,
Vincent

On Fri, 1 Oct 2021 at 11:33, Guillaume Tucker
<guillaume.tucker@collabora.com> wrote:
>
> Hi Ricardo,
>
> Please see the bisection report below about a boot failure on
> rk3288-rock64.
>
> Reports aren't automatically sent to the public while we're
> trialing new bisection features on kernelci.org but this one
> looks valid.
>
> Some more details can be found here:
>
>   https://linux.kernelci.org/test/case/id/6155a4b0836c79a98f99a31d/
>
> It looks like some i.MX arm64 platforms also regressed with
> next-20210920 although it hasn't been verified yet whether that's
> due to the same commit:
>
>   https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20210930/plan/baseline/
>
> The x86 platforms don't seem to be impacted though.
>
> Please let us know if you need help debugging the issue or if you
> have a fix to try.
>
> Best wishes,
> Guillaume
>
>
> GitHub: https://github.com/kernelci/kernelci-project/issues/65
>
> [snip]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-10-01 10:25     ` Vincent Guittot
@ 2021-10-01 17:43       ` Ricardo Neri
  0 siblings, 0 replies; 35+ messages in thread
From: Ricardo Neri @ 2021-10-01 17:43 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Guillaume Tucker, Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Srikar Dronamraju, Nicholas Piggin,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira, Rafael J . Wysocki, kernelci-results

On Fri, Oct 01, 2021 at 12:25:42PM +0200, Vincent Guittot wrote:
> Hi Guillaume,
> 
> This patch and the patchset which includes this patch only impacts
> systems with hyperthreading which is not the case of rk3328-rock64
> AFAICT. So there is no reason that this code is used by the board. The
> only impact should be an increase of the binary for this platform.
> Could it be that it reached a limit ?
> 
> Regards,
> Vincent
> 
> On Fri, 1 Oct 2021 at 11:33, Guillaume Tucker
> <guillaume.tucker@collabora.com> wrote:
> >
> > Hi Ricardo,
> >
> > Please see the bisection report below about a boot failure on
> > rk3288-rock64.
> >
> > Reports aren't automatically sent to the public while we're
> > trialing new bisection features on kernelci.org but this one
> > looks valid.
> >
> > Some more details can be found here:
> >
> >   https://linux.kernelci.org/test/case/id/6155a4b0836c79a98f99a31d/
> >
> > It looks like some i.MX arm64 platforms also regressed with
> > next-20210920 although it hasn't been verified yet whether that's
> > due to the same commit:
> >
> >   https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20210930/plan/baseline/
> >
> > The x86 platforms don't seem to be impacted though.
> >
> > Please let us know if you need help debugging the issue or if you
> > have a fix to try.

Hi Guillaume,

I accessed the bug report above. It does not seem to include any kernel
log or crash. Maybe it hangs very early at boot? As Vincent mentions, this
code is specific to hyperthreading. Furthermore, for this code path to
be executed, the scheduling domains must have the SD_ASYM_PACKING flag.
AFAIK, only powerpc and x86 use this flag. Also, by the time this code
is executed, we should be past early boot. At least some messages should
have been printed to the kernel console.

Maybe Vincent's idea on the binary size is the issue?

Thanks and BR,
Ricardo

> > [snip]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/fair: Carve out logic to mark a group for asymmetric packing
  2021-09-11  1:18 ` [PATCH v5 5/6] sched/fair: Carve out logic to mark a group for asymmetric packing Ricardo Neri
  2021-09-17 15:27   ` Vincent Guittot
  2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
@ 2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  2 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Ricardo Neri @ 2021-10-05 14:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ricardo Neri, Peter Zijlstra (Intel), Joel Fernandes (Google),
	Len Brown, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     aafc917a3c31dcc76cb0279cd7617dda11b15f2a
Gitweb:        https://git.kernel.org/tip/aafc917a3c31dcc76cb0279cd7617dda11b15f2a
Author:        Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
AuthorDate:    Fri, 10 Sep 2021 18:18:18 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Oct 2021 15:52:04 +02:00

sched/fair: Carve out logic to mark a group for asymmetric packing

Create a separate function, sched_asym(). A subsequent changeset will
introduce logic to deal with SMT in conjunction with asymmetric
packing. Such logic will need the statistics of the scheduling
group provided as an argument. Update them before calling sched_asym().

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210911011819.12184-6-ricardo.neri-calderon@linux.intel.com
---
 kernel/sched/fair.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2e8ef33..1c8b5fa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8571,6 +8571,13 @@ group_type group_classify(unsigned int imbalance_pct,
 	return group_has_spare;
 }
 
+static inline bool
+sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
+	   struct sched_group *group)
+{
+	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
+}
+
 /**
  * update_sg_lb_stats - Update sched_group's statistics for load balancing.
  * @env: The load balancing environment.
@@ -8631,18 +8638,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		}
 	}
 
+	sgs->group_capacity = group->sgc->capacity;
+
+	sgs->group_weight = group->group_weight;
+
 	/* Check if dst CPU is idle and preferred to this group */
 	if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
-	    env->idle != CPU_NOT_IDLE &&
-	    sgs->sum_h_nr_running &&
-	    sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
+	    env->idle != CPU_NOT_IDLE && sgs->sum_h_nr_running &&
+	    sched_asym(env, sds, sgs, group)) {
 		sgs->group_asym_packing = 1;
 	}
 
-	sgs->group_capacity = group->sgc->capacity;
-
-	sgs->group_weight = group->group_weight;
-
 	sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
 
 	/* Computing avg_load makes sense only when group is overloaded */
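
The value of the carve-out is that asym-packing policy now has a single
seam: the next patch only needs to extend sched_asym(), not its call site
in update_sg_lb_stats(). A minimal sketch of the pattern, with names and a
priority table invented purely for illustration:

/* Sketch only: invented names and a made-up priority table, not the
 * kernel API. */
static const int cpu_prio[] = { 120, 30, 60, 15 };	/* e.g., ITMT priorities */

/* Stands in for sched_asym_prefer(). */
static int asym_prefer(int a, int b)
{
	return cpu_prio[a] > cpu_prio[b];
}

/*
 * The carved-out predicate: today a bare priority comparison, later
 * extended with SMT-aware checks (as 6/6 below does) without touching
 * its callers.
 */
static int sched_asym_sketch(int dst_cpu, int group_pref_cpu)
{
	return asym_prefer(dst_cpu, group_pref_cpu);
}

int main(void)
{
	/* CPU 2 (prio 60) vs. a group whose preferred CPU is 3 (prio 15):
	 * the pull is allowed, so main() returns 0. */
	return !sched_asym_sketch(2, 3);
}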

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-09-11  1:18 ` [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance Ricardo Neri
                     ` (2 preceding siblings ...)
  2021-10-01  9:33   ` [PATCH v5 6/6] " Guillaume Tucker
@ 2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  3 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Ricardo Neri @ 2021-10-05 14:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ricardo Neri, Peter Zijlstra (Intel), Joel Fernandes (Google),
	Len Brown, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     4006a72bdd93b1ffedc2bd8646dee18c822a2c26
Gitweb:        https://git.kernel.org/tip/4006a72bdd93b1ffedc2bd8646dee18c822a2c26
Author:        Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
AuthorDate:    Fri, 10 Sep 2021 18:18:19 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Oct 2021 15:52:06 +02:00

sched/fair: Consider SMT in ASYM_PACKING load balance

When deciding to pull tasks in ASYM_PACKING, it is necessary to check
the idle state not only of the destination CPU, dst_cpu, but also that
of its SMT siblings.

If dst_cpu is idle but its SMT siblings are busy, performance suffers
if it pulls tasks from a medium priority CPU that does not have SMT
siblings.

Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
siblings of both dst_cpu and the CPUs in the candidate busiest group.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210911011819.12184-7-ricardo.neri-calderon@linux.intel.com
---
 kernel/sched/fair.c | 92 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 92 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1c8b5fa..c50ae23 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8571,10 +8571,96 @@ group_type group_classify(unsigned int imbalance_pct,
 	return group_has_spare;
 }
 
+/**
+ * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
+ * @dst_cpu:	Destination CPU of the load balancing
+ * @sds:	Load-balancing data with statistics of the local group
+ * @sgs:	Load-balancing statistics of the candidate busiest group
+ * @sg:		The candidate busiest group
+ *
+ * Check the state of the SMT siblings of both @sds::local and @sg and decide
+ * if @dst_cpu can pull tasks.
+ *
+ * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
+ * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
+ * only if @dst_cpu has higher priority.
+ *
+ * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
+ * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
+ * Bigger imbalances in the number of busy CPUs will be dealt with in
+ * update_sd_pick_busiest().
+ *
+ * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
+ * of @dst_cpu are idle and @sg has lower priority.
+ */
+static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
+				    struct sg_lb_stats *sgs,
+				    struct sched_group *sg)
+{
+#ifdef CONFIG_SCHED_SMT
+	bool local_is_smt, sg_is_smt;
+	int sg_busy_cpus;
+
+	local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
+	sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
+
+	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
+
+	if (!local_is_smt) {
+		/*
+		 * If we are here, @dst_cpu is idle and does not have SMT
+		 * siblings. Pull tasks if candidate group has two or more
+		 * busy CPUs.
+		 */
+		if (sg_busy_cpus >= 2) /* implies sg_is_smt */
+			return true;
+
+		/*
+		 * @dst_cpu does not have SMT siblings. @sg may have SMT
+		 * siblings and only one is busy. In such case, @dst_cpu
+		 * can help if it has higher priority and is idle (i.e.,
+		 * it has no running tasks).
+		 */
+		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+	}
+
+	/* @dst_cpu has SMT siblings. */
+
+	if (sg_is_smt) {
+		int local_busy_cpus = sds->local->group_weight -
+				      sds->local_stat.idle_cpus;
+		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
+
+		if (busy_cpus_delta == 1)
+			return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+
+		return false;
+	}
+
+	/*
+	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
+	 * up with more than one busy SMT sibling and only pull tasks if there
+	 * are no busy CPUs (i.e., no CPU has running tasks).
+	 */
+	if (!sds->local_stat.sum_nr_running)
+		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+
+	return false;
+#else
+	/* Always return false so that callers deal with non-SMT cases. */
+	return false;
+#endif
+}
+
 static inline bool
 sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
 	   struct sched_group *group)
 {
+	/* Only do SMT checks if either local or candidate have SMT siblings */
+	if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
+	    (group->flags & SD_SHARE_CPUCAPACITY))
+		return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
+
 	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
 }
 
@@ -9580,6 +9666,12 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		    nr_running == 1)
 			continue;
 
+		/* Make sure we only pull tasks from a CPU of lower priority */
+		if ((env->sd->flags & SD_ASYM_PACKING) &&
+		    sched_asym_prefer(i, env->dst_cpu) &&
+		    nr_running == 1)
+			continue;
+
 		switch (env->migration_type) {
 		case migrate_load:
 			/*
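
As a cross-check of the cases enumerated in the asym_smt_can_pull_tasks()
comment above, here is a standalone restatement of the decision logic with
plain scalars. The names and types are simplified stand-ins for the
kernel's, and the priority comparison is precomputed as a boolean:

#include <stdbool.h>

struct grp_view {
	bool is_smt;		/* SD_SHARE_CPUCAPACITY set on the group? */
	int  weight;		/* number of CPUs in the group */
	int  idle_cpus;
	int  sum_nr_running;
	bool dst_preferred;	/* sched_asym_prefer(dst_cpu, asym_prefer_cpu) */
};

static bool can_pull(const struct grp_view *local, const struct grp_view *sg)
{
	int sg_busy = sg->weight - sg->idle_cpus;

	if (!local->is_smt) {
		/* dst_cpu is idle and has no siblings: pull whenever the
		 * candidate group has two or more busy CPUs (implies SMT). */
		if (sg_busy >= 2)
			return true;
		/* At most one busy CPU there: priority decides. */
		return sg->dst_preferred;
	}

	if (sg->is_smt) {
		/* Both sides have siblings: pull only when the candidate
		 * group has exactly one more busy CPU and dst is preferred. */
		int local_busy = local->weight - local->idle_cpus;

		return (sg_busy - local_busy == 1) && sg->dst_preferred;
	}

	/* Candidate group has no siblings: pull only when every CPU in the
	 * local group is idle and dst is preferred. */
	return local->sum_nr_running == 0 && sg->dst_preferred;
}

int main(void)
{
	/* Example: non-SMT local CPU, SMT candidate with 2 busy siblings. */
	struct grp_view local = { false, 1, 1, 0, true };
	struct grp_view sg    = { true,  2, 0, 2, false };

	return !can_pull(&local, &sg);	/* 0: the pull is allowed */
}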

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/fair: Optimize checking for group_asym_packing
  2021-09-11  1:18 ` [PATCH v5 3/6] sched/fair: Optimize checking for group_asym_packing Ricardo Neri
  2021-09-17 15:26   ` Vincent Guittot
  2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
@ 2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  2 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Ricardo Neri @ 2021-10-05 14:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ricardo Neri, Peter Zijlstra (Intel), Joel Fernandes (Google),
	Len Brown, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     6025643596895695956c27119c6b0bfa40d8035b
Gitweb:        https://git.kernel.org/tip/6025643596895695956c27119c6b0bfa40d8035b
Author:        Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
AuthorDate:    Fri, 10 Sep 2021 18:18:16 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Oct 2021 15:52:02 +02:00

sched/fair: Optimize checking for group_asym_packing

sched_asym_prefer() always returns false when called on the local group. By
checking local_group first, we can avoid the additional checks and the call
to sched_asym_prefer() when they are not needed. No functional changes are
introduced.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210911011819.12184-4-ricardo.neri-calderon@linux.intel.com
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 71b30ef..e050b1d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8631,7 +8631,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	}
 
 	/* Check if dst CPU is idle and preferred to this group */
-	if (env->sd->flags & SD_ASYM_PACKING &&
+	if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
 	    env->idle != CPU_NOT_IDLE &&
 	    sgs->sum_h_nr_running &&
 	    sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {

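To see why the check is always false on the local group: sched_asym_prefer()
is a strict greater-than comparison of CPU priorities, and
group->asym_prefer_cpu is by construction the highest-priority CPU of the
group, so no dst_cpu inside the same group can beat it. A small standalone
sketch with made-up priorities (the prio[] values are illustrative only):

#include <stdbool.h>
#include <stdio.h>

/* Made-up ITMT-style priorities; CPU 0 is the group's asym_prefer_cpu. */
static const int prio[4] = { 120, 60, 60, 30 };

/* Mirrors sched_asym_prefer(): strict greater-than on priority. */
static bool sched_asym_prefer(int a, int b)
{
	return prio[a] > prio[b];
}

int main(void)
{
	int asym_prefer_cpu = 0;

	/* Every CPU of the group loses the strict comparison: all zeros. */
	for (int dst_cpu = 0; dst_cpu < 4; dst_cpu++)
		printf("dst_cpu %d: %d\n", dst_cpu,
		       sched_asym_prefer(dst_cpu, asym_prefer_cpu));
	return 0;
}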

* [tip: sched/core] sched/fair: Provide update_sg_lb_stats() with sched domain statistics
  2021-09-11  1:18 ` [PATCH v5 4/6] sched/fair: Provide update_sg_lb_stats() with sched domain statistics Ricardo Neri
  2021-09-17 15:27   ` Vincent Guittot
  2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
@ 2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  2 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Ricardo Neri @ 2021-10-05 14:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Ricardo Neri, Joel Fernandes (Google),
	Len Brown, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     c0d14b57fe0c11b65ce8a1a4a58a48f3f324ca0f
Gitweb:        https://git.kernel.org/tip/c0d14b57fe0c11b65ce8a1a4a58a48f3f324ca0f
Author:        Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
AuthorDate:    Fri, 10 Sep 2021 18:18:17 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Oct 2021 15:52:03 +02:00

sched/fair: Provide update_sg_lb_stats() with sched domain statistics

Before deciding to pull tasks when using asymmetric packing, on some
architectures (e.g., x86) it is necessary to know not only the state of
dst_cpu but also that of its SMT siblings. The decision to classify
a candidate busiest group as group_asym_packing is done in
update_sg_lb_stats(). Give this function access to the scheduling domain
statistics, which contains the statistics of the local group.

Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210911011819.12184-5-ricardo.neri-calderon@linux.intel.com
---
 kernel/sched/fair.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e050b1d..2e8ef33 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8579,6 +8579,7 @@ group_type group_classify(unsigned int imbalance_pct,
  * @sg_status: Holds flag indicating the status of the sched_group
  */
 static inline void update_sg_lb_stats(struct lb_env *env,
+				      struct sd_lb_stats *sds,
 				      struct sched_group *group,
 				      struct sg_lb_stats *sgs,
 				      int *sg_status)
@@ -8587,7 +8588,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 	memset(sgs, 0, sizeof(*sgs));
 
-	local_group = cpumask_test_cpu(env->dst_cpu, sched_group_span(group));
+	local_group = group == sds->local;
 
 	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
 		struct rq *rq = cpu_rq(i);
@@ -9150,7 +9151,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 				update_group_capacity(env->sd, env->dst_cpu);
 		}
 
-		update_sg_lb_stats(env, sg, sgs, &sg_status);
+		update_sg_lb_stats(env, sds, sg, sgs, &sg_status);
 
 		if (local_group)
 			goto next_group;


* [tip: sched/core] sched/topology: Introduce sched_group::flags
  2021-09-11  1:18 ` [PATCH v5 2/6] sched/topology: Introduce sched_group::flags Ricardo Neri
  2021-09-17 15:26   ` Vincent Guittot
  2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
@ 2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  2023-05-23 10:59     ` Peter Zijlstra
  2 siblings, 1 reply; 35+ messages in thread
From: tip-bot2 for Ricardo Neri @ 2021-10-05 14:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Ricardo Neri, Joel Fernandes (Google),
	Len Brown, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     16d364ba6ef2aa59b409df70682770f3ed23f7c0
Gitweb:        https://git.kernel.org/tip/16d364ba6ef2aa59b409df70682770f3ed23f7c0
Author:        Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
AuthorDate:    Fri, 10 Sep 2021 18:18:15 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Oct 2021 15:52:00 +02:00

sched/topology: Introduce sched_group::flags

There exist situations in which the load balancer needs to know the
properties of the CPUs in a scheduling group. When using asymmetric
packing, for instance, the load balancer needs to know not only the
state of dst_cpu but also of its SMT siblings, if any.

Use the flags of the child scheduling domains to initialize scheduling
group flags. This will reflect the properties of the CPUs in the
group.

A subsequent changeset will make use of these new flags. No functional
changes are introduced.

Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210911011819.12184-3-ricardo.neri-calderon@linux.intel.com
---
 kernel/sched/sched.h    |  1 +
 kernel/sched/topology.c | 21 ++++++++++++++++++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 198058b..e5c4d4d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1808,6 +1808,7 @@ struct sched_group {
 	unsigned int		group_weight;
 	struct sched_group_capacity *sgc;
 	int			asym_prefer_cpu;	/* CPU of highest priority in group */
+	int			flags;
 
 	/*
 	 * The CPUs this group covers.
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 4e8698e..c56faae 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -716,8 +716,20 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 		tmp = sd;
 		sd = sd->parent;
 		destroy_sched_domain(tmp);
-		if (sd)
+		if (sd) {
+			struct sched_group *sg = sd->groups;
+
+			/*
+			 * sched groups hold the flags of the child sched
+			 * domain for convenience. Clear such flags since
+			 * the child is being destroyed.
+			 */
+			do {
+				sg->flags = 0;
+			} while (sg != sd->groups);
+
 			sd->child = NULL;
+		}
 	}
 
 	for (tmp = sd; tmp; tmp = tmp->parent)
@@ -916,10 +928,12 @@ build_group_from_child_sched_domain(struct sched_domain *sd, int cpu)
 		return NULL;
 
 	sg_span = sched_group_span(sg);
-	if (sd->child)
+	if (sd->child) {
 		cpumask_copy(sg_span, sched_domain_span(sd->child));
-	else
+		sg->flags = sd->child->flags;
+	} else {
 		cpumask_copy(sg_span, sched_domain_span(sd));
+	}
 
 	atomic_inc(&sg->ref);
 	return sg;
@@ -1169,6 +1183,7 @@ static struct sched_group *get_group(int cpu, struct sd_data *sdd)
 	if (child) {
 		cpumask_copy(sched_group_span(sg), sched_domain_span(child));
 		cpumask_copy(group_balance_mask(sg), sched_group_span(sg));
+		sg->flags = child->flags;
 	} else {
 		cpumask_set_cpu(cpu, sched_group_span(sg));
 		cpumask_set_cpu(cpu, group_balance_mask(sg));

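The effect of the change can be pictured with a trimmed-down model. The
struct layouts and the flag value below are illustrative only, not the
kernel's; the sketch only mirrors the idea that a group inherits its child
domain's flags.

#include <stdio.h>

#define SD_SHARE_CPUCAPACITY 0x0001	/* value illustrative only */

struct sched_domain { int flags; struct sched_domain *child; };
struct sched_group  { int flags; };

/* Mirrors get_group(): a group takes its child domain's flags, if any. */
static void init_group_flags(struct sched_group *sg, struct sched_domain *sd)
{
	sg->flags = sd->child ? sd->child->flags : 0;
}

int main(void)
{
	struct sched_domain smt = { .flags = SD_SHARE_CPUCAPACITY };
	struct sched_domain core = { .flags = 0, .child = &smt };
	struct sched_group sg;

	init_group_flags(&sg, &core);
	/* At the core level the group now advertises its SMT nature. */
	printf("group has SMT siblings: %d\n",
	       !!(sg.flags & SD_SHARE_CPUCAPACITY));
	return 0;
}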

* [tip: sched/core] x86/sched: Decrease further the priorities of SMT siblings
  2021-09-11  1:18 ` [PATCH v5 1/6] x86/sched: Decrease further the priorities of SMT siblings Ricardo Neri
  2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
@ 2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
  1 sibling, 0 replies; 35+ messages in thread
From: tip-bot2 for Ricardo Neri @ 2021-10-05 14:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Len Brown, Ricardo Neri, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     183b8ec38f1ec6c1f8419375303bf1d09a2b8369
Gitweb:        https://git.kernel.org/tip/183b8ec38f1ec6c1f8419375303bf1d09a2b8369
Author:        Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
AuthorDate:    Fri, 10 Sep 2021 18:18:14 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Oct 2021 15:51:59 +02:00

x86/sched: Decrease further the priorities of SMT siblings

When scheduling, it is better to prefer a separate physical core rather
than the SMT sibling of a high priority core. The existing formula to
compute priorities takes this fact into consideration. There may exist,
however, combinations of priorities (i.e., maximum frequencies) in which
the priority of high-numbered SMT siblings of high-priority cores collides
with the priority of low-numbered SMT siblings of low-priority cores.

Consider for instance an SMT2 system with CPUs [0, 1] with priority 60 and
[2, 3] with priority 30 (CPUs in brackets are SMT siblings). In such a case,
the resulting priorities would be [120, 60], [60, 30]. Thus, to ensure
that CPU2 has higher priority than CPU1, divide the raw priority by the
squared SMT iterator. The resulting priorities are [120, 30], [60, 15].

Originally-by: Len Brown <len.brown@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210911011819.12184-2-ricardo.neri-calderon@linux.intel.com
---
 arch/x86/kernel/itmt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index 1afbdd1..9ff480e 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -198,7 +198,7 @@ void sched_set_itmt_core_prio(int prio, int core_cpu)
 		 * of the priority chain and only used when
 		 * all other high priority cpus are out of capacity.
 		 */
-		smt_prio = prio * smp_num_siblings / i;
+		smt_prio = prio * smp_num_siblings / (i * i);
 		per_cpu(sched_core_priority, cpu) = smt_prio;
 		i++;
 	}

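A quick way to reproduce the numbers from the changelog is to evaluate both
the old and the new formula for the SMT2 example (smp_num_siblings = 2, core
priorities 60 and 30); this small sketch prints the before/after priorities:

#include <stdio.h>

int main(void)
{
	int smp_num_siblings = 2;	/* SMT2 example from the changelog */
	int core_prio[] = { 60, 30 };	/* per-core max-frequency priorities */

	for (int c = 0; c < 2; c++)
		for (int i = 1; i <= smp_num_siblings; i++)
			printf("prio %d, sibling %d: old %d, new %d\n",
			       core_prio[c], i,
			       core_prio[c] * smp_num_siblings / i,
			       core_prio[c] * smp_num_siblings / (i * i));
	return 0;
}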

* Re: [tip: sched/core] sched/topology: Introduce sched_group::flags
  2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
@ 2023-05-23 10:59     ` Peter Zijlstra
  2023-05-24 13:02       ` Vincent Guittot
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2023-05-23 10:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-tip-commits, Ricardo Neri, Joel Fernandes (Google),
	Len Brown, Vincent Guittot, x86

On Tue, Oct 05, 2021 at 02:12:02PM -0000, tip-bot2 for Ricardo Neri wrote:

> index 4e8698e..c56faae 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -716,8 +716,20 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
>  		tmp = sd;
>  		sd = sd->parent;
>  		destroy_sched_domain(tmp);
> -		if (sd)
> +		if (sd) {
> +			struct sched_group *sg = sd->groups;
> +
> +			/*
> +			 * sched groups hold the flags of the child sched
> +			 * domain for convenience. Clear such flags since
> +			 * the child is being destroyed.
> +			 */
> +			do {
> +				sg->flags = 0;
> +			} while (sg != sd->groups);


I happened to be reading this here code and aren't we missing:

	sg = sg->next;

somewhere in that loop?


* Re: [tip: sched/core] sched/topology: Introduce sched_group::flags
  2023-05-23 10:59     ` Peter Zijlstra
@ 2023-05-24 13:02       ` Vincent Guittot
  0 siblings, 0 replies; 35+ messages in thread
From: Vincent Guittot @ 2023-05-24 13:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-tip-commits, Ricardo Neri,
	Joel Fernandes (Google),
	Len Brown, x86

On Tue, 23 May 2023 at 12:59, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Oct 05, 2021 at 02:12:02PM -0000, tip-bot2 for Ricardo Neri wrote:
>
> > index 4e8698e..c56faae 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -716,8 +716,20 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
> >               tmp = sd;
> >               sd = sd->parent;
> >               destroy_sched_domain(tmp);
> > -             if (sd)
> > +             if (sd) {
> > +                     struct sched_group *sg = sd->groups;
> > +
> > +                     /*
> > +                      * sched groups hold the flags of the child sched
> > +                      * domain for convenience. Clear such flags since
> > +                      * the child is being destroyed.
> > +                      */
> > +                     do {
> > +                             sg->flags = 0;
> > +                     } while (sg != sd->groups);
>
>
> I happened to be reading this here code and aren't we missing:
>
>         sg = sg->next;
>
> somewhere in that loop?

Yes, I missed that.

That being said, the only reason for sd to be degenerate is that there
is only one group. Otherwise we would keep it and degenerate its parents
instead.
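
For reference, the loop with the missing advance added would read as follows
(a sketch of the intended behavior, not the fix as it was later committed):

	struct sched_group *sg = sd->groups;

	do {
		sg->flags = 0;
		sg = sg->next;
	} while (sg != sd->groups);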


end of thread

Thread overview: 35+ messages
2021-09-11  1:18 [PATCH v5 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
2021-09-11  1:18 ` [PATCH v5 1/6] x86/sched: Decrease further the priorities of SMT siblings Ricardo Neri
2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
2021-09-11  1:18 ` [PATCH v5 2/6] sched/topology: Introduce sched_group::flags Ricardo Neri
2021-09-17 15:26   ` Vincent Guittot
2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
2023-05-23 10:59     ` Peter Zijlstra
2023-05-24 13:02       ` Vincent Guittot
2021-09-11  1:18 ` [PATCH v5 3/6] sched/fair: Optimize checking for group_asym_packing Ricardo Neri
2021-09-17 15:26   ` Vincent Guittot
2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
2021-09-11  1:18 ` [PATCH v5 4/6] sched/fair: Provide update_sg_lb_stats() with sched domain statistics Ricardo Neri
2021-09-17 15:27   ` Vincent Guittot
2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
2021-09-11  1:18 ` [PATCH v5 5/6] sched/fair: Carve out logic to mark a group for asymmetric packing Ricardo Neri
2021-09-17 15:27   ` Vincent Guittot
2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
2021-10-05 14:12   ` tip-bot2 for Ricardo Neri
2021-09-11  1:18 ` [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance Ricardo Neri
2021-09-15 15:43   ` Vincent Guittot
2021-09-17  1:00     ` Ricardo Neri
2021-09-17  7:41       ` Vincent Guittot
2021-09-17 15:25       ` Vincent Guittot
2021-09-17 18:46         ` Peter Zijlstra
2021-09-18  9:33           ` Vincent Guittot
2021-09-21  7:27   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
2021-10-01  9:33   ` [PATCH v5 6/6] " Guillaume Tucker
2021-10-01  9:40     ` Guillaume Tucker
2021-10-01 10:25     ` Vincent Guittot
2021-10-01 17:43       ` Ricardo Neri
2021-10-05 14:12   ` [tip: sched/core] " tip-bot2 for Ricardo Neri
