linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING
@ 2021-08-10 14:41 Ricardo Neri
  2021-08-10 14:41 ` [PATCH v4 1/6] x86/sched: Decrease further the priorities of SMT siblings Ricardo Neri
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Ricardo Neri @ 2021-08-10 14:41 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Rafael J. Wysocki, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Ricardo Neri

This is v4 of the series. The v1, v2, and v3 patches and test results can
be found in [1], [2], and [3], respectively.

=== Problem statement ===
++ Load balancing ++

When using asymmetric packing, there exist CPU topologies with three
priority levels in which only a subset of the physical cores support SMT.
An instance of such a topology is Intel Alderlake, a hybrid processor with
a mixture of Intel Core (with support for SMT) and Intel Atom CPUs.

On Alderlake, it is almost always beneficial to spread work by picking
first the Core CPUs, then the Atom CPUs, and lastly the SMT siblings.

The current load balancer, however, does not behave as described when
using ASYM_PACKING. Instead, it will choose a higher-priority CPU (an
Intel Core) over a medium-priority CPU (an Intel Atom), and subsequently
overflow the load onto a low-priority SMT sibling CPU. This leaves
medium-priority CPUs idle while low-priority CPUs are busy.

This patchset fixes this behavior by also checking the idle state of the
SMT siblings of both the CPU doing the load balancing and the candidate
busiest group when deciding whether the destination CPU can pull tasks
from the busiest CPU.

++ Rework ASYM_PACKING priorities with ITMT ++

We also reworked the priority computation of the SMT siblings to ensure
that higher-numbered SMT siblings always have low priority. The current
computation may lead to situations in which, on some processors, those
higher-numbered SMT siblings have the same priority as the Intel Atom
CPUs.
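
For example (hypothetical priorities, worked out in detail in patch 1):
with maximum-frequency priorities of 60 and 30 for two SMT2 cores, the
current formula prio * smp_num_siblings / i yields [120, 60] and [60, 30],
so the secondary sibling of the faster core ties with the primary CPU of
the slower core. Dividing by the squared SMT iterator instead yields
[120, 30] and [60, 15], keeping higher-numbered siblings strictly below.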

=== Testing ===
I ran a few benchmarks with and without this version of the patchset on
an Intel Alderlake system with 8 Intel Core (with SMT) and 8 Intel
Atom CPUs.

The baseline for the results is an unmodified v5.13-rc4 kernel. Results
show a comparative percentage of improvement (positive) or degradation
(negative). Each test case is repeated eight times, and the standard
deviation among repetitions is also documented.

Table 1 shows the results when using hardware-controlled performance
states (HWP), a common use case. The impact of the patches is overall
positive, with a few test cases showing slight degradation. hackbench is
especially difficult to assess as it shows a high degree of variability.

Thanks and BR,
Ricardo

ITMT: Intel Turbo Boost Max Technology 3.0

========
Changes since v3:
  * Reworked the ITMT priority computation to further reduce the priority
    of SMT siblings (patch 1).
  * Clear scheduling group flags when a child scheduling level
    degenerates (patch 2).
  * Removed arch-specific hooks (patch 6, PeterZ)
  * Removed redundant checks for the local group. (patch 5, Dietmar)
  * Removed redundant check for local idle CPUs and local group
    utilization. (patch 6, Joel)
  * Reworded commit messages of patches 2, 3, 5, and 6 for clarity.
    (Len, PeterZ)
  * Added Joel's Reviewed-by tag.
  * Unchanged patches: 4

Changes since v2:
  * Removed arch_sched_asym_prefer_early() and re-evaluation of the
    candidate busiest group in update_sd_pick_busiest(). (PeterZ)
  * Introduced sched_group::flags to reflect the properties of CPUs
    in the scheduling group. This helps to identify scheduling groups
    whose CPUs are SMT siblings. (PeterZ)
  * Modified update_sg_lb_stats() to get statistics of the scheduling
    domain as an argument. This provides it with the statistics of the
    local domain. (PeterZ)
  * Introduced sched_asym() to decide if a busiest candidate group can
    be marked for asymmetric packing.
  * Reworded patch 1 for clarity. (PeterZ)

Changes since v1:
  * Don't bailout in update_sd_pick_busiest() if dst_cpu cannot pull
    tasks. Instead, reclassify the candidate busiest group, as it
    may still be selected. (PeterZ)
  * Avoid an expensive and unnecessary call to cpumask_weight() when
    determining if a sched_group is comprised of SMT siblings.
    (PeterZ).
  * Updated test results using the v2 patches.


========      Table 1. Test results of patches with HWP        ========
=======================================================================
hackbench
=========
case                    load            baseline(std%)  compare%( std%)
process-pipe            group-1          1.00 (  1.45)   -2.99 (  4.35)
process-pipe            group-2          1.00 ( 18.32)   +2.14 (  8.33)
process-pipe            group-4          1.00 ( 17.27)  -10.68 ( 15.85)
process-pipe            group-8          1.00 ( 12.33)   +2.26 ( 13.28)
process-pipe            group-12         1.00 (  6.52)   -4.07 (  7.97)
process-pipe            group-16         1.00 (  9.70)   -7.71 (  6.01)
process-pipe            group-20         1.00 (  2.52)   -4.15 (  6.35)
process-pipe            group-24         1.00 (  4.84)   +1.04 (  4.60)
process-pipe            group-32         1.00 (  4.79)   +1.72 (  5.13)
process-pipe            group-48         1.00 (  6.77)   +4.68 (  4.24)
process-sockets         group-1          1.00 (  1.89)   +0.53 (  3.48)
process-sockets         group-2          1.00 (  7.57)   -6.16 (  4.52)
process-sockets         group-4          1.00 ( 14.62)   +4.93 (  7.11)
process-sockets         group-8          1.00 (  7.69)   +3.15 (  7.19)
process-sockets         group-12         1.00 (  4.97)   +2.49 (  2.80)
process-sockets         group-16         1.00 (  3.93)   -1.57 (  3.86)
process-sockets         group-20         1.00 (  2.56)   -3.63 (  2.88)
process-sockets         group-24         1.00 (  3.00)   +0.74 (  3.01)
process-sockets         group-32         1.00 (  7.63)   +1.79 (  3.67)
process-sockets         group-48         1.00 (  4.15)   -0.44 (  3.70)
threads-pipe            group-1          1.00 (  2.34)   -0.55 (  3.78)
threads-pipe            group-2          1.00 ( 12.74)   -2.24 ( 12.96)
threads-pipe            group-4          1.00 ( 10.03)   +5.80 ( 16.02)
threads-pipe            group-8          1.00 (  7.45)  -12.09 ( 22.91)
threads-pipe            group-12         1.00 (  5.00)  -15.25 ( 10.86)
threads-pipe            group-16         1.00 (  7.41)   +1.95 ( 11.73)
threads-pipe            group-20         1.00 (  7.31)   -1.72 (  5.17)
threads-pipe            group-24         1.00 (  4.48)   +0.43 (  6.39)
threads-pipe            group-32         1.00 (  3.75)   -0.62 (  3.87)
threads-pipe            group-48         1.00 (  1.56)   -3.69 (  5.99)
threads-sockets         group-1          1.00 (  2.27)   +3.51 (  3.79)
threads-sockets         group-2          1.00 (  6.86)   -8.42 ( 11.39)
threads-sockets         group-4          1.00 (  5.28)  -14.35 (  8.73)
threads-sockets         group-8          1.00 ( 11.74)   +5.04 (  5.18)
threads-sockets         group-12         1.00 (  3.29)   -6.15 (  7.08)
threads-sockets         group-16         1.00 (  5.07)   -1.40 (  6.49)
threads-sockets         group-20         1.00 (  4.38)   -5.44 (  5.68)
threads-sockets         group-24         1.00 (  7.20)   +3.67 (  3.99)
threads-sockets         group-32         1.00 (  3.30)   +0.10 (  3.08)
threads-sockets         group-48         1.00 (  4.83)   +4.83 (  3.61)

netperf
=======
case                    load            baseline(std%)  compare%( std%)
TCP_RR                  thread-1         1.00 (  0.00)   +1.20 (  0.71)
TCP_RR                  thread-2         1.00 (  3.03)   +0.25 (  3.05)
TCP_RR                  thread-4         1.00 (  1.99)   +1.21 (  3.40)
TCP_RR                  thread-8         1.00 (  1.36)   -2.00 (  6.79)
TCP_RR                  thread-12        1.00 (  4.53)   +1.68 (  0.95)
TCP_RR                  thread-16        1.00 (  0.56)   -1.20 (  1.27)
TCP_RR                  thread-20        1.00 (  2.41)   +0.68 (  0.36)
TCP_RR                  thread-24        1.00 (  1.27)   -0.16 (  0.29)
TCP_RR                  thread-32        1.00 ( 11.14)   -0.35 ( 11.10)
TCP_RR                  thread-48        1.00 ( 18.56)   -0.12 ( 19.98)
UDP_RR                  thread-1         1.00 (  0.00)   +1.85 (  1.53)
UDP_RR                  thread-2         1.00 (  2.78)   -2.02 (  3.14)
UDP_RR                  thread-4         1.00 (  2.26)   -0.54 (  2.42)
UDP_RR                  thread-8         1.00 (  1.46)   -1.49 (  5.31)
UDP_RR                  thread-12        1.00 (  0.70)   +0.21 (  1.64)
UDP_RR                  thread-16        1.00 (  1.26)   -1.90 (  2.87)
UDP_RR                  thread-20        1.00 (  0.29)   +0.29 (  0.27)
UDP_RR                  thread-24        1.00 (  2.75)   +2.61 (  0.97)
UDP_RR                  thread-32        1.00 ( 11.16)   -2.90 ( 11.26)
UDP_RR                  thread-48        1.00 ( 19.22)   +3.12 ( 17.32)

tbench
======
case                    load            baseline(std%)  compare%( std%)
loopback                thread-1         1.00 (  1.33)   +1.41 (  0.38)
loopback                thread-2         1.00 (  1.03)   +0.06 (  1.08)
loopback                thread-4         1.00 (  1.05)   -0.74 (  1.65)
loopback                thread-8         1.00 (  5.81)   +6.66 (  0.31)
loopback                thread-12        1.00 (  0.60)   +0.32 (  0.20)
loopback                thread-16        1.00 (  1.38)   +1.67 (  0.34)
loopback                thread-20        1.00 (  2.49)   +0.24 (  2.96)
loopback                thread-24        1.00 (  2.26)   +0.75 (  0.29)
loopback                thread-32        1.00 (  2.54)   +0.07 (  2.83)
loopback                thread-48        1.00 (  2.40)   +1.90 (  2.32)

schbench
========
case                    load            baseline(std%)  compare%( std%)
normal                  mthread-1        1.00 (  3.51)   +1.03 (  2.96)
normal                  mthread-2        1.00 (  2.33)   +6.35 (  6.19)
normal                  mthread-4        1.00 (  9.58)   +0.12 (  8.02)
normal                  mthread-8        1.00 (  1.59)   +0.40 (  0.07)
normal                  mthread-12       1.00 (  3.18)   +1.51 (  4.19)
normal                  mthread-16       1.00 (  1.72)   +0.00 (  1.72)
normal                  mthread-20       1.00 (  1.43)   +0.80 (  3.16)
normal                  mthread-24       1.00 (  1.38)   -1.69 (  1.71)
normal                  mthread-32       1.00 (  1.86)   -0.78 (  2.25)
normal                  mthread-48       1.00 (  3.50)   -2.81 (  2.82)
========

[1]. https://lore.kernel.org/lkml/20210406041108.7416-1-ricardo.neri-calderon@linux.intel.com/
[2]. https://lore.kernel.org/lkml/20210414020436.12980-1-ricardo.neri-calderon@linux.intel.com/
[3]. https://lore.kernel.org/lkml/20210513154909.6385-1-ricardo.neri-calderon@linux.intel.com/

Ricardo Neri (6):
  x86/sched: Decrease further the priorities of SMT siblings
  sched/topology: Introduce sched_group::flags
  sched/fair: Optimize checking for group_asym_packing
  sched/fair: Provide update_sg_lb_stats() with sched domain statistics
  sched/fair: Carve out logic to mark a group for asymmetric packing
  sched/fair: Consider SMT in ASYM_PACKING load balance

 arch/x86/kernel/itmt.c  |   2 +-
 kernel/sched/fair.c     | 122 ++++++++++++++++++++++++++++++++++++----
 kernel/sched/sched.h    |   1 +
 kernel/sched/topology.c |  21 ++++++-
 4 files changed, 132 insertions(+), 14 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v4 1/6] x86/sched: Decrease further the priorities of SMT siblings
  2021-08-10 14:41 [PATCH v4 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
@ 2021-08-10 14:41 ` Ricardo Neri
  2021-08-10 14:41 ` [PATCH v4 2/6] sched/topology: Introduce sched_group::flags Ricardo Neri
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Ricardo Neri @ 2021-08-10 14:41 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Rafael J. Wysocki, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Ricardo Neri, Aubrey Li,
	Daniel Bristot de Oliveira

When scheduling, it is better to prefer a separate physical core rather
than the SMT sibling of a high-priority core. The existing formula to
compute priorities takes this fact into consideration. There may exist,
however, combinations of priorities (i.e., maximum frequencies) in which
the priority of high-numbered SMT siblings of high-priority cores collides
with the priority of low-numbered SMT siblings of low-priority cores.

Consider, for instance, an SMT2 system with CPUs [0, 1] with priority 60
and [2, 3] with priority 30 (CPUs in brackets are SMT siblings). In such a
case, the resulting priorities would be [120, 60], [60, 30]. Thus, to
ensure that CPU2 has higher priority than CPU1, divide the raw priority by
the squared SMT iterator. The resulting priorities are [120, 30], [60, 15].
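
For illustration, here is a minimal user-space sketch of the old and new
formulas (hypothetical priorities 60 and 30 with smp_num_siblings = 2, as
in the example above; not part of the patch):

    #include <stdio.h>

    static const int smp_num_siblings = 2;  /* SMT2 system */

    int main(void)
    {
        int prio[] = { 60, 30 };    /* per-core priorities (max. frequency) */
        int c, i;

        for (c = 0; c < 2; c++) {
            /* i counts the SMT siblings of a core, starting at 1 */
            for (i = 1; i <= smp_num_siblings; i++) {
                int old = prio[c] * smp_num_siblings / i;
                int new = prio[c] * smp_num_siblings / (i * i);

                /* old: 120, 60, 60, 30 -- new: 120, 30, 60, 15 */
                printf("core %d, sibling %d: old=%d new=%d\n",
                       c, i, old, new);
            }
        }

        return 0;
    }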

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Originally-by: Len Brown <len.brown@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v3:
  * Introduced this patch

Changes since v2:
  * N/A

Changes since v1:
  * N/A
---
 arch/x86/kernel/itmt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index 1afbdd1dd777..9ff480e94511 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -198,7 +198,7 @@ void sched_set_itmt_core_prio(int prio, int core_cpu)
 		 * of the priority chain and only used when
 		 * all other high priority cpus are out of capacity.
 		 */
-		smt_prio = prio * smp_num_siblings / i;
+		smt_prio = prio * smp_num_siblings / (i * i);
 		per_cpu(sched_core_priority, cpu) = smt_prio;
 		i++;
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 2/6] sched/topology: Introduce sched_group::flags
  2021-08-10 14:41 [PATCH v4 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
  2021-08-10 14:41 ` [PATCH v4 1/6] x86/sched: Decrease further the priorities of SMT siblings Ricardo Neri
@ 2021-08-10 14:41 ` Ricardo Neri
  2021-08-10 14:41 ` [PATCH v4 3/6] sched/fair: Optimize checking for group_asym_packing Ricardo Neri
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Ricardo Neri @ 2021-08-10 14:41 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Rafael J. Wysocki, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Ricardo Neri, Aubrey Li,
	Daniel Bristot de Oliveira

There exist situations in which the load balancer needs to know the
properties of the CPUs in a scheduling group. When using asymmetric
packing, for instance, the load balancer needs to know not only the
state of dst_cpu but also that of its SMT siblings, if any.

Use the flags of the child scheduling domains to initialize scheduling
group flags. This will reflect the properties of the CPUs in the
group.

A subsequent changeset will make use of these new flags. No functional
changes are introduced.
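
As a minimal illustration of the intended use (the helper below is
hypothetical and not part of this series), a load-balancing path can then
test the group flags directly instead of inspecting the CPUs of the group:

    /*
     * Illustrative only: SD_SHARE_CPUCAPACITY is set on SMT sched domains,
     * so a group built from an SMT child domain carries it in sg->flags.
     */
    static inline bool group_is_smt(struct sched_group *sg)
    {
            return sg->flags & SD_SHARE_CPUCAPACITY;
    }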

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v3:
  * Clear the flags of the scheduling groups of a domain if its child is
    destroyed.
  * Minor rewording of the commit message.

Changes since v2:
  * Introduced this patch.

Changes since v1:
  * N/A
---
 kernel/sched/sched.h    |  1 +
 kernel/sched/topology.c | 21 ++++++++++++++++++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a2c6f6ae6392..fb716e78974d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1799,6 +1799,7 @@ struct sched_group {
 	unsigned int		group_weight;
 	struct sched_group_capacity *sgc;
 	int			asym_prefer_cpu;	/* CPU of highest priority in group */
+	int			flags;
 
 	/*
 	 * The CPUs this group covers.
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b77ad49dc14f..27c959de175d 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -716,8 +716,20 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 		tmp = sd;
 		sd = sd->parent;
 		destroy_sched_domain(tmp);
-		if (sd)
+		if (sd) {
+			struct sched_group *sg = sd->groups;
+
+			/*
+			 * sched groups hold the flags of the child sched
+			 * domain for convenience. Clear such flags since
+			 * the child is being destroyed.
+			 */
+			do {
+				sg->flags = 0;
+			} while (sg != sd->groups);
+
 			sd->child = NULL;
+		}
 	}
 
 	for (tmp = sd; tmp; tmp = tmp->parent)
@@ -916,10 +928,12 @@ build_group_from_child_sched_domain(struct sched_domain *sd, int cpu)
 		return NULL;
 
 	sg_span = sched_group_span(sg);
-	if (sd->child)
+	if (sd->child) {
 		cpumask_copy(sg_span, sched_domain_span(sd->child));
-	else
+		sg->flags = sd->child->flags;
+	} else {
 		cpumask_copy(sg_span, sched_domain_span(sd));
+	}
 
 	atomic_inc(&sg->ref);
 	return sg;
@@ -1169,6 +1183,7 @@ static struct sched_group *get_group(int cpu, struct sd_data *sdd)
 	if (child) {
 		cpumask_copy(sched_group_span(sg), sched_domain_span(child));
 		cpumask_copy(group_balance_mask(sg), sched_group_span(sg));
+		sg->flags = child->flags;
 	} else {
 		cpumask_set_cpu(cpu, sched_group_span(sg));
 		cpumask_set_cpu(cpu, group_balance_mask(sg));
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 3/6] sched/fair: Optimize checking for group_asym_packing
  2021-08-10 14:41 [PATCH v4 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
  2021-08-10 14:41 ` [PATCH v4 1/6] x86/sched: Decrease further the priorities of SMT siblings Ricardo Neri
  2021-08-10 14:41 ` [PATCH v4 2/6] sched/topology: Introduce sched_group::flags Ricardo Neri
@ 2021-08-10 14:41 ` Ricardo Neri
  2021-08-10 14:41 ` [PATCH v4 4/6] sched/fair: Provide update_sg_lb_stats() with sched domain statistics Ricardo Neri
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Ricardo Neri @ 2021-08-10 14:41 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Rafael J. Wysocki, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Ricardo Neri, Aubrey Li,
	Daniel Bristot de Oliveira

sched_asym_prefer() always returns false when called on the local group.
By checking local_group, we can avoid additional checks and the
invocation of sched_asym_prefer() when it is not needed. No functional
changes are introduced.

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v3:
  * Further rewording of the commit message. (Len)

Changes since v2:
  * Reworded the commit message for clarity. (Peter Z)

Changes since v1:
  * None
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 44e44c235f1f..3b686e18a39b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8591,7 +8591,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	}
 
 	/* Check if dst CPU is idle and preferred to this group */
-	if (env->sd->flags & SD_ASYM_PACKING &&
+	if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
 	    env->idle != CPU_NOT_IDLE &&
 	    sgs->sum_h_nr_running &&
 	    sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 4/6] sched/fair: Provide update_sg_lb_stats() with sched domain statistics
  2021-08-10 14:41 [PATCH v4 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
                   ` (2 preceding siblings ...)
  2021-08-10 14:41 ` [PATCH v4 3/6] sched/fair: Optimize checking for group_asym_packing Ricardo Neri
@ 2021-08-10 14:41 ` Ricardo Neri
  2021-08-10 14:41 ` [PATCH v4 5/6] sched/fair: Carve out logic to mark a group for asymmetric packing Ricardo Neri
  2021-08-10 14:41 ` [PATCH v4 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance Ricardo Neri
  5 siblings, 0 replies; 13+ messages in thread
From: Ricardo Neri @ 2021-08-10 14:41 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Rafael J. Wysocki, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Ricardo Neri, Aubrey Li,
	Daniel Bristot de Oliveira

Before deciding to pull tasks when using asymmetric packing, on some
architectures (e.g., x86) it is necessary to know not only the state of
dst_cpu but also that of its SMT siblings. The decision to classify a
candidate busiest group as group_asym_packing is done in
update_sg_lb_stats(). Give this function access to the scheduling domain
statistics, which contain the statistics of the local group.

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v3:
  * None

Changes since v2:
  * Introduced this patch.

Changes since v1:
  * N/A
---
 kernel/sched/fair.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3b686e18a39b..ae3d2bc59d8d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8539,6 +8539,7 @@ group_type group_classify(unsigned int imbalance_pct,
  * @sg_status: Holds flag indicating the status of the sched_group
  */
 static inline void update_sg_lb_stats(struct lb_env *env,
+				      struct sd_lb_stats *sds,
 				      struct sched_group *group,
 				      struct sg_lb_stats *sgs,
 				      int *sg_status)
@@ -8547,7 +8548,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 	memset(sgs, 0, sizeof(*sgs));
 
-	local_group = cpumask_test_cpu(env->dst_cpu, sched_group_span(group));
+	local_group = group == sds->local;
 
 	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
 		struct rq *rq = cpu_rq(i);
@@ -9110,7 +9111,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 				update_group_capacity(env->sd, env->dst_cpu);
 		}
 
-		update_sg_lb_stats(env, sg, sgs, &sg_status);
+		update_sg_lb_stats(env, sds, sg, sgs, &sg_status);
 
 		if (local_group)
 			goto next_group;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 5/6] sched/fair: Carve out logic to mark a group for asymmetric packing
  2021-08-10 14:41 [PATCH v4 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
                   ` (3 preceding siblings ...)
  2021-08-10 14:41 ` [PATCH v4 4/6] sched/fair: Provide update_sg_lb_stats() with sched domain statistics Ricardo Neri
@ 2021-08-10 14:41 ` Ricardo Neri
  2021-08-10 14:41 ` [PATCH v4 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance Ricardo Neri
  5 siblings, 0 replies; 13+ messages in thread
From: Ricardo Neri @ 2021-08-10 14:41 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Rafael J. Wysocki, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Ricardo Neri, Aubrey Li,
	Daniel Bristot de Oliveira

Create a separate function, sched_asym(). A subsequent changeset will
introduce logic to deal with SMT in conjunction with asymmetric packing.
Such logic will need the statistics of the scheduling group provided as
an argument. Update them before calling sched_asym().

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v3:
  * Remove a redundant check for the local group in sched_asym().
    (Dietmar)
  * Reworded commit message for clarity. (Len)

Changes since v2:
  * Introduced this patch.

Changes since v1:
  * N/A
---
 kernel/sched/fair.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ae3d2bc59d8d..dd411cefb63f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8531,6 +8531,13 @@ group_type group_classify(unsigned int imbalance_pct,
 	return group_has_spare;
 }
 
+static inline bool
+sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
+	   struct sched_group *group)
+{
+	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
+}
+
 /**
  * update_sg_lb_stats - Update sched_group's statistics for load balancing.
  * @env: The load balancing environment.
@@ -8591,18 +8598,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		}
 	}
 
+	sgs->group_capacity = group->sgc->capacity;
+
+	sgs->group_weight = group->group_weight;
+
 	/* Check if dst CPU is idle and preferred to this group */
 	if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
-	    env->idle != CPU_NOT_IDLE &&
-	    sgs->sum_h_nr_running &&
-	    sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
+	    env->idle != CPU_NOT_IDLE && sgs->sum_h_nr_running &&
+	    sched_asym(env, sds, sgs, group)) {
 		sgs->group_asym_packing = 1;
 	}
 
-	sgs->group_capacity = group->sgc->capacity;
-
-	sgs->group_weight = group->group_weight;
-
 	sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
 
 	/* Computing avg_load makes sense only when group is overloaded */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-08-10 14:41 [PATCH v4 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
                   ` (4 preceding siblings ...)
  2021-08-10 14:41 ` [PATCH v4 5/6] sched/fair: Carve out logic to mark a group for asymmetric packing Ricardo Neri
@ 2021-08-10 14:41 ` Ricardo Neri
  2021-08-27 10:13   ` Vincent Guittot
  5 siblings, 1 reply; 13+ messages in thread
From: Ricardo Neri @ 2021-08-10 14:41 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Ingo Molnar, Juri Lelli, Vincent Guittot
  Cc: Srikar Dronamraju, Nicholas Piggin, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Len Brown,
	Srinivas Pandruvada, Tim Chen, Aubrey Li, Ravi V. Shankar,
	Ricardo Neri, Rafael J. Wysocki, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Ricardo Neri, Aubrey Li,
	Daniel Bristot de Oliveira

When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
check for the idle state of the destination CPU, dst_cpu, but also of
its SMT siblings.

If dst_cpu is idle but its SMT siblings are busy, performance suffers
if it pulls tasks from a medium priority CPU that does not have SMT
siblings.

Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
siblings of both dst_cpu and the CPUs in the candidate busiest group.
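
For example, under the topology described in the cover letter: if dst_cpu
is an idle Atom CPU (no SMT siblings) and the candidate busiest group is a
Core with both of its SMT siblings busy, asym_smt_can_pull_tasks() lets
dst_cpu pull a task. If only one sibling of that Core is busy, dst_cpu
pulls only if it has higher priority than the Core, which is not the case
in such a topology, so the task stays put.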

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v3:
  * Removed the arch_asym_check_smt_siblings() hook. Discussions with the
    powerpc folks showed that this patch should not impact them. Also, more
    recent powerpc processor no longer use asym_packing. (PeterZ)
  * Removed unnecessary local variable in asym_can_pull_tasks(). (Dietmar)
  * Removed unnecessary check for local CPUs when the local group has zero
    utilization. (Joel)
  * Renamed asym_can_pull_tasks() as asym_smt_can_pull_tasks() to reflect
    the fact that it deals with SMT cases.
  * Made asym_smt_can_pull_tasks() return false for !CONFIG_SCHED_SMT so
    that callers can deal with non-SMT cases.

Changes since v2:
  * Reworded the commit message to reflect updates in code.
  * Corrected misrepresentation of dst_cpu as the CPU doing the load
    balancing. (PeterZ)
  * Removed call to arch_asym_check_smt_siblings() as it is now called in
    sched_asym().

Changes since v1:
  * Don't bailout in update_sd_pick_busiest() if dst_cpu cannot pull
    tasks. Instead, reclassify the candidate busiest group, as it
    may still be selected. (PeterZ)
  * Avoid an expensive and unnecessary call to cpumask_weight() when
    determining if a sched_group is comprised of SMT siblings.
    (PeterZ).
---
 kernel/sched/fair.c | 95 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dd411cefb63f..8a1a2a43732c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8531,10 +8531,99 @@ group_type group_classify(unsigned int imbalance_pct,
 	return group_has_spare;
 }
 
+/**
+ * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
+ * @dst_cpu:	Destination CPU of the load balancing
+ * @sds:	Load-balancing data with statistics of the local group
+ * @sgs:	Load-balancing statistics of the candidate busiest group
+ * @sg:		The candidate busiet group
+ *
+ * Check the state of the SMT siblings of both @sds::local and @sg and decide
+ * if @dst_cpu can pull tasks. If @dst_cpu does not have SMT siblings, it can
+ * pull tasks if two or more of the SMT siblings of @sg are busy. If only one
+ * CPU in @sg is busy, pull tasks only if @dst_cpu has higher priority.
+ *
+ * If both @dst_cpu and @sg have SMT siblings, even the number of idle CPUs
+ * between @sds::local and @sg. Thus, pull tasks from @sg if the difference
+ * between the number of busy CPUs is 2 or more. If the difference is of 1,
+ * only pull if @dst_cpu has higher priority. If @sg does not have SMT siblings
+ * only pull tasks if all of the SMT siblings of @dst_cpu are idle and @sg
+ * has lower priority.
+ */
+static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
+				    struct sg_lb_stats *sgs,
+				    struct sched_group *sg)
+{
+#ifdef CONFIG_SCHED_SMT
+	bool local_is_smt, sg_is_smt;
+	int sg_busy_cpus;
+
+	local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
+	sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
+
+	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
+
+	if (!local_is_smt) {
+		/*
+		 * If we are here, @dst_cpu is idle and does not have SMT
+		 * siblings. Pull tasks if candidate group has two or more
+		 * busy CPUs.
+		 */
+		if (sg_is_smt && sg_busy_cpus >= 2)
+			return true;
+
+		/*
+		 * @dst_cpu does not have SMT siblings. @sg may have SMT
+		 * siblings and only one is busy. In such case, @dst_cpu
+		 * can help if it has higher priority and is idle.
+		 */
+		return !sds->local_stat.group_util &&
+		       sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+	}
+
+	/* @dst_cpu has SMT siblings. */
+
+	if (sg_is_smt) {
+		int local_busy_cpus = sds->local->group_weight -
+				      sds->local_stat.idle_cpus;
+		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
+
+		/* Local can always help to even the number busy CPUs. */
+		if (busy_cpus_delta >= 2)
+			return true;
+
+		if (busy_cpus_delta == 1)
+			return sched_asym_prefer(dst_cpu,
+						 sg->asym_prefer_cpu);
+
+		return false;
+	}
+
+	/*
+	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
+	 * up with more than one busy SMT sibling and only pull tasks if there
+	 * are not busy CPUs. As CPUs move in and out of idle state frequently,
+	 * also check the group utilization to smoother the decision.
+	 */
+	if (!sds->local_stat.group_util)
+		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
+
+	return false;
+#else
+	/* Always return false so that callers deal with non-SMT cases. */
+	return false;
+#endif
+}
+
 static inline bool
 sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
 	   struct sched_group *group)
 {
+	/* Only do SMT checks if either local or candidate have SMT siblings */
+	if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
+	    (group->flags & SD_SHARE_CPUCAPACITY))
+		return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
+
 	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
 }
 
@@ -9540,6 +9629,12 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		    nr_running == 1)
 			continue;
 
+		/* Make sure we only pull tasks from a CPU of lower priority */
+		if ((env->sd->flags & SD_ASYM_PACKING) &&
+		    sched_asym_prefer(i, env->dst_cpu) &&
+		    nr_running == 1)
+			continue;
+
 		switch (env->migration_type) {
 		case migrate_load:
 			/*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-08-10 14:41 ` [PATCH v4 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance Ricardo Neri
@ 2021-08-27 10:13   ` Vincent Guittot
  2021-08-27 14:48     ` Peter Zijlstra
  2021-08-27 19:45     ` Ricardo Neri
  0 siblings, 2 replies; 13+ messages in thread
From: Vincent Guittot @ 2021-08-27 10:13 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Srikar Dronamraju, Nicholas Piggin,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Rafael J. Wysocki, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira

On Tue, 10 Aug 2021 at 16:41, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
> check for the idle state of the destination CPU, dst_cpu, but also of
> its SMT siblings.
>
> If dst_cpu is idle but its SMT siblings are busy, performance suffers
> if it pulls tasks from a medium priority CPU that does not have SMT
> siblings.
>
> Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
> siblings of both dst_cpu and the CPUs in the candidate busiest group.
>
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Changes since v3:
>   * Removed the arch_asym_check_smt_siblings() hook. Discussions with the
>     powerpc folks showed that this patch should not impact them. Also, more
>     recent powerpc processor no longer use asym_packing. (PeterZ)
>   * Removed unnecessary local variable in asym_can_pull_tasks(). (Dietmar)
>   * Removed unnecessary check for local CPUs when the local group has zero
>     utilization. (Joel)
>   * Renamed asym_can_pull_tasks() as asym_smt_can_pull_tasks() to reflect
>     the fact that it deals with SMT cases.
>   * Made asym_smt_can_pull_tasks() return false for !CONFIG_SCHED_SMT so
>     that callers can deal with non-SMT cases.
>
> Changes since v2:
>   * Reworded the commit message to reflect updates in code.
>   * Corrected misrepresentation of dst_cpu as the CPU doing the load
>     balancing. (PeterZ)
>   * Removed call to arch_asym_check_smt_siblings() as it is now called in
>     sched_asym().
>
> Changes since v1:
>   * Don't bailout in update_sd_pick_busiest() if dst_cpu cannot pull
>     tasks. Instead, reclassify the candidate busiest group, as it
>     may still be selected. (PeterZ)
>   * Avoid an expensive and unnecessary call to cpumask_weight() when
>     determining if a sched_group is comprised of SMT siblings.
>     (PeterZ).
> ---
>  kernel/sched/fair.c | 95 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 95 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index dd411cefb63f..8a1a2a43732c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8531,10 +8531,99 @@ group_type group_classify(unsigned int imbalance_pct,
>         return group_has_spare;
>  }
>
> +/**
> + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> + * @dst_cpu:   Destination CPU of the load balancing
> + * @sds:       Load-balancing data with statistics of the local group
> + * @sgs:       Load-balancing statistics of the candidate busiest group
> + * @sg:                The candidate busiet group
> + *
> + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> + * if @dst_cpu can pull tasks. If @dst_cpu does not have SMT siblings, it can
> + * pull tasks if two or more of the SMT siblings of @sg are busy. If only one
> + * CPU in @sg is busy, pull tasks only if @dst_cpu has higher priority.
> + *
> + * If both @dst_cpu and @sg have SMT siblings, even the number of idle CPUs
> + * between @sds::local and @sg. Thus, pull tasks from @sg if the difference
> + * between the number of busy CPUs is 2 or more. If the difference is of 1,
> + * only pull if @dst_cpu has higher priority. If @sg does not have SMT siblings
> + * only pull tasks if all of the SMT siblings of @dst_cpu are idle and @sg
> + * has lower priority.
> + */
> +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> +                                   struct sg_lb_stats *sgs,
> +                                   struct sched_group *sg)
> +{
> +#ifdef CONFIG_SCHED_SMT
> +       bool local_is_smt, sg_is_smt;
> +       int sg_busy_cpus;
> +
> +       local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> +       sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> +
> +       sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> +
> +       if (!local_is_smt) {
> +               /*
> +                * If we are here, @dst_cpu is idle and does not have SMT
> +                * siblings. Pull tasks if candidate group has two or more
> +                * busy CPUs.
> +                */
> +               if (sg_is_smt && sg_busy_cpus >= 2)
> +                       return true;
> +
> +               /*
> +                * @dst_cpu does not have SMT siblings. @sg may have SMT
> +                * siblings and only one is busy. In such case, @dst_cpu
> +                * can help if it has higher priority and is idle.
> +                */
> +               return !sds->local_stat.group_util &&

sds->local_stat.group_util can't be used to decide if a CPU or group
of CPUs is idle. util_avg is usually not null when a CPU becomes idle,
and you may have to wait more than 300ms before it becomes null.
Conversely, the utilization of a CPU can be null even though a task with
null utilization has just woken up on it.
Utilization reflects the average work of the CPU or group of CPUs, not
the current state.

> +                      sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> +       }
> +
> +       /* @dst_cpu has SMT siblings. */
> +
> +       if (sg_is_smt) {
> +               int local_busy_cpus = sds->local->group_weight -
> +                                     sds->local_stat.idle_cpus;
> +               int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
> +
> +               /* Local can always help to even the number busy CPUs. */

default behavior of the load balance already tries to even the number
of idle CPUs.

> +               if (busy_cpus_delta >= 2)
> +                       return true;
> +
> +               if (busy_cpus_delta == 1)
> +                       return sched_asym_prefer(dst_cpu,
> +                                                sg->asym_prefer_cpu);
> +
> +               return false;
> +       }
> +
> +       /*
> +        * @sg does not have SMT siblings. Ensure that @sds::local does not end
> +        * up with more than one busy SMT sibling and only pull tasks if there
> +        * are not busy CPUs. As CPUs move in and out of idle state frequently,
> +        * also check the group utilization to smoother the decision.
> +        */
> +       if (!sds->local_stat.group_util)

same comment as above about the meaning of group_util == 0

> +               return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> +
> +       return false;
> +#else
> +       /* Always return false so that callers deal with non-SMT cases. */
> +       return false;
> +#endif
> +}
> +
>  static inline bool
>  sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
>            struct sched_group *group)
>  {
> +       /* Only do SMT checks if either local or candidate have SMT siblings */
> +       if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
> +           (group->flags & SD_SHARE_CPUCAPACITY))
> +               return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
> +
>         return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
>  }
>
> @@ -9540,6 +9629,12 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>                     nr_running == 1)
>                         continue;
>
> +               /* Make sure we only pull tasks from a CPU of lower priority */
> +               if ((env->sd->flags & SD_ASYM_PACKING) &&
> +                   sched_asym_prefer(i, env->dst_cpu) &&
> +                   nr_running == 1)
> +                       continue;

This really looks similar to the test above for SD_ASYM_CPUCAPACITY.
More generally speaking SD_ASYM_PACKING and SD_ASYM_CPUCAPACITY share
a lot of common policy and I wonder if at some point we could not
merge their behavior in LB

> +
>                 switch (env->migration_type) {
>                 case migrate_load:
>                         /*
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-08-27 10:13   ` Vincent Guittot
@ 2021-08-27 14:48     ` Peter Zijlstra
  2021-08-27 15:17       ` Vincent Guittot
  2021-08-27 19:45     ` Ricardo Neri
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2021-08-27 14:48 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ricardo Neri, Ingo Molnar, Juri Lelli, Srikar Dronamraju,
	Nicholas Piggin, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Rafael J. Wysocki, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira

On Fri, Aug 27, 2021 at 12:13:42PM +0200, Vincent Guittot wrote:
> > +/**
> > + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> > + * @dst_cpu:   Destination CPU of the load balancing
> > + * @sds:       Load-balancing data with statistics of the local group
> > + * @sgs:       Load-balancing statistics of the candidate busiest group
> > + * @sg:                The candidate busiet group
> > + *
> > + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> > + * if @dst_cpu can pull tasks. If @dst_cpu does not have SMT siblings, it can
> > + * pull tasks if two or more of the SMT siblings of @sg are busy. If only one
> > + * CPU in @sg is busy, pull tasks only if @dst_cpu has higher priority.
> > + *
> > + * If both @dst_cpu and @sg have SMT siblings, even the number of idle CPUs
> > + * between @sds::local and @sg. Thus, pull tasks from @sg if the difference
> > + * between the number of busy CPUs is 2 or more. If the difference is of 1,
> > + * only pull if @dst_cpu has higher priority. If @sg does not have SMT siblings
> > + * only pull tasks if all of the SMT siblings of @dst_cpu are idle and @sg
> > + * has lower priority.
> > + */
> > +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> > +                                   struct sg_lb_stats *sgs,
> > +                                   struct sched_group *sg)
> > +{
> > +#ifdef CONFIG_SCHED_SMT
> > +       bool local_is_smt, sg_is_smt;
> > +       int sg_busy_cpus;
> > +
> > +       local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> > +       sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> > +
> > +       sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> > +
> > +       if (!local_is_smt) {
> > +               /*
> > +                * If we are here, @dst_cpu is idle and does not have SMT
> > +                * siblings. Pull tasks if candidate group has two or more
> > +                * busy CPUs.
> > +                */
> > +               if (sg_is_smt && sg_busy_cpus >= 2)
> > +                       return true;
> > +
> > +               /*
> > +                * @dst_cpu does not have SMT siblings. @sg may have SMT
> > +                * siblings and only one is busy. In such case, @dst_cpu
> > +                * can help if it has higher priority and is idle.
> > +                */
> > +               return !sds->local_stat.group_util &&
> 
> sds->local_stat.group_util can't be used to decide if a CPU or group
> of CPUs is idle. util_avg is usually not null when a CPU becomes idle,
> and you may have to wait more than 300ms before it becomes null.
> Conversely, the utilization of a CPU can be null even though a task with
> null utilization has just woken up on it.
> Utilization reflects the average work of the CPU or group of CPUs, not
> the current state.

If you want immediate idle, sgs->nr_running == 0 or sgs->idle_cpus ==
sgs->group_weight come to mind.
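
Something along these lines, as a sketch of the suggested alternative to
the group_util test in asym_smt_can_pull_tasks() (untested):

	if (sds->local_stat.idle_cpus == sds->local->group_weight)
		return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);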

> > +                      sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> > +       }
> > +
> > +       /* @dst_cpu has SMT siblings. */
> > +
> > +       if (sg_is_smt) {
> > +               int local_busy_cpus = sds->local->group_weight -
> > +                                     sds->local_stat.idle_cpus;
> > +               int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
> > +
> > +               /* Local can always help to even the number busy CPUs. */
> 
> default behavior of the load balance already tries to even the number
> of idle CPUs.

Right, but I suppose this is because we're trapped here and have to deal
with the SMT-SMT case too. Ricardo, can you clarify?

> > +               if (busy_cpus_delta >= 2)
> > +                       return true;
> > +
> > +               if (busy_cpus_delta == 1)
> > +                       return sched_asym_prefer(dst_cpu,
> > +                                                sg->asym_prefer_cpu);
> > +
> > +               return false;
> > +       }
> > +
> > +       /*
> > +        * @sg does not have SMT siblings. Ensure that @sds::local does not end
> > +        * up with more than one busy SMT sibling and only pull tasks if there
> > +        * are not busy CPUs. As CPUs move in and out of idle state frequently,
> > +        * also check the group utilization to smoother the decision.
> > +        */
> > +       if (!sds->local_stat.group_util)
> 
> same comment as above about the meaning of group_util == 0
> 
> > +               return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> > +
> > +       return false;
> > +#else
> > +       /* Always return false so that callers deal with non-SMT cases. */
> > +       return false;
> > +#endif
> > +}
> > +
> >  static inline bool
> >  sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
> >            struct sched_group *group)
> >  {
> > +       /* Only do SMT checks if either local or candidate have SMT siblings */
> > +       if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
> > +           (group->flags & SD_SHARE_CPUCAPACITY))
> > +               return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
> > +
> >         return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
> >  }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-08-27 14:48     ` Peter Zijlstra
@ 2021-08-27 15:17       ` Vincent Guittot
  2021-08-27 19:43         ` Ricardo Neri
  0 siblings, 1 reply; 13+ messages in thread
From: Vincent Guittot @ 2021-08-27 15:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ricardo Neri, Ingo Molnar, Juri Lelli, Srikar Dronamraju,
	Nicholas Piggin, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Rafael J. Wysocki, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira

On Fri, 27 Aug 2021 at 16:50, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Aug 27, 2021 at 12:13:42PM +0200, Vincent Guittot wrote:
> > > +/**
> > > + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> > > + * @dst_cpu:   Destination CPU of the load balancing
> > > + * @sds:       Load-balancing data with statistics of the local group
> > > + * @sgs:       Load-balancing statistics of the candidate busiest group
> > > + * @sg:                The candidate busiet group
> > > + *
> > > + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> > > + * if @dst_cpu can pull tasks. If @dst_cpu does not have SMT siblings, it can
> > > + * pull tasks if two or more of the SMT siblings of @sg are busy. If only one
> > > + * CPU in @sg is busy, pull tasks only if @dst_cpu has higher priority.
> > > + *
> > > + * If both @dst_cpu and @sg have SMT siblings, even the number of idle CPUs
> > > + * between @sds::local and @sg. Thus, pull tasks from @sg if the difference
> > > + * between the number of busy CPUs is 2 or more. If the difference is of 1,
> > > + * only pull if @dst_cpu has higher priority. If @sg does not have SMT siblings
> > > + * only pull tasks if all of the SMT siblings of @dst_cpu are idle and @sg
> > > + * has lower priority.
> > > + */
> > > +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> > > +                                   struct sg_lb_stats *sgs,
> > > +                                   struct sched_group *sg)
> > > +{
> > > +#ifdef CONFIG_SCHED_SMT
> > > +       bool local_is_smt, sg_is_smt;
> > > +       int sg_busy_cpus;
> > > +
> > > +       local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> > > +       sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> > > +
> > > +       sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> > > +
> > > +       if (!local_is_smt) {
> > > +               /*
> > > +                * If we are here, @dst_cpu is idle and does not have SMT
> > > +                * siblings. Pull tasks if candidate group has two or more
> > > +                * busy CPUs.
> > > +                */
> > > +               if (sg_is_smt && sg_busy_cpus >= 2)
> > > +                       return true;
> > > +
> > > +               /*
> > > +                * @dst_cpu does not have SMT siblings. @sg may have SMT
> > > +                * siblings and only one is busy. In such case, @dst_cpu
> > > +                * can help if it has higher priority and is idle.
> > > +                */
> > > +               return !sds->local_stat.group_util &&
> >
> > sds->local_stat.group_util can't be used to decide if a CPU or group
> > of CPUs is idle. util_avg is usually not null when a CPU becomes idle,
> > and you may have to wait more than 300ms before it becomes null.
> > Conversely, the utilization of a CPU can be null even though a task with
> > null utilization has just woken up on it.
> > Utilization reflects the average work of the CPU or group of CPUs, not
> > the current state.
>
> If you want immediate idle, sgs->nr_running == 0 or sgs->idle_cpus ==
> sgs->group_weight come to mind.

yes, I have the same in mind

>
> > > +                      sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> > > +       }
> > > +
> > > +       /* @dst_cpu has SMT siblings. */
> > > +
> > > +       if (sg_is_smt) {
> > > +               int local_busy_cpus = sds->local->group_weight -
> > > +                                     sds->local_stat.idle_cpus;
> > > +               int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
> > > +
> > > +               /* Local can always help to even the number busy CPUs. */
> >
> > default behavior of the load balance already tries to even the number
> > of idle CPUs.
>
> Right, but I suppose this is because we're trapped here and have to deal
> with the SMT-SMT case too. Ricardo, can you clarify?

IIUC, this function is used when updating sg_lb_stats to set
sgs->group_asym_packing, which is then used to set the group state to
group_asym_packing and force asym migration.
But if we only want to even out the number of busy CPUs between the groups,
we should not need to set the group state to group_asym_packing.

>
> > > +               if (busy_cpus_delta >= 2)
> > > +                       return true;
> > > +
> > > +               if (busy_cpus_delta == 1)
> > > +                       return sched_asym_prefer(dst_cpu,
> > > +                                                sg->asym_prefer_cpu);
> > > +
> > > +               return false;
> > > +       }
> > > +
> > > +       /*
> > > +        * @sg does not have SMT siblings. Ensure that @sds::local does not end
> > > +        * up with more than one busy SMT sibling and only pull tasks if there
> > > +        * are no busy CPUs. As CPUs move in and out of idle state frequently,
> > > +        * also check the group utilization to smooth the decision.
> > > +        */
> > > +       if (!sds->local_stat.group_util)
> >
> > same comment as above about the meaning of group_util == 0
> >
> > > +               return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> > > +
> > > +       return false;
> > > +#else
> > > +       /* Always return false so that callers deal with non-SMT cases. */
> > > +       return false;
> > > +#endif
> > > +}
> > > +
> > >  static inline bool
> > >  sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
> > >            struct sched_group *group)
> > >  {
> > > +       /* Only do SMT checks if either local or candidate has SMT siblings */
> > > +       if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
> > > +           (group->flags & SD_SHARE_CPUCAPACITY))
> > > +               return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
> > > +
> > >         return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
> > >  }
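
Putting the quoted pieces together, a stand-alone model of the decision table
that asym_smt_can_pull_tasks() implements. Everything here is illustrative:
the real code operates on struct sd_lb_stats and struct sched_group, and the
idle tests below follow the counter-based checks discussed in this thread
rather than the group_util check under review.

#include <stdbool.h>
#include <stdio.h>

struct grp {
        bool    is_smt;         /* group spans the SMT siblings of one core */
        int     weight;         /* CPUs in the group */
        int     idle_cpus;      /* CPUs idle right now */
};

static bool can_pull(const struct grp *local, const struct grp *sg,
                     bool dst_has_higher_prio)
{
        int sg_busy = sg->weight - sg->idle_cpus;

        if (!local->is_smt) {
                /* Non-SMT destination: relieve an SMT core with 2+ busy siblings... */
                if (sg->is_smt && sg_busy >= 2)
                        return true;
                /* ...otherwise pull only from a lower-priority CPU while fully idle. */
                return local->idle_cpus == local->weight && dst_has_higher_prio;
        }

        if (sg->is_smt) {
                /* SMT vs. SMT: even out busy siblings; priority decides a delta of one. */
                int delta = sg_busy - (local->weight - local->idle_cpus);

                if (delta >= 2)
                        return true;
                return delta == 1 && dst_has_higher_prio;
        }

        /* SMT destination, non-SMT candidate: pull only while the whole core is idle. */
        return local->idle_cpus == local->weight && dst_has_higher_prio;
}

int main(void)
{
        struct grp atom = { .is_smt = false, .weight = 1, .idle_cpus = 1 };
        struct grp core = { .is_smt = true,  .weight = 2, .idle_cpus = 0 };

        /* An idle Atom CPU sees a Core with both siblings busy: pull. */
        printf("%d\n", can_pull(&atom, &core, false));  /* prints 1 */
        return 0;
}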

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-08-27 15:17       ` Vincent Guittot
@ 2021-08-27 19:43         ` Ricardo Neri
  0 siblings, 0 replies; 13+ messages in thread
From: Ricardo Neri @ 2021-08-27 19:43 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Srikar Dronamraju,
	Nicholas Piggin, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Rafael J. Wysocki, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira

On Fri, Aug 27, 2021 at 05:17:22PM +0200, Vincent Guittot wrote:
> On Fri, 27 Aug 2021 at 16:50, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, Aug 27, 2021 at 12:13:42PM +0200, Vincent Guittot wrote:
> > > > +/**
> > > > + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> > > > + * @dst_cpu:   Destination CPU of the load balancing
> > > > + * @sds:       Load-balancing data with statistics of the local group
> > > > + * @sgs:       Load-balancing statistics of the candidate busiest group
> > > > + * @sg:                The candidate busiest group
> > > > + *
> > > > + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> > > > + * if @dst_cpu can pull tasks. If @dst_cpu does not have SMT siblings, it can
> > > > + * pull tasks if two or more of the SMT siblings of @sg are busy. If only one
> > > > + * CPU in @sg is busy, pull tasks only if @dst_cpu has higher priority.
> > > > + *
> > > > + * If both @dst_cpu and @sg have SMT siblings, even out the number of idle
> > > > + * CPUs between @sds::local and @sg. Thus, pull tasks from @sg if the
> > > > + * difference in the number of busy CPUs is 2 or more. If the difference
> > > > + * is 1, only pull if @dst_cpu has higher priority. If @sg does not have
> > > > + * SMT siblings, only pull tasks if all of the SMT siblings of @dst_cpu
> > > > + * are idle and @sg has lower priority.
> > > > + */
> > > > +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> > > > +                                   struct sg_lb_stats *sgs,
> > > > +                                   struct sched_group *sg)
> > > > +{
> > > > +#ifdef CONFIG_SCHED_SMT
> > > > +       bool local_is_smt, sg_is_smt;
> > > > +       int sg_busy_cpus;
> > > > +
> > > > +       local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> > > > +       sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> > > > +
> > > > +       sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> > > > +
> > > > +       if (!local_is_smt) {
> > > > +               /*
> > > > +                * If we are here, @dst_cpu is idle and does not have SMT
> > > > +                * siblings. Pull tasks if candidate group has two or more
> > > > +                * busy CPUs.
> > > > +                */
> > > > +               if (sg_is_smt && sg_busy_cpus >= 2)
> > > > +                       return true;
> > > > +
> > > > +               /*
> > > > +                * @dst_cpu does not have SMT siblings. @sg may have SMT
> > > > +                * siblings and only one is busy. In such case, @dst_cpu
> > > > +                * can help if it has higher priority and is idle.
> > > > +                */
> > > > +               return !sds->local_stat.group_util &&
> > >
> > > sds->local_stat.group_util can't be used to decide if a CPU or group
> > > of CPUs is idle. util_avg is usually not null when a CPU becomes idle
> > > and you can have to wait  more than 300ms before it becomes Null
> > > At the opposite, the utilization of a CPU can be null but a task with
> > > null utilization has just woken up on it.
> > > Utilization is used to reflect the average work of the CPU or group of
> > > CPUs but not the current state
> >
> > If you want immediate idle, sgs->nr_running == 0 or sgs->idle_cpus ==
> > sgs->group_weight come to mind.
> 
> yes, I have the same in mind

Thank you very much, Vincent and Peter, for the feedback! I'll look at
using these stats to determine immediate idleness.

> 
> >
> > > > +                      sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> > > > +       }
> > > > +
> > > > +       /* @dst_cpu has SMT siblings. */
> > > > +
> > > > +       if (sg_is_smt) {
> > > > +               int local_busy_cpus = sds->local->group_weight -
> > > > +                                     sds->local_stat.idle_cpus;
> > > > +               int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
> > > > +
> > > > +               /* Local can always help to even out the number of busy CPUs. */
> > >
> > > default behavior of the load balance already tries to even the number
> > > of idle CPUs.
> >
> > Right, but I suppose this is because we're trapped here and have to deal
> > with the SMT-SMT case too. Ricardo, can you clarify?
> 
> IIUC, this function is used when updating sg_lb_stats to set
> sgs->group_asym_packing, which is then used to set the group state to
> group_asym_packing and force asym migration.
> But if we only want to even out the number of busy CPUs between the groups,
> we should not need to set the group state to group_asym_packing.

Yes, what Vincent describes is the intent. That said, I think it is
probably true that it is not necessary to even out the number of idle
CPUs here.
> 
> >
> > > > +               if (busy_cpus_delta >= 2)
> > > > +                       return true;
> > > > +
> > > > +               if (busy_cpus_delta == 1)
> > > > +                       return sched_asym_prefer(dst_cpu,
> > > > +                                                sg->asym_prefer_cpu);

I think we should keep this check in order to move tasks to
higher-priority CPUs.
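
A quick worked example of the case being kept, with made-up numbers (an Alder
Lake-like layout is assumed; the figures are not taken from this thread):

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
        /* Hypothetical snapshot: candidate SMT core has 2 busy siblings, local has 1. */
        int sg_busy_cpus = 2, local_busy_cpus = 1;
        int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;

        /* Stand-in for sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu). */
        bool dst_has_higher_prio = true;

        bool pull = busy_cpus_delta >= 2 ||
                    (busy_cpus_delta == 1 && dst_has_higher_prio);

        printf("pull: %d\n", pull);     /* prints 1: the higher-priority CPU takes a task */
        return 0;
}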

> > > > +
> > > > +               return false;
> > > > +       }
> > > > +
> > > > +       /*
> > > > +        * @sg does not have SMT siblings. Ensure that @sds::local does not end
> > > > +        * up with more than one busy SMT sibling and only pull tasks if there
> > > > +        * are no busy CPUs. As CPUs move in and out of idle state frequently,
> > > > +        * also check the group utilization to smooth the decision.
> > > > +        */
> > > > +       if (!sds->local_stat.group_util)
> > >
> > > same comment as above about the meaning of group_util == 0

I will look into using the suggested statistics.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-08-27 10:13   ` Vincent Guittot
  2021-08-27 14:48     ` Peter Zijlstra
@ 2021-08-27 19:45     ` Ricardo Neri
  2021-08-28 13:25       ` Vincent Guittot
  1 sibling, 1 reply; 13+ messages in thread
From: Ricardo Neri @ 2021-08-27 19:45 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Srikar Dronamraju, Nicholas Piggin,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Rafael J. Wysocki, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira

On Fri, Aug 27, 2021 at 12:13:42PM +0200, Vincent Guittot wrote:
> On Tue, 10 Aug 2021 at 16:41, Ricardo Neri
> <ricardo.neri-calderon@linux.intel.com> wrote:
> > @@ -9540,6 +9629,12 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> >                     nr_running == 1)
> >                         continue;
> >
> > +               /* Make sure we only pull tasks from a CPU of lower priority */
> > +               if ((env->sd->flags & SD_ASYM_PACKING) &&
> > +                   sched_asym_prefer(i, env->dst_cpu) &&
> > +                   nr_running == 1)
> > +                       continue;
> 
> This really looks similar to the test above for SD_ASYM_CPUCAPACITY.
> More generally speaking SD_ASYM_PACKING and SD_ASYM_CPUCAPACITY share
> a lot of common policy and I wonder if at some point we could not
> merge their behavior in LB

I would like to confirm that you are not expecting this merge to be part
of this series, right?

Thanks and BR,
Ricardo
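
The hunk quoted above can be modeled in isolation as follows; the priority
helper is only a stand-in for sched_asym_prefer(), and the numbers in main()
are hypothetical:

#include <stdbool.h>
#include <stdio.h>

/* Toy priority check: a higher value wins, mirroring the ASYM_PACKING intent. */
static bool prefer(int prio_a, int prio_b)
{
        return prio_a > prio_b;
}

/* Skip a candidate runqueue that outranks the destination and runs one task. */
static bool skip_candidate(bool asym_packing, int cand_prio, int dst_prio,
                           int nr_running)
{
        return asym_packing && prefer(cand_prio, dst_prio) && nr_running == 1;
}

int main(void)
{
        /* A busy Core CPU (prio 2) is not raided by a lower-priority Atom CPU (prio 1). */
        printf("skip: %d\n", skip_candidate(true, 2, 1, 1));    /* prints 1 */
        return 0;
}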

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
  2021-08-27 19:45     ` Ricardo Neri
@ 2021-08-28 13:25       ` Vincent Guittot
  0 siblings, 0 replies; 13+ messages in thread
From: Vincent Guittot @ 2021-08-28 13:25 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Ingo Molnar, Juri Lelli, Srikar Dronamraju, Nicholas Piggin,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Len Brown, Srinivas Pandruvada, Tim Chen, Aubrey Li,
	Ravi V. Shankar, Ricardo Neri, Rafael J. Wysocki, Quentin Perret,
	Joel Fernandes (Google),
	linuxppc-dev, linux-kernel, Aubrey Li,
	Daniel Bristot de Oliveira

On Fri, 27 Aug 2021 at 21:45, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> On Fri, Aug 27, 2021 at 12:13:42PM +0200, Vincent Guittot wrote:
> > On Tue, 10 Aug 2021 at 16:41, Ricardo Neri
> > <ricardo.neri-calderon@linux.intel.com> wrote:
> > > @@ -9540,6 +9629,12 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> > >                     nr_running == 1)
> > >                         continue;
> > >
> > > +               /* Make sure we only pull tasks from a CPU of lower priority */
> > > +               if ((env->sd->flags & SD_ASYM_PACKING) &&
> > > +                   sched_asym_prefer(i, env->dst_cpu) &&
> > > +                   nr_running == 1)
> > > +                       continue;
> >
> > This really looks similar to the test above for SD_ASYM_CPUCAPACITY.
> > More generally speaking SD_ASYM_PACKING and SD_ASYM_CPUCAPACITY share
> > a lot of common policy and I wonder if at some point we could not
> > merge their behavior in LB
>
> I would like to confirm that you are not expecting this merge to be part
> of this series, right?

Merging them will probably need more testing on both x86 and Arm, so I
suppose we could keep them separate for now.

Regards,
Vincent

>
> Thanks and BR,
> Ricardo

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-08-28 13:26 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-10 14:41 [PATCH v4 0/6] sched/fair: Fix load balancing of SMT siblings with ASYM_PACKING Ricardo Neri
2021-08-10 14:41 ` [PATCH v4 1/6] x86/sched: Decrease further the priorities of SMT siblings Ricardo Neri
2021-08-10 14:41 ` [PATCH v4 2/6] sched/topology: Introduce sched_group::flags Ricardo Neri
2021-08-10 14:41 ` [PATCH v4 3/6] sched/fair: Optimize checking for group_asym_packing Ricardo Neri
2021-08-10 14:41 ` [PATCH v4 4/6] sched/fair: Provide update_sg_lb_stats() with sched domain statistics Ricardo Neri
2021-08-10 14:41 ` [PATCH v4 5/6] sched/fair: Carve out logic to mark a group for asymmetric packing Ricardo Neri
2021-08-10 14:41 ` [PATCH v4 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance Ricardo Neri
2021-08-27 10:13   ` Vincent Guittot
2021-08-27 14:48     ` Peter Zijlstra
2021-08-27 15:17       ` Vincent Guittot
2021-08-27 19:43         ` Ricardo Neri
2021-08-27 19:45     ` Ricardo Neri
2021-08-28 13:25       ` Vincent Guittot
