* [PATCH v3 0/3] sched/fair: Capacity aware wakeup rework
@ 2020-01-26 20:09 Valentin Schneider
  2020-01-26 20:09 ` [PATCH v3 1/3] sched/fair: Add asymmetric CPU capacity wakeup scan Valentin Schneider
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Valentin Schneider @ 2020-01-26 20:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, qperret, adharmap

This series is about replacing the current wakeup logic for asymmetric CPU
capacity topologies, i.e. wake_cap().

Details are in patch 1; the TL;DR is that wake_cap() works fine for
"legacy" big.LITTLE systems (e.g. Juno), since the Last Level Cache (LLC)
domain of a CPU only spans CPUs of the same capacity, but is somewhat
broken for newer DynamIQ systems (e.g. Dragonboard 845C), since the LLC
domain of a CPU can span all CPUs in the system. Both example boards are
supported in mainline.

A bit of history
================

Due to the old Energy Model (EM) used until Android Common Kernel v4.14,
which grafted itself onto the sched domain hierarchy, mobile topologies
have been represented with "phantom domains"; IOW we'd make a DynamIQ
topology look like a big.LITTLE one:

actual hardware:

  +-------------------+
  |        L3         |
  +----+----+----+----+
  | L2 | L2 | L2 | L2 |
  +----+----+----+----+
  |CPU0|CPU1|CPU2|CPU3|
  +----+----+----+----+
     ^^^^^     ^^^^^
    LITTLEs    bigs

vanilla/mainline topology:

  MC [       ]
      0 1 2 3

phantom domains topology:

  DIE [        ]
  MC  [   ][   ]
       0 1  2 3

With the newer, mainline EM this is no longer required, and wake_cap() is
the last sticking point to getting rid of this legacy crud. More details
and examples are in patch 1.
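
For the curious, the phantom split boils down to feeding the scheduler a
two-level topology table whose lower level only spans same-capacity CPUs.
A purely illustrative sketch of the table shape (the helpers are the
mainline ones; how the per-capacity masks actually came about - e.g. via
the DT cpu-map description - is platform-specific and not shown here):

  static struct sched_domain_topology_level phantom_topology[] = {
	/* Per-"cluster" mask, i.e. [0 1] [2 3] in the example above */
	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
	/* All CPUs, i.e. [0 1 2 3] */
	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
	{ NULL, },
  };

  /* registered via set_sched_topology(phantom_topology); */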

Notes
=====

This removes the use of SD_BALANCE_WAKE for asymmetric CPU capacity
topologies (which are the last mainline users of that flag). As such, it
shouldn't be a surprise that this comes with significant improvements to
wake-intensive workloads: wakeups no longer go through the
select_task_rq_fair() slow-path.
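
To make that concrete, here is a condensed (and deliberately simplified,
non-literal) sketch of the current select_task_rq_fair() wakeup flow -
SD_LOAD_BALANCE checks, stats and the sync handling are omitted. Without
SD_BALANCE_WAKE in any domain's flags, 'sd' can no longer be set on
wakeups, so the find_idlest_cpu() slow-path is never taken:

  if (sd_flag & SD_BALANCE_WAKE)
	want_affine = !wake_wide(p) && !wake_cap(p, cpu, prev_cpu) &&
		      cpumask_test_cpu(cpu, p->cpus_ptr);

  for_each_domain(cpu, tmp) {
	if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
	    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
		if (cpu != prev_cpu)
			new_cpu = wake_affine(tmp, p, cpu, prev_cpu, sync);
		sd = NULL;		/* prefer wake_affine -> fast path */
		break;
	}

	if (tmp->flags & sd_flag)	/* i.e. SD_BALANCE_WAKE */
		sd = tmp;
	else if (!want_affine)
		break;
  }

  if (sd)		/* slow path */
	new_cpu = find_idlest_cpu(sd, p, cpu, prev_cpu, sd_flag);
  else			/* fast path */
	new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);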

Testing
=======

I've picked sysbench --test=threads to mimic Peter's testing mentioned in

  commit 182a85f8a119 ("sched: Disable wakeup balancing")

Sysbench results are the number of events handled in a fixed amount of
time, so higher is better. Hackbench results are the time taken to complete
the run, so lower is better.

Note: the 'X%' stats are the percentiles, so 50% is the 50th percentile.

Juno r0 ("legacy" big.LITTLE)
+++++++++++++++++++++++++++++

This is 2 bigs and 4 LITTLEs:

  +---------------+ +-------+
  |      L2       | |  L2   |
  +---+---+---+---+ +---+---+
  | L | L | L | L | | B | B |
  +---+---+---+---+ +---+---+


100 iterations of 'hackbench':

|      |   -PATCH |   +PATCH | DELTA (%) |
|------+----------+----------+-----------|
| mean | 0.631040 | 0.617040 |    -2.219 |
| std  | 0.025486 | 0.017176 |   -32.606 |
| min  | 0.582000 | 0.580000 |    -0.344 |
| 50%  | 0.628500 | 0.613500 |    -2.387 |
| 75%  | 0.645500 | 0.625000 |    -3.176 |
| 99%  | 0.697060 | 0.668020 |    -4.166 |
| max  | 0.703000 | 0.670000 |    -4.694 |

100 iterations of 'sysbench --max-time=5 --max-requests=-1 --test=threads --num-threads=6 run':

|      |       -PATCH |       +PATCH | DELTA (%) |
|------+--------------+--------------+-----------|
| mean | 10267.760000 | 15767.580000 |   +53.564 |
| std  |  3110.439815 |   151.040712 |   -95.144 |
| min  |  7186.000000 | 15206.000000 |  +111.606 |
| 50%  |  9019.500000 | 15778.500000 |   +74.938 |
| 75%  | 12711.000000 | 15873.500000 |   +24.880 |
| 99%  | 15749.290000 | 16042.100000 |    +1.859 |
| max  | 15877.000000 | 16052.000000 |    +1.102 |

Pixel3 (DynamIQ)
++++++++++++++++

Ideally I would have used a DB845C but had a few issues with mine, so I
went with a mainline-ish Pixel3 instead [1]. It's still the same SoC under
the hood (Snapdragon 845), which has 4 bigs and 4 LITTLEs:

  +-------------------------------+
  |               L3              |
  +---+---+---+---+---+---+---+---+
  | L2| L2| L2| L2| L2| L2| L2| L2|
  +---+---+---+---+---+---+---+---+
  | L | L | L | L | B | B | B | B |
  +---+---+---+---+---+---+---+---+

Default topology (single MC domain)
-----------------------------------

100 iterations of 'hackbench -l 200'

|      |   -PATCH |   +PATCH | DELTA (%) |
|------+----------+----------+-----------|
| mean | 1.131360 | 1.127490 |    -0.342 |
| std  | 0.116322 | 0.084154 |   -27.654 |
| min  | 0.935000 | 0.984000 |    +5.241 |
| 50%  | 1.099000 | 1.115500 |    +1.501 |
| 75%  | 1.211250 | 1.182500 |    -2.374 |
| 99%  | 1.401020 | 1.343070 |    -4.136 |
| max  | 1.502000 | 1.350000 |   -10.120 |


100 iterations of 'sysbench --max-time=5 --max-requests=-1 --test=threads --num-threads=8 run':

|      |      -PATCH |      +PATCH | DELTA (%) |
|------+-------------+-------------+-----------|
| mean | 7108.310000 | 8724.770000 |   +22.740 |
| std  |  199.431854 |  185.383188 |    -7.044 |
| min  | 6655.000000 | 8288.000000 |   +24.538 |
| 50%  | 7107.500000 | 8712.500000 |   +22.582 |
| 75%  | 7255.500000 | 8846.250000 |   +21.925 |
| 99%  | 7539.540000 | 9080.970000 |   +20.445 |
| max  | 7593.000000 | 9177.000000 |   +20.861 |

Phantom domains (MC + DIE)
--------------------------

This is mostly included for the sake of completeness.

100 iterations of 'sysbench --max-time=5 --max-requests=-1 --test=threads --num-threads=8 run':

|      |      -PATCH |       +PATCH | DELTA (%) |
|------+-------------+--------------+-----------|
| mean | 7317.940000 |  9888.270000 |   +35.124 |
| std  |  460.372682 |   225.019185 |   -51.122 |
| min  | 5888.000000 |  9277.000000 |   +57.558 |
| 50%  | 7271.000000 |  9879.500000 |   +35.875 |
| 75%  | 7497.500000 | 10023.750000 |   +33.695 |
| 99%  | 8464.390000 | 10318.320000 |   +21.903 |
| max  | 8602.000000 | 10350.000000 |   +20.321 |

Revisions
=========

v2 -> v3
--------
o Added missing sync_entity_load_avg() (Quentin)
o Added fallback CPU selection (maximize capacity)
o Added special case for CPU hogs: task_fits_capacity() will always return 'false'
  for tasks that are simply too big, due to the margin.

v1 -> v2
--------
o Removed unrelated select_idle_core() change

[1]: https://git.linaro.org/people/amit.pundir/linux.git/log/?h=blueline-mainline-tracking

Morten Rasmussen (3):
  sched/fair: Add asymmetric CPU capacity wakeup scan
  sched/topology: Remove SD_BALANCE_WAKE on asymmetric capacity systems
  sched/fair: Kill wake_cap()

 kernel/sched/fair.c     | 89 +++++++++++++++++++++++++++--------------
 kernel/sched/topology.c | 15 ++-----
 2 files changed, 63 insertions(+), 41 deletions(-)

--
2.24.0



* [PATCH v3 1/3] sched/fair: Add asymmetric CPU capacity wakeup scan
  2020-01-26 20:09 [PATCH v3 0/3] sched/fair: Capacity aware wakeup rework Valentin Schneider
@ 2020-01-26 20:09 ` Valentin Schneider
  2020-01-28  6:22   ` Pavan Kondeti
  2020-01-29 11:04   ` Dietmar Eggemann
  2020-01-26 20:09 ` [PATCH v3 2/3] sched/topology: Remove SD_BALANCE_WAKE on asymmetric capacity systems Valentin Schneider
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 17+ messages in thread
From: Valentin Schneider @ 2020-01-26 20:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, qperret, adharmap

From: Morten Rasmussen <morten.rasmussen@arm.com>

Issue
=====

On asymmetric CPU capacity topologies, we currently rely on wake_cap() to
drive select_task_rq_fair() towards either
- its slow-path (find_idlest_cpu()) if either the previous or
  current (waking) CPU has too little capacity for the waking task
- its fast-path (select_idle_sibling()) otherwise

Commit 3273163c6775 ("sched/fair: Let asymmetric CPU configurations balance
at wake-up") points out that this relies on the assumption that "[...]the
CPU capacities within an SD_SHARE_PKG_RESOURCES domain (sd_llc) are
homogeneous".

This assumption no longer holds on newer generations of big.LITTLE
systems (DynamIQ), which can accommodate CPUs of different compute capacity
within a single LLC domain. To hopefully paint a better picture, a regular
big.LITTLE topology would look like this:

  +---------+ +---------+
  |   L2    | |   L2    |
  +----+----+ +----+----+
  |CPU0|CPU1| |CPU2|CPU3|
  +----+----+ +----+----+
      ^^^         ^^^
    LITTLEs      bigs

which would result in the following scheduler topology:

  DIE [         ] <- sd_asym_cpucapacity
  MC  [   ] [   ] <- sd_llc
       0 1   2 3

Conversely, a DynamIQ topology could look like:

  +-------------------+
  |        L3         |
  +----+----+----+----+
  | L2 | L2 | L2 | L2 |
  +----+----+----+----+
  |CPU0|CPU1|CPU2|CPU3|
  +----+----+----+----+
     ^^^^^     ^^^^^
    LITTLEs    bigs

which would result in the following scheduler topology:

  MC [       ] <- sd_llc, sd_asym_cpucapacity
      0 1 2 3

What this means is that, on DynamIQ systems, we could pass the wake_cap()
test (IOW presume the waking task fits on the CPU capacities of some LLC
domain), and thus go through select_idle_sibling().
This function operates on an LLC domain, which here spans both bigs and
LITTLEs, so it could very well pick a CPU with too little capacity for the
task, despite there being fitting idle CPUs - it very much depends on the
CPU iteration order, on which we have absolutely no guarantees
capacity-wise.

Implementation
==============

Introduce yet another select_idle_sibling() helper function that takes CPU
capacity into account. The policy is to pick the first idle CPU which is
big enough for the task (task_util * margin < cpu_capacity). If no
idle CPU is big enough, we pick the idle one with the highest capacity.

Unlike other select_idle_sibling() helpers, this one operates on the
sd_asym_cpucapacity sched_domain pointer, which is guaranteed to span all
known CPU capacities in the system. As such, this will work for both
"legacy" big.LITTLE (LITTLEs & bigs split at MC, joined at DIE) and for
newer DynamIQ systems (e.g. LITTLEs and bigs in the same MC domain).
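
For reference, the fit test amounts to the usual ~20% capacity margin.
Roughly, paraphrasing the mainline task_fits_capacity() / fits_capacity()
pair (uclamp left out for simplicity; 1280/1024 is the current mainline
margin):

  static inline bool task_fits(struct task_struct *p, unsigned long capacity)
  {
	/* fits iff util_est stays below ~80% of the CPU's capacity */
	return task_util_est(p) * 1280 < capacity * 1024;
  }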

Co-authored-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/fair.c | 59 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fe4e0d7753756..aebc2e0e6c8a1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5894,6 +5894,60 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	return cpu;
 }
 
+static inline int check_cpu_capacity(struct rq *rq, struct sched_domain *sd);
+
+/*
+ * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
+ * the task fits. If no CPU is big enough, but there are idle ones, try to
+ * maximize capacity.
+ */
+static int select_idle_capacity(struct task_struct *p, int target)
+{
+	unsigned long best_cap = 0;
+	struct sched_domain *sd;
+	struct cpumask *cpus;
+	int best_cpu = -1;
+	struct rq *rq;
+	int cpu;
+
+	if (!static_branch_unlikely(&sched_asym_cpucapacity))
+		return -1;
+
+	sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
+	if (!sd)
+		return -1;
+
+	sync_entity_load_avg(&p->se);
+
+	cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
+	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+
+	for_each_cpu_wrap(cpu, cpus, target) {
+		rq = cpu_rq(cpu);
+
+		if (!available_idle_cpu(cpu))
+			continue;
+		if (task_fits_capacity(p, rq->cpu_capacity))
+			return cpu;
+
+		/*
+		 * It would be silly to keep looping when we've found a CPU
+		 * of highest available capacity. Just check that it's not been
+		 * too pressured lately.
+		 */
+		if (rq->cpu_capacity_orig == READ_ONCE(rq->rd->max_cpu_capacity) &&
+		    !check_cpu_capacity(rq, sd))
+			return cpu;
+
+		if (rq->cpu_capacity > best_cap) {
+			best_cap = rq->cpu_capacity;
+			best_cpu = cpu;
+		}
+	}
+
+	return best_cpu;
+}
+
 /*
  * Try and locate an idle core/thread in the LLC cache domain.
  */
@@ -5902,6 +5956,11 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	struct sched_domain *sd;
 	int i, recent_used_cpu;
 
+	/* For asymmetric capacities, try to be smart about the placement */
+	i = select_idle_capacity(p, target);
+	if ((unsigned)i < nr_cpumask_bits)
+		return i;
+
 	if (available_idle_cpu(target) || sched_idle_cpu(target))
 		return target;
 
-- 
2.24.0



* [PATCH v3 2/3] sched/topology: Remove SD_BALANCE_WAKE on asymmetric capacity systems
  2020-01-26 20:09 [PATCH v3 0/3] sched/fair: Capacity aware wakeup rework Valentin Schneider
  2020-01-26 20:09 ` [PATCH v3 1/3] sched/fair: Add asymmetric CPU capacity wakeup scan Valentin Schneider
@ 2020-01-26 20:09 ` Valentin Schneider
  2020-01-29 16:27   ` Dietmar Eggemann
  2020-01-26 20:09 ` [PATCH v3 3/3] sched/fair: Kill wake_cap() Valentin Schneider
  2020-01-28  9:24 ` [PATCH v3 0/3] sched/fair: Capacity aware wakeup rework Dietmar Eggemann
  3 siblings, 1 reply; 17+ messages in thread
From: Valentin Schneider @ 2020-01-26 20:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, qperret, adharmap

From: Morten Rasmussen <morten.rasmussen@arm.com>

SD_BALANCE_WAKE was previously added to lower sched_domain levels on
asymmetric CPU capacity systems by commit 9ee1cda5ee25 ("sched/core: Enable
SD_BALANCE_WAKE for asymmetric capacity systems") to enable the use of
find_idlest_cpu() and friends to find an appropriate CPU for tasks.

That responsibility has now been shifted to select_idle_sibling() and
friends, and hence the flag can be removed. Note that this causes
asymmetric CPU capacity systems to no longer enter the slow wakeup path
(find_idlest_cpu()) on wakeups - only on execs and forks (which is aligned
with all other mainline topologies).

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
[Changelog tweaks]
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/topology.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index dfb64c08a407a..00911884b7e7a 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1374,18 +1374,9 @@ sd_init(struct sched_domain_topology_level *tl,
 	 * Convert topological properties into behaviour.
 	 */
 
-	if (sd->flags & SD_ASYM_CPUCAPACITY) {
-		struct sched_domain *t = sd;
-
-		/*
-		 * Don't attempt to spread across CPUs of different capacities.
-		 */
-		if (sd->child)
-			sd->child->flags &= ~SD_PREFER_SIBLING;
-
-		for_each_lower_domain(t)
-			t->flags |= SD_BALANCE_WAKE;
-	}
+	/* Don't attempt to spread across CPUs of different capacities. */
+	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
+		sd->child->flags &= ~SD_PREFER_SIBLING;
 
 	if (sd->flags & SD_SHARE_CPUCAPACITY) {
 		sd->imbalance_pct = 110;
-- 
2.24.0



* [PATCH v3 3/3] sched/fair: Kill wake_cap()
  2020-01-26 20:09 [PATCH v3 0/3] sched/fair: Capacity aware wakeup rework Valentin Schneider
  2020-01-26 20:09 ` [PATCH v3 1/3] sched/fair: Add asymmetric CPU capacity wakeup scan Valentin Schneider
  2020-01-26 20:09 ` [PATCH v3 2/3] sched/topology: Remove SD_BALANCE_WAKE on asymmetric capacity systems Valentin Schneider
@ 2020-01-26 20:09 ` Valentin Schneider
  2020-01-28  9:24 ` [PATCH v3 0/3] sched/fair: Capacity aware wakeup rework Dietmar Eggemann
  3 siblings, 0 replies; 17+ messages in thread
From: Valentin Schneider @ 2020-01-26 20:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, qperret, adharmap

From: Morten Rasmussen <morten.rasmussen@arm.com>

Capacity-awareness in the wake-up path previously involved disabling
wake_affine in certain scenarios. We have just made select_idle_sibling()
capacity-aware, so this isn't needed anymore.

Remove wake_cap() entirely.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
[Changelog tweaks]
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/fair.c | 30 +-----------------------------
 1 file changed, 1 insertion(+), 29 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index aebc2e0e6c8a1..0df85c32d69cd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6146,33 +6146,6 @@ static unsigned long cpu_util_without(int cpu, struct task_struct *p)
 	return min_t(unsigned long, util, capacity_orig_of(cpu));
 }
 
-/*
- * Disable WAKE_AFFINE in the case where task @p doesn't fit in the
- * capacity of either the waking CPU @cpu or the previous CPU @prev_cpu.
- *
- * In that case WAKE_AFFINE doesn't make sense and we'll let
- * BALANCE_WAKE sort things out.
- */
-static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
-{
-	long min_cap, max_cap;
-
-	if (!static_branch_unlikely(&sched_asym_cpucapacity))
-		return 0;
-
-	min_cap = min(capacity_orig_of(prev_cpu), capacity_orig_of(cpu));
-	max_cap = cpu_rq(cpu)->rd->max_cpu_capacity;
-
-	/* Minimum capacity is close to max, no need to abort wake_affine */
-	if (max_cap - min_cap < max_cap >> 3)
-		return 0;
-
-	/* Bring task utilization in sync with prev_cpu */
-	sync_entity_load_avg(&p->se);
-
-	return !task_fits_capacity(p, min_cap);
-}
-
 /*
  * Predicts what cpu_util(@cpu) would return if @p was migrated (and enqueued)
  * to @dst_cpu.
@@ -6437,8 +6410,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 			new_cpu = prev_cpu;
 		}
 
-		want_affine = !wake_wide(p) && !wake_cap(p, cpu, prev_cpu) &&
-			      cpumask_test_cpu(cpu, p->cpus_ptr);
+		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, p->cpus_ptr);
 	}
 
 	rcu_read_lock();
-- 
2.24.0



* Re: [PATCH v3 1/3] sched/fair: Add asymmetric CPU capacity wakeup scan
  2020-01-26 20:09 ` [PATCH v3 1/3] sched/fair: Add asymmetric CPU capacity wakeup scan Valentin Schneider
@ 2020-01-28  6:22   ` Pavan Kondeti
  2020-01-28 11:30     ` Valentin Schneider
  2020-01-29 11:04   ` Dietmar Eggemann
  1 sibling, 1 reply; 17+ messages in thread
From: Pavan Kondeti @ 2020-01-28  6:22 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, qperret, adharmap

Hi Valentin,

On Sun, Jan 26, 2020 at 08:09:32PM +0000, Valentin Schneider wrote:
>  
> +static inline int check_cpu_capacity(struct rq *rq, struct sched_domain *sd);
> +
> +/*
> + * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
> + * the task fits. If no CPU is big enough, but there are idle ones, try to
> + * maximize capacity.
> + */
> +static int select_idle_capacity(struct task_struct *p, int target)
> +{
> +	unsigned long best_cap = 0;
> +	struct sched_domain *sd;
> +	struct cpumask *cpus;
> +	int best_cpu = -1;
> +	struct rq *rq;
> +	int cpu;
> +
> +	if (!static_branch_unlikely(&sched_asym_cpucapacity))
> +		return -1;
> +
> +	sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
> +	if (!sd)
> +		return -1;
> +
> +	sync_entity_load_avg(&p->se);
> +
> +	cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> +	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> +
> +	for_each_cpu_wrap(cpu, cpus, target) {
> +		rq = cpu_rq(cpu);
> +
> +		if (!available_idle_cpu(cpu))
> +			continue;
> +		if (task_fits_capacity(p, rq->cpu_capacity))
> +			return cpu;

I have a couple of questions.

(1) Any particular reason for not checking sched_idle_cpu() as a backup
for the case where all eligible CPUs are busy? select_idle_cpu() does
that.

(2) Assuming all CPUs are busy, we return -1 from here and end up
calling select_idle_cpu(). The traversal in select_idle_cpu() may be a
waste in cases where sd_llc == sd_asym_cpucapacity. For example SDM845.
Should we worry about this?

> +
> +		/*
> +		 * It would be silly to keep looping when we've found a CPU
> +		 * of highest available capacity. Just check that it's not been
> +		 * too pressured lately.
> +		 */
> +		if (rq->cpu_capacity_orig == READ_ONCE(rq->rd->max_cpu_capacity) &&
> +		    !check_cpu_capacity(rq, sd))
> +			return cpu;
> +
> +		if (rq->cpu_capacity > best_cap) {
> +			best_cap = rq->cpu_capacity;
> +			best_cpu = cpu;
> +		}
> +	}
> +
> +	return best_cpu;
> +}
> +
>  /*
>   * Try and locate an idle core/thread in the LLC cache domain.
>   */
> @@ -5902,6 +5956,11 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>  	struct sched_domain *sd;
>  	int i, recent_used_cpu;
>  
> +	/* For asymmetric capacities, try to be smart about the placement */
> +	i = select_idle_capacity(p, target);
> +	if ((unsigned)i < nr_cpumask_bits)
> +		return i;
> +
>  	if (available_idle_cpu(target) || sched_idle_cpu(target))
>  		return target;
>  
> -- 
> 2.24.0
> 

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


* Re: [PATCH v3 0/3] sched/fair: Capacity aware wakeup rework
  2020-01-26 20:09 [PATCH v3 0/3] sched/fair: Capacity aware wakeup rework Valentin Schneider
                   ` (2 preceding siblings ...)
  2020-01-26 20:09 ` [PATCH v3 3/3] sched/fair: Kill wake_cap() Valentin Schneider
@ 2020-01-28  9:24 ` Dietmar Eggemann
  2020-01-28 11:34   ` Valentin Schneider
  3 siblings, 1 reply; 17+ messages in thread
From: Dietmar Eggemann @ 2020-01-28  9:24 UTC (permalink / raw)
  To: Valentin Schneider, linux-kernel
  Cc: mingo, peterz, vincent.guittot, morten.rasmussen, qperret, adharmap

On 26/01/2020 21:09, Valentin Schneider wrote:

[...]

> v2 -> v3
> --------
> o Added missing sync_entity_load_avg() (Quentin)
> o Added fallback CPU selection (maximize capacity)
> o Added special case for CPU hogs: task_fits_capacity() will always return 'false'
>   for tasks that are simply too big, due to the margin.

v3 fixes the Geekbench multicore regression I saw on Pixel4 (Android 10,
Android Common Kernel v4.14 based, Snapdragon 855) when running v1.

I changed the Pixel4 kernel a bit (PELT instead of WALT, mainline
select_idle_sibling() instead of the cstate-aware one, mainline
task_fits_capacity()) for base, v1 & v3.

Since it's not a mainline kernel the results have to be taken with a pinch
of salt, but they probably show that the new condition:

if (rq->cpu_capacity_orig == READ_ONCE(rq->rd->max_cpu_capacity) && ...
    return cpu;

has an effect when dealing with tasks with util_est values > 0.8 * 1024.
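
(For reference, that 0.8 cut-off falls straight out of the fit margin.
Assuming the mainline 1280/1024 task_fits_capacity() check against the
biggest CPU:

  util_est * 1280 < 1024 * 1024
  => util_est < 1024 * 1024 / 1280 = 819.2 ~= 0.8 * 1024

so such tasks never pass the fit check, even against the biggest CPU.)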


* Re: [PATCH v3 1/3] sched/fair: Add asymmetric CPU capacity wakeup scan
  2020-01-28  6:22   ` Pavan Kondeti
@ 2020-01-28 11:30     ` Valentin Schneider
  2020-01-29  3:52       ` Pavan Kondeti
  2020-01-29 10:38       ` Dietmar Eggemann
  0 siblings, 2 replies; 17+ messages in thread
From: Valentin Schneider @ 2020-01-28 11:30 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: linux-kernel, mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, qperret, adharmap

Hi Pavan,

On 28/01/2020 06:22, Pavan Kondeti wrote:
> Hi Valentin,
> 
> On Sun, Jan 26, 2020 at 08:09:32PM +0000, Valentin Schneider wrote:
>>  
>> +static inline int check_cpu_capacity(struct rq *rq, struct sched_domain *sd);
>> +
>> +/*
>> + * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
>> + * the task fits. If no CPU is big enough, but there are idle ones, try to
>> + * maximize capacity.
>> + */
>> +static int select_idle_capacity(struct task_struct *p, int target)
>> +{
>> +	unsigned long best_cap = 0;
>> +	struct sched_domain *sd;
>> +	struct cpumask *cpus;
>> +	int best_cpu = -1;
>> +	struct rq *rq;
>> +	int cpu;
>> +
>> +	if (!static_branch_unlikely(&sched_asym_cpucapacity))
>> +		return -1;
>> +
>> +	sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
>> +	if (!sd)
>> +		return -1;
>> +
>> +	sync_entity_load_avg(&p->se);
>> +
>> +	cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
>> +	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>> +
>> +	for_each_cpu_wrap(cpu, cpus, target) {
>> +		rq = cpu_rq(cpu);
>> +
>> +		if (!available_idle_cpu(cpu))
>> +			continue;
>> +		if (task_fits_capacity(p, rq->cpu_capacity))
>> +			return cpu;
> 
> I have couple of questions.
> 
> (1) Any particular reason for not checking sched_idle_cpu() as a backup
> for the case where all eligible CPUs are busy? select_idle_cpu() does
> that.
> 

No particular reason other than we didn't consider it, I think. I don't see
any harm in folding it in; I'll do that for v4. I am curious, however: are
you folks making use of SCHED_IDLE? AFAIA Android isn't making use of it
yet, though Viresh paved the way for that to happen.

> (2) Assuming all CPUs are busy, we return -1 from here and end up
> calling select_idle_cpu(). The traversal in select_idle_cpu() may be
> waste in cases where sd_llc == sd_asym_cpucapacity . For example SDM845.
> Should we worry about this?
> 

Before v3, since we didn't have the fallback CPU selection within
select_idle_capacity(), we would need the fall-through to select_idle_cpu()
(we could've skipped an idle CPU just because its capacity wasn't high
enough).

That's not the case anymore, so indeed we may be able to bail out of
select_idle_sibling() right after select_idle_capacity() (or after the
prev / recent_used_cpu checks). Our only requirement here is that sd_llc
remains a subset of sd_asym_cpucapacity.

So far for Arm topologies we can have:
- sd_llc < sd_asym_cpucapacity (e.g. legacy big.LITTLE like Juno)
- sd_llc == sd_asym_cpucapacity (e.g. DynamIQ like SDM845)

I'm slightly worried about sd_llc > sd_asym_cpucapacity ever being an
actual thing - I don't believe it makes much sense, but that's not stopping
anyone.

AFAIA we (Arm) *currently* don't allow that with big.LITTLE or DynamIQ, nor
do I think it can happen with the default scheduler topology where MC is
the last possible level we can have as sd_llc.

So it *might* be a safe assumption - and I can still add a SCHED_WARN_ON().
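
Something along these lines, say, wherever the per-CPU domain pointers get
updated (placement and local variable names purely illustrative):

  /* sd_llc must stay a subset of sd_asym_cpucapacity */
  SCHED_WARN_ON(llc && asym &&
		!cpumask_subset(sched_domain_span(llc),
				sched_domain_span(asym)));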


* Re: [PATCH v3 0/3] sched/fair: Capacity aware wakeup rework
  2020-01-28  9:24 ` [PATCH v3 0/3] sched/fair: Capacity aware wakeup rework Dietmar Eggemann
@ 2020-01-28 11:34   ` Valentin Schneider
  0 siblings, 0 replies; 17+ messages in thread
From: Valentin Schneider @ 2020-01-28 11:34 UTC (permalink / raw)
  To: Dietmar Eggemann, linux-kernel
  Cc: mingo, peterz, vincent.guittot, morten.rasmussen, qperret, adharmap

On 28/01/2020 09:24, Dietmar Eggemann wrote:
> On 26/01/2020 21:09, Valentin Schneider wrote:
> 
> [...]
> 
>> v2 -> v3
>> --------
>> o Added missing sync_entity_load_avg() (Quentin)
>> o Added fallback CPU selection (maximize capacity)
>> o Added special case for CPU hogs: task_fits_capacity() will always return 'false'
>>   for tasks that are simply too big, due to the margin.
> 
> v3 fixes the Geekbench multicore regression I saw on Pixel4 (Android 10,
> Android Common Kernel v4.14 based, Snapdragon 855) running v1.
> 
> I changed the Pixel4 kernel a bit (PELT instead WALT, mainline
> select_idle_sibling() instead the csctate aware one), mainline
> task_fits_capacity()) for base, v1 & v3.
> 
> Since it's not mainline kernel the results have to be taken with a pinch
> of salt but they probably show that the new condition:
> 
> if (rq->cpu_capacity_orig == READ_ONCE(rq->rd->max_cpu_capacity) && ...
>     return cpu;
> 
> has an effect when dealing with tasks with util_est values > 0.8 * 1024.
> 

Thanks for going through that hassle! I'm quite pleased with the results
of that change - if you also look at e.g. the stddev for sysbench on Juno,
runs are much more consistent now.


* Re: [PATCH v3 1/3] sched/fair: Add asymmetric CPU capacity wakeup scan
  2020-01-28 11:30     ` Valentin Schneider
@ 2020-01-29  3:52       ` Pavan Kondeti
  2020-01-29  8:16         ` Qais Yousef
  2020-01-29  9:25         ` Valentin Schneider
  2020-01-29 10:38       ` Dietmar Eggemann
  1 sibling, 2 replies; 17+ messages in thread
From: Pavan Kondeti @ 2020-01-29  3:52 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, qperret, adharmap

On Tue, Jan 28, 2020 at 11:30:26AM +0000, Valentin Schneider wrote:
> Hi Pavan,
> 
> On 28/01/2020 06:22, Pavan Kondeti wrote:
> > Hi Valentin,
> > 
> > On Sun, Jan 26, 2020 at 08:09:32PM +0000, Valentin Schneider wrote:
> >>  
> >> +static inline int check_cpu_capacity(struct rq *rq, struct sched_domain *sd);
> >> +
> >> +/*
> >> + * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
> >> + * the task fits. If no CPU is big enough, but there are idle ones, try to
> >> + * maximize capacity.
> >> + */
> >> +static int select_idle_capacity(struct task_struct *p, int target)
> >> +{
> >> +	unsigned long best_cap = 0;
> >> +	struct sched_domain *sd;
> >> +	struct cpumask *cpus;
> >> +	int best_cpu = -1;
> >> +	struct rq *rq;
> >> +	int cpu;
> >> +
> >> +	if (!static_branch_unlikely(&sched_asym_cpucapacity))
> >> +		return -1;
> >> +
> >> +	sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
> >> +	if (!sd)
> >> +		return -1;
> >> +
> >> +	sync_entity_load_avg(&p->se);
> >> +
> >> +	cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> >> +	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> >> +
> >> +	for_each_cpu_wrap(cpu, cpus, target) {
> >> +		rq = cpu_rq(cpu);
> >> +
> >> +		if (!available_idle_cpu(cpu))
> >> +			continue;
> >> +		if (task_fits_capacity(p, rq->cpu_capacity))
> >> +			return cpu;
> > 
> > I have couple of questions.
> > 
> > (1) Any particular reason for not checking sched_idle_cpu() as a backup
> > for the case where all eligible CPUs are busy? select_idle_cpu() does
> > that.
> > 
> 
> No particular reason other than we didn't consider it, I think. I don't see
> any harm in folding it in, I'll do that for v4. I am curious however; are
> you folks making use of SCHED_IDLE? AFAIA Android isn't making use of it
> yet, though Viresh paved the way for that to happen.
> 

We are not using SCHED_IDLE in product setups. I am told Android may use it
for background tasks in the future. I am not completely sure though. I asked
because select_idle_cpu() is using it.

> > (2) Assuming all CPUs are busy, we return -1 from here and end up
> > calling select_idle_cpu(). The traversal in select_idle_cpu() may be
> > waste in cases where sd_llc == sd_asym_cpucapacity . For example SDM845.
> > Should we worry about this?
> > 
> 
> Before v3, since we didn't have the fallback CPU selection within
> select_idle_capacity(), we would need the fall-through to select_idle_cpu()
> (we could've skipped an idle CPU just because its capacity wasn't high
> enough).
> 
> That's not the case anymore, so indeed we may be able to bail out of
> select_idle_sibling() right after select_idle_capacity() (or after the
> prev / recent_used_cpu checks). Our only requirement here is that sd_llc
> remains a subset of sd_asym_cpucapacity.
> 
> So far for Arm topologies we can have:
> - sd_llc < sd_asym_cpucapacity (e.g. legacy big.LITTLE like Juno)
> - sd_llc == sd_asym_cpucapacity (e.g. DynamIQ like SDM845)
> 
> I'm slightly worried about sd_llc > sd_asym_cpucapacity ever being an
> actual thing - I don't believe it makes much sense, but that's not stopping
> anyone.
> 
> AFAIA we (Arm) *currently* don't allow that with big.LITTLE or DynamIQ, nor
> do I think it can happen with the default scheduler topology where MC is
> the last possible level we can have as sd_llc.
> 
> So it *might* be a safe assumption - and I can still add a SCHED_WARN_ON().

Agreed.

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


* Re: [PATCH v3 1/3] sched/fair: Add asymmetric CPU capacity wakeup scan
  2020-01-29  3:52       ` Pavan Kondeti
@ 2020-01-29  8:16         ` Qais Yousef
  2020-01-29  9:25         ` Valentin Schneider
  1 sibling, 0 replies; 17+ messages in thread
From: Qais Yousef @ 2020-01-29  8:16 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: Valentin Schneider, linux-kernel, mingo, peterz, vincent.guittot,
	dietmar.eggemann, morten.rasmussen, qperret, adharmap

On 01/29/20 09:22, Pavan Kondeti wrote:
> On Tue, Jan 28, 2020 at 11:30:26AM +0000, Valentin Schneider wrote:
> > Hi Pavan,
> > 
> > On 28/01/2020 06:22, Pavan Kondeti wrote:
> > > Hi Valentin,
> > > 
> > > On Sun, Jan 26, 2020 at 08:09:32PM +0000, Valentin Schneider wrote:
> > >>  
> > >> +static inline int check_cpu_capacity(struct rq *rq, struct sched_domain *sd);
> > >> +
> > >> +/*
> > >> + * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
> > >> + * the task fits. If no CPU is big enough, but there are idle ones, try to
> > >> + * maximize capacity.
> > >> + */
> > >> +static int select_idle_capacity(struct task_struct *p, int target)
> > >> +{
> > >> +	unsigned long best_cap = 0;
> > >> +	struct sched_domain *sd;
> > >> +	struct cpumask *cpus;
> > >> +	int best_cpu = -1;
> > >> +	struct rq *rq;
> > >> +	int cpu;
> > >> +
> > >> +	if (!static_branch_unlikely(&sched_asym_cpucapacity))
> > >> +		return -1;
> > >> +
> > >> +	sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
> > >> +	if (!sd)
> > >> +		return -1;
> > >> +
> > >> +	sync_entity_load_avg(&p->se);
> > >> +
> > >> +	cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> > >> +	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > >> +
> > >> +	for_each_cpu_wrap(cpu, cpus, target) {
> > >> +		rq = cpu_rq(cpu);
> > >> +
> > >> +		if (!available_idle_cpu(cpu))
> > >> +			continue;
> > >> +		if (task_fits_capacity(p, rq->cpu_capacity))
> > >> +			return cpu;
> > > 
> > > I have couple of questions.
> > > 
> > > (1) Any particular reason for not checking sched_idle_cpu() as a backup
> > > for the case where all eligible CPUs are busy? select_idle_cpu() does
> > > that.
> > > 
> > 
> > No particular reason other than we didn't consider it, I think. I don't see
> > any harm in folding it in, I'll do that for v4. I am curious however; are
> > you folks making use of SCHED_IDLE? AFAIA Android isn't making use of it
> > yet, though Viresh paved the way for that to happen.
> > 
> 
> We are not using SCHED_IDLE in product setups. I am told Android may use it
> for background tasks in future. I am not completely sure though. I asked it
> because select_idle_cpu() is using it.

I believe Viresh's intention when he pushed for the support was to allow
this use case.

FWIW, I had a patch locally to implement this, but I'm waiting on the
latency_nice attribute to go in [1] before I attempt to upstream it.

The design goals I had in mind:

	1. If there are multiple energy efficient CPUs we can select, select
	   the idle one.
	2. If latency nice is set and the most energy efficient CPU is not
	   idle, then fall back to the most energy efficient idle CPU.

The Android use case needs EAS path support before this can be useful there.

[1] https://lore.kernel.org/lkml/20190919144259.vpuv7hvtqon4qgrv@e107158-lin.cambridge.arm.com/

--
Qais Yousef


* Re: [PATCH v3 1/3] sched/fair: Add asymmetric CPU capacity wakeup scan
  2020-01-29  3:52       ` Pavan Kondeti
  2020-01-29  8:16         ` Qais Yousef
@ 2020-01-29  9:25         ` Valentin Schneider
  1 sibling, 0 replies; 17+ messages in thread
From: Valentin Schneider @ 2020-01-29  9:25 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: linux-kernel, mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, qperret, adharmap



On 29/01/2020 03:52, Pavan Kondeti wrote:
>> No particular reason other than we didn't consider it, I think. I don't see
>> any harm in folding it in, I'll do that for v4. I am curious however; are
>> you folks making use of SCHED_IDLE? AFAIA Android isn't making use of it
>> yet, though Viresh paved the way for that to happen.
>>
> 
> We are not using SCHED_IDLE in product setups. I am told Android may use it
> for background tasks in future. I am not completely sure though. I asked it
> because select_idle_cpu() is using it.
> 

Fair enough! Thanks.


* Re: [PATCH v3 1/3] sched/fair: Add asymmetric CPU capacity wakeup scan
  2020-01-28 11:30     ` Valentin Schneider
  2020-01-29  3:52       ` Pavan Kondeti
@ 2020-01-29 10:38       ` Dietmar Eggemann
  2020-01-29 12:10         ` Valentin Schneider
  1 sibling, 1 reply; 17+ messages in thread
From: Dietmar Eggemann @ 2020-01-29 10:38 UTC (permalink / raw)
  To: Valentin Schneider, Pavan Kondeti
  Cc: linux-kernel, mingo, peterz, vincent.guittot, morten.rasmussen,
	qperret, adharmap

On 28/01/2020 12:30, Valentin Schneider wrote:
> Hi Pavan,
> 
> On 28/01/2020 06:22, Pavan Kondeti wrote:
>> Hi Valentin,
>>
>> On Sun, Jan 26, 2020 at 08:09:32PM +0000, Valentin Schneider wrote:

[...]

>>> +
>>> +	if (!static_branch_unlikely(&sched_asym_cpucapacity))
>>> +		return -1;

We do need this one to bail out quickly on non-asym-CPU-capacity systems. (1)

>>> +	sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
>>> +	if (!sd)
>>> +		return -1;

And I assume we can't return target here because of exclusive cpusets,
which can form symmetric CPU capacity islands on an asymmetric CPU
capacity system? (2)

>>> +	sync_entity_load_avg(&p->se);
>>> +
>>> +	cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
>>> +	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>>> +
>>> +	for_each_cpu_wrap(cpu, cpus, target) {
>>> +		rq = cpu_rq(cpu);
>>> +
>>> +		if (!available_idle_cpu(cpu))
>>> +			continue;
>>> +		if (task_fits_capacity(p, rq->cpu_capacity))
>>> +			return cpu;
>>
>> I have couple of questions.

[...]

>> (2) Assuming all CPUs are busy, we return -1 from here and end up
>> calling select_idle_cpu(). The traversal in select_idle_cpu() may be
>> waste in cases where sd_llc == sd_asym_cpucapacity . For example SDM845.
>> Should we worry about this?
>>
> 
> Before v3, since we didn't have the fallback CPU selection within
> select_idle_capacity(), we would need the fall-through to select_idle_cpu()
> (we could've skipped an idle CPU just because its capacity wasn't high
> enough).
> 
> That's not the case anymore, so indeed we may be able to bail out of
> select_idle_sibling() right after select_idle_capacity() (or after the
> prev / recent_used_cpu checks). Our only requirement here is that sd_llc
> remains a subset of sd_asym_cpucapacity.

How do you distinguish '-1' in (1), (2) and 'best_cpu = -1' (3)?

In (1) and (2) you want to check if target is idle (or sched_idle) but
in (3) you probably only want to check 'recent_used_cpu'?


* Re: [PATCH v3 1/3] sched/fair: Add asymmetric CPU capacity wakeup scan
  2020-01-26 20:09 ` [PATCH v3 1/3] sched/fair: Add asymmetric CPU capacity wakeup scan Valentin Schneider
  2020-01-28  6:22   ` Pavan Kondeti
@ 2020-01-29 11:04   ` Dietmar Eggemann
  2020-01-29 12:10     ` Valentin Schneider
  1 sibling, 1 reply; 17+ messages in thread
From: Dietmar Eggemann @ 2020-01-29 11:04 UTC (permalink / raw)
  To: Valentin Schneider, linux-kernel
  Cc: mingo, peterz, vincent.guittot, morten.rasmussen, qperret, adharmap

On 26/01/2020 21:09, Valentin Schneider wrote:

[...]

> +static int select_idle_capacity(struct task_struct *p, int target)
> +{
> +	unsigned long best_cap = 0;
> +	struct sched_domain *sd;
> +	struct cpumask *cpus;
> +	int best_cpu = -1;
> +	struct rq *rq;
> +	int cpu;
> +
> +	if (!static_branch_unlikely(&sched_asym_cpucapacity))
> +		return -1;
> +
> +	sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
> +	if (!sd)
> +		return -1;
> +
> +	sync_entity_load_avg(&p->se);
> +
> +	cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> +	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> +
> +	for_each_cpu_wrap(cpu, cpus, target) {
> +		rq = cpu_rq(cpu);
> +
> +		if (!available_idle_cpu(cpu))
> +			continue;
> +		if (task_fits_capacity(p, rq->cpu_capacity))
> +			return cpu;
> +
> +		/*
> +		 * It would be silly to keep looping when we've found a CPU
> +		 * of highest available capacity. Just check that it's not been
> +		 * too pressured lately.
> +		 */
> +		if (rq->cpu_capacity_orig == READ_ONCE(rq->rd->max_cpu_capacity) &&

There is a similar check in check_misfit_status(). Common helper function?
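
Something like this maybe (name made up):

  static inline bool rq_has_max_capacity(struct rq *rq)
  {
	return rq->cpu_capacity_orig == READ_ONCE(rq->rd->max_cpu_capacity);
  }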

> +		    !check_cpu_capacity(rq, sd))
> +			return cpu;

I wonder how this special treatment of a big CPU behaves in (LITTLE,
medium, big) system like Pixel4 (Snapdragon 855):

 flame:/ $ cat /sys/devices/system/cpu/cpu*/cpu_capacity

261
261
261
261
871
871
871
1024

Or on legacy systems where the sd->imbalance_pct is 25% instead of 17%?


* Re: [PATCH v3 1/3] sched/fair: Add asymmetric CPU capacity wakeup scan
  2020-01-29 10:38       ` Dietmar Eggemann
@ 2020-01-29 12:10         ` Valentin Schneider
  0 siblings, 0 replies; 17+ messages in thread
From: Valentin Schneider @ 2020-01-29 12:10 UTC (permalink / raw)
  To: Dietmar Eggemann, Pavan Kondeti
  Cc: linux-kernel, mingo, peterz, vincent.guittot, morten.rasmussen,
	qperret, adharmap

On 29/01/2020 10:38, Dietmar Eggemann wrote:
> On 28/01/2020 12:30, Valentin Schneider wrote:
>> Hi Pavan,
>>
>> On 28/01/2020 06:22, Pavan Kondeti wrote:
>>> Hi Valentin,
>>>
>>> On Sun, Jan 26, 2020 at 08:09:32PM +0000, Valentin Schneider wrote:
> 
> [...]
> 
>>>> +
>>>> +	if (!static_branch_unlikely(&sched_asym_cpucapacity))
>>>> +		return -1;
> 
> We do need this one to bail out quickly on non CPU asym systems. (1)
> 
>>>> +	sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
>>>> +	if (!sd)
>>>> +		return -1;
> 
> And I assume we can't return target here because of exclusive cpusets
> which can form symmetric CPU capacities islands on a CPU asymmetric
> system? (2)
> 

Precisely, the "canonical" check for asymmetry is static key + SD pointer.
In terms of functionality we could "just" check sd_asym_cpucapacity (it can't
be set without having the static key set, though the reverse isn't true),
but we *want* to use the static key here to make SMP people happy.

>> That's not the case anymore, so indeed we may be able to bail out of
>> select_idle_sibling() right after select_idle_capacity() (or after the
>> prev / recent_used_cpu checks). Our only requirement here is that sd_llc
>> remains a subset of sd_asym_cpucapacity.
> 
> How do you distinguish '-1' in (1), (2) and 'best_cpu = -1' (3)?
> 
> In (1) and (2) you want to check if target is idle (or sched_idle) but
> in (3) you probably only want to check 'recent_used_cpu'?
> 

Right, if we come back from select_idle_capacity() having gone through the
CPU loop but still returned -1, it means no CPU in sd_asym_cpucapacity was
idle. This includes 'target' of course, so we shouldn't need to check it
again.

In those cases we might still not have evaluated 'prev' or 'recent_used_cpu'.
It's somewhat of a last-ditch attempt to find an idle CPU, and they'll only
help when they aren't in sd_asym_cpucapacity. I'm actually curious as to how
much the 'recent_used_cpu' thing helps; I've never paid it much attention.

So yeah my options are (for asym CPU capacity topologies):
a) just bail out after select_idle_capacity()
b) try to squeeze a bit more out of select_idle_sibling() by also doing
  the 'prev' & 'recent_used_cpu' checks.

a) is quite easy to implement; I can just inline the static key and sd checks
  in select_idle_sibling() and return unconditionally once I'm past those
  checks.

b) is more intrusive and I don't have a clear picture yet as to how much it
  will really bring to the table.

I'll think about it and try to play around with both of these.
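
For the record, a) would look something along these lines (just a sketch -
the static key / sd checks duplicated inside select_idle_capacity() would
presumably get folded into the caller, and returning 'target' when the scan
comes up empty is one arbitrary choice among others):

	/* For asymmetric capacities, don't fall through to the LLC scan */
	if (static_branch_unlikely(&sched_asym_cpucapacity) &&
	    rcu_dereference(per_cpu(sd_asym_cpucapacity, target))) {
		i = select_idle_capacity(p, target);
		return ((unsigned)i < nr_cpumask_bits) ? i : target;
	}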


* Re: [PATCH v3 1/3] sched/fair: Add asymmetric CPU capacity wakeup scan
  2020-01-29 11:04   ` Dietmar Eggemann
@ 2020-01-29 12:10     ` Valentin Schneider
  0 siblings, 0 replies; 17+ messages in thread
From: Valentin Schneider @ 2020-01-29 12:10 UTC (permalink / raw)
  To: Dietmar Eggemann, linux-kernel
  Cc: mingo, peterz, vincent.guittot, morten.rasmussen, qperret, adharmap

On 29/01/2020 11:04, Dietmar Eggemann wrote:
>> +		/*
>> +		 * It would be silly to keep looping when we've found a CPU
>> +		 * of highest available capacity. Just check that it's not been
>> +		 * too pressured lately.
>> +		 */
>> +		if (rq->cpu_capacity_orig == READ_ONCE(rq->rd->max_cpu_capacity) &&
> 
> There is a similar check in check_misfit_status(). Common helper function?

Mright, and check_misfit_status() is missing the READ_ONCE(). That said...

> 
>> +		    !check_cpu_capacity(rq, sd))
>> +			return cpu;
> 
> I wonder how this special treatment of a big CPU behaves in (LITTLE,
> medium, big) system like Pixel4 (Snapdragon 855):
> 
>  flame:/ $ cat /sys/devices/system/cpu/cpu*/cpu_capacity
> 
> 261
> 261
> 261
> 261
> 871
> 871
> 871
> 1024
> 
> Or on legacy systems where the sd->imbalance_pct is 25% instead of 17%?
> 

... This is a very valid point. When I wrote this bit I had the good old
big.LITTLE split in mind, where there are big differences between the
capacity values. As you point out, that's not so true for DynamIQ systems
sporting more than 2 capacity values. The issue here is that we could bail
out early, picking a (slightly) pressured big (1024 capacity_orig) when
there was a non-pressured idle medium (871 capacity_orig).

It's borderline in this example - the threshold for a big to be seen as
pressured by check_cpu_capacity(), assuming a flat topology with just an MC
domain, is ~ 875. If we have e.g. mediums at 900 and bigs at 1024, this
logic is broken.
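
(Spelling the numbers out: check_cpu_capacity(rq, sd) deems a CPU pressured
when

  rq->cpu_capacity * sd->imbalance_pct < rq->cpu_capacity_orig * 100

so for the 1024 big under a single MC domain (imbalance_pct = 117) that is
cpu_capacity < 1024 * 100 / 117 ~= 875.)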

So this is pretty much a case of my trying to be too clever for my own good;
I'll remove that "fastpath" in v4. Thanks for pointing it out!


* Re: [PATCH v3 2/3] sched/topology: Remove SD_BALANCE_WAKE on asymmetric capacity systems
  2020-01-26 20:09 ` [PATCH v3 2/3] sched/topology: Remove SD_BALANCE_WAKE on asymmetric capacity systems Valentin Schneider
@ 2020-01-29 16:27   ` Dietmar Eggemann
  2020-01-30 11:06     ` Valentin Schneider
  0 siblings, 1 reply; 17+ messages in thread
From: Dietmar Eggemann @ 2020-01-29 16:27 UTC (permalink / raw)
  To: Valentin Schneider, linux-kernel
  Cc: mingo, peterz, vincent.guittot, morten.rasmussen, qperret, adharmap

On 26/01/2020 21:09, Valentin Schneider wrote:
> From: Morten Rasmussen <morten.rasmussen@arm.com>

[...]

> @@ -1374,18 +1374,9 @@ sd_init(struct sched_domain_topology_level *tl,
>  	 * Convert topological properties into behaviour.
>  	 */
>  
> -	if (sd->flags & SD_ASYM_CPUCAPACITY) {
> -		struct sched_domain *t = sd;
> -
> -		/*
> -		 * Don't attempt to spread across CPUs of different capacities.
> -		 */
> -		if (sd->child)
> -			sd->child->flags &= ~SD_PREFER_SIBLING;
> -
> -		for_each_lower_domain(t)

It seems that with this, the last use of for_each_lower_domain() goes away.

[...]


* Re: [PATCH v3 2/3] sched/topology: Remove SD_BALANCE_WAKE on asymmetric capacity systems
  2020-01-29 16:27   ` Dietmar Eggemann
@ 2020-01-30 11:06     ` Valentin Schneider
  0 siblings, 0 replies; 17+ messages in thread
From: Valentin Schneider @ 2020-01-30 11:06 UTC (permalink / raw)
  To: Dietmar Eggemann, linux-kernel
  Cc: mingo, peterz, vincent.guittot, morten.rasmussen, qperret, adharmap



On 29/01/2020 16:27, Dietmar Eggemann wrote:
> On 26/01/2020 21:09, Valentin Schneider wrote:
>> From: Morten Rasmussen <morten.rasmussen@arm.com>
> 
> [...]
> 
>> @@ -1374,18 +1374,9 @@ sd_init(struct sched_domain_topology_level *tl,
>>  	 * Convert topological properties into behaviour.
>>  	 */
>>  
>> -	if (sd->flags & SD_ASYM_CPUCAPACITY) {
>> -		struct sched_domain *t = sd;
>> -
>> -		/*
>> -		 * Don't attempt to spread across CPUs of different capacities.
>> -		 */
>> -		if (sd->child)
>> -			sd->child->flags &= ~SD_PREFER_SIBLING;
>> -
>> -		for_each_lower_domain(t)
> 
> It seems that with this goes the last use of for_each_lower_domain().
> 
> [...]
> 

Indeed, good catch. I see that was used at some point before the
select_idle_sibling() rework:

  10e2f1acd010 ("sched/core: Rewrite and improve select_idle_siblings()")

and our asymmetry policy saved it. I'll remove it in v4.
