* [PATCH v4 0/2] sched/topology: Asymmetric topologies fixes
@ 2019-10-23 15:37 Valentin Schneider
  2019-10-23 15:37 ` [PATCH v4 1/2] sched/topology: Don't try to build empty sched domains Valentin Schneider
  2019-10-23 15:37 ` [PATCH v4 2/2] sched/topology: Allow sched_asym_cpucapacity to be disabled Valentin Schneider
  0 siblings, 2 replies; 10+ messages in thread
From: Valentin Schneider @ 2019-10-23 15:37 UTC (permalink / raw)
  To: linux-kernel, cgroups
  Cc: lizefan, tj, hannes, mingo, peterz, vincent.guittot,
	Dietmar.Eggemann, morten.rasmussen, qperret

Hi,

I got a nice splat while testing out the toggling of
sched_asym_cpucapacity, so this is a cpuset fix plus a topology patch.

Details are in the logs.

v2 changes:
  - Use static_branch_{inc,dec} rather than enable/disable

v3 changes:
  - New patch: add fix for empty cpumap in sched domain rebuild
  - Move static_branch_dec outside of RCU read-side section (Quentin)

v4 changes:
  - Patch 1/2: Directly tweak the cpuset array (Dietmar)
  - Patch 2/2: Add an example to the changelog (Dietmar)

Cheers,
Valentin

Valentin Schneider (2):
  sched/topology: Don't try to build empty sched domains
  sched/topology: Allow sched_asym_cpucapacity to be disabled

 kernel/cgroup/cpuset.c  |  3 ++-
 kernel/sched/topology.c | 11 +++++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

--
2.22.0



* [PATCH v4 1/2] sched/topology: Don't try to build empty sched domains
  2019-10-23 15:37 [PATCH v4 0/2] sched/topology: Asymmetric topologies fixes Valentin Schneider
@ 2019-10-23 15:37 ` Valentin Schneider
  2019-10-24 16:19   ` Dietmar Eggemann
                     ` (2 more replies)
  2019-10-23 15:37 ` [PATCH v4 2/2] sched/topology: Allow sched_asym_cpucapacity to be disabled Valentin Schneider
  1 sibling, 3 replies; 10+ messages in thread
From: Valentin Schneider @ 2019-10-23 15:37 UTC (permalink / raw)
  To: linux-kernel, cgroups
  Cc: lizefan, tj, hannes, mingo, peterz, vincent.guittot,
	Dietmar.Eggemann, morten.rasmussen, qperret, stable

Turns out hotplugging CPUs that are in exclusive cpusets can lead to the
cpuset code feeding empty cpumasks to the sched domain rebuild machinery.
This leads to the following splat:

[   30.618174] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[   30.623697] Modules linked in:
[   30.626731] CPU: 0 PID: 235 Comm: kworker/5:2 Not tainted 5.4.0-rc1-00005-g8d495477d62e #23
[   30.635003] Hardware name: ARM Juno development board (r0) (DT)
[   30.640877] Workqueue: events cpuset_hotplug_workfn
[   30.645713] pstate: 60000005 (nZCv daif -PAN -UAO)
[   30.650464] pc : build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
[   30.655126] lr : build_sched_domains (kernel/sched/topology.c:1966)
[...]
[   30.742047] Call trace:
[   30.744474] build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
[   30.748793] partition_sched_domains_locked (kernel/sched/topology.c:2250)
[   30.753971] rebuild_sched_domains_locked (./include/linux/bitmap.h:370 ./include/linux/cpumask.h:538 kernel/cgroup/cpuset.c:955 kernel/cgroup/cpuset.c:978 kernel/cgroup/cpuset.c:1019)
[   30.758977] rebuild_sched_domains (kernel/cgroup/cpuset.c:1032)
[   30.763209] cpuset_hotplug_workfn (kernel/cgroup/cpuset.c:3205 (discriminator 2))
[   30.767613] process_one_work (./arch/arm64/include/asm/jump_label.h:21 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:114 kernel/workqueue.c:2274)
[   30.771586] worker_thread (./include/linux/compiler.h:199 ./include/linux/list.h:268 kernel/workqueue.c:2416)
[   30.775217] kthread (kernel/kthread.c:255)
[   30.778418] ret_from_fork (arch/arm64/kernel/entry.S:1167)
[   30.781965] Code: f860dae2 912802d6 aa1603e1 12800000 (f8616853)

The faulty line in question is

  cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));

and we're not checking the return value against nr_cpu_ids (we shouldn't
have to!), which leads to the above.
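
For illustration, here is a minimal userspace sketch of the failure mode
(simplified, hypothetical helpers - not the kernel code): a
cpumask_first()-style scan of an empty mask returns the mask size, and
using that result as an array index goes out of bounds.

  #include <stdio.h>

  #define NR_CPUS 6

  /* like cpumask_first(): first set bit, or NR_CPUS if the mask is empty */
  static unsigned int mask_first(unsigned long mask)
  {
          unsigned int cpu;

          for (cpu = 0; cpu < NR_CPUS; cpu++)
                  if (mask & (1UL << cpu))
                          return cpu;
          return NR_CPUS;
  }

  int main(void)
  {
          /* stand-in for per-CPU capacity values on a 2+4 big.LITTLE */
          unsigned long capacity[NR_CPUS] = { 446, 1024, 1024, 446, 446, 446 };
          unsigned long empty_map = 0;
          unsigned int cpu = mask_first(empty_map);

          /* the kernel effectively does capacity[cpu] with no bounds check */
          if (cpu >= NR_CPUS)
                  printf("empty map: index %u is out of bounds\n", cpu);
          else
                  printf("cap = %lu\n", capacity[cpu]);
          return 0;
  }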

Prevent generate_sched_domains() from returning empty cpumasks, and add
some assertion in build_sched_domains() to scream bloody murder if it
happens again.

The above splat was obtained on my Juno r0 with:

  cgcreate -g cpuset:asym
  cgset -r cpuset.cpus=0-3 asym
  cgset -r cpuset.mems=0 asym
  cgset -r cpuset.cpu_exclusive=1 asym

  cgcreate -g cpuset:smp
  cgset -r cpuset.cpus=4-5 smp
  cgset -r cpuset.mems=0 smp
  cgset -r cpuset.cpu_exclusive=1 smp

  cgset -r cpuset.sched_load_balance=0 .

  echo 0 > /sys/devices/system/cpu/cpu4/online
  echo 0 > /sys/devices/system/cpu/cpu5/online

Cc: <stable@vger.kernel.org>
Fixes: 05484e098448 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection")
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/cgroup/cpuset.c  | 3 ++-
 kernel/sched/topology.c | 5 ++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index c52bc91f882b..c87ee6412b36 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -798,7 +798,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
 		    cpumask_subset(cp->cpus_allowed, top_cpuset.effective_cpus))
 			continue;
 
-		if (is_sched_load_balance(cp))
+		if (is_sched_load_balance(cp) &&
+		    !cpumask_empty(cp->effective_cpus))
 			csa[csn++] = cp;
 
 		/* skip @cp's subtree if not a partition root */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 3623ffe85d18..2e7af755e17a 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1945,7 +1945,7 @@ static struct sched_domain_topology_level
 static int
 build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *attr)
 {
-	enum s_alloc alloc_state;
+	enum s_alloc alloc_state = sa_none;
 	struct sched_domain *sd;
 	struct s_data d;
 	struct rq *rq = NULL;
@@ -1953,6 +1953,9 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	struct sched_domain_topology_level *tl_asym;
 	bool has_asym = false;
 
+	if (WARN_ON(cpumask_empty(cpu_map)))
+		goto error;
+
 	alloc_state = __visit_domain_allocation_hell(&d, cpu_map);
 	if (alloc_state != sa_rootdomain)
 		goto error;
-- 
2.22.0



* [PATCH v4 2/2] sched/topology: Allow sched_asym_cpucapacity to be disabled
  2019-10-23 15:37 [PATCH v4 0/2] sched/topology: Asymmetric topologies fixes Valentin Schneider
  2019-10-23 15:37 ` [PATCH v4 1/2] sched/topology: Don't try to build empty sched domains Valentin Schneider
@ 2019-10-23 15:37 ` Valentin Schneider
  2019-10-29  9:52   ` [tip: sched/urgent] " tip-bot2 for Valentin Schneider
  1 sibling, 1 reply; 10+ messages in thread
From: Valentin Schneider @ 2019-10-23 15:37 UTC (permalink / raw)
  To: linux-kernel, cgroups
  Cc: lizefan, tj, hannes, mingo, peterz, vincent.guittot,
	Dietmar.Eggemann, morten.rasmussen, qperret, stable,
	Dietmar Eggemann

While the static key is correctly initialized as being disabled, it will
remain forever enabled once turned on. This means that if we start with an
asymmetric system and hotplug out enough CPUs to end up with an SMP system,
the static key will remain set - which is obviously wrong. We should detect
this and turn off things like misfit migration and capacity-aware wakeups.

As Quentin pointed out, having separate root domains makes this slightly
trickier. We could have exclusive cpusets that create an SMP island - IOW,
the domains within this root domain will not see any asymmetry. This means
we can't just disable the key on domain destruction, we need to count how
many asymmetric root domains we have.
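
As a sketch of the intended semantics (a toy userspace model, not the
actual jump-label machinery), the key behaves like a refcount: it reads
as enabled while the count is non-zero.

  #include <stdbool.h>
  #include <stdio.h>

  static int asym_count;  /* models the static key's enable count */

  static void key_inc(void) { asym_count++; }  /* static_branch_inc() */
  static void key_dec(void) { asym_count--; }  /* static_branch_dec() */
  static bool key_on(void)  { return asym_count > 0; }

  int main(void)
  {
          key_inc();  /* asym0's root domain built, asymmetric */
          key_inc();  /* asym1's root domain built, asymmetric */

          key_dec();  /* asym1 loses its big CPU, rebuilt as SMP */
          printf("key on: %d\n", key_on());  /* 1: asym0 still needs it */

          key_dec();  /* asym0 gone (or SMP) too */
          printf("key on: %d\n", key_on());  /* 0: no asymmetry left */
          return 0;
  }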


Consider the following example using Juno r0 which is 2+4 big.LITTLE, where
two identical cpusets are created: they both span both big and LITTLE CPUs:

    asym0    asym1
  [       ][       ]
   L  L  B  L  L  B

  cgcreate -g cpuset:asym0
  cgset -r cpuset.cpus=0,1,3 asym0
  cgset -r cpuset.mems=0 asym0
  cgset -r cpuset.cpu_exclusive=1 asym0

  cgcreate -g cpuset:asym1
  cgset -r cpuset.cpus=2,4,5 asym1
  cgset -r cpuset.mems=0 asym1
  cgset -r cpuset.cpu_exclusive=1 asym1

  cgset -r cpuset.sched_load_balance=0 .

(the CPU numbering may look odd because, on the Juno, LITTLEs are CPUs 0,3-5
and bigs are CPUs 1-2)

If we make one of those SMP (IOW remove its asymmetry) by e.g. hotplugging
out its big core, we would end up with an SMP cpuset and an asymmetric
cpuset - the static key must remain set, because we still have one
asymmetric root domain.

With the above example, this could be done with:

  echo 0 > /sys/devices/system/cpu/cpu2/online

Which would result in:

    asym0   asym1
  [       ][    ]
   L  L  B  L  L

When both SMP and asymmetric cpusets are present, all CPUs will observe
sched_asym_cpucapacity being set (it is system-wide), but not all CPUs
observe asymmetry in their sched domain hierarchy:

per_cpu(sd_asym_cpucapacity, <any CPU in asym0>) == <some SD at DIE level>
per_cpu(sd_asym_cpucapacity, <any CPU in asym1>) == NULL
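
A toy model of that combination (userspace sketch with illustrative names,
not kernel code): consumers check the system-wide key first, then their own
CPU's pointer.

  #include <stdbool.h>
  #include <stdio.h>

  #define NR_CPUS 6

  static bool key_on = true;  /* sched_asym_cpucapacity: count > 0 */

  /* per_cpu(sd_asym_cpucapacity, cpu) after CPU2 goes offline:
   * asym0 = {0,1,3} still sees asymmetry, asym1 = {4,5} is now SMP */
  static const char *sd_asym[NR_CPUS] = {
          "DIE", "DIE", NULL, "DIE", NULL, NULL,
  };

  static bool cpu_sees_asymmetry(int cpu)
  {
          return key_on && sd_asym[cpu] != NULL;
  }

  int main(void)
  {
          for (int cpu = 0; cpu < NR_CPUS; cpu++)
                  printf("CPU%d: %s\n", cpu,
                         cpu_sees_asymmetry(cpu) ? "asym sd" : "none");
          return 0;
  }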


Change the simple key enablement to an increment, and decrement the key
counter when destroying domains that cover asymmetric CPUs.

Cc: <stable@vger.kernel.org>
Fixes: df054e8445a4 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations")
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/topology.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 2e7af755e17a..6ec1e595b1d4 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2026,7 +2026,7 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	rcu_read_unlock();
 
 	if (has_asym)
-		static_branch_enable_cpuslocked(&sched_asym_cpucapacity);
+		static_branch_inc_cpuslocked(&sched_asym_cpucapacity);
 
 	if (rq && sched_debug_enabled) {
 		pr_info("root domain span: %*pbl (max cpu_capacity = %lu)\n",
@@ -2121,8 +2121,12 @@ int sched_init_domains(const struct cpumask *cpu_map)
  */
 static void detach_destroy_domains(const struct cpumask *cpu_map)
 {
+	unsigned int cpu = cpumask_any(cpu_map);
 	int i;
 
+	if (rcu_access_pointer(per_cpu(sd_asym_cpucapacity, cpu)))
+		static_branch_dec_cpuslocked(&sched_asym_cpucapacity);
+
 	rcu_read_lock();
 	for_each_cpu(i, cpu_map)
 		cpu_attach_domain(NULL, &def_root_domain, i);
--
2.22.0



* Re: [PATCH v4 1/2] sched/topology: Don't try to build empty sched domains
  2019-10-23 15:37 ` [PATCH v4 1/2] sched/topology: Don't try to build empty sched domains Valentin Schneider
@ 2019-10-24 16:19   ` Dietmar Eggemann
  2019-10-24 16:45     ` Valentin Schneider
  2019-10-29  9:52   ` [tip: sched/urgent] " tip-bot2 for Valentin Schneider
  2019-10-31 16:23   ` [PATCH v4 1/2] " Michal Koutný
  2 siblings, 1 reply; 10+ messages in thread
From: Dietmar Eggemann @ 2019-10-24 16:19 UTC (permalink / raw)
  To: Valentin Schneider, linux-kernel, cgroups
  Cc: lizefan, tj, hannes, mingo, peterz, vincent.guittot,
	morten.rasmussen, qperret, stable

On 23/10/2019 17:37, Valentin Schneider wrote:
> Turns out hotplugging CPUs that are in exclusive cpusets can lead to the
> cpuset code feeding empty cpumasks to the sched domain rebuild machinery.
> This leads to the following splat:

[...]

> The faulty line in question is
> 
>   cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));
> 
> and we're not checking the return value against nr_cpu_ids (we shouldn't
> have to!), which leads to the above.
> 
> Prevent generate_sched_domains() from returning empty cpumasks, and add
> some assertion in build_sched_domains() to scream bloody murder if it
> happens again.
> 
> The above splat was obtained on my Juno r0 with:
> 
>   cgcreate -g cpuset:asym
>   cgset -r cpuset.cpus=0-3 asym
>   cgset -r cpuset.mems=0 asym
>   cgset -r cpuset.cpu_exclusive=1 asym
> 
>   cgcreate -g cpuset:smp
>   cgset -r cpuset.cpus=4-5 smp
>   cgset -r cpuset.mems=0 smp
>   cgset -r cpuset.cpu_exclusive=1 smp
> 
>   cgset -r cpuset.sched_load_balance=0 .
> 
>   echo 0 > /sys/devices/system/cpu/cpu4/online
>   echo 0 > /sys/devices/system/cpu/cpu5/online
> 
> Cc: <stable@vger.kernel.org>
> Fixes: 05484e098448 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection")

Sorry for being picky but IMHO you should also mention that it fixes

f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting
information")

Tested it on a hikey620 (8 CPUs, SMP) with v5.4-rc4 and a local fix for
asym_cpu_capacity_level(): two exclusive cpusets [0-3] and [4-7],
hotplugging out [0-3] and then hotplugging [0] back in.

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 5a174ae6ecf3..8f83e8e3ea9a 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2203,8 +2203,19 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
 	for (i = 0; i < ndoms_cur; i++) {
 		for (j = 0; j < n && !new_topology; j++) {
 			if (cpumask_equal(doms_cur[i], doms_new[j]) &&
-			    dattrs_equal(dattr_cur, i, dattr_new, j))
+			    dattrs_equal(dattr_cur, i, dattr_new, j)) {
+				struct root_domain *rd;
+
+				/*
+				 * This domain won't be destroyed and as such
+				 * its dl_bw->total_bw needs to be cleared.  It
+				 * will be recomputed in function
+				 * update_tasks_root_domain().
+				 */
+				rd = cpu_rq(cpumask_any(doms_cur[i]))->rd;

We have an issue here if doms_cur[i] is empty.

+				dl_clear_root_domain(rd);
 				goto match1;


There is yet another similar issue behind the first one
(asym_cpu_capacity_level()).

 342 static bool build_perf_domains(const struct cpumask *cpu_map)
 343 {
 344     int i, nr_pd = 0, nr_cs = 0, nr_cpus = cpumask_weight(cpu_map);
 345     struct perf_domain *pd = NULL, *tmp;
 346     int cpu = cpumask_first(cpu_map);          <--- !!!
 347     struct root_domain *rd = cpu_rq(cpu)->rd;  <--- !!!
 348     struct cpufreq_policy *policy;
 349     struct cpufreq_governor *gov;
 ...
 406     tmp = rd->pd;                              <--- !!!

Caught when running a hikey620 (8 CPUs, SMP) with v5.4-rc4, a local fix
for asym_cpu_capacity_level(), and CONFIG_ENERGY_MODEL=y.

There might be other places in build_sched_domains() suffering from the
same issue, so I assume it's wise not to call it with an empty cpu_map
at all, and to warn if that happens.

[...]


* Re: [PATCH v4 1/2] sched/topology: Don't try to build empty sched domains
  2019-10-24 16:19   ` Dietmar Eggemann
@ 2019-10-24 16:45     ` Valentin Schneider
  0 siblings, 0 replies; 10+ messages in thread
From: Valentin Schneider @ 2019-10-24 16:45 UTC (permalink / raw)
  To: Dietmar Eggemann, linux-kernel, cgroups
  Cc: lizefan, tj, hannes, mingo, peterz, vincent.guittot,
	morten.rasmussen, qperret, stable

On 24/10/2019 17:19, Dietmar Eggemann wrote:
> Sorry for being picky but IMHO you should also mention that it fixes
> 
> f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting
> information")
> 

I can append the following to the changelog, although I'd like some
feedback from the cgroup folks before doing a respin:

"""
Note that commit

  f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting information")

introduced a similar issue. Since doms_new is assigned to doms_cur without
any filtering, we can end up with an empty cpumask in the doms_cur array.

The next time we go through a rebuild, this will break on:

  rd = cpu_rq(cpumask_any(doms_cur[i]))->rd;

If there wasn't enough already, this is yet another argument for *not*
handing over empty cpumasks to the sched domain rebuild.
"""

I tagged the commit that introduces the static key with Fixes: because it
was introduced earlier - I don't think it would make sense to have two
"Fixes:" lines? In any case, it'll now be listed in the changelog.



* [tip: sched/urgent] sched/topology: Don't try to build empty sched domains
  2019-10-23 15:37 ` [PATCH v4 1/2] sched/topology: Don't try to build empty sched domains Valentin Schneider
  2019-10-24 16:19   ` Dietmar Eggemann
@ 2019-10-29  9:52   ` tip-bot2 for Valentin Schneider
  2019-10-31 16:23   ` [PATCH v4 1/2] " Michal Koutný
  2 siblings, 0 replies; 10+ messages in thread
From: tip-bot2 for Valentin Schneider @ 2019-10-29  9:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Valentin Schneider, Peter Zijlstra (Intel),
	Dietmar.Eggemann, Linus Torvalds, Thomas Gleixner, hannes,
	lizefan, morten.rasmussen, qperret, tj, vincent.guittot,
	Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     cd1cb3350561d2bf544ddfef76fbf0b1c9c7178f
Gitweb:        https://git.kernel.org/tip/cd1cb3350561d2bf544ddfef76fbf0b1c9c7178f
Author:        Valentin Schneider <valentin.schneider@arm.com>
AuthorDate:    Wed, 23 Oct 2019 16:37:44 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 29 Oct 2019 09:58:45 +01:00

sched/topology: Don't try to build empty sched domains

Turns out hotplugging CPUs that are in exclusive cpusets can lead to the
cpuset code feeding empty cpumasks to the sched domain rebuild machinery.

This leads to the following splat:

    Internal error: Oops: 96000004 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 0 PID: 235 Comm: kworker/5:2 Not tainted 5.4.0-rc1-00005-g8d495477d62e #23
    Hardware name: ARM Juno development board (r0) (DT)
    Workqueue: events cpuset_hotplug_workfn
    pstate: 60000005 (nZCv daif -PAN -UAO)
    pc : build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
    lr : build_sched_domains (kernel/sched/topology.c:1966)
    Call trace:
    build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
    partition_sched_domains_locked (kernel/sched/topology.c:2250)
    rebuild_sched_domains_locked (./include/linux/bitmap.h:370 ./include/linux/cpumask.h:538 kernel/cgroup/cpuset.c:955 kernel/cgroup/cpuset.c:978 kernel/cgroup/cpuset.c:1019)
    rebuild_sched_domains (kernel/cgroup/cpuset.c:1032)
    cpuset_hotplug_workfn (kernel/cgroup/cpuset.c:3205 (discriminator 2))
    process_one_work (./arch/arm64/include/asm/jump_label.h:21 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:114 kernel/workqueue.c:2274)
    worker_thread (./include/linux/compiler.h:199 ./include/linux/list.h:268 kernel/workqueue.c:2416)
    kthread (kernel/kthread.c:255)
    ret_from_fork (arch/arm64/kernel/entry.S:1167)
    Code: f860dae2 912802d6 aa1603e1 12800000 (f8616853)

The faulty line in question is:

  cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));

and we're not checking the return value against nr_cpu_ids (we shouldn't
have to!), which leads to the above.

Prevent generate_sched_domains() from returning empty cpumasks, and add
some assertion in build_sched_domains() to scream bloody murder if it
happens again.

The above splat was obtained on my Juno r0 with the following reproducer:

  $ cgcreate -g cpuset:asym
  $ cgset -r cpuset.cpus=0-3 asym
  $ cgset -r cpuset.mems=0 asym
  $ cgset -r cpuset.cpu_exclusive=1 asym

  $ cgcreate -g cpuset:smp
  $ cgset -r cpuset.cpus=4-5 smp
  $ cgset -r cpuset.mems=0 smp
  $ cgset -r cpuset.cpu_exclusive=1 smp

  $ cgset -r cpuset.sched_load_balance=0 .

  $ echo 0 > /sys/devices/system/cpu/cpu4/online
  $ echo 0 > /sys/devices/system/cpu/cpu5/online

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar.Eggemann@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hannes@cmpxchg.org
Cc: lizefan@huawei.com
Cc: morten.rasmussen@arm.com
Cc: qperret@google.com
Cc: tj@kernel.org
Cc: vincent.guittot@linaro.org
Fixes: 05484e098448 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection")
Link: https://lkml.kernel.org/r/20191023153745.19515-2-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/cgroup/cpuset.c  | 3 ++-
 kernel/sched/topology.c | 5 ++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index c52bc91..c87ee64 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -798,7 +798,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
 		    cpumask_subset(cp->cpus_allowed, top_cpuset.effective_cpus))
 			continue;
 
-		if (is_sched_load_balance(cp))
+		if (is_sched_load_balance(cp) &&
+		    !cpumask_empty(cp->effective_cpus))
 			csa[csn++] = cp;
 
 		/* skip @cp's subtree if not a partition root */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b5667a2..9318acf 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1948,7 +1948,7 @@ next_level:
 static int
 build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *attr)
 {
-	enum s_alloc alloc_state;
+	enum s_alloc alloc_state = sa_none;
 	struct sched_domain *sd;
 	struct s_data d;
 	struct rq *rq = NULL;
@@ -1956,6 +1956,9 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	struct sched_domain_topology_level *tl_asym;
 	bool has_asym = false;
 
+	if (WARN_ON(cpumask_empty(cpu_map)))
+		goto error;
+
 	alloc_state = __visit_domain_allocation_hell(&d, cpu_map);
 	if (alloc_state != sa_rootdomain)
 		goto error;


* [tip: sched/urgent] sched/topology: Allow sched_asym_cpucapacity to be disabled
  2019-10-23 15:37 ` [PATCH v4 2/2] sched/topology: Allow sched_asym_cpucapacity to be disabled Valentin Schneider
@ 2019-10-29  9:52   ` tip-bot2 for Valentin Schneider
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot2 for Valentin Schneider @ 2019-10-29  9:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Valentin Schneider, Peter Zijlstra (Intel),
	Dietmar Eggemann, Dietmar.Eggemann, Linus Torvalds,
	Thomas Gleixner, hannes, lizefan, morten.rasmussen, qperret, tj,
	vincent.guittot, Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     e284df705cf1eeedb5ec3a66ed82d17a64659150
Gitweb:        https://git.kernel.org/tip/e284df705cf1eeedb5ec3a66ed82d17a64659150
Author:        Valentin Schneider <valentin.schneider@arm.com>
AuthorDate:    Wed, 23 Oct 2019 16:37:45 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 29 Oct 2019 09:58:46 +01:00

sched/topology: Allow sched_asym_cpucapacity to be disabled

While the static key is correctly initialized as being disabled, it will
remain forever enabled once turned on. This means that if we start with an
asymmetric system and hotplug out enough CPUs to end up with an SMP system,
the static key will remain set - which is obviously wrong. We should detect
this and turn off things like misfit migration and capacity-aware wakeups.

As Quentin pointed out, having separate root domains makes this slightly
trickier. We could have exclusive cpusets that create an SMP island - IOW,
the domains within this root domain will not see any asymmetry. This means
we can't just disable the key on domain destruction, we need to count how
many asymmetric root domains we have.

Consider the following example using Juno r0 which is 2+4 big.LITTLE, where
two identical cpusets are created: they both span both big and LITTLE CPUs:

    asym0    asym1
  [       ][       ]
   L  L  B  L  L  B

  $ cgcreate -g cpuset:asym0
  $ cgset -r cpuset.cpus=0,1,3 asym0
  $ cgset -r cpuset.mems=0 asym0
  $ cgset -r cpuset.cpu_exclusive=1 asym0

  $ cgcreate -g cpuset:asym1
  $ cgset -r cpuset.cpus=2,4,5 asym1
  $ cgset -r cpuset.mems=0 asym1
  $ cgset -r cpuset.cpu_exclusive=1 asym1

  $ cgset -r cpuset.sched_load_balance=0 .

(the CPU numbering may look odd because, on the Juno, LITTLEs are CPUs 0,3-5
and bigs are CPUs 1-2)

If we make one of those SMP (IOW remove its asymmetry) by e.g. hotplugging
out its big core, we would end up with an SMP cpuset and an asymmetric
cpuset - the static key must remain set, because we still have one
asymmetric root domain.

With the above example, this could be done with:

  $ echo 0 > /sys/devices/system/cpu/cpu2/online

Which would result in:

    asym0   asym1
  [       ][    ]
   L  L  B  L  L

When both SMP and asymmetric cpusets are present, all CPUs will observe
sched_asym_cpucapacity being set (it is system-wide), but not all CPUs
observe asymmetry in their sched domain hierarchy:

  per_cpu(sd_asym_cpucapacity, <any CPU in asym0>) == <some SD at DIE level>
  per_cpu(sd_asym_cpucapacity, <any CPU in asym1>) == NULL

Change the simple key enablement to an increment, and decrement the key
counter when destroying domains that cover asymmetric CPUs.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Dietmar.Eggemann@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hannes@cmpxchg.org
Cc: lizefan@huawei.com
Cc: morten.rasmussen@arm.com
Cc: qperret@google.com
Cc: tj@kernel.org
Cc: vincent.guittot@linaro.org
Fixes: df054e8445a4 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations")
Link: https://lkml.kernel.org/r/20191023153745.19515-3-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/topology.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 9318acf..49b835f 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2029,7 +2029,7 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	rcu_read_unlock();
 
 	if (has_asym)
-		static_branch_enable_cpuslocked(&sched_asym_cpucapacity);
+		static_branch_inc_cpuslocked(&sched_asym_cpucapacity);
 
 	if (rq && sched_debug_enabled) {
 		pr_info("root domain span: %*pbl (max cpu_capacity = %lu)\n",
@@ -2124,8 +2124,12 @@ int sched_init_domains(const struct cpumask *cpu_map)
  */
 static void detach_destroy_domains(const struct cpumask *cpu_map)
 {
+	unsigned int cpu = cpumask_any(cpu_map);
 	int i;
 
+	if (rcu_access_pointer(per_cpu(sd_asym_cpucapacity, cpu)))
+		static_branch_dec_cpuslocked(&sched_asym_cpucapacity);
+
 	rcu_read_lock();
 	for_each_cpu(i, cpu_map)
 		cpu_attach_domain(NULL, &def_root_domain, i);


* Re: [PATCH v4 1/2] sched/topology: Don't try to build empty sched domains
  2019-10-23 15:37 ` [PATCH v4 1/2] sched/topology: Don't try to build empty sched domains Valentin Schneider
  2019-10-24 16:19   ` Dietmar Eggemann
  2019-10-29  9:52   ` [tip: sched/urgent] " tip-bot2 for Valentin Schneider
@ 2019-10-31 16:23   ` Michal Koutný
  2019-10-31 17:23     ` Valentin Schneider
  2 siblings, 1 reply; 10+ messages in thread
From: Michal Koutný @ 2019-10-31 16:23 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, cgroups, lizefan, tj, hannes, mingo, peterz,
	vincent.guittot, Dietmar.Eggemann, morten.rasmussen, qperret,
	stable

On Wed, Oct 23, 2019 at 04:37:44PM +0100, Valentin Schneider <valentin.schneider@arm.com> wrote:
> Prevent generate_sched_domains() from returning empty cpumasks, and add
> some assertion in build_sched_domains() to scream bloody murder if it
> happens again.
Good catch. It makes sense to prune the empty domains in
generate_sched_domains already.

> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index c52bc91f882b..c87ee6412b36 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -798,7 +798,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
>  		    cpumask_subset(cp->cpus_allowed, top_cpuset.effective_cpus))
>  			continue;
>  
> -		if (is_sched_load_balance(cp))
> +		if (is_sched_load_balance(cp) &&
> +		    !cpumask_empty(cp->effective_cpus))
>  			csa[csn++] = cp;
If I didn't overlook anything, cp->effective_cpus can contain CPUs that
are later excluded by housekeeping_cpumask(HK_FLAG_DOMAIN), i.e. we may
still end up returning domains with empty cpumasks.

I'd suggest moving the emptiness check down into the loop where domain
cpumasks are ultimately constructed.

Michal


* Re: [PATCH v4 1/2] sched/topology: Don't try to build empty sched domains
  2019-10-31 16:23   ` [PATCH v4 1/2] " Michal Koutný
@ 2019-10-31 17:23     ` Valentin Schneider
  2019-11-01 10:08       ` Michal Koutný
  0 siblings, 1 reply; 10+ messages in thread
From: Valentin Schneider @ 2019-10-31 17:23 UTC (permalink / raw)
  To: Michal Koutný
  Cc: linux-kernel, cgroups, lizefan, tj, hannes, mingo, peterz,
	vincent.guittot, Dietmar.Eggemann, morten.rasmussen, qperret,
	stable

Hi Michal,

On 31/10/2019 17:23, Michal Koutný wrote:
> On Wed, Oct 23, 2019 at 04:37:44PM +0100, Valentin Schneider <valentin.schneider@arm.com> wrote:
>> Prevent generate_sched_domains() from returning empty cpumasks, and add
>> some assertion in build_sched_domains() to scream bloody murder if it
>> happens again.
> Good catch. It makes sense to prune the empty domains in
> generate_sched_domains already.
> 
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index c52bc91f882b..c87ee6412b36 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -798,7 +798,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
>>  		    cpumask_subset(cp->cpus_allowed, top_cpuset.effective_cpus))
>>  			continue;
>>  
>> -		if (is_sched_load_balance(cp))
>> +		if (is_sched_load_balance(cp) &&
>> +		    !cpumask_empty(cp->effective_cpus))
>>  			csa[csn++] = cp;
> If I didn't overlook anything, cp->effective_cpus can contain CPUs that
> are later excluded by housekeeping_cpumask(HK_FLAG_DOMAIN), i.e. we may
> still end up returning domains with empty cpumasks.
> 
> I'd suggest moving the emptiness check down into the loop where domain
> cpumasks are ultimately constructed.
> 

Ah, wasn't aware of this - thanks for having a look!

I think I need to have the check before the final cpumask gets built,
because at this point the cpumask array is already built and it's handed
off directly to the sched domain rebuild.

Do you reckon the following would work? 

----8<----
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index c87ee6412b36..e4c10785dc7c 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -798,8 +798,14 @@ static int generate_sched_domains(cpumask_var_t **domains,
 		    cpumask_subset(cp->cpus_allowed, top_cpuset.effective_cpus))
 			continue;
 
+		/*
+		 * Skip cpusets that would lead to an empty sched domain.
+		 * That could be because effective_cpus is empty, or because
+		 * it's only spanning CPUs outside the housekeeping mask.
+		 */
 		if (is_sched_load_balance(cp) &&
-		    !cpumask_empty(cp->effective_cpus))
+		    cpumask_intersects(cp->effective_cpus,
+				       housekeeping_cpumask(HK_FLAG_DOMAIN)))
 			csa[csn++] = cp;
 
 		/* skip @cp's subtree if not a partition root */


* Re: [PATCH v4 1/2] sched/topology: Don't try to build empty sched domains
  2019-10-31 17:23     ` Valentin Schneider
@ 2019-11-01 10:08       ` Michal Koutný
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Koutný @ 2019-11-01 10:08 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, cgroups, lizefan, tj, hannes, mingo, peterz,
	vincent.guittot, Dietmar.Eggemann, morten.rasmussen, qperret,
	stable

On Thu, Oct 31, 2019 at 06:23:12PM +0100, Valentin Schneider <valentin.schneider@arm.com> wrote:
> Do you reckon the following would work? 
LGTM (i.e. a cpuset will be skipped if no CPUs taking part in load
balancing remain in it after a hot(un)plug event).

Michal
