* [PATCH] sched/topology: Assert non-NUMA topology masks don't (partially) overlap
@ 2020-01-15 16:09 Valentin Schneider
2020-01-16 10:44 ` Peter Zijlstra
2020-01-17 10:08 ` [tip: sched/core] " tip-bot2 for Valentin Schneider
0 siblings, 2 replies; 6+ messages in thread
From: Valentin Schneider @ 2020-01-15 16:09 UTC (permalink / raw)
To: linux-kernel
Cc: sudeep.holla, prime.zeng, dietmar.eggemann, morten.rasmussen,
peterz, mingo
topology.c::get_group() relies on the assumption that non-NUMA domains do
not partially overlap. Zeng Tao pointed out in [1] that such topology
descriptions, while completely bogus, can end up being exposed to the
scheduler.
In his example (8 CPUs, 2-node system), we end up with:
MC span for CPU3 == 3-7
MC span for CPU4 == 4-7
The first pass through get_group(3, sdd@MC) will result in the following
sched_group list:
3 -> 4 -> 5 -> 6 -> 7
^ /
`----------------'
And a later pass through get_group(4, sdd@MC) will "corrupt" that to:
3 -> 4 -> 5 -> 6 -> 7
^ /
`-----------'
which will completely break things like 'while (sg != sd->groups)' when
using CPU3's base sched_domain.
There already are some architecture-specific checks in place such as
x86/kernel/smpboot.c::topology.sane(), but this is something we can detect
in the core scheduler, so it seems worthwhile to do so.
Warn and abort the construction of the sched domains if such a broken
topology description is detected. Note that this is somewhat
expensive (O(t.c²), 't' non-NUMA topology levels and 'c' CPUs) and could be
gated under SCHED_DEBUG if deemed necessary.
Testing
=======
Dietmar managed to reproduce this using the following qemu incantation:
$ qemu-system-aarch64 -kernel ./Image -hda ./qemu-image-aarch64.img \
-append 'root=/dev/vda console=ttyAMA0 loglevel=8 sched_debug' -smp \
cores=8 --nographic -m 512 -cpu cortex-a53 -machine virt -numa \
node,cpus=0-2,nodeid=0 -numa node,cpus=3-7,nodeid=1
alongside the following drivers/base/arch_topology.c hack (AIUI wouldn't be
needed if '-smp cores=X, sockets=Y' would work with qemu):
8<---
@@ -465,6 +465,9 @@ void update_siblings_masks(unsigned int cpuid)
if (cpuid_topo->package_id != cpu_topo->package_id)
continue;
+ if ((cpu < 4 && cpuid > 3) || (cpu > 3 && cpuid < 4))
+ continue;
+
cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
cpumask_set_cpu(cpu, &cpuid_topo->core_sibling);
8<---
[1]: https://lkml.kernel.org/r/1577088979-8545-1-git-send-email-prime.zeng@hisilicon.com
Reported-by: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
A "less intrusive" alternative is to assert the sd->groups list doesn't get
re-written, which is a symptom of such bogus topologies. I've briefly
tested this, you can have a look at it here:
http://www.linux-arm.org/git?p=linux-vs.git;a=commit;h=e0ead72137332cbd3d69c9055ab29e6ffae5b37b
fetchable via:
git fetch git://linux-arm.org/linux-vs mainline/topology/non_numa_overlap_sg
---
kernel/sched/topology.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 6ec1e595b1d4..dfb64c08a407 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1879,6 +1879,42 @@ static struct sched_domain *build_sched_domain(struct sched_domain_topology_leve
return sd;
}
+/*
+ * Ensure topology masks are sane, i.e. there are no conflicts (overlaps) for
+ * any two given CPUs at this (non-NUMA) topology level.
+ */
+static bool topology_span_sane(struct sched_domain_topology_level *tl,
+ const struct cpumask *cpu_map, int cpu)
+{
+ int i;
+
+ /* NUMA levels are allowed to overlap */
+ if (tl->flags & SDTL_OVERLAP)
+ return true;
+
+ /*
+ * Non-NUMA levels cannot partially overlap - they must be either
+ * completely equal or completely disjoint. Otherwise we can end up
+ * breaking the sched_group lists - i.e. a later get_group() pass
+ * breaks the linking done for an earlier span.
+ */
+ for_each_cpu(i, cpu_map) {
+ if (i == cpu)
+ continue;
+ /*
+ * We should 'and' all those masks with 'cpu_map' to exactly
+ * match the topology we're about to build, but that can only
+ * remove CPUs, which only lessens our ability to detect
+ * overlaps
+ */
+ if (!cpumask_equal(tl->mask(cpu), tl->mask(i)) &&
+ cpumask_intersects(tl->mask(cpu), tl->mask(i)))
+ return false;
+ }
+
+ return true;
+}
+
/*
* Find the sched_domain_topology_level where all CPU capacities are visible
* for all CPUs.
@@ -1975,6 +2011,9 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
has_asym = true;
}
+ if (WARN_ON(!topology_span_sane(tl, cpu_map, i)))
+ goto error;
+
sd = build_sched_domain(tl, cpu_map, attr, sd, dflags, i);
if (tl == sched_domain_topology)
--
2.24.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] sched/topology: Assert non-NUMA topology masks don't (partially) overlap
2020-01-15 16:09 [PATCH] sched/topology: Assert non-NUMA topology masks don't (partially) overlap Valentin Schneider
@ 2020-01-16 10:44 ` Peter Zijlstra
2020-01-16 10:48 ` Valentin Schneider
2020-01-16 15:19 ` Peter Zijlstra
2020-01-17 10:08 ` [tip: sched/core] " tip-bot2 for Valentin Schneider
1 sibling, 2 replies; 6+ messages in thread
From: Peter Zijlstra @ 2020-01-16 10:44 UTC (permalink / raw)
To: Valentin Schneider
Cc: linux-kernel, sudeep.holla, prime.zeng, dietmar.eggemann,
morten.rasmussen, mingo
On Wed, Jan 15, 2020 at 04:09:15PM +0000, Valentin Schneider wrote:
> A "less intrusive" alternative is to assert the sd->groups list doesn't get
> re-written, which is a symptom of such bogus topologies. I've briefly
> tested this, you can have a look at it here:
>
> http://www.linux-arm.org/git?p=linux-vs.git;a=commit;h=e0ead72137332cbd3d69c9055ab29e6ffae5b37b
Something like that might still make sense. Can't never be too careful,
right ;-)
> kernel/sched/topology.c | 39 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 39 insertions(+)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 6ec1e595b1d4..dfb64c08a407 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1879,6 +1879,42 @@ static struct sched_domain *build_sched_domain(struct sched_domain_topology_leve
> return sd;
> }
>
> +/*
> + * Ensure topology masks are sane, i.e. there are no conflicts (overlaps) for
> + * any two given CPUs at this (non-NUMA) topology level.
> + */
> +static bool topology_span_sane(struct sched_domain_topology_level *tl,
> + const struct cpumask *cpu_map, int cpu)
> +{
> + int i;
> +
> + /* NUMA levels are allowed to overlap */
> + if (tl->flags & SDTL_OVERLAP)
> + return true;
> +
> + /*
> + * Non-NUMA levels cannot partially overlap - they must be either
> + * completely equal or completely disjoint. Otherwise we can end up
> + * breaking the sched_group lists - i.e. a later get_group() pass
> + * breaks the linking done for an earlier span.
> + */
> + for_each_cpu(i, cpu_map) {
> + if (i == cpu)
> + continue;
> + /*
> + * We should 'and' all those masks with 'cpu_map' to exactly
> + * match the topology we're about to build, but that can only
> + * remove CPUs, which only lessens our ability to detect
> + * overlaps
> + */
> + if (!cpumask_equal(tl->mask(cpu), tl->mask(i)) &&
> + cpumask_intersects(tl->mask(cpu), tl->mask(i)))
> + return false;
> + }
> +
> + return true;
> +}
> +
> /*
> * Find the sched_domain_topology_level where all CPU capacities are visible
> * for all CPUs.
> @@ -1975,6 +2011,9 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
> has_asym = true;
> }
>
> + if (WARN_ON(!topology_span_sane(tl, cpu_map, i)))
> + goto error;
> +
> sd = build_sched_domain(tl, cpu_map, attr, sd, dflags, i);
>
> if (tl == sched_domain_topology)
This is O(nr_cpus), but then, that function already is, so I don't see a
problem with this.
I'll take it, thanks!
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] sched/topology: Assert non-NUMA topology masks don't (partially) overlap
2020-01-16 10:44 ` Peter Zijlstra
@ 2020-01-16 10:48 ` Valentin Schneider
2020-01-16 15:19 ` Peter Zijlstra
1 sibling, 0 replies; 6+ messages in thread
From: Valentin Schneider @ 2020-01-16 10:48 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, sudeep.holla, prime.zeng, dietmar.eggemann,
morten.rasmussen, mingo
On 16/01/2020 10:44, Peter Zijlstra wrote:
> On Wed, Jan 15, 2020 at 04:09:15PM +0000, Valentin Schneider wrote:
>> A "less intrusive" alternative is to assert the sd->groups list doesn't get
>> re-written, which is a symptom of such bogus topologies. I've briefly
>> tested this, you can have a look at it here:
>>
>> http://www.linux-arm.org/git?p=linux-vs.git;a=commit;h=e0ead72137332cbd3d69c9055ab29e6ffae5b37b
>
> Something like that might still make sense. Can't never be too careful,
> right ;-)
>
True. I'll ponder about it and see if I can't make it a bit neater
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] sched/topology: Assert non-NUMA topology masks don't (partially) overlap
2020-01-16 10:44 ` Peter Zijlstra
2020-01-16 10:48 ` Valentin Schneider
@ 2020-01-16 15:19 ` Peter Zijlstra
2020-01-16 15:30 ` Valentin Schneider
1 sibling, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2020-01-16 15:19 UTC (permalink / raw)
To: Valentin Schneider
Cc: linux-kernel, sudeep.holla, prime.zeng, dietmar.eggemann,
morten.rasmussen, mingo
On Thu, Jan 16, 2020 at 11:44:28AM +0100, Peter Zijlstra wrote:
> On Wed, Jan 15, 2020 at 04:09:15PM +0000, Valentin Schneider wrote:
> > @@ -1975,6 +2011,9 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
> > has_asym = true;
> > }
> >
> > + if (WARN_ON(!topology_span_sane(tl, cpu_map, i)))
> > + goto error;
> > +
> > sd = build_sched_domain(tl, cpu_map, attr, sd, dflags, i);
> >
> > if (tl == sched_domain_topology)
>
> This is O(nr_cpus), but then, that function already is, so I don't see a
> problem with this.
Clearly I meant to write O(nr_cpus^2), there's a bunch of nested
for_each_cpu() in there.
> I'll take it, thanks!
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] sched/topology: Assert non-NUMA topology masks don't (partially) overlap
2020-01-16 15:19 ` Peter Zijlstra
@ 2020-01-16 15:30 ` Valentin Schneider
0 siblings, 0 replies; 6+ messages in thread
From: Valentin Schneider @ 2020-01-16 15:30 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, sudeep.holla, prime.zeng, dietmar.eggemann,
morten.rasmussen, mingo
On 16/01/2020 15:19, Peter Zijlstra wrote:
> On Thu, Jan 16, 2020 at 11:44:28AM +0100, Peter Zijlstra wrote:
>> On Wed, Jan 15, 2020 at 04:09:15PM +0000, Valentin Schneider wrote:
>>> @@ -1975,6 +2011,9 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>>> has_asym = true;
>>> }
>>>
>>> + if (WARN_ON(!topology_span_sane(tl, cpu_map, i)))
>>> + goto error;
>>> +
>>> sd = build_sched_domain(tl, cpu_map, attr, sd, dflags, i);
>>>
>>> if (tl == sched_domain_topology)
>>
>> This is O(nr_cpus), but then, that function already is, so I don't see a
>> problem with this.
>
> Clearly I meant to write O(nr_cpus^2), there's a bunch of nested
> for_each_cpu() in there.
>
Hm so the sanity check is O(topo_levels * nr_cpus²), and I think that
roughly matches the complexity of building the sched_groups (I say
roughly because you go over less CPUs as you go up the topo levels).
As you said the important bit is that we don't make build_sched_domains()
noticeably worse, which I think is the case...
>> I'll take it, thanks!
^ permalink raw reply [flat|nested] 6+ messages in thread
* [tip: sched/core] sched/topology: Assert non-NUMA topology masks don't (partially) overlap
2020-01-15 16:09 [PATCH] sched/topology: Assert non-NUMA topology masks don't (partially) overlap Valentin Schneider
2020-01-16 10:44 ` Peter Zijlstra
@ 2020-01-17 10:08 ` tip-bot2 for Valentin Schneider
1 sibling, 0 replies; 6+ messages in thread
From: tip-bot2 for Valentin Schneider @ 2020-01-17 10:08 UTC (permalink / raw)
To: linux-tip-commits
Cc: Zeng Tao, Valentin Schneider, Peter Zijlstra (Intel), x86, LKML
The following commit has been merged into the sched/core branch of tip:
Commit-ID: ccf74128d66ce937876184ad55db2e0276af08d3
Gitweb: https://git.kernel.org/tip/ccf74128d66ce937876184ad55db2e0276af08d3
Author: Valentin Schneider <valentin.schneider@arm.com>
AuthorDate: Wed, 15 Jan 2020 16:09:15
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 17 Jan 2020 10:19:23 +01:00
sched/topology: Assert non-NUMA topology masks don't (partially) overlap
topology.c::get_group() relies on the assumption that non-NUMA domains do
not partially overlap. Zeng Tao pointed out in [1] that such topology
descriptions, while completely bogus, can end up being exposed to the
scheduler.
In his example (8 CPUs, 2-node system), we end up with:
MC span for CPU3 == 3-7
MC span for CPU4 == 4-7
The first pass through get_group(3, sdd@MC) will result in the following
sched_group list:
3 -> 4 -> 5 -> 6 -> 7
^ /
`----------------'
And a later pass through get_group(4, sdd@MC) will "corrupt" that to:
3 -> 4 -> 5 -> 6 -> 7
^ /
`-----------'
which will completely break things like 'while (sg != sd->groups)' when
using CPU3's base sched_domain.
There already are some architecture-specific checks in place such as
x86/kernel/smpboot.c::topology.sane(), but this is something we can detect
in the core scheduler, so it seems worthwhile to do so.
Warn and abort the construction of the sched domains if such a broken
topology description is detected. Note that this is somewhat
expensive (O(t.c²), 't' non-NUMA topology levels and 'c' CPUs) and could be
gated under SCHED_DEBUG if deemed necessary.
Testing
=======
Dietmar managed to reproduce this using the following qemu incantation:
$ qemu-system-aarch64 -kernel ./Image -hda ./qemu-image-aarch64.img \
-append 'root=/dev/vda console=ttyAMA0 loglevel=8 sched_debug' -smp \
cores=8 --nographic -m 512 -cpu cortex-a53 -machine virt -numa \
node,cpus=0-2,nodeid=0 -numa node,cpus=3-7,nodeid=1
alongside the following drivers/base/arch_topology.c hack (AIUI wouldn't be
needed if '-smp cores=X, sockets=Y' would work with qemu):
8<---
@@ -465,6 +465,9 @@ void update_siblings_masks(unsigned int cpuid)
if (cpuid_topo->package_id != cpu_topo->package_id)
continue;
+ if ((cpu < 4 && cpuid > 3) || (cpu > 3 && cpuid < 4))
+ continue;
+
cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
cpumask_set_cpu(cpu, &cpuid_topo->core_sibling);
8<---
[1]: https://lkml.kernel.org/r/1577088979-8545-1-git-send-email-prime.zeng@hisilicon.com
Reported-by: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200115160915.22575-1-valentin.schneider@arm.com
---
kernel/sched/topology.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 6ec1e59..dfb64c0 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1880,6 +1880,42 @@ static struct sched_domain *build_sched_domain(struct sched_domain_topology_leve
}
/*
+ * Ensure topology masks are sane, i.e. there are no conflicts (overlaps) for
+ * any two given CPUs at this (non-NUMA) topology level.
+ */
+static bool topology_span_sane(struct sched_domain_topology_level *tl,
+ const struct cpumask *cpu_map, int cpu)
+{
+ int i;
+
+ /* NUMA levels are allowed to overlap */
+ if (tl->flags & SDTL_OVERLAP)
+ return true;
+
+ /*
+ * Non-NUMA levels cannot partially overlap - they must be either
+ * completely equal or completely disjoint. Otherwise we can end up
+ * breaking the sched_group lists - i.e. a later get_group() pass
+ * breaks the linking done for an earlier span.
+ */
+ for_each_cpu(i, cpu_map) {
+ if (i == cpu)
+ continue;
+ /*
+ * We should 'and' all those masks with 'cpu_map' to exactly
+ * match the topology we're about to build, but that can only
+ * remove CPUs, which only lessens our ability to detect
+ * overlaps
+ */
+ if (!cpumask_equal(tl->mask(cpu), tl->mask(i)) &&
+ cpumask_intersects(tl->mask(cpu), tl->mask(i)))
+ return false;
+ }
+
+ return true;
+}
+
+/*
* Find the sched_domain_topology_level where all CPU capacities are visible
* for all CPUs.
*/
@@ -1975,6 +2011,9 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
has_asym = true;
}
+ if (WARN_ON(!topology_span_sane(tl, cpu_map, i)))
+ goto error;
+
sd = build_sched_domain(tl, cpu_map, attr, sd, dflags, i);
if (tl == sched_domain_topology)
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-01-17 10:09 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-15 16:09 [PATCH] sched/topology: Assert non-NUMA topology masks don't (partially) overlap Valentin Schneider
2020-01-16 10:44 ` Peter Zijlstra
2020-01-16 10:48 ` Valentin Schneider
2020-01-16 15:19 ` Peter Zijlstra
2020-01-16 15:30 ` Valentin Schneider
2020-01-17 10:08 ` [tip: sched/core] " tip-bot2 for Valentin Schneider
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).