* [PATCH] sched, cpumask: don't leak impossible cpus via for_each_cpu_wrap(). @ 2022-08-02 21:40 Neel Natu 2022-08-03 1:22 ` Yury Norov 0 siblings, 1 reply; 6+ messages in thread From: Neel Natu @ 2022-08-02 21:40 UTC (permalink / raw) To: Yury Norov, Andy Shevchenko, Rasmus Villemoes, Peter Zijlstra, linux-kernel Cc: Neel Natu The value of 'nr_cpumask_bits' is dependent on CONFIG_CPUMASK_OFFSTACK. This in turn can change the set of cpus visited by for_each_cpu_wrap() with a mask that has bits set in the range [nr_cpu_ids, NR_CPUS). Specifically on !CONFIG_CPUMASK_OFFSTACK kernels the API can iterate over cpus outside the 'cpu_possible_mask'. Fix this to make its behavior match for_each_cpu() which always limits the iteration to the range [0, nr_cpu_ids). Signed-off-by: Neel Natu <neelnatu@google.com> --- include/linux/cpumask.h | 2 +- lib/cpumask.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index fe29ac7cc469..2a308cfc43da 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -303,7 +303,7 @@ extern int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool */ #define for_each_cpu_wrap(cpu, mask, start) \ for ((cpu) = cpumask_next_wrap((start)-1, (mask), (start), false); \ - (cpu) < nr_cpumask_bits; \ + (cpu) < nr_cpu_ids; \ (cpu) = cpumask_next_wrap((cpu), (mask), (start), true)) /** diff --git a/lib/cpumask.c b/lib/cpumask.c index a971a82d2f43..d47937fb49eb 100644 --- a/lib/cpumask.c +++ b/lib/cpumask.c @@ -82,9 +82,9 @@ int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap) next = cpumask_next(n, mask); if (wrap && n < start && next >= start) { - return nr_cpumask_bits; + return nr_cpu_ids; - } else if (next >= nr_cpumask_bits) { + } else if (next >= nr_cpu_ids) { wrap = true; n = -1; goto again; -- 2.37.1.455.g008518b4e5-goog ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] sched, cpumask: don't leak impossible cpus via for_each_cpu_wrap(). 2022-08-02 21:40 [PATCH] sched, cpumask: don't leak impossible cpus via for_each_cpu_wrap() Neel Natu @ 2022-08-03 1:22 ` Yury Norov 2022-08-03 17:49 ` Neel Natu 0 siblings, 1 reply; 6+ messages in thread From: Yury Norov @ 2022-08-03 1:22 UTC (permalink / raw) To: Neel Natu; +Cc: Andy Shevchenko, Rasmus Villemoes, Peter Zijlstra, linux-kernel On Tue, Aug 2, 2022 at 2:41 PM Neel Natu <neelnatu@google.com> wrote: > > The value of 'nr_cpumask_bits' is dependent on CONFIG_CPUMASK_OFFSTACK. > This in turn can change the set of cpus visited by for_each_cpu_wrap() > with a mask that has bits set in the range [nr_cpu_ids, NR_CPUS). > > Specifically on !CONFIG_CPUMASK_OFFSTACK kernels the API can iterate > over cpus outside the 'cpu_possible_mask'. > > Fix this to make its behavior match for_each_cpu() which always limits > the iteration to the range [0, nr_cpu_ids). > > Signed-off-by: Neel Natu <neelnatu@google.com> The patch itself doesn't look correct because it randomly switches a piece of cpumask API from nr_cpumask_bits to nr_cpu_ids, and doesn't touch others. However... I don't know the story behind having 2 variables holding the max possible number of cpus, and it looks like it dates back to prehistoric times. In modern kernel, there are 2 cases where nr_cpumask_bits == nr_cpu_ids for sure: it's CONFIG_CPUMASK_OFFSTACK=y and CONFIG_HOTPLUG_CPU=y. At least one of those is enabled in defconfig of every popular architecture. In case of HOTPLUG is off, I don't understand why we should have nr_cpu_ids and nr_cpumask_bits different - what case should it cover?... Interestingly, in comments to cpumask functions and in the code those two are referred interchangeably. Even more interestingly, we have a function bitmap_setall() that sets all bits up to nr_cpumask_bits, and it could trigger the problem that you described, so that someone would complain. (Are there any other valid reasons to set bits behind nr_cpu_ids intentionally?) Can you share more details about how you triggered that? If you observe those bits set, something else is probably already wrong... So, if there is no real case in modern kernel to have nr_cpumask_bits and nr_cpu_ids different, the proper fix would be to just drop the first. If there is such a case, this is probably your case, and we'd know more details to understand how to deal with that. Thanks, Yury ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] sched, cpumask: don't leak impossible cpus via for_each_cpu_wrap(). 2022-08-03 1:22 ` Yury Norov @ 2022-08-03 17:49 ` Neel Natu 2022-08-03 18:52 ` Yury Norov 0 siblings, 1 reply; 6+ messages in thread From: Neel Natu @ 2022-08-03 17:49 UTC (permalink / raw) To: Yury Norov Cc: Andy Shevchenko, Rasmus Villemoes, Peter Zijlstra, linux-kernel On Tue, Aug 2, 2022 at 6:22 PM Yury Norov <yury.norov@gmail.com> wrote: > > On Tue, Aug 2, 2022 at 2:41 PM Neel Natu <neelnatu@google.com> wrote: > > > > The value of 'nr_cpumask_bits' is dependent on CONFIG_CPUMASK_OFFSTACK. > > This in turn can change the set of cpus visited by for_each_cpu_wrap() > > with a mask that has bits set in the range [nr_cpu_ids, NR_CPUS). > > > > Specifically on !CONFIG_CPUMASK_OFFSTACK kernels the API can iterate > > over cpus outside the 'cpu_possible_mask'. > > > > Fix this to make its behavior match for_each_cpu() which always limits > > the iteration to the range [0, nr_cpu_ids). > > > > Signed-off-by: Neel Natu <neelnatu@google.com> > > The patch itself doesn't look correct because it randomly switches a piece > of cpumask API from nr_cpumask_bits to nr_cpu_ids, and doesn't touch > others. > > However... > > I don't know the story behind having 2 variables holding the max possible > number of cpus, and it looks like it dates back to prehistoric times. In > modern kernel, there are 2 cases where nr_cpumask_bits == nr_cpu_ids > for sure: it's CONFIG_CPUMASK_OFFSTACK=y and > CONFIG_HOTPLUG_CPU=y. At least one of those is enabled in defconfig > of every popular architecture. > Hmm, in a kernel with CONFIG_HOTPLUG_CPU=y but not CONFIG_CPUMASK_OFFSTACK I see "nr_cpu_ids = 20, nr_cpumask_bits = 512". FYI since it doesn't match the observation above that nr_cpumask_bits == nr_cpu_ids when CONFIG_HOTPLUG_CPU=y. > In case of HOTPLUG is off, I don't understand why we should have nr_cpu_ids > and nr_cpumask_bits different - what case should it cover?... Interestingly, in > comments to cpumask functions and in the code those two are referred > interchangeably. > > Even more interestingly, we have a function bitmap_setall() that sets all bits > up to nr_cpumask_bits, and it could trigger the problem that you described, I think you mean cpumask_setall() that in turn calls bitmap_fill(nr_cpumask_bits)? > so that someone would complain. (Are there any other valid reasons to set > bits behind nr_cpu_ids intentionally?) > I don't know of any although this wasn't the case that trigger in my case. > Can you share more details about how you triggered that? If you observe > those bits set, something else is probably already wrong... The non-intuitive behavior of for_each_cpu_wrap() was triggered when iterating over a cpumask passed by userspace that set a bit in the [nr_cpu_ids, nr_cpumask_bits) range. The kernel code is out of tree but open source so happy to provide a pointer if needed. I considered ANDing the user supplied mask with 'cpu_possible_mask' but that felt like working around an inconsistency in the kernel API (hence the proposed fix). > So, if there is no real case in modern kernel to have nr_cpumask_bits and > nr_cpu_ids different, the proper fix would be to just drop the first. > I'll let other people more knowledgable than me in this area chime in. I'll be happy either way if that fixes the problem at hand. best Neel > If there is such a case, this is probably your case, and we'd know more > details to understand how to deal with that. > > Thanks, > Yury ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] sched, cpumask: don't leak impossible cpus via for_each_cpu_wrap(). 2022-08-03 17:49 ` Neel Natu @ 2022-08-03 18:52 ` Yury Norov 2022-08-03 21:21 ` Neel Natu 0 siblings, 1 reply; 6+ messages in thread From: Yury Norov @ 2022-08-03 18:52 UTC (permalink / raw) To: Neel Natu; +Cc: Andy Shevchenko, Rasmus Villemoes, Peter Zijlstra, linux-kernel On Wed, Aug 03, 2022 at 10:49:57AM -0700, Neel Natu wrote: > On Tue, Aug 2, 2022 at 6:22 PM Yury Norov <yury.norov@gmail.com> wrote: > > > > On Tue, Aug 2, 2022 at 2:41 PM Neel Natu <neelnatu@google.com> wrote: > > > > > > The value of 'nr_cpumask_bits' is dependent on CONFIG_CPUMASK_OFFSTACK. > > > This in turn can change the set of cpus visited by for_each_cpu_wrap() > > > with a mask that has bits set in the range [nr_cpu_ids, NR_CPUS). > > > > > > Specifically on !CONFIG_CPUMASK_OFFSTACK kernels the API can iterate > > > over cpus outside the 'cpu_possible_mask'. > > > > > > Fix this to make its behavior match for_each_cpu() which always limits > > > the iteration to the range [0, nr_cpu_ids). > > > > > > Signed-off-by: Neel Natu <neelnatu@google.com> > > > > The patch itself doesn't look correct because it randomly switches a piece > > of cpumask API from nr_cpumask_bits to nr_cpu_ids, and doesn't touch > > others. > > > > However... > > > > I don't know the story behind having 2 variables holding the max possible > > number of cpus, and it looks like it dates back to prehistoric times. In > > modern kernel, there are 2 cases where nr_cpumask_bits == nr_cpu_ids > > for sure: it's CONFIG_CPUMASK_OFFSTACK=y and > > CONFIG_HOTPLUG_CPU=y. At least one of those is enabled in defconfig > > of every popular architecture. > > > > Hmm, in a kernel with CONFIG_HOTPLUG_CPU=y but not CONFIG_CPUMASK_OFFSTACK > I see "nr_cpu_ids = 20, nr_cpumask_bits = 512". FYI since it doesn't > match the observation > above that nr_cpumask_bits == nr_cpu_ids when CONFIG_HOTPLUG_CPU=y. I said this because the comment in include/linux/cpumaks.h says: * If HOTPLUG is enabled, then cpu_possible_mask is forced to have * all NR_CPUS bits set, otherwise it is just the set of CPUs that * ACPI reports present at boot. And because of the code that sets nr_cpu_ids: void __init setup_nr_cpu_ids(void) { nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1; } Some arches override it, so it may be an issue. Are you running x86, or maybe you have "nr_cpus" boot parameter? But anyways, I feel like this should be investigated for more... Can you please share the config of your system and boot params? > > In case of HOTPLUG is off, I don't understand why we should have nr_cpu_ids > > and nr_cpumask_bits different - what case should it cover?... Interestingly, in > > comments to cpumask functions and in the code those two are referred > > interchangeably. > > > > Even more interestingly, we have a function bitmap_setall() that sets all bits > > up to nr_cpumask_bits, and it could trigger the problem that you described, > > I think you mean cpumask_setall() that in turn calls > bitmap_fill(nr_cpumask_bits)? Yes, sorry. > > so that someone would complain. (Are there any other valid reasons to set > > bits behind nr_cpu_ids intentionally?) > > > > I don't know of any although this wasn't the case that trigger in my case. > > > Can you share more details about how you triggered that? If you observe > > those bits set, something else is probably already wrong... > > The non-intuitive behavior of for_each_cpu_wrap() was triggered when iterating > over a cpumask passed by userspace that set a bit in the [nr_cpu_ids, > nr_cpumask_bits) > range. If you receive bitmap from userspace, you need to copy it with bitmap_copy_clear_tail(), or bitmap_from_arr{32,64}, as appropriate. That will clear unneeded bits. For user-to-kernel communications, I'd recommend passing bitmaps in a human-readable formats - hex string or bitmap list. Check the functions cpumask_parse_user() and cpumask_parselist_user(). This would help to avoid all possible issues related to endianness and 32/64 word length mismatch. > The kernel code is out of tree but open source so happy to provide a > pointer if needed. Yes please > I considered ANDing the user supplied mask with 'cpu_possible_mask' > but that felt > like working around an inconsistency in the kernel API (hence the proposed fix). > > > So, if there is no real case in modern kernel to have nr_cpumask_bits and > > nr_cpu_ids different, the proper fix would be to just drop the first. > > > > I'll let other people more knowledgable than me in this area chime in. > I'll be happy either > way if that fixes the problem at hand. > > best > Neel > > > If there is such a case, this is probably your case, and we'd know more > > details to understand how to deal with that. > > > > Thanks, > > Yury ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] sched, cpumask: don't leak impossible cpus via for_each_cpu_wrap(). 2022-08-03 18:52 ` Yury Norov @ 2022-08-03 21:21 ` Neel Natu 2022-08-03 23:51 ` Yury Norov 0 siblings, 1 reply; 6+ messages in thread From: Neel Natu @ 2022-08-03 21:21 UTC (permalink / raw) To: Yury Norov Cc: Andy Shevchenko, Rasmus Villemoes, Peter Zijlstra, linux-kernel On Wed, Aug 3, 2022 at 11:52 AM Yury Norov <yury.norov@gmail.com> wrote: > > On Wed, Aug 03, 2022 at 10:49:57AM -0700, Neel Natu wrote: > > On Tue, Aug 2, 2022 at 6:22 PM Yury Norov <yury.norov@gmail.com> wrote: > > > > > > On Tue, Aug 2, 2022 at 2:41 PM Neel Natu <neelnatu@google.com> wrote: > > > > > > > > The value of 'nr_cpumask_bits' is dependent on CONFIG_CPUMASK_OFFSTACK. > > > > This in turn can change the set of cpus visited by for_each_cpu_wrap() > > > > with a mask that has bits set in the range [nr_cpu_ids, NR_CPUS). > > > > > > > > Specifically on !CONFIG_CPUMASK_OFFSTACK kernels the API can iterate > > > > over cpus outside the 'cpu_possible_mask'. > > > > > > > > Fix this to make its behavior match for_each_cpu() which always limits > > > > the iteration to the range [0, nr_cpu_ids). > > > > > > > > Signed-off-by: Neel Natu <neelnatu@google.com> > > > > > > The patch itself doesn't look correct because it randomly switches a piece > > > of cpumask API from nr_cpumask_bits to nr_cpu_ids, and doesn't touch > > > others. > > > > > > However... > > > > > > I don't know the story behind having 2 variables holding the max possible > > > number of cpus, and it looks like it dates back to prehistoric times. In > > > modern kernel, there are 2 cases where nr_cpumask_bits == nr_cpu_ids > > > for sure: it's CONFIG_CPUMASK_OFFSTACK=y and > > > CONFIG_HOTPLUG_CPU=y. At least one of those is enabled in defconfig > > > of every popular architecture. > > > > > > > Hmm, in a kernel with CONFIG_HOTPLUG_CPU=y but not CONFIG_CPUMASK_OFFSTACK > > I see "nr_cpu_ids = 20, nr_cpumask_bits = 512". FYI since it doesn't > > match the observation > > above that nr_cpumask_bits == nr_cpu_ids when CONFIG_HOTPLUG_CPU=y. > > I said this because the comment in include/linux/cpumaks.h says: > * If HOTPLUG is enabled, then cpu_possible_mask is forced to have > * all NR_CPUS bits set, otherwise it is just the set of CPUs that > * ACPI reports present at boot. > > And because of the code that sets nr_cpu_ids: > > void __init setup_nr_cpu_ids(void) > { > nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1; > } > > Some arches override it, so it may be an issue. Are you running x86, > or maybe you have "nr_cpus" boot parameter? > > But anyways, I feel like this should be investigated for more... Can you > please share the config of your system and boot params? > Yeah, this is a stock defconfig compiled on an x86_64 host and booted inside a 20 vcpu virtual machine (virtme). There are no command line params to modify the number of cpus. I think everything is working as expected based on some debug prints I added during boot: [ 0.641798] smp: setup_nr_cpu_ids: nr_cpu_ids 20, cpu_possible_mask 0-19 [ 0.648424] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:20 nr_node_ids:2 The first one is from setup_nr_cpu_ids() in kernel/smp.c. The second one is from setup_per_cpu_areas() from arch/x86/setup_percpu.c. > > > In case of HOTPLUG is off, I don't understand why we should have nr_cpu_ids > > > and nr_cpumask_bits different - what case should it cover?... Interestingly, in > > > comments to cpumask functions and in the code those two are referred > > > interchangeably. > > > > > > Even more interestingly, we have a function bitmap_setall() that sets all bits > > > up to nr_cpumask_bits, and it could trigger the problem that you described, > > > > I think you mean cpumask_setall() that in turn calls > > bitmap_fill(nr_cpumask_bits)? > > Yes, sorry. > > > > so that someone would complain. (Are there any other valid reasons to set > > > bits behind nr_cpu_ids intentionally?) > > > > > > > I don't know of any although this wasn't the case that trigger in my case. > > > > > Can you share more details about how you triggered that? If you observe > > > those bits set, something else is probably already wrong... > > > > The non-intuitive behavior of for_each_cpu_wrap() was triggered when iterating > > over a cpumask passed by userspace that set a bit in the [nr_cpu_ids, > > nr_cpumask_bits) > > range. > > If you receive bitmap from userspace, you need to copy it with > bitmap_copy_clear_tail(), or bitmap_from_arr{32,64}, as appropriate. > That will clear unneeded bits. > > For user-to-kernel communications, I'd recommend passing bitmaps in a > human-readable formats - hex string or bitmap list. Check the functions > cpumask_parse_user() and cpumask_parselist_user(). This would help to > avoid all possible issues related to endianness and 32/64 word length > mismatch. > > > The kernel code is out of tree but open source so happy to provide a > > pointer if needed. > > Yes please > Here is where we copy the user supplied cpumask using the get_user_cpu_mask() helper: https://github.com/google/ghost-kernel/blob/c21b36f87663efa2189876b2caa701fe74e72adf/kernel/sched/ghost.c#L5729 For performance reasons we cannot use human readable cpu masks in this code path. Note that the helper copies up to 'nr_cpumask_bits' which in some kernels (!CONFIG_CPUMASK_OFFSTACK) can copy bits beyond 'nr_cpu_ids'. A possible option could be to fix this helper but I do feel that for_each_cpu() and for_each_cpu_wrap() should visit the same set of cpus given the same cpumask (ordering can be different but the set of cpus should remain the same). What do you think? best Neel > > I considered ANDing the user supplied mask with 'cpu_possible_mask' > > but that felt > > like working around an inconsistency in the kernel API (hence the proposed fix). > > > > > So, if there is no real case in modern kernel to have nr_cpumask_bits and > > > nr_cpu_ids different, the proper fix would be to just drop the first. > > > > > > > I'll let other people more knowledgable than me in this area chime in. > > I'll be happy either > > way if that fixes the problem at hand. > > > > best > > Neel > > > > > If there is such a case, this is probably your case, and we'd know more > > > details to understand how to deal with that. > > > > > > Thanks, > > > Yury ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] sched, cpumask: don't leak impossible cpus via for_each_cpu_wrap(). 2022-08-03 21:21 ` Neel Natu @ 2022-08-03 23:51 ` Yury Norov 0 siblings, 0 replies; 6+ messages in thread From: Yury Norov @ 2022-08-03 23:51 UTC (permalink / raw) To: Neel Natu Cc: Andy Shevchenko, Rasmus Villemoes, Peter Zijlstra, linux-kernel, Thomas Gleixner, Andy Lutomirski + Andy Lutomirski <luto@kernel.org> + Thomas Gleixner <tglx@linutronix.de> On Wed, Aug 03, 2022 at 02:21:46PM -0700, Neel Natu wrote: > ject: Re: [PATCH] sched, cpumask: don't leak impossible cpus via for_each_cpu_wrap(). > To: Yury Norov <yury.norov@gmail.com> > Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>, > Rasmus Villemoes <linux@rasmusvillemoes.dk>, Peter Zijlstra <peterz@infradead.org>, > linux-kernel@vger.kernel.org > Content-Type: text/plain; charset="UTF-8" > Status: O > Content-Length: 6037 > Lines: 152 > > On Wed, Aug 3, 2022 at 11:52 AM Yury Norov <yury.norov@gmail.com> wrote: > > > > On Wed, Aug 03, 2022 at 10:49:57AM -0700, Neel Natu wrote: > > > On Tue, Aug 2, 2022 at 6:22 PM Yury Norov <yury.norov@gmail.com> wrote: > > > > > > > > On Tue, Aug 2, 2022 at 2:41 PM Neel Natu <neelnatu@google.com> wrote: > > > > > > > > > > The value of 'nr_cpumask_bits' is dependent on CONFIG_CPUMASK_OFFSTACK. > > > > > This in turn can change the set of cpus visited by for_each_cpu_wrap() > > > > > with a mask that has bits set in the range [nr_cpu_ids, NR_CPUS). > > > > > > > > > > Specifically on !CONFIG_CPUMASK_OFFSTACK kernels the API can iterate > > > > > over cpus outside the 'cpu_possible_mask'. > > > > > > > > > > Fix this to make its behavior match for_each_cpu() which always limits > > > > > the iteration to the range [0, nr_cpu_ids). > > > > > > > > > > Signed-off-by: Neel Natu <neelnatu@google.com> > > > > > > > > The patch itself doesn't look correct because it randomly switches a piece > > > > of cpumask API from nr_cpumask_bits to nr_cpu_ids, and doesn't touch > > > > others. > > > > > > > > However... > > > > > > > > I don't know the story behind having 2 variables holding the max possible > > > > number of cpus, and it looks like it dates back to prehistoric times. In > > > > modern kernel, there are 2 cases where nr_cpumask_bits == nr_cpu_ids > > > > for sure: it's CONFIG_CPUMASK_OFFSTACK=y and > > > > CONFIG_HOTPLUG_CPU=y. At least one of those is enabled in defconfig > > > > of every popular architecture. > > > > > > > > > > Hmm, in a kernel with CONFIG_HOTPLUG_CPU=y but not CONFIG_CPUMASK_OFFSTACK > > > I see "nr_cpu_ids = 20, nr_cpumask_bits = 512". FYI since it doesn't > > > match the observation > > > above that nr_cpumask_bits == nr_cpu_ids when CONFIG_HOTPLUG_CPU=y. > > > > I said this because the comment in include/linux/cpumaks.h says: > > * If HOTPLUG is enabled, then cpu_possible_mask is forced to have > > * all NR_CPUS bits set, otherwise it is just the set of CPUs that > > * ACPI reports present at boot. > > > > And because of the code that sets nr_cpu_ids: > > > > void __init setup_nr_cpu_ids(void) > > { > > nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1; > > } > > > > Some arches override it, so it may be an issue. Are you running x86, > > or maybe you have "nr_cpus" boot parameter? > > > > But anyways, I feel like this should be investigated for more... Can you > > please share the config of your system and boot params? > > > > Yeah, this is a stock defconfig compiled on an x86_64 host and booted > inside a 20 vcpu virtual machine (virtme). There are no command line > params to modify the number of cpus. > > I think everything is working as expected based on some debug prints I > added during boot: > [ 0.641798] smp: setup_nr_cpu_ids: nr_cpu_ids 20, cpu_possible_mask 0-19 > [ 0.648424] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 > nr_cpu_ids:20 nr_node_ids:2 > > The first one is from setup_nr_cpu_ids() in kernel/smp.c. The second > one is from setup_per_cpu_areas() from arch/x86/setup_percpu.c. So it doesn't look correct. When HOTPLUG is ON, than cpu_possible_mask must be 0-NR_CPUS, as it's mentioned in include/linux/cpumaks.h, because user may add up to NR_CPUS... Adding Thomas, Peter and Andy. Guys, can you please clarify: 1. What for do we have both nr_cpu_ids and nr_cpumask_bits variables? In case of CPUMSAK_OFFSTACK, nr_cpumask_bits is aliased to nr_cpu_ids, and in case of HOTPLUG both should be equal to NR_CPUS. In HOTPLUG=off, everything is set up at boot time and never changes, so again - why do we need 2 variables? 2. Neel observes that with HOTPLUG_CPU=y, nr_cpu_ids differs from nr_cpumask_bits because cpu_possible_mask is not set completely, despite the comment in cpumask.h: * If HOTPLUG is enabled, then cpu_possible_mask is forced to have * all NR_CPUS bits set, otherwise it is just the set of CPUs that * ACPI reports present at boot. Is that a bug, or just incorrect comment? If not a bug, what code increases nr_cpu_ids when a cpu goes up? From what I see, nr_cpu_ids is set only once in start_kernel(), and then never changes. > > > > In case of HOTPLUG is off, I don't understand why we should have nr_cpu_ids > > > > and nr_cpumask_bits different - what case should it cover?... Interestingly, in > > > > comments to cpumask functions and in the code those two are referred > > > > interchangeably. > > > > > > > > Even more interestingly, we have a function bitmap_setall() that sets all bits > > > > up to nr_cpumask_bits, and it could trigger the problem that you described, > > > > > > I think you mean cpumask_setall() that in turn calls > > > bitmap_fill(nr_cpumask_bits)? > > > > Yes, sorry. > > > > > > so that someone would complain. (Are there any other valid reasons to set > > > > bits behind nr_cpu_ids intentionally?) > > > > > > > > > > I don't know of any although this wasn't the case that trigger in my case. > > > > > > > Can you share more details about how you triggered that? If you observe > > > > those bits set, something else is probably already wrong... > > > > > > The non-intuitive behavior of for_each_cpu_wrap() was triggered when iterating > > > over a cpumask passed by userspace that set a bit in the [nr_cpu_ids, > > > nr_cpumask_bits) > > > range. > > > > If you receive bitmap from userspace, you need to copy it with > > bitmap_copy_clear_tail(), or bitmap_from_arr{32,64}, as appropriate. > > That will clear unneeded bits. > > > > For user-to-kernel communications, I'd recommend passing bitmaps in a > > human-readable formats - hex string or bitmap list. Check the functions > > cpumask_parse_user() and cpumask_parselist_user(). This would help to > > avoid all possible issues related to endianness and 32/64 word length > > mismatch. > > > > > The kernel code is out of tree but open source so happy to provide a > > > pointer if needed. > > > > Yes please > > > > Here is where we copy the user supplied cpumask using the > get_user_cpu_mask() helper: > https://github.com/google/ghost-kernel/blob/c21b36f87663efa2189876b2caa701fe74e72adf/kernel/sched/ghost.c#L5729 > > For performance reasons we cannot use human readable cpu masks in this > code path. If nr_cpu_ids == 20, I can't imagine a case where copying 4 bytes string to bitmap_parse() instead of 3 bytes of plain bitmap would affect your performance. Even if you're going to scale, it's hard to believe that code that calls alloc() functions 3 times without GFP_ATOMIC is so critical... I didn't look at the code deeply, but it looks like it sends many IPIs, which also doesn't look deterministic in terms of high-perf loads. > Note that the helper copies up to 'nr_cpumask_bits' which in some > kernels (!CONFIG_CPUMASK_OFFSTACK) can copy bits beyond 'nr_cpu_ids'. > A possible option could be to fix this helper but I do feel that > for_each_cpu() and for_each_cpu_wrap() should visit the same set of > cpus given the same cpumask (ordering can be different but the set of > cpus should remain the same). > > What do you think? I think that for_each_cpu{,_wrap} is correct here because you have HOTPLUG=y, and any cpu may be present. Let's see what Peter, Andy and Tomas will say. I have a feeling that we need to drop nr_cpumask_bits, and always rely on nr_cpu_ids. For your problem, because you are passed with the mask from userspace, you shouldn't rely on it anyways and need some sanity checks. Maybe like this: if (!cpumask_subset(user_mask, cpu_possible_mask)) return -EINVAL; Thanks, Yury > best > Neel > > > > I considered ANDing the user supplied mask with 'cpu_possible_mask' > > > but that felt > > > like working around an inconsistency in the kernel API (hence the proposed fix). > > > > > > > So, if there is no real case in modern kernel to have nr_cpumask_bits and > > > > nr_cpu_ids different, the ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-08-03 23:51 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-08-02 21:40 [PATCH] sched, cpumask: don't leak impossible cpus via for_each_cpu_wrap() Neel Natu 2022-08-03 1:22 ` Yury Norov 2022-08-03 17:49 ` Neel Natu 2022-08-03 18:52 ` Yury Norov 2022-08-03 21:21 ` Neel Natu 2022-08-03 23:51 ` Yury Norov
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.