linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero
@ 2015-12-03 22:24 Steven Rostedt
  2015-12-04  1:35 ` Rusty Russell
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2015-12-03 22:24 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Peter Zijlstra, Andrew Morton, Rusty Russell,
	Sergey Senozhatsky, Xunlei Pang

Xunlei Pang reported a bug in the scheduler code when
CONFIG_CPUMASK_OFFSTACK is set, several of the cpumasks used by the
root domains can contain garbage. The code does the following:

        memset(rd, 0, sizeof(*rd));

        if (!alloc_cpumask_var(&rd->span, GFP_KERNEL))
                goto out;
        if (!alloc_cpumask_var(&rd->online, GFP_KERNEL))
                goto free_span;
        if (!alloc_cpumask_var(&rd->dlo_mask, GFP_KERNEL))
                goto free_online;
        if (!alloc_cpumask_var(&rd->rto_mask, GFP_KERNEL))
                goto free_dlo_mask;

When CONFIG_CPUMASK_OFFSTACK is not defined, the four cpumasks are part
of the 'rd' structure, and the memset() will zero them all out. But
when CONFIG_CPUMASK_OFFSTACK is enabled, those cpumasks are no longer
set by the memset() but are allocated independently. That allocation
may contain garbage.

In order to make alloc_cpumask_var() and friends behave the same with
respect to being zero or not whether or not CONFIG_CPUMASK_OFFSTACK is
defined, a check is made to the contents of the mask pointer. If the
contents of the mask pointer is zero, it is assumed that the value was
zeroed out before and __GFP_ZERO is added to the flags for allocation
to make the returned cpumasks already zeroed.

Calls to alloc_cpumask_var() are not done in performance critical
paths, and even if they are, zeroing them out shouldn't add much
overhead to it. The up side to this change is that we remove subtle
bugs when enabling CONFIG_CPUMASK_OFFSTACK with cpumask logic that
worked fined when that config was not enabled.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
diff --git a/lib/cpumask.c b/lib/cpumask.c
index 5a70f6196f57..c0d68752a8b9 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -60,6 +60,19 @@ int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
  */
 bool alloc_cpumask_var_node(cpumask_var_t *mask, gfp_t flags, int node)
 {
+	/*
+	 * When CONFIG_CPUMASK_OFFSTACK is not set, the cpumask may
+	 * be zeroed by a memset of the structure that contains the
+	 * mask. But if CONFIG_CPUMASK_OFFSTACK is then enabled,
+	 * the mask may end up containing garbage. By checking
+	 * if the pointer of the mask is already zero, we can assume
+	 * that the mask itself should be allocated to contain all
+	 * zeros as well. This will prevent subtle bugs by the
+	 * inconsistency of the config being set or not.
+	 */
+	if ((long)*mask == 0)
+		flags |= __GFP_ZERO;
+
 	*mask = kmalloc_node(cpumask_size(), flags, node);
 
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero
  2015-12-03 22:24 [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero Steven Rostedt
@ 2015-12-04  1:35 ` Rusty Russell
  2015-12-04  2:37   ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread
From: Rusty Russell @ 2015-12-04  1:35 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Ingo Molnar, Peter Zijlstra, Andrew Morton, Sergey Senozhatsky,
	Xunlei Pang

Steven Rostedt <rostedt@goodmis.org> writes:
> Xunlei Pang reported a bug in the scheduler code when
> CONFIG_CPUMASK_OFFSTACK is set, several of the cpumasks used by the
> root domains can contain garbage. The code does the following:
>
>         memset(rd, 0, sizeof(*rd));
>
>         if (!alloc_cpumask_var(&rd->span, GFP_KERNEL))
>                 goto out;
>         if (!alloc_cpumask_var(&rd->online, GFP_KERNEL))
>                 goto free_span;
>         if (!alloc_cpumask_var(&rd->dlo_mask, GFP_KERNEL))
>                 goto free_online;
>         if (!alloc_cpumask_var(&rd->rto_mask, GFP_KERNEL))
>                 goto free_dlo_mask;
>
> When CONFIG_CPUMASK_OFFSTACK is not defined, the four cpumasks are part
> of the 'rd' structure, and the memset() will zero them all out. But
> when CONFIG_CPUMASK_OFFSTACK is enabled, those cpumasks are no longer
> set by the memset() but are allocated independently. That allocation
> may contain garbage.
>
> In order to make alloc_cpumask_var() and friends behave the same with
> respect to being zero or not whether or not CONFIG_CPUMASK_OFFSTACK is
> defined, a check is made to the contents of the mask pointer. If the
> contents of the mask pointer is zero, it is assumed that the value was
> zeroed out before and __GFP_ZERO is added to the flags for allocation
> to make the returned cpumasks already zeroed.
>
> Calls to alloc_cpumask_var() are not done in performance critical
> paths, and even if they are, zeroing them out shouldn't add much
> overhead to it. The up side to this change is that we remove subtle
> bugs when enabling CONFIG_CPUMASK_OFFSTACK with cpumask logic that
> worked fined when that config was not enabled.

This is clever, but I would advise against such subtle code.  We will
never be able to remove this code once it is in.

Would suggest making the non-CPUMASK_OFFSTACK stubs write garbage into
the cpumasks instead, iff !(flags & __GFP_ZERO).

Cheers,
Rusty.




>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
> diff --git a/lib/cpumask.c b/lib/cpumask.c
> index 5a70f6196f57..c0d68752a8b9 100644
> --- a/lib/cpumask.c
> +++ b/lib/cpumask.c
> @@ -60,6 +60,19 @@ int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
>   */
>  bool alloc_cpumask_var_node(cpumask_var_t *mask, gfp_t flags, int node)
>  {
> +	/*
> +	 * When CONFIG_CPUMASK_OFFSTACK is not set, the cpumask may
> +	 * be zeroed by a memset of the structure that contains the
> +	 * mask. But if CONFIG_CPUMASK_OFFSTACK is then enabled,
> +	 * the mask may end up containing garbage. By checking
> +	 * if the pointer of the mask is already zero, we can assume
> +	 * that the mask itself should be allocated to contain all
> +	 * zeros as well. This will prevent subtle bugs by the
> +	 * inconsistency of the config being set or not.
> +	 */
> +	if ((long)*mask == 0)
> +		flags |= __GFP_ZERO;
> +
>  	*mask = kmalloc_node(cpumask_size(), flags, node);
>  
>  #ifdef CONFIG_DEBUG_PER_CPU_MAPS

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero
  2015-12-04  1:35 ` Rusty Russell
@ 2015-12-04  2:37   ` Steven Rostedt
  2015-12-04  7:34     ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2015-12-04  2:37 UTC (permalink / raw)
  To: Rusty Russell
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Xunlei Pang

On Fri, 04 Dec 2015 12:05:12 +1030
Rusty Russell <rusty@rustcorp.com.au> wrote:

> This is clever, but I would advise against such subtle code.  We will
> never be able to remove this code once it is in.
> 
> Would suggest making the non-CPUMASK_OFFSTACK stubs write garbage into
> the cpumasks instead, iff !(flags & __GFP_ZERO).
> 
>
I actually thought of the same thing, but thought it was a bit harsh.
If others think that's a better solution, then I'll submit a patch to
do that.

-- Steve


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero
  2015-12-04  2:37   ` Steven Rostedt
@ 2015-12-04  7:34     ` Ingo Molnar
  2015-12-04 20:30       ` Rusty Russell
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2015-12-04  7:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Rusty Russell, LKML, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Xunlei Pang


* Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 04 Dec 2015 12:05:12 +1030
> Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > This is clever, but I would advise against such subtle code.  We will never be 
> > able to remove this code once it is in.
> > 
> > Would suggest making the non-CPUMASK_OFFSTACK stubs write garbage into the 
> > cpumasks instead, iff !(flags & __GFP_ZERO).
>
> I actually thought of the same thing, but thought it was a bit harsh. If others 
> think that's a better solution, then I'll submit a patch to do that.

That just makes things more fragile - 'garbage' will spread the breakage, and if 
the breakage is subtle, it will spread subtle breakage.

So why not use a kzmalloc_node() [equivalent] call instead of kmalloc_node(), to 
make sure it's all zeroed instead of uninitialized?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero
  2015-12-04  7:34     ` Ingo Molnar
@ 2015-12-04 20:30       ` Rusty Russell
  2015-12-06 17:29         ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Rusty Russell @ 2015-12-04 20:30 UTC (permalink / raw)
  To: Ingo Molnar, Steven Rostedt
  Cc: LKML, Peter Zijlstra, Andrew Morton, Sergey Senozhatsky, Xunlei Pang

Ingo Molnar <mingo@kernel.org> writes:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> On Fri, 04 Dec 2015 12:05:12 +1030
>> Rusty Russell <rusty@rustcorp.com.au> wrote:
>> 
>> > This is clever, but I would advise against such subtle code.  We will never be 
>> > able to remove this code once it is in.
>> > 
>> > Would suggest making the non-CPUMASK_OFFSTACK stubs write garbage into the 
>> > cpumasks instead, iff !(flags & __GFP_ZERO).
>>
>> I actually thought of the same thing, but thought it was a bit harsh. If others 
>> think that's a better solution, then I'll submit a patch to do that.
>
> That just makes things more fragile - 'garbage' will spread the breakage, and if 
> the breakage is subtle, it will spread subtle breakage.
>
> So why not use a kzmalloc_node() [equivalent] call instead of kmalloc_node(), to 
> make sure it's all zeroed instead of uninitialized?

OTOH, why not make *every* kmalloc a kzmalloc?

The issue here is not that the issue is subtle (not using a zeroing
allocator is a pretty clear bug), it's that it's papered over by the
normal config.

If we had a config option already to garbage-fill allocations, it'd be a
simple solution.

I don't think there are great answers here.  But adding more subtle
zeroing semantics feels wrong, even if it will mostly Just Work.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero
  2015-12-04 20:30       ` Rusty Russell
@ 2015-12-06 17:29         ` Ingo Molnar
  2015-12-07  1:56           ` Rusty Russell
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2015-12-06 17:29 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Steven Rostedt, LKML, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Xunlei Pang


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> Ingo Molnar <mingo@kernel.org> writes:
> > * Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> >> On Fri, 04 Dec 2015 12:05:12 +1030
> >> Rusty Russell <rusty@rustcorp.com.au> wrote:
> >> 
> >> > This is clever, but I would advise against such subtle code.  We will never be 
> >> > able to remove this code once it is in.
> >> > 
> >> > Would suggest making the non-CPUMASK_OFFSTACK stubs write garbage into the 
> >> > cpumasks instead, iff !(flags & __GFP_ZERO).
> >>
> >> I actually thought of the same thing, but thought it was a bit harsh. If others 
> >> think that's a better solution, then I'll submit a patch to do that.
> >
> > That just makes things more fragile - 'garbage' will spread the breakage, and if 
> > the breakage is subtle, it will spread subtle breakage.
> >
> > So why not use a kzmalloc_node() [equivalent] call instead of kmalloc_node(), to 
> > make sure it's all zeroed instead of uninitialized?
> 
> OTOH, why not make *every* kmalloc a kzmalloc?

The big difference to alloc_cpumask_var_node() is that kmalloc() is well-defined 
in the sense that it will return uninitialized buffers (sometimes even poisoned 
ones), all the time.

But alloc_cpumask_var_node() will return a zeroed cpumask 99.9% of the time when 
the kernel being run is using on-stack cpumasks. So it's very easy to not 
initialize and not discover it for extended periods of time.

As it happened here, and as was fixed with the patch. Hence my suggestion.

> The issue here is not that the issue is subtle (not using a zeroing allocator is 
> a pretty clear bug), it's that it's papered over by the normal config.

Exactly.

> If we had a config option already to garbage-fill allocations, it'd be a simple 
> solution.
> 
> I don't think there are great answers here.  But adding more subtle zeroing 
> semantics feels wrong, even if it will mostly Just Work.

It's not subtle if the naming clearly reflects it (hence my suggestion to rename 
the API) - and the status quo for on-stack allocations is zeroing anyway, so it's 
not a big jump...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero
  2015-12-06 17:29         ` Ingo Molnar
@ 2015-12-07  1:56           ` Rusty Russell
  2015-12-07  8:23             ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Rusty Russell @ 2015-12-07  1:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, LKML, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Xunlei Pang

Ingo Molnar <mingo@kernel.org> writes:
> * Rusty Russell <rusty@rustcorp.com.au> wrote:
>> I don't think there are great answers here.  But adding more subtle zeroing 
>> semantics feels wrong, even if it will mostly Just Work.
>
> It's not subtle if the naming clearly reflects it (hence my suggestion to rename 
> the API) - and the status quo for on-stack allocations is zeroing anyway, so it's 
> not a big jump...

True, but we already have zalloc_cpumask_var() though if we want that?

It probably makes sense to just switch everyone to that and get rid of
the non-z one?

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero
  2015-12-07  1:56           ` Rusty Russell
@ 2015-12-07  8:23             ` Ingo Molnar
  0 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2015-12-07  8:23 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Steven Rostedt, LKML, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Xunlei Pang


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> Ingo Molnar <mingo@kernel.org> writes:
> > * Rusty Russell <rusty@rustcorp.com.au> wrote:
> >> I don't think there are great answers here.  But adding more subtle zeroing 
> >> semantics feels wrong, even if it will mostly Just Work.
> >
> > It's not subtle if the naming clearly reflects it (hence my suggestion to rename 
> > the API) - and the status quo for on-stack allocations is zeroing anyway, so it's 
> > not a big jump...
> 
> True, but we already have zalloc_cpumask_var() though if we want that?

Indeed, didn't realize that.

> It probably makes sense to just switch everyone to that and get rid of the non-z 
> one?

Yeah, I think this long-lived bug is a proper trigger for that. Lemme send a 
2-patch series.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-12-07  8:23 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-03 22:24 [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero Steven Rostedt
2015-12-04  1:35 ` Rusty Russell
2015-12-04  2:37   ` Steven Rostedt
2015-12-04  7:34     ` Ingo Molnar
2015-12-04 20:30       ` Rusty Russell
2015-12-06 17:29         ` Ingo Molnar
2015-12-07  1:56           ` Rusty Russell
2015-12-07  8:23             ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).