All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail
@ 2018-12-20 20:23 Nicholas Mc Guire
  2018-12-21 15:53 ` Michal Hocko
  2018-12-21 21:58   ` David Rientjes
  0 siblings, 2 replies; 12+ messages in thread
From: Nicholas Mc Guire @ 2018-12-20 20:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Chintan Pandya, Michal Hocko, Andrey Ryabinin, Arun KS,
	Joe Perches, Luis R. Rodriguez, linux-mm, linux-kernel,
	Nicholas Mc Guire

While this is in a very early stage of the system boot and if memory
were exhausted the system has a more serious problem anyway - but still
the kzalloc here seems unsafe. Looking at the history it was previously
switched from alloc_bootmem() to kzalloc() using GFP_NOWAIT flag but
there never seems to have been a check for NULL return. So if this is
expected to never fail should it not be using | __GFP_NOFAIL here ?
Or put differently - what is the rational for GFP_NOWAIT to be safe here ?

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Fixes 43ebdac42f16 ("vmalloc: use kzalloc() instead of alloc_bootmem()")
---

Problem was found by an experimental coccinelle script

Patch was only compile tested for x86_64_defconfig

Patch is against v4.20-rc7 (localversion-next next-20181220)

 mm/vmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 871e41c..1c118d7 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
 
 	/* Import existing vmlist entries. */
 	for (tmp = vmlist; tmp; tmp = tmp->next) {
-		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
+		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
 		va->flags = VM_VM_AREA;
 		va->va_start = (unsigned long)tmp->addr;
 		va->va_end = va->va_start + tmp->size;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail
  2018-12-20 20:23 [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail Nicholas Mc Guire
@ 2018-12-21 15:53 ` Michal Hocko
  2018-12-21 21:58   ` David Rientjes
  1 sibling, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2018-12-21 15:53 UTC (permalink / raw)
  To: Nicholas Mc Guire
  Cc: Andrew Morton, Chintan Pandya, Andrey Ryabinin, Arun KS,
	Joe Perches, Luis R. Rodriguez, linux-mm, linux-kernel

On Thu 20-12-18 21:23:57, Nicholas Mc Guire wrote:
> While this is in a very early stage of the system boot and if memory
> were exhausted the system has a more serious problem anyway - but still
> the kzalloc here seems unsafe. Looking at the history it was previously
> switched from alloc_bootmem() to kzalloc() using GFP_NOWAIT flag but
> there never seems to have been a check for NULL return. So if this is
> expected to never fail should it not be using | __GFP_NOFAIL here ?
> Or put differently - what is the rational for GFP_NOWAIT to be safe here ?

Is there an actual problem you are trying to solve? GFP_NOWAIT|
__GFP_NOFAIL is a terrible idea. If this is an early allocation then
what would break this allocation out of the loop? There is nothing to
reclaim, there is nothing to kill. The allocation failure check would be
nice but what can you do except for BUG_ON?

> Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
> Fixes 43ebdac42f16 ("vmalloc: use kzalloc() instead of alloc_bootmem()")

So no, this is definitely not the right thing to do.
Nacked-by: Michal Hocko <mhocko@suse.com>

> ---
> 
> Problem was found by an experimental coccinelle script
> 
> Patch was only compile tested for x86_64_defconfig
> 
> Patch is against v4.20-rc7 (localversion-next next-20181220)
> 
>  mm/vmalloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 871e41c..1c118d7 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
>  
>  	/* Import existing vmlist entries. */
>  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
>  		va->flags = VM_VM_AREA;
>  		va->va_start = (unsigned long)tmp->addr;
>  		va->va_end = va->va_start + tmp->size;
> -- 
> 2.1.4
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail
@ 2018-12-21 21:58   ` David Rientjes
  0 siblings, 0 replies; 12+ messages in thread
From: David Rientjes @ 2018-12-21 21:58 UTC (permalink / raw)
  To: Nicholas Mc Guire
  Cc: Andrew Morton, Chintan Pandya, Michal Hocko, Andrey Ryabinin,
	Arun KS, Joe Perches, Luis R. Rodriguez, linux-mm, linux-kernel

On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:

> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 871e41c..1c118d7 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
>  
>  	/* Import existing vmlist entries. */
>  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
>  		va->flags = VM_VM_AREA;
>  		va->va_start = (unsigned long)tmp->addr;
>  		va->va_end = va->va_start + tmp->size;

Hi Nicholas,

You're right that this looks wrong because there's no guarantee that va is 
actually non-NULL.  __GFP_NOFAIL won't help in init, unfortunately, since 
we're not giving the page allocator a chance to reclaim so this would 
likely just end up looping forever instead of crashing with a NULL pointer 
dereference, which would actually be the better result.

You could do

	BUG_ON(!va);

to make it obvious why we crashed, however.  It makes it obvious that the 
crash is intentional rather than some error in the kernel code.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail
@ 2018-12-21 21:58   ` David Rientjes
  0 siblings, 0 replies; 12+ messages in thread
From: David Rientjes @ 2018-12-21 21:58 UTC (permalink / raw)
  To: Nicholas Mc Guire
  Cc: Andrew Morton, Chintan Pandya, Michal Hocko, Andrey Ryabinin,
	Arun KS, Joe Perches, Luis R. Rodriguez, linux-mm, linux-kernel

On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:

> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 871e41c..1c118d7 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
>  
>  	/* Import existing vmlist entries. */
>  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
>  		va->flags = VM_VM_AREA;
>  		va->va_start = (unsigned long)tmp->addr;
>  		va->va_end = va->va_start + tmp->size;

Hi Nicholas,

You're right that this looks wrong because there's no guarantee that va is 
actually non-NULL.  __GFP_NOFAIL won't help in init, unfortunately, since 
we're not giving the page allocator a chance to reclaim so this would 
likely just end up looping forever instead of crashing with a NULL pointer 
dereference, which would actually be the better result.

You could do

	BUG_ON(!va);

to make it obvious why we crashed, however.  It makes it obvious that the 
crash is intentional rather than some error in the kernel code.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail
  2018-12-21 21:58   ` David Rientjes
@ 2018-12-22  8:04     ` Nicholas Mc Guire
  -1 siblings, 0 replies; 12+ messages in thread
From: Nicholas Mc Guire @ 2018-12-22  8:04 UTC (permalink / raw)
  To: David Rientjes
  Cc: Nicholas Mc Guire, Andrew Morton, Chintan Pandya, Michal Hocko,
	Andrey Ryabinin, Arun KS, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel

On Fri, Dec 21, 2018 at 01:58:39PM -0800, David Rientjes wrote:
> On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:
> 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 871e41c..1c118d7 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
> >  
> >  	/* Import existing vmlist entries. */
> >  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> > -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> > +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
> >  		va->flags = VM_VM_AREA;
> >  		va->va_start = (unsigned long)tmp->addr;
> >  		va->va_end = va->va_start + tmp->size;
> 
> Hi Nicholas,
> 
> You're right that this looks wrong because there's no guarantee that va is 
> actually non-NULL.  __GFP_NOFAIL won't help in init, unfortunately, since 
> we're not giving the page allocator a chance to reclaim so this would 
> likely just end up looping forever instead of crashing with a NULL pointer 
> dereference, which would actually be the better result.
>
tried tracing the __GFP_NOFAIL path and had concluded that it would
end in out_of_memory() -> panic("System is deadlocked on memory\n");
which also should point cleanly to the cause - but I´m actually not
that sure if that trace was correct in all cases.
 
> You could do
> 
> 	BUG_ON(!va);
> 
> to make it obvious why we crashed, however.  It makes it obvious that the 
> crash is intentional rather than some error in the kernel code.

makes sense - that atleast makes it imediately clear from the code
that there is no way out from here.

thx!
hofrat

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail
@ 2018-12-22  8:04     ` Nicholas Mc Guire
  0 siblings, 0 replies; 12+ messages in thread
From: Nicholas Mc Guire @ 2018-12-22  8:04 UTC (permalink / raw)
  To: David Rientjes
  Cc: Nicholas Mc Guire, Andrew Morton, Chintan Pandya, Michal Hocko,
	Andrey Ryabinin, Arun KS, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel

On Fri, Dec 21, 2018 at 01:58:39PM -0800, David Rientjes wrote:
> On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:
> 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 871e41c..1c118d7 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
> >  
> >  	/* Import existing vmlist entries. */
> >  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> > -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> > +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
> >  		va->flags = VM_VM_AREA;
> >  		va->va_start = (unsigned long)tmp->addr;
> >  		va->va_end = va->va_start + tmp->size;
> 
> Hi Nicholas,
> 
> You're right that this looks wrong because there's no guarantee that va is 
> actually non-NULL.  __GFP_NOFAIL won't help in init, unfortunately, since 
> we're not giving the page allocator a chance to reclaim so this would 
> likely just end up looping forever instead of crashing with a NULL pointer 
> dereference, which would actually be the better result.
>
tried tracing the __GFP_NOFAIL path and had concluded that it would
end in out_of_memory() -> panic("System is deadlocked on memory\n");
which also should point cleanly to the cause - but I�m actually not
that sure if that trace was correct in all cases.
 
> You could do
> 
> 	BUG_ON(!va);
> 
> to make it obvious why we crashed, however.  It makes it obvious that the 
> crash is intentional rather than some error in the kernel code.

makes sense - that atleast makes it imediately clear from the code
that there is no way out from here.

thx!
hofrat

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail
  2018-12-22  8:04     ` Nicholas Mc Guire
@ 2018-12-24  8:10       ` Michal Hocko
  -1 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2018-12-24  8:10 UTC (permalink / raw)
  To: Nicholas Mc Guire
  Cc: David Rientjes, Nicholas Mc Guire, Andrew Morton, Chintan Pandya,
	Andrey Ryabinin, Arun KS, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel

On Sat 22-12-18 09:04:21, Nicholas Mc Guire wrote:
> On Fri, Dec 21, 2018 at 01:58:39PM -0800, David Rientjes wrote:
> > On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:
> > 
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 871e41c..1c118d7 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
> > >  
> > >  	/* Import existing vmlist entries. */
> > >  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> > > -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> > > +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
> > >  		va->flags = VM_VM_AREA;
> > >  		va->va_start = (unsigned long)tmp->addr;
> > >  		va->va_end = va->va_start + tmp->size;
> > 
> > Hi Nicholas,
> > 
> > You're right that this looks wrong because there's no guarantee that va is 
> > actually non-NULL.  __GFP_NOFAIL won't help in init, unfortunately, since 
> > we're not giving the page allocator a chance to reclaim so this would 
> > likely just end up looping forever instead of crashing with a NULL pointer 
> > dereference, which would actually be the better result.
> >
> tried tracing the __GFP_NOFAIL path and had concluded that it would
> end in out_of_memory() -> panic("System is deadlocked on memory\n");
> which also should point cleanly to the cause - but I´m actually not
> that sure if that trace was correct in all cases.

No, we do not trigger the memory reclaim path nor the oom killer when
using GFP_NOWAIT. In fact the current implementation even ignores
__GFP_NOFAIL AFAICS (so I was wrong about the endless loop but I suspect
that we used to loop fpr __GFP_NOFAIL at some point in the past). The
patch simply doesn't have any effect. But the primary objection is that
the behavior might change in future and you certainly do not want to get
stuck in the boot process without knowing what is going on. Crashing
will tell you that quite obviously. Although I have hard time imagine
how that could happen in a reasonably configured system.

> > You could do
> > 
> > 	BUG_ON(!va);
> > 
> > to make it obvious why we crashed, however.  It makes it obvious that the 
> > crash is intentional rather than some error in the kernel code.
> 
> makes sense - that atleast makes it imediately clear from the code
> that there is no way out from here.

How does it differ from blowing up right there when dereferencing flags?
It would be clear from the oops.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail
@ 2018-12-24  8:10       ` Michal Hocko
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2018-12-24  8:10 UTC (permalink / raw)
  To: Nicholas Mc Guire
  Cc: David Rientjes, Nicholas Mc Guire, Andrew Morton, Chintan Pandya,
	Andrey Ryabinin, Arun KS, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel

On Sat 22-12-18 09:04:21, Nicholas Mc Guire wrote:
> On Fri, Dec 21, 2018 at 01:58:39PM -0800, David Rientjes wrote:
> > On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:
> > 
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 871e41c..1c118d7 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
> > >  
> > >  	/* Import existing vmlist entries. */
> > >  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> > > -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> > > +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
> > >  		va->flags = VM_VM_AREA;
> > >  		va->va_start = (unsigned long)tmp->addr;
> > >  		va->va_end = va->va_start + tmp->size;
> > 
> > Hi Nicholas,
> > 
> > You're right that this looks wrong because there's no guarantee that va is 
> > actually non-NULL.  __GFP_NOFAIL won't help in init, unfortunately, since 
> > we're not giving the page allocator a chance to reclaim so this would 
> > likely just end up looping forever instead of crashing with a NULL pointer 
> > dereference, which would actually be the better result.
> >
> tried tracing the __GFP_NOFAIL path and had concluded that it would
> end in out_of_memory() -> panic("System is deadlocked on memory\n");
> which also should point cleanly to the cause - but I�m actually not
> that sure if that trace was correct in all cases.

No, we do not trigger the memory reclaim path nor the oom killer when
using GFP_NOWAIT. In fact the current implementation even ignores
__GFP_NOFAIL AFAICS (so I was wrong about the endless loop but I suspect
that we used to loop fpr __GFP_NOFAIL at some point in the past). The
patch simply doesn't have any effect. But the primary objection is that
the behavior might change in future and you certainly do not want to get
stuck in the boot process without knowing what is going on. Crashing
will tell you that quite obviously. Although I have hard time imagine
how that could happen in a reasonably configured system.

> > You could do
> > 
> > 	BUG_ON(!va);
> > 
> > to make it obvious why we crashed, however.  It makes it obvious that the 
> > crash is intentional rather than some error in the kernel code.
> 
> makes sense - that atleast makes it imediately clear from the code
> that there is no way out from here.

How does it differ from blowing up right there when dereferencing flags?
It would be clear from the oops.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail
  2018-12-24  8:10       ` Michal Hocko
@ 2018-12-24  9:38         ` Nicholas Mc Guire
  -1 siblings, 0 replies; 12+ messages in thread
From: Nicholas Mc Guire @ 2018-12-24  9:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Rientjes, Nicholas Mc Guire, Andrew Morton, Chintan Pandya,
	Andrey Ryabinin, Arun KS, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel

On Mon, Dec 24, 2018 at 09:10:56AM +0100, Michal Hocko wrote:
> On Sat 22-12-18 09:04:21, Nicholas Mc Guire wrote:
> > On Fri, Dec 21, 2018 at 01:58:39PM -0800, David Rientjes wrote:
> > > On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:
> > > 
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 871e41c..1c118d7 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
> > > >  
> > > >  	/* Import existing vmlist entries. */
> > > >  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> > > > -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> > > > +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
> > > >  		va->flags = VM_VM_AREA;
> > > >  		va->va_start = (unsigned long)tmp->addr;
> > > >  		va->va_end = va->va_start + tmp->size;
> > > 
> > > Hi Nicholas,
> > > 
> > > You're right that this looks wrong because there's no guarantee that va is 
> > > actually non-NULL.  __GFP_NOFAIL won't help in init, unfortunately, since 
> > > we're not giving the page allocator a chance to reclaim so this would 
> > > likely just end up looping forever instead of crashing with a NULL pointer 
> > > dereference, which would actually be the better result.
> > >
> > tried tracing the __GFP_NOFAIL path and had concluded that it would
> > end in out_of_memory() -> panic("System is deadlocked on memory\n");
> > which also should point cleanly to the cause - but I´m actually not
> > that sure if that trace was correct in all cases.
> 
> No, we do not trigger the memory reclaim path nor the oom killer when
> using GFP_NOWAIT. In fact the current implementation even ignores
> __GFP_NOFAIL AFAICS (so I was wrong about the endless loop but I suspect
> that we used to loop fpr __GFP_NOFAIL at some point in the past). The
> patch simply doesn't have any effect. But the primary objection is that
> the behavior might change in future and you certainly do not want to get
> stuck in the boot process without knowing what is going on. Crashing
> will tell you that quite obviously. Although I have hard time imagine
> how that could happen in a reasonably configured system.

I think most of the defensive structures are covering rare to almost
impossible cases - but those are precisely the hard ones to understand if
they do happen.

> 
> > > You could do
> > > 
> > > 	BUG_ON(!va);
> > > 
> > > to make it obvious why we crashed, however.  It makes it obvious that the 
> > > crash is intentional rather than some error in the kernel code.
> > 
> > makes sense - that atleast makes it imediately clear from the code
> > that there is no way out from here.
> 
> How does it differ from blowing up right there when dereferencing flags?
> It would be clear from the oops.

The question is how soon does it blow-up if it were imediate then three is
probably no real difference if there is some delay say due to the region
affected by the NULL pointer not being imediately in use - it may be very
hard to differenciate between an allocation failure and memory corruption
so having a directly associated trace should be significantly simpler to
understand - and you might actually not want a system to try booting if there
are problems at this level.

thx!
hofrat

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail
@ 2018-12-24  9:38         ` Nicholas Mc Guire
  0 siblings, 0 replies; 12+ messages in thread
From: Nicholas Mc Guire @ 2018-12-24  9:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Rientjes, Nicholas Mc Guire, Andrew Morton, Chintan Pandya,
	Andrey Ryabinin, Arun KS, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel

On Mon, Dec 24, 2018 at 09:10:56AM +0100, Michal Hocko wrote:
> On Sat 22-12-18 09:04:21, Nicholas Mc Guire wrote:
> > On Fri, Dec 21, 2018 at 01:58:39PM -0800, David Rientjes wrote:
> > > On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:
> > > 
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 871e41c..1c118d7 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
> > > >  
> > > >  	/* Import existing vmlist entries. */
> > > >  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> > > > -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> > > > +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
> > > >  		va->flags = VM_VM_AREA;
> > > >  		va->va_start = (unsigned long)tmp->addr;
> > > >  		va->va_end = va->va_start + tmp->size;
> > > 
> > > Hi Nicholas,
> > > 
> > > You're right that this looks wrong because there's no guarantee that va is 
> > > actually non-NULL.  __GFP_NOFAIL won't help in init, unfortunately, since 
> > > we're not giving the page allocator a chance to reclaim so this would 
> > > likely just end up looping forever instead of crashing with a NULL pointer 
> > > dereference, which would actually be the better result.
> > >
> > tried tracing the __GFP_NOFAIL path and had concluded that it would
> > end in out_of_memory() -> panic("System is deadlocked on memory\n");
> > which also should point cleanly to the cause - but I�m actually not
> > that sure if that trace was correct in all cases.
> 
> No, we do not trigger the memory reclaim path nor the oom killer when
> using GFP_NOWAIT. In fact the current implementation even ignores
> __GFP_NOFAIL AFAICS (so I was wrong about the endless loop but I suspect
> that we used to loop fpr __GFP_NOFAIL at some point in the past). The
> patch simply doesn't have any effect. But the primary objection is that
> the behavior might change in future and you certainly do not want to get
> stuck in the boot process without knowing what is going on. Crashing
> will tell you that quite obviously. Although I have hard time imagine
> how that could happen in a reasonably configured system.

I think most of the defensive structures are covering rare to almost
impossible cases - but those are precisely the hard ones to understand if
they do happen.

> 
> > > You could do
> > > 
> > > 	BUG_ON(!va);
> > > 
> > > to make it obvious why we crashed, however.  It makes it obvious that the 
> > > crash is intentional rather than some error in the kernel code.
> > 
> > makes sense - that atleast makes it imediately clear from the code
> > that there is no way out from here.
> 
> How does it differ from blowing up right there when dereferencing flags?
> It would be clear from the oops.

The question is how soon does it blow-up if it were imediate then three is
probably no real difference if there is some delay say due to the region
affected by the NULL pointer not being imediately in use - it may be very
hard to differenciate between an allocation failure and memory corruption
so having a directly associated trace should be significantly simpler to
understand - and you might actually not want a system to try booting if there
are problems at this level.

thx!
hofrat

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail
  2018-12-24  9:38         ` Nicholas Mc Guire
@ 2018-12-24 11:58           ` Nicholas Mc Guire
  -1 siblings, 0 replies; 12+ messages in thread
From: Nicholas Mc Guire @ 2018-12-24 11:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Rientjes, Nicholas Mc Guire, Andrew Morton, Chintan Pandya,
	Andrey Ryabinin, Arun KS, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel

On Mon, Dec 24, 2018 at 10:38:04AM +0100, Nicholas Mc Guire wrote:
> On Mon, Dec 24, 2018 at 09:10:56AM +0100, Michal Hocko wrote:
> > On Sat 22-12-18 09:04:21, Nicholas Mc Guire wrote:
> > > On Fri, Dec 21, 2018 at 01:58:39PM -0800, David Rientjes wrote:
> > > > On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:
> > > > 
> > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > index 871e41c..1c118d7 100644
> > > > > --- a/mm/vmalloc.c
> > > > > +++ b/mm/vmalloc.c
> > > > > @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
> > > > >  
> > > > >  	/* Import existing vmlist entries. */
> > > > >  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> > > > > -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> > > > > +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
> > > > >  		va->flags = VM_VM_AREA;
> > > > >  		va->va_start = (unsigned long)tmp->addr;
> > > > >  		va->va_end = va->va_start + tmp->size;
> > > > 
> > > > Hi Nicholas,
> > > > 
> > > > You're right that this looks wrong because there's no guarantee that va is 
> > > > actually non-NULL.  __GFP_NOFAIL won't help in init, unfortunately, since 
> > > > we're not giving the page allocator a chance to reclaim so this would 
> > > > likely just end up looping forever instead of crashing with a NULL pointer 
> > > > dereference, which would actually be the better result.
> > > >
> > > tried tracing the __GFP_NOFAIL path and had concluded that it would
> > > end in out_of_memory() -> panic("System is deadlocked on memory\n");
> > > which also should point cleanly to the cause - but I´m actually not
> > > that sure if that trace was correct in all cases.
> > 
> > No, we do not trigger the memory reclaim path nor the oom killer when
> > using GFP_NOWAIT. In fact the current implementation even ignores
> > __GFP_NOFAIL AFAICS (so I was wrong about the endless loop but I suspect
> > that we used to loop fpr __GFP_NOFAIL at some point in the past). The
> > patch simply doesn't have any effect. But the primary objection is that
> > the behavior might change in future and you certainly do not want to get
> > stuck in the boot process without knowing what is going on. Crashing
> > will tell you that quite obviously. Although I have hard time imagine
> > how that could happen in a reasonably configured system.
> 
> I think most of the defensive structures are covering rare to almost
> impossible cases - but those are precisely the hard ones to understand if
> they do happen.
> 
> > 
> > > > You could do
> > > > 
> > > > 	BUG_ON(!va);
> > > > 
> > > > to make it obvious why we crashed, however.  It makes it obvious that the 
> > > > crash is intentional rather than some error in the kernel code.
> > > 
> > > makes sense - that atleast makes it imediately clear from the code
> > > that there is no way out from here.
> > 
> > How does it differ from blowing up right there when dereferencing flags?
> > It would be clear from the oops.
> 
> The question is how soon does it blow-up if it were imediate then three is
> probably no real difference if there is some delay say due to the region
> affected by the NULL pointer not being imediately in use - it may be very
> hard to differenciate between an allocation failure and memory corruption
> so having a directly associated trace should be significantly simpler to
> understand - and you might actually not want a system to try booting if there
> are problems at this level.
>
sorry - you are right - it would blow up imediately - so there is no way this
could be delayed in this case. So then its just a matter of the code making
clear that the NULL case was considered - by a comment or by BUG_ON().

thx!
hofrat

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail
@ 2018-12-24 11:58           ` Nicholas Mc Guire
  0 siblings, 0 replies; 12+ messages in thread
From: Nicholas Mc Guire @ 2018-12-24 11:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Rientjes, Nicholas Mc Guire, Andrew Morton, Chintan Pandya,
	Andrey Ryabinin, Arun KS, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel

On Mon, Dec 24, 2018 at 10:38:04AM +0100, Nicholas Mc Guire wrote:
> On Mon, Dec 24, 2018 at 09:10:56AM +0100, Michal Hocko wrote:
> > On Sat 22-12-18 09:04:21, Nicholas Mc Guire wrote:
> > > On Fri, Dec 21, 2018 at 01:58:39PM -0800, David Rientjes wrote:
> > > > On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:
> > > > 
> > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > index 871e41c..1c118d7 100644
> > > > > --- a/mm/vmalloc.c
> > > > > +++ b/mm/vmalloc.c
> > > > > @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
> > > > >  
> > > > >  	/* Import existing vmlist entries. */
> > > > >  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> > > > > -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> > > > > +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
> > > > >  		va->flags = VM_VM_AREA;
> > > > >  		va->va_start = (unsigned long)tmp->addr;
> > > > >  		va->va_end = va->va_start + tmp->size;
> > > > 
> > > > Hi Nicholas,
> > > > 
> > > > You're right that this looks wrong because there's no guarantee that va is 
> > > > actually non-NULL.  __GFP_NOFAIL won't help in init, unfortunately, since 
> > > > we're not giving the page allocator a chance to reclaim so this would 
> > > > likely just end up looping forever instead of crashing with a NULL pointer 
> > > > dereference, which would actually be the better result.
> > > >
> > > tried tracing the __GFP_NOFAIL path and had concluded that it would
> > > end in out_of_memory() -> panic("System is deadlocked on memory\n");
> > > which also should point cleanly to the cause - but I�m actually not
> > > that sure if that trace was correct in all cases.
> > 
> > No, we do not trigger the memory reclaim path nor the oom killer when
> > using GFP_NOWAIT. In fact the current implementation even ignores
> > __GFP_NOFAIL AFAICS (so I was wrong about the endless loop but I suspect
> > that we used to loop fpr __GFP_NOFAIL at some point in the past). The
> > patch simply doesn't have any effect. But the primary objection is that
> > the behavior might change in future and you certainly do not want to get
> > stuck in the boot process without knowing what is going on. Crashing
> > will tell you that quite obviously. Although I have hard time imagine
> > how that could happen in a reasonably configured system.
> 
> I think most of the defensive structures are covering rare to almost
> impossible cases - but those are precisely the hard ones to understand if
> they do happen.
> 
> > 
> > > > You could do
> > > > 
> > > > 	BUG_ON(!va);
> > > > 
> > > > to make it obvious why we crashed, however.  It makes it obvious that the 
> > > > crash is intentional rather than some error in the kernel code.
> > > 
> > > makes sense - that atleast makes it imediately clear from the code
> > > that there is no way out from here.
> > 
> > How does it differ from blowing up right there when dereferencing flags?
> > It would be clear from the oops.
> 
> The question is how soon does it blow-up if it were imediate then three is
> probably no real difference if there is some delay say due to the region
> affected by the NULL pointer not being imediately in use - it may be very
> hard to differenciate between an allocation failure and memory corruption
> so having a directly associated trace should be significantly simpler to
> understand - and you might actually not want a system to try booting if there
> are problems at this level.
>
sorry - you are right - it would blow up imediately - so there is no way this
could be delayed in this case. So then its just a matter of the code making
clear that the NULL case was considered - by a comment or by BUG_ON().

thx!
hofrat

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-12-24 11:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-20 20:23 [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail Nicholas Mc Guire
2018-12-21 15:53 ` Michal Hocko
2018-12-21 21:58 ` David Rientjes
2018-12-21 21:58   ` David Rientjes
2018-12-22  8:04   ` Nicholas Mc Guire
2018-12-22  8:04     ` Nicholas Mc Guire
2018-12-24  8:10     ` Michal Hocko
2018-12-24  8:10       ` Michal Hocko
2018-12-24  9:38       ` Nicholas Mc Guire
2018-12-24  9:38         ` Nicholas Mc Guire
2018-12-24 11:58         ` Nicholas Mc Guire
2018-12-24 11:58           ` Nicholas Mc Guire

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.