linux-kernel.vger.kernel.org archive mirror
* [PATCH] print vmalloc() state after allocation failures
@ 2011-04-07 17:23 Dave Hansen
  2011-04-07 22:11 ` David Rientjes
  2011-04-08  0:19 ` Johannes Weiner
  0 siblings, 2 replies; 4+ messages in thread
From: Dave Hansen @ 2011-04-07 17:23 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, Andrew Morton, Dave Hansen


I was tracking down a page allocation failure that ended up in vmalloc().
Since vmalloc() uses 0-order pages, if somebody asks for an insane amount
of memory, we'll still get a warning with "order:0" in it.  That's not
very useful.

During recovery, vmalloc() also nicely frees all of the memory that it
got up to the point of the failure.  That is wonderful, but it also
quickly hides any issues.  We have a much different situation if vmalloc()
repeatedly fails 10GB in to:

	vmalloc(100 * 1<<30);

versus repeatedly failing 4096 bytes in to a:

	vmalloc(8192);

This will print out messages that look like this:

[   30.040774] bash: vmalloc failure allocating after 0 / 73728 bytes
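
To make the two numbers in such a message easier to read: the total is
not the size passed to vmalloc() but the page-aligned request plus a
guard page, and the first number counts whole pages obtained before the
failure.  The following is only a userspace sketch of that arithmetic,
assuming 4 KiB pages and a single trailing guard page; the sketch_*
names are made up for illustration and are not kernel functions.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. The real PAGE_SIZE depends on the architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/* Round a byte count up to a whole number of pages. */
unsigned long sketch_page_align(unsigned long bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Size of the virtual area: the aligned request plus one guard page. */
unsigned long sketch_area_size(unsigned long request)
{
	return sketch_page_align(request) + SKETCH_PAGE_SIZE;
}

/* Bytes backed by real pages if the allocation failed after pages_done pages. */
unsigned long sketch_bytes_allocated(unsigned long request,
				     unsigned long pages_done)
{
	unsigned long wanted = sketch_page_align(request) / SKETCH_PAGE_SIZE;

	/* A failure cannot happen after more pages than were requested. */
	assert(pages_done <= wanted);
	return pages_done * SKETCH_PAGE_SIZE;
}
```

Under those assumptions, a reported total of 73728 bytes would correspond
to a request of anywhere from 65537 to 69632 bytes, and vmalloc(8192)
would show up with a total of 12288.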

As a side issue, I also noticed that ctl_ioctl() does vmalloc() based
solely on an unverified value passed in from userspace.  Granted, it's
under CAP_SYS_ADMIN, but it still frightens me a bit.

multipathd: page allocation failure. order:0, mode:0xd2
Call Trace:
[c0000000f34ef570] [c000000000012d84] .show_stack+0x74/0x1c0 (unreliable)
[c0000000f34ef620] [c000000000159ed4] .__alloc_pages_nodemask+0x574/0x830
[c0000000f34ef7a0] [c00000000019306c] .alloc_pages_current+0x8c/0x110
[c0000000f34ef840] [c000000000183bdc] .__vmalloc_area_node+0x17c/0x220
[c0000000f34ef900] [d00000000132bb24] .copy_params+0x74/0xc0 [dm_mod]
[c0000000f34efad0] [d00000000132bcec] .ctl_ioctl+0x17c/0x2c0 [dm_mod]
[c0000000f34efb90] [d00000000132be48] .dm_ctl_ioctl+0x18/0x30 [dm_mod]
[c0000000f34efc00] [c0000000001c4ee4] .vfs_ioctl+0x54/0x140
[c0000000f34efc90] [c0000000001c5130] .do_vfs_ioctl+0x90/0x7c0
[c0000000f34efd80] [c0000000001c5914] .SyS_ioctl+0xb4/0xd0
[c0000000f34efe30] [c00000000000852c] syscall_exit+0x0/0x40
Mem-Info:
Node 0 DMA per-cpu:
...


Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/mm/vmalloc.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff -puN mm/vmalloc.c~vmalloc-warn mm/vmalloc.c
--- linux-2.6.git/mm/vmalloc.c~vmalloc-warn	2011-04-07 10:21:27.792401938 -0700
+++ linux-2.6.git-dave/mm/vmalloc.c	2011-04-07 10:21:27.800401934 -0700
@@ -1579,6 +1579,18 @@ static void *__vmalloc_area_node(struct 
 	return area->addr;
 
 fail:
+	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
+		/*
+		 * We probably did a show_mem() and a stack dump above
+		 * inside of alloc_page*().  This is only so we can
+		 * tell how big the vmalloc() really was.  This will
+		 * also not be exactly the same as what was passed
+		 * to vmalloc() due to alignment and the guard page.
+		 */
+		printk(KERN_WARNING "%s: vmalloc: allocation failure, "
+			"allocated %ld of %ld bytes\n", current->comm,
+			(area->nr_pages*PAGE_SIZE), area->size);
+	}
 	vfree(area->addr);
 	return NULL;
 }
_


* Re: [PATCH] print vmalloc() state after allocation failures
  2011-04-07 17:23 [PATCH] print vmalloc() state after allocation failures Dave Hansen
@ 2011-04-07 22:11 ` David Rientjes
  2011-04-08  0:19 ` Johannes Weiner
  1 sibling, 0 replies; 4+ messages in thread
From: David Rientjes @ 2011-04-07 22:11 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, Andrew Morton

On Thu, 7 Apr 2011, Dave Hansen wrote:

> 
> I was tracking down a page allocation failure that ended up in vmalloc().
> Since vmalloc() uses 0-order pages, if somebody asks for an insane amount
> of memory, we'll still get a warning with "order:0" in it.  That's not
> very useful.
> 
> During recovery, vmalloc() also nicely frees all of the memory that it
> got up to the point of the failure.  That is wonderful, but it also
> quickly hides any issues.  We have a much different situation if vmalloc()
> repeatedly fails 10GB in to:
> 
> 	vmalloc(100 * 1<<30);
> 
> versus repeatedly failing 4096 bytes in to a:
> 
> 	vmalloc(8192);
> 
> This will print out messages that look like this:
> 
> [   30.040774] bash: vmalloc failure allocating after 0 / 73728 bytes
> 

Won't it print "bash: vmalloc: allocation failure, allocated 0 of 73728 
bytes" instead?

> As a side issue, I also noticed that ctl_ioctl() does vmalloc() based
> solely on an unverified value passed in from userspace.  Granted, it's
> under CAP_SYS_ADMIN, but it still frightens me a bit.
> 
> multipathd: page allocation failure. order:0, mode:0xd2
> Call Trace:
> [c0000000f34ef570] [c000000000012d84] .show_stack+0x74/0x1c0 (unreliable)
> [c0000000f34ef620] [c000000000159ed4] .__alloc_pages_nodemask+0x574/0x830
> [c0000000f34ef7a0] [c00000000019306c] .alloc_pages_current+0x8c/0x110
> [c0000000f34ef840] [c000000000183bdc] .__vmalloc_area_node+0x17c/0x220
> [c0000000f34ef900] [d00000000132bb24] .copy_params+0x74/0xc0 [dm_mod]
> [c0000000f34efad0] [d00000000132bcec] .ctl_ioctl+0x17c/0x2c0 [dm_mod]
> [c0000000f34efb90] [d00000000132be48] .dm_ctl_ioctl+0x18/0x30 [dm_mod]
> [c0000000f34efc00] [c0000000001c4ee4] .vfs_ioctl+0x54/0x140
> [c0000000f34efc90] [c0000000001c5130] .do_vfs_ioctl+0x90/0x7c0
> [c0000000f34efd80] [c0000000001c5914] .SyS_ioctl+0xb4/0xd0
> [c0000000f34efe30] [c00000000000852c] syscall_exit+0x0/0x40
> Mem-Info:
> Node 0 DMA per-cpu:
> ...
> 
> 
> Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
> ---
> 
>  linux-2.6.git-dave/mm/vmalloc.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff -puN mm/vmalloc.c~vmalloc-warn mm/vmalloc.c
> --- linux-2.6.git/mm/vmalloc.c~vmalloc-warn	2011-04-07 10:21:27.792401938 -0700
> +++ linux-2.6.git-dave/mm/vmalloc.c	2011-04-07 10:21:27.800401934 -0700
> @@ -1579,6 +1579,18 @@ static void *__vmalloc_area_node(struct 
>  	return area->addr;
>  
>  fail:
> +	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
> +		/*
> +		 * We probably did a show_mem() and a stack dump above
> +		 * inside of alloc_page*().  This is only so we can
> +		 * tell how big the vmalloc() really was.  This will
> +		 * also not be exactly the same as what was passed
> +		 * to vmalloc() due to alignment and the guard page.
> +		 */
> +		printk(KERN_WARNING "%s: vmalloc: allocation failure, "
> +			"allocated %ld of %ld bytes\n", current->comm,
> +			(area->nr_pages*PAGE_SIZE), area->size);
> +	}
>  	vfree(area->addr);
>  	return NULL;
>  }

Looks good.

Acked-by: David Rientjes <rientjes@google.com>

__vmalloc_area_node() can also be moved into __vmalloc_node_range() since 
that's its only caller if you're interested.


* Re: [PATCH] print vmalloc() state after allocation failures
  2011-04-07 17:23 [PATCH] print vmalloc() state after allocation failures Dave Hansen
  2011-04-07 22:11 ` David Rientjes
@ 2011-04-08  0:19 ` Johannes Weiner
  2011-04-08 13:23   ` Dave Hansen
  1 sibling, 1 reply; 4+ messages in thread
From: Johannes Weiner @ 2011-04-08  0:19 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, Andrew Morton

On Thu, Apr 07, 2011 at 10:23:02AM -0700, Dave Hansen wrote:
> Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>

I agree with this in general, but have some nitpicks.

> @@ -1579,6 +1579,18 @@ static void *__vmalloc_area_node(struct 
>  	return area->addr;
>  
>  fail:
> +	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {

There is a comment above the declaration of printk_ratelimit:

/*
 * Please don't use printk_ratelimit(), because it shares ratelimiting state
 * with all other unrelated printk_ratelimit() callsites.  Instead use
 * printk_ratelimited() or plain old __ratelimit().
 */

I realize that the page allocator does it the same way, but I think it
should probably be fixed in there, rather than spread any further.
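
A rough userspace illustration of the concern in that comment: when
every call site draws from one shared ratelimit state, an unrelated
noisy caller can use up the budget and silence the vmalloc() message,
whereas a call site with its own state is unaffected.  This does not use
the kernel's ratelimit code; struct sketch_ratelimit and every function
below are hypothetical stand-ins, and time is passed in as a plain tick
counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a ratelimit state; not the kernel structure. */
struct sketch_ratelimit {
	long interval;	/* window length, in arbitrary ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* messages let through in the current window */
};

/* Returns true if a message may be printed at tick `now`. */
bool sketch_ratelimit_ok(struct sketch_ratelimit *rs, long now)
{
	assert(rs->interval > 0 && rs->burst > 0);

	/* Start a fresh window once the old one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed >= rs->burst)
		return false;
	rs->printed++;
	return true;
}

/* One state shared by unrelated callers: the pattern being warned about. */
static struct sketch_ratelimit shared_state = { 5, 2, 0, 0 };

bool noisy_driver_may_print(long now)
{
	return sketch_ratelimit_ok(&shared_state, now);
}

bool vmalloc_may_print_shared(long now)
{
	return sketch_ratelimit_ok(&shared_state, now);
}

/* The same check, but with state private to this one call site. */
bool vmalloc_may_print_private(long now)
{
	static struct sketch_ratelimit own_state = { 5, 2, 0, 0 };

	return sketch_ratelimit_ok(&own_state, now);
}
```

With a burst of 2 per 5 ticks, two calls to noisy_driver_may_print()
are enough to make vmalloc_may_print_shared() return false for the rest
of the window, while vmalloc_may_print_private() still returns true.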

> +		/*
> +		 * We probably did a show_mem() and a stack dump above
> +		 * inside of alloc_page*().  This is only so we can
> +		 * tell how big the vmalloc() really was.  This will
> +		 * also not be exactly the same as what was passed
> +		 * to vmalloc() due to alignment and the guard page.
> +		 */
> +		printk(KERN_WARNING "%s: vmalloc: allocation failure, "
> +			"allocated %ld of %ld bytes\n", current->comm,
> +			(area->nr_pages*PAGE_SIZE), area->size);
> +	}

To me, this does not look like something that should just be appended
to the whole pile spewed out by dump_stack() and show_mem().  What do
you think about doing the page allocation with __GFP_NOWARN and have
the full report come from this place, with the line you introduce as
leader?

	Hannes
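
One way to picture that suggestion, as a userspace simulation only: the
outer allocator always passes a no-warn flag down to the per-page
allocation and, on failure, emits a single report of its own unless the
caller asked for silence.  SKETCH_NOWARN, sketch_alloc_page() and
sketch_vmalloc() are invented for this sketch and merely stand in for
__GFP_NOWARN and the real allocation paths; the counters exist only so
the behaviour can be observed.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_NOWARN 0x1u	/* hypothetical stand-in for __GFP_NOWARN */

static int reports_emitted;		/* failure reports printed so far */
static unsigned long pages_available;	/* simulated pool of free pages */

/* Simulated order-0 page allocation: 1 on success, 0 on failure. */
int sketch_alloc_page(unsigned int flags)
{
	if (pages_available > 0) {
		pages_available--;
		return 1;
	}
	if (!(flags & SKETCH_NOWARN)) {
		printf("page allocation failure. order:0\n");
		reports_emitted++;
	}
	return 0;
}

/* Simulated vmalloc of nr_pages pages: 1 on success, 0 on failure. */
int sketch_vmalloc(unsigned long nr_pages, unsigned int flags)
{
	unsigned long got;

	assert(nr_pages > 0);
	for (got = 0; got < nr_pages; got++) {
		/* Silence the per-page warning; the report comes from here. */
		if (!sketch_alloc_page(flags | SKETCH_NOWARN))
			goto fail;
	}
	return 1;

fail:
	if (!(flags & SKETCH_NOWARN)) {
		printf("vmalloc: allocation failure, allocated %lu of %lu pages\n",
		       got, nr_pages);
		reports_emitted++;
	}
	/* Give back whatever was obtained before the failure. */
	pages_available += got;
	return 0;
}
```

In this arrangement a failed sketch_vmalloc() produces exactly one
report, led by the size line, and none at all when the caller passes
SKETCH_NOWARN.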


* Re: [PATCH] print vmalloc() state after allocation failures
  2011-04-08  0:19 ` Johannes Weiner
@ 2011-04-08 13:23   ` Dave Hansen
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Hansen @ 2011-04-08 13:23 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: linux-mm, linux-kernel, Andrew Morton

On Thu, 2011-04-07 at 17:19 -0700, Johannes Weiner wrote:
> On Thu, Apr 07, 2011 at 10:23:02AM -0700, Dave Hansen wrote:
> > @@ -1579,6 +1579,18 @@ static void *__vmalloc_area_node(struct 
> >  	return area->addr;
> >  
> >  fail:
> > +	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
> 
> There is a comment above the declaration of printk_ratelimit:
> 
> /*
>  * Please don't use printk_ratelimit(), because it shares ratelimiting state
>  * with all other unrelated printk_ratelimit() callsites.  Instead use
>  * printk_ratelimited() or plain old __ratelimit().
>  */
> 
> I realize that the page allocator does it the same way, but I think it
> should probably be fixed in there, rather than spread any further.

You're the second person to mention this.  I should have listened the
first time. :)  I'll fix it up and repost.

> > +		/*
> > +		 * We probably did a show_mem() and a stack dump above
> > +		 * inside of alloc_page*().  This is only so we can
> > +		 * tell how big the vmalloc() really was.  This will
> > +		 * also not be exactly the same as what was passed
> > +		 * to vmalloc() due to alignment and the guard page.
> > +		 */
> > +		printk(KERN_WARNING "%s: vmalloc: allocation failure, "
> > +			"allocated %ld of %ld bytes\n", current->comm,
> > +			(area->nr_pages*PAGE_SIZE), area->size);
> > +	}
> 
> To me, this does not look like something that should just be appended
> to the whole pile spewed out by dump_stack() and show_mem().  What do
> you think about doing the page allocation with __GFP_NOWARN and have
> the full report come from this place, with the line you introduce as
> leader?

That sounds fine to me.  

-- Dave

