* suppress page allocation failure warnings from sys_listxattr
@ 2012-03-13 18:22 Dave Jones
  2012-03-13 21:33 ` Colin Walters
  2012-03-27 22:51 ` Andrew Morton
  0 siblings, 2 replies; 20+ messages in thread
From: Dave Jones @ 2012-03-13 18:22 UTC (permalink / raw)
  To: viro; +Cc: Linux Kernel

This size is user controllable, and so it's trivial for someone to trigger a
stream of order:4 page allocation errors.

Signed-off-by: Dave Jones <davej@redhat.com>

--- 
There's also a similar problem in setxattr, but I'm not sure how we want
to pass NOWARN down to memdup_user. Thoughts ?

diff --git a/fs/xattr.c b/fs/xattr.c
index 82f4337..544df90 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -496,7 +496,7 @@ listxattr(struct dentry *d, char __user *list, size_t size)
 	if (size) {
 		if (size > XATTR_LIST_MAX)
 			size = XATTR_LIST_MAX;
-		klist = kmalloc(size, GFP_KERNEL);
+		klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL);
 		if (!klist)
 			return -ENOMEM;
 	}

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-13 18:22 suppress page allocation failure warnings from sys_listxattr Dave Jones
@ 2012-03-13 21:33 ` Colin Walters
  2012-03-27 22:51 ` Andrew Morton
  1 sibling, 0 replies; 20+ messages in thread
From: Colin Walters @ 2012-03-13 21:33 UTC (permalink / raw)
  To: Dave Jones; +Cc: viro, Linux Kernel

On Tue, 2012-03-13 at 14:22 -0400, Dave Jones wrote:
> This size is user controllable, and so it's trivial for someone to trigger a
> stream of order:4 page allocation errors.

I spent some time today struggling with an order:4 allocation failure
(my application uses CLONE_NEWNET to make an empty network stack for
software builds, and apparently one of the netfilter caches requires
this).

But is that the general principle, that we just add __GFP_NOWARN if the
allocation size is trivially user controllable?  I guess examples like
fs/pipe.c:pipe_set_size() agree with you, but it feels kind of like
it's papering over the problem...





* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-13 18:22 suppress page allocation failure warnings from sys_listxattr Dave Jones
  2012-03-13 21:33 ` Colin Walters
@ 2012-03-27 22:51 ` Andrew Morton
  2012-03-28  0:15   ` Dave Jones
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2012-03-27 22:51 UTC (permalink / raw)
  To: Dave Jones; +Cc: viro, Linux Kernel

On Tue, 13 Mar 2012 14:22:20 -0400
Dave Jones <davej@redhat.com> wrote:

> This size is user controllable, and so it's trivial for someone to trigger a
> stream of order:4 page allocation errors.
> 
> Signed-off-by: Dave Jones <davej@redhat.com>
> 
> --- 
> There's also a similar problem in setxattr, but I'm not sure how we want
> to pass NOWARN down to memdup_user. Thoughts ?
> 
> diff --git a/fs/xattr.c b/fs/xattr.c
> index 82f4337..544df90 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -496,7 +496,7 @@ listxattr(struct dentry *d, char __user *list, size_t size)
>  	if (size) {
>  		if (size > XATTR_LIST_MAX)
>  			size = XATTR_LIST_MAX;
> -		klist = kmalloc(size, GFP_KERNEL);
> +		klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL);
>  		if (!klist)
>  			return -ENOMEM;
>  	}

hm.  The patch is good, but one would hope that it isn't "trivial" to
trigger a page allocation failure for a kmalloc(65536, GFP_KERNEL) -
the VM is supposed to be able to handle that.

Is it really *that* easy, or is Something Unusual happening with that
machine?
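
For reference, the "order:4" in the report and the kmalloc(65536) above are the same thing once a page size is fixed. A minimal userspace sketch of that arithmetic follows, assuming 4 KiB pages; SKETCH_PAGE_SIZE and sketch_size_to_order() are hypothetical names used only for illustration, and the loop is not the kernel's get_order() implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. Other architectures/configs may differ. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Return the smallest order such that (SKETCH_PAGE_SIZE << order) >= size.
 * Hypothetical helper for illustration; not kernel code.
 */
unsigned int sketch_size_to_order(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	assert(size > 0);
	while (span < size) {
		span <<= 1;	/* each order doubles the contiguous span */
		order++;
	}
	return order;
}
```

Under that assumption a 65536-byte request needs 16 physically contiguous pages, i.e. order 4, which is one above the PAGE_ALLOC_COSTLY_ORDER value of 3 quoted later in the thread.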


* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-27 22:51 ` Andrew Morton
@ 2012-03-28  0:15   ` Dave Jones
  2012-03-28  0:26     ` Andrew Morton
  2012-03-28  4:39     ` Dave Chinner
  0 siblings, 2 replies; 20+ messages in thread
From: Dave Jones @ 2012-03-28  0:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: viro, Linux Kernel

On Tue, Mar 27, 2012 at 03:51:49PM -0700, Andrew Morton wrote:
 > On Tue, 13 Mar 2012 14:22:20 -0400
 > Dave Jones <davej@redhat.com> wrote:
 > 
 > > This size is user controllable, and so it's trivial for someone to trigger a
 > > stream of order:4 page allocation errors.
 > > 
 > > Signed-off-by: Dave Jones <davej@redhat.com>
 > > 
 > > --- 
 > > There's also a similar problem in setxattr, but I'm not sure how we want
 > > to pass NOWARN down to memdup_user. Thoughts ?
 > > 
 > > diff --git a/fs/xattr.c b/fs/xattr.c
 > > index 82f4337..544df90 100644
 > > --- a/fs/xattr.c
 > > +++ b/fs/xattr.c
 > > @@ -496,7 +496,7 @@ listxattr(struct dentry *d, char __user *list, size_t size)
 > >  	if (size) {
 > >  		if (size > XATTR_LIST_MAX)
 > >  			size = XATTR_LIST_MAX;
 > > -		klist = kmalloc(size, GFP_KERNEL);
 > > +		klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL);
 > >  		if (!klist)
 > >  			return -ENOMEM;
 > >  	}
 > 
 > hm.  The patch is good, but one would hope that it isn't "trivial" to
 > trigger a page allocation failure for a kmalloc(65536, GFP_KERNEL) -
 > the VM is supposed to be able to handle that.
 > 
 > Is it really *that* easy, or is Something Unusual happening with that
 > machine?

Well, the unusual thing was that I was fuzzing system calls for a few hours.

My fuzzing tool was able to trigger these very easily after an hour or two
of uptime and memory had fragmented a little, so yeah, quite trivial.

	Dave


* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-28  0:15   ` Dave Jones
@ 2012-03-28  0:26     ` Andrew Morton
  2012-03-28  7:13       ` David Rientjes
  2012-03-28  4:39     ` Dave Chinner
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2012-03-28  0:26 UTC (permalink / raw)
  To: Dave Jones; +Cc: viro, Linux Kernel

On Tue, 27 Mar 2012 20:15:50 -0400 Dave Jones <davej@redhat.com> wrote:

> On Tue, Mar 27, 2012 at 03:51:49PM -0700, Andrew Morton wrote:
>  > On Tue, 13 Mar 2012 14:22:20 -0400
>  > Dave Jones <davej@redhat.com> wrote:
>  > 
>  > > This size is user controllable, and so it's trivial for someone to trigger a
>  > > stream of order:4 page allocation errors.
>  > > 
>  > > Signed-off-by: Dave Jones <davej@redhat.com>
>  > > 
>  > > --- 
>  > > There's also a similar problem in setxattr, but I'm not sure how we want
>  > > to pass NOWARN down to memdup_user. Thoughts ?
>  > > 
>  > > diff --git a/fs/xattr.c b/fs/xattr.c
>  > > index 82f4337..544df90 100644
>  > > --- a/fs/xattr.c
>  > > +++ b/fs/xattr.c
>  > > @@ -496,7 +496,7 @@ listxattr(struct dentry *d, char __user *list, size_t size)
>  > >  	if (size) {
>  > >  		if (size > XATTR_LIST_MAX)
>  > >  			size = XATTR_LIST_MAX;
>  > > -		klist = kmalloc(size, GFP_KERNEL);
>  > > +		klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL);
>  > >  		if (!klist)
>  > >  			return -ENOMEM;
>  > >  	}
>  > 
>  > hm.  The patch is good, but one would hope that it isn't "trivial" to
>  > trigger a page allocation failure for a kmalloc(65536, GFP_KERNEL) -
>  > the VM is supposed to be able to handle that.
>  > 
>  > Is it really *that* easy, or is Something Unusual happening with that
>  > machine?
> 
> Well, the unusual thing was that I was fuzzing system calls for a few hours.
> 
> My fuzzing tool was able to trigger these very easily after an hour or two
> of uptime and memory had fragmented a little, so yeah, quite trivial.
> 

/*
 * PAGE_ALLOC_COSTLY_ORDER is the order at which allocations are deemed
 * costly to service.  That is between allocation orders which should
 * coalesce naturally under reasonable reclaim pressure and those which
 * will not.
 */
#define PAGE_ALLOC_COSTLY_ORDER 3


Death to magic numbers :(


* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-28  0:15   ` Dave Jones
  2012-03-28  0:26     ` Andrew Morton
@ 2012-03-28  4:39     ` Dave Chinner
  2012-03-28 23:47       ` Andrew Morton
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2012-03-28  4:39 UTC (permalink / raw)
  To: Dave Jones, Andrew Morton, viro, Linux Kernel

On Tue, Mar 27, 2012 at 08:15:50PM -0400, Dave Jones wrote:
> On Tue, Mar 27, 2012 at 03:51:49PM -0700, Andrew Morton wrote:
>  > On Tue, 13 Mar 2012 14:22:20 -0400
>  > Dave Jones <davej@redhat.com> wrote:
>  > 
>  > > This size is user controllable, and so it's trivial for someone to trigger a
>  > > stream of order:4 page allocation errors.
>  > > 
>  > > Signed-off-by: Dave Jones <davej@redhat.com>
>  > > 
>  > > --- 
>  > > There's also a similar problem in setxattr, but I'm not sure how we want
>  > > to pass NOWARN down to memdup_user. Thoughts ?
>  > > 
>  > > diff --git a/fs/xattr.c b/fs/xattr.c
>  > > index 82f4337..544df90 100644
>  > > --- a/fs/xattr.c
>  > > +++ b/fs/xattr.c
>  > > @@ -496,7 +496,7 @@ listxattr(struct dentry *d, char __user *list, size_t size)
>  > >  	if (size) {
>  > >  		if (size > XATTR_LIST_MAX)
>  > >  			size = XATTR_LIST_MAX;
>  > > -		klist = kmalloc(size, GFP_KERNEL);
>  > > +		klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL);
>  > >  		if (!klist)
>  > >  			return -ENOMEM;
>  > >  	}
>  > 
>  > hm.  The patch is good, but one would hope that it isn't "trivial" to
>  > trigger a page allocation failure for a kmalloc(65536, GFP_KERNEL) -
>  > the VM is supposed to be able to handle that.
>  > 
>  > Is it really *that* easy, or is Something Unusual happening with that
>  > machine?
> 
> Well, the unusual thing was that I was fuzzing system calls for a few hours.
> 
> My fuzzing tool was able to trigger these very easily after an hour or two
> of uptime and memory had fragmented a little, so yeah, quite trivial.

We've recently been seeing reports of xfsdump triggering similar
allocation failures in the XFS attr code when we are doing hundreds
of thousands of attribute lookups to back them up.

ad650f5 xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get

I think that falling back to vmalloc here is a much better solution
than failing to retrieve the attribute - it will work no matter how
fragmented memory gets. That means we don't get incomplete
backups occurring after days or months of uptime and successful
backups...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-28  0:26     ` Andrew Morton
@ 2012-03-28  7:13       ` David Rientjes
  0 siblings, 0 replies; 20+ messages in thread
From: David Rientjes @ 2012-03-28  7:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave Jones, viro, Linux Kernel

On Tue, 27 Mar 2012, Andrew Morton wrote:

> /*
>  * PAGE_ALLOC_COSTLY_ORDER is the order at which allocations are deemed
>  * costly to service.  That is between allocation orders which should
>  * coalesce naturally under reasonable reclaim pressure and those which
>  * will not.
>  */
> #define PAGE_ALLOC_COSTLY_ORDER 3
> 
> 
> Death to magic numbers :(

This isn't as dire as it sounds: memory compaction is specifically
targeted to run when the order is greater than this, see
compaction_ready().  If direct reclaim and compaction both fail then
there's nothing the VM can do other than oom kill to free memory, and we
avoid doing that because there's no guarantee of freeing enough memory
that the high-order allocation will be successful.

Not even __GFP_REPEAT is going to be helpful since we can't oom kill
anything; the only alternative would be to use __GFP_NOFAIL and that would
just be deadly for such an allocation request.  This error is recoverable,
so

Acked-by: David Rientjes <rientjes@google.com>

to the patch.


* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-28  4:39     ` Dave Chinner
@ 2012-03-28 23:47       ` Andrew Morton
  2012-03-29  0:54         ` Dave Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2012-03-28 23:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Dave Jones, viro, Linux Kernel, David Rientjes

On Wed, 28 Mar 2012 15:39:51 +1100
Dave Chinner <david@fromorbit.com> wrote:

> > Well, the unusual thing was that I was fuzzing system calls for a few hours.
> > 
> > My fuzzing tool was able to trigger these very easily after an hour or two
> > of uptime and memory had fragmented a little, so yeah, quite trivial.
> 
> We've recently been seeing reports of xfsdump triggering similar
> allocation failures in the XFS attr code when we are doing hundreds
> of thousands of attribute lookups to back them up.
> 
> ad650f5 xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get
> 
> I think that falling back to vmalloc here is a much better solution
> than failing to retrieve the attribute - it will work no matter how
> fragmented memory gets. That means we don't get incomplete
> backups occurring after days or months of uptime and successful
> backups...

Yup.  How does the below look?

This patch needs more Davids.


From: Andrew Morton <akpm@linux-foundation.org>
Subject: fs/xattr.c:listxattr(): fall back to vmalloc() if kmalloc() failed

This allocation can be as large as 64k.  As David points out, "falling
back to vmalloc here is a much better solution than failing to retrieve the
attribute - it will work no matter how fragmented memory gets.  That means
we don't get incomplete backups occurring after days or months of uptime
and successful backups".

Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/xattr.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff -puN fs/xattr.c~fs-xattrc-listxattr-fall-back-to-vmalloc-if-kmalloc-failed fs/xattr.c
--- a/fs/xattr.c~fs-xattrc-listxattr-fall-back-to-vmalloc-if-kmalloc-failed
+++ a/fs/xattr.c
@@ -492,13 +492,18 @@ listxattr(struct dentry *d, char __user 
 {
 	ssize_t error;
 	char *klist = NULL;
+	char *vlist = NULL;	/* If non-NULL, we used vmalloc() */
 
 	if (size) {
 		if (size > XATTR_LIST_MAX)
 			size = XATTR_LIST_MAX;
 		klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL);
-		if (!klist)
-			return -ENOMEM;
+		if (!klist) {
+			vlist = vmalloc(size);
+			if (!vlist)
+				return -ENOMEM;
+			klist = vlist;
+		}
 	}
 
 	error = vfs_listxattr(d, klist, size);
@@ -510,7 +515,10 @@ listxattr(struct dentry *d, char __user 
 		   than XATTR_LIST_MAX bytes. Not possible. */
 		error = -E2BIG;
 	}
-	kfree(klist);
+	if (vlist)
+		vfree(vlist);
+	else
+		kfree(klist);
 	return error;
 }
 
_



* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-28 23:47       ` Andrew Morton
@ 2012-03-29  0:54         ` Dave Jones
  2012-03-29  1:10           ` Andrew Morton
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Jones @ 2012-03-29  0:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave Chinner, viro, Linux Kernel, David Rientjes

On Wed, Mar 28, 2012 at 04:47:20PM -0700, Andrew Morton wrote:
 > On Wed, 28 Mar 2012 15:39:51 +1100
 > Dave Chinner <david@fromorbit.com> wrote:
 > 
 > > > Well, the unusual thing was that I was fuzzing system calls for a few hours.
 > > > 
 > > > My fuzzing tool was able to trigger these very easily after an hour or two
 > > > of uptime and memory had fragmented a little, so yeah, quite trivial.
 > > 
 > > We've recently been seeing reports of xfsdump triggering similar
 > > allocation failures in the XFS attr code when we are doing hundreds
 > > of thousands of attribute lookups to back them up.
 > > 
 > > ad650f5 xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get
 > > 
 > > I think that falling back to vmalloc here is a much better solution
 > > than failing to retrieve the attribute - it will work no matter how
 > > fragmented memory gets. That means we don't get incomplete
 > > backups occurring after days or months of uptime and successful
 > > backups...
 > 
 > Yup.  How does the below look?

Don't see anything immediately wrong with it.
Any thoughts on what to do about the similar problem in setxattr ? (memdup_user)

	Dave


* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-29  0:54         ` Dave Jones
@ 2012-03-29  1:10           ` Andrew Morton
  2012-03-29  1:28             ` Joe Perches
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2012-03-29  1:10 UTC (permalink / raw)
  To: Dave Jones; +Cc: Dave Chinner, viro, Linux Kernel, David Rientjes

On Wed, 28 Mar 2012 20:54:42 -0400 Dave Jones <davej@redhat.com> wrote:

> On Wed, Mar 28, 2012 at 04:47:20PM -0700, Andrew Morton wrote:
>  > On Wed, 28 Mar 2012 15:39:51 +1100
>  > Dave Chinner <david@fromorbit.com> wrote:
>  > 
>  > > > Well, the unusual thing was that I was fuzzing system calls for a few hours.
>  > > > 
>  > > > My fuzzing tool was able to trigger these very easily after an hour or two
>  > > > of uptime and memory had fragmented a little, so yeah, quite trivial.
>  > > 
>  > > We've recently been seeing reports of xfsdump triggering similar
>  > > allocation failures in the XFS attr code when we are doing hundreds
>  > > of thousands of attribute lookups to back them up.
>  > > 
>  > > ad650f5 xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get
>  > > 
>  > > I think that falling back to vmalloc here is a much better solution
>  > > than failing to retrieve the attribute - it will work no matter how
>  > > fragmented memory gets. That means we don't get incomplete
>  > > backups occurring after days or months of uptime and successful
>  > > backups...
>  > 
>  > Yup.  How does the below look?
> 
> Don't see anything immediately wrong with it.
> Any thoughts on what to do about the similar problem in setxattr ? (memdup_user)
> 

I can't think of anything clever.

The dumb approach:

From: Andrew Morton <akpm@linux-foundation.org>
Subject: fs/xattr.c:setxattr(): improve handling of allocation failures

This allocation can be as large as 64k.

- Add __GFP_NOWARN so that a failed kmalloc() is silent

- Fall back to vmalloc() if the kmalloc() failed

Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/xattr.c |   21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff -puN fs/xattr.c~fs-xattrc-setxattr-improve-handling-of-allocation-failures fs/xattr.c
--- a/fs/xattr.c~fs-xattrc-setxattr-improve-handling-of-allocation-failures
+++ a/fs/xattr.c
@@ -320,6 +320,7 @@ setxattr(struct dentry *d, const char __
 {
 	int error;
 	void *kvalue = NULL;
+	void *vvalue = NULL;	/* If non-NULL, we used vmalloc() */
 	char kname[XATTR_NAME_MAX + 1];
 
 	if (flags & ~(XATTR_CREATE|XATTR_REPLACE))
@@ -334,13 +335,25 @@ setxattr(struct dentry *d, const char __
 	if (size) {
 		if (size > XATTR_SIZE_MAX)
 			return -E2BIG;
-		kvalue = memdup_user(value, size);
-		if (IS_ERR(kvalue))
-			return PTR_ERR(kvalue);
+		kvalue = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
+		if (!kvalue) {
+			vvalue = vmalloc(size);
+			if (!vvalue)
+				return -ENOMEM;
+			kvalue = vvalue;
+		}
+		if (copy_from_user(kvalue, value, size)) {
+			error = -EFAULT;
+			goto out;
+		}
 	}
 
 	error = vfs_setxattr(d, kname, kvalue, size, flags);
-	kfree(kvalue);
+out:
+	if (vvalue)
+		vfree(vvalue);
+	else
+		kfree(kvalue);
 	return error;
 }
 
_



* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-29  1:10           ` Andrew Morton
@ 2012-03-29  1:28             ` Joe Perches
  2012-03-29  1:46               ` Andrew Morton
  0 siblings, 1 reply; 20+ messages in thread
From: Joe Perches @ 2012-03-29  1:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Jones, Dave Chinner, viro, Linux Kernel, David Rientjes

On Wed, 2012-03-28 at 18:10 -0700, Andrew Morton wrote:
> On Wed, 28 Mar 2012 20:54:42 -0400 Dave Jones <davej@redhat.com> wrote:
> >  > Yup.  How does the below look?
> > Don't see anything immediately wrong with it.
> > Any thoughts on what to do about the similar problem in setxattr ? (memdup_user)
[]
> diff -puN fs/xattr.c~fs-xattrc-setxattr-improve-handling-of-allocation-failures fs/xattr.c
[]
> @@ -334,13 +335,25 @@ setxattr(struct dentry *d, const char __
[]
> +		kvalue = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> +		if (!kvalue) {
> +			vvalue = vmalloc(size);
[]
> +	if (vvalue)
> +		vfree(vvalue);
> +	else
> +		kfree(kvalue);
>  	return error;

These patterns are pretty common, maybe create a standard helper?



* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-29  1:28             ` Joe Perches
@ 2012-03-29  1:46               ` Andrew Morton
  2012-03-29  1:50                 ` Dave Jones
  2012-03-29  5:35                 ` Dave Chinner
  0 siblings, 2 replies; 20+ messages in thread
From: Andrew Morton @ 2012-03-29  1:46 UTC (permalink / raw)
  To: Joe Perches; +Cc: Dave Jones, Dave Chinner, viro, Linux Kernel, David Rientjes

On Wed, 28 Mar 2012 18:28:43 -0700 Joe Perches <joe@perches.com> wrote:

> On Wed, 2012-03-28 at 18:10 -0700, Andrew Morton wrote:
> > On Wed, 28 Mar 2012 20:54:42 -0400 Dave Jones <davej@redhat.com> wrote:
> > >  > Yup.  How does the below look?
> > > Don't see anything immediately wrong with it.
> > > Any thoughts on what to do about the similar problem in setxattr ? (memdup_user)
> []
> > diff -puN fs/xattr.c~fs-xattrc-setxattr-improve-handling-of-allocation-failures fs/xattr.c
> []
> > @@ -334,13 +335,25 @@ setxattr(struct dentry *d, const char __
> []
> > +		kvalue = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > +		if (!kvalue) {
> > +			vvalue = vmalloc(size);
> []
> > +	if (vvalue)
> > +		vfree(vvalue);
> > +	else
> > +		kfree(kvalue);
> >  	return error;
> 
> These patterns are pretty common, maybe create a standard helper?

Could.  There was some discussion last year and implementations were
tossed around.

I'm a bit apprehensive - kernel code is supposed to be robust, and
large allocations are not robust and vmalloc() is crappy.  Formalising
these things in an API probably won't make anything worse, but will
deprive us of opportunities for ritualistic humiliation and
knuckle-rapping.


* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-29  1:46               ` Andrew Morton
@ 2012-03-29  1:50                 ` Dave Jones
  2012-03-29  2:02                   ` Andrew Morton
  2012-03-29  5:35                 ` Dave Chinner
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Jones @ 2012-03-29  1:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joe Perches, Dave Chinner, viro, Linux Kernel, David Rientjes

On Wed, Mar 28, 2012 at 06:46:02PM -0700, Andrew Morton wrote:

 > Could.  There was some discussion last year and implementations were
 > tossed around.
 > 
 > I'm a bit apprehensive - kernel code is supposed to be robust, and
 > large allocations are not robust and vmalloc() is crappy.

Can you expand on crappy ?  Also, what happens if something allocates
and sits on a bunch of vmalloc'd memory ? would we start seeing oom kills ?
(thinking of the context of my fuzzing tool where a bunch of instances could
 feasibly call these syscalls and not sit on huge amounts per thread, but
 collectively...  I'm wondering if it could be provoked into killing
 processes I don't own)

	Dave



* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-29  1:50                 ` Dave Jones
@ 2012-03-29  2:02                   ` Andrew Morton
  2012-03-29  2:08                     ` Dave Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2012-03-29  2:02 UTC (permalink / raw)
  To: Dave Jones; +Cc: Joe Perches, Dave Chinner, viro, Linux Kernel, David Rientjes

On Wed, 28 Mar 2012 21:50:59 -0400 Dave Jones <davej@redhat.com> wrote:

> On Wed, Mar 28, 2012 at 06:46:02PM -0700, Andrew Morton wrote:
> 
>  > Could.  There was some discussion last year and implementations were
>  > tossed around.
>  > 
>  > I'm a bit apprehensive - kernel code is supposed to be robust, and
>  > large allocations are not robust and vmalloc() is crappy.
> 
> Can you expand on crappy ?

It's expensive on a per-call basis and can end up failing due to
internal fragmentation of vmalloc()'s virtually-addressed arena.  I
don't think I've ever seen a report of anyone getting a vmalloc()
failure due to the fragmentation issue, so it's largely theoretical. 
But of course, the more we use it (especially for long-lived
allocations), the greater the risk becomes.  Mainly to 32-bit machines,
I assume.

> Also, what happens if something allocates
> and sits on a bunch of vmalloc'd memory ? would we start seeing oom kills ?

vmalloc() would fail.

> (thinking of the context of my fuzzing tool where a bunch of instances could
>  feasibly call these syscalls and not sit on huge amounts per thread, but
>  collectively...  I'm wondering if it could be provoked into killing
>  processes I don't own)

umm, if you wanted to deliberately trigger a vmalloc() failure then I
guess a good approach would be to locate a vmalloc() site which can
persist beyond the syscall (modprobe is a good one!) then exercise it
in a way so that there are no N-byte holes left in the arena, then
trigger an N-byte vmalloc().
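
The "no N-byte holes left in the arena" scenario can be illustrated with a toy model. This is a userspace sketch under loud assumptions: the arena is a fixed array of 16 slots with a first-fit search, and every name in it (sketch_arena, sketch_arena_alloc() and friends) is hypothetical. It says nothing about how vmalloc() actually manages its address space; it only shows why free space alone does not guarantee that a contiguous request succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARENA_SLOTS 16

/* One flag per slot: true means the slot is in use. */
struct sketch_arena {
	bool used[ARENA_SLOTS];
};

/* First-fit search for n contiguous free slots; returns start index or -1. */
int sketch_arena_alloc(struct sketch_arena *a, size_t n)
{
	size_t run = 0;

	assert(n > 0 && n <= ARENA_SLOTS);
	for (size_t i = 0; i < ARENA_SLOTS; i++) {
		/* Extend the current free run, or reset it at a used slot. */
		run = a->used[i] ? 0 : run + 1;
		if (run == n) {
			size_t start = i + 1 - n;

			for (size_t j = start; j <= i; j++)
				a->used[j] = true;
			return (int)start;
		}
	}
	return -1;
}

/* Release n slots starting at start. */
void sketch_arena_free(struct sketch_arena *a, size_t start, size_t n)
{
	assert(start + n <= ARENA_SLOTS);
	for (size_t j = start; j < start + n; j++)
		a->used[j] = false;
}

/* Fill the arena with 1-slot allocations, then free every other one. */
void sketch_arena_fragment(struct sketch_arena *a)
{
	for (size_t i = 0; i < ARENA_SLOTS; i++)
		a->used[i] = true;
	for (size_t i = 0; i < ARENA_SLOTS; i += 2)
		a->used[i] = false;
}
```

After sketch_arena_fragment(), half of the toy arena is free, yet a 2-slot request fails because no two free slots are adjacent; a 1-slot request still succeeds. Long-lived allocations are what keep the holes from merging back together.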


* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-29  2:02                   ` Andrew Morton
@ 2012-03-29  2:08                     ` Dave Jones
  2012-03-29  2:28                       ` Andrew Morton
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Jones @ 2012-03-29  2:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joe Perches, Dave Chinner, viro, Linux Kernel, David Rientjes

On Wed, Mar 28, 2012 at 07:02:11PM -0700, Andrew Morton wrote:

 > > Also, what happens if something allocates
 > > and sits on a bunch of vmalloc'd memory ? would we start seeing oom kills ?
 > 
 > vmalloc() would fail.

Ok, that's a pretty boring failure mode, so not a big deal probably.

 > > (thinking of the context of my fuzzing tool where a bunch of instances could
 > >  feasibly call these syscalls and not sit on huge amounts per thread, but
 > >  collectively...  I'm wondering if it could be provoked into killing
 > >  processes I don't own)
 > 
 > umm, if you wanted to deliberately trigger a vmalloc() failure then I
 > guess a good approach would be to locate a vmalloc() site which can
 > persist beyond the syscall (modprobe is a good one!) then exercise it
 > in a way so that there are no N-byte holes left in the arena, then
 > trigger an N-byte vmalloc().

Well modprobe is root-only, so that's not so bad. But it looks like
add_key (see other thread from this evening) and probably others can be
called as a user and gobble up vmalloc space. omnomnom.

	Dave



* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-29  2:08                     ` Dave Jones
@ 2012-03-29  2:28                       ` Andrew Morton
  2012-03-29  3:00                         ` Dave Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2012-03-29  2:28 UTC (permalink / raw)
  To: Dave Jones; +Cc: Joe Perches, Dave Chinner, viro, Linux Kernel, David Rientjes

On Wed, 28 Mar 2012 22:08:20 -0400 Dave Jones <davej@redhat.com> wrote:

> On Wed, Mar 28, 2012 at 07:02:11PM -0700, Andrew Morton wrote:
> 
>  > > Also, what happens if something allocates
>  > > and sits on a bunch of vmalloc'd memory ? would we start seeing oom kills ?
>  > 
>  > vmalloc() would fail.
> 
> Ok, that's a pretty boring failure mode, so not a big deal probably.
> 
>  > > (thinking of the context of my fuzzing tool where a bunch of instances could
>  > >  feasibly call these syscalls and not sit on huge amounts per thread, but
>  > >  collectively...  I'm wondering if it could be provoked into killing
>  > >  processes I don't own)
>  > 
>  > umm, if you wanted to deliberately trigger a vmalloc() failure then I
>  > guess a good approach would be to locate a vmalloc() site which can
>  > persist beyond the syscall (modprobe is a good one!) then exercise it
>  > in a way so that there are no N-byte holes left in the arena, then
>  > trigger an N-byte vmalloc().
> 
> Well modprobe is root-only, so that's not so bad.

Even if it's root-only, we can still end up with a toasted machine. 
Accidentally toasted, not deliberately.

> But it looks like
> add_key (see other thread from this evening) and probably others can be
> called as a user and gobble up vmalloc space. omnomnom.

hm, the keys code appears to prevent the user from reserving more than
20000 bytes of memory total (key_payload_reserve()), so it doesn't look
very useful for screwing up vmalloc().


* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-29  2:28                       ` Andrew Morton
@ 2012-03-29  3:00                         ` Dave Jones
  2012-03-29 21:09                           ` Andrew Morton
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Jones @ 2012-03-29  3:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joe Perches, Dave Chinner, viro, Linux Kernel, David Rientjes

On Wed, Mar 28, 2012 at 07:28:04PM -0700, Andrew Morton wrote:
 > > But it looks like
 > > add_key (see other thread from this evening) and probably others can be
 > > called as a user and gobble up vmalloc space. omnomnom.
 > 
 > hm, the keys code appears to prevent the user from reserving more than
 > 20000 bytes of memory total (key_payload_reserve()), so it doesn't look
 > very useful for screwing up vmalloc().

Then how did I trick it into trying an order 8 allocation ?

trinity: page allocation failure: order:8, mode:0x40d0
Pid: 27119, comm: trinity Not tainted 3.3.0+ #31
Call Trace:
 [<ffffffff8115dd66>] warn_alloc_failed+0xf6/0x160
 [<ffffffff816ad436>] ? __alloc_pages_direct_compact+0x1d0/0x1e2
 [<ffffffff81162492>] __alloc_pages_nodemask+0x8b2/0xb10
 [<ffffffff8119dae6>] alloc_pages_current+0xb6/0x120
 [<ffffffff8115d3b4>] __get_free_pages+0x14/0x50
 [<ffffffff811ac64f>] kmalloc_order_trace+0x3f/0x1a0
 [<ffffffff811aca0a>] __kmalloc+0x25a/0x280
 [<ffffffff812c034a>] sys_add_key+0x9a/0x210
 [<ffffffff813386be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff816c04e9>] system_call_fastpath+0x16/0x1b
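
For scale, here is a rough userspace calculation of what an order:8 request spans. It assumes 4 KiB pages, which the trace does not state; SKETCH_PAGE_SIZE and sketch_order_to_bytes() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL

/* Bytes covered by a contiguous allocation of the given order. */
size_t sketch_order_to_bytes(unsigned int order)
{
	assert(order < 20);	/* keep the shift well inside size_t range */
	return (size_t)SKETCH_PAGE_SIZE << order;
}
```

Under that assumption order 8 is 256 pages, or 1 MiB, and order 7 is 512 KiB, so a request that rounds up to order 8 is far larger than the 20000-byte figure mentioned for the keys code.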



* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-29  1:46               ` Andrew Morton
  2012-03-29  1:50                 ` Dave Jones
@ 2012-03-29  5:35                 ` Dave Chinner
  1 sibling, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2012-03-29  5:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Joe Perches, Dave Jones, viro, Linux Kernel, David Rientjes

On Wed, Mar 28, 2012 at 06:46:02PM -0700, Andrew Morton wrote:
> On Wed, 28 Mar 2012 18:28:43 -0700 Joe Perches <joe@perches.com> wrote:
> 
> > On Wed, 2012-03-28 at 18:10 -0700, Andrew Morton wrote:
> > > On Wed, 28 Mar 2012 20:54:42 -0400 Dave Jones <davej@redhat.com> wrote:
> > > >  > Yup.  How does the below look?
> > > > Don't see anything immediately wrong with it.
> > > > Any thoughts on what to do about the similar problem in setxattr ? (memdup_user)
> > []
> > > diff -puN fs/xattr.c~fs-xattrc-setxattr-improve-handling-of-allocation-failures fs/xattr.c
> > []
> > > @@ -334,13 +335,25 @@ setxattr(struct dentry *d, const char __
> > []
> > > +		kvalue = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > > +		if (!kvalue) {
> > > +			vvalue = vmalloc(size);
> > []
> > > +	if (vvalue)
> > > +		vfree(vvalue);
> > > +	else
> > > +		kfree(kvalue);
> > >  	return error;
> > 
> > These patterns are pretty common, maybe create a standard helper?
> 
> Could.  There was some discussion last year and implementations were
> tossed around.
> 
> I'm a bit apprehensive - kernel code is supposed to be robust, and
> large allocations are not robust and vmalloc() is crappy.  Formalising
> these things in an API probably won't make anything worse, but will
> deprive us of opportunities for ritualistic humiliation and
> knuckle-rapping.

I did a sweep of this recently, considering helpers for exactly such
an allocation and replacing the existing per-filesystem wrappers for
it. IIRC, there are wrapper functions in ext4, gfs2, and ntfs; XFS
now open-codes it in a couple of places; there's alloc_fdmem(), cgroup
pidlists, and the network code does it in several places, etc.
Even some drivers are doing this.  It's a widespread pattern.

The easiest way to find the trivial wrappers is to grep for
is_vmalloc_addr, because all the wrapper functions use this check to
decide how to free the memory:

	if (is_vmalloc_addr(p))
		vfree(p);
	else
		kfree(p);
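
A minimal userspace sketch of what such a wrapper pair might look like, for illustration only. Everything prefixed with model_ is hypothetical: model_kmalloc() stands in for kmalloc() and simply refuses requests above an arbitrary limit, model_vmalloc() hands out memory from a fixed arena so that an address-range test can play the role of is_vmalloc_addr(), and the counters exist only so the dispatch can be observed. It is not taken from any of the filesystems mentioned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical limits for the model; not kernel values. */
#define MODEL_KMALLOC_LIMIT  ((size_t)4096)     /* larger "kmalloc" requests fail */
#define MODEL_VMALLOC_ARENA  ((size_t)1 << 20)  /* size of the fake vmalloc range */

static _Alignas(max_align_t) unsigned char vmalloc_arena[MODEL_VMALLOC_ARENA];
static size_t vmalloc_used;
static unsigned int kfree_calls, vfree_calls;

/* Stand-in for kmalloc(): contiguous memory, fails for large sizes. */
static void *model_kmalloc(size_t size)
{
	return size <= MODEL_KMALLOC_LIMIT ? malloc(size) : NULL;
}

/* Stand-in for vmalloc(): bump allocation out of a dedicated range. */
static void *model_vmalloc(size_t size)
{
	size_t aligned = (size + 15) & ~(size_t)15;
	void *p;

	if (aligned < size || aligned > MODEL_VMALLOC_ARENA - vmalloc_used)
		return NULL;
	p = &vmalloc_arena[vmalloc_used];
	vmalloc_used += aligned;
	return p;
}

/* Stand-in for is_vmalloc_addr(): a pure address-range check. */
static bool model_is_vmalloc_addr(const void *p)
{
	uintptr_t addr = (uintptr_t)p;
	uintptr_t start = (uintptr_t)vmalloc_arena;

	return addr >= start && addr < start + MODEL_VMALLOC_ARENA;
}

/* Try the contiguous allocator first, fall back to the arena. */
static void *model_alloc_large(size_t size)
{
	void *p = model_kmalloc(size);

	if (!p)
		p = model_vmalloc(size);
	return p;
}

/* The caller does not need to remember which allocator succeeded. */
static void model_free_large(void *p)
{
	if (!p)
		return;
	if (model_is_vmalloc_addr(p)) {
		/* A pointer in the arena implies something was handed out. */
		assert(vmalloc_used > 0);
		vfree_calls++;	/* the bump arena is never reclaimed in this model */
	} else {
		kfree_calls++;
		free(p);
	}
}
```

The point of the address check is that the free path needs no extra state: the pointer alone says which allocator produced it.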

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-29  3:00                         ` Dave Jones
@ 2012-03-29 21:09                           ` Andrew Morton
  2012-03-29 21:13                             ` Dave Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2012-03-29 21:09 UTC (permalink / raw)
  To: Dave Jones; +Cc: Joe Perches, Dave Chinner, viro, Linux Kernel, David Rientjes

On Wed, 28 Mar 2012 23:00:00 -0400
Dave Jones <davej@redhat.com> wrote:

> On Wed, Mar 28, 2012 at 07:28:04PM -0700, Andrew Morton wrote:
>  > > But it looks like
>  > > key_add (see other thread from this evening) and probably others can be
>  > > called as a user and gobble up vmalloc space. omnomnom.
>  > 
>  > hm, the keys code appears to prevent the user from reserving more than
>  > 20000 bytes of memory total (key_payload_reserve()), so it doesn't look
>  > very useful for screwing up vmalloc().
> 
> Then how did I trick it into trying an order 8 allocation ?
> 
> trinity: page allocation failure: order:8, mode:0x40d0
> Pid: 27119, comm: trinity Not tainted 3.3.0+ #31
> Call Trace:
>  [<ffffffff8115dd66>] warn_alloc_failed+0xf6/0x160
>  [<ffffffff816ad436>] ? __alloc_pages_direct_compact+0x1d0/0x1e2
>  [<ffffffff81162492>] __alloc_pages_nodemask+0x8b2/0xb10
>  [<ffffffff8119dae6>] alloc_pages_current+0xb6/0x120
>  [<ffffffff8115d3b4>] __get_free_pages+0x14/0x50
>  [<ffffffff811ac64f>] kmalloc_order_trace+0x3f/0x1a0
>  [<ffffffff811aca0a>] __kmalloc+0x25a/0x280
>  [<ffffffff812c034a>] sys_add_key+0x9a/0x210
>  [<ffffffff813386be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff816c04e9>] system_call_fastpath+0x16/0x1b

Ah, that's different.  The memory at *payload doesn't live beyond the
syscall so it can't be used to cause vmalloc fragmentation.

We should squish the warning:


From: Andrew Morton <akpm@linux-foundation.org>
Subject: security/keys/keyctl.c: suppress memory allocation failure warning

This allocation may be large.  The code is probing to see if it will
succeed and if not, it falls back to vmalloc().  We should suppress any
page-allocation failure messages when the fallback happens.

Reported-by: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 security/keys/keyctl.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN security/keys/keyctl.c~security-keys-keyctlc-suppress-memory-allocation-failure-warning security/keys/keyctl.c
--- a/security/keys/keyctl.c~security-keys-keyctlc-suppress-memory-allocation-failure-warning
+++ a/security/keys/keyctl.c
@@ -84,7 +84,7 @@ SYSCALL_DEFINE5(add_key, const char __us
 	vm = false;
 	if (_payload) {
 		ret = -ENOMEM;
-		payload = kmalloc(plen, GFP_KERNEL);
+		payload = kmalloc(plen, GFP_KERNEL | __GFP_NOWARN);
 		if (!payload) {
 			if (plen <= PAGE_SIZE)
 				goto error2;
_
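
A rough userspace model of the probe-then-fall-back behaviour the changelog describes, for illustration only. The names (model_kmalloc, model_vmalloc, model_alloc_payload, MODEL_GFP_NOWARN, MODEL_CONTIG_MAX) and the flag value are invented for this sketch and are not the kernel's. The vmalloc() step after the page-size check is an assumption based on the changelog text, since the hunk above ends before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE   ((size_t)4096)
#define MODEL_GFP_NOWARN  0x1u                    /* hypothetical flag value */
#define MODEL_CONTIG_MAX  (8 * MODEL_PAGE_SIZE)   /* hypothetical failure threshold */

static unsigned int warnings_logged;

/* Stand-in for kmalloc(): large requests fail, and complain unless told not to. */
static void *model_kmalloc(size_t size, unsigned int flags)
{
	if (size > MODEL_CONTIG_MAX) {
		if (!(flags & MODEL_GFP_NOWARN)) {
			warnings_logged++;
			fprintf(stderr, "model: page allocation failure, size %zu\n", size);
		}
		return NULL;
	}
	return malloc(size);
}

/* Stand-in for vmalloc(): in this model it is just malloc(). */
static void *model_vmalloc(size_t size)
{
	return malloc(size);
}

/*
 * Probe quietly, then fall back. *used_vmalloc tells the caller which
 * free routine would be needed in real kernel code.
 */
static void *model_alloc_payload(size_t plen, bool *used_vmalloc)
{
	void *payload;

	assert(used_vmalloc != NULL);
	*used_vmalloc = false;

	payload = model_kmalloc(plen, MODEL_GFP_NOWARN);
	if (!payload) {
		/* A failure at or below one page is a real out-of-memory condition. */
		if (plen <= MODEL_PAGE_SIZE)
			return NULL;
		payload = model_vmalloc(plen);	/* assumed fallback step */
		if (payload)
			*used_vmalloc = true;
	}
	return payload;
}
```

With the flag set, a failed probe is silent because the caller has a second strategy; without it, the same failure would be logged even though nothing went wrong from the caller's point of view.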



* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-29 21:09                           ` Andrew Morton
@ 2012-03-29 21:13                             ` Dave Jones
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Jones @ 2012-03-29 21:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joe Perches, Dave Chinner, viro, Linux Kernel, David Rientjes

On Thu, Mar 29, 2012 at 02:09:34PM -0700, Andrew Morton wrote:
 > On Wed, 28 Mar 2012 23:00:00 -0400
 > Dave Jones <davej@redhat.com> wrote:
 > 
 > > On Wed, Mar 28, 2012 at 07:28:04PM -0700, Andrew Morton wrote:
 > >  > > But it looks like
 > >  > > key_add (see other thread from this evening) and probably others can be
 > >  > > called as a user and gobble up vmalloc space. omnomnom.
 > >  > 
 > >  > hm, the keys code appears to prevent the user from reserving more than
 > >  > 20000 bytes of memory total (key_payload_reserve()), so it doesn't look
 > >  > very useful for screwing up vmalloc().
 > > 
 > > Then how did I trick it into trying an order 8 allocation ?
 > > 
 > > trinity: page allocation failure: order:8, mode:0x40d0
 > > Pid: 27119, comm: trinity Not tainted 3.3.0+ #31
 > > Call Trace:
 > >  [<ffffffff8115dd66>] warn_alloc_failed+0xf6/0x160
 > >  [<ffffffff816ad436>] ? __alloc_pages_direct_compact+0x1d0/0x1e2
 > >  [<ffffffff81162492>] __alloc_pages_nodemask+0x8b2/0xb10
 > >  [<ffffffff8119dae6>] alloc_pages_current+0xb6/0x120
 > >  [<ffffffff8115d3b4>] __get_free_pages+0x14/0x50
 > >  [<ffffffff811ac64f>] kmalloc_order_trace+0x3f/0x1a0
 > >  [<ffffffff811aca0a>] __kmalloc+0x25a/0x280
 > >  [<ffffffff812c034a>] sys_add_key+0x9a/0x210
 > >  [<ffffffff813386be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 > >  [<ffffffff816c04e9>] system_call_fastpath+0x16/0x1b
 > 
 > Ah, that's different.  The memory at *payload doesn't live beyond the
 > syscall so it can't be used to cause vmalloc fragmentation.
 > 
 > We should squish the warning:

That's the same patch I sent in the other thread, so ack ;-)

	Dave


Thread overview: 20+ messages
2012-03-13 18:22 suppress page allocation failure warnings from sys_listxattr Dave Jones
2012-03-13 21:33 ` Colin Walters
2012-03-27 22:51 ` Andrew Morton
2012-03-28  0:15   ` Dave Jones
2012-03-28  0:26     ` Andrew Morton
2012-03-28  7:13       ` David Rientjes
2012-03-28  4:39     ` Dave Chinner
2012-03-28 23:47       ` Andrew Morton
2012-03-29  0:54         ` Dave Jones
2012-03-29  1:10           ` Andrew Morton
2012-03-29  1:28             ` Joe Perches
2012-03-29  1:46               ` Andrew Morton
2012-03-29  1:50                 ` Dave Jones
2012-03-29  2:02                   ` Andrew Morton
2012-03-29  2:08                     ` Dave Jones
2012-03-29  2:28                       ` Andrew Morton
2012-03-29  3:00                         ` Dave Jones
2012-03-29 21:09                           ` Andrew Morton
2012-03-29 21:13                             ` Dave Jones
2012-03-29  5:35                 ` Dave Chinner
