linux-kernel.vger.kernel.org archive mirror
* [PATCH v3] sysfs: Unconditionally use vmalloc for buffer
@ 2021-04-01  2:21 Kees Cook
  2021-04-01  5:16 ` Greg Kroah-Hartman
  2021-04-01  7:14 ` Michal Hocko
  0 siblings, 2 replies; 7+ messages in thread
From: Kees Cook @ 2021-04-01  2:21 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kees Cook, Andrew Morton, Rafael J. Wysocki, Michal Hocko,
	Alexey Dobriyan, Lee Duncan, Chris Leech, Adam Nichols,
	linux-fsdevel, linux-hardening, linux-kernel

The sysfs interface to seq_file continues to be rather fragile
(seq_get_buf() should not be used outside of seq_file), as seen with
some recent exploits[1]. Move the seq_file buffer to the vmap area
(while retaining the accounting flag), since it has guard pages that
will catch and stop linear overflows. This seems justified given that
sysfs's use of seq_file already uses kvmalloc(), is almost always using
a PAGE_SIZE or larger allocation, has normally short-lived allocations,
and is not normally on a performance critical path.

Once seq_get_buf() has been removed (and all sysfs callbacks using
seq_file directly), this change can also be removed.

[1] https://blog.grimm-co.com/2021/03/new-old-bugs-in-linux-kernel.html

Signed-off-by: Kees Cook <keescook@chromium.org>
---
v3:
- Limit to only sysfs (instead of all of seq_file).
v2: https://lore.kernel.org/lkml/20210315174851.622228-1-keescook@chromium.org/
v1: https://lore.kernel.org/lkml/20210312205558.2947488-1-keescook@chromium.org/
---
 fs/sysfs/file.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 9aefa7779b29..70e7a450e5d1 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -16,6 +16,7 @@
 #include <linux/mutex.h>
 #include <linux/seq_file.h>
 #include <linux/mm.h>
+#include <linux/vmalloc.h>
 
 #include "sysfs.h"
 
@@ -32,6 +33,25 @@ static const struct sysfs_ops *sysfs_file_ops(struct kernfs_node *kn)
 	return kobj->ktype ? kobj->ktype->sysfs_ops : NULL;
 }
 
+/*
+ * To be proactively defensive against sysfs show() handlers that do not
+ * correctly stay within their PAGE_SIZE buffer, use the vmap area to gain
+ * the trailing guard page which will stop linear buffer overflows.
+ */
+static void *sysfs_kf_seq_start(struct seq_file *sf, loff_t *ppos)
+{
+	struct kernfs_open_file *of = sf->private;
+	struct kernfs_node *kn = of->kn;
+
+	WARN_ON_ONCE(sf->buf);
+	sf->buf = __vmalloc(kn->attr.size, GFP_KERNEL_ACCOUNT);
+	if (!sf->buf)
+		return ERR_PTR(-ENOMEM);
+	sf->size = kn->attr.size;
+
+	return NULL + !*ppos;
+}
+
 /*
  * Reads on sysfs are handled through seq_file, which takes care of hairy
  * details like buffering and seeking.  The following function pipes
@@ -206,14 +226,17 @@ static const struct kernfs_ops sysfs_file_kfops_empty = {
 };
 
 static const struct kernfs_ops sysfs_file_kfops_ro = {
+	.seq_start	= sysfs_kf_seq_start,
 	.seq_show	= sysfs_kf_seq_show,
 };
 
 static const struct kernfs_ops sysfs_file_kfops_wo = {
+	.seq_start	= sysfs_kf_seq_start,
 	.write		= sysfs_kf_write,
 };
 
 static const struct kernfs_ops sysfs_file_kfops_rw = {
+	.seq_start	= sysfs_kf_seq_start,
 	.seq_show	= sysfs_kf_seq_show,
 	.write		= sysfs_kf_write,
 };
-- 
2.25.1
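
The guard-page argument in the changelog can be sketched in userspace. This is only an analogy, assuming Linux mmap()/mprotect(): the helper names guarded_alloc() and overflow_signal() are hypothetical, and the kernel's vmap area works on kernel mappings, where the fault shows up as an oops rather than a SIGSEGV. The point it illustrates is that a linear write running off the end of the buffer lands on an inaccessible page immediately instead of silently corrupting a neighbouring allocation.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `size` bytes (rounded up to whole pages) and
 * place one PROT_NONE guard page directly after them.
 */
static char *guarded_alloc(size_t size, size_t *usable)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = (size + page - 1) / page * page;
	char *p = mmap(NULL, len + page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	if (mprotect(p + len, page, PROT_NONE) != 0) {
		munmap(p, len + page);
		return NULL;
	}
	*usable = len;
	return p;
}

/*
 * Fork a child that writes `extra` bytes past the usable area.
 * Returns the fatal signal number, 0 on a clean exit, -1 on error.
 */
static int overflow_signal(size_t extra)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;
	if (pid == 0) {
		size_t usable, i;
		volatile char *buf = guarded_alloc(1, &usable);

		if (!buf)
			_exit(2);
		for (i = 0; i < usable + extra; i++)
			buf[i] = 'A';	/* linear overflow when extra > 0 */
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (WIFSIGNALED(status))
		return WTERMSIG(status);
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling overflow_signal(0) stays inside the buffer and exits cleanly; overflow_signal(1) touches the first byte of the guard page and the child is killed.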


* Re: [PATCH v3] sysfs: Unconditionally use vmalloc for buffer
  2021-04-01  2:21 [PATCH v3] sysfs: Unconditionally use vmalloc for buffer Kees Cook
@ 2021-04-01  5:16 ` Greg Kroah-Hartman
  2021-04-01  6:52   ` Kees Cook
  2021-04-01  7:14 ` Michal Hocko
  1 sibling, 1 reply; 7+ messages in thread
From: Greg Kroah-Hartman @ 2021-04-01  5:16 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Rafael J. Wysocki, Michal Hocko, Alexey Dobriyan,
	Lee Duncan, Chris Leech, Adam Nichols, linux-fsdevel,
	linux-hardening, linux-kernel

On Wed, Mar 31, 2021 at 07:21:45PM -0700, Kees Cook wrote:
> +/*
> + * To be proactively defensive against sysfs show() handlers that do not
> + * correctly stay within their PAGE_SIZE buffer, use the vmap area to gain
> + * the trailing guard page which will stop linear buffer overflows.
> + */
> +static void *sysfs_kf_seq_start(struct seq_file *sf, loff_t *ppos)
> +{
> +	struct kernfs_open_file *of = sf->private;
> +	struct kernfs_node *kn = of->kn;
> +
> +	WARN_ON_ONCE(sf->buf);

How can buf ever not be NULL?  And if it is, we will leak memory in the
next line so we shouldn't have _ONCE, we should always know, but not
rebooting the machine would be nice.

> +	sf->buf = __vmalloc(kn->attr.size, GFP_KERNEL_ACCOUNT);
> +	if (!sf->buf)
> +		return ERR_PTR(-ENOMEM);
> +	sf->size = kn->attr.size;
> +
> +	return NULL + !*ppos;
> +}

Will this also cause the vmalloc fragmentation/abuse that others have
mentioned as userspace can trigger this?

And what code frees it?

thanks,

greg k-h
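
For readers puzzling over the `return NULL + !*ppos;` line quoted above: it relies on GNU C arithmetic on void pointers and yields a non-NULL token only when the read position is 0. Since seq_file treats a NULL return from start() as "nothing more to show", the file behaves as a single record. A minimal userspace sketch of that reading follows; single_record_start() and records_from() are hypothetical names, and the loop is a simplified model rather than the real seq_file iteration code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Explicit spelling of `return NULL + !*ppos;`: a dummy non-NULL token
 * at position 0, NULL (end of sequence) at any other position.
 */
static void *single_record_start(const long long *ppos)
{
	return *ppos == 0 ? (void *)1 : NULL;
}

/* Simplified reader model: count the records a read at `pos` would show. */
static int records_from(long long pos)
{
	int shown = 0;
	void *token = single_record_start(&pos);

	while (token) {
		shown++;	/* show() would format the attribute here */
		pos++;
		token = NULL;	/* next(): nothing follows the single record */
	}
	return shown;
}
```

A read at offset 0 sees exactly one record; a read at any later offset sees none.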

* Re: [PATCH v3] sysfs: Unconditionally use vmalloc for buffer
  2021-04-01  5:16 ` Greg Kroah-Hartman
@ 2021-04-01  6:52   ` Kees Cook
  2021-04-01  7:10     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 7+ messages in thread
From: Kees Cook @ 2021-04-01  6:52 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andrew Morton, Rafael J. Wysocki, Michal Hocko, Alexey Dobriyan,
	Lee Duncan, Chris Leech, Adam Nichols, linux-fsdevel,
	linux-hardening, linux-kernel

On Thu, Apr 01, 2021 at 07:16:56AM +0200, Greg Kroah-Hartman wrote:
> On Wed, Mar 31, 2021 at 07:21:45PM -0700, Kees Cook wrote:
> > +/*
> > + * To be proactively defensive against sysfs show() handlers that do not
> > + * correctly stay within their PAGE_SIZE buffer, use the vmap area to gain
> > + * the trailing guard page which will stop linear buffer overflows.
> > + */
> > +static void *sysfs_kf_seq_start(struct seq_file *sf, loff_t *ppos)
> > +{
> > +	struct kernfs_open_file *of = sf->private;
> > +	struct kernfs_node *kn = of->kn;
> > +
> > +	WARN_ON_ONCE(sf->buf);
> 
> How can buf ever not be NULL?  And if it is, we will leak memory in the
> next line so we shouldn't have _ONCE, we should always know, but not
> rebooting the machine would be nice.

It should never be possible. I did this because seq_file has some
unusual buf allocation patterns in the kernel, and I liked the cheap
leak check. I use _ONCE because spewing endlessly doesn't help most
cases. And if you want to trigger it again, you don't have to reboot:
https://www.kernel.org/doc/html/latest/admin-guide/clearing-warn-once.html

> 
> > +	sf->buf = __vmalloc(kn->attr.size, GFP_KERNEL_ACCOUNT);
> > +	if (!sf->buf)
> > +		return ERR_PTR(-ENOMEM);
> > +	sf->size = kn->attr.size;
> > +
> > +	return NULL + !*ppos;
> > +}
> 
> Will this also cause the vmalloc fragmentation/abuse that others have
> mentioned as userspace can trigger this?

If I understood the concern correctly, it was about it being a risk for
doing it for all seq_file uses. This version confines the changes to only
sysfs seq_file uses.

> And what code frees it?

The existing hooks to seq_release() handle this already. This kind of
"preallocation" of the seq_file buffer is done in a few places already
(hence my desire for the sanity checking WARN lest future seq_file
semantics change).

-- 
Kees Cook
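
The _ONCE behaviour discussed above can be approximated in userspace. This is a rough sketch, not the kernel macro: WARN_ON_ONCE_SKETCH, warn_count, and buf_already_set() are hypothetical names, there is no stack dump or taint, and it needs GNU C statement expressions (GCC or Clang). It shows the two properties under discussion: the condition is evaluated on every call, but only the first hit at a given call site is reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static int warn_count;	/* number of warnings actually printed */

/*
 * Evaluates to the truth value of `cond` every time, but prints a
 * warning only for the first hit at this call site.
 */
#define WARN_ON_ONCE_SKETCH(cond) ({				\
	static bool warned_;					\
	bool hit_ = !!(cond);					\
	if (hit_ && !warned_) {					\
		warned_ = true;					\
		warn_count++;					\
		fprintf(stderr, "WARNING: %s at %s:%d\n",	\
			#cond, __FILE__, __LINE__);		\
	}							\
	hit_;							\
})

/* Hypothetical stand-in for the check at the top of sysfs_kf_seq_start(). */
static bool buf_already_set(const void *buf)
{
	return WARN_ON_ONCE_SKETCH(buf != NULL);
}
```

Because the macro evaluates to the condition, a caller can still branch on it after the first warning has been printed, even though later hits are silent.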

* Re: [PATCH v3] sysfs: Unconditionally use vmalloc for buffer
  2021-04-01  6:52   ` Kees Cook
@ 2021-04-01  7:10     ` Greg Kroah-Hartman
  2021-04-01  7:30       ` Kees Cook
  0 siblings, 1 reply; 7+ messages in thread
From: Greg Kroah-Hartman @ 2021-04-01  7:10 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Rafael J. Wysocki, Michal Hocko, Alexey Dobriyan,
	Lee Duncan, Chris Leech, Adam Nichols, linux-fsdevel,
	linux-hardening, linux-kernel

On Wed, Mar 31, 2021 at 11:52:20PM -0700, Kees Cook wrote:
> On Thu, Apr 01, 2021 at 07:16:56AM +0200, Greg Kroah-Hartman wrote:
> > On Wed, Mar 31, 2021 at 07:21:45PM -0700, Kees Cook wrote:
> > > +/*
> > > + * To be proactively defensive against sysfs show() handlers that do not
> > > + * correctly stay within their PAGE_SIZE buffer, use the vmap area to gain
> > > + * the trailing guard page which will stop linear buffer overflows.
> > > + */
> > > +static void *sysfs_kf_seq_start(struct seq_file *sf, loff_t *ppos)
> > > +{
> > > +	struct kernfs_open_file *of = sf->private;
> > > +	struct kernfs_node *kn = of->kn;
> > > +
> > > +	WARN_ON_ONCE(sf->buf);
> > 
> > How can buf ever not be NULL?  And if it is, we will leak memory in the
> > next line so we shouldn't have _ONCE, we should always know, but not
> > rebooting the machine would be nice.
> 
> It should never be possible. I did this because seq_file has some
> unusual buf allocation patterns in the kernel, and I liked the cheap
> leak check. I use _ONCE because spewing endlessly doesn't help most
> cases. And if you want to trigger it again, you don't have to reboot:
> https://www.kernel.org/doc/html/latest/admin-guide/clearing-warn-once.html

True, I was thinking of the panic-on-warn people, and the hesitation of
adding new WARN_ON() to the kernel code.  If this really can happen,
shouldn't we handle it properly?

> > > +	sf->buf = __vmalloc(kn->attr.size, GFP_KERNEL_ACCOUNT);
> > > +	if (!sf->buf)
> > > +		return ERR_PTR(-ENOMEM);
> > > +	sf->size = kn->attr.size;
> > > +
> > > +	return NULL + !*ppos;
> > > +}
> > 
> > Will this also cause the vmalloc fragmentation/abuse that others have
> > mentioned as userspace can trigger this?
> 
> If I understood the concern correctly, it was about it being a risk for
> doing it for all seq_file uses. This version confines the changes to only
> sysfs seq_file uses.

There are a few sysfs files that userspace can read from out there :)

> > And what code frees it?
> 
> The existing hooks to seq_release() handle this already. This kind of
> "preallocation" of the seq_file buffer is done in a few places already
> (hence my desire for the sanity checking WARN lest future seq_file
> semantics change).

Ah, "magic", gotta love it...

thanks,

greg k-h

* Re: [PATCH v3] sysfs: Unconditionally use vmalloc for buffer
  2021-04-01  2:21 [PATCH v3] sysfs: Unconditionally use vmalloc for buffer Kees Cook
  2021-04-01  5:16 ` Greg Kroah-Hartman
@ 2021-04-01  7:14 ` Michal Hocko
  2021-04-01  7:37   ` Kees Cook
  1 sibling, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2021-04-01  7:14 UTC (permalink / raw)
  To: Kees Cook
  Cc: Greg Kroah-Hartman, Andrew Morton, Rafael J. Wysocki,
	Alexey Dobriyan, Lee Duncan, Chris Leech, Adam Nichols,
	linux-fsdevel, linux-hardening, linux-kernel

On Wed 31-03-21 19:21:45, Kees Cook wrote:
> The sysfs interface to seq_file continues to be rather fragile
> (seq_get_buf() should not be used outside of seq_file), as seen with
> some recent exploits[1]. Move the seq_file buffer to the vmap area
> (while retaining the accounting flag), since it has guard pages that
> will catch and stop linear overflows.

I thought the previous discussion has led to a conclusion that the
preferred way is to disallow direct seq_file buffer usage. But this is
obviously up to sysfs maintainers. I am happy you do not want to spread
this out to all seq_file users anymore.

> This seems justified given that
> sysfs's use of seq_file already uses kvmalloc(), is almost always using
> a PAGE_SIZE or larger allocation, has normally short-lived allocations,
> and is not normally on a performance critical path.

Let me clarify on this, because this is not quite right. kvmalloc vs
vmalloc (both with GFP_KERNEL) on PAGE_SIZE are two different beasts.
The first one is almost always going to use kmalloc because the page
allocator almost never fails those requests.

-- 
Michal Hocko
SUSE Labs
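
The kvmalloc()-versus-vmalloc() distinction above can be modelled in a few lines. This is a userspace sketch under stated assumptions, not kernel code: kmalloc_model() and vmalloc_model() are hypothetical stand-ins backed by malloc(), kmalloc_fails is a test knob, and the "no vmalloc fallback at or below one page" rule reflects my reading of kvmalloc_node() in mm/util.c and should be checked against the tree in question. The model shows why a PAGE_SIZE kvmalloc() request practically never ends up in the vmap area, and so never gains a guard page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE_MODEL 4096u	/* assumed page size for the model */

enum backing { BACKING_NONE, BACKING_KMALLOC, BACKING_VMALLOC };

static bool kmalloc_fails;	/* test knob: simulate page allocator failure */

static void *kmalloc_model(size_t size)
{
	return kmalloc_fails ? NULL : malloc(size);
}

static void *vmalloc_model(size_t size)
{
	return malloc(size);
}

/* Try the kmalloc path first; fall back to vmalloc only above one page. */
static void *kvmalloc_model(size_t size, enum backing *from)
{
	void *p = kmalloc_model(size);

	if (p) {
		*from = BACKING_KMALLOC;
		return p;
	}
	if (size <= PAGE_SIZE_MODEL) {
		*from = BACKING_NONE;	/* no fallback for sub-page requests */
		return NULL;
	}
	p = vmalloc_model(size);
	*from = p ? BACKING_VMALLOC : BACKING_NONE;
	return p;
}

/* Report which path served a request, then release the memory. */
static enum backing backing_for(size_t size, bool fail)
{
	enum backing from;

	kmalloc_fails = fail;
	free(kvmalloc_model(size, &from));
	return from;
}
```

In this model only a larger-than-page request whose kmalloc attempt fails reaches the vmalloc path, which is why an unconditional __vmalloc() behaves differently from the existing kvmalloc() call.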

* Re: [PATCH v3] sysfs: Unconditionally use vmalloc for buffer
  2021-04-01  7:10     ` Greg Kroah-Hartman
@ 2021-04-01  7:30       ` Kees Cook
  0 siblings, 0 replies; 7+ messages in thread
From: Kees Cook @ 2021-04-01  7:30 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andrew Morton, Rafael J. Wysocki, Michal Hocko, Alexey Dobriyan,
	Lee Duncan, Chris Leech, Adam Nichols, linux-fsdevel,
	linux-hardening, linux-kernel

On Thu, Apr 01, 2021 at 09:10:05AM +0200, Greg Kroah-Hartman wrote:
> On Wed, Mar 31, 2021 at 11:52:20PM -0700, Kees Cook wrote:
> > On Thu, Apr 01, 2021 at 07:16:56AM +0200, Greg Kroah-Hartman wrote:
> > > On Wed, Mar 31, 2021 at 07:21:45PM -0700, Kees Cook wrote:
> > > > +/*
> > > > + * To be proactively defensive against sysfs show() handlers that do not
> > > > + * correctly stay within their PAGE_SIZE buffer, use the vmap area to gain
> > > > + * the trailing guard page which will stop linear buffer overflows.
> > > > + */
> > > > +static void *sysfs_kf_seq_start(struct seq_file *sf, loff_t *ppos)
> > > > +{
> > > > +	struct kernfs_open_file *of = sf->private;
> > > > +	struct kernfs_node *kn = of->kn;
> > > > +
> > > > +	WARN_ON_ONCE(sf->buf);
> > > 
> > > How can buf ever not be NULL?  And if it is, we will leak memory in the
> > > next line so we shouldn't have _ONCE, we should always know, but not
> > > rebooting the machine would be nice.
> > 
> > It should never be possible. I did this because seq_file has some
> > unusual buf allocation patterns in the kernel, and I liked the cheap
> > leak check. I use _ONCE because spewing endlessly doesn't help most
> > cases. And if you want to trigger it again, you don't have to reboot:
> > https://www.kernel.org/doc/html/latest/admin-guide/clearing-warn-once.html
> 
> True, I was thinking of the panic-on-warn people, and the hesitation of
> adding new WARN_ON() to the kernel code.  If this really can happen,
> shouldn't we handle it properly?

It should never happen, but I hate silent bugs. Given the existing
pattern of "external preallocation", it seems like a fragile interface
worth asserting our expectations.

The panic_on_warn folks will get exactly what they wanted: immediate
feedback on "expected to be impossible" cases:
https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on

> > > > +	sf->buf = __vmalloc(kn->attr.size, GFP_KERNEL_ACCOUNT);
> > > > +	if (!sf->buf)
> > > > +		return ERR_PTR(-ENOMEM);
> > > > +	sf->size = kn->attr.size;
> > > > +
> > > > +	return NULL + !*ppos;
> > > > +}
> > > 
> > > Will this also cause the vmalloc fragmentation/abuse that others have
> > > mentioned as userspace can trigger this?
> > 
> > If I understood the concern correctly, it was about it being a risk for
> > doing it for all seq_file uses. This version confines the changes to only
> > sysfs seq_file uses.
> 
> There are a few sysfs files that userspace can read from out there :)

Yes, but the vmap area is also used by default for process stacks, etc.
Malicious fragmentation is already possible. I understood the concern to
be about "regular" use. (And if I'm wrong, we can add a knob maybe?)

> > > And what code frees it?
> > 
> > The existing hooks to seq_release() handle this already. This kind of
> > "preallocation" of the seq_file buffer is done in a few places already
> > (hence my desire for the sanity checking WARN lest future seq_file
> > semantics change).
> 
> Ah, "magic", gotta love it...

Yeeeah. :P

-- 
Kees Cook

* Re: [PATCH v3] sysfs: Unconditionally use vmalloc for buffer
  2021-04-01  7:14 ` Michal Hocko
@ 2021-04-01  7:37   ` Kees Cook
  0 siblings, 0 replies; 7+ messages in thread
From: Kees Cook @ 2021-04-01  7:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Greg Kroah-Hartman, Andrew Morton, Rafael J. Wysocki,
	Alexey Dobriyan, Lee Duncan, Chris Leech, Adam Nichols,
	linux-fsdevel, linux-hardening, linux-kernel

On Thu, Apr 01, 2021 at 09:14:25AM +0200, Michal Hocko wrote:
> On Wed 31-03-21 19:21:45, Kees Cook wrote:
> > The sysfs interface to seq_file continues to be rather fragile
> > (seq_get_buf() should not be used outside of seq_file), as seen with
> > some recent exploits[1]. Move the seq_file buffer to the vmap area
> > (while retaining the accounting flag), since it has guard pages that
> > will catch and stop linear overflows.
> 
> I thought the previous discussion has led to a conclusion that the
> preferred way is to disallow direct seq_file buffer usage. But this is
> obviously up to sysfs maintainers. I am happy you do not want to spread
> this out to all seq_file users anymore.

Yeah; I still want to remove external seq_get_buf(), but that'll take
time. I'll be doing the work, though, since I still need access to
f_cred for show() access control checks.

> > This seems justified given that
> > sysfs's use of seq_file already uses kvmalloc(), is almost always using
> > a PAGE_SIZE or larger allocation, has normally short-lived allocations,
> > and is not normally on a performance critical path.
> 
> Let me clarify on this, because this is not quite right. kvmalloc vs
> vmalloc (both with GFP_KERNEL) on PAGE_SIZE are two different beasts.
> The first one is almost always going to use kmalloc because the page
> allocator almost never fails those requests.

Yes, good point. I will adjust my changelog.

Thanks!

-Kees

-- 
Kees Cook
