regressions.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* Regression when writing to /proc/<pid>/attr/
@ 2021-06-07 14:22 Christian Brauner
  2021-06-07 23:38 ` Kees Cook
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Brauner @ 2021-06-07 14:22 UTC (permalink / raw)
  To: Kees Cook, Linus Torvalds; +Cc: regressions, Andrea Righi

Hey Linus,
hey Kees,

This morning I got a report about regressions when running containers
using lsm profiles when spawning a new process into a container. Andrea
bisected this to: bfb819ea20ce ("proc: Check /proc/$pid/attr/ writes
against file opener")

Spawning a new process into a running container is a bit messy due to
accumulated legacy cruft and here's one way we're currently doing it.
Parent process -> immediate process -> attached process: the
intermediate process is needed to attach to the container's namespaces
and then we fork so that the "attached process" is a proper member of
the pid namespace of the container, i.e. a child of PID 1 in the new pid
namespace.

The IPC mechanism is:

/* 
 * IPC mechanism: (X is receiver)
 *   initial process        transient process   attached process
 *        X           <---  send pid of
 *                          attached proc,
 *                          then exit
 *    send 0 ------------------------------------>    X
 *                                              [do initialization]
 *        X  <------------------------------------  send 1
 *   [add to cgroup, ...]
 *    send 2 ------------------------------------>    X
 *						[set LXC_ATTACH_NO_NEW_PRIVS]
 *        X  <------------------------------------  send 3
 *   [open LSM label fd]
 *    send 4 ------------------------------------>    X
 *   						[set LSM label]
 *   close socket                                 close socket
 *                                                run program
 */

With your fix Kees, the last step where the attached process writes its
own lsm profile fails with EPERM where it would succeed before. That
means v5.13 breaks all container users currently where it has worked
continuously before. :)

The LSM profile is written after we've become root in our new namespace

	if (!lxc_drop_groups())
		goto on_error;

	if (options->namespaces & CLONE_NEWUSER)
		if (!lxc_switch_uid_gid(ctx->setup_ns_uid, ctx->setup_ns_gid))
			goto on_error;

	if (attach_lsm(options) && ctx->lsm_label) {
		/* Change into our new LSM profile. */
		ret = ctx->lsm_ops->process_label_set_at(ctx->lsm_ops, fd_lsm, ctx->lsm_label, on_exec);
		if (ret < 0)
			goto on_error;

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		TRACE("Set %s LSM label to \"%s\"", ctx->lsm_ops->name, ctx->lsm_label);
	}

So the effective ids of the process writing the lsm profile are
different from the ids of the process that opened the lsm fd in this
case.

Christian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression when writing to /proc/<pid>/attr/
  2021-06-07 14:22 Regression when writing to /proc/<pid>/attr/ Christian Brauner
@ 2021-06-07 23:38 ` Kees Cook
  2021-06-08  0:02   ` Linus Torvalds
  2021-06-08  8:51   ` Christian Brauner
  0 siblings, 2 replies; 9+ messages in thread
From: Kees Cook @ 2021-06-07 23:38 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Linus Torvalds, regressions, Andrea Righi

On Mon, Jun 07, 2021 at 04:22:45PM +0200, Christian Brauner wrote:
> Hey Linus,
> hey Kees,
> 
> This morning I got a report about regressions when running containers
> using lsm profiles when spawning a new process into a container. Andrea
> bisected this to: bfb819ea20ce ("proc: Check /proc/$pid/attr/ writes
> against file opener")

Aaagh.

> Spawning a new process into a running container is a bit messy due to
> accumulated legacy cruft and here's one way we're currently doing it.
> Parent process -> immediate process -> attached process: the
> intermediate process is needed to attach to the container's namespaces
> and then we fork so that the "attached process" is a proper member of
> the pid namespace of the container, i.e. a child of PID 1 in the new pid
> namespace.
>
> The IPC mechanism is:

In here, "initial" means "parent", "transient" means "intermediate"?

> 
> /* 
>  * IPC mechanism: (X is receiver)
>  *   initial process        transient process   attached process
>  *        X           <---  send pid of
>  *                          attached proc,
>  *                          then exit
>  *    send 0 ------------------------------------>    X
>  *                                              [do initialization]
>  *        X  <------------------------------------  send 1
>  *   [add to cgroup, ...]
>  *    send 2 ------------------------------------>    X
>  *						[set LXC_ATTACH_NO_NEW_PRIVS]
>  *        X  <------------------------------------  send 3
>  *   [open LSM label fd]

As in, "initial process" is opening "attached process"'s attr fd?

>  *    send 4 ------------------------------------>    X
>  *   						[set LSM label]

Does "initial" send the fd to "attached"?

>  *   close socket                                 close socket
>  *                                                run program
>  */
> 
> With your fix Kees, the last step where the attached process writes its
> own lsm profile fails with EPERM where it would succeed before. That
> means v5.13 breaks all container users currently where it has worked
> continuously before. :)

I can only understand this if the fd is passed to the writer, or the
writer opens, changes creds, and then writes?

> The LSM profile is written after we've become root in our new namespace
> 
> 	if (!lxc_drop_groups())
> 		goto on_error;
> 
> 	if (options->namespaces & CLONE_NEWUSER)
> 		if (!lxc_switch_uid_gid(ctx->setup_ns_uid, ctx->setup_ns_gid))
> 			goto on_error;
> 
> 	if (attach_lsm(options) && ctx->lsm_label) {
> 		/* Change into our new LSM profile. */
> 		ret = ctx->lsm_ops->process_label_set_at(ctx->lsm_ops, fd_lsm, ctx->lsm_label, on_exec);
> 		if (ret < 0)
> 			goto on_error;
> 
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> 		TRACE("Set %s LSM label to \"%s\"", ctx->lsm_ops->name, ctx->lsm_label);
> 	}
> 
> So the effective ids of the process writing the lsm profile are
> different from the ids of the process that opened the lsm fd in this
> case.

I'm assuming the issue is the latter (open, drop privs, write). And
I assume fsuid/fsgid has changed? (i.e. cred_fscmp() couldn't be used
either?)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression when writing to /proc/<pid>/attr/
  2021-06-07 23:38 ` Kees Cook
@ 2021-06-08  0:02   ` Linus Torvalds
  2021-06-08  2:15     ` Kees Cook
  2021-06-08  8:51   ` Christian Brauner
  1 sibling, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2021-06-08  0:02 UTC (permalink / raw)
  To: Kees Cook; +Cc: Christian Brauner, regressions, Andrea Righi

On Mon, Jun 7, 2021 at 4:38 PM Kees Cook <keescook@chromium.org> wrote:
>
> I'm assuming the issue is the latter (open, drop privs, write). And
> I assume fsuid/fsgid has changed? (i.e. cred_fscmp() couldn't be used
> either?)

Hmm. Do we have some place to hide self_exec_id at open time, and then
just verify that it's still the same at IO time?

IOW, replace that f_cred comparison with a "self_exec_id has not
changed" comparison instead?

Perhaps squirrel it away in file->f_private? Or are we already using
that (didn't check)?

           Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression when writing to /proc/<pid>/attr/
  2021-06-08  0:02   ` Linus Torvalds
@ 2021-06-08  2:15     ` Kees Cook
  2021-06-08  6:44       ` Andrea Righi
  2021-06-08 11:59       ` Christian Brauner
  0 siblings, 2 replies; 9+ messages in thread
From: Kees Cook @ 2021-06-08  2:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christian Brauner, regressions, Andrea Righi

On Mon, Jun 07, 2021 at 05:02:28PM -0700, Linus Torvalds wrote:
> On Mon, Jun 7, 2021 at 4:38 PM Kees Cook <keescook@chromium.org> wrote:
> >
> > I'm assuming the issue is the latter (open, drop privs, write). And
> > I assume fsuid/fsgid has changed? (i.e. cred_fscmp() couldn't be used
> > either?)
> 
> Hmm. Do we have some place to hide self_exec_id at open time, and then
> just verify that it's still the same at IO time?
> 
> IOW, replace that f_cred comparison with a "self_exec_id has not
> changed" comparison instead?

I think we can't use self_exec_id because the original flaw could just
be changed to have the parent open two children (the fd opener and the
victim), which would have the same self_exec_id.

> Perhaps squirrel it away in file->f_private? Or are we already using
> that (didn't check)?

But we can do tracking via file->private_data since the attr files don't
use a custom opener. I think an mm_struct comparison is likely what's
needed here? (This is actually what several of these special proc files
are already doing, but they actually _use_ mm_struct.)

UNTESTED:


diff --git a/fs/proc/base.c b/fs/proc/base.c
index 58bbf334265b..7118ebe38fa6 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2674,6 +2674,11 @@ static int proc_pident_readdir(struct file *file, struct dir_context *ctx,
 }
 
 #ifdef CONFIG_SECURITY
+static int proc_pid_attr_open(struct inode *inode, struct file *file)
+{
+	return __mem_open(inode, file, PTRACE_MODE_READ_FSCREDS);
+}
+
 static ssize_t proc_pid_attr_read(struct file * file, char __user * buf,
 				  size_t count, loff_t *ppos)
 {
@@ -2704,7 +2709,7 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
 	int rv;
 
 	/* A task may only write when it was the opener. */
-	if (file->f_cred != current_real_cred())
+	if (file->private_data != current->mm)
 		return -EPERM;
 
 	rcu_read_lock();
@@ -2754,9 +2759,11 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
 }
 
 static const struct file_operations proc_pid_attr_operations = {
+	.open		= proc_pid_attr_open,
 	.read		= proc_pid_attr_read,
 	.write		= proc_pid_attr_write,
 	.llseek		= generic_file_llseek,
+	.release	= mem_release,
 };
 
 #define LSM_DIR_OPS(LSM) \

-- 
Kees Cook

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: Regression when writing to /proc/<pid>/attr/
  2021-06-08  2:15     ` Kees Cook
@ 2021-06-08  6:44       ` Andrea Righi
  2021-06-08 17:03         ` Kees Cook
  2021-06-08 11:59       ` Christian Brauner
  1 sibling, 1 reply; 9+ messages in thread
From: Andrea Righi @ 2021-06-08  6:44 UTC (permalink / raw)
  To: Kees Cook; +Cc: Linus Torvalds, Christian Brauner, regressions

On Mon, Jun 07, 2021 at 07:15:36PM -0700, Kees Cook wrote:
> On Mon, Jun 07, 2021 at 05:02:28PM -0700, Linus Torvalds wrote:
> > On Mon, Jun 7, 2021 at 4:38 PM Kees Cook <keescook@chromium.org> wrote:
> > >
> > > I'm assuming the issue is the latter (open, drop privs, write). And
> > > I assume fsuid/fsgid has changed? (i.e. cred_fscmp() couldn't be used
> > > either?)
> > 
> > Hmm. Do we have some place to hide self_exec_id at open time, and then
> > just verify that it's still the same at IO time?
> > 
> > IOW, replace that f_cred comparison with a "self_exec_id has not
> > changed" comparison instead?
> 
> I think we can't use self_exec_id because the original flaw could just
> be changed to have the parent open two children (the fd opener and the
> victim), which would have the same self_exec_id.
> 
> > Perhaps squirrel it away in file->f_private? Or are we already using
> > that (didn't check)?
> 
> But we can do tracking via file->private_data since the attr files don't
> use a custom opener. I think an mm_struct comparison is likely what's
> needed here? (This is actually what several of these special proc files
> are already doing, but they actually _use_ mm_struct.)
> 
> UNTESTED:

Thanks Kees! Comparing mm_struct makes sense to me. With this applied
the regression that I was experiencing with containers doesn't happen
anymore.

I'll run more stress tests with this, but for now it looks good to me,
so FWIW:

Tested-by: Andrea Righi <andrea.righi@canonical.com>

> 
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 58bbf334265b..7118ebe38fa6 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -2674,6 +2674,11 @@ static int proc_pident_readdir(struct file *file, struct dir_context *ctx,
>  }
>  
>  #ifdef CONFIG_SECURITY
> +static int proc_pid_attr_open(struct inode *inode, struct file *file)
> +{
> +	return __mem_open(inode, file, PTRACE_MODE_READ_FSCREDS);
> +}
> +
>  static ssize_t proc_pid_attr_read(struct file * file, char __user * buf,
>  				  size_t count, loff_t *ppos)
>  {
> @@ -2704,7 +2709,7 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
>  	int rv;
>  
>  	/* A task may only write when it was the opener. */
> -	if (file->f_cred != current_real_cred())
> +	if (file->private_data != current->mm)
>  		return -EPERM;
>  
>  	rcu_read_lock();
> @@ -2754,9 +2759,11 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
>  }
>  
>  static const struct file_operations proc_pid_attr_operations = {
> +	.open		= proc_pid_attr_open,
>  	.read		= proc_pid_attr_read,
>  	.write		= proc_pid_attr_write,
>  	.llseek		= generic_file_llseek,
> +	.release	= mem_release,
>  };
>  
>  #define LSM_DIR_OPS(LSM) \
> 
> -- 
> Kees Cook

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression when writing to /proc/<pid>/attr/
  2021-06-07 23:38 ` Kees Cook
  2021-06-08  0:02   ` Linus Torvalds
@ 2021-06-08  8:51   ` Christian Brauner
  1 sibling, 0 replies; 9+ messages in thread
From: Christian Brauner @ 2021-06-08  8:51 UTC (permalink / raw)
  To: Kees Cook; +Cc: Linus Torvalds, regressions, Andrea Righi

On Mon, Jun 07, 2021 at 04:38:50PM -0700, Kees Cook wrote:
> On Mon, Jun 07, 2021 at 04:22:45PM +0200, Christian Brauner wrote:
> > Hey Linus,
> > hey Kees,
> > 
> > This morning I got a report about regressions when running containers
> > using lsm profiles when spawning a new process into a container. Andrea
> > bisected this to: bfb819ea20ce ("proc: Check /proc/$pid/attr/ writes
> > against file opener")
> 
> Aaagh.
> 
> > Spawning a new process into a running container is a bit messy due to
> > accumulated legacy cruft and here's one way we're currently doing it.
> > Parent process -> immediate process -> attached process: the
> > intermediate process is needed to attach to the container's namespaces
> > and then we fork so that the "attached process" is a proper member of
> > the pid namespace of the container, i.e. a child of PID 1 in the new pid
> > namespace.
> >
> > The IPC mechanism is:
> 
> In here, "initial" means "parent", "transient" means "intermediate"?
> 
> > 
> > /* 
> >  * IPC mechanism: (X is receiver)
> >  *   initial process        transient process   attached process
> >  *        X           <---  send pid of
> >  *                          attached proc,
> >  *                          then exit
> >  *    send 0 ------------------------------------>    X
> >  *                                              [do initialization]
> >  *        X  <------------------------------------  send 1
> >  *   [add to cgroup, ...]
> >  *    send 2 ------------------------------------>    X
> >  *						[set LXC_ATTACH_NO_NEW_PRIVS]
> >  *        X  <------------------------------------  send 3
> >  *   [open LSM label fd]
> 
> As in, "initial process" is opening "attached process"'s attr fd?

Yes.

> 
> >  *    send 4 ------------------------------------>    X
> >  *   						[set LSM label]
> 
> Does "initial" send the fd to "attached"?

Yes.

> 
> >  *   close socket                                 close socket
> >  *                                                run program
> >  */
> > 
> > With your fix Kees, the last step where the attached process writes its
> > own lsm profile fails with EPERM where it would succeed before. That
> > means v5.13 breaks all container users currently where it has worked
> > continuously before. :)
> 
> I can only understand this if the fd is passed to the writer, or the
> writer opens, changes creds, and then writes?

The fd is opened by the <initial process> which is the parent of the
<attached process>. (The <attached process> is created with CLONE_PARENT).

The <initial process> openes the lsm fd for the <attached process> (The
<attached process> may not have procfs mounted or may not be attached to
the mount namespace, lack privileges, seccomp filter etc.) and then
sends it to the <attached process> so it can write its own lsm policy.

> 
> > The LSM profile is written after we've become root in our new namespace
> > 
> > 	if (!lxc_drop_groups())
> > 		goto on_error;
> > 
> > 	if (options->namespaces & CLONE_NEWUSER)
> > 		if (!lxc_switch_uid_gid(ctx->setup_ns_uid, ctx->setup_ns_gid))
> > 			goto on_error;
> > 
> > 	if (attach_lsm(options) && ctx->lsm_label) {
> > 		/* Change into our new LSM profile. */
> > 		ret = ctx->lsm_ops->process_label_set_at(ctx->lsm_ops, fd_lsm, ctx->lsm_label, on_exec);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 
> > 		TRACE("Set %s LSM label to \"%s\"", ctx->lsm_ops->name, ctx->lsm_label);
> > 	}
> > 
> > So the effective ids of the process writing the lsm profile are
> > different from the ids of the process that opened the lsm fd in this
> > case.
> 
> I'm assuming the issue is the latter (open, drop privs, write). And
> I assume fsuid/fsgid has changed? (i.e. cred_fscmp() couldn't be used
> either?)

Yes, when the <attached process> set*id()s in the CLONE_NEWUSER case it
will change creds so the creds of the opener and the creds of the writer
don't match. And yes, it'll change fs*id as well, i.e. set*id() will
cause an fs*id change implicitly.

Christian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression when writing to /proc/<pid>/attr/
  2021-06-08  2:15     ` Kees Cook
  2021-06-08  6:44       ` Andrea Righi
@ 2021-06-08 11:59       ` Christian Brauner
  2021-06-08 16:39         ` Linus Torvalds
  1 sibling, 1 reply; 9+ messages in thread
From: Christian Brauner @ 2021-06-08 11:59 UTC (permalink / raw)
  To: Kees Cook; +Cc: Linus Torvalds, regressions, Andrea Righi

On Mon, Jun 07, 2021 at 07:15:36PM -0700, Kees Cook wrote:
> On Mon, Jun 07, 2021 at 05:02:28PM -0700, Linus Torvalds wrote:
> > On Mon, Jun 7, 2021 at 4:38 PM Kees Cook <keescook@chromium.org> wrote:
> > >
> > > I'm assuming the issue is the latter (open, drop privs, write). And
> > > I assume fsuid/fsgid has changed? (i.e. cred_fscmp() couldn't be used
> > > either?)
> > 
> > Hmm. Do we have some place to hide self_exec_id at open time, and then
> > just verify that it's still the same at IO time?
> > 
> > IOW, replace that f_cred comparison with a "self_exec_id has not
> > changed" comparison instead?
> 
> I think we can't use self_exec_id because the original flaw could just
> be changed to have the parent open two children (the fd opener and the
> victim), which would have the same self_exec_id.

Hm, but doesn't the mm check have the same problem? It's not checking
the mm of the opener against the mm of the writer. To stay with the
example in this thread, it's checking the mm of the <attached process>
at open time against the mm of the <attached process> at write time if I
read this correctly.

> 
> > Perhaps squirrel it away in file->f_private? Or are we already using
> > that (didn't check)?
> 
> But we can do tracking via file->private_data since the attr files don't
> use a custom opener. I think an mm_struct comparison is likely what's
> needed here? (This is actually what several of these special proc files

So ok, looking at your original fix it tried to make sure that the
credentials stay invariant during open and write time, i.e. only the
opener can write. The main worry being that a process with more
privileges could potentially be tricked into writing say an lsm profile
for itself it didn't want to.

This solution would mean for the example in this thread that f_cred
would be set to the creds of the <initial process> and would then be
compared to the creds of the <attached process>. Which is how this
regression happens. Afaict, any non-threaded scenario where one process
opens an attr fd and sends it to another one to write would be broken. :)

Iiuc, your mm change does something different now. __mem_open() records
the mm of the <attached process> during open and then compares it
against the <attached process> mm during write. And only if they match
are you allowed to write. So the semantics change from enforcing the
writer be the opener to the writer not having changed mm.

Just for the record, this still implies that if a process calls
unshare(CLONE_VM) and then writes its LSM profile afterwards it will
fail where it would have succeeded before. It's probably a very rare
scenario so it's probably fine but still.

But afaiu, the mm check is about making sure that the <attached process>
hasn't execed and thus changed creds that way in between the open and
the write. So in that case wouldn't a self_exec_id check indeed be
simpler?

Christian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression when writing to /proc/<pid>/attr/
  2021-06-08 11:59       ` Christian Brauner
@ 2021-06-08 16:39         ` Linus Torvalds
  0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2021-06-08 16:39 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Kees Cook, regressions, Andrea Righi

On Tue, Jun 8, 2021 at 4:59 AM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> Hm, but doesn't the mm check have the same problem? It's not checking
> the mm of the opener against the mm of the writer. To stay with the
> example in this thread, it's checking the mm of the <attached process>
> at open time against the mm of the <attached process> at write time if I
> read this correctly.

Yes and no. It is checking that the mm of the target still matches,
but what you don't see in the patch (because it was pre-existing) is
that it also checks that the target is "current".

So it does effectively check that current hasn't execve'd. So ack on
the patch from me.

Of course, we could _also_ allow a cross-execve() write if the creds
remain the same, but I think this "hasn't execve'd" is the better
check (and was what the original cred check patch basically aimed to
do anyway).

           Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression when writing to /proc/<pid>/attr/
  2021-06-08  6:44       ` Andrea Righi
@ 2021-06-08 17:03         ` Kees Cook
  0 siblings, 0 replies; 9+ messages in thread
From: Kees Cook @ 2021-06-08 17:03 UTC (permalink / raw)
  To: Andrea Righi; +Cc: Linus Torvalds, Christian Brauner, regressions

On Tue, Jun 08, 2021 at 08:44:17AM +0200, Andrea Righi wrote:
> On Mon, Jun 07, 2021 at 07:15:36PM -0700, Kees Cook wrote:
> > On Mon, Jun 07, 2021 at 05:02:28PM -0700, Linus Torvalds wrote:
> > > On Mon, Jun 7, 2021 at 4:38 PM Kees Cook <keescook@chromium.org> wrote:
> > > >
> > > > I'm assuming the issue is the latter (open, drop privs, write). And
> > > > I assume fsuid/fsgid has changed? (i.e. cred_fscmp() couldn't be used
> > > > either?)
> > > 
> > > Hmm. Do we have some place to hide self_exec_id at open time, and then
> > > just verify that it's still the same at IO time?
> > > 
> > > IOW, replace that f_cred comparison with a "self_exec_id has not
> > > changed" comparison instead?
> > 
> > I think we can't use self_exec_id because the original flaw could just
> > be changed to have the parent open two children (the fd opener and the
> > victim), which would have the same self_exec_id.
> > 
> > > Perhaps squirrel it away in file->f_private? Or are we already using
> > > that (didn't check)?
> > 
> > But we can do tracking via file->private_data since the attr files don't
> > use a custom opener. I think an mm_struct comparison is likely what's
> > needed here? (This is actually what several of these special proc files
> > are already doing, but they actually _use_ mm_struct.)
> > 
> > UNTESTED:
> 
> Thanks Kees! Comparing mm_struct makes sense to me. With this applied
> the regression that I was experiencing with containers doesn't happen
> anymore.
> 
> I'll run more stress tests with this, but for now it looks good to me,
> so FWIW:
> 
> Tested-by: Andrea Righi <andrea.righi@canonical.com>

Oh good! Thanks for testing and sorry for the extra work I created! I'll
get this spun up into a proper patch.

-Kees

> 
> > 
> > 
> > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > index 58bbf334265b..7118ebe38fa6 100644
> > --- a/fs/proc/base.c
> > +++ b/fs/proc/base.c
> > @@ -2674,6 +2674,11 @@ static int proc_pident_readdir(struct file *file, struct dir_context *ctx,
> >  }
> >  
> >  #ifdef CONFIG_SECURITY
> > +static int proc_pid_attr_open(struct inode *inode, struct file *file)
> > +{
> > +	return __mem_open(inode, file, PTRACE_MODE_READ_FSCREDS);
> > +}
> > +
> >  static ssize_t proc_pid_attr_read(struct file * file, char __user * buf,
> >  				  size_t count, loff_t *ppos)
> >  {
> > @@ -2704,7 +2709,7 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
> >  	int rv;
> >  
> >  	/* A task may only write when it was the opener. */
> > -	if (file->f_cred != current_real_cred())
> > +	if (file->private_data != current->mm)
> >  		return -EPERM;
> >  
> >  	rcu_read_lock();
> > @@ -2754,9 +2759,11 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
> >  }
> >  
> >  static const struct file_operations proc_pid_attr_operations = {
> > +	.open		= proc_pid_attr_open,
> >  	.read		= proc_pid_attr_read,
> >  	.write		= proc_pid_attr_write,
> >  	.llseek		= generic_file_llseek,
> > +	.release	= mem_release,
> >  };
> >  
> >  #define LSM_DIR_OPS(LSM) \
> > 
> > -- 
> > Kees Cook

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-06-08 17:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-07 14:22 Regression when writing to /proc/<pid>/attr/ Christian Brauner
2021-06-07 23:38 ` Kees Cook
2021-06-08  0:02   ` Linus Torvalds
2021-06-08  2:15     ` Kees Cook
2021-06-08  6:44       ` Andrea Righi
2021-06-08 17:03         ` Kees Cook
2021-06-08 11:59       ` Christian Brauner
2021-06-08 16:39         ` Linus Torvalds
2021-06-08  8:51   ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).