linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [REGRESSION v4.16-rc6] [PATCH] mqueue: forbid unprivileged user access to internal mount
@ 2018-03-23  6:04 Aleksa Sarai
  2018-03-23  6:31 ` Eric W. Biederman
  0 siblings, 1 reply; 8+ messages in thread
From: Aleksa Sarai @ 2018-03-23  6:04 UTC (permalink / raw)
  To: Eric W. Biederman, Al Viro; +Cc: linux-kernel, containers

[-- Attachment #1: Type: text/plain, Size: 2619 bytes --]

Hi all,

Felix reported weird behaviour on 4.16.0-rc6 with regards to mqueue[1],
which was introduced by 36735a6a2b5e ("mqueue: switch to on-demand
creation of internal mount").

Basically, the reproducer boils down to being able to mount mqueue if
you create a new user namespace, even if you don't unshare the IPC
namespace.

Previously this was not possible, and you would get an -EPERM. The mount
is the *host* mqueue mount, which is being cached and just returned from
mqueue_mount(). To be honest, I'm not sure if this is safe or not (or if
it was intentional -- since I'm not familiar with mqueue).

To me it looks like there is a missing permission check. I've included a
patch below that I've compile-tested, and should block the above case.
Can someone please tell me if I'm missing something? Is this actually
safe?

[1]: https://github.com/docker/docker/issues/36674

--8<--------------------------------------------------------------------

Fix a regression caused by 36735a6a2b5e ("mqueue: switch to on-demand
creation of internal mount"), where an unprivileged user is permitted to
mount mqueue even if they don't have CAP_SYS_ADMIN in the ipcns's
associated userns. This can be reproduced as in the following.

  % unshare -Urm                       # ipc isn't unshare'd
  # mount -t mqueue mqueue /dev/mqueue # should have failed
  # echo $?
  0

Previously the above would error out with an -EPERM, as the mount was
protected by mount_ns(), but the patch in question switched to
kern_mount_data() which doesn't do this necessary permission check. So
add it explicitly to mq_internal_mount().

Fixes: 36735a6a2b5e ("mqueue: switch to on-demand creation of internal mount")
Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
---
 ipc/mqueue.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index d7f309f74dec..ddb85091398d 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -353,6 +353,12 @@ static struct vfsmount *mq_internal_mount(void)
 {
 	struct ipc_namespace *ns = current->nsproxy->ipc_ns;
 	struct vfsmount *m = ns->mq_mnt;
+	/*
+	 * Match the semantics of mount_ns, to avoid unprivileged users from being
+	 * able to mount mqueue from an IPC namespace they don't have ownership of.
+	 */
+	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+		return ERR_PTR(-EPERM);
 	if (m)
 		return m;
 	m = kern_mount_data(&mqueue_fs_type, ns);
-- 
2.16.2

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [REGRESSION v4.16-rc6] [PATCH] mqueue: forbid unprivileged user access to internal mount
  2018-03-23  6:04 [REGRESSION v4.16-rc6] [PATCH] mqueue: forbid unprivileged user access to internal mount Aleksa Sarai
@ 2018-03-23  6:31 ` Eric W. Biederman
  2018-03-23 21:41   ` [PATCH 1/2] fs: Extend mount_ns with support for a fast namespace to vfsmount function Eric W. Biederman
  0 siblings, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2018-03-23  6:31 UTC (permalink / raw)
  To: Aleksa Sarai; +Cc: Al Viro, linux-kernel, containers

Aleksa Sarai <asarai@suse.de> writes:

> Hi all,
>
> Felix reported weird behaviour on 4.16.0-rc6 with regards to mqueue[1],
> which was introduced by 36735a6a2b5e ("mqueue: switch to on-demand
> creation of internal mount").
>
> Basically, the reproducer boils down to being able to mount mqueue if
> you create a new user namespace, even if you don't unshare the IPC
> namespace.
>
> Previously this was not possible, and you would get an -EPERM. The mount
> is the *host* mqueue mount, which is being cached and just returned from
> mqueue_mount(). To be honest, I'm not sure if this is safe or not (or if
> it was intentional -- since I'm not familiar with mqueue).
>
> To me it looks like there is a missing permission check. I've included a
> patch below that I've compile-tested, and should block the above case.
> Can someone please tell me if I'm missing something? Is this actually
> safe?

I think it may be safe by luck.  If mqueuefs had any mount options this
would allow them to be changed.  

Looking at the code there is another issue. sb->s_user_ns is getting
set to &init_user_ns instead of ns->user_ns.  That will cause other
operations to fail like mount -o remount to fail that should not.

So I think the fix needs a little more work.

Eric




>
> [1]: https://github.com/docker/docker/issues/36674
>
> --8<--------------------------------------------------------------------
>
> Fix a regression caused by 36735a6a2b5e ("mqueue: switch to on-demand
> creation of internal mount"), where an unprivileged user is permitted to
> mount mqueue even if they don't have CAP_SYS_ADMIN in the ipcns's
> associated userns. This can be reproduced as in the following.
>
>   % unshare -Urm                       # ipc isn't unshare'd
>   # mount -t mqueue mqueue /dev/mqueue # should have failed
>   # echo $?
>   0
>
> Previously the above would error out with an -EPERM, as the mount was
> protected by mount_ns(), but the patch in question switched to
> kern_mount_data() which doesn't do this necessary permission check. So
> add it explicitly to mq_internal_mount().
>
> Fixes: 36735a6a2b5e ("mqueue: switch to on-demand creation of internal mount")
> Reported-by: Felix Abecassis <fabecassis@nvidia.com>
> Signed-off-by: Aleksa Sarai <asarai@suse.de>
> ---
>  ipc/mqueue.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index d7f309f74dec..ddb85091398d 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -353,6 +353,12 @@ static struct vfsmount *mq_internal_mount(void)
>  {
>  	struct ipc_namespace *ns = current->nsproxy->ipc_ns;
>  	struct vfsmount *m = ns->mq_mnt;
> +	/*
> +	 * Match the semantics of mount_ns, to avoid unprivileged users from being
> +	 * able to mount mqueue from an IPC namespace they don't have ownership of.
> +	 */
> +	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
> +		return ERR_PTR(-EPERM);
>  	if (m)
>  		return m;
>  	m = kern_mount_data(&mqueue_fs_type, ns);
> -- 
> 2.16.2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] fs: Extend mount_ns with support for a fast namespace to vfsmount function
  2018-03-23  6:31 ` Eric W. Biederman
@ 2018-03-23 21:41   ` Eric W. Biederman
  2018-03-23 21:43     ` [PATCH 2/2] mqueuefs: Fix the permissions and permission checks when mounting mqueuefs Eric W. Biederman
  2018-03-23 23:15     ` [PATCH 1/2] fs: Extend mount_ns with support for a fast namespace to vfsmount function Al Viro
  0 siblings, 2 replies; 8+ messages in thread
From: Eric W. Biederman @ 2018-03-23 21:41 UTC (permalink / raw)
  To: Aleksa Sarai; +Cc: Al Viro, linux-kernel, containers


If this function is present use it to lookup up the vfsmount except
when performaning internal kernel mounts.  When performing internal
kernel mounts don't look through the list of superblocks just create a
new one.

After a quick survey it appears all callers of mount_ns are candidates
for this optimization.  So extending the generic helper appears
like the right thing.

The motivation for this change is that this optimization was performed
recently on mqueuefs and a permission check was dropped and
sb->s_user_ns was set incorrectly.

To enable fixing mqueuefs this logic was extracted from mqueuefs and
added to mount_ns which gets the permission check correct and set
sb->s_user_ns properly.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/nfsd/nfsctl.c      |  3 ++-
 fs/proc/root.c        |  3 ++-
 fs/super.c            | 18 +++++++++++++++---
 include/linux/fs.h    |  1 +
 net/sunrpc/rpc_pipe.c |  2 +-
 5 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index d107b4426f7e..ffd8d91a68df 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1182,7 +1182,8 @@ static struct dentry *nfsd_mount(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data)
 {
 	struct net *net = current->nsproxy->net_ns;
-	return mount_ns(fs_type, flags, data, net, net->user_ns, nfsd_fill_super);
+	return mount_ns(fs_type, flags, data, net, net->user_ns,
+			NULL, nfsd_fill_super);
 }
 
 static void nfsd_umount(struct super_block *sb)
diff --git a/fs/proc/root.c b/fs/proc/root.c
index ede8e64974be..4111565b6944 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -98,7 +98,8 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 		ns = task_active_pid_ns(current);
 	}
 
-	return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super);
+	return mount_ns(fs_type, flags, data, ns, ns->user_ns,
+			NULL, proc_fill_super);
 }
 
 static void proc_kill_sb(struct super_block *sb)
diff --git a/fs/super.c b/fs/super.c
index 672538ca9831..4734d423b403 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1016,18 +1016,30 @@ static int ns_set_super(struct super_block *sb, void *data)
 
 struct dentry *mount_ns(struct file_system_type *fs_type,
 	int flags, void *data, void *ns, struct user_namespace *user_ns,
+	struct vfsmount *(*ns_to_mnt)(void *ns),
 	int (*fill_super)(struct super_block *, void *, int))
 {
 	struct super_block *sb;
-
+	int (*test_super)(struct super_block *, void *) = ns_test_super;
 	/* Don't allow mounting unless the caller has CAP_SYS_ADMIN
 	 * over the namespace.
 	 */
 	if (!(flags & SB_KERNMOUNT) && !ns_capable(user_ns, CAP_SYS_ADMIN))
 		return ERR_PTR(-EPERM);
 
-	sb = sget_userns(fs_type, ns_test_super, ns_set_super, flags,
-			 user_ns, ns);
+	if (ns_to_mnt) {
+		test_super = NULL;
+		if (!(flags & SB_KERNMOUNT)) {
+			struct vfsmount *m = ns_to_mnt(ns);
+			if (IS_ERR(m))
+				return ERR_CAST(m);
+			atomic_inc(&m->mnt_sb->s_active);
+			down_write(&m->mnt_sb->s_umount);
+			return dget(m->mnt_root);
+		}
+	}
+
+	sb = sget_userns(fs_type, test_super, ns_set_super, flags, user_ns, ns);
 	if (IS_ERR(sb))
 		return ERR_CAST(sb);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2a815560fda0..ca7f59ff144c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2091,6 +2091,7 @@ struct file_system_type {
 
 extern struct dentry *mount_ns(struct file_system_type *fs_type,
 	int flags, void *data, void *ns, struct user_namespace *user_ns,
+	struct vfsmount *(*ns_to_mnt)(void *ns),
 	int (*fill_super)(struct super_block *, void *, int));
 #ifdef CONFIG_BLOCK
 extern struct dentry *mount_bdev(struct file_system_type *fs_type,
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index fc97fc3ed637..824e740fe740 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -1448,7 +1448,7 @@ rpc_mount(struct file_system_type *fs_type,
 		int flags, const char *dev_name, void *data)
 {
 	struct net *net = current->nsproxy->net_ns;
-	return mount_ns(fs_type, flags, data, net, net->user_ns, rpc_fill_super);
+	return mount_ns(fs_type, flags, data, net, net->user_ns, NULL, rpc_fill_super);
 }
 
 static void rpc_kill_sb(struct super_block *sb)
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] mqueuefs: Fix the permissions and permission checks when mounting mqueuefs
  2018-03-23 21:41   ` [PATCH 1/2] fs: Extend mount_ns with support for a fast namespace to vfsmount function Eric W. Biederman
@ 2018-03-23 21:43     ` Eric W. Biederman
  2018-03-23 23:15     ` [PATCH 1/2] fs: Extend mount_ns with support for a fast namespace to vfsmount function Al Viro
  1 sibling, 0 replies; 8+ messages in thread
From: Eric W. Biederman @ 2018-03-23 21:43 UTC (permalink / raw)
  To: Aleksa Sarai; +Cc: Al Viro, linux-kernel, containers


Aleksa Sarai writes:
>
> Felix reported weird behaviour on 4.16.0-rc6 with regards to mqueue[1],
> which was introduced by 36735a6a2b5e ("mqueue: switch to on-demand
> creation of internal mount").
>
> Basically, the reproducer boils down to being able to mount mqueue if
> you create a new user namespace, even if you don't unshare the IPC
> namespace.
>
> Previously this was not possible, and you would get an -EPERM. The mount
> is the *host* mqueue mount, which is being cached and just returned from
> mqueue_mount(). To be honest, I'm not sure if this is safe or not (or if
> it was intentional -- since I'm not familiar with mqueue).
>
> To me it looks like there is a missing permission check. I've included a
> patch below that I've compile-tested, and should block the above case.
> Can someone please tell me if I'm missing something? Is this actually
> safe?
>
>
> [1]: https://github.com/docker/docker/issues/36674

After examination of the code it might be safe by chance but it is
definitely wrong.  The missing permission checks are needed in the
general case, and sb->s_user_ns needs to be set to ns->user_ns to give
root in the user namespace the appropriate permissions over the
filesystem.

Fixes: 36735a6a2b5e ("mqueue: switch to on-demand creation of internal mount")
Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Reported-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

Unless there are objections I will push these fixes to Linus in a day
or so.

 ipc/mqueue.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index d7f309f74dec..832c1ec21318 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -325,9 +325,8 @@ static struct inode *mqueue_get_inode(struct super_block *sb,
 static int mqueue_fill_super(struct super_block *sb, void *data, int silent)
 {
 	struct inode *inode;
-	struct ipc_namespace *ns = data;
+	struct ipc_namespace *ns = sb->s_fs_info;
 
-	sb->s_fs_info = ns;
 	sb->s_iflags |= SB_I_NOEXEC | SB_I_NODEV;
 	sb->s_blocksize = PAGE_SIZE;
 	sb->s_blocksize_bits = PAGE_SHIFT;
@@ -349,9 +348,9 @@ static struct file_system_type mqueue_fs_type;
  * Return value is pinned only by reference in ->mq_mnt; it will
  * live until ipcns dies.  Caller does not need to drop it.
  */
-static struct vfsmount *mq_internal_mount(void)
+static struct vfsmount *mq_internal_mount(void *nsp)
 {
-	struct ipc_namespace *ns = current->nsproxy->ipc_ns;
+	struct ipc_namespace *ns = nsp;
 	struct vfsmount *m = ns->mq_mnt;
 	if (m)
 		return m;
@@ -373,15 +372,9 @@ static struct dentry *mqueue_mount(struct file_system_type *fs_type,
 			 int flags, const char *dev_name,
 			 void *data)
 {
-	struct vfsmount *m;
-	if (flags & SB_KERNMOUNT)
-		return mount_nodev(fs_type, flags, data, mqueue_fill_super);
-	m = mq_internal_mount();
-	if (IS_ERR(m))
-		return ERR_CAST(m);
-	atomic_inc(&m->mnt_sb->s_active);
-	down_write(&m->mnt_sb->s_umount);
-	return dget(m->mnt_root);
+	struct ipc_namespace *ns = current->nsproxy->ipc_ns;
+	return mount_ns(fs_type, flags, data, ns, ns->user_ns,
+			mq_internal_mount, mqueue_fill_super);
 }
 
 static void init_once(void *foo)
@@ -771,7 +764,7 @@ static int prepare_open(struct dentry *dentry, int oflag, int ro,
 static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
 		      struct mq_attr *attr)
 {
-	struct vfsmount *mnt = mq_internal_mount();
+	struct vfsmount *mnt = mq_internal_mount(current->nsproxy->ipc_ns);
 	struct dentry *root;
 	struct filename *name;
 	struct path path;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] fs: Extend mount_ns with support for a fast namespace to vfsmount function
  2018-03-23 21:41   ` [PATCH 1/2] fs: Extend mount_ns with support for a fast namespace to vfsmount function Eric W. Biederman
  2018-03-23 21:43     ` [PATCH 2/2] mqueuefs: Fix the permissions and permission checks when mounting mqueuefs Eric W. Biederman
@ 2018-03-23 23:15     ` Al Viro
  2018-03-24 16:12       ` Eric W. Biederman
  1 sibling, 1 reply; 8+ messages in thread
From: Al Viro @ 2018-03-23 23:15 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Aleksa Sarai, linux-kernel, containers

On Fri, Mar 23, 2018 at 04:41:40PM -0500, Eric W. Biederman wrote:

>  struct dentry *mount_ns(struct file_system_type *fs_type,
>  	int flags, void *data, void *ns, struct user_namespace *user_ns,
> +	struct vfsmount *(*ns_to_mnt)(void *ns),
>  	int (*fill_super)(struct super_block *, void *, int))
>  {
>  	struct super_block *sb;
> -
> +	int (*test_super)(struct super_block *, void *) = ns_test_super;
>  	/* Don't allow mounting unless the caller has CAP_SYS_ADMIN
>  	 * over the namespace.
>  	 */
>  	if (!(flags & SB_KERNMOUNT) && !ns_capable(user_ns, CAP_SYS_ADMIN))
>  		return ERR_PTR(-EPERM);
>  
> -	sb = sget_userns(fs_type, ns_test_super, ns_set_super, flags,
> -			 user_ns, ns);
> +	if (ns_to_mnt) {
> +		test_super = NULL;
> +		if (!(flags & SB_KERNMOUNT)) {
> +			struct vfsmount *m = ns_to_mnt(ns);
> +			if (IS_ERR(m))
> +				return ERR_CAST(m);
> +			atomic_inc(&m->mnt_sb->s_active);
> +			down_write(&m->mnt_sb->s_umount);
> +			return dget(m->mnt_root);

This is completely wrong.  Look:
	* SB_KERNMOUNT and !SB_KERNMOUNT cases are almost entirely isolated;
completely so once that ns_to_mnt becomes unconditionally non-NULL.  
	* in !SB_KERNMOUNT passing ns_to_mnt() is pointless - you might as
well pass existing vfsmount (or ERR_PTR()) and use _that_.  fill_super()
is not used at all in that case.
	* is SB_KERNMOUNT ns_to_mnt serves only as a flag, eventually
constant true.

So let's split it in two helpers and give them sane arguments.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] fs: Extend mount_ns with support for a fast namespace to vfsmount function
  2018-03-23 23:15     ` [PATCH 1/2] fs: Extend mount_ns with support for a fast namespace to vfsmount function Al Viro
@ 2018-03-24 16:12       ` Eric W. Biederman
  2018-03-24 21:48         ` Al Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2018-03-24 16:12 UTC (permalink / raw)
  To: Al Viro; +Cc: Aleksa Sarai, linux-kernel, containers

Al Viro <viro@ZenIV.linux.org.uk> writes:

> On Fri, Mar 23, 2018 at 04:41:40PM -0500, Eric W. Biederman wrote:
>
>>  struct dentry *mount_ns(struct file_system_type *fs_type,
>>  	int flags, void *data, void *ns, struct user_namespace *user_ns,
>> +	struct vfsmount *(*ns_to_mnt)(void *ns),
>>  	int (*fill_super)(struct super_block *, void *, int))
>>  {
>>  	struct super_block *sb;
>> -
>> +	int (*test_super)(struct super_block *, void *) = ns_test_super;
>>  	/* Don't allow mounting unless the caller has CAP_SYS_ADMIN
>>  	 * over the namespace.
>>  	 */
>>  	if (!(flags & SB_KERNMOUNT) && !ns_capable(user_ns, CAP_SYS_ADMIN))
>>  		return ERR_PTR(-EPERM);
>>  
>> -	sb = sget_userns(fs_type, ns_test_super, ns_set_super, flags,
>> -			 user_ns, ns);
>> +	if (ns_to_mnt) {
>> +		test_super = NULL;
>> +		if (!(flags & SB_KERNMOUNT)) {
>> +			struct vfsmount *m = ns_to_mnt(ns);
>> +			if (IS_ERR(m))
>> +				return ERR_CAST(m);
>> +			atomic_inc(&m->mnt_sb->s_active);
>> +			down_write(&m->mnt_sb->s_umount);
>> +			return dget(m->mnt_root);
>
> This is completely wrong.  Look:
> 	* SB_KERNMOUNT and !SB_KERNMOUNT cases are almost entirely isolated;
> completely so once that ns_to_mnt becomes unconditionally non-NULL.  
> 	* in !SB_KERNMOUNT passing ns_to_mnt() is pointless - you might as
> well pass existing vfsmount (or ERR_PTR()) and use _that_.  fill_super()
> is not used at all in that case.
> 	* is SB_KERNMOUNT ns_to_mnt serves only as a flag, eventually
> constant true.
>
> So let's split it in two helpers and give them sane arguments.

Everything I look at with multiple helpers feels even worse to me.
The above has the advantage it is the minimal change to fix the
regression.  So I am not worried about code correctness.

I keep wondering is the intention long term to fix sget so it has an
efficient data structure for finding super blocks (like an rbtree) or if
the intention is to deprecate sget entirely and just have everything
call alloc_super, and be responsible for their own data structures for
finding existing superblocks.

At this point since we are not in agreement on a proper fix I am going
to plan on just queueing up a revert.   So that we don't ship 4.16 with
a regression in a permission check.

Eric

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] fs: Extend mount_ns with support for a fast namespace to vfsmount function
  2018-03-24 16:12       ` Eric W. Biederman
@ 2018-03-24 21:48         ` Al Viro
  2018-03-25  1:25           ` [GIT PULL] Revert "mqueue: switch to on-demand creation of internal mount" Eric W. Biederman
  0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2018-03-24 21:48 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Aleksa Sarai, linux-kernel, containers

On Sat, Mar 24, 2018 at 11:12:02AM -0500, Eric W. Biederman wrote:

> > This is completely wrong.  Look:
> > 	* SB_KERNMOUNT and !SB_KERNMOUNT cases are almost entirely isolated;
> > completely so once that ns_to_mnt becomes unconditionally non-NULL.  
> > 	* in !SB_KERNMOUNT passing ns_to_mnt() is pointless - you might as
> > well pass existing vfsmount (or ERR_PTR()) and use _that_.  fill_super()
> > is not used at all in that case.
> > 	* is SB_KERNMOUNT ns_to_mnt serves only as a flag, eventually
> > constant true.
> >
> > So let's split it in two helpers and give them sane arguments.
> 
> Everything I look at with multiple helpers feels even worse to me.
> The above has the advantage it is the minimal change to fix the
> regression.  So I am not worried about code correctness.

> I keep wondering is the intention long term to fix sget so it has an
> efficient data structure for finding super blocks (like an rbtree) or if
> the intention is to deprecate sget entirely and just have everything
> call alloc_super, and be responsible for their own data structures for
> finding existing superblocks.
>
> At this point since we are not in agreement on a proper fix I am going
> to plan on just queueing up a revert.   So that we don't ship 4.16 with
> a regression in a permission check.

Permission check is trivial to put back in; I'll do that.

FWIW, I don't believe that sget_userns() is a good place for any kind of
universal permission checks.  It's a library helper, not a place everything
must come through when mounting something.  So's mount_ns(), etc.

BTW, will you be at LSF?  I would suggest discussing the architectural
issues there - they are directly related to fsmount() proposals...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [GIT PULL] Revert "mqueue: switch to on-demand creation of internal mount"
  2018-03-24 21:48         ` Al Viro
@ 2018-03-25  1:25           ` Eric W. Biederman
  0 siblings, 0 replies; 8+ messages in thread
From: Eric W. Biederman @ 2018-03-25  1:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Aleksa Sarai, linux-kernel, containers, Al Viro, Giuseppe Scrivano


Linus,

Please pull the for-linus branch from the git tree:

   git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus

   HEAD: cfb2f6f6e0ba11ea7b263d6b69c170c4b32ac0ea Revert "mqueue: switch to on-demand creation of internal mount"

This fixes a regression that came in the merge window for v4.16.  The
problem is that the permissions for mounting and using the mqueuefs
filesystem are broken.  The necessary permission check is missing
letting people who should not be able to mount mqueuefs mount mqueuefs.
The field sb->s_user_ns is set incorrectly not allowing the mounter of
mqueuefs to remount and otherwise have proper control over the
filesystem.

Al Viro and I see the path to the necessary fixes differently and I am
not even certain at this point he actually sees all of the necessary
fixes.  Given a couple weeks we can probably work something out but I
don't see the review being resolved in time for the final v4.16.  I
don't want v4.16 shipping with a nasty regression.  So unfortunately I
am sending a revert.

Eric

From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Sat, 24 Mar 2018 11:28:14 -0500
Subject: [PATCH] Revert "mqueue: switch to on-demand creation of internal mount"

This reverts commit 36735a6a2b5e042db1af956ce4bcc13f3ff99e21.

Aleksa Sarai <asarai@suse.de> writes:
> [REGRESSION v4.16-rc6] [PATCH] mqueue: forbid unprivileged user access to internal mount
>
> Felix reported weird behaviour on 4.16.0-rc6 with regards to mqueue[1],
> which was introduced by 36735a6a2b5e ("mqueue: switch to on-demand
> creation of internal mount").
>
> Basically, the reproducer boils down to being able to mount mqueue if
> you create a new user namespace, even if you don't unshare the IPC
> namespace.
>
> Previously this was not possible, and you would get an -EPERM. The mount
> is the *host* mqueue mount, which is being cached and just returned from
> mqueue_mount(). To be honest, I'm not sure if this is safe or not (or if
> it was intentional -- since I'm not familiar with mqueue).
>
> To me it looks like there is a missing permission check. I've included a
> patch below that I've compile-tested, and should block the above case.
> Can someone please tell me if I'm missing something? Is this actually
> safe?
>
> [1]: https://github.com/docker/docker/issues/36674

The issue is a lot deeper than a missing permission check.  sb->s_user_ns
was is improperly set as well.  So in addition to the filesystem being
mounted when it should not be mounted, so things are not allow that should
be.

We are practically to the release of 4.16 and there is no agreement between
Al Viro and myself on what the code should looks like to fix things properly.
So revert the code to what it was before so that we can take our time
and discuss this properly.

Fixes: 36735a6a2b5e ("mqueue: switch to on-demand creation of internal mount")
Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Reported-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 ipc/mqueue.c | 74 ++++++++++++++++--------------------------------------------
 1 file changed, 19 insertions(+), 55 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index d7f309f74dec..a808f29d4c5a 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -325,9 +325,8 @@ static struct inode *mqueue_get_inode(struct super_block *sb,
 static int mqueue_fill_super(struct super_block *sb, void *data, int silent)
 {
 	struct inode *inode;
-	struct ipc_namespace *ns = data;
+	struct ipc_namespace *ns = sb->s_fs_info;
 
-	sb->s_fs_info = ns;
 	sb->s_iflags |= SB_I_NOEXEC | SB_I_NODEV;
 	sb->s_blocksize = PAGE_SIZE;
 	sb->s_blocksize_bits = PAGE_SHIFT;
@@ -344,44 +343,18 @@ static int mqueue_fill_super(struct super_block *sb, void *data, int silent)
 	return 0;
 }
 
-static struct file_system_type mqueue_fs_type;
-/*
- * Return value is pinned only by reference in ->mq_mnt; it will
- * live until ipcns dies.  Caller does not need to drop it.
- */
-static struct vfsmount *mq_internal_mount(void)
-{
-	struct ipc_namespace *ns = current->nsproxy->ipc_ns;
-	struct vfsmount *m = ns->mq_mnt;
-	if (m)
-		return m;
-	m = kern_mount_data(&mqueue_fs_type, ns);
-	spin_lock(&mq_lock);
-	if (unlikely(ns->mq_mnt)) {
-		spin_unlock(&mq_lock);
-		if (!IS_ERR(m))
-			kern_unmount(m);
-		return ns->mq_mnt;
-	}
-	if (!IS_ERR(m))
-		ns->mq_mnt = m;
-	spin_unlock(&mq_lock);
-	return m;
-}
-
 static struct dentry *mqueue_mount(struct file_system_type *fs_type,
 			 int flags, const char *dev_name,
 			 void *data)
 {
-	struct vfsmount *m;
-	if (flags & SB_KERNMOUNT)
-		return mount_nodev(fs_type, flags, data, mqueue_fill_super);
-	m = mq_internal_mount();
-	if (IS_ERR(m))
-		return ERR_CAST(m);
-	atomic_inc(&m->mnt_sb->s_active);
-	down_write(&m->mnt_sb->s_umount);
-	return dget(m->mnt_root);
+	struct ipc_namespace *ns;
+	if (flags & SB_KERNMOUNT) {
+		ns = data;
+		data = NULL;
+	} else {
+		ns = current->nsproxy->ipc_ns;
+	}
+	return mount_ns(fs_type, flags, data, ns, ns->user_ns, mqueue_fill_super);
 }
 
 static void init_once(void *foo)
@@ -771,16 +744,13 @@ static int prepare_open(struct dentry *dentry, int oflag, int ro,
 static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
 		      struct mq_attr *attr)
 {
-	struct vfsmount *mnt = mq_internal_mount();
-	struct dentry *root;
+	struct vfsmount *mnt = current->nsproxy->ipc_ns->mq_mnt;
+	struct dentry *root = mnt->mnt_root;
 	struct filename *name;
 	struct path path;
 	int fd, error;
 	int ro;
 
-	if (IS_ERR(mnt))
-		return PTR_ERR(mnt);
-
 	audit_mq_open(oflag, mode, attr);
 
 	if (IS_ERR(name = getname(u_name)))
@@ -791,7 +761,6 @@ static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
 		goto out_putname;
 
 	ro = mnt_want_write(mnt);	/* we'll drop it in any case */
-	root = mnt->mnt_root;
 	inode_lock(d_inode(root));
 	path.dentry = lookup_one_len(name->name, root, strlen(name->name));
 	if (IS_ERR(path.dentry)) {
@@ -840,9 +809,6 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
 	struct ipc_namespace *ipc_ns = current->nsproxy->ipc_ns;
 	struct vfsmount *mnt = ipc_ns->mq_mnt;
 
-	if (!mnt)
-		return -ENOENT;
-
 	name = getname(u_name);
 	if (IS_ERR(name))
 		return PTR_ERR(name);
@@ -1569,26 +1535,28 @@ int mq_init_ns(struct ipc_namespace *ns)
 	ns->mq_msgsize_max   = DFLT_MSGSIZEMAX;
 	ns->mq_msg_default   = DFLT_MSG;
 	ns->mq_msgsize_default  = DFLT_MSGSIZE;
-	ns->mq_mnt = NULL;
 
+	ns->mq_mnt = kern_mount_data(&mqueue_fs_type, ns);
+	if (IS_ERR(ns->mq_mnt)) {
+		int err = PTR_ERR(ns->mq_mnt);
+		ns->mq_mnt = NULL;
+		return err;
+	}
 	return 0;
 }
 
 void mq_clear_sbinfo(struct ipc_namespace *ns)
 {
-	if (ns->mq_mnt)
-		ns->mq_mnt->mnt_sb->s_fs_info = NULL;
+	ns->mq_mnt->mnt_sb->s_fs_info = NULL;
 }
 
 void mq_put_mnt(struct ipc_namespace *ns)
 {
-	if (ns->mq_mnt)
-		kern_unmount(ns->mq_mnt);
+	kern_unmount(ns->mq_mnt);
 }
 
 static int __init init_mqueue_fs(void)
 {
-	struct vfsmount *m;
 	int error;
 
 	mqueue_inode_cachep = kmem_cache_create("mqueue_inode_cache",
@@ -1610,10 +1578,6 @@ static int __init init_mqueue_fs(void)
 	if (error)
 		goto out_filesystem;
 
-	m = kern_mount_data(&mqueue_fs_type, &init_ipc_ns);
-	if (IS_ERR(m))
-		goto out_filesystem;
-	init_ipc_ns.mq_mnt = m;
 	return 0;
 
 out_filesystem:
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-03-25  1:26 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-23  6:04 [REGRESSION v4.16-rc6] [PATCH] mqueue: forbid unprivileged user access to internal mount Aleksa Sarai
2018-03-23  6:31 ` Eric W. Biederman
2018-03-23 21:41   ` [PATCH 1/2] fs: Extend mount_ns with support for a fast namespace to vfsmount function Eric W. Biederman
2018-03-23 21:43     ` [PATCH 2/2] mqueuefs: Fix the permissions and permission checks when mounting mqueuefs Eric W. Biederman
2018-03-23 23:15     ` [PATCH 1/2] fs: Extend mount_ns with support for a fast namespace to vfsmount function Al Viro
2018-03-24 16:12       ` Eric W. Biederman
2018-03-24 21:48         ` Al Viro
2018-03-25  1:25           ` [GIT PULL] Revert "mqueue: switch to on-demand creation of internal mount" Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).