All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH review 0/13] Adding a userns owner to struct super_block
@ 2016-06-20 17:09 ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:09 UTC (permalink / raw)
  To: Linux Containers
  Cc: Miklos Szeredi, Andy Lutomirski, James Bottomley, Seth Forshee,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni


Available from git at:

    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing

This changeset is part of ongoing work by Seth Forshee and myself to
update the VFS to allow ordinary users to mount filesystems with a
backing store.  The primary target is the fuse filesystem but there
are other filesystems such as shiftfs that will benefit.

The high level idea is to:

- Assign filesystems a owning user namespace (s_user_ns).

- Update permission checks (such as the one in remount) to use s_user_ns.

- Interpret uids/gids from outside the kernels control as coming
  from inside s_user_ns.

- Handle vfs uid and gid fields containing INVALID_UID and INVALID_GID
  indicating there is no mapping from the filesystem uids and gids into
  the kernel representation.

This changeset addresses the first step in this process mounting
filesystems, and adding a s_user_ns field to struct super_block and
populating it appropriately.

The goal is to keep everything that is not filesystem specific at the
VFS layer and to ensure the VFS and security module issues are properly
handled before updating adding support for filesystems with backing
store external to the kernel such as fuse.

The bulk of this changeset is spent in the weird corner cases that exist
for the existing filesystems we allow mounting with just user namespace
permissions.  Cleaning up and reorganizing that code and handling the
generic mount options nodeve, noexec and nosuid.  The s_iflags flag
SB_I_NODEV is added to mark filesystems that may never contain devices
(which is everything except devpts that is mounted with just user
 namespace permissions).

Eric W. Biederman (13):
      mnt: Account for MS_RDONLY in fs_fully_visible
      mnt: Refactor fs_fully_visible into mount_too_revealing
      ipc: Initialize ipc_namespace->user_ns early.
      vfs: Pass data, ns, and ns->userns to mount_ns
      proc: Convert proc_mount to use mount_ns.
      fs: Add user namespace member to struct super_block
      mnt: Move the FS_USERNS_MOUNT check into sget_userns
      kernfs: The cgroup filesystem also benefits from SB_I_NOEXEC
      ipc/mqueue: The mqueue filesystem should never contain executables
      vfs: Generalize filesystem nodev handling.
      mnt: Simplify mount_too_revealing
      userns: Remove implicit MNT_NODEV fragility.
      userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag

 fs/block_dev.c        |  2 +-
 fs/devpts/inode.c     |  3 +-
 fs/kernfs/mount.c     |  5 ++-
 fs/namei.c            |  8 ++++-
 fs/namespace.c        | 90 ++++++++++++++++++++++-----------------------------
 fs/nfsd/nfsctl.c      | 13 +++-----
 fs/proc/inode.c       |  8 ++++-
 fs/proc/internal.h    |  3 +-
 fs/proc/root.c        | 54 +++----------------------------
 fs/super.c            | 69 ++++++++++++++++++++++++++++++++++-----
 fs/sysfs/mount.c      |  5 ++-
 include/linux/fs.h    | 24 +++++++++++---
 ipc/mqueue.c          | 20 ++++++------
 ipc/namespace.c       |  5 +--
 net/sunrpc/rpc_pipe.c |  8 ++---
 15 files changed, 169 insertions(+), 148 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH review 0/13] Adding a userns owner to struct super_block
@ 2016-06-20 17:09 ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:09 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Miklos Szeredi, James Bottomley, Djalal Harouni,
	Seth Forshee, Serge E. Hallyn, Andy Lutomirski


Available from git at:

    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing

This changeset is part of ongoing work by Seth Forshee and myself to
update the VFS to allow ordinary users to mount filesystems with a
backing store.  The primary target is the fuse filesystem but there
are other filesystems such as shiftfs that will benefit.

The high level idea is to:

- Assign filesystems a owning user namespace (s_user_ns).

- Update permission checks (such as the one in remount) to use s_user_ns.

- Interpret uids/gids from outside the kernels control as coming
  from inside s_user_ns.

- Handle vfs uid and gid fields containing INVALID_UID and INVALID_GID
  indicating there is no mapping from the filesystem uids and gids into
  the kernel representation.

This changeset addresses the first step in this process mounting
filesystems, and adding a s_user_ns field to struct super_block and
populating it appropriately.

The goal is to keep everything that is not filesystem specific at the
VFS layer and to ensure the VFS and security module issues are properly
handled before updating adding support for filesystems with backing
store external to the kernel such as fuse.

The bulk of this changeset is spent in the weird corner cases that exist
for the existing filesystems we allow mounting with just user namespace
permissions.  Cleaning up and reorganizing that code and handling the
generic mount options nodeve, noexec and nosuid.  The s_iflags flag
SB_I_NODEV is added to mark filesystems that may never contain devices
(which is everything except devpts that is mounted with just user
 namespace permissions).

Eric W. Biederman (13):
      mnt: Account for MS_RDONLY in fs_fully_visible
      mnt: Refactor fs_fully_visible into mount_too_revealing
      ipc: Initialize ipc_namespace->user_ns early.
      vfs: Pass data, ns, and ns->userns to mount_ns
      proc: Convert proc_mount to use mount_ns.
      fs: Add user namespace member to struct super_block
      mnt: Move the FS_USERNS_MOUNT check into sget_userns
      kernfs: The cgroup filesystem also benefits from SB_I_NOEXEC
      ipc/mqueue: The mqueue filesystem should never contain executables
      vfs: Generalize filesystem nodev handling.
      mnt: Simplify mount_too_revealing
      userns: Remove implicit MNT_NODEV fragility.
      userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag

 fs/block_dev.c        |  2 +-
 fs/devpts/inode.c     |  3 +-
 fs/kernfs/mount.c     |  5 ++-
 fs/namei.c            |  8 ++++-
 fs/namespace.c        | 90 ++++++++++++++++++++++-----------------------------
 fs/nfsd/nfsctl.c      | 13 +++-----
 fs/proc/inode.c       |  8 ++++-
 fs/proc/internal.h    |  3 +-
 fs/proc/root.c        | 54 +++----------------------------
 fs/super.c            | 69 ++++++++++++++++++++++++++++++++++-----
 fs/sysfs/mount.c      |  5 ++-
 include/linux/fs.h    | 24 +++++++++++---
 ipc/mqueue.c          | 20 ++++++------
 ipc/namespace.c       |  5 +--
 net/sunrpc/rpc_pipe.c |  8 ++---
 15 files changed, 169 insertions(+), 148 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH review 01/13] mnt: Account for MS_RDONLY in fs_fully_visible
  2016-06-20 17:09 ` Eric W. Biederman
@ 2016-06-20 17:21     ` Eric W. Biederman
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: Miklos Szeredi, Andy Lutomirski, James Bottomley, Seth Forshee,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni

In rare cases it is possible for s_flags & MS_RDONLY to be set but
MNT_READONLY to be clear.  This starting combination can cause
fs_fully_visible to fail to ensure that the new mount is readonly.
Therefore force MNT_LOCK_READONLY in the new mount if MS_RDONLY
is set on the source filesystem of the mount.

In general both MS_RDONLY and MNT_READONLY are set at the same for
mounts so I don't expect any programs to care.  Nor do I expect
MS_RDONLY to be set on proc or sysfs in the initial user namespace,
which further decreases the likelyhood of problems.

Which means this change should only affect system configurations by
paranoid sysadmins who should welcome the additional protection
as it keeps people from wriggling out of their policies.

Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Fixes: 8c6cf9cc829f ("mnt: Modify fs_fully_visible to deal with locked ro nodev and atime")
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/namespace.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index a7ec92c051f5..783004af5707 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3247,6 +3247,10 @@ static bool fs_fully_visible(struct file_system_type *type, int *new_mnt_flags)
 		if (mnt->mnt.mnt_sb->s_iflags & SB_I_NOEXEC)
 			mnt_flags &= ~(MNT_LOCK_NOSUID | MNT_LOCK_NOEXEC);
 
+		/* Don't miss readonly hidden in the superblock flags */
+		if (mnt->mnt.mnt_sb->s_flags & MS_RDONLY)
+			mnt_flags |= MNT_LOCK_READONLY;
+
 		/* Verify the mount flags are equal to or more permissive
 		 * than the proposed new mount.
 		 */
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 01/13] mnt: Account for MS_RDONLY in fs_fully_visible
@ 2016-06-20 17:21     ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Miklos Szeredi, James Bottomley, Djalal Harouni,
	Seth Forshee, Serge E. Hallyn, Andy Lutomirski

In rare cases it is possible for s_flags & MS_RDONLY to be set but
MNT_READONLY to be clear.  This starting combination can cause
fs_fully_visible to fail to ensure that the new mount is readonly.
Therefore force MNT_LOCK_READONLY in the new mount if MS_RDONLY
is set on the source filesystem of the mount.

In general both MS_RDONLY and MNT_READONLY are set at the same for
mounts so I don't expect any programs to care.  Nor do I expect
MS_RDONLY to be set on proc or sysfs in the initial user namespace,
which further decreases the likelyhood of problems.

Which means this change should only affect system configurations by
paranoid sysadmins who should welcome the additional protection
as it keeps people from wriggling out of their policies.

Cc: stable@vger.kernel.org
Fixes: 8c6cf9cc829f ("mnt: Modify fs_fully_visible to deal with locked ro nodev and atime")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/namespace.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index a7ec92c051f5..783004af5707 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3247,6 +3247,10 @@ static bool fs_fully_visible(struct file_system_type *type, int *new_mnt_flags)
 		if (mnt->mnt.mnt_sb->s_iflags & SB_I_NOEXEC)
 			mnt_flags &= ~(MNT_LOCK_NOSUID | MNT_LOCK_NOEXEC);
 
+		/* Don't miss readonly hidden in the superblock flags */
+		if (mnt->mnt.mnt_sb->s_flags & MS_RDONLY)
+			mnt_flags |= MNT_LOCK_READONLY;
+
 		/* Verify the mount flags are equal to or more permissive
 		 * than the proposed new mount.
 		 */
-- 
2.8.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 02/13] mnt: Refactor fs_fully_visible into mount_too_revealing
  2016-06-20 17:21     ` Eric W. Biederman
@ 2016-06-20 17:21         ` Eric W. Biederman
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: Miklos Szeredi, Andy Lutomirski, James Bottomley, Seth Forshee,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni

Replace the call of fs_fully_visible in do_new_mount from before the
new superblock is allocated with a call of mount_too_revealing after
the superblock is allocated.   This winds up being a much better location
for maintainability of the code.

The first change this enables is the replacement of FS_USERNS_VISIBLE
with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
to sb_iflags on the superblock.

Unfortunately mount_too_revealing fundamentally needs to touch
mnt_flags adding several MNT_LOCKED_XXX flags at the appropriate
times.  If the mnt_flags did not need to be touched the code
could be easily moved into the filesystem specific mount code.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/namespace.c     | 38 +++++++++++++++++++++++++-------------
 fs/proc/inode.c    |  1 +
 fs/proc/root.c     |  2 +-
 fs/sysfs/mount.c   |  4 ++--
 include/linux/fs.h |  4 +++-
 5 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 783004af5707..1a69aa786975 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2375,7 +2375,7 @@ unlock:
 	return err;
 }
 
-static bool fs_fully_visible(struct file_system_type *fs_type, int *new_mnt_flags);
+static bool mount_too_revealing(struct vfsmount *mnt, int *new_mnt_flags);
 
 /*
  * create a new mount for userspace and request it to be added into the
@@ -2408,12 +2408,6 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 			flags |= MS_NODEV;
 			mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
 		}
-		if (type->fs_flags & FS_USERNS_VISIBLE) {
-			if (!fs_fully_visible(type, &mnt_flags)) {
-				put_filesystem(type);
-				return -EPERM;
-			}
-		}
 	}
 
 	mnt = vfs_kern_mount(type, flags, name, data);
@@ -2425,6 +2419,11 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 	if (IS_ERR(mnt))
 		return PTR_ERR(mnt);
 
+	if (mount_too_revealing(mnt, &mnt_flags)) {
+		mntput(mnt);
+		return -EPERM;
+	}
+
 	err = do_add_mount(real_mount(mnt), path, mnt_flags);
 	if (err)
 		mntput(mnt);
@@ -3216,22 +3215,19 @@ bool current_chrooted(void)
 	return chrooted;
 }
 
-static bool fs_fully_visible(struct file_system_type *type, int *new_mnt_flags)
+static bool mnt_already_visible(struct mnt_namespace *ns, struct vfsmount *new,
+				int *new_mnt_flags)
 {
-	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
 	int new_flags = *new_mnt_flags;
 	struct mount *mnt;
 	bool visible = false;
 
-	if (unlikely(!ns))
-		return false;
-
 	down_read(&namespace_sem);
 	list_for_each_entry(mnt, &ns->list, mnt_list) {
 		struct mount *child;
 		int mnt_flags;
 
-		if (mnt->mnt.mnt_sb->s_type != type)
+		if (mnt->mnt.mnt_sb->s_type != new->mnt_sb->s_type)
 			continue;
 
 		/* This mount is not fully visible if it's root directory
@@ -3298,6 +3294,22 @@ found:
 	return visible;
 }
 
+static bool mount_too_revealing(struct vfsmount *mnt, int *new_mnt_flags)
+{
+	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
+	unsigned long s_iflags;
+
+	if (ns->user_ns == &init_user_ns)
+		return false;
+
+	/* Can this filesystem be too revealing? */
+	s_iflags = mnt->mnt_sb->s_iflags;
+	if (!(s_iflags & SB_I_USERNS_VISIBLE))
+		return false;
+
+	return !mnt_already_visible(ns, mnt, new_mnt_flags);
+}
+
 static struct ns_common *mntns_get(struct task_struct *task)
 {
 	struct ns_common *ns = NULL;
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 42305ddcbaa0..78fa452d65ed 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -462,6 +462,7 @@ int proc_fill_super(struct super_block *s)
 	struct inode *root_inode;
 	int ret;
 
+	s->s_iflags |= SB_I_USERNS_VISIBLE;
 	s->s_flags |= MS_NODIRATIME | MS_NOSUID | MS_NOEXEC;
 	s->s_blocksize = 1024;
 	s->s_blocksize_bits = 10;
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 55bc7d6c8aac..a1b2860fec62 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -158,7 +158,7 @@ static struct file_system_type proc_fs_type = {
 	.name		= "proc",
 	.mount		= proc_mount,
 	.kill_sb	= proc_kill_sb,
-	.fs_flags	= FS_USERNS_VISIBLE | FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 void __init proc_root_init(void)
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index f3db82071cfb..f31e36994dfb 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -42,7 +42,7 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
 		kobj_ns_drop(KOBJ_NS_TYPE_NET, ns);
 	else if (new_sb)
 		/* Userspace would break if executables appear on sysfs */
-		root->d_sb->s_iflags |= SB_I_NOEXEC;
+		root->d_sb->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC;
 
 	return root;
 }
@@ -59,7 +59,7 @@ static struct file_system_type sysfs_fs_type = {
 	.name		= "sysfs",
 	.mount		= sysfs_mount,
 	.kill_sb	= sysfs_kill_sb,
-	.fs_flags	= FS_USERNS_VISIBLE | FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 int __init sysfs_init(void)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dd288148a6b1..71988dd3af95 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1328,6 +1328,9 @@ struct mm_struct;
 #define SB_I_CGROUPWB	0x00000001	/* cgroup-aware writeback enabled */
 #define SB_I_NOEXEC	0x00000002	/* Ignore executables on this fs */
 
+/* sb->s_iflags to limit user namespace mounts */
+#define SB_I_USERNS_VISIBLE		0x00000010 /* fstype already mounted */
+
 /* Possible states of 'frozen' field */
 enum {
 	SB_UNFROZEN = 0,		/* FS is unfrozen */
@@ -2011,7 +2014,6 @@ struct file_system_type {
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_USERNS_DEV_MOUNT	16 /* A userns mount does not imply MNT_NODEV */
-#define FS_USERNS_VISIBLE	32	/* FS must already be visible */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	struct dentry *(*mount) (struct file_system_type *, int,
 		       const char *, void *);
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 02/13] mnt: Refactor fs_fully_visible into mount_too_revealing
@ 2016-06-20 17:21         ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Miklos Szeredi, James Bottomley, Djalal Harouni,
	Seth Forshee, Serge E. Hallyn, Andy Lutomirski

Replace the call of fs_fully_visible in do_new_mount from before the
new superblock is allocated with a call of mount_too_revealing after
the superblock is allocated.   This winds up being a much better location
for maintainability of the code.

The first change this enables is the replacement of FS_USERNS_VISIBLE
with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
to sb_iflags on the superblock.

Unfortunately mount_too_revealing fundamentally needs to touch
mnt_flags adding several MNT_LOCKED_XXX flags at the appropriate
times.  If the mnt_flags did not need to be touched the code
could be easily moved into the filesystem specific mount code.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/namespace.c     | 38 +++++++++++++++++++++++++-------------
 fs/proc/inode.c    |  1 +
 fs/proc/root.c     |  2 +-
 fs/sysfs/mount.c   |  4 ++--
 include/linux/fs.h |  4 +++-
 5 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 783004af5707..1a69aa786975 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2375,7 +2375,7 @@ unlock:
 	return err;
 }
 
-static bool fs_fully_visible(struct file_system_type *fs_type, int *new_mnt_flags);
+static bool mount_too_revealing(struct vfsmount *mnt, int *new_mnt_flags);
 
 /*
  * create a new mount for userspace and request it to be added into the
@@ -2408,12 +2408,6 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 			flags |= MS_NODEV;
 			mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
 		}
-		if (type->fs_flags & FS_USERNS_VISIBLE) {
-			if (!fs_fully_visible(type, &mnt_flags)) {
-				put_filesystem(type);
-				return -EPERM;
-			}
-		}
 	}
 
 	mnt = vfs_kern_mount(type, flags, name, data);
@@ -2425,6 +2419,11 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 	if (IS_ERR(mnt))
 		return PTR_ERR(mnt);
 
+	if (mount_too_revealing(mnt, &mnt_flags)) {
+		mntput(mnt);
+		return -EPERM;
+	}
+
 	err = do_add_mount(real_mount(mnt), path, mnt_flags);
 	if (err)
 		mntput(mnt);
@@ -3216,22 +3215,19 @@ bool current_chrooted(void)
 	return chrooted;
 }
 
-static bool fs_fully_visible(struct file_system_type *type, int *new_mnt_flags)
+static bool mnt_already_visible(struct mnt_namespace *ns, struct vfsmount *new,
+				int *new_mnt_flags)
 {
-	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
 	int new_flags = *new_mnt_flags;
 	struct mount *mnt;
 	bool visible = false;
 
-	if (unlikely(!ns))
-		return false;
-
 	down_read(&namespace_sem);
 	list_for_each_entry(mnt, &ns->list, mnt_list) {
 		struct mount *child;
 		int mnt_flags;
 
-		if (mnt->mnt.mnt_sb->s_type != type)
+		if (mnt->mnt.mnt_sb->s_type != new->mnt_sb->s_type)
 			continue;
 
 		/* This mount is not fully visible if it's root directory
@@ -3298,6 +3294,22 @@ found:
 	return visible;
 }
 
+static bool mount_too_revealing(struct vfsmount *mnt, int *new_mnt_flags)
+{
+	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
+	unsigned long s_iflags;
+
+	if (ns->user_ns == &init_user_ns)
+		return false;
+
+	/* Can this filesystem be too revealing? */
+	s_iflags = mnt->mnt_sb->s_iflags;
+	if (!(s_iflags & SB_I_USERNS_VISIBLE))
+		return false;
+
+	return !mnt_already_visible(ns, mnt, new_mnt_flags);
+}
+
 static struct ns_common *mntns_get(struct task_struct *task)
 {
 	struct ns_common *ns = NULL;
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 42305ddcbaa0..78fa452d65ed 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -462,6 +462,7 @@ int proc_fill_super(struct super_block *s)
 	struct inode *root_inode;
 	int ret;
 
+	s->s_iflags |= SB_I_USERNS_VISIBLE;
 	s->s_flags |= MS_NODIRATIME | MS_NOSUID | MS_NOEXEC;
 	s->s_blocksize = 1024;
 	s->s_blocksize_bits = 10;
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 55bc7d6c8aac..a1b2860fec62 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -158,7 +158,7 @@ static struct file_system_type proc_fs_type = {
 	.name		= "proc",
 	.mount		= proc_mount,
 	.kill_sb	= proc_kill_sb,
-	.fs_flags	= FS_USERNS_VISIBLE | FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 void __init proc_root_init(void)
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index f3db82071cfb..f31e36994dfb 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -42,7 +42,7 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
 		kobj_ns_drop(KOBJ_NS_TYPE_NET, ns);
 	else if (new_sb)
 		/* Userspace would break if executables appear on sysfs */
-		root->d_sb->s_iflags |= SB_I_NOEXEC;
+		root->d_sb->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC;
 
 	return root;
 }
@@ -59,7 +59,7 @@ static struct file_system_type sysfs_fs_type = {
 	.name		= "sysfs",
 	.mount		= sysfs_mount,
 	.kill_sb	= sysfs_kill_sb,
-	.fs_flags	= FS_USERNS_VISIBLE | FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 int __init sysfs_init(void)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dd288148a6b1..71988dd3af95 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1328,6 +1328,9 @@ struct mm_struct;
 #define SB_I_CGROUPWB	0x00000001	/* cgroup-aware writeback enabled */
 #define SB_I_NOEXEC	0x00000002	/* Ignore executables on this fs */
 
+/* sb->s_iflags to limit user namespace mounts */
+#define SB_I_USERNS_VISIBLE		0x00000010 /* fstype already mounted */
+
 /* Possible states of 'frozen' field */
 enum {
 	SB_UNFROZEN = 0,		/* FS is unfrozen */
@@ -2011,7 +2014,6 @@ struct file_system_type {
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_USERNS_DEV_MOUNT	16 /* A userns mount does not imply MNT_NODEV */
-#define FS_USERNS_VISIBLE	32	/* FS must already be visible */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	struct dentry *(*mount) (struct file_system_type *, int,
 		       const char *, void *);
-- 
2.8.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 03/13] ipc: Initialize ipc_namespace->user_ns early.
  2016-06-20 17:21     ` Eric W. Biederman
@ 2016-06-20 17:21         ` Eric W. Biederman
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: Miklos Szeredi, Andy Lutomirski, James Bottomley, Seth Forshee,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni

Allow the ipc namespace initialization code to depend on ns->user_ns
being set during initialization.

In particular this allows mq_init_ns to use ns->user_ns for permission
checks and initializating s_user_ns while the the mq filesystem is
being mounted.

Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Suggested-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 ipc/namespace.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/ipc/namespace.c b/ipc/namespace.c
index 068caf18d565..04cb07eb81f1 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -34,8 +34,11 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	ns->ns.ops = &ipcns_operations;
 
 	atomic_set(&ns->count, 1);
+	ns->user_ns = get_user_ns(user_ns);
+
 	err = mq_init_ns(ns);
 	if (err) {
+		put_user_ns(ns->user_ns);
 		ns_free_inum(&ns->ns);
 		kfree(ns);
 		return ERR_PTR(err);
@@ -46,8 +49,6 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	msg_init_ns(ns);
 	shm_init_ns(ns);
 
-	ns->user_ns = get_user_ns(user_ns);
-
 	return ns;
 }
 
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 03/13] ipc: Initialize ipc_namespace->user_ns early.
@ 2016-06-20 17:21         ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Miklos Szeredi, James Bottomley, Djalal Harouni,
	Seth Forshee, Serge E. Hallyn, Andy Lutomirski

Allow the ipc namespace initialization code to depend on ns->user_ns
being set during initialization.

In particular this allows mq_init_ns to use ns->user_ns for permission
checks and initializating s_user_ns while the the mq filesystem is
being mounted.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Suggested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 ipc/namespace.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/ipc/namespace.c b/ipc/namespace.c
index 068caf18d565..04cb07eb81f1 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -34,8 +34,11 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	ns->ns.ops = &ipcns_operations;
 
 	atomic_set(&ns->count, 1);
+	ns->user_ns = get_user_ns(user_ns);
+
 	err = mq_init_ns(ns);
 	if (err) {
+		put_user_ns(ns->user_ns);
 		ns_free_inum(&ns->ns);
 		kfree(ns);
 		return ERR_PTR(err);
@@ -46,8 +49,6 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	msg_init_ns(ns);
 	shm_init_ns(ns);
 
-	ns->user_ns = get_user_ns(user_ns);
-
 	return ns;
 }
 
-- 
2.8.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 04/13] vfs: Pass data, ns, and ns->userns to mount_ns
  2016-06-20 17:21     ` Eric W. Biederman
@ 2016-06-20 17:21         ` Eric W. Biederman
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: Miklos Szeredi, Andy Lutomirski, James Bottomley, Seth Forshee,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni

Today what is normally called data (the mount options) is not passed
to fill_super through mount_ns.

Pass the mount options and the namespace separately to mount_ns so
that filesystems such as proc that have mount options, can use
mount_ns.

Pass the user namespace to mount_ns so that the standard permission
check that verifies the mounter has permissions over the namespace can
be performed in mount_ns instead of in each filesystems .mount method.
Thus removing the duplication between mqueuefs and proc in terms of
permission checks.  The extra permission check does not currently
affect the rpc_pipefs filesystem and the nfsd filesystem as those
filesystems do not currently allow unprivileged mounts.  Without
unpvileged mounts it is guaranteed that the caller has already passed
capable(CAP_SYS_ADMIN) which guarantees extra permission check will
pass.

Update rpc_pipefs and the nfsd filesystem to ensure that the network
namespace reference is always taken in fill_super and always put in kill_sb
so that the logic is simpler and so that errors originating inside of
fill_super do not cause a network namespace leak.

Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/nfsd/nfsctl.c      | 13 ++++---------
 fs/super.c            | 13 ++++++++++---
 include/linux/fs.h    |  5 +++--
 ipc/mqueue.c          | 19 ++++++++-----------
 net/sunrpc/rpc_pipe.c |  8 ++++----
 5 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 9690cb4dd588..5a6ae2522266 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1154,20 +1154,15 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
 #endif
 		/* last one */ {""}
 	};
-	struct net *net = data;
-	int ret;
-
-	ret = simple_fill_super(sb, 0x6e667364, nfsd_files);
-	if (ret)
-		return ret;
-	sb->s_fs_info = get_net(net);
-	return 0;
+	get_net(sb->s_fs_info);
+	return simple_fill_super(sb, 0x6e667364, nfsd_files);
 }
 
 static struct dentry *nfsd_mount(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data)
 {
-	return mount_ns(fs_type, flags, current->nsproxy->net_ns, nfsd_fill_super);
+	struct net *net = current->nsproxy->net_ns;
+	return mount_ns(fs_type, flags, data, net, net->user_ns, nfsd_fill_super);
 }
 
 static void nfsd_umount(struct super_block *sb)
diff --git a/fs/super.c b/fs/super.c
index d78b9847e6cb..fd65667832e5 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -918,12 +918,19 @@ static int ns_set_super(struct super_block *sb, void *data)
 	return set_anon_super(sb, NULL);
 }
 
-struct dentry *mount_ns(struct file_system_type *fs_type, int flags,
-	void *data, int (*fill_super)(struct super_block *, void *, int))
+struct dentry *mount_ns(struct file_system_type *fs_type,
+	int flags, void *data, void *ns, struct user_namespace *user_ns,
+	int (*fill_super)(struct super_block *, void *, int))
 {
 	struct super_block *sb;
 
-	sb = sget(fs_type, ns_test_super, ns_set_super, flags, data);
+	/* Don't allow mounting unless the caller has CAP_SYS_ADMIN
+	 * over the namespace.
+	 */
+	if (!(flags & MS_KERNMOUNT) && !ns_capable(user_ns, CAP_SYS_ADMIN))
+		return ERR_PTR(-EPERM);
+
+	sb = sget(fs_type, ns_test_super, ns_set_super, flags, ns);
 	if (IS_ERR(sb))
 		return ERR_CAST(sb);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 71988dd3af95..1ce006a24f49 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2034,8 +2034,9 @@ struct file_system_type {
 
 #define MODULE_ALIAS_FS(NAME) MODULE_ALIAS("fs-" NAME)
 
-extern struct dentry *mount_ns(struct file_system_type *fs_type, int flags,
-	void *data, int (*fill_super)(struct super_block *, void *, int));
+extern struct dentry *mount_ns(struct file_system_type *fs_type,
+	int flags, void *data, void *ns, struct user_namespace *user_ns,
+	int (*fill_super)(struct super_block *, void *, int));
 extern struct dentry *mount_bdev(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data,
 	int (*fill_super)(struct super_block *, void *, int));
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index ade739f67f1d..60d97082f4dc 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -305,7 +305,7 @@ err:
 static int mqueue_fill_super(struct super_block *sb, void *data, int silent)
 {
 	struct inode *inode;
-	struct ipc_namespace *ns = data;
+	struct ipc_namespace *ns = sb->s_fs_info;
 
 	sb->s_blocksize = PAGE_SIZE;
 	sb->s_blocksize_bits = PAGE_SHIFT;
@@ -326,17 +326,14 @@ static struct dentry *mqueue_mount(struct file_system_type *fs_type,
 			 int flags, const char *dev_name,
 			 void *data)
 {
-	if (!(flags & MS_KERNMOUNT)) {
-		struct ipc_namespace *ns = current->nsproxy->ipc_ns;
-		/* Don't allow mounting unless the caller has CAP_SYS_ADMIN
-		 * over the ipc namespace.
-		 */
-		if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
-			return ERR_PTR(-EPERM);
-
-		data = ns;
+	struct ipc_namespace *ns;
+	if (flags & MS_KERNMOUNT) {
+		ns = data;
+		data = NULL;
+	} else {
+		ns = current->nsproxy->ipc_ns;
 	}
-	return mount_ns(fs_type, flags, data, mqueue_fill_super);
+	return mount_ns(fs_type, flags, data, ns, ns->user_ns, mqueue_fill_super);
 }
 
 static void init_once(void *foo)
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index fc48eca21fd2..84f98cbe31c3 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -1386,7 +1386,7 @@ rpc_fill_super(struct super_block *sb, void *data, int silent)
 {
 	struct inode *inode;
 	struct dentry *root, *gssd_dentry;
-	struct net *net = data;
+	struct net *net = get_net(sb->s_fs_info);
 	struct sunrpc_net *sn = net_generic(net, sunrpc_net_id);
 	int err;
 
@@ -1419,7 +1419,6 @@ rpc_fill_super(struct super_block *sb, void *data, int silent)
 					   sb);
 	if (err)
 		goto err_depopulate;
-	sb->s_fs_info = get_net(net);
 	mutex_unlock(&sn->pipefs_sb_lock);
 	return 0;
 
@@ -1448,7 +1447,8 @@ static struct dentry *
 rpc_mount(struct file_system_type *fs_type,
 		int flags, const char *dev_name, void *data)
 {
-	return mount_ns(fs_type, flags, current->nsproxy->net_ns, rpc_fill_super);
+	struct net *net = current->nsproxy->net_ns;
+	return mount_ns(fs_type, flags, data, net, net->user_ns, rpc_fill_super);
 }
 
 static void rpc_kill_sb(struct super_block *sb)
@@ -1468,9 +1468,9 @@ static void rpc_kill_sb(struct super_block *sb)
 					   RPC_PIPEFS_UMOUNT,
 					   sb);
 	mutex_unlock(&sn->pipefs_sb_lock);
-	put_net(net);
 out:
 	kill_litter_super(sb);
+	put_net(net);
 }
 
 static struct file_system_type rpc_pipe_fs_type = {
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 04/13] vfs: Pass data, ns, and ns->userns to mount_ns
@ 2016-06-20 17:21         ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Miklos Szeredi, James Bottomley, Djalal Harouni,
	Seth Forshee, Serge E. Hallyn, Andy Lutomirski

Today what is normally called data (the mount options) is not passed
to fill_super through mount_ns.

Pass the mount options and the namespace separately to mount_ns so
that filesystems such as proc that have mount options, can use
mount_ns.

Pass the user namespace to mount_ns so that the standard permission
check that verifies the mounter has permissions over the namespace can
be performed in mount_ns instead of in each filesystems .mount method.
Thus removing the duplication between mqueuefs and proc in terms of
permission checks.  The extra permission check does not currently
affect the rpc_pipefs filesystem and the nfsd filesystem as those
filesystems do not currently allow unprivileged mounts.  Without
unpvileged mounts it is guaranteed that the caller has already passed
capable(CAP_SYS_ADMIN) which guarantees extra permission check will
pass.

Update rpc_pipefs and the nfsd filesystem to ensure that the network
namespace reference is always taken in fill_super and always put in kill_sb
so that the logic is simpler and so that errors originating inside of
fill_super do not cause a network namespace leak.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/nfsd/nfsctl.c      | 13 ++++---------
 fs/super.c            | 13 ++++++++++---
 include/linux/fs.h    |  5 +++--
 ipc/mqueue.c          | 19 ++++++++-----------
 net/sunrpc/rpc_pipe.c |  8 ++++----
 5 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 9690cb4dd588..5a6ae2522266 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1154,20 +1154,15 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
 #endif
 		/* last one */ {""}
 	};
-	struct net *net = data;
-	int ret;
-
-	ret = simple_fill_super(sb, 0x6e667364, nfsd_files);
-	if (ret)
-		return ret;
-	sb->s_fs_info = get_net(net);
-	return 0;
+	get_net(sb->s_fs_info);
+	return simple_fill_super(sb, 0x6e667364, nfsd_files);
 }
 
 static struct dentry *nfsd_mount(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data)
 {
-	return mount_ns(fs_type, flags, current->nsproxy->net_ns, nfsd_fill_super);
+	struct net *net = current->nsproxy->net_ns;
+	return mount_ns(fs_type, flags, data, net, net->user_ns, nfsd_fill_super);
 }
 
 static void nfsd_umount(struct super_block *sb)
diff --git a/fs/super.c b/fs/super.c
index d78b9847e6cb..fd65667832e5 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -918,12 +918,19 @@ static int ns_set_super(struct super_block *sb, void *data)
 	return set_anon_super(sb, NULL);
 }
 
-struct dentry *mount_ns(struct file_system_type *fs_type, int flags,
-	void *data, int (*fill_super)(struct super_block *, void *, int))
+struct dentry *mount_ns(struct file_system_type *fs_type,
+	int flags, void *data, void *ns, struct user_namespace *user_ns,
+	int (*fill_super)(struct super_block *, void *, int))
 {
 	struct super_block *sb;
 
-	sb = sget(fs_type, ns_test_super, ns_set_super, flags, data);
+	/* Don't allow mounting unless the caller has CAP_SYS_ADMIN
+	 * over the namespace.
+	 */
+	if (!(flags & MS_KERNMOUNT) && !ns_capable(user_ns, CAP_SYS_ADMIN))
+		return ERR_PTR(-EPERM);
+
+	sb = sget(fs_type, ns_test_super, ns_set_super, flags, ns);
 	if (IS_ERR(sb))
 		return ERR_CAST(sb);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 71988dd3af95..1ce006a24f49 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2034,8 +2034,9 @@ struct file_system_type {
 
 #define MODULE_ALIAS_FS(NAME) MODULE_ALIAS("fs-" NAME)
 
-extern struct dentry *mount_ns(struct file_system_type *fs_type, int flags,
-	void *data, int (*fill_super)(struct super_block *, void *, int));
+extern struct dentry *mount_ns(struct file_system_type *fs_type,
+	int flags, void *data, void *ns, struct user_namespace *user_ns,
+	int (*fill_super)(struct super_block *, void *, int));
 extern struct dentry *mount_bdev(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data,
 	int (*fill_super)(struct super_block *, void *, int));
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index ade739f67f1d..60d97082f4dc 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -305,7 +305,7 @@ err:
 static int mqueue_fill_super(struct super_block *sb, void *data, int silent)
 {
 	struct inode *inode;
-	struct ipc_namespace *ns = data;
+	struct ipc_namespace *ns = sb->s_fs_info;
 
 	sb->s_blocksize = PAGE_SIZE;
 	sb->s_blocksize_bits = PAGE_SHIFT;
@@ -326,17 +326,14 @@ static struct dentry *mqueue_mount(struct file_system_type *fs_type,
 			 int flags, const char *dev_name,
 			 void *data)
 {
-	if (!(flags & MS_KERNMOUNT)) {
-		struct ipc_namespace *ns = current->nsproxy->ipc_ns;
-		/* Don't allow mounting unless the caller has CAP_SYS_ADMIN
-		 * over the ipc namespace.
-		 */
-		if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
-			return ERR_PTR(-EPERM);
-
-		data = ns;
+	struct ipc_namespace *ns;
+	if (flags & MS_KERNMOUNT) {
+		ns = data;
+		data = NULL;
+	} else {
+		ns = current->nsproxy->ipc_ns;
 	}
-	return mount_ns(fs_type, flags, data, mqueue_fill_super);
+	return mount_ns(fs_type, flags, data, ns, ns->user_ns, mqueue_fill_super);
 }
 
 static void init_once(void *foo)
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index fc48eca21fd2..84f98cbe31c3 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -1386,7 +1386,7 @@ rpc_fill_super(struct super_block *sb, void *data, int silent)
 {
 	struct inode *inode;
 	struct dentry *root, *gssd_dentry;
-	struct net *net = data;
+	struct net *net = get_net(sb->s_fs_info);
 	struct sunrpc_net *sn = net_generic(net, sunrpc_net_id);
 	int err;
 
@@ -1419,7 +1419,6 @@ rpc_fill_super(struct super_block *sb, void *data, int silent)
 					   sb);
 	if (err)
 		goto err_depopulate;
-	sb->s_fs_info = get_net(net);
 	mutex_unlock(&sn->pipefs_sb_lock);
 	return 0;
 
@@ -1448,7 +1447,8 @@ static struct dentry *
 rpc_mount(struct file_system_type *fs_type,
 		int flags, const char *dev_name, void *data)
 {
-	return mount_ns(fs_type, flags, current->nsproxy->net_ns, rpc_fill_super);
+	struct net *net = current->nsproxy->net_ns;
+	return mount_ns(fs_type, flags, data, net, net->user_ns, rpc_fill_super);
 }
 
 static void rpc_kill_sb(struct super_block *sb)
@@ -1468,9 +1468,9 @@ static void rpc_kill_sb(struct super_block *sb)
 					   RPC_PIPEFS_UMOUNT,
 					   sb);
 	mutex_unlock(&sn->pipefs_sb_lock);
-	put_net(net);
 out:
 	kill_litter_super(sb);
+	put_net(net);
 }
 
 static struct file_system_type rpc_pipe_fs_type = {
-- 
2.8.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 05/13] proc: Convert proc_mount to use mount_ns.
  2016-06-20 17:21     ` Eric W. Biederman
@ 2016-06-20 17:21         ` Eric W. Biederman
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: Miklos Szeredi, Andy Lutomirski, James Bottomley, Seth Forshee,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni

Move the call of get_pid_ns, the call of proc_parse_options, and
the setting of s_iflags into proc_fill_super so that mount_ns
can be used.

Convert proc_mount to call mount_ns and remove the now unnecessary
code.

Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/proc/inode.c    |  9 +++++++--
 fs/proc/internal.h |  3 ++-
 fs/proc/root.c     | 52 ++++------------------------------------------------
 3 files changed, 13 insertions(+), 51 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 78fa452d65ed..f4817efb25a6 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -457,12 +457,17 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
 	return inode;
 }
 
-int proc_fill_super(struct super_block *s)
+int proc_fill_super(struct super_block *s, void *data, int silent)
 {
+	struct pid_namespace *ns = get_pid_ns(s->s_fs_info);
 	struct inode *root_inode;
 	int ret;
 
-	s->s_iflags |= SB_I_USERNS_VISIBLE;
+	if (!proc_parse_options(data, ns))
+		return -EINVAL;
+
+	/* User space would break if executables appear on proc */
+	s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC;
 	s->s_flags |= MS_NODIRATIME | MS_NOSUID | MS_NOEXEC;
 	s->s_blocksize = 1024;
 	s->s_blocksize_bits = 10;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index aa2781095bd1..7931c558c192 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -212,7 +212,7 @@ extern const struct inode_operations proc_pid_link_inode_operations;
 
 extern void proc_init_inodecache(void);
 extern struct inode *proc_get_inode(struct super_block *, struct proc_dir_entry *);
-extern int proc_fill_super(struct super_block *);
+extern int proc_fill_super(struct super_block *, void *data, int flags);
 extern void proc_entry_rundown(struct proc_dir_entry *);
 
 /*
@@ -268,6 +268,7 @@ static inline void proc_tty_init(void) {}
  * root.c
  */
 extern struct proc_dir_entry proc_root;
+extern int proc_parse_options(char *options, struct pid_namespace *pid);
 
 extern void proc_self_init(void);
 extern int proc_remount(struct super_block *, int *, char *);
diff --git a/fs/proc/root.c b/fs/proc/root.c
index a1b2860fec62..8d3e484055a6 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -23,21 +23,6 @@
 
 #include "internal.h"
 
-static int proc_test_super(struct super_block *sb, void *data)
-{
-	return sb->s_fs_info == data;
-}
-
-static int proc_set_super(struct super_block *sb, void *data)
-{
-	int err = set_anon_super(sb, NULL);
-	if (!err) {
-		struct pid_namespace *ns = (struct pid_namespace *)data;
-		sb->s_fs_info = get_pid_ns(ns);
-	}
-	return err;
-}
-
 enum {
 	Opt_gid, Opt_hidepid, Opt_err,
 };
@@ -48,7 +33,7 @@ static const match_table_t tokens = {
 	{Opt_err, NULL},
 };
 
-static int proc_parse_options(char *options, struct pid_namespace *pid)
+int proc_parse_options(char *options, struct pid_namespace *pid)
 {
 	char *p;
 	substring_t args[MAX_OPT_ARGS];
@@ -100,45 +85,16 @@ int proc_remount(struct super_block *sb, int *flags, char *data)
 static struct dentry *proc_mount(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data)
 {
-	int err;
-	struct super_block *sb;
 	struct pid_namespace *ns;
-	char *options;
 
 	if (flags & MS_KERNMOUNT) {
-		ns = (struct pid_namespace *)data;
-		options = NULL;
+		ns = data;
+		data = NULL;
 	} else {
 		ns = task_active_pid_ns(current);
-		options = data;
-
-		/* Does the mounter have privilege over the pid namespace? */
-		if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
-			return ERR_PTR(-EPERM);
-	}
-
-	sb = sget(fs_type, proc_test_super, proc_set_super, flags, ns);
-	if (IS_ERR(sb))
-		return ERR_CAST(sb);
-
-	if (!proc_parse_options(options, ns)) {
-		deactivate_locked_super(sb);
-		return ERR_PTR(-EINVAL);
-	}
-
-	if (!sb->s_root) {
-		err = proc_fill_super(sb);
-		if (err) {
-			deactivate_locked_super(sb);
-			return ERR_PTR(err);
-		}
-
-		sb->s_flags |= MS_ACTIVE;
-		/* User space would break if executables appear on proc */
-		sb->s_iflags |= SB_I_NOEXEC;
 	}
 
-	return dget(sb->s_root);
+	return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super);
 }
 
 static void proc_kill_sb(struct super_block *sb)
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 05/13] proc: Convert proc_mount to use mount_ns.
@ 2016-06-20 17:21         ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Miklos Szeredi, James Bottomley, Djalal Harouni,
	Seth Forshee, Serge E. Hallyn, Andy Lutomirski

Move the call of get_pid_ns, the call of proc_parse_options, and
the setting of s_iflags into proc_fill_super so that mount_ns
can be used.

Convert proc_mount to call mount_ns and remove the now unnecessary
code.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/proc/inode.c    |  9 +++++++--
 fs/proc/internal.h |  3 ++-
 fs/proc/root.c     | 52 ++++------------------------------------------------
 3 files changed, 13 insertions(+), 51 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 78fa452d65ed..f4817efb25a6 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -457,12 +457,17 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
 	return inode;
 }
 
-int proc_fill_super(struct super_block *s)
+int proc_fill_super(struct super_block *s, void *data, int silent)
 {
+	struct pid_namespace *ns = get_pid_ns(s->s_fs_info);
 	struct inode *root_inode;
 	int ret;
 
-	s->s_iflags |= SB_I_USERNS_VISIBLE;
+	if (!proc_parse_options(data, ns))
+		return -EINVAL;
+
+	/* User space would break if executables appear on proc */
+	s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC;
 	s->s_flags |= MS_NODIRATIME | MS_NOSUID | MS_NOEXEC;
 	s->s_blocksize = 1024;
 	s->s_blocksize_bits = 10;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index aa2781095bd1..7931c558c192 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -212,7 +212,7 @@ extern const struct inode_operations proc_pid_link_inode_operations;
 
 extern void proc_init_inodecache(void);
 extern struct inode *proc_get_inode(struct super_block *, struct proc_dir_entry *);
-extern int proc_fill_super(struct super_block *);
+extern int proc_fill_super(struct super_block *, void *data, int flags);
 extern void proc_entry_rundown(struct proc_dir_entry *);
 
 /*
@@ -268,6 +268,7 @@ static inline void proc_tty_init(void) {}
  * root.c
  */
 extern struct proc_dir_entry proc_root;
+extern int proc_parse_options(char *options, struct pid_namespace *pid);
 
 extern void proc_self_init(void);
 extern int proc_remount(struct super_block *, int *, char *);
diff --git a/fs/proc/root.c b/fs/proc/root.c
index a1b2860fec62..8d3e484055a6 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -23,21 +23,6 @@
 
 #include "internal.h"
 
-static int proc_test_super(struct super_block *sb, void *data)
-{
-	return sb->s_fs_info == data;
-}
-
-static int proc_set_super(struct super_block *sb, void *data)
-{
-	int err = set_anon_super(sb, NULL);
-	if (!err) {
-		struct pid_namespace *ns = (struct pid_namespace *)data;
-		sb->s_fs_info = get_pid_ns(ns);
-	}
-	return err;
-}
-
 enum {
 	Opt_gid, Opt_hidepid, Opt_err,
 };
@@ -48,7 +33,7 @@ static const match_table_t tokens = {
 	{Opt_err, NULL},
 };
 
-static int proc_parse_options(char *options, struct pid_namespace *pid)
+int proc_parse_options(char *options, struct pid_namespace *pid)
 {
 	char *p;
 	substring_t args[MAX_OPT_ARGS];
@@ -100,45 +85,16 @@ int proc_remount(struct super_block *sb, int *flags, char *data)
 static struct dentry *proc_mount(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data)
 {
-	int err;
-	struct super_block *sb;
 	struct pid_namespace *ns;
-	char *options;
 
 	if (flags & MS_KERNMOUNT) {
-		ns = (struct pid_namespace *)data;
-		options = NULL;
+		ns = data;
+		data = NULL;
 	} else {
 		ns = task_active_pid_ns(current);
-		options = data;
-
-		/* Does the mounter have privilege over the pid namespace? */
-		if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
-			return ERR_PTR(-EPERM);
-	}
-
-	sb = sget(fs_type, proc_test_super, proc_set_super, flags, ns);
-	if (IS_ERR(sb))
-		return ERR_CAST(sb);
-
-	if (!proc_parse_options(options, ns)) {
-		deactivate_locked_super(sb);
-		return ERR_PTR(-EINVAL);
-	}
-
-	if (!sb->s_root) {
-		err = proc_fill_super(sb);
-		if (err) {
-			deactivate_locked_super(sb);
-			return ERR_PTR(err);
-		}
-
-		sb->s_flags |= MS_ACTIVE;
-		/* User space would break if executables appear on proc */
-		sb->s_iflags |= SB_I_NOEXEC;
 	}
 
-	return dget(sb->s_root);
+	return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super);
 }
 
 static void proc_kill_sb(struct super_block *sb)
-- 
2.8.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 06/13] fs: Add user namespace member to struct super_block
  2016-06-20 17:21     ` Eric W. Biederman
@ 2016-06-20 17:21         ` Eric W. Biederman
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: Miklos Szeredi, Andy Lutomirski, James Bottomley, Seth Forshee,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni

Start marking filesystems with a user namespace owner, s_user_ns.  In
this change this is only used for permission checks of who may mount a
filesystem.  Ultimately s_user_ns will be used for translating ids and
checking capabilities for filesystems mounted from user namespaces.

The default policy for setting s_user_ns is implemented in sget(),
which arranges for s_user_ns to be set to current_user_ns() and to
ensure that the mounter of the filesystem has CAP_SYS_ADMIN in that
user_ns.

The guts of sget are split out into another function sget_userns().
The function sget_userns calls alloc_super with the specified user
namespace or it verifies the existing superblock that was found
has the expected user namespace, and fails with EBUSY when it is not.
This failing prevents users with the wrong privileges mounting a
filesystem.

The reason for the split of sget_userns from sget is that in some
cases such as mount_ns and kernfs_mount_ns a different policy for
permission checking of mounts and setting s_user_ns is necessary, and
the existence of sget_userns() allows those policies to be
implemented.

The helper mount_ns is expected to be used for filesystems such as
proc and mqueuefs which present per namespace information.  The
function mount_ns is modified to call sget_userns instead of sget to
ensure the user namespace owner of the namespace whose information is
presented by the filesystem is used on the superblock.

For sysfs and cgroup the appropriate permission checks are already in
place, and kernfs_mount_ns is modified to call sget_userns so that
the init_user_ns is the only user namespace used.

For the cgroup filesystem cgroup namespace mounts are bind mounts of a
subset of the full cgroup filesystem and as such s_user_ns must be the
same for all of them as there is only a single superblock.

Mounts of sysfs that vary based on the network namespace could in principle
change s_user_ns but it keeps the analysis and implementation of kernfs
simpler if that is not supported, and at present there appear to be no
benefits from supporting a different s_user_ns on any sysfs mount.

Getting the details of setting s_user_ns correct has been
a long process.  Thanks to Pavel Tikhorirorv who spotted a leak
in sget_userns.  Thanks to Seth Forshee who has kept the work alive.

Thanks-to: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Thanks-to: Pavel Tikhomirov <ptikhomirov-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/kernfs/mount.c  |  3 ++-
 fs/super.c         | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
 include/linux/fs.h | 12 ++++++++++++
 3 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 63534f5f9073..d90d574c15a2 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -241,7 +241,8 @@ struct dentry *kernfs_mount_ns(struct file_system_type *fs_type, int flags,
 	info->root = root;
 	info->ns = ns;
 
-	sb = sget(fs_type, kernfs_test_super, kernfs_set_super, flags, info);
+	sb = sget_userns(fs_type, kernfs_test_super, kernfs_set_super, flags,
+			 &init_user_ns, info);
 	if (IS_ERR(sb) || sb->s_fs_info != info)
 		kfree(info);
 	if (IS_ERR(sb))
diff --git a/fs/super.c b/fs/super.c
index fd65667832e5..874c7e3ebb8f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -33,6 +33,7 @@
 #include <linux/cleancache.h>
 #include <linux/fsnotify.h>
 #include <linux/lockdep.h>
+#include <linux/user_namespace.h>
 #include "internal.h"
 
 
@@ -165,6 +166,7 @@ static void destroy_super(struct super_block *s)
 	list_lru_destroy(&s->s_inode_lru);
 	security_sb_free(s);
 	WARN_ON(!list_empty(&s->s_mounts));
+	put_user_ns(s->s_user_ns);
 	kfree(s->s_subtype);
 	kfree(s->s_options);
 	call_rcu(&s->rcu, destroy_super_rcu);
@@ -174,11 +176,13 @@ static void destroy_super(struct super_block *s)
  *	alloc_super	-	create new superblock
  *	@type:	filesystem type superblock should belong to
  *	@flags: the mount flags
+ *	@user_ns: User namespace for the super_block
  *
  *	Allocates and initializes a new &struct super_block.  alloc_super()
  *	returns a pointer new superblock or %NULL if allocation had failed.
  */
-static struct super_block *alloc_super(struct file_system_type *type, int flags)
+static struct super_block *alloc_super(struct file_system_type *type, int flags,
+				       struct user_namespace *user_ns)
 {
 	struct super_block *s = kzalloc(sizeof(struct super_block),  GFP_USER);
 	static const struct super_operations default_op;
@@ -188,6 +192,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 		return NULL;
 
 	INIT_LIST_HEAD(&s->s_mounts);
+	s->s_user_ns = get_user_ns(user_ns);
 
 	if (security_sb_alloc(s))
 		goto fail;
@@ -443,17 +448,18 @@ void generic_shutdown_super(struct super_block *sb)
 EXPORT_SYMBOL(generic_shutdown_super);
 
 /**
- *	sget	-	find or create a superblock
+ *	sget_userns -	find or create a superblock
  *	@type:	filesystem type superblock should belong to
  *	@test:	comparison callback
  *	@set:	setup callback
  *	@flags:	mount flags
+ *	@user_ns: User namespace for the super_block
  *	@data:	argument to each of them
  */
-struct super_block *sget(struct file_system_type *type,
+struct super_block *sget_userns(struct file_system_type *type,
 			int (*test)(struct super_block *,void *),
 			int (*set)(struct super_block *,void *),
-			int flags,
+			int flags, struct user_namespace *user_ns,
 			void *data)
 {
 	struct super_block *s = NULL;
@@ -466,6 +472,14 @@ retry:
 		hlist_for_each_entry(old, &type->fs_supers, s_instances) {
 			if (!test(old, data))
 				continue;
+			if (user_ns != old->s_user_ns) {
+				spin_unlock(&sb_lock);
+				if (s) {
+					up_write(&s->s_umount);
+					destroy_super(s);
+				}
+				return ERR_PTR(-EBUSY);
+			}
 			if (!grab_super(old))
 				goto retry;
 			if (s) {
@@ -478,7 +492,7 @@ retry:
 	}
 	if (!s) {
 		spin_unlock(&sb_lock);
-		s = alloc_super(type, flags);
+		s = alloc_super(type, flags, user_ns);
 		if (!s)
 			return ERR_PTR(-ENOMEM);
 		goto retry;
@@ -501,6 +515,31 @@ retry:
 	return s;
 }
 
+EXPORT_SYMBOL(sget_userns);
+
+/**
+ *	sget	-	find or create a superblock
+ *	@type:	  filesystem type superblock should belong to
+ *	@test:	  comparison callback
+ *	@set:	  setup callback
+ *	@flags:	  mount flags
+ *	@data:	  argument to each of them
+ */
+struct super_block *sget(struct file_system_type *type,
+			int (*test)(struct super_block *,void *),
+			int (*set)(struct super_block *,void *),
+			int flags,
+			void *data)
+{
+	struct user_namespace *user_ns = current_user_ns();
+
+	/* Ensure the requestor has permissions over the target filesystem */
+	if (!(flags & MS_KERNMOUNT) && !ns_capable(user_ns, CAP_SYS_ADMIN))
+		return ERR_PTR(-EPERM);
+
+	return sget_userns(type, test, set, flags, user_ns, data);
+}
+
 EXPORT_SYMBOL(sget);
 
 void drop_super(struct super_block *sb)
@@ -930,7 +969,8 @@ struct dentry *mount_ns(struct file_system_type *fs_type,
 	if (!(flags & MS_KERNMOUNT) && !ns_capable(user_ns, CAP_SYS_ADMIN))
 		return ERR_PTR(-EPERM);
 
-	sb = sget(fs_type, ns_test_super, ns_set_super, flags, ns);
+	sb = sget_userns(fs_type, ns_test_super, ns_set_super, flags,
+			 user_ns, ns);
 	if (IS_ERR(sb))
 		return ERR_CAST(sb);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1ce006a24f49..9eef64f23a75 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1433,6 +1433,13 @@ struct super_block {
 	struct hlist_head s_pins;
 
 	/*
+	 * Owning user namespace and default context in which to
+	 * interpret filesystem uids, gids, quotas, device nodes,
+	 * xattrs and security labels.
+	 */
+	struct user_namespace *s_user_ns;
+
+	/*
 	 * Keep the lru lists last in the structure so they always sit on their
 	 * own individual cachelines.
 	 */
@@ -2056,6 +2063,11 @@ void deactivate_locked_super(struct super_block *sb);
 int set_anon_super(struct super_block *s, void *data);
 int get_anon_bdev(dev_t *);
 void free_anon_bdev(dev_t);
+struct super_block *sget_userns(struct file_system_type *type,
+			int (*test)(struct super_block *,void *),
+			int (*set)(struct super_block *,void *),
+			int flags, struct user_namespace *user_ns,
+			void *data);
 struct super_block *sget(struct file_system_type *type,
 			int (*test)(struct super_block *,void *),
 			int (*set)(struct super_block *,void *),
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 06/13] fs: Add user namespace member to struct super_block
@ 2016-06-20 17:21         ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Miklos Szeredi, James Bottomley, Djalal Harouni,
	Seth Forshee, Serge E. Hallyn, Andy Lutomirski

Start marking filesystems with a user namespace owner, s_user_ns.  In
this change this is only used for permission checks of who may mount a
filesystem.  Ultimately s_user_ns will be used for translating ids and
checking capabilities for filesystems mounted from user namespaces.

The default policy for setting s_user_ns is implemented in sget(),
which arranges for s_user_ns to be set to current_user_ns() and to
ensure that the mounter of the filesystem has CAP_SYS_ADMIN in that
user_ns.

The guts of sget are split out into another function sget_userns().
The function sget_userns calls alloc_super with the specified user
namespace or it verifies the existing superblock that was found
has the expected user namespace, and fails with EBUSY when it is not.
This failing prevents users with the wrong privileges mounting a
filesystem.

The reason for the split of sget_userns from sget is that in some
cases such as mount_ns and kernfs_mount_ns a different policy for
permission checking of mounts and setting s_user_ns is necessary, and
the existence of sget_userns() allows those policies to be
implemented.

The helper mount_ns is expected to be used for filesystems such as
proc and mqueuefs which present per namespace information.  The
function mount_ns is modified to call sget_userns instead of sget to
ensure the user namespace owner of the namespace whose information is
presented by the filesystem is used on the superblock.

For sysfs and cgroup the appropriate permission checks are already in
place, and kernfs_mount_ns is modified to call sget_userns so that
the init_user_ns is the only user namespace used.

For the cgroup filesystem cgroup namespace mounts are bind mounts of a
subset of the full cgroup filesystem and as such s_user_ns must be the
same for all of them as there is only a single superblock.

Mounts of sysfs that vary based on the network namespace could in principle
change s_user_ns but it keeps the analysis and implementation of kernfs
simpler if that is not supported, and at present there appear to be no
benefits from supporting a different s_user_ns on any sysfs mount.

Getting the details of setting s_user_ns correct has been
a long process.  Thanks to Pavel Tikhorirorv who spotted a leak
in sget_userns.  Thanks to Seth Forshee who has kept the work alive.

Thanks-to: Seth Forshee <seth.forshee@canonical.com>
Thanks-to: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/kernfs/mount.c  |  3 ++-
 fs/super.c         | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
 include/linux/fs.h | 12 ++++++++++++
 3 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 63534f5f9073..d90d574c15a2 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -241,7 +241,8 @@ struct dentry *kernfs_mount_ns(struct file_system_type *fs_type, int flags,
 	info->root = root;
 	info->ns = ns;
 
-	sb = sget(fs_type, kernfs_test_super, kernfs_set_super, flags, info);
+	sb = sget_userns(fs_type, kernfs_test_super, kernfs_set_super, flags,
+			 &init_user_ns, info);
 	if (IS_ERR(sb) || sb->s_fs_info != info)
 		kfree(info);
 	if (IS_ERR(sb))
diff --git a/fs/super.c b/fs/super.c
index fd65667832e5..874c7e3ebb8f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -33,6 +33,7 @@
 #include <linux/cleancache.h>
 #include <linux/fsnotify.h>
 #include <linux/lockdep.h>
+#include <linux/user_namespace.h>
 #include "internal.h"
 
 
@@ -165,6 +166,7 @@ static void destroy_super(struct super_block *s)
 	list_lru_destroy(&s->s_inode_lru);
 	security_sb_free(s);
 	WARN_ON(!list_empty(&s->s_mounts));
+	put_user_ns(s->s_user_ns);
 	kfree(s->s_subtype);
 	kfree(s->s_options);
 	call_rcu(&s->rcu, destroy_super_rcu);
@@ -174,11 +176,13 @@ static void destroy_super(struct super_block *s)
  *	alloc_super	-	create new superblock
  *	@type:	filesystem type superblock should belong to
  *	@flags: the mount flags
+ *	@user_ns: User namespace for the super_block
  *
  *	Allocates and initializes a new &struct super_block.  alloc_super()
  *	returns a pointer new superblock or %NULL if allocation had failed.
  */
-static struct super_block *alloc_super(struct file_system_type *type, int flags)
+static struct super_block *alloc_super(struct file_system_type *type, int flags,
+				       struct user_namespace *user_ns)
 {
 	struct super_block *s = kzalloc(sizeof(struct super_block),  GFP_USER);
 	static const struct super_operations default_op;
@@ -188,6 +192,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 		return NULL;
 
 	INIT_LIST_HEAD(&s->s_mounts);
+	s->s_user_ns = get_user_ns(user_ns);
 
 	if (security_sb_alloc(s))
 		goto fail;
@@ -443,17 +448,18 @@ void generic_shutdown_super(struct super_block *sb)
 EXPORT_SYMBOL(generic_shutdown_super);
 
 /**
- *	sget	-	find or create a superblock
+ *	sget_userns -	find or create a superblock
  *	@type:	filesystem type superblock should belong to
  *	@test:	comparison callback
  *	@set:	setup callback
  *	@flags:	mount flags
+ *	@user_ns: User namespace for the super_block
  *	@data:	argument to each of them
  */
-struct super_block *sget(struct file_system_type *type,
+struct super_block *sget_userns(struct file_system_type *type,
 			int (*test)(struct super_block *,void *),
 			int (*set)(struct super_block *,void *),
-			int flags,
+			int flags, struct user_namespace *user_ns,
 			void *data)
 {
 	struct super_block *s = NULL;
@@ -466,6 +472,14 @@ retry:
 		hlist_for_each_entry(old, &type->fs_supers, s_instances) {
 			if (!test(old, data))
 				continue;
+			if (user_ns != old->s_user_ns) {
+				spin_unlock(&sb_lock);
+				if (s) {
+					up_write(&s->s_umount);
+					destroy_super(s);
+				}
+				return ERR_PTR(-EBUSY);
+			}
 			if (!grab_super(old))
 				goto retry;
 			if (s) {
@@ -478,7 +492,7 @@ retry:
 	}
 	if (!s) {
 		spin_unlock(&sb_lock);
-		s = alloc_super(type, flags);
+		s = alloc_super(type, flags, user_ns);
 		if (!s)
 			return ERR_PTR(-ENOMEM);
 		goto retry;
@@ -501,6 +515,31 @@ retry:
 	return s;
 }
 
+EXPORT_SYMBOL(sget_userns);
+
+/**
+ *	sget	-	find or create a superblock
+ *	@type:	  filesystem type superblock should belong to
+ *	@test:	  comparison callback
+ *	@set:	  setup callback
+ *	@flags:	  mount flags
+ *	@data:	  argument to each of them
+ */
+struct super_block *sget(struct file_system_type *type,
+			int (*test)(struct super_block *,void *),
+			int (*set)(struct super_block *,void *),
+			int flags,
+			void *data)
+{
+	struct user_namespace *user_ns = current_user_ns();
+
+	/* Ensure the requestor has permissions over the target filesystem */
+	if (!(flags & MS_KERNMOUNT) && !ns_capable(user_ns, CAP_SYS_ADMIN))
+		return ERR_PTR(-EPERM);
+
+	return sget_userns(type, test, set, flags, user_ns, data);
+}
+
 EXPORT_SYMBOL(sget);
 
 void drop_super(struct super_block *sb)
@@ -930,7 +969,8 @@ struct dentry *mount_ns(struct file_system_type *fs_type,
 	if (!(flags & MS_KERNMOUNT) && !ns_capable(user_ns, CAP_SYS_ADMIN))
 		return ERR_PTR(-EPERM);
 
-	sb = sget(fs_type, ns_test_super, ns_set_super, flags, ns);
+	sb = sget_userns(fs_type, ns_test_super, ns_set_super, flags,
+			 user_ns, ns);
 	if (IS_ERR(sb))
 		return ERR_CAST(sb);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1ce006a24f49..9eef64f23a75 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1433,6 +1433,13 @@ struct super_block {
 	struct hlist_head s_pins;
 
 	/*
+	 * Owning user namespace and default context in which to
+	 * interpret filesystem uids, gids, quotas, device nodes,
+	 * xattrs and security labels.
+	 */
+	struct user_namespace *s_user_ns;
+
+	/*
 	 * Keep the lru lists last in the structure so they always sit on their
 	 * own individual cachelines.
 	 */
@@ -2056,6 +2063,11 @@ void deactivate_locked_super(struct super_block *sb);
 int set_anon_super(struct super_block *s, void *data);
 int get_anon_bdev(dev_t *);
 void free_anon_bdev(dev_t);
+struct super_block *sget_userns(struct file_system_type *type,
+			int (*test)(struct super_block *,void *),
+			int (*set)(struct super_block *,void *),
+			int flags, struct user_namespace *user_ns,
+			void *data);
 struct super_block *sget(struct file_system_type *type,
 			int (*test)(struct super_block *,void *),
 			int (*set)(struct super_block *,void *),
-- 
2.8.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 07/13] mnt: Move the FS_USERNS_MOUNT check into sget_userns
  2016-06-20 17:21     ` Eric W. Biederman
@ 2016-06-20 17:21         ` Eric W. Biederman
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: Miklos Szeredi, Andy Lutomirski, James Bottomley, Seth Forshee,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni

Allowing a filesystem to be mounted by other than root in the initial
user namespace is a filesystem property not a mount namespace property
and as such should be checked in filesystem specific code.  Move the
FS_USERNS_MOUNT test into super.c:sget_userns().

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/namespace.c | 4 ----
 fs/super.c     | 4 ++++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 1a69aa786975..2e13f6cfe5df 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2397,10 +2397,6 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 		return -ENODEV;
 
 	if (user_ns != &init_user_ns) {
-		if (!(type->fs_flags & FS_USERNS_MOUNT)) {
-			put_filesystem(type);
-			return -EPERM;
-		}
 		/* Only in special cases allow devices from mounts
 		 * created outside the initial user namespace.
 		 */
diff --git a/fs/super.c b/fs/super.c
index 874c7e3ebb8f..78790ada7191 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -466,6 +466,10 @@ struct super_block *sget_userns(struct file_system_type *type,
 	struct super_block *old;
 	int err;
 
+	if (!(flags & MS_KERNMOUNT) &&
+	    !(type->fs_flags & FS_USERNS_MOUNT) &&
+	    !capable(CAP_SYS_ADMIN))
+		return ERR_PTR(-EPERM);
 retry:
 	spin_lock(&sb_lock);
 	if (test) {
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 07/13] mnt: Move the FS_USERNS_MOUNT check into sget_userns
@ 2016-06-20 17:21         ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Miklos Szeredi, James Bottomley, Djalal Harouni,
	Seth Forshee, Serge E. Hallyn, Andy Lutomirski

Allowing a filesystem to be mounted by other than root in the initial
user namespace is a filesystem property not a mount namespace property
and as such should be checked in filesystem specific code.  Move the
FS_USERNS_MOUNT test into super.c:sget_userns().

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/namespace.c | 4 ----
 fs/super.c     | 4 ++++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 1a69aa786975..2e13f6cfe5df 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2397,10 +2397,6 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 		return -ENODEV;
 
 	if (user_ns != &init_user_ns) {
-		if (!(type->fs_flags & FS_USERNS_MOUNT)) {
-			put_filesystem(type);
-			return -EPERM;
-		}
 		/* Only in special cases allow devices from mounts
 		 * created outside the initial user namespace.
 		 */
diff --git a/fs/super.c b/fs/super.c
index 874c7e3ebb8f..78790ada7191 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -466,6 +466,10 @@ struct super_block *sget_userns(struct file_system_type *type,
 	struct super_block *old;
 	int err;
 
+	if (!(flags & MS_KERNMOUNT) &&
+	    !(type->fs_flags & FS_USERNS_MOUNT) &&
+	    !capable(CAP_SYS_ADMIN))
+		return ERR_PTR(-EPERM);
 retry:
 	spin_lock(&sb_lock);
 	if (test) {
-- 
2.8.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 08/13] kernfs: The cgroup filesystem also benefits from SB_I_NOEXEC
  2016-06-20 17:21     ` Eric W. Biederman
@ 2016-06-20 17:21         ` Eric W. Biederman
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: Miklos Szeredi, Andy Lutomirski, James Bottomley, Seth Forshee,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni

The cgroup filesystem is in the same boat as sysfs.  No one ever
permits executables of any kind on the cgroup filesystem, and there is
no reasonable future case to support executables in the future.

Therefore move the setting of SB_I_NOEXEC which makes the code proof
against future mistakes of accidentally creating executables from
sysfs to kernfs itself.  Making the code simpler and covering the
sysfs, cgroup, and cgroup2 filesystems.

Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/kernfs/mount.c | 2 ++
 fs/sysfs/mount.c  | 3 +--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index d90d574c15a2..1443df670260 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -152,6 +152,8 @@ static int kernfs_fill_super(struct super_block *sb, unsigned long magic)
 	struct dentry *root;
 
 	info->sb = sb;
+	/* Userspace would break if executables appear on sysfs */
+	sb->s_iflags |= SB_I_NOEXEC;
 	sb->s_blocksize = PAGE_SIZE;
 	sb->s_blocksize_bits = PAGE_SHIFT;
 	sb->s_magic = magic;
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index f31e36994dfb..20b8f82e115b 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -41,8 +41,7 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
 	if (IS_ERR(root) || !new_sb)
 		kobj_ns_drop(KOBJ_NS_TYPE_NET, ns);
 	else if (new_sb)
-		/* Userspace would break if executables appear on sysfs */
-		root->d_sb->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC;
+		root->d_sb->s_iflags |= SB_I_USERNS_VISIBLE;
 
 	return root;
 }
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 08/13] kernfs: The cgroup filesystem also benefits from SB_I_NOEXEC
@ 2016-06-20 17:21         ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Miklos Szeredi, James Bottomley, Djalal Harouni,
	Seth Forshee, Serge E. Hallyn, Andy Lutomirski

The cgroup filesystem is in the same boat as sysfs.  No one ever
permits executables of any kind on the cgroup filesystem, and there is
no reasonable future case to support executables in the future.

Therefore move the setting of SB_I_NOEXEC which makes the code proof
against future mistakes of accidentally creating executables from
sysfs to kernfs itself.  Making the code simpler and covering the
sysfs, cgroup, and cgroup2 filesystems.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/kernfs/mount.c | 2 ++
 fs/sysfs/mount.c  | 3 +--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index d90d574c15a2..1443df670260 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -152,6 +152,8 @@ static int kernfs_fill_super(struct super_block *sb, unsigned long magic)
 	struct dentry *root;
 
 	info->sb = sb;
+	/* Userspace would break if executables appear on sysfs */
+	sb->s_iflags |= SB_I_NOEXEC;
 	sb->s_blocksize = PAGE_SIZE;
 	sb->s_blocksize_bits = PAGE_SHIFT;
 	sb->s_magic = magic;
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index f31e36994dfb..20b8f82e115b 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -41,8 +41,7 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
 	if (IS_ERR(root) || !new_sb)
 		kobj_ns_drop(KOBJ_NS_TYPE_NET, ns);
 	else if (new_sb)
-		/* Userspace would break if executables appear on sysfs */
-		root->d_sb->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC;
+		root->d_sb->s_iflags |= SB_I_USERNS_VISIBLE;
 
 	return root;
 }
-- 
2.8.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 09/13] ipc/mqueue: The mqueue filesystem should never contain executables
  2016-06-20 17:21     ` Eric W. Biederman
@ 2016-06-20 17:21         ` Eric W. Biederman
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: Miklos Szeredi, Andy Lutomirski, James Bottomley, Seth Forshee,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni

Set SB_I_NOEXEC on mqueuefs to ensure small implementation mistakes
do not result in executable on mqueuefs by accident.

Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 ipc/mqueue.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 60d97082f4dc..5bdd50de7d05 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -307,6 +307,7 @@ static int mqueue_fill_super(struct super_block *sb, void *data, int silent)
 	struct inode *inode;
 	struct ipc_namespace *ns = sb->s_fs_info;
 
+	sb->s_iflags |= SB_I_NOEXEC;
 	sb->s_blocksize = PAGE_SIZE;
 	sb->s_blocksize_bits = PAGE_SHIFT;
 	sb->s_magic = MQUEUE_MAGIC;
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 09/13] ipc/mqueue: The mqueue filesystem should never contain executables
@ 2016-06-20 17:21         ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Miklos Szeredi, James Bottomley, Djalal Harouni,
	Seth Forshee, Serge E. Hallyn, Andy Lutomirski

Set SB_I_NOEXEC on mqueuefs to ensure small implementation mistakes
do not result in executable on mqueuefs by accident.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 ipc/mqueue.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 60d97082f4dc..5bdd50de7d05 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -307,6 +307,7 @@ static int mqueue_fill_super(struct super_block *sb, void *data, int silent)
 	struct inode *inode;
 	struct ipc_namespace *ns = sb->s_fs_info;
 
+	sb->s_iflags |= SB_I_NOEXEC;
 	sb->s_blocksize = PAGE_SIZE;
 	sb->s_blocksize_bits = PAGE_SHIFT;
 	sb->s_magic = MQUEUE_MAGIC;
-- 
2.8.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 10/13] vfs: Generalize filesystem nodev handling.
  2016-06-20 17:21     ` Eric W. Biederman
@ 2016-06-20 17:21         ` Eric W. Biederman
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: Miklos Szeredi, Andy Lutomirski, James Bottomley, Seth Forshee,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni

Introduce a function may_open_dev that tests MNT_NODEV and a new
superblock flab SB_I_NODEV.  Use this new function in all of the
places where MNT_NODEV was previously tested.

Add the new SB_I_NODEV s_iflag to proc, sysfs, and mqueuefs as those
filesystems should never support device nodes, and a simple superblock
flags makes that very hard to get wrong.  With SB_I_NODEV set if any
device nodes somehow manage to show up on on a filesystem those
device nodes will be unopenable.

Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/block_dev.c     | 2 +-
 fs/kernfs/mount.c  | 4 ++--
 fs/namei.c         | 8 +++++++-
 fs/proc/inode.c    | 4 ++--
 include/linux/fs.h | 2 ++
 ipc/mqueue.c       | 2 +-
 6 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 71ccab1d22c6..30b8d568203a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1857,7 +1857,7 @@ struct block_device *lookup_bdev(const char *pathname)
 	if (!S_ISBLK(inode->i_mode))
 		goto fail;
 	error = -EACCES;
-	if (path.mnt->mnt_flags & MNT_NODEV)
+	if (!may_open_dev(&path))
 		goto fail;
 	error = -ENOMEM;
 	bdev = bd_acquire(inode);
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 1443df670260..b3d73ad52b22 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -152,8 +152,8 @@ static int kernfs_fill_super(struct super_block *sb, unsigned long magic)
 	struct dentry *root;
 
 	info->sb = sb;
-	/* Userspace would break if executables appear on sysfs */
-	sb->s_iflags |= SB_I_NOEXEC;
+	/* Userspace would break if executables or devices appear on sysfs */
+	sb->s_iflags |= SB_I_NOEXEC | SB_I_NODEV;
 	sb->s_blocksize = PAGE_SIZE;
 	sb->s_blocksize_bits = PAGE_SHIFT;
 	sb->s_magic = magic;
diff --git a/fs/namei.c b/fs/namei.c
index 6a82fb7e2127..757a32725d92 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2881,6 +2881,12 @@ int vfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
 }
 EXPORT_SYMBOL(vfs_create);
 
+bool may_open_dev(const struct path *path)
+{
+	return !(path->mnt->mnt_flags & MNT_NODEV) &&
+		!(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
+}
+
 static int may_open(struct path *path, int acc_mode, int flag)
 {
 	struct dentry *dentry = path->dentry;
@@ -2899,7 +2905,7 @@ static int may_open(struct path *path, int acc_mode, int flag)
 		break;
 	case S_IFBLK:
 	case S_IFCHR:
-		if (path->mnt->mnt_flags & MNT_NODEV)
+		if (!may_open_dev(path))
 			return -EACCES;
 		/*FALLTHRU*/
 	case S_IFIFO:
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index f4817efb25a6..a5b2c33745b7 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -466,8 +466,8 @@ int proc_fill_super(struct super_block *s, void *data, int silent)
 	if (!proc_parse_options(data, ns))
 		return -EINVAL;
 
-	/* User space would break if executables appear on proc */
-	s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC;
+	/* User space would break if executables or devices appear on proc */
+	s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV;
 	s->s_flags |= MS_NODIRATIME | MS_NOSUID | MS_NOEXEC;
 	s->s_blocksize = 1024;
 	s->s_blocksize_bits = 10;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9eef64f23a75..e05983170d23 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1327,6 +1327,7 @@ struct mm_struct;
 /* sb->s_iflags */
 #define SB_I_CGROUPWB	0x00000001	/* cgroup-aware writeback enabled */
 #define SB_I_NOEXEC	0x00000002	/* Ignore executables on this fs */
+#define SB_I_NODEV	0x00000004	/* Ignore devices on this fs */
 
 /* sb->s_iflags to limit user namespace mounts */
 #define SB_I_USERNS_VISIBLE		0x00000010 /* fstype already mounted */
@@ -1602,6 +1603,7 @@ extern int vfs_whiteout(struct inode *, struct dentry *);
  */
 extern void inode_init_owner(struct inode *inode, const struct inode *dir,
 			umode_t mode);
+extern bool may_open_dev(const struct path *path);
 /*
  * VFS FS_IOC_FIEMAP helper definitions.
  */
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 5bdd50de7d05..0b13ace266f2 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -307,7 +307,7 @@ static int mqueue_fill_super(struct super_block *sb, void *data, int silent)
 	struct inode *inode;
 	struct ipc_namespace *ns = sb->s_fs_info;
 
-	sb->s_iflags |= SB_I_NOEXEC;
+	sb->s_iflags |= SB_I_NOEXEC | SB_I_NODEV;
 	sb->s_blocksize = PAGE_SIZE;
 	sb->s_blocksize_bits = PAGE_SHIFT;
 	sb->s_magic = MQUEUE_MAGIC;
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 10/13] vfs: Generalize filesystem nodev handling.
@ 2016-06-20 17:21         ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Miklos Szeredi, James Bottomley, Djalal Harouni,
	Seth Forshee, Serge E. Hallyn, Andy Lutomirski

Introduce a function may_open_dev that tests MNT_NODEV and a new
superblock flab SB_I_NODEV.  Use this new function in all of the
places where MNT_NODEV was previously tested.

Add the new SB_I_NODEV s_iflag to proc, sysfs, and mqueuefs as those
filesystems should never support device nodes, and a simple superblock
flags makes that very hard to get wrong.  With SB_I_NODEV set if any
device nodes somehow manage to show up on on a filesystem those
device nodes will be unopenable.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/block_dev.c     | 2 +-
 fs/kernfs/mount.c  | 4 ++--
 fs/namei.c         | 8 +++++++-
 fs/proc/inode.c    | 4 ++--
 include/linux/fs.h | 2 ++
 ipc/mqueue.c       | 2 +-
 6 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 71ccab1d22c6..30b8d568203a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1857,7 +1857,7 @@ struct block_device *lookup_bdev(const char *pathname)
 	if (!S_ISBLK(inode->i_mode))
 		goto fail;
 	error = -EACCES;
-	if (path.mnt->mnt_flags & MNT_NODEV)
+	if (!may_open_dev(&path))
 		goto fail;
 	error = -ENOMEM;
 	bdev = bd_acquire(inode);
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 1443df670260..b3d73ad52b22 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -152,8 +152,8 @@ static int kernfs_fill_super(struct super_block *sb, unsigned long magic)
 	struct dentry *root;
 
 	info->sb = sb;
-	/* Userspace would break if executables appear on sysfs */
-	sb->s_iflags |= SB_I_NOEXEC;
+	/* Userspace would break if executables or devices appear on sysfs */
+	sb->s_iflags |= SB_I_NOEXEC | SB_I_NODEV;
 	sb->s_blocksize = PAGE_SIZE;
 	sb->s_blocksize_bits = PAGE_SHIFT;
 	sb->s_magic = magic;
diff --git a/fs/namei.c b/fs/namei.c
index 6a82fb7e2127..757a32725d92 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2881,6 +2881,12 @@ int vfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
 }
 EXPORT_SYMBOL(vfs_create);
 
+bool may_open_dev(const struct path *path)
+{
+	return !(path->mnt->mnt_flags & MNT_NODEV) &&
+		!(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
+}
+
 static int may_open(struct path *path, int acc_mode, int flag)
 {
 	struct dentry *dentry = path->dentry;
@@ -2899,7 +2905,7 @@ static int may_open(struct path *path, int acc_mode, int flag)
 		break;
 	case S_IFBLK:
 	case S_IFCHR:
-		if (path->mnt->mnt_flags & MNT_NODEV)
+		if (!may_open_dev(path))
 			return -EACCES;
 		/*FALLTHRU*/
 	case S_IFIFO:
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index f4817efb25a6..a5b2c33745b7 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -466,8 +466,8 @@ int proc_fill_super(struct super_block *s, void *data, int silent)
 	if (!proc_parse_options(data, ns))
 		return -EINVAL;
 
-	/* User space would break if executables appear on proc */
-	s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC;
+	/* User space would break if executables or devices appear on proc */
+	s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV;
 	s->s_flags |= MS_NODIRATIME | MS_NOSUID | MS_NOEXEC;
 	s->s_blocksize = 1024;
 	s->s_blocksize_bits = 10;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9eef64f23a75..e05983170d23 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1327,6 +1327,7 @@ struct mm_struct;
 /* sb->s_iflags */
 #define SB_I_CGROUPWB	0x00000001	/* cgroup-aware writeback enabled */
 #define SB_I_NOEXEC	0x00000002	/* Ignore executables on this fs */
+#define SB_I_NODEV	0x00000004	/* Ignore devices on this fs */
 
 /* sb->s_iflags to limit user namespace mounts */
 #define SB_I_USERNS_VISIBLE		0x00000010 /* fstype already mounted */
@@ -1602,6 +1603,7 @@ extern int vfs_whiteout(struct inode *, struct dentry *);
  */
 extern void inode_init_owner(struct inode *inode, const struct inode *dir,
 			umode_t mode);
+extern bool may_open_dev(const struct path *path);
 /*
  * VFS FS_IOC_FIEMAP helper definitions.
  */
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 5bdd50de7d05..0b13ace266f2 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -307,7 +307,7 @@ static int mqueue_fill_super(struct super_block *sb, void *data, int silent)
 	struct inode *inode;
 	struct ipc_namespace *ns = sb->s_fs_info;
 
-	sb->s_iflags |= SB_I_NOEXEC;
+	sb->s_iflags |= SB_I_NOEXEC | SB_I_NODEV;
 	sb->s_blocksize = PAGE_SIZE;
 	sb->s_blocksize_bits = PAGE_SHIFT;
 	sb->s_magic = MQUEUE_MAGIC;
-- 
2.8.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 11/13] mnt: Simplify mount_too_revealing
  2016-06-20 17:21     ` Eric W. Biederman
@ 2016-06-20 17:21         ` Eric W. Biederman
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: Miklos Szeredi, Andy Lutomirski, James Bottomley, Seth Forshee,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni

Verify all filesystems that we check in mount_too_revealing set
SB_I_NOEXEC and SB_I_NODEV in sb->s_iflags.  That is true for today
and it should remain true in the future.

Remove the now unnecessary checks from mnt_already_visibile that
ensure MNT_LOCK_NOSUID, MNT_LOCK_NOEXEC, and MNT_LOCK_NODEV are
preserved.  Making the code shorter and easier to read.

Relying on SB_I_NOEXEC and SB_I_NODEV instead of the user visible
MNT_NOSUID, MNT_NOEXEC, and MNT_NODEV ensures the many current
systems where proc and sysfs are mounted with "nosuid, nodev, noexec"
and several slightly buggy container applications don't bother to
set those flags continue to work.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/namespace.c | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 2e13f6cfe5df..b1da7f8182c4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3232,12 +3232,8 @@ static bool mnt_already_visible(struct mnt_namespace *ns, struct vfsmount *new,
 		if (mnt->mnt.mnt_root != mnt->mnt.mnt_sb->s_root)
 			continue;
 
-		/* Read the mount flags and filter out flags that
-		 * may safely be ignored.
-		 */
+		/* A local view of the mount flags */
 		mnt_flags = mnt->mnt.mnt_flags;
-		if (mnt->mnt.mnt_sb->s_iflags & SB_I_NOEXEC)
-			mnt_flags &= ~(MNT_LOCK_NOSUID | MNT_LOCK_NOEXEC);
 
 		/* Don't miss readonly hidden in the superblock flags */
 		if (mnt->mnt.mnt_sb->s_flags & MS_RDONLY)
@@ -3249,15 +3245,6 @@ static bool mnt_already_visible(struct mnt_namespace *ns, struct vfsmount *new,
 		if ((mnt_flags & MNT_LOCK_READONLY) &&
 		    !(new_flags & MNT_READONLY))
 			continue;
-		if ((mnt_flags & MNT_LOCK_NODEV) &&
-		    !(new_flags & MNT_NODEV))
-			continue;
-		if ((mnt_flags & MNT_LOCK_NOSUID) &&
-		    !(new_flags & MNT_NOSUID))
-			continue;
-		if ((mnt_flags & MNT_LOCK_NOEXEC) &&
-		    !(new_flags & MNT_NOEXEC))
-			continue;
 		if ((mnt_flags & MNT_LOCK_ATIME) &&
 		    ((mnt_flags & MNT_ATIME_MASK) != (new_flags & MNT_ATIME_MASK)))
 			continue;
@@ -3277,9 +3264,6 @@ static bool mnt_already_visible(struct mnt_namespace *ns, struct vfsmount *new,
 		}
 		/* Preserve the locked attributes */
 		*new_mnt_flags |= mnt_flags & (MNT_LOCK_READONLY | \
-					       MNT_LOCK_NODEV    | \
-					       MNT_LOCK_NOSUID   | \
-					       MNT_LOCK_NOEXEC   | \
 					       MNT_LOCK_ATIME);
 		visible = true;
 		goto found;
@@ -3292,6 +3276,7 @@ found:
 
 static bool mount_too_revealing(struct vfsmount *mnt, int *new_mnt_flags)
 {
+	const unsigned long required_iflags = SB_I_NOEXEC | SB_I_NODEV;
 	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
 	unsigned long s_iflags;
 
@@ -3303,6 +3288,12 @@ static bool mount_too_revealing(struct vfsmount *mnt, int *new_mnt_flags)
 	if (!(s_iflags & SB_I_USERNS_VISIBLE))
 		return false;
 
+	if ((s_iflags & required_iflags) != required_iflags) {
+		WARN_ONCE(1, "Expected s_iflags to contain 0x%lx\n",
+			  required_iflags);
+		return true;
+	}
+
 	return !mnt_already_visible(ns, mnt, new_mnt_flags);
 }
 
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 11/13] mnt: Simplify mount_too_revealing
@ 2016-06-20 17:21         ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Miklos Szeredi, James Bottomley, Djalal Harouni,
	Seth Forshee, Serge E. Hallyn, Andy Lutomirski

Verify all filesystems that we check in mount_too_revealing set
SB_I_NOEXEC and SB_I_NODEV in sb->s_iflags.  That is true for today
and it should remain true in the future.

Remove the now unnecessary checks from mnt_already_visibile that
ensure MNT_LOCK_NOSUID, MNT_LOCK_NOEXEC, and MNT_LOCK_NODEV are
preserved.  Making the code shorter and easier to read.

Relying on SB_I_NOEXEC and SB_I_NODEV instead of the user visible
MNT_NOSUID, MNT_NOEXEC, and MNT_NODEV ensures the many current
systems where proc and sysfs are mounted with "nosuid, nodev, noexec"
and several slightly buggy container applications don't bother to
set those flags continue to work.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/namespace.c | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 2e13f6cfe5df..b1da7f8182c4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3232,12 +3232,8 @@ static bool mnt_already_visible(struct mnt_namespace *ns, struct vfsmount *new,
 		if (mnt->mnt.mnt_root != mnt->mnt.mnt_sb->s_root)
 			continue;
 
-		/* Read the mount flags and filter out flags that
-		 * may safely be ignored.
-		 */
+		/* A local view of the mount flags */
 		mnt_flags = mnt->mnt.mnt_flags;
-		if (mnt->mnt.mnt_sb->s_iflags & SB_I_NOEXEC)
-			mnt_flags &= ~(MNT_LOCK_NOSUID | MNT_LOCK_NOEXEC);
 
 		/* Don't miss readonly hidden in the superblock flags */
 		if (mnt->mnt.mnt_sb->s_flags & MS_RDONLY)
@@ -3249,15 +3245,6 @@ static bool mnt_already_visible(struct mnt_namespace *ns, struct vfsmount *new,
 		if ((mnt_flags & MNT_LOCK_READONLY) &&
 		    !(new_flags & MNT_READONLY))
 			continue;
-		if ((mnt_flags & MNT_LOCK_NODEV) &&
-		    !(new_flags & MNT_NODEV))
-			continue;
-		if ((mnt_flags & MNT_LOCK_NOSUID) &&
-		    !(new_flags & MNT_NOSUID))
-			continue;
-		if ((mnt_flags & MNT_LOCK_NOEXEC) &&
-		    !(new_flags & MNT_NOEXEC))
-			continue;
 		if ((mnt_flags & MNT_LOCK_ATIME) &&
 		    ((mnt_flags & MNT_ATIME_MASK) != (new_flags & MNT_ATIME_MASK)))
 			continue;
@@ -3277,9 +3264,6 @@ static bool mnt_already_visible(struct mnt_namespace *ns, struct vfsmount *new,
 		}
 		/* Preserve the locked attributes */
 		*new_mnt_flags |= mnt_flags & (MNT_LOCK_READONLY | \
-					       MNT_LOCK_NODEV    | \
-					       MNT_LOCK_NOSUID   | \
-					       MNT_LOCK_NOEXEC   | \
 					       MNT_LOCK_ATIME);
 		visible = true;
 		goto found;
@@ -3292,6 +3276,7 @@ found:
 
 static bool mount_too_revealing(struct vfsmount *mnt, int *new_mnt_flags)
 {
+	const unsigned long required_iflags = SB_I_NOEXEC | SB_I_NODEV;
 	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
 	unsigned long s_iflags;
 
@@ -3303,6 +3288,12 @@ static bool mount_too_revealing(struct vfsmount *mnt, int *new_mnt_flags)
 	if (!(s_iflags & SB_I_USERNS_VISIBLE))
 		return false;
 
+	if ((s_iflags & required_iflags) != required_iflags) {
+		WARN_ONCE(1, "Expected s_iflags to contain 0x%lx\n",
+			  required_iflags);
+		return true;
+	}
+
 	return !mnt_already_visible(ns, mnt, new_mnt_flags);
 }
 
-- 
2.8.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 12/13] userns: Remove implicit MNT_NODEV fragility.
  2016-06-20 17:21     ` Eric W. Biederman
@ 2016-06-20 17:21         ` Eric W. Biederman
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: Miklos Szeredi, Andy Lutomirski, James Bottomley, Seth Forshee,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni

Replace the implict setting of MNT_NODEV on mounts that happen with
just user namespace permissions with an implicit setting of SB_I_NODEV
in s_iflags.  The visibility of the implicit MNT_NODEV has caused
problems in the past.

With this change the fragile case where an implicit MNT_NODEV needs to
be preserved in do_remount is removed.  Using SB_I_NODEV is much less
fragile as s_iflags are set during the original mount and never
changed.

In do_new_mount with the implicit setting of MNT_NODEV gone, the only
code that can affect mnt_flags is fs_fully_visible so simplify the if
statement and reduce the indentation of the code to make that clear.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/namespace.c | 19 +------------------
 fs/super.c     |  3 +++
 2 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index b1da7f8182c4..9786a38d1681 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2185,13 +2185,7 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	}
 	if ((mnt->mnt.mnt_flags & MNT_LOCK_NODEV) &&
 	    !(mnt_flags & MNT_NODEV)) {
-		/* Was the nodev implicitly added in mount? */
-		if ((mnt->mnt_ns->user_ns != &init_user_ns) &&
-		    !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
-			mnt_flags |= MNT_NODEV;
-		} else {
-			return -EPERM;
-		}
+		return -EPERM;
 	}
 	if ((mnt->mnt.mnt_flags & MNT_LOCK_NOSUID) &&
 	    !(mnt_flags & MNT_NOSUID)) {
@@ -2385,7 +2379,6 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 			int mnt_flags, const char *name, void *data)
 {
 	struct file_system_type *type;
-	struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns;
 	struct vfsmount *mnt;
 	int err;
 
@@ -2396,16 +2389,6 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 	if (!type)
 		return -ENODEV;
 
-	if (user_ns != &init_user_ns) {
-		/* Only in special cases allow devices from mounts
-		 * created outside the initial user namespace.
-		 */
-		if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
-			flags |= MS_NODEV;
-			mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
-		}
-	}
-
 	mnt = vfs_kern_mount(type, flags, name, data);
 	if (!IS_ERR(mnt) && (type->fs_flags & FS_HAS_SUBTYPE) &&
 	    !mnt->mnt_sb->s_subtype)
diff --git a/fs/super.c b/fs/super.c
index 78790ada7191..25cdceed2ad3 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -206,6 +206,9 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	init_waitqueue_head(&s->s_writers.wait_unfrozen);
 	s->s_bdi = &noop_backing_dev_info;
 	s->s_flags = flags;
+	if ((s->s_user_ns != &init_user_ns) &&
+	    !(type->fs_flags & FS_USERNS_DEV_MOUNT))
+		s->s_iflags |= SB_I_NODEV;
 	INIT_HLIST_NODE(&s->s_instances);
 	INIT_HLIST_BL_HEAD(&s->s_anon);
 	mutex_init(&s->s_sync_lock);
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 12/13] userns: Remove implicit MNT_NODEV fragility.
@ 2016-06-20 17:21         ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Miklos Szeredi, James Bottomley, Djalal Harouni,
	Seth Forshee, Serge E. Hallyn, Andy Lutomirski

Replace the implict setting of MNT_NODEV on mounts that happen with
just user namespace permissions with an implicit setting of SB_I_NODEV
in s_iflags.  The visibility of the implicit MNT_NODEV has caused
problems in the past.

With this change the fragile case where an implicit MNT_NODEV needs to
be preserved in do_remount is removed.  Using SB_I_NODEV is much less
fragile as s_iflags are set during the original mount and never
changed.

In do_new_mount with the implicit setting of MNT_NODEV gone, the only
code that can affect mnt_flags is fs_fully_visible so simplify the if
statement and reduce the indentation of the code to make that clear.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/namespace.c | 19 +------------------
 fs/super.c     |  3 +++
 2 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index b1da7f8182c4..9786a38d1681 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2185,13 +2185,7 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	}
 	if ((mnt->mnt.mnt_flags & MNT_LOCK_NODEV) &&
 	    !(mnt_flags & MNT_NODEV)) {
-		/* Was the nodev implicitly added in mount? */
-		if ((mnt->mnt_ns->user_ns != &init_user_ns) &&
-		    !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
-			mnt_flags |= MNT_NODEV;
-		} else {
-			return -EPERM;
-		}
+		return -EPERM;
 	}
 	if ((mnt->mnt.mnt_flags & MNT_LOCK_NOSUID) &&
 	    !(mnt_flags & MNT_NOSUID)) {
@@ -2385,7 +2379,6 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 			int mnt_flags, const char *name, void *data)
 {
 	struct file_system_type *type;
-	struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns;
 	struct vfsmount *mnt;
 	int err;
 
@@ -2396,16 +2389,6 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 	if (!type)
 		return -ENODEV;
 
-	if (user_ns != &init_user_ns) {
-		/* Only in special cases allow devices from mounts
-		 * created outside the initial user namespace.
-		 */
-		if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
-			flags |= MS_NODEV;
-			mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
-		}
-	}
-
 	mnt = vfs_kern_mount(type, flags, name, data);
 	if (!IS_ERR(mnt) && (type->fs_flags & FS_HAS_SUBTYPE) &&
 	    !mnt->mnt_sb->s_subtype)
diff --git a/fs/super.c b/fs/super.c
index 78790ada7191..25cdceed2ad3 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -206,6 +206,9 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	init_waitqueue_head(&s->s_writers.wait_unfrozen);
 	s->s_bdi = &noop_backing_dev_info;
 	s->s_flags = flags;
+	if ((s->s_user_ns != &init_user_ns) &&
+	    !(type->fs_flags & FS_USERNS_DEV_MOUNT))
+		s->s_iflags |= SB_I_NODEV;
 	INIT_HLIST_NODE(&s->s_instances);
 	INIT_HLIST_BL_HEAD(&s->s_anon);
 	mutex_init(&s->s_sync_lock);
-- 
2.8.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 13/13] userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
  2016-06-20 17:21     ` Eric W. Biederman
@ 2016-06-20 17:21         ` Eric W. Biederman
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: Miklos Szeredi, Andy Lutomirski, James Bottomley, Seth Forshee,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni

Now that SB_I_NODEV controls the nodev behavior devpts can just clear
this flag during mount.  Simplifying the code and making it easier
to audit how the code works.  While still preserving the invariant
that s_iflags is only modified during mount.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/devpts/inode.c  | 3 ++-
 fs/super.c         | 3 +--
 include/linux/fs.h | 1 -
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 37c134a132c7..d116453b0276 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -396,6 +396,7 @@ devpts_fill_super(struct super_block *s, void *data, int silent)
 {
 	struct inode *inode;
 
+	s->s_iflags &= ~SB_I_NODEV;
 	s->s_blocksize = 1024;
 	s->s_blocksize_bits = 10;
 	s->s_magic = DEVPTS_SUPER_MAGIC;
@@ -480,7 +481,7 @@ static struct file_system_type devpts_fs_type = {
 	.name		= "devpts",
 	.mount		= devpts_mount,
 	.kill_sb	= devpts_kill_sb,
-	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 /*
diff --git a/fs/super.c b/fs/super.c
index 25cdceed2ad3..37813bf479cf 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -206,8 +206,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	init_waitqueue_head(&s->s_writers.wait_unfrozen);
 	s->s_bdi = &noop_backing_dev_info;
 	s->s_flags = flags;
-	if ((s->s_user_ns != &init_user_ns) &&
-	    !(type->fs_flags & FS_USERNS_DEV_MOUNT))
+	if (s->s_user_ns != &init_user_ns)
 		s->s_iflags |= SB_I_NODEV;
 	INIT_HLIST_NODE(&s->s_instances);
 	INIT_HLIST_BL_HEAD(&s->s_anon);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e05983170d23..375e37f42cdf 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2022,7 +2022,6 @@ struct file_system_type {
 #define FS_BINARY_MOUNTDATA	2
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
-#define FS_USERNS_DEV_MOUNT	16 /* A userns mount does not imply MNT_NODEV */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	struct dentry *(*mount) (struct file_system_type *, int,
 		       const char *, void *);
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH review 13/13] userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
@ 2016-06-20 17:21         ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-20 17:21 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Miklos Szeredi, James Bottomley, Djalal Harouni,
	Seth Forshee, Serge E. Hallyn, Andy Lutomirski

Now that SB_I_NODEV controls the nodev behavior devpts can just clear
this flag during mount.  Simplifying the code and making it easier
to audit how the code works.  While still preserving the invariant
that s_iflags is only modified during mount.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/devpts/inode.c  | 3 ++-
 fs/super.c         | 3 +--
 include/linux/fs.h | 1 -
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 37c134a132c7..d116453b0276 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -396,6 +396,7 @@ devpts_fill_super(struct super_block *s, void *data, int silent)
 {
 	struct inode *inode;
 
+	s->s_iflags &= ~SB_I_NODEV;
 	s->s_blocksize = 1024;
 	s->s_blocksize_bits = 10;
 	s->s_magic = DEVPTS_SUPER_MAGIC;
@@ -480,7 +481,7 @@ static struct file_system_type devpts_fs_type = {
 	.name		= "devpts",
 	.mount		= devpts_mount,
 	.kill_sb	= devpts_kill_sb,
-	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 /*
diff --git a/fs/super.c b/fs/super.c
index 25cdceed2ad3..37813bf479cf 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -206,8 +206,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	init_waitqueue_head(&s->s_writers.wait_unfrozen);
 	s->s_bdi = &noop_backing_dev_info;
 	s->s_flags = flags;
-	if ((s->s_user_ns != &init_user_ns) &&
-	    !(type->fs_flags & FS_USERNS_DEV_MOUNT))
+	if (s->s_user_ns != &init_user_ns)
 		s->s_iflags |= SB_I_NODEV;
 	INIT_HLIST_NODE(&s->s_instances);
 	INIT_HLIST_BL_HEAD(&s->s_anon);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e05983170d23..375e37f42cdf 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2022,7 +2022,6 @@ struct file_system_type {
 #define FS_BINARY_MOUNTDATA	2
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
-#define FS_USERNS_DEV_MOUNT	16 /* A userns mount does not imply MNT_NODEV */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	struct dentry *(*mount) (struct file_system_type *, int,
 		       const char *, void *);
-- 
2.8.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH review 02/13] mnt: Refactor fs_fully_visible into mount_too_revealing
  2016-06-20 17:21         ` Eric W. Biederman
@ 2016-06-20 22:53             ` Andy Lutomirski
  -1 siblings, 0 replies; 56+ messages in thread
From: Andy Lutomirski @ 2016-06-20 22:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, James Bottomley, Seth Forshee,
	Linux FS Devel, Djalal Harouni

On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Replace the call of fs_fully_visible in do_new_mount from before the
> new superblock is allocated with a call of mount_too_revealing after
> the superblock is allocated.   This winds up being a much better location
> for maintainability of the code.
>
> The first change this enables is the replacement of FS_USERNS_VISIBLE
> with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
> to sb_iflags on the superblock.

Why is this useful?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 02/13] mnt: Refactor fs_fully_visible into mount_too_revealing
@ 2016-06-20 22:53             ` Andy Lutomirski
  0 siblings, 0 replies; 56+ messages in thread
From: Andy Lutomirski @ 2016-06-20 22:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Linux FS Devel, Miklos Szeredi,
	James Bottomley, Djalal Harouni, Seth Forshee, Serge E. Hallyn

On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Replace the call of fs_fully_visible in do_new_mount from before the
> new superblock is allocated with a call of mount_too_revealing after
> the superblock is allocated.   This winds up being a much better location
> for maintainability of the code.
>
> The first change this enables is the replacement of FS_USERNS_VISIBLE
> with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
> to sb_iflags on the superblock.

Why is this useful?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 10/13] vfs: Generalize filesystem nodev handling.
  2016-06-20 17:21         ` Eric W. Biederman
@ 2016-06-20 22:57             ` Andy Lutomirski
  -1 siblings, 0 replies; 56+ messages in thread
From: Andy Lutomirski @ 2016-06-20 22:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, James Bottomley, Seth Forshee,
	Linux FS Devel, Djalal Harouni

On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Introduce a function may_open_dev that tests MNT_NODEV and a new
> superblock flab SB_I_NODEV.  Use this new function in all of the
> places where MNT_NODEV was previously tested.

This would be nicer as two patches: one to refactor the check and the
second to change the condition.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 10/13] vfs: Generalize filesystem nodev handling.
@ 2016-06-20 22:57             ` Andy Lutomirski
  0 siblings, 0 replies; 56+ messages in thread
From: Andy Lutomirski @ 2016-06-20 22:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Linux FS Devel, Miklos Szeredi,
	James Bottomley, Djalal Harouni, Seth Forshee, Serge E. Hallyn

On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Introduce a function may_open_dev that tests MNT_NODEV and a new
> superblock flab SB_I_NODEV.  Use this new function in all of the
> places where MNT_NODEV was previously tested.

This would be nicer as two patches: one to refactor the check and the
second to change the condition.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 12/13] userns: Remove implicit MNT_NODEV fragility.
  2016-06-20 17:21         ` Eric W. Biederman
@ 2016-06-20 22:58             ` Andy Lutomirski
  -1 siblings, 0 replies; 56+ messages in thread
From: Andy Lutomirski @ 2016-06-20 22:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, James Bottomley, Seth Forshee,
	Linux FS Devel, Djalal Harouni

On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Replace the implict setting of MNT_NODEV on mounts that happen with
> just user namespace permissions with an implicit setting of SB_I_NODEV
> in s_iflags.  The visibility of the implicit MNT_NODEV has caused
> problems in the past.

I like this!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 12/13] userns: Remove implicit MNT_NODEV fragility.
@ 2016-06-20 22:58             ` Andy Lutomirski
  0 siblings, 0 replies; 56+ messages in thread
From: Andy Lutomirski @ 2016-06-20 22:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Linux FS Devel, Miklos Szeredi,
	James Bottomley, Djalal Harouni, Seth Forshee, Serge E. Hallyn

On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Replace the implict setting of MNT_NODEV on mounts that happen with
> just user namespace permissions with an implicit setting of SB_I_NODEV
> in s_iflags.  The visibility of the implicit MNT_NODEV has caused
> problems in the past.

I like this!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 02/13] mnt: Refactor fs_fully_visible into mount_too_revealing
  2016-06-20 22:53             ` Andy Lutomirski
@ 2016-06-21 18:54                 ` Eric W. Biederman
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-21 18:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Miklos Szeredi, Linux Containers, James Bottomley, Seth Forshee,
	Linux FS Devel, Djalal Harouni

Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:

> On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>> Replace the call of fs_fully_visible in do_new_mount from before the
>> new superblock is allocated with a call of mount_too_revealing after
>> the superblock is allocated.   This winds up being a much better location
>> for maintainability of the code.
>>
>> The first change this enables is the replacement of FS_USERNS_VISIBLE
>> with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
>> to sb_iflags on the superblock.
>
> Why is this useful?

A couple of reasons.
- It helps clean up do_new_mount which is currently so overloaded by
  special cases that it is difficult to see the core logic.

- It makes the check about the actual superblock that is being mounted
  rather than the superblock that might be mounted.

- The practical place where being about the actual superblock that is
  being mounted helps is that in "11/13 mnt: Simplify mount_too_revealing"
  that removes the MNT_LOCK_NOSUID MNT_LOCK_NOEXEC and MNT_LOCK_NODEV
  tests from the code, while verify that those tests are not needed
  because the sb_iflags contains SB_I_NOEXEC and SB_I_NODEV.

- The conceptual change of testing once the superblock has been
  generated makes changes like the one above much more sensible
  and it helps untangle mount namespace versus superblock concerns.

That last is a big part of what this patchset is about.  When do we care
about the superblock and when do we care about the mount namespace.

Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 02/13] mnt: Refactor fs_fully_visible into mount_too_revealing
@ 2016-06-21 18:54                 ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-21 18:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Linux FS Devel, Miklos Szeredi,
	James Bottomley, Djalal Harouni, Seth Forshee, Serge E. Hallyn

Andy Lutomirski <luto@amacapital.net> writes:

> On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Replace the call of fs_fully_visible in do_new_mount from before the
>> new superblock is allocated with a call of mount_too_revealing after
>> the superblock is allocated.   This winds up being a much better location
>> for maintainability of the code.
>>
>> The first change this enables is the replacement of FS_USERNS_VISIBLE
>> with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
>> to sb_iflags on the superblock.
>
> Why is this useful?

A couple of reasons.
- It helps clean up do_new_mount which is currently so overloaded by
  special cases that it is difficult to see the core logic.

- It makes the check about the actual superblock that is being mounted
  rather than the superblock that might be mounted.

- The practical place where being about the actual superblock that is
  being mounted helps is that in "11/13 mnt: Simplify mount_too_revealing"
  that removes the MNT_LOCK_NOSUID MNT_LOCK_NOEXEC and MNT_LOCK_NODEV
  tests from the code, while verify that those tests are not needed
  because the sb_iflags contains SB_I_NOEXEC and SB_I_NODEV.

- The conceptual change of testing once the superblock has been
  generated makes changes like the one above much more sensible
  and it helps untangle mount namespace versus superblock concerns.

That last is a big part of what this patchset is about.  When do we care
about the superblock and when do we care about the mount namespace.

Eric


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 10/13] vfs: Generalize filesystem nodev handling.
  2016-06-20 22:57             ` Andy Lutomirski
@ 2016-06-21 19:09                 ` Eric W. Biederman
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-21 19:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Miklos Szeredi, Linux Containers, James Bottomley, Seth Forshee,
	Linux FS Devel, Djalal Harouni

Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:

> On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>> Introduce a function may_open_dev that tests MNT_NODEV and a new
>> superblock flab SB_I_NODEV.  Use this new function in all of the
>> places where MNT_NODEV was previously tested.
>
> This would be nicer as two patches: one to refactor the check and the
> second to change the condition.

I can see where introducing may_open_dev before changing the conditions
in may_open_dev might have been a a hair more readable.   At the same
time that approaches the ridiculously small patches, and this change
is one clear focused change (introduce and test SB_I_NODEV) and the
change is small enough I don't see anything getting lost in the noise.

I did very deliberately separate this from
"12/13 userns: Remove implicit MNT_NODEV fragility."
As a combination there would have been very confusing.

Which is the really interesting result as it removes that stupid
unnecessary difference between mounts inside and outside of user
namespaces.

Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 10/13] vfs: Generalize filesystem nodev handling.
@ 2016-06-21 19:09                 ` Eric W. Biederman
  0 siblings, 0 replies; 56+ messages in thread
From: Eric W. Biederman @ 2016-06-21 19:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Linux FS Devel, Miklos Szeredi,
	James Bottomley, Djalal Harouni, Seth Forshee, Serge E. Hallyn

Andy Lutomirski <luto@amacapital.net> writes:

> On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Introduce a function may_open_dev that tests MNT_NODEV and a new
>> superblock flab SB_I_NODEV.  Use this new function in all of the
>> places where MNT_NODEV was previously tested.
>
> This would be nicer as two patches: one to refactor the check and the
> second to change the condition.

I can see where introducing may_open_dev before changing the conditions
in may_open_dev might have been a a hair more readable.   At the same
time that approaches the ridiculously small patches, and this change
is one clear focused change (introduce and test SB_I_NODEV) and the
change is small enough I don't see anything getting lost in the noise.

I did very deliberately separate this from
"12/13 userns: Remove implicit MNT_NODEV fragility."
As a combination there would have been very confusing.

Which is the really interesting result as it removes that stupid
unnecessary difference between mounts inside and outside of user
namespaces.

Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 02/13] mnt: Refactor fs_fully_visible into mount_too_revealing
  2016-06-20 17:21         ` Eric W. Biederman
@ 2016-06-22 19:40             ` Seth Forshee
  -1 siblings, 0 replies; 56+ messages in thread
From: Seth Forshee @ 2016-06-22 19:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Andy Lutomirski,
	James Bottomley, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	Djalal Harouni

On Mon, Jun 20, 2016 at 12:21:19PM -0500, Eric W. Biederman wrote:
> Replace the call of fs_fully_visible in do_new_mount from before the
> new superblock is allocated with a call of mount_too_revealing after
> the superblock is allocated.   This winds up being a much better location
> for maintainability of the code.
> 
> The first change this enables is the replacement of FS_USERNS_VISIBLE
> with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
> to sb_iflags on the superblock.
> 
> Unfortunately mount_too_revealing fundamentally needs to touch
> mnt_flags adding several MNT_LOCKED_XXX flags at the appropriate
> times.  If the mnt_flags did not need to be touched the code
> could be easily moved into the filesystem specific mount code.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 02/13] mnt: Refactor fs_fully_visible into mount_too_revealing
@ 2016-06-22 19:40             ` Seth Forshee
  0 siblings, 0 replies; 56+ messages in thread
From: Seth Forshee @ 2016-06-22 19:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Miklos Szeredi, James Bottomley,
	Djalal Harouni, Serge E. Hallyn, Andy Lutomirski

On Mon, Jun 20, 2016 at 12:21:19PM -0500, Eric W. Biederman wrote:
> Replace the call of fs_fully_visible in do_new_mount from before the
> new superblock is allocated with a call of mount_too_revealing after
> the superblock is allocated.   This winds up being a much better location
> for maintainability of the code.
> 
> The first change this enables is the replacement of FS_USERNS_VISIBLE
> with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
> to sb_iflags on the superblock.
> 
> Unfortunately mount_too_revealing fundamentally needs to touch
> mnt_flags adding several MNT_LOCKED_XXX flags at the appropriate
> times.  If the mnt_flags did not need to be touched the code
> could be easily moved into the filesystem specific mount code.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Seth Forshee <seth.forshee@canonical.com>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 07/13] mnt: Move the FS_USERNS_MOUNT check into sget_userns
  2016-06-20 17:21         ` Eric W. Biederman
@ 2016-06-22 19:43             ` Seth Forshee
  -1 siblings, 0 replies; 56+ messages in thread
From: Seth Forshee @ 2016-06-22 19:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Andy Lutomirski,
	James Bottomley, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	Djalal Harouni

On Mon, Jun 20, 2016 at 12:21:24PM -0500, Eric W. Biederman wrote:
> Allowing a filesystem to be mounted by other than root in the initial
> user namespace is a filesystem property not a mount namespace property
> and as such should be checked in filesystem specific code.  Move the
> FS_USERNS_MOUNT test into super.c:sget_userns().
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 07/13] mnt: Move the FS_USERNS_MOUNT check into sget_userns
@ 2016-06-22 19:43             ` Seth Forshee
  0 siblings, 0 replies; 56+ messages in thread
From: Seth Forshee @ 2016-06-22 19:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Miklos Szeredi, James Bottomley,
	Djalal Harouni, Serge E. Hallyn, Andy Lutomirski

On Mon, Jun 20, 2016 at 12:21:24PM -0500, Eric W. Biederman wrote:
> Allowing a filesystem to be mounted by other than root in the initial
> user namespace is a filesystem property not a mount namespace property
> and as such should be checked in filesystem specific code.  Move the
> FS_USERNS_MOUNT test into super.c:sget_userns().
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Seth Forshee <seth.forshee@canonical.com>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 11/13] mnt: Simplify mount_too_revealing
  2016-06-20 17:21         ` Eric W. Biederman
@ 2016-06-22 19:48             ` Seth Forshee
  -1 siblings, 0 replies; 56+ messages in thread
From: Seth Forshee @ 2016-06-22 19:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Andy Lutomirski,
	James Bottomley, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	Djalal Harouni

On Mon, Jun 20, 2016 at 12:21:28PM -0500, Eric W. Biederman wrote:
> Verify all filesystems that we check in mount_too_revealing set
> SB_I_NOEXEC and SB_I_NODEV in sb->s_iflags.  That is true for today
> and it should remain true in the future.
> 
> Remove the now unnecessary checks from mnt_already_visibile that
> ensure MNT_LOCK_NOSUID, MNT_LOCK_NOEXEC, and MNT_LOCK_NODEV are
> preserved.  Making the code shorter and easier to read.
> 
> Relying on SB_I_NOEXEC and SB_I_NODEV instead of the user visible
> MNT_NOSUID, MNT_NOEXEC, and MNT_NODEV ensures the many current
> systems where proc and sysfs are mounted with "nosuid, nodev, noexec"
> and several slightly buggy container applications don't bother to
> set those flags continue to work.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 11/13] mnt: Simplify mount_too_revealing
@ 2016-06-22 19:48             ` Seth Forshee
  0 siblings, 0 replies; 56+ messages in thread
From: Seth Forshee @ 2016-06-22 19:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Miklos Szeredi, James Bottomley,
	Djalal Harouni, Serge E. Hallyn, Andy Lutomirski

On Mon, Jun 20, 2016 at 12:21:28PM -0500, Eric W. Biederman wrote:
> Verify all filesystems that we check in mount_too_revealing set
> SB_I_NOEXEC and SB_I_NODEV in sb->s_iflags.  That is true for today
> and it should remain true in the future.
> 
> Remove the now unnecessary checks from mnt_already_visibile that
> ensure MNT_LOCK_NOSUID, MNT_LOCK_NOEXEC, and MNT_LOCK_NODEV are
> preserved.  Making the code shorter and easier to read.
> 
> Relying on SB_I_NOEXEC and SB_I_NODEV instead of the user visible
> MNT_NOSUID, MNT_NOEXEC, and MNT_NODEV ensures the many current
> systems where proc and sysfs are mounted with "nosuid, nodev, noexec"
> and several slightly buggy container applications don't bother to
> set those flags continue to work.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Seth Forshee <seth.forshee@canonical.com>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 12/13] userns: Remove implicit MNT_NODEV fragility.
       [not found]         ` <20160620172130.15712-12-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2016-06-20 22:58             ` Andy Lutomirski
@ 2016-06-22 19:49           ` Seth Forshee
  1 sibling, 0 replies; 56+ messages in thread
From: Seth Forshee @ 2016-06-22 19:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Andy Lutomirski,
	James Bottomley, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	Djalal Harouni

On Mon, Jun 20, 2016 at 12:21:29PM -0500, Eric W. Biederman wrote:
> Replace the implict setting of MNT_NODEV on mounts that happen with
> just user namespace permissions with an implicit setting of SB_I_NODEV
> in s_iflags.  The visibility of the implicit MNT_NODEV has caused
> problems in the past.
> 
> With this change the fragile case where an implicit MNT_NODEV needs to
> be preserved in do_remount is removed.  Using SB_I_NODEV is much less
> fragile as s_iflags are set during the original mount and never
> changed.
> 
> In do_new_mount with the implicit setting of MNT_NODEV gone, the only
> code that can affect mnt_flags is fs_fully_visible so simplify the if
> statement and reduce the indentation of the code to make that clear.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

This is much, much nicer.

Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 12/13] userns: Remove implicit MNT_NODEV fragility.
  2016-06-20 17:21         ` Eric W. Biederman
  (?)
@ 2016-06-22 19:49         ` Seth Forshee
  -1 siblings, 0 replies; 56+ messages in thread
From: Seth Forshee @ 2016-06-22 19:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Miklos Szeredi, James Bottomley,
	Djalal Harouni, Serge E. Hallyn, Andy Lutomirski

On Mon, Jun 20, 2016 at 12:21:29PM -0500, Eric W. Biederman wrote:
> Replace the implict setting of MNT_NODEV on mounts that happen with
> just user namespace permissions with an implicit setting of SB_I_NODEV
> in s_iflags.  The visibility of the implicit MNT_NODEV has caused
> problems in the past.
> 
> With this change the fragile case where an implicit MNT_NODEV needs to
> be preserved in do_remount is removed.  Using SB_I_NODEV is much less
> fragile as s_iflags are set during the original mount and never
> changed.
> 
> In do_new_mount with the implicit setting of MNT_NODEV gone, the only
> code that can affect mnt_flags is fs_fully_visible so simplify the if
> statement and reduce the indentation of the code to make that clear.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

This is much, much nicer.

Acked-by: Seth Forshee <seth.forshee@canonical.com>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 13/13] userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
       [not found]         ` <20160620172130.15712-13-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2016-06-22 19:50           ` Seth Forshee
  0 siblings, 0 replies; 56+ messages in thread
From: Seth Forshee @ 2016-06-22 19:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Andy Lutomirski,
	James Bottomley, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	Djalal Harouni

On Mon, Jun 20, 2016 at 12:21:30PM -0500, Eric W. Biederman wrote:
> Now that SB_I_NODEV controls the nodev behavior devpts can just clear
> this flag during mount.  Simplifying the code and making it easier
> to audit how the code works.  While still preserving the invariant
> that s_iflags is only modified during mount.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 13/13] userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
  2016-06-20 17:21         ` Eric W. Biederman
  (?)
@ 2016-06-22 19:50         ` Seth Forshee
  -1 siblings, 0 replies; 56+ messages in thread
From: Seth Forshee @ 2016-06-22 19:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Miklos Szeredi, James Bottomley,
	Djalal Harouni, Serge E. Hallyn, Andy Lutomirski

On Mon, Jun 20, 2016 at 12:21:30PM -0500, Eric W. Biederman wrote:
> Now that SB_I_NODEV controls the nodev behavior devpts can just clear
> this flag during mount.  Simplifying the code and making it easier
> to audit how the code works.  While still preserving the invariant
> that s_iflags is only modified during mount.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Seth Forshee <seth.forshee@canonical.com>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 05/13] proc: Convert proc_mount to use mount_ns.
       [not found]         ` <20160620172130.15712-5-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2016-06-23 11:50           ` Djalal Harouni
  0 siblings, 0 replies; 56+ messages in thread
From: Djalal Harouni @ 2016-06-23 11:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Andy Lutomirski,
	James Bottomley, Seth Forshee, Linux FS Devel

On Mon, Jun 20, 2016 at 7:21 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Move the call of get_pid_ns, the call of proc_parse_options, and
> the setting of s_iflags into proc_fill_super so that mount_ns
> can be used.
>
> Convert proc_mount to call mount_ns and remove the now unnecessary
> code.
>
> Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Reviewed-by: Djalal Harouni <tixxdz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>


> ---
>  fs/proc/inode.c    |  9 +++++++--
>  fs/proc/internal.h |  3 ++-
>  fs/proc/root.c     | 52 ++++------------------------------------------------
>  3 files changed, 13 insertions(+), 51 deletions(-)
>
> diff --git a/fs/proc/inode.c b/fs/proc/inode.c
> index 78fa452d65ed..f4817efb25a6 100644
> --- a/fs/proc/inode.c
> +++ b/fs/proc/inode.c
> @@ -457,12 +457,17 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
>         return inode;
>  }
>
> -int proc_fill_super(struct super_block *s)
> +int proc_fill_super(struct super_block *s, void *data, int silent)
>  {
> +       struct pid_namespace *ns = get_pid_ns(s->s_fs_info);
>         struct inode *root_inode;
>         int ret;
>
> -       s->s_iflags |= SB_I_USERNS_VISIBLE;
> +       if (!proc_parse_options(data, ns))
> +               return -EINVAL;
> +
> +       /* User space would break if executables appear on proc */
> +       s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC;
>         s->s_flags |= MS_NODIRATIME | MS_NOSUID | MS_NOEXEC;
>         s->s_blocksize = 1024;
>         s->s_blocksize_bits = 10;
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index aa2781095bd1..7931c558c192 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -212,7 +212,7 @@ extern const struct inode_operations proc_pid_link_inode_operations;
>
>  extern void proc_init_inodecache(void);
>  extern struct inode *proc_get_inode(struct super_block *, struct proc_dir_entry *);
> -extern int proc_fill_super(struct super_block *);
> +extern int proc_fill_super(struct super_block *, void *data, int flags);
>  extern void proc_entry_rundown(struct proc_dir_entry *);
>
>  /*
> @@ -268,6 +268,7 @@ static inline void proc_tty_init(void) {}
>   * root.c
>   */
>  extern struct proc_dir_entry proc_root;
> +extern int proc_parse_options(char *options, struct pid_namespace *pid);
>
>  extern void proc_self_init(void);
>  extern int proc_remount(struct super_block *, int *, char *);
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index a1b2860fec62..8d3e484055a6 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -23,21 +23,6 @@
>
>  #include "internal.h"
>
> -static int proc_test_super(struct super_block *sb, void *data)
> -{
> -       return sb->s_fs_info == data;
> -}
> -
> -static int proc_set_super(struct super_block *sb, void *data)
> -{
> -       int err = set_anon_super(sb, NULL);
> -       if (!err) {
> -               struct pid_namespace *ns = (struct pid_namespace *)data;
> -               sb->s_fs_info = get_pid_ns(ns);
> -       }
> -       return err;
> -}
> -
>  enum {
>         Opt_gid, Opt_hidepid, Opt_err,
>  };
> @@ -48,7 +33,7 @@ static const match_table_t tokens = {
>         {Opt_err, NULL},
>  };
>
> -static int proc_parse_options(char *options, struct pid_namespace *pid)
> +int proc_parse_options(char *options, struct pid_namespace *pid)
>  {
>         char *p;
>         substring_t args[MAX_OPT_ARGS];
> @@ -100,45 +85,16 @@ int proc_remount(struct super_block *sb, int *flags, char *data)
>  static struct dentry *proc_mount(struct file_system_type *fs_type,
>         int flags, const char *dev_name, void *data)
>  {
> -       int err;
> -       struct super_block *sb;
>         struct pid_namespace *ns;
> -       char *options;
>
>         if (flags & MS_KERNMOUNT) {
> -               ns = (struct pid_namespace *)data;
> -               options = NULL;
> +               ns = data;
> +               data = NULL;
>         } else {
>                 ns = task_active_pid_ns(current);
> -               options = data;
> -
> -               /* Does the mounter have privilege over the pid namespace? */
> -               if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
> -                       return ERR_PTR(-EPERM);
> -       }
> -
> -       sb = sget(fs_type, proc_test_super, proc_set_super, flags, ns);
> -       if (IS_ERR(sb))
> -               return ERR_CAST(sb);
> -
> -       if (!proc_parse_options(options, ns)) {
> -               deactivate_locked_super(sb);
> -               return ERR_PTR(-EINVAL);
> -       }
> -
> -       if (!sb->s_root) {
> -               err = proc_fill_super(sb);
> -               if (err) {
> -                       deactivate_locked_super(sb);
> -                       return ERR_PTR(err);
> -               }
> -
> -               sb->s_flags |= MS_ACTIVE;
> -               /* User space would break if executables appear on proc */
> -               sb->s_iflags |= SB_I_NOEXEC;
>         }
>
> -       return dget(sb->s_root);
> +       return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super);
>  }
>
>  static void proc_kill_sb(struct super_block *sb)
> --
> 2.8.3
>


-- 
tixxdz
http://opendz.org

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 05/13] proc: Convert proc_mount to use mount_ns.
  2016-06-20 17:21         ` Eric W. Biederman
  (?)
  (?)
@ 2016-06-23 11:50         ` Djalal Harouni
  -1 siblings, 0 replies; 56+ messages in thread
From: Djalal Harouni @ 2016-06-23 11:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Linux FS Devel, Miklos Szeredi,
	James Bottomley, Seth Forshee, Serge E. Hallyn, Andy Lutomirski

On Mon, Jun 20, 2016 at 7:21 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Move the call of get_pid_ns, the call of proc_parse_options, and
> the setting of s_iflags into proc_fill_super so that mount_ns
> can be used.
>
> Convert proc_mount to call mount_ns and remove the now unnecessary
> code.
>
> Acked-by: Seth Forshee <seth.forshee@canonical.com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Djalal Harouni <tixxdz@gmail.com>


> ---
>  fs/proc/inode.c    |  9 +++++++--
>  fs/proc/internal.h |  3 ++-
>  fs/proc/root.c     | 52 ++++------------------------------------------------
>  3 files changed, 13 insertions(+), 51 deletions(-)
>
> diff --git a/fs/proc/inode.c b/fs/proc/inode.c
> index 78fa452d65ed..f4817efb25a6 100644
> --- a/fs/proc/inode.c
> +++ b/fs/proc/inode.c
> @@ -457,12 +457,17 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
>         return inode;
>  }
>
> -int proc_fill_super(struct super_block *s)
> +int proc_fill_super(struct super_block *s, void *data, int silent)
>  {
> +       struct pid_namespace *ns = get_pid_ns(s->s_fs_info);
>         struct inode *root_inode;
>         int ret;
>
> -       s->s_iflags |= SB_I_USERNS_VISIBLE;
> +       if (!proc_parse_options(data, ns))
> +               return -EINVAL;
> +
> +       /* User space would break if executables appear on proc */
> +       s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC;
>         s->s_flags |= MS_NODIRATIME | MS_NOSUID | MS_NOEXEC;
>         s->s_blocksize = 1024;
>         s->s_blocksize_bits = 10;
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index aa2781095bd1..7931c558c192 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -212,7 +212,7 @@ extern const struct inode_operations proc_pid_link_inode_operations;
>
>  extern void proc_init_inodecache(void);
>  extern struct inode *proc_get_inode(struct super_block *, struct proc_dir_entry *);
> -extern int proc_fill_super(struct super_block *);
> +extern int proc_fill_super(struct super_block *, void *data, int flags);
>  extern void proc_entry_rundown(struct proc_dir_entry *);
>
>  /*
> @@ -268,6 +268,7 @@ static inline void proc_tty_init(void) {}
>   * root.c
>   */
>  extern struct proc_dir_entry proc_root;
> +extern int proc_parse_options(char *options, struct pid_namespace *pid);
>
>  extern void proc_self_init(void);
>  extern int proc_remount(struct super_block *, int *, char *);
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index a1b2860fec62..8d3e484055a6 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -23,21 +23,6 @@
>
>  #include "internal.h"
>
> -static int proc_test_super(struct super_block *sb, void *data)
> -{
> -       return sb->s_fs_info == data;
> -}
> -
> -static int proc_set_super(struct super_block *sb, void *data)
> -{
> -       int err = set_anon_super(sb, NULL);
> -       if (!err) {
> -               struct pid_namespace *ns = (struct pid_namespace *)data;
> -               sb->s_fs_info = get_pid_ns(ns);
> -       }
> -       return err;
> -}
> -
>  enum {
>         Opt_gid, Opt_hidepid, Opt_err,
>  };
> @@ -48,7 +33,7 @@ static const match_table_t tokens = {
>         {Opt_err, NULL},
>  };
>
> -static int proc_parse_options(char *options, struct pid_namespace *pid)
> +int proc_parse_options(char *options, struct pid_namespace *pid)
>  {
>         char *p;
>         substring_t args[MAX_OPT_ARGS];
> @@ -100,45 +85,16 @@ int proc_remount(struct super_block *sb, int *flags, char *data)
>  static struct dentry *proc_mount(struct file_system_type *fs_type,
>         int flags, const char *dev_name, void *data)
>  {
> -       int err;
> -       struct super_block *sb;
>         struct pid_namespace *ns;
> -       char *options;
>
>         if (flags & MS_KERNMOUNT) {
> -               ns = (struct pid_namespace *)data;
> -               options = NULL;
> +               ns = data;
> +               data = NULL;
>         } else {
>                 ns = task_active_pid_ns(current);
> -               options = data;
> -
> -               /* Does the mounter have privilege over the pid namespace? */
> -               if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
> -                       return ERR_PTR(-EPERM);
> -       }
> -
> -       sb = sget(fs_type, proc_test_super, proc_set_super, flags, ns);
> -       if (IS_ERR(sb))
> -               return ERR_CAST(sb);
> -
> -       if (!proc_parse_options(options, ns)) {
> -               deactivate_locked_super(sb);
> -               return ERR_PTR(-EINVAL);
> -       }
> -
> -       if (!sb->s_root) {
> -               err = proc_fill_super(sb);
> -               if (err) {
> -                       deactivate_locked_super(sb);
> -                       return ERR_PTR(err);
> -               }
> -
> -               sb->s_flags |= MS_ACTIVE;
> -               /* User space would break if executables appear on proc */
> -               sb->s_iflags |= SB_I_NOEXEC;
>         }
>
> -       return dget(sb->s_root);
> +       return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super);
>  }
>
>  static void proc_kill_sb(struct super_block *sb)
> --
> 2.8.3
>


-- 
tixxdz
http://opendz.org

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 02/13] mnt: Refactor fs_fully_visible into mount_too_revealing
  2016-06-21 18:54                 ` Eric W. Biederman
@ 2016-06-23 21:23                     ` Djalal Harouni
  -1 siblings, 0 replies; 56+ messages in thread
From: Djalal Harouni @ 2016-06-23 21:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Andy Lutomirski,
	James Bottomley, Seth Forshee, Linux FS Devel

On Tue, Jun 21, 2016 at 8:54 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:
>
>> On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
>> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>> Replace the call of fs_fully_visible in do_new_mount from before the
>>> new superblock is allocated with a call of mount_too_revealing after
>>> the superblock is allocated.   This winds up being a much better location
>>> for maintainability of the code.
>>>
>>> The first change this enables is the replacement of FS_USERNS_VISIBLE
>>> with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
>>> to sb_iflags on the superblock.
>>
>> Why is this useful?
>
> A couple of reasons.
> - It helps clean up do_new_mount which is currently so overloaded by
>   special cases that it is difficult to see the core logic.
>
> - It makes the check about the actual superblock that is being mounted
>   rather than the superblock that might be mounted.
>
> - The practical place where being about the actual superblock that is
>   being mounted helps is that in "11/13 mnt: Simplify mount_too_revealing"
>   that removes the MNT_LOCK_NOSUID MNT_LOCK_NOEXEC and MNT_LOCK_NODEV
>   tests from the code, while verify that those tests are not needed
>   because the sb_iflags contains SB_I_NOEXEC and SB_I_NODEV.

Yes, but it seems in that patch 11/13 the SB_I_NOEXEC and SB_I_NODEV
flags are only enforced and checked in case 'user_ns != init_user_ns' so for
init_user_ns we don't enforce it. Even if we set the flags and things
are correct
now, but as you have noted in your previous patches related to this we try to
commit to never exec on procfs and sysfs... so maybe take that check on its
own and move it before the init_user_ns one ?

> - The conceptual change of testing once the superblock has been
>   generated makes changes like the one above much more sensible
>   and it helps untangle mount namespace versus superblock concerns.
>
> That last is a big part of what this patchset is about.  When do we care
> about the superblock and when do we care about the mount namespace.

Historically fs_fully_visible() or mount_too_revealing() now gathered lot of
security checks... so one may wonder about the implication of moving it
after !?... yes having a clear context about superblocks and mount
namespaces matters... but I'm not sure about the order.

> Eric
>

Thank you!

-- 
tixxdz
http://opendz.org

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 02/13] mnt: Refactor fs_fully_visible into mount_too_revealing
@ 2016-06-23 21:23                     ` Djalal Harouni
  0 siblings, 0 replies; 56+ messages in thread
From: Djalal Harouni @ 2016-06-23 21:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andy Lutomirski, Linux Containers, Linux FS Devel,
	Miklos Szeredi, James Bottomley, Seth Forshee, Serge E. Hallyn

On Tue, Jun 21, 2016 at 8:54 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>> Replace the call of fs_fully_visible in do_new_mount from before the
>>> new superblock is allocated with a call of mount_too_revealing after
>>> the superblock is allocated.   This winds up being a much better location
>>> for maintainability of the code.
>>>
>>> The first change this enables is the replacement of FS_USERNS_VISIBLE
>>> with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
>>> to sb_iflags on the superblock.
>>
>> Why is this useful?
>
> A couple of reasons.
> - It helps clean up do_new_mount which is currently so overloaded by
>   special cases that it is difficult to see the core logic.
>
> - It makes the check about the actual superblock that is being mounted
>   rather than the superblock that might be mounted.
>
> - The practical place where being about the actual superblock that is
>   being mounted helps is that in "11/13 mnt: Simplify mount_too_revealing"
>   that removes the MNT_LOCK_NOSUID MNT_LOCK_NOEXEC and MNT_LOCK_NODEV
>   tests from the code, while verify that those tests are not needed
>   because the sb_iflags contains SB_I_NOEXEC and SB_I_NODEV.

Yes, but it seems in that patch 11/13 the SB_I_NOEXEC and SB_I_NODEV
flags are only enforced and checked in case 'user_ns != init_user_ns' so for
init_user_ns we don't enforce it. Even if we set the flags and things
are correct
now, but as you have noted in your previous patches related to this we try to
commit to never exec on procfs and sysfs... so maybe take that check on its
own and move it before the init_user_ns one ?

> - The conceptual change of testing once the superblock has been
>   generated makes changes like the one above much more sensible
>   and it helps untangle mount namespace versus superblock concerns.
>
> That last is a big part of what this patchset is about.  When do we care
> about the superblock and when do we care about the mount namespace.

Historically fs_fully_visible() or mount_too_revealing() now gathered lot of
security checks... so one may wonder about the implication of moving it
after !?... yes having a clear context about superblocks and mount
namespaces matters... but I'm not sure about the order.

> Eric
>

Thank you!

-- 
tixxdz
http://opendz.org

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 02/13] mnt: Refactor fs_fully_visible into mount_too_revealing
       [not found]                 ` <874m8m4bky.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
  2016-06-23 21:23                     ` Djalal Harouni
@ 2016-06-24  6:56                   ` Serge E. Hallyn
  1 sibling, 0 replies; 56+ messages in thread
From: Serge E. Hallyn @ 2016-06-24  6:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Andy Lutomirski,
	James Bottomley, Seth Forshee, Linux FS Devel, Djalal Harouni

On Tue, Jun 21, 2016 at 01:54:21PM -0500, Eric W. Biederman wrote:
> Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:
> 
> > On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
> > <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> >> Replace the call of fs_fully_visible in do_new_mount from before the
> >> new superblock is allocated with a call of mount_too_revealing after
> >> the superblock is allocated.   This winds up being a much better location
> >> for maintainability of the code.
> >>
> >> The first change this enables is the replacement of FS_USERNS_VISIBLE
> >> with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
> >> to sb_iflags on the superblock.
> >
> > Why is this useful?
> 
> A couple of reasons.
> - It helps clean up do_new_mount which is currently so overloaded by
>   special cases that it is difficult to see the core logic.

Agreed, i find this easier to read and reason about.

> - It makes the check about the actual superblock that is being mounted
>   rather than the superblock that might be mounted.
> 
> - The practical place where being about the actual superblock that is
>   being mounted helps is that in "11/13 mnt: Simplify mount_too_revealing"
>   that removes the MNT_LOCK_NOSUID MNT_LOCK_NOEXEC and MNT_LOCK_NODEV
>   tests from the code, while verify that those tests are not needed
>   because the sb_iflags contains SB_I_NOEXEC and SB_I_NODEV.
> 
> - The conceptual change of testing once the superblock has been
>   generated makes changes like the one above much more sensible
>   and it helps untangle mount namespace versus superblock concerns.
> 
> That last is a big part of what this patchset is about.  When do we care
> about the superblock and when do we care about the mount namespace.
> 
> Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 02/13] mnt: Refactor fs_fully_visible into mount_too_revealing
  2016-06-21 18:54                 ` Eric W. Biederman
  (?)
  (?)
@ 2016-06-24  6:56                 ` Serge E. Hallyn
  -1 siblings, 0 replies; 56+ messages in thread
From: Serge E. Hallyn @ 2016-06-24  6:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andy Lutomirski, Linux Containers, Linux FS Devel,
	Miklos Szeredi, James Bottomley, Djalal Harouni, Seth Forshee,
	Serge E. Hallyn

On Tue, Jun 21, 2016 at 01:54:21PM -0500, Eric W. Biederman wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
> 
> > On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
> > <ebiederm@xmission.com> wrote:
> >> Replace the call of fs_fully_visible in do_new_mount from before the
> >> new superblock is allocated with a call of mount_too_revealing after
> >> the superblock is allocated.   This winds up being a much better location
> >> for maintainability of the code.
> >>
> >> The first change this enables is the replacement of FS_USERNS_VISIBLE
> >> with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
> >> to sb_iflags on the superblock.
> >
> > Why is this useful?
> 
> A couple of reasons.
> - It helps clean up do_new_mount which is currently so overloaded by
>   special cases that it is difficult to see the core logic.

Agreed, i find this easier to read and reason about.

> - It makes the check about the actual superblock that is being mounted
>   rather than the superblock that might be mounted.
> 
> - The practical place where being about the actual superblock that is
>   being mounted helps is that in "11/13 mnt: Simplify mount_too_revealing"
>   that removes the MNT_LOCK_NOSUID MNT_LOCK_NOEXEC and MNT_LOCK_NODEV
>   tests from the code, while verify that those tests are not needed
>   because the sb_iflags contains SB_I_NOEXEC and SB_I_NODEV.
> 
> - The conceptual change of testing once the superblock has been
>   generated makes changes like the one above much more sensible
>   and it helps untangle mount namespace versus superblock concerns.
> 
> That last is a big part of what this patchset is about.  When do we care
> about the superblock and when do we care about the mount namespace.
> 
> Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 03/13] ipc: Initialize ipc_namespace->user_ns early.
  2016-06-20 17:21         ` Eric W. Biederman
@ 2016-06-24 14:34             ` Djalal Harouni
  -1 siblings, 0 replies; 56+ messages in thread
From: Djalal Harouni @ 2016-06-24 14:34 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Andy Lutomirski,
	James Bottomley, Seth Forshee, Linux FS Devel

On Mon, Jun 20, 2016 at 7:21 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Allow the ipc namespace initialization code to depend on ns->user_ns
> being set during initialization.
>
> In particular this allows mq_init_ns to use ns->user_ns for permission
> checks and initializating s_user_ns while the the mq filesystem is
> being mounted.
>
> Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> Suggested-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Reviewed-by: Djalal Harouni <tixxdz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

> ---
>  ipc/namespace.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/ipc/namespace.c b/ipc/namespace.c
> index 068caf18d565..04cb07eb81f1 100644
> --- a/ipc/namespace.c
> +++ b/ipc/namespace.c
> @@ -34,8 +34,11 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
>         ns->ns.ops = &ipcns_operations;
>
>         atomic_set(&ns->count, 1);
> +       ns->user_ns = get_user_ns(user_ns);
> +
>         err = mq_init_ns(ns);
>         if (err) {
> +               put_user_ns(ns->user_ns);
>                 ns_free_inum(&ns->ns);
>                 kfree(ns);
>                 return ERR_PTR(err);
> @@ -46,8 +49,6 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
>         msg_init_ns(ns);
>         shm_init_ns(ns);
>
> -       ns->user_ns = get_user_ns(user_ns);
> -
>         return ns;
>  }
>
> --
> 2.8.3
>



-- 
tixxdz
http://opendz.org
http://blog.opendz.org

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH review 03/13] ipc: Initialize ipc_namespace->user_ns early.
@ 2016-06-24 14:34             ` Djalal Harouni
  0 siblings, 0 replies; 56+ messages in thread
From: Djalal Harouni @ 2016-06-24 14:34 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Linux FS Devel, Miklos Szeredi,
	James Bottomley, Seth Forshee, Serge E. Hallyn, Andy Lutomirski

On Mon, Jun 20, 2016 at 7:21 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Allow the ipc namespace initialization code to depend on ns->user_ns
> being set during initialization.
>
> In particular this allows mq_init_ns to use ns->user_ns for permission
> checks and initializating s_user_ns while the the mq filesystem is
> being mounted.
>
> Acked-by: Seth Forshee <seth.forshee@canonical.com>
> Suggested-by: Seth Forshee <seth.forshee@canonical.com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Djalal Harouni <tixxdz@gmail.com>

> ---
>  ipc/namespace.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/ipc/namespace.c b/ipc/namespace.c
> index 068caf18d565..04cb07eb81f1 100644
> --- a/ipc/namespace.c
> +++ b/ipc/namespace.c
> @@ -34,8 +34,11 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
>         ns->ns.ops = &ipcns_operations;
>
>         atomic_set(&ns->count, 1);
> +       ns->user_ns = get_user_ns(user_ns);
> +
>         err = mq_init_ns(ns);
>         if (err) {
> +               put_user_ns(ns->user_ns);
>                 ns_free_inum(&ns->ns);
>                 kfree(ns);
>                 return ERR_PTR(err);
> @@ -46,8 +49,6 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
>         msg_init_ns(ns);
>         shm_init_ns(ns);
>
> -       ns->user_ns = get_user_ns(user_ns);
> -
>         return ns;
>  }
>
> --
> 2.8.3
>



-- 
tixxdz
http://opendz.org
http://blog.opendz.org

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2016-06-24 14:34 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-20 17:09 [PATCH review 0/13] Adding a userns owner to struct super_block Eric W. Biederman
2016-06-20 17:09 ` Eric W. Biederman
     [not found] ` <87fus77pns.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-06-20 17:21   ` [PATCH review 01/13] mnt: Account for MS_RDONLY in fs_fully_visible Eric W. Biederman
2016-06-20 17:21     ` Eric W. Biederman
     [not found]     ` <20160620172130.15712-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-06-20 17:21       ` [PATCH review 02/13] mnt: Refactor fs_fully_visible into mount_too_revealing Eric W. Biederman
2016-06-20 17:21         ` Eric W. Biederman
     [not found]         ` <20160620172130.15712-2-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-06-20 22:53           ` Andy Lutomirski
2016-06-20 22:53             ` Andy Lutomirski
     [not found]             ` <CALCETrXv2aeP38AmUaRVMC+O-oeWKwDcy8fPfsOCu1f8mncZEA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-06-21 18:54               ` Eric W. Biederman
2016-06-21 18:54                 ` Eric W. Biederman
     [not found]                 ` <874m8m4bky.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-06-23 21:23                   ` Djalal Harouni
2016-06-23 21:23                     ` Djalal Harouni
2016-06-24  6:56                   ` Serge E. Hallyn
2016-06-24  6:56                 ` Serge E. Hallyn
2016-06-22 19:40           ` Seth Forshee
2016-06-22 19:40             ` Seth Forshee
2016-06-20 17:21       ` [PATCH review 03/13] ipc: Initialize ipc_namespace->user_ns early Eric W. Biederman
2016-06-20 17:21         ` Eric W. Biederman
     [not found]         ` <20160620172130.15712-3-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-06-24 14:34           ` Djalal Harouni
2016-06-24 14:34             ` Djalal Harouni
2016-06-20 17:21       ` [PATCH review 04/13] vfs: Pass data, ns, and ns->userns to mount_ns Eric W. Biederman
2016-06-20 17:21         ` Eric W. Biederman
2016-06-20 17:21       ` [PATCH review 05/13] proc: Convert proc_mount to use mount_ns Eric W. Biederman
2016-06-20 17:21         ` Eric W. Biederman
     [not found]         ` <20160620172130.15712-5-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-06-23 11:50           ` Djalal Harouni
2016-06-23 11:50         ` Djalal Harouni
2016-06-20 17:21       ` [PATCH review 06/13] fs: Add user namespace member to struct super_block Eric W. Biederman
2016-06-20 17:21         ` Eric W. Biederman
2016-06-20 17:21       ` [PATCH review 07/13] mnt: Move the FS_USERNS_MOUNT check into sget_userns Eric W. Biederman
2016-06-20 17:21         ` Eric W. Biederman
     [not found]         ` <20160620172130.15712-7-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-06-22 19:43           ` Seth Forshee
2016-06-22 19:43             ` Seth Forshee
2016-06-20 17:21       ` [PATCH review 08/13] kernfs: The cgroup filesystem also benefits from SB_I_NOEXEC Eric W. Biederman
2016-06-20 17:21         ` Eric W. Biederman
2016-06-20 17:21       ` [PATCH review 09/13] ipc/mqueue: The mqueue filesystem should never contain executables Eric W. Biederman
2016-06-20 17:21         ` Eric W. Biederman
2016-06-20 17:21       ` [PATCH review 10/13] vfs: Generalize filesystem nodev handling Eric W. Biederman
2016-06-20 17:21         ` Eric W. Biederman
     [not found]         ` <20160620172130.15712-10-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-06-20 22:57           ` Andy Lutomirski
2016-06-20 22:57             ` Andy Lutomirski
     [not found]             ` <CALCETrUWsnRgjyRyb+_0u0PYubx9gg=hUAso=073yjJY+m205g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-06-21 19:09               ` Eric W. Biederman
2016-06-21 19:09                 ` Eric W. Biederman
2016-06-20 17:21       ` [PATCH review 11/13] mnt: Simplify mount_too_revealing Eric W. Biederman
2016-06-20 17:21         ` Eric W. Biederman
     [not found]         ` <20160620172130.15712-11-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-06-22 19:48           ` Seth Forshee
2016-06-22 19:48             ` Seth Forshee
2016-06-20 17:21       ` [PATCH review 12/13] userns: Remove implicit MNT_NODEV fragility Eric W. Biederman
2016-06-20 17:21         ` Eric W. Biederman
2016-06-22 19:49         ` Seth Forshee
     [not found]         ` <20160620172130.15712-12-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-06-20 22:58           ` Andy Lutomirski
2016-06-20 22:58             ` Andy Lutomirski
2016-06-22 19:49           ` Seth Forshee
2016-06-20 17:21       ` [PATCH review 13/13] userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag Eric W. Biederman
2016-06-20 17:21         ` Eric W. Biederman
2016-06-22 19:50         ` Seth Forshee
     [not found]         ` <20160620172130.15712-13-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-06-22 19:50           ` Seth Forshee

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.