[PATCH v3 0/7] Initial support for user namespace owned mounts

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v3 0/7] Initial support for user namespace owned mounts
@ 2015-09-16 20:02 Seth Forshee
  2015-09-16 20:02 ` [PATCH v3 1/7] fs: Add user namesapace member to struct super_block Seth Forshee
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Seth Forshee @ 2015-09-16 20:02 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, linux-mtd, linux-fsdevel,
	linux-security-module, selinux
  Cc: Serge Hallyn, Andy Lutomirski, linux-kernel, Seth Forshee

This is the third revision of the patch series to introduce the concept
of superblocks owned by user namespaces. These are nearly identical to
the v2 patches; the only changes are resolving conflicts from rebasing
onto 4.3-rc1 and reverting a change I inadvertantly introduced in v2.

These are the first in a larger set of patches, with the goal of
eventually allowing some regular filesystem types to be mounted in
unprivileged containers. The full series is available at:

  git://kernel.ubuntu.com/sforshee/linux.git userns-mounts

The strategy for this series is to do as much of the heavy lifting as
possible in the vfs to minimize the need to handle edge cases in
individual filesystems. The patches that follow lay some of the
groundwork and fall into two groups:

 1. Patches 1-2 add s_user_ns to struct superblock and use it to
    simplify MNT_NODEV handling.

 2. Patches 3-7 tighten down security for mounts with s_user_ns !=
    &init_user_ns.

Note that these patches only address security at the vfs level. As has
been discussed previously, individual filesystems may still be
vulnerable to attacks via malicious metadata in the backing store. The
goal is to find a small set of filesystems which can be hardened from
attacks from below. I am initially targeting fuse, which has been
designed to resist such attacks, and ext4, which is so far standing up
quite well to fuzzing.

Changes since v2:
 - Resolved conflicts from rebasing onto 4.3-rc1.
 - Reverted a change in the v2 patches which made fs_fully_visible use
   current_user_ns instead of s_user_ns when deciding whether to filter
   out MNT_LOCK_NODEV.

Andy Lutomirski (1):
  fs: Treat foreign mounts as nosuid

Eric W. Biederman (1):
  userns: Simpilify MNT_NODEV handling.

Seth Forshee (5):
  fs: Add user namesapace member to struct super_block
  fs: Verify access of user towards block device file when mounting
  fs: Limit file caps to the user namespace of the super block
  Smack: Add support for unprivileged mounts from user namespaces
  selinux: Add support for unprivileged mounts from user namespaces

 drivers/mtd/mtdsuper.c         |  7 +++++-
 fs/block_dev.c                 | 54 +++++++++++++++++++++++++++++++++---------
 fs/exec.c                      |  2 +-
 fs/namei.c                     |  9 ++++++-
 fs/namespace.c                 | 34 +++++++++++++++-----------
 fs/proc/root.c                 |  3 ++-
 fs/super.c                     | 38 +++++++++++++++++++++++++----
 include/linux/fs.h             | 11 ++++++++-
 include/linux/mount.h          |  1 +
 include/linux/user_namespace.h |  8 +++++++
 kernel/user_namespace.c        | 14 +++++++++++
 security/commoncap.c           |  4 +++-
 security/selinux/hooks.c       | 25 ++++++++++++++++++-
 security/smack/smack.h         |  6 +++++
 security/smack/smack_lsm.c     | 35 ++++++++++++++++++++-------
 15 files changed, 206 insertions(+), 45 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 1/7] fs: Add user namesapace member to struct super_block
  2015-09-16 20:02 [PATCH v3 0/7] Initial support for user namespace owned mounts Seth Forshee
@ 2015-09-16 20:02 ` Seth Forshee
  2015-09-16 20:02 ` [PATCH v3 2/7] userns: Simpilify MNT_NODEV handling Seth Forshee
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Seth Forshee @ 2015-09-16 20:02 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Jeff Layton, J. Bruce Fields
  Cc: Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	Seth Forshee

Initially this will be used to eliminate the implicit MNT_NODEV
flag for mounts from user namespaces. In the future it will also
be used for translating ids and checking capabilities for
filesystems mounted from user namespaces.

s_user_ns is initialized in alloc_super() and is generally set to
current_user_ns(). To avoid security and corruption issues, two
additional mount checks are also added:

 - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
   in current_user_ns().

 - sget() will fail with EBUSY when the filesystem it's looking
   for is already mounted from another user namespace.

proc requires some special handling. The user namespace of
current isn't appropriate when forking as a result of clone (2)
with CLONE_NEWPID|CLONE_NEWUSER, as it will set s_user_ns to the
namespace of the parent and make proc unmountable in the new user
namespace. Instead, the user namespace which owns the new pid
namespace is used. sget_userns() is allowed to allow passing in
a namespace other than that of current, and sget becomes a
wrapper around sget_userns() which passes current_user_ns().

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 fs/namespace.c     |  3 +++
 fs/proc/root.c     |  3 ++-
 fs/super.c         | 38 +++++++++++++++++++++++++++++++++-----
 include/linux/fs.h |  9 ++++++++-
 4 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 0570729c87fd..d023a353dc63 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2381,6 +2381,9 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 	struct vfsmount *mnt;
 	int err;
 
+	if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
+		return -EPERM;
+
 	if (!fstype)
 		return -EINVAL;
 
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 361ab4ee42fc..4b302cbf13f9 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -117,7 +117,8 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 			return ERR_PTR(-EPERM);
 	}
 
-	sb = sget(fs_type, proc_test_super, proc_set_super, flags, ns);
+	sb = sget_userns(fs_type, proc_test_super, proc_set_super, flags,
+			 ns->user_ns, ns);
 	if (IS_ERR(sb))
 		return ERR_CAST(sb);
 
diff --git a/fs/super.c b/fs/super.c
index 954aeb80e202..42837da7d641 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -33,6 +33,7 @@
 #include <linux/cleancache.h>
 #include <linux/fsnotify.h>
 #include <linux/lockdep.h>
+#include <linux/user_namespace.h>
 #include "internal.h"
 
 
@@ -163,6 +164,7 @@ static void destroy_super(struct super_block *s)
 {
 	list_lru_destroy(&s->s_dentry_lru);
 	list_lru_destroy(&s->s_inode_lru);
+	put_user_ns(s->s_user_ns);
 	security_sb_free(s);
 	WARN_ON(!list_empty(&s->s_mounts));
 	kfree(s->s_subtype);
@@ -178,7 +180,8 @@ static void destroy_super(struct super_block *s)
  *	Allocates and initializes a new &struct super_block.  alloc_super()
  *	returns a pointer new superblock or %NULL if allocation had failed.
  */
-static struct super_block *alloc_super(struct file_system_type *type, int flags)
+static struct super_block *alloc_super(struct file_system_type *type, int flags,
+				       struct user_namespace *user_ns)
 {
 	struct super_block *s = kzalloc(sizeof(struct super_block),  GFP_USER);
 	static const struct super_operations default_op;
@@ -246,6 +249,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 	s->s_shrink.count_objects = super_cache_count;
 	s->s_shrink.batch = 1024;
 	s->s_shrink.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE;
+
+	s->s_user_ns = get_user_ns(user_ns);
 	return s;
 
 fail:
@@ -442,17 +447,17 @@ void generic_shutdown_super(struct super_block *sb)
 EXPORT_SYMBOL(generic_shutdown_super);
 
 /**
- *	sget	-	find or create a superblock
+ *	sget_userns -	find or create a superblock
  *	@type:	filesystem type superblock should belong to
  *	@test:	comparison callback
  *	@set:	setup callback
  *	@flags:	mount flags
  *	@data:	argument to each of them
  */
-struct super_block *sget(struct file_system_type *type,
+struct super_block *sget_userns(struct file_system_type *type,
 			int (*test)(struct super_block *,void *),
 			int (*set)(struct super_block *,void *),
-			int flags,
+			int flags, struct user_namespace *user_ns,
 			void *data)
 {
 	struct super_block *s = NULL;
@@ -465,6 +470,10 @@ retry:
 		hlist_for_each_entry(old, &type->fs_supers, s_instances) {
 			if (!test(old, data))
 				continue;
+			if (user_ns != old->s_user_ns) {
+				spin_unlock(&sb_lock);
+				return ERR_PTR(-EBUSY);
+			}
 			if (!grab_super(old))
 				goto retry;
 			if (s) {
@@ -477,7 +486,7 @@ retry:
 	}
 	if (!s) {
 		spin_unlock(&sb_lock);
-		s = alloc_super(type, flags);
+		s = alloc_super(type, flags, user_ns);
 		if (!s)
 			return ERR_PTR(-ENOMEM);
 		goto retry;
@@ -500,6 +509,25 @@ retry:
 	return s;
 }
 
+EXPORT_SYMBOL(sget_userns);
+
+/**
+ *	sget	-	find or create a superblock
+ *	@type:	  filesystem type superblock should belong to
+ *	@test:	  comparison callback
+ *	@set:	  setup callback
+ *	@flags:	  mount flags
+ *	@data:	  argument to each of them
+ */
+struct super_block *sget(struct file_system_type *type,
+			int (*test)(struct super_block *,void *),
+			int (*set)(struct super_block *,void *),
+			int flags,
+			void *data)
+{
+	return sget_userns(type, test, set, flags, current_user_ns(), data);
+}
+
 EXPORT_SYMBOL(sget);
 
 void drop_super(struct super_block *sb)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 72d8a844c692..79c15ab2159d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -31,6 +31,7 @@
 #include <linux/blk_types.h>
 #include <linux/workqueue.h>
 #include <linux/percpu-rwsem.h>
+#include <linux/user_namespace.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -1367,6 +1368,8 @@ struct super_block {
 	struct workqueue_struct *s_dio_done_wq;
 	struct hlist_head s_pins;
 
+	struct user_namespace *s_user_ns;
+
 	/*
 	 * Keep the lru lists last in the structure so they always sit on their
 	 * own individual cachelines.
@@ -1509,7 +1512,6 @@ static inline void sb_start_intwrite(struct super_block *sb)
 	__sb_start_write(sb, SB_FREEZE_FS, true);
 }
 
-
 extern bool inode_owner_or_capable(const struct inode *inode);
 
 /*
@@ -1984,6 +1986,11 @@ void deactivate_locked_super(struct super_block *sb);
 int set_anon_super(struct super_block *s, void *data);
 int get_anon_bdev(dev_t *);
 void free_anon_bdev(dev_t);
+struct super_block *sget_userns(struct file_system_type *type,
+			int (*test)(struct super_block *,void *),
+			int (*set)(struct super_block *,void *),
+			int flags, struct user_namespace *user_ns,
+			void *data);
 struct super_block *sget(struct file_system_type *type,
 			int (*test)(struct super_block *,void *),
 			int (*set)(struct super_block *,void *),
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 2/7] userns: Simpilify MNT_NODEV handling.
  2015-09-16 20:02 [PATCH v3 0/7] Initial support for user namespace owned mounts Seth Forshee
  2015-09-16 20:02 ` [PATCH v3 1/7] fs: Add user namesapace member to struct super_block Seth Forshee
@ 2015-09-16 20:02 ` Seth Forshee
  2015-09-17  0:24   ` Andy Lutomirski
  2015-09-16 20:02 ` [PATCH v3 3/7] fs: Verify access of user towards block device file when mounting Seth Forshee
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Seth Forshee @ 2015-09-16 20:02 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Jeff Layton, J. Bruce Fields
  Cc: Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	Seth Forshee

From: "Eric W. Biederman" <ebiederm@xmission.com>

- Consolidate the testing if a device node may be opened in a new
  function may_open_dev.

- Move the check for allowing access to device nodes on filesystems
  not mounted in the initial user namespace from mount time to open
  time and include it in may_open_dev.

This set of changes removes the implicit adding of MNT_NODEV which
simplifies the logic in fs/namespace.c and removes a potentially
problematic user visible difference in how normal and unprivileged
mount namespaces work.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/block_dev.c     |  2 +-
 fs/namei.c         |  9 ++++++++-
 fs/namespace.c     | 18 ++++--------------
 include/linux/fs.h |  1 +
 4 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 22ea424ee741..26cee058dc02 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1730,7 +1730,7 @@ struct block_device *lookup_bdev(const char *pathname)
 	if (!S_ISBLK(inode->i_mode))
 		goto fail;
 	error = -EACCES;
-	if (path.mnt->mnt_flags & MNT_NODEV)
+	if (!may_open_dev(&path))
 		goto fail;
 	error = -ENOMEM;
 	bdev = bd_acquire(inode);
diff --git a/fs/namei.c b/fs/namei.c
index 726d211db484..fcc5751d6395 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2663,6 +2663,13 @@ int vfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
 }
 EXPORT_SYMBOL(vfs_create);
 
+bool may_open_dev(const struct path *path)
+{
+	return !(path->mnt->mnt_flags & MNT_NODEV) &&
+		((path->mnt->mnt_sb->s_user_ns == &init_user_ns) ||
+		 (path->mnt->mnt_sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT));
+}
+
 static int may_open(struct path *path, int acc_mode, int flag)
 {
 	struct dentry *dentry = path->dentry;
@@ -2685,7 +2692,7 @@ static int may_open(struct path *path, int acc_mode, int flag)
 		break;
 	case S_IFBLK:
 	case S_IFCHR:
-		if (path->mnt->mnt_flags & MNT_NODEV)
+		if (!may_open_dev(path))
 			return -EACCES;
 		/*FALLTHRU*/
 	case S_IFIFO:
diff --git a/fs/namespace.c b/fs/namespace.c
index d023a353dc63..da70f7c4ece1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2177,13 +2177,7 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	}
 	if ((mnt->mnt.mnt_flags & MNT_LOCK_NODEV) &&
 	    !(mnt_flags & MNT_NODEV)) {
-		/* Was the nodev implicitly added in mount? */
-		if ((mnt->mnt_ns->user_ns != &init_user_ns) &&
-		    !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
-			mnt_flags |= MNT_NODEV;
-		} else {
-			return -EPERM;
-		}
+		return -EPERM;
 	}
 	if ((mnt->mnt.mnt_flags & MNT_LOCK_NOSUID) &&
 	    !(mnt_flags & MNT_NOSUID)) {
@@ -2396,13 +2390,6 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 			put_filesystem(type);
 			return -EPERM;
 		}
-		/* Only in special cases allow devices from mounts
-		 * created outside the initial user namespace.
-		 */
-		if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
-			flags |= MS_NODEV;
-			mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
-		}
 		if (type->fs_flags & FS_USERNS_VISIBLE) {
 			if (!fs_fully_visible(type, &mnt_flags))
 				return -EPERM;
@@ -3238,6 +3225,9 @@ static bool fs_fully_visible(struct file_system_type *type, int *new_mnt_flags)
 		mnt_flags = mnt->mnt.mnt_flags;
 		if (mnt->mnt.mnt_sb->s_iflags & SB_I_NOEXEC)
 			mnt_flags &= ~(MNT_LOCK_NOSUID | MNT_LOCK_NOEXEC);
+		if (mnt->mnt.mnt_sb->s_user_ns != &init_user_ns &&
+		    !(mnt->mnt.mnt_sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT))
+			mnt_flags &= ~(MNT_LOCK_NODEV);
 
 		/* Verify the mount flags are equal to or more permissive
 		 * than the proposed new mount.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 79c15ab2159d..5ec201e8308c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1537,6 +1537,7 @@ extern void dentry_unhash(struct dentry *dentry);
  */
 extern void inode_init_owner(struct inode *inode, const struct inode *dir,
 			umode_t mode);
+extern bool may_open_dev(const struct path *path);
 /*
  * VFS FS_IOC_FIEMAP helper definitions.
  */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 3/7] fs: Verify access of user towards block device file when mounting
  2015-09-16 20:02 [PATCH v3 0/7] Initial support for user namespace owned mounts Seth Forshee
  2015-09-16 20:02 ` [PATCH v3 1/7] fs: Add user namesapace member to struct super_block Seth Forshee
  2015-09-16 20:02 ` [PATCH v3 2/7] userns: Simpilify MNT_NODEV handling Seth Forshee
@ 2015-09-16 20:02 ` Seth Forshee
  2015-09-16 20:02 ` [PATCH v3 4/7] fs: Limit file caps to the user namespace of the super block Seth Forshee
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Seth Forshee @ 2015-09-16 20:02 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, David Woodhouse, Brian Norris,
	Jeff Layton, J. Bruce Fields
  Cc: Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	Seth Forshee

When mounting a filesystem on a block device there is currently
no verification that the user has appropriate access to the
device file passed to mount. This has not been an issue so far
since the user in question has always been root, but this must
be changed before allowing unprivileged users to mount in user
namespaces.

To do this, a new version of lookup_bdev() is added named
lookup_bdev_perm(). Both of these functions become wrappers
around a common inner fucntion. The behavior of lookup_bdev() is
unchanged, but calling lookup_bdev_perm() will fail if the user
does not have the specified access rights to the supplied path.
The permission check is skipped if the user has CAP_SYS_ADMIN to
avoid any possible regressions in behavior.

blkdev_get_by_path() is updated to use lookup_bdev_perm(). This
is used by mount_bdev() and mount_mtd(), so this will cause
mounts on block devices to fail when the user lacks the required
permissions. Other calls to blkdev_get_by_path() will all happen
with root privileges, so these calls will be unaffected.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 drivers/mtd/mtdsuper.c |  7 ++++++-
 fs/block_dev.c         | 52 ++++++++++++++++++++++++++++++++++++++++----------
 include/linux/fs.h     |  1 +
 3 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
index 20c02a3b7417..a3970b0506b9 100644
--- a/drivers/mtd/mtdsuper.c
+++ b/drivers/mtd/mtdsuper.c
@@ -125,6 +125,7 @@ struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
 #ifdef CONFIG_BLOCK
 	struct block_device *bdev;
 	int ret, major;
+	int perm = 0;
 #endif
 	int mtdnr;
 
@@ -176,7 +177,11 @@ struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
 	/* try the old way - the hack where we allowed users to mount
 	 * /dev/mtdblock$(n) but didn't actually _use_ the blockdev
 	 */
-	bdev = lookup_bdev(dev_name);
+	if (flags & FMODE_READ)
+		perm |= MAY_READ;
+	if (flags & FMODE_WRITE)
+		perm |= MAY_WRITE;
+	bdev = lookup_bdev_perm(dev_name, perm);
 	if (IS_ERR(bdev)) {
 		ret = PTR_ERR(bdev);
 		pr_debug("MTDSB: lookup_bdev() returned %d\n", ret);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 26cee058dc02..d7b83e4ec138 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1394,9 +1394,14 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
 					void *holder)
 {
 	struct block_device *bdev;
+	int perm = 0;
 	int err;
 
-	bdev = lookup_bdev(path);
+	if (mode & FMODE_READ)
+		perm |= MAY_READ;
+	if (mode & FMODE_WRITE)
+		perm |= MAY_WRITE;
+	bdev = lookup_bdev_perm(path, perm);
 	if (IS_ERR(bdev))
 		return bdev;
 
@@ -1703,15 +1708,8 @@ int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
 
 EXPORT_SYMBOL(ioctl_by_bdev);
 
-/**
- * lookup_bdev  - lookup a struct block_device by name
- * @pathname:	special file representing the block device
- *
- * Get a reference to the blockdevice at @pathname in the current
- * namespace if possible and return it.  Return ERR_PTR(error)
- * otherwise.
- */
-struct block_device *lookup_bdev(const char *pathname)
+static struct block_device *__lookup_bdev(const char *pathname, int mask,
+					  bool check_perm)
 {
 	struct block_device *bdev;
 	struct inode *inode;
@@ -1726,6 +1724,11 @@ struct block_device *lookup_bdev(const char *pathname)
 		return ERR_PTR(error);
 
 	inode = d_backing_inode(path.dentry);
+	if (check_perm && !capable(CAP_SYS_ADMIN)) {
+		error = inode_permission(inode, mask);
+		if (error)
+			goto fail;
+	}
 	error = -ENOTBLK;
 	if (!S_ISBLK(inode->i_mode))
 		goto fail;
@@ -1743,6 +1746,35 @@ fail:
 	bdev = ERR_PTR(error);
 	goto out;
 }
+
+/*
+ * lookup_bdev_perm - lookup a struct block_device by name and check for
+ *		      access rights
+ * @pathname:	special file representing the block device
+ * @mask:	rights to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
+ *
+ * Get a reference to the blockdevice at @pathname in the current
+ * namespace if possible, check for the specified rights to the device
+ * at @pathname, and return it.  Return ERR_PTR(error) otherwise.
+ */
+struct block_device *lookup_bdev_perm(const char *pathname, int mask)
+{
+	return __lookup_bdev(pathname, mask, true);
+}
+EXPORT_SYMBOL(lookup_bdev_perm);
+
+/**
+ * lookup_bdev  - lookup a struct block_device by name
+ * @pathname:	special file representing the block device
+ *
+ * Get a reference to the blockdevice at @pathname in the current
+ * namespace if possible and return it.  Return ERR_PTR(error)
+ * otherwise.
+ */
+struct block_device *lookup_bdev(const char *pathname)
+{
+	return __lookup_bdev(pathname, 0, false);
+}
 EXPORT_SYMBOL(lookup_bdev);
 
 int __invalidate_device(struct block_device *bdev, bool kill_dirty)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5ec201e8308c..d03c2061bdd2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2383,6 +2383,7 @@ static inline void unregister_chrdev(unsigned int major, const char *name)
 #define BLKDEV_MAJOR_HASH_SIZE	255
 extern const char *__bdevname(dev_t, char *buffer);
 extern const char *bdevname(struct block_device *bdev, char *buffer);
+extern struct block_device *lookup_bdev_perm(const char *, int);
 extern struct block_device *lookup_bdev(const char *);
 extern void blkdev_show(struct seq_file *,off_t);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 4/7] fs: Limit file caps to the user namespace of the super block
  2015-09-16 20:02 [PATCH v3 0/7] Initial support for user namespace owned mounts Seth Forshee
                   ` (2 preceding siblings ...)
  2015-09-16 20:02 ` [PATCH v3 3/7] fs: Verify access of user towards block device file when mounting Seth Forshee
@ 2015-09-16 20:02 ` Seth Forshee
  2015-09-16 20:02 ` [PATCH v3 5/7] fs: Treat foreign mounts as nosuid Seth Forshee
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Seth Forshee @ 2015-09-16 20:02 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Serge Hallyn, James Morris,
	Serge E. Hallyn
  Cc: Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd, Seth Forshee

Capability sets attached to files must be ignored except in the
user namespaces where the mounter is privileged, i.e. s_user_ns
and its descendants. Otherwise a vector exists for gaining
privileges in namespaces where a user is not already privileged.

Add a new helper function, in_user_ns(), to test whether a user
namespace is the same as or a descendant of another namespace.
Use this helper to determine whether a file's capability set
should be applied to the caps constructed during exec.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 include/linux/user_namespace.h |  8 ++++++++
 kernel/user_namespace.c        | 14 ++++++++++++++
 security/commoncap.c           |  2 ++
 3 files changed, 24 insertions(+)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 8297e5b341d8..a43faa727124 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -72,6 +72,8 @@ extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t,
 extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
 extern int proc_setgroups_show(struct seq_file *m, void *v);
 extern bool userns_may_setgroups(const struct user_namespace *ns);
+extern bool in_userns(const struct user_namespace *ns,
+		      const struct user_namespace *target_ns);
 #else
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -100,6 +102,12 @@ static inline bool userns_may_setgroups(const struct user_namespace *ns)
 {
 	return true;
 }
+
+static inline bool in_userns(const struct user_namespace *ns,
+			     const struct user_namespace *target_ns)
+{
+	return true;
+}
 #endif
 
 #endif /* _LINUX_USER_H */
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 88fefa68c516..69fbc377357b 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -945,6 +945,20 @@ bool userns_may_setgroups(const struct user_namespace *ns)
 	return allowed;
 }
 
+/*
+ * Returns true if @ns is the same namespace as or a descendant of
+ * @target_ns.
+ */
+bool in_userns(const struct user_namespace *ns,
+	       const struct user_namespace *target_ns)
+{
+	for (; ns; ns = ns->parent) {
+		if (ns == target_ns)
+			return true;
+	}
+	return false;
+}
+
 static inline struct user_namespace *to_user_ns(struct ns_common *ns)
 {
 	return container_of(ns, struct user_namespace, ns);
diff --git a/security/commoncap.c b/security/commoncap.c
index 1832cf701c3d..400aa224b491 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -450,6 +450,8 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
 
 	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
 		return 0;
+	if (!in_userns(current_user_ns(), bprm->file->f_path.mnt->mnt_sb->s_user_ns))
+		return 0;
 
 	rc = get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
 	if (rc < 0) {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 5/7] fs: Treat foreign mounts as nosuid
  2015-09-16 20:02 [PATCH v3 0/7] Initial support for user namespace owned mounts Seth Forshee
                   ` (3 preceding siblings ...)
  2015-09-16 20:02 ` [PATCH v3 4/7] fs: Limit file caps to the user namespace of the super block Seth Forshee
@ 2015-09-16 20:02 ` Seth Forshee
  2015-09-16 20:57   ` Andy Lutomirski
  2015-09-16 20:02 ` [PATCH v3 6/7] Smack: Add support for unprivileged mounts from user namespaces Seth Forshee
  2015-09-16 20:02 ` [PATCH v3 7/7] selinux: " Seth Forshee
  6 siblings, 1 reply; 16+ messages in thread
From: Seth Forshee @ 2015-09-16 20:02 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Serge Hallyn, James Morris,
	Serge E. Hallyn, Paul Moore, Stephen Smalley, Eric Paris
  Cc: Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd, Seth Forshee

From: Andy Lutomirski <luto@amacapital.net>

If a process gets access to a mount from a different user
namespace, that process should not be able to take advantage of
setuid files or selinux entrypoints from that filesystem.  Prevent
this by treating mounts from other mount namespaces and those not
owned by current_user_ns() or an ancestor as nosuid.

This will make it safer to allow more complex filesystems to be
mounted in non-root user namespaces.

This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
setgid, and file capability bits can no longer be abused if code in
a user namespace were to clear nosuid on an untrusted filesystem,
but this patch, by itself, is insufficient to protect the system
from abuse of files that, when execed, would increase MAC privilege.

As a more concrete explanation, any task that can manipulate a
vfsmount associated with a given user namespace already has
capabilities in that namespace and all of its descendents.  If they
can cause a malicious setuid, setgid, or file-caps executable to
appear in that mount, then that executable will only allow them to
elevate privileges in exactly the set of namespaces in which they
are already privileges.

On the other hand, if they can cause a malicious executable to
appear with a dangerous MAC label, running it could change the
caller's security context in a way that should not have been
possible, even inside the namespace in which the task is confined.

As a hardening measure, this would have made CVE-2014-5207 much
more difficult to exploit.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 fs/exec.c                |  2 +-
 fs/namespace.c           | 13 +++++++++++++
 include/linux/mount.h    |  1 +
 security/commoncap.c     |  2 +-
 security/selinux/hooks.c |  2 +-
 5 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b06623a9347f..ea7311d72cc3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1295,7 +1295,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 	bprm->cred->euid = current_euid();
 	bprm->cred->egid = current_egid();
 
-	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
+	if (!mnt_may_suid(bprm->file->f_path.mnt))
 		return;
 
 	if (task_no_new_privs(current))
diff --git a/fs/namespace.c b/fs/namespace.c
index da70f7c4ece1..2101ce7b96ab 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3276,6 +3276,19 @@ found:
 	return visible;
 }
 
+bool mnt_may_suid(struct vfsmount *mnt)
+{
+	/*
+	 * Foreign mounts (accessed via fchdir or through /proc
+	 * symlinks) are always treated as if they are nosuid.  This
+	 * prevents namespaces from trusting potentially unsafe
+	 * suid/sgid bits, file caps, or security labels that originate
+	 * in other namespaces.
+	 */
+	return !(mnt->mnt_flags & MNT_NOSUID) && check_mnt(real_mount(mnt)) &&
+	       in_userns(current_user_ns(), mnt->mnt_sb->s_user_ns);
+}
+
 static struct ns_common *mntns_get(struct task_struct *task)
 {
 	struct ns_common *ns = NULL;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index f822c3c11377..54a594d49733 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -81,6 +81,7 @@ extern void mntput(struct vfsmount *mnt);
 extern struct vfsmount *mntget(struct vfsmount *mnt);
 extern struct vfsmount *mnt_clone_internal(struct path *path);
 extern int __mnt_is_readonly(struct vfsmount *mnt);
+extern bool mnt_may_suid(struct vfsmount *mnt);
 
 struct path;
 extern struct vfsmount *clone_private_mount(struct path *path);
diff --git a/security/commoncap.c b/security/commoncap.c
index 400aa224b491..6243aef5860e 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -448,7 +448,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
 	if (!file_caps_enabled)
 		return 0;
 
-	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
+	if (!mnt_may_suid(bprm->file->f_path.mnt))
 		return 0;
 	if (!in_userns(current_user_ns(), bprm->file->f_path.mnt->mnt_sb->s_user_ns))
 		return 0;
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index e4369d86e588..de05207eb665 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2171,7 +2171,7 @@ static int check_nnp_nosuid(const struct linux_binprm *bprm,
 			    const struct task_security_struct *new_tsec)
 {
 	int nnp = (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS);
-	int nosuid = (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID);
+	int nosuid = !mnt_may_suid(bprm->file->f_path.mnt);
 	int rc;
 
 	if (!nnp && !nosuid)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 6/7] Smack: Add support for unprivileged mounts from user namespaces
  2015-09-16 20:02 [PATCH v3 0/7] Initial support for user namespace owned mounts Seth Forshee
                   ` (4 preceding siblings ...)
  2015-09-16 20:02 ` [PATCH v3 5/7] fs: Treat foreign mounts as nosuid Seth Forshee
@ 2015-09-16 20:02 ` Seth Forshee
  2015-09-16 20:33   ` Casey Schaufler
  2015-09-16 20:02 ` [PATCH v3 7/7] selinux: " Seth Forshee
  6 siblings, 1 reply; 16+ messages in thread
From: Seth Forshee @ 2015-09-16 20:02 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Casey Schaufler
  Cc: Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	Seth Forshee, James Morris, Serge E. Hallyn

Security labels from unprivileged mounts cannot be trusted.
Ideally for these mounts we would assign the objects in the
filesystem the same label as the inode for the backing device
passed to mount. Unfortunately it's currently impossible to
determine which inode this is from the LSM mount hooks, so we
settle for the label of the process doing the mount.

This label is assigned to s_root, and also to smk_default to
ensure that new inodes receive this label. The transmute property
is also set on s_root to make this behavior more explicit, even
though it is technically not necessary.

If a filesystem has existing security labels, access to inodes is
permitted if the label is the same as smk_root, otherwise access
is denied. The SMACK64EXEC xattr is completely ignored.

Explicit setting of security labels continues to require
CAP_MAC_ADMIN in init_user_ns.

Altogether, this ensures that filesystem objects are not
accessible to subjects which cannot already access the backing
store, that MAC is not violated for any objects in the fileystem
which are already labeled, and that a user cannot use an
unprivileged mount to gain elevated MAC privileges.

sysfs, tmpfs, and ramfs are already mountable from user
namespaces and support security labels. We can't rule out the
possibility that these filesystems may already be used in mounts
from user namespaces with security lables set from the init
namespace, so failing to trust lables in these filesystems may
introduce regressions. It is safe to trust labels from these
filesystems, since the unprivileged user does not control the
backing store and thus cannot supply security labels, so an
explicit exception is made to trust labels from these
filesystems.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 security/smack/smack.h     |  6 ++++++
 security/smack/smack_lsm.c | 35 +++++++++++++++++++++++++++--------
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/security/smack/smack.h b/security/smack/smack.h
index fff0c612bbb7..070223960a2c 100644
--- a/security/smack/smack.h
+++ b/security/smack/smack.h
@@ -91,8 +91,14 @@ struct superblock_smack {
 	struct smack_known	*smk_hat;
 	struct smack_known	*smk_default;
 	int			smk_initialized;
+	int			smk_flags;
 };

+/*
+ * Superblock flags
+ */
+#define SMK_SB_UNTRUSTED	0x01
+
 struct socket_smack {
 	struct smack_known	*smk_out;	/* outbound label */
 	struct smack_known	*smk_in;	/* inbound label */
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 996c88956438..cdfd67b61534 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -793,6 +793,17 @@ static int smack_set_mnt_opts(struct super_block *sb,
 		skp = smk_of_current();
 		sp->smk_root = skp;
 		sp->smk_default = skp;
+		/*
+		 * For a handful of fs types with no user-controlled
+		 * backing store it's okay to trust security labels
+		 * in the filesystem. The rest are untrusted.
+		 */
+		if (sb->s_user_ns != &init_user_ns &&
+		    sb->s_magic != SYSFS_MAGIC && sb->s_magic != TMPFS_MAGIC &&
+		    sb->s_magic != RAMFS_MAGIC) {
+			transmute = 1;
+			sp->smk_flags |= SMK_SB_UNTRUSTED;
+		}
 	}

 	/*
@@ -1175,6 +1186,7 @@ static int smack_inode_rename(struct inode *old_inode,
  */
 static int smack_inode_permission(struct inode *inode, int mask)
 {
+	struct superblock_smack *sbsp = inode->i_sb->s_security;
 	struct smk_audit_info ad;
 	int no_block = mask & MAY_NOT_BLOCK;
 	int rc;
@@ -1186,6 +1198,11 @@ static int smack_inode_permission(struct inode *inode, int mask)
 	if (mask == 0)
 		return 0;

+	if (sbsp->smk_flags & SMK_SB_UNTRUSTED) {
+		if (smk_of_inode(inode) != sbsp->smk_root)
+			return -EACCES;
+	}
+
 	/* May be droppable after audit */
 	if (no_block)
 		return -ECHILD;
@@ -3475,14 +3492,16 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
 			if (rc >= 0)
 				transflag = SMK_INODE_TRANSMUTE;
 		}
-		/*
-		 * Don't let the exec or mmap label be "*" or "@".
-		 */
-		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
-		if (IS_ERR(skp) || skp == &smack_known_star ||
-		    skp == &smack_known_web)
-			skp = NULL;
-		isp->smk_task = skp;
+		if (!(sbsp->smk_flags & SMK_SB_UNTRUSTED)) {
+			/*
+			 * Don't let the exec or mmap label be "*" or "@".
+			 */
+			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
+			if (IS_ERR(skp) || skp == &smack_known_star ||
+			    skp == &smack_known_web)
+				skp = NULL;
+			isp->smk_task = skp;
+		}

 		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
 		if (IS_ERR(skp) || skp == &smack_known_star ||
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 7/7] selinux: Add support for unprivileged mounts from user namespaces
  2015-09-16 20:02 [PATCH v3 0/7] Initial support for user namespace owned mounts Seth Forshee
                   ` (5 preceding siblings ...)
  2015-09-16 20:02 ` [PATCH v3 6/7] Smack: Add support for unprivileged mounts from user namespaces Seth Forshee
@ 2015-09-16 20:02 ` Seth Forshee
  6 siblings, 0 replies; 16+ messages in thread
From: Seth Forshee @ 2015-09-16 20:02 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Paul Moore, Stephen Smalley,
	Eric Paris
  Cc: Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	Seth Forshee, James Morris, Serge E. Hallyn

Security labels from unprivileged mounts in user namespaces must
be ignored. Force superblocks from user namespaces whose labeling
behavior is to use xattrs to use mountpoint labeling instead.
For the mountpoint label, default to converting the current task
context into a form suitable for file objects, but also allow the
policy writer to specify a different label through policy
transition rules.

Pieced together from code snippets provided by Stephen Smalley.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 security/selinux/hooks.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index de05207eb665..09be1dc21e58 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -756,6 +756,28 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 			goto out;
 		}
 	}
+
+	/*
+	 * If this is a user namespace mount, no contexts are allowed
+	 * on the command line and security labels must be ignored.
+	 */
+	if (sb->s_user_ns != &init_user_ns) {
+		if (context_sid || fscontext_sid || rootcontext_sid ||
+		    defcontext_sid) {
+			rc = -EACCES;
+			goto out;
+		}
+		if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
+			sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
+			rc = security_transition_sid(current_sid(), current_sid(),
+						     SECCLASS_FILE, NULL,
+						     &sbsec->mntpoint_sid);
+			if (rc)
+				goto out;
+		}
+		goto out_set_opts;
+	}
+
 	/* sets the context of the superblock for the fs being mounted. */
 	if (fscontext_sid) {
 		rc = may_context_mount_sb_relabel(fscontext_sid, sbsec, cred);
@@ -824,6 +846,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 		sbsec->def_sid = defcontext_sid;
 	}
 
+out_set_opts:
 	rc = sb_finish_set_opts(sb);
 out:
 	mutex_unlock(&sbsec->lock);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 6/7] Smack: Add support for unprivileged mounts from user namespaces
  2015-09-16 20:02 ` [PATCH v3 6/7] Smack: Add support for unprivileged mounts from user namespaces Seth Forshee
@ 2015-09-16 20:33   ` Casey Schaufler
  2015-09-17 12:50     ` Seth Forshee
  0 siblings, 1 reply; 16+ messages in thread
From: Casey Schaufler @ 2015-09-16 20:33 UTC (permalink / raw)
  To: Seth Forshee, Eric W. Biederman, Alexander Viro
  Cc: Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	James Morris, Serge E. Hallyn, Casey Schaufler

On 9/16/2015 1:02 PM, Seth Forshee wrote:
> Security labels from unprivileged mounts cannot be trusted.
> Ideally for these mounts we would assign the objects in the
> filesystem the same label as the inode for the backing device
> passed to mount. Unfortunately it's currently impossible to
> determine which inode this is from the LSM mount hooks, so we
> settle for the label of the process doing the mount.
>
> This label is assigned to s_root, and also to smk_default to
> ensure that new inodes receive this label. The transmute property
> is also set on s_root to make this behavior more explicit, even
> though it is technically not necessary.
>
> If a filesystem has existing security labels, access to inodes is
> permitted if the label is the same as smk_root, otherwise access
> is denied. The SMACK64EXEC xattr is completely ignored.
>
> Explicit setting of security labels continues to require
> CAP_MAC_ADMIN in init_user_ns.
>
> Altogether, this ensures that filesystem objects are not
> accessible to subjects which cannot already access the backing
> store, that MAC is not violated for any objects in the fileystem
> which are already labeled, and that a user cannot use an
> unprivileged mount to gain elevated MAC privileges.
>
> sysfs, tmpfs, and ramfs are already mountable from user
> namespaces and support security labels. We can't rule out the
> possibility that these filesystems may already be used in mounts
> from user namespaces with security lables set from the init
> namespace, so failing to trust lables in these filesystems may
> introduce regressions. It is safe to trust labels from these
> filesystems, since the unprivileged user does not control the
> backing store and thus cannot supply security labels, so an
> explicit exception is made to trust labels from these
> filesystems.
>
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

One coding comment below, otherwise looking good.

> ---
>  security/smack/smack.h     |  6 ++++++
>  security/smack/smack_lsm.c | 35 +++++++++++++++++++++++++++--------
>  2 files changed, 33 insertions(+), 8 deletions(-)
>
> diff --git a/security/smack/smack.h b/security/smack/smack.h
> index fff0c612bbb7..070223960a2c 100644
> --- a/security/smack/smack.h
> +++ b/security/smack/smack.h
> @@ -91,8 +91,14 @@ struct superblock_smack {
>  	struct smack_known	*smk_hat;
>  	struct smack_known	*smk_default;
>  	int			smk_initialized;
> +	int			smk_flags;

How about deleting smk_initialized and using a bit
in smk_flags. A whole int for each seems excessive.
The smk_initialized field is only used in two places,
both in smack_set_mnt_opts.

>  };
>  
> +/*
> + * Superblock flags
> + */
> +#define SMK_SB_UNTRUSTED	0x01

+ #define SMK_SB_INITIALIZED	0x02

> +
>  struct socket_smack {
>  	struct smack_known	*smk_out;	/* outbound label */
>  	struct smack_known	*smk_in;	/* inbound label */
> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index 996c88956438..cdfd67b61534 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -793,6 +793,17 @@ static int smack_set_mnt_opts(struct super_block *sb,
>  		skp = smk_of_current();
>  		sp->smk_root = skp;
>  		sp->smk_default = skp;
> +		/*
> +		 * For a handful of fs types with no user-controlled
> +		 * backing store it's okay to trust security labels
> +		 * in the filesystem. The rest are untrusted.
> +		 */
> +		if (sb->s_user_ns != &init_user_ns &&
> +		    sb->s_magic != SYSFS_MAGIC && sb->s_magic != TMPFS_MAGIC &&
> +		    sb->s_magic != RAMFS_MAGIC) {
> +			transmute = 1;
> +			sp->smk_flags |= SMK_SB_UNTRUSTED;
> +		}
>  	}
>  
>  	/*
> @@ -1175,6 +1186,7 @@ static int smack_inode_rename(struct inode *old_inode,
>   */
>  static int smack_inode_permission(struct inode *inode, int mask)
>  {
> +	struct superblock_smack *sbsp = inode->i_sb->s_security;
>  	struct smk_audit_info ad;
>  	int no_block = mask & MAY_NOT_BLOCK;
>  	int rc;
> @@ -1186,6 +1198,11 @@ static int smack_inode_permission(struct inode *inode, int mask)
>  	if (mask == 0)
>  		return 0;
>  
> +	if (sbsp->smk_flags & SMK_SB_UNTRUSTED) {
> +		if (smk_of_inode(inode) != sbsp->smk_root)
> +			return -EACCES;
> +	}
> +
>  	/* May be droppable after audit */
>  	if (no_block)
>  		return -ECHILD;
> @@ -3475,14 +3492,16 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
>  			if (rc >= 0)
>  				transflag = SMK_INODE_TRANSMUTE;
>  		}
> -		/*
> -		 * Don't let the exec or mmap label be "*" or "@".
> -		 */
> -		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> -		if (IS_ERR(skp) || skp == &smack_known_star ||
> -		    skp == &smack_known_web)
> -			skp = NULL;
> -		isp->smk_task = skp;
> +		if (!(sbsp->smk_flags & SMK_SB_UNTRUSTED)) {
> +			/*
> +			 * Don't let the exec or mmap label be "*" or "@".
> +			 */
> +			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> +			if (IS_ERR(skp) || skp == &smack_known_star ||
> +			    skp == &smack_known_web)
> +				skp = NULL;
> +			isp->smk_task = skp;
> +		}
>  
>  		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
>  		if (IS_ERR(skp) || skp == &smack_known_star ||


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 5/7] fs: Treat foreign mounts as nosuid
  2015-09-16 20:02 ` [PATCH v3 5/7] fs: Treat foreign mounts as nosuid Seth Forshee
@ 2015-09-16 20:57   ` Andy Lutomirski
  2015-09-17 12:49     ` Seth Forshee
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2015-09-16 20:57 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn, James Morris,
	Serge E. Hallyn, Paul Moore, Stephen Smalley, Eric Paris,
	Linux FS Devel, LSM List, SELinux-NSA, linux-kernel, linux-mtd

On Wed, Sep 16, 2015 at 1:02 PM, Seth Forshee
<seth.forshee@canonical.com> wrote:
> From: Andy Lutomirski <luto@amacapital.net>
>
> If a process gets access to a mount from a different user
> namespace, that process should not be able to take advantage of
> setuid files or selinux entrypoints from that filesystem.  Prevent
> this by treating mounts from other mount namespaces and those not
> owned by current_user_ns() or an ancestor as nosuid.
>
> This will make it safer to allow more complex filesystems to be
> mounted in non-root user namespaces.
>
> This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
> setgid, and file capability bits can no longer be abused if code in
> a user namespace were to clear nosuid on an untrusted filesystem,
> but this patch, by itself, is insufficient to protect the system
> from abuse of files that, when execed, would increase MAC privilege.
>
> As a more concrete explanation, any task that can manipulate a
> vfsmount associated with a given user namespace already has
> capabilities in that namespace and all of its descendents.  If they
> can cause a malicious setuid, setgid, or file-caps executable to
> appear in that mount, then that executable will only allow them to
> elevate privileges in exactly the set of namespaces in which they
> are already privileges.
>
> On the other hand, if they can cause a malicious executable to
> appear with a dangerous MAC label, running it could change the
> caller's security context in a way that should not have been
> possible, even inside the namespace in which the task is confined.
>
> As a hardening measure, this would have made CVE-2014-5207 much
> more difficult to exploit.
>
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> ---
>  fs/exec.c                |  2 +-
>  fs/namespace.c           | 13 +++++++++++++
>  include/linux/mount.h    |  1 +
>  security/commoncap.c     |  2 +-
>  security/selinux/hooks.c |  2 +-
>  5 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index b06623a9347f..ea7311d72cc3 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1295,7 +1295,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
>         bprm->cred->euid = current_euid();
>         bprm->cred->egid = current_egid();
>
> -       if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
> +       if (!mnt_may_suid(bprm->file->f_path.mnt))
>                 return;
>
>         if (task_no_new_privs(current))
> diff --git a/fs/namespace.c b/fs/namespace.c
> index da70f7c4ece1..2101ce7b96ab 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3276,6 +3276,19 @@ found:
>         return visible;
>  }
>
> +bool mnt_may_suid(struct vfsmount *mnt)
> +{
> +       /*
> +        * Foreign mounts (accessed via fchdir or through /proc
> +        * symlinks) are always treated as if they are nosuid.  This
> +        * prevents namespaces from trusting potentially unsafe
> +        * suid/sgid bits, file caps, or security labels that originate
> +        * in other namespaces.
> +        */
> +       return !(mnt->mnt_flags & MNT_NOSUID) && check_mnt(real_mount(mnt)) &&
> +              in_userns(current_user_ns(), mnt->mnt_sb->s_user_ns);

Is check_mnt correct here?  If I read it correctly, this means that,
if I just unshare my userns and do nothing else (and, in particular,
don't unshare my mount namespace), then everything will have
mnt_may_suid return false.

--Andy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 2/7] userns: Simpilify MNT_NODEV handling.
  2015-09-16 20:02 ` [PATCH v3 2/7] userns: Simpilify MNT_NODEV handling Seth Forshee
@ 2015-09-17  0:24   ` Andy Lutomirski
  2015-09-17  0:54     ` Eric W. Biederman
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2015-09-17  0:24 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Eric W. Biederman, Alexander Viro, Jeff Layton, J. Bruce Fields,
	Serge Hallyn, Linux FS Devel, LSM List, SELinux-NSA,
	linux-kernel, linux-mtd

On Wed, Sep 16, 2015 at 1:02 PM, Seth Forshee
<seth.forshee@canonical.com> wrote:
> From: "Eric W. Biederman" <ebiederm@xmission.com>
>
> - Consolidate the testing if a device node may be opened in a new
>   function may_open_dev.
>
> - Move the check for allowing access to device nodes on filesystems
>   not mounted in the initial user namespace from mount time to open
>   time and include it in may_open_dev.
>
> This set of changes removes the implicit adding of MNT_NODEV which
> simplifies the logic in fs/namespace.c and removes a potentially
> problematic user visible difference in how normal and unprivileged
> mount namespaces work.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

> -               /* Only in special cases allow devices from mounts
> -                * created outside the initial user namespace.
> -                */
> -               if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
> -                       flags |= MS_NODEV;
> -                       mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
> -               }

This is an ABI change.  It's probably okay, but I think the commit
message should make it clear what's happening.


--Andy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 2/7] userns: Simpilify MNT_NODEV handling.
  2015-09-17  0:24   ` Andy Lutomirski
@ 2015-09-17  0:54     ` Eric W. Biederman
  2015-09-17 22:15       ` Andy Lutomirski
  0 siblings, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2015-09-17  0:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Seth Forshee, Alexander Viro, Jeff Layton, J. Bruce Fields,
	Serge Hallyn, Linux FS Devel, LSM List, SELinux-NSA,
	linux-kernel, linux-mtd

Andy Lutomirski <luto@amacapital.net> writes:

> On Wed, Sep 16, 2015 at 1:02 PM, Seth Forshee
> <seth.forshee@canonical.com> wrote:
>> From: "Eric W. Biederman" <ebiederm@xmission.com>
>>
>> - Consolidate the testing if a device node may be opened in a new
>>   function may_open_dev.
>>
>> - Move the check for allowing access to device nodes on filesystems
>>   not mounted in the initial user namespace from mount time to open
>>   time and include it in may_open_dev.
>>
>> This set of changes removes the implicit adding of MNT_NODEV which
>> simplifies the logic in fs/namespace.c and removes a potentially
>> problematic user visible difference in how normal and unprivileged
>> mount namespaces work.
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
>> -               /* Only in special cases allow devices from mounts
>> -                * created outside the initial user namespace.
>> -                */
>> -               if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
>> -                       flags |= MS_NODEV;
>> -                       mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
>> -               }
>
> This is an ABI change.  It's probably okay, but I think the commit
> message should make it clear what's happening.

You mean it should include in big flashing neon letters
            ***REGRESSION FIX***
?

It is longer in coming than I had hoped.  But that is part of the reason
I did not fix the security hole this way.  Getting the s_user_ns stuff
just so has been non-trivial.

I do agree that because this is a user visible change we do need to keep
our eyes peeled for pieces of userspace software that may depend on the
exact details of the current behavior.

Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 5/7] fs: Treat foreign mounts as nosuid
  2015-09-16 20:57   ` Andy Lutomirski
@ 2015-09-17 12:49     ` Seth Forshee
  2015-09-23 21:00       ` Andy Lutomirski
  0 siblings, 1 reply; 16+ messages in thread
From: Seth Forshee @ 2015-09-17 12:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn, James Morris,
	Serge E. Hallyn, Paul Moore, Stephen Smalley, Eric Paris,
	Linux FS Devel, LSM List, SELinux-NSA, linux-kernel, linux-mtd

On Wed, Sep 16, 2015 at 01:57:10PM -0700, Andy Lutomirski wrote:
> On Wed, Sep 16, 2015 at 1:02 PM, Seth Forshee
> <seth.forshee@canonical.com> wrote:
> > From: Andy Lutomirski <luto@amacapital.net>
> >
> > If a process gets access to a mount from a different user
> > namespace, that process should not be able to take advantage of
> > setuid files or selinux entrypoints from that filesystem.  Prevent
> > this by treating mounts from other mount namespaces and those not
> > owned by current_user_ns() or an ancestor as nosuid.
> >
> > This will make it safer to allow more complex filesystems to be
> > mounted in non-root user namespaces.
> >
> > This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
> > setgid, and file capability bits can no longer be abused if code in
> > a user namespace were to clear nosuid on an untrusted filesystem,
> > but this patch, by itself, is insufficient to protect the system
> > from abuse of files that, when execed, would increase MAC privilege.
> >
> > As a more concrete explanation, any task that can manipulate a
> > vfsmount associated with a given user namespace already has
> > capabilities in that namespace and all of its descendents.  If they
> > can cause a malicious setuid, setgid, or file-caps executable to
> > appear in that mount, then that executable will only allow them to
> > elevate privileges in exactly the set of namespaces in which they
> > are already privileges.
> >
> > On the other hand, if they can cause a malicious executable to
> > appear with a dangerous MAC label, running it could change the
> > caller's security context in a way that should not have been
> > possible, even inside the namespace in which the task is confined.
> >
> > As a hardening measure, this would have made CVE-2014-5207 much
> > more difficult to exploit.
> >
> > Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> > ---
> >  fs/exec.c                |  2 +-
> >  fs/namespace.c           | 13 +++++++++++++
> >  include/linux/mount.h    |  1 +
> >  security/commoncap.c     |  2 +-
> >  security/selinux/hooks.c |  2 +-
> >  5 files changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index b06623a9347f..ea7311d72cc3 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1295,7 +1295,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
> >         bprm->cred->euid = current_euid();
> >         bprm->cred->egid = current_egid();
> >
> > -       if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
> > +       if (!mnt_may_suid(bprm->file->f_path.mnt))
> >                 return;
> >
> >         if (task_no_new_privs(current))
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index da70f7c4ece1..2101ce7b96ab 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -3276,6 +3276,19 @@ found:
> >         return visible;
> >  }
> >
> > +bool mnt_may_suid(struct vfsmount *mnt)
> > +{
> > +       /*
> > +        * Foreign mounts (accessed via fchdir or through /proc
> > +        * symlinks) are always treated as if they are nosuid.  This
> > +        * prevents namespaces from trusting potentially unsafe
> > +        * suid/sgid bits, file caps, or security labels that originate
> > +        * in other namespaces.
> > +        */
> > +       return !(mnt->mnt_flags & MNT_NOSUID) && check_mnt(real_mount(mnt)) &&
> > +              in_userns(current_user_ns(), mnt->mnt_sb->s_user_ns);
> 
> Is check_mnt correct here?  If I read it correctly, this means that,
> if I just unshare my userns and do nothing else (and, in particular,
> don't unshare my mount namespace), then everything will have
> mnt_may_suid return false.

The condition in check_mnt is exactly the same as the condition that
check_mnt replaces. If mnt_may_suid returned true before you unshared
only your user namespace then it should also return true after unshare.
The mount ns is the same as it was before so check_mnt returns true, and
the new user namespace is a child of the previous one so in_userns also
returns true.

Seth

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 6/7] Smack: Add support for unprivileged mounts from user namespaces
  2015-09-16 20:33   ` Casey Schaufler
@ 2015-09-17 12:50     ` Seth Forshee
  0 siblings, 0 replies; 16+ messages in thread
From: Seth Forshee @ 2015-09-17 12:50 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn, Andy Lutomirski,
	linux-fsdevel, linux-security-module, selinux, linux-kernel,
	linux-mtd, James Morris, Serge E. Hallyn

On Wed, Sep 16, 2015 at 01:33:50PM -0700, Casey Schaufler wrote:
> On 9/16/2015 1:02 PM, Seth Forshee wrote:
> > Security labels from unprivileged mounts cannot be trusted.
> > Ideally for these mounts we would assign the objects in the
> > filesystem the same label as the inode for the backing device
> > passed to mount. Unfortunately it's currently impossible to
> > determine which inode this is from the LSM mount hooks, so we
> > settle for the label of the process doing the mount.
> >
> > This label is assigned to s_root, and also to smk_default to
> > ensure that new inodes receive this label. The transmute property
> > is also set on s_root to make this behavior more explicit, even
> > though it is technically not necessary.
> >
> > If a filesystem has existing security labels, access to inodes is
> > permitted if the label is the same as smk_root, otherwise access
> > is denied. The SMACK64EXEC xattr is completely ignored.
> >
> > Explicit setting of security labels continues to require
> > CAP_MAC_ADMIN in init_user_ns.
> >
> > Altogether, this ensures that filesystem objects are not
> > accessible to subjects which cannot already access the backing
> > store, that MAC is not violated for any objects in the fileystem
> > which are already labeled, and that a user cannot use an
> > unprivileged mount to gain elevated MAC privileges.
> >
> > sysfs, tmpfs, and ramfs are already mountable from user
> > namespaces and support security labels. We can't rule out the
> > possibility that these filesystems may already be used in mounts
> > from user namespaces with security lables set from the init
> > namespace, so failing to trust lables in these filesystems may
> > introduce regressions. It is safe to trust labels from these
> > filesystems, since the unprivileged user does not control the
> > backing store and thus cannot supply security labels, so an
> > explicit exception is made to trust labels from these
> > filesystems.
> >
> > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> 
> One coding comment below, otherwise looking good.
> 
> > ---
> >  security/smack/smack.h     |  6 ++++++
> >  security/smack/smack_lsm.c | 35 +++++++++++++++++++++++++++--------
> >  2 files changed, 33 insertions(+), 8 deletions(-)
> >
> > diff --git a/security/smack/smack.h b/security/smack/smack.h
> > index fff0c612bbb7..070223960a2c 100644
> > --- a/security/smack/smack.h
> > +++ b/security/smack/smack.h
> > @@ -91,8 +91,14 @@ struct superblock_smack {
> >  	struct smack_known	*smk_hat;
> >  	struct smack_known	*smk_default;
> >  	int			smk_initialized;
> > +	int			smk_flags;
> 
> How about deleting smk_initialized and using a bit
> in smk_flags. A whole int for each seems excessive.
> The smk_initialized field is only used in two places,
> both in smack_set_mnt_opts.

Sure, I can do that.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 2/7] userns: Simpilify MNT_NODEV handling.
  2015-09-17  0:54     ` Eric W. Biederman
@ 2015-09-17 22:15       ` Andy Lutomirski
  0 siblings, 0 replies; 16+ messages in thread
From: Andy Lutomirski @ 2015-09-17 22:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jeff Layton, SELinux-NSA, LSM List, linux-mtd, Serge Hallyn,
	Alexander Viro, J. Bruce Fields, Linux FS Devel, linux-kernel,
	Seth Forshee

On Sep 16, 2015 6:01 PM, "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>
> Andy Lutomirski <luto@amacapital.net> writes:
>
> > On Wed, Sep 16, 2015 at 1:02 PM, Seth Forshee
> > <seth.forshee@canonical.com> wrote:
> >> From: "Eric W. Biederman" <ebiederm@xmission.com>
> >>
> >> - Consolidate the testing if a device node may be opened in a new
> >>   function may_open_dev.
> >>
> >> - Move the check for allowing access to device nodes on filesystems
> >>   not mounted in the initial user namespace from mount time to open
> >>   time and include it in may_open_dev.
> >>
> >> This set of changes removes the implicit adding of MNT_NODEV which
> >> simplifies the logic in fs/namespace.c and removes a potentially
> >> problematic user visible difference in how normal and unprivileged
> >> mount namespaces work.
> >>
> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> >
> >> -               /* Only in special cases allow devices from mounts
> >> -                * created outside the initial user namespace.
> >> -                */
> >> -               if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
> >> -                       flags |= MS_NODEV;
> >> -                       mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
> >> -               }
> >
> > This is an ABI change.  It's probably okay, but I think the commit
> > message should make it clear what's happening.
>
> You mean it should include in big flashing neon letters
>             ***REGRESSION FIX***
> ?
>
> It is longer in coming than I had hoped.  But that is part of the reason
> I did not fix the security hole this way.  Getting the s_user_ns stuff
> just so has been non-trivial.
>
> I do agree that because this is a user visible change we do need to keep
> our eyes peeled for pieces of userspace software that may depend on the
> exact details of the current behavior.

Yeah.  "Removes a potentially problematic user visible difference"
sounds like the difference has been there forever.   If it needs to
get backported, people will appreciate it.

--Andy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 5/7] fs: Treat foreign mounts as nosuid
  2015-09-17 12:49     ` Seth Forshee
@ 2015-09-23 21:00       ` Andy Lutomirski
  0 siblings, 0 replies; 16+ messages in thread
From: Andy Lutomirski @ 2015-09-23 21:00 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn, James Morris,
	Serge E. Hallyn, Paul Moore, Stephen Smalley, Eric Paris,
	Linux FS Devel, LSM List, SELinux-NSA, linux-kernel, linux-mtd

On Thu, Sep 17, 2015 at 5:49 AM, Seth Forshee
<seth.forshee@canonical.com> wrote:
> On Wed, Sep 16, 2015 at 01:57:10PM -0700, Andy Lutomirski wrote:
>> On Wed, Sep 16, 2015 at 1:02 PM, Seth Forshee
>> <seth.forshee@canonical.com> wrote:
>> > From: Andy Lutomirski <luto@amacapital.net>
>> >
>> > If a process gets access to a mount from a different user
>> > namespace, that process should not be able to take advantage of
>> > setuid files or selinux entrypoints from that filesystem.  Prevent
>> > this by treating mounts from other mount namespaces and those not
>> > owned by current_user_ns() or an ancestor as nosuid.
>> >
>> > This will make it safer to allow more complex filesystems to be
>> > mounted in non-root user namespaces.
>> >
>> > This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
>> > setgid, and file capability bits can no longer be abused if code in
>> > a user namespace were to clear nosuid on an untrusted filesystem,
>> > but this patch, by itself, is insufficient to protect the system
>> > from abuse of files that, when execed, would increase MAC privilege.
>> >
>> > As a more concrete explanation, any task that can manipulate a
>> > vfsmount associated with a given user namespace already has
>> > capabilities in that namespace and all of its descendents.  If they
>> > can cause a malicious setuid, setgid, or file-caps executable to
>> > appear in that mount, then that executable will only allow them to
>> > elevate privileges in exactly the set of namespaces in which they
>> > are already privileges.
>> >
>> > On the other hand, if they can cause a malicious executable to
>> > appear with a dangerous MAC label, running it could change the
>> > caller's security context in a way that should not have been
>> > possible, even inside the namespace in which the task is confined.
>> >
>> > As a hardening measure, this would have made CVE-2014-5207 much
>> > more difficult to exploit.
>> >
>> > Signed-off-by: Andy Lutomirski <luto@amacapital.net>
>> > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
>> > ---
>> >  fs/exec.c                |  2 +-
>> >  fs/namespace.c           | 13 +++++++++++++
>> >  include/linux/mount.h    |  1 +
>> >  security/commoncap.c     |  2 +-
>> >  security/selinux/hooks.c |  2 +-
>> >  5 files changed, 17 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/fs/exec.c b/fs/exec.c
>> > index b06623a9347f..ea7311d72cc3 100644
>> > --- a/fs/exec.c
>> > +++ b/fs/exec.c
>> > @@ -1295,7 +1295,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
>> >         bprm->cred->euid = current_euid();
>> >         bprm->cred->egid = current_egid();
>> >
>> > -       if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
>> > +       if (!mnt_may_suid(bprm->file->f_path.mnt))
>> >                 return;
>> >
>> >         if (task_no_new_privs(current))
>> > diff --git a/fs/namespace.c b/fs/namespace.c
>> > index da70f7c4ece1..2101ce7b96ab 100644
>> > --- a/fs/namespace.c
>> > +++ b/fs/namespace.c
>> > @@ -3276,6 +3276,19 @@ found:
>> >         return visible;
>> >  }
>> >
>> > +bool mnt_may_suid(struct vfsmount *mnt)
>> > +{
>> > +       /*
>> > +        * Foreign mounts (accessed via fchdir or through /proc
>> > +        * symlinks) are always treated as if they are nosuid.  This
>> > +        * prevents namespaces from trusting potentially unsafe
>> > +        * suid/sgid bits, file caps, or security labels that originate
>> > +        * in other namespaces.
>> > +        */
>> > +       return !(mnt->mnt_flags & MNT_NOSUID) && check_mnt(real_mount(mnt)) &&
>> > +              in_userns(current_user_ns(), mnt->mnt_sb->s_user_ns);
>>
>> Is check_mnt correct here?  If I read it correctly, this means that,
>> if I just unshare my userns and do nothing else (and, in particular,
>> don't unshare my mount namespace), then everything will have
>> mnt_may_suid return false.
>
> The condition in check_mnt is exactly the same as the condition that
> check_mnt replaces. If mnt_may_suid returned true before you unshared
> only your user namespace then it should also return true after unshare.
> The mount ns is the same as it was before so check_mnt returns true, and
> the new user namespace is a child of the previous one so in_userns also
> returns true.

Indeed, I was wrong.

--Andy

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-09-23 21:00 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-16 20:02 [PATCH v3 0/7] Initial support for user namespace owned mounts Seth Forshee
2015-09-16 20:02 ` [PATCH v3 1/7] fs: Add user namesapace member to struct super_block Seth Forshee
2015-09-16 20:02 ` [PATCH v3 2/7] userns: Simpilify MNT_NODEV handling Seth Forshee
2015-09-17  0:24   ` Andy Lutomirski
2015-09-17  0:54     ` Eric W. Biederman
2015-09-17 22:15       ` Andy Lutomirski
2015-09-16 20:02 ` [PATCH v3 3/7] fs: Verify access of user towards block device file when mounting Seth Forshee
2015-09-16 20:02 ` [PATCH v3 4/7] fs: Limit file caps to the user namespace of the super block Seth Forshee
2015-09-16 20:02 ` [PATCH v3 5/7] fs: Treat foreign mounts as nosuid Seth Forshee
2015-09-16 20:57   ` Andy Lutomirski
2015-09-17 12:49     ` Seth Forshee
2015-09-23 21:00       ` Andy Lutomirski
2015-09-16 20:02 ` [PATCH v3 6/7] Smack: Add support for unprivileged mounts from user namespaces Seth Forshee
2015-09-16 20:33   ` Casey Schaufler
2015-09-17 12:50     ` Seth Forshee
2015-09-16 20:02 ` [PATCH v3 7/7] selinux: " Seth Forshee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).