[PATCH v4 0/7] Initial support for user namespace owned mounts

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v4 0/7] Initial support for user namespace owned mounts
@ 2015-09-23 20:16 Seth Forshee
  2015-09-23 20:16 ` [PATCH v4 1/7] fs: Add user namesapace member to struct super_block Seth Forshee
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Seth Forshee @ 2015-09-23 20:16 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, linux-mtd, linux-fsdevel,
	linux-security-module, selinux
  Cc: Serge Hallyn, Andy Lutomirski, linux-kernel, Seth Forshee

This is the fourth revision of the patch series introducing the concept
of superblocks owned by user namespaces, containing only trivial changes
from the previous revision.

These are the first in a larger set of patches, with the goal of
eventually allowing some regular filesystem types to be mounted in
unprivileged containers. The full series is available at:

  git://kernel.ubuntu.com/sforshee/linux.git userns-mounts

The strategy for this series is to do as much of the heavy lifting as
possible in the vfs to minimize the need to handle edge cases in
individual filesystems. The patches that follow lay some of the
groundwork and fall into two groups:

 1. Patches 1-2 add s_user_ns to struct superblock and use it to
    simplify MNT_NODEV handling.

 2. Patches 3-7 tighten down security for mounts with s_user_ns !=
    &init_user_ns.

Note that these patches only address security at the vfs level. As has
been discussed previously, individual filesystems may still be
vulnerable to attacks via malicious metadata in the backing store. The
goal is to find a small set of filesystems which can be hardened from
attacks from below. I am initially targeting fuse, which has been
designed to resist such attacks, and ext4, which is so far standing up
quite well to fuzzing.

Changes since v3:
 - Reword commit message for the MNT_NODEV handling changes to better
   emphasize that it results in a user visible change in behavior.
 - Consolidate smk_initialized into smk_flags as requested by Casey.

Andy Lutomirski (1):
  fs: Treat foreign mounts as nosuid

Eric W. Biederman (1):
  userns: Simpilify MNT_NODEV handling.

Seth Forshee (5):
  fs: Add user namesapace member to struct super_block
  fs: Verify access of user towards block device file when mounting
  fs: Limit file caps to the user namespace of the super block
  Smack: Add support for unprivileged mounts from user namespaces
  selinux: Add support for unprivileged mounts from user namespaces

 drivers/mtd/mtdsuper.c         |  7 +++++-
 fs/block_dev.c                 | 54 +++++++++++++++++++++++++++++++++---------
 fs/exec.c                      |  2 +-
 fs/namei.c                     |  9 ++++++-
 fs/namespace.c                 | 34 +++++++++++++++-----------
 fs/proc/root.c                 |  3 ++-
 fs/super.c                     | 38 +++++++++++++++++++++++++----
 include/linux/fs.h             | 11 ++++++++-
 include/linux/mount.h          |  1 +
 include/linux/user_namespace.h |  8 +++++++
 kernel/user_namespace.c        | 14 +++++++++++
 security/commoncap.c           |  4 +++-
 security/selinux/hooks.c       | 25 ++++++++++++++++++-
 security/smack/smack.h         |  8 ++++++-
 security/smack/smack_lsm.c     | 41 +++++++++++++++++++++++---------
 15 files changed, 210 insertions(+), 49 deletions(-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 1/7] fs: Add user namesapace member to struct super_block
  2015-09-23 20:16 [PATCH v4 0/7] Initial support for user namespace owned mounts Seth Forshee
@ 2015-09-23 20:16 ` Seth Forshee
  2015-09-24 21:14   ` Eric W. Biederman
  2015-09-23 20:16 ` [PATCH v4 2/7] userns: Simpilify MNT_NODEV handling Seth Forshee
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Seth Forshee @ 2015-09-23 20:16 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Jeff Layton, J. Bruce Fields
  Cc: Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	Seth Forshee

Initially this will be used to eliminate the implicit MNT_NODEV
flag for mounts from user namespaces. In the future it will also
be used for translating ids and checking capabilities for
filesystems mounted from user namespaces.

s_user_ns is initialized in alloc_super() and is generally set to
current_user_ns(). To avoid security and corruption issues, two
additional mount checks are also added:

 - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
   in current_user_ns().

 - sget() will fail with EBUSY when the filesystem it's looking
   for is already mounted from another user namespace.

proc requires some special handling. The user namespace of
current isn't appropriate when forking as a result of clone (2)
with CLONE_NEWPID|CLONE_NEWUSER, as it will set s_user_ns to the
namespace of the parent and make proc unmountable in the new user
namespace. Instead, the user namespace which owns the new pid
namespace is used. sget_userns() is allowed to allow passing in
a namespace other than that of current, and sget becomes a
wrapper around sget_userns() which passes current_user_ns().

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 fs/namespace.c     |  3 +++
 fs/proc/root.c     |  3 ++-
 fs/super.c         | 38 +++++++++++++++++++++++++++++++++-----
 include/linux/fs.h |  9 ++++++++-
 4 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 0570729c87fd..d023a353dc63 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2381,6 +2381,9 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 	struct vfsmount *mnt;
 	int err;
 
+	if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
+		return -EPERM;
+
 	if (!fstype)
 		return -EINVAL;
 
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 361ab4ee42fc..4b302cbf13f9 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -117,7 +117,8 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 			return ERR_PTR(-EPERM);
 	}
 
-	sb = sget(fs_type, proc_test_super, proc_set_super, flags, ns);
+	sb = sget_userns(fs_type, proc_test_super, proc_set_super, flags,
+			 ns->user_ns, ns);
 	if (IS_ERR(sb))
 		return ERR_CAST(sb);
 
diff --git a/fs/super.c b/fs/super.c
index 954aeb80e202..42837da7d641 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -33,6 +33,7 @@
 #include <linux/cleancache.h>
 #include <linux/fsnotify.h>
 #include <linux/lockdep.h>
+#include <linux/user_namespace.h>
 #include "internal.h"
 
 
@@ -163,6 +164,7 @@ static void destroy_super(struct super_block *s)
 {
 	list_lru_destroy(&s->s_dentry_lru);
 	list_lru_destroy(&s->s_inode_lru);
+	put_user_ns(s->s_user_ns);
 	security_sb_free(s);
 	WARN_ON(!list_empty(&s->s_mounts));
 	kfree(s->s_subtype);
@@ -178,7 +180,8 @@ static void destroy_super(struct super_block *s)
  *	Allocates and initializes a new &struct super_block.  alloc_super()
  *	returns a pointer new superblock or %NULL if allocation had failed.
  */
-static struct super_block *alloc_super(struct file_system_type *type, int flags)
+static struct super_block *alloc_super(struct file_system_type *type, int flags,
+				       struct user_namespace *user_ns)
 {
 	struct super_block *s = kzalloc(sizeof(struct super_block),  GFP_USER);
 	static const struct super_operations default_op;
@@ -246,6 +249,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 	s->s_shrink.count_objects = super_cache_count;
 	s->s_shrink.batch = 1024;
 	s->s_shrink.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE;
+
+	s->s_user_ns = get_user_ns(user_ns);
 	return s;
 
 fail:
@@ -442,17 +447,17 @@ void generic_shutdown_super(struct super_block *sb)
 EXPORT_SYMBOL(generic_shutdown_super);
 
 /**
- *	sget	-	find or create a superblock
+ *	sget_userns -	find or create a superblock
  *	@type:	filesystem type superblock should belong to
  *	@test:	comparison callback
  *	@set:	setup callback
  *	@flags:	mount flags
  *	@data:	argument to each of them
  */
-struct super_block *sget(struct file_system_type *type,
+struct super_block *sget_userns(struct file_system_type *type,
 			int (*test)(struct super_block *,void *),
 			int (*set)(struct super_block *,void *),
-			int flags,
+			int flags, struct user_namespace *user_ns,
 			void *data)
 {
 	struct super_block *s = NULL;
@@ -465,6 +470,10 @@ retry:
 		hlist_for_each_entry(old, &type->fs_supers, s_instances) {
 			if (!test(old, data))
 				continue;
+			if (user_ns != old->s_user_ns) {
+				spin_unlock(&sb_lock);
+				return ERR_PTR(-EBUSY);
+			}
 			if (!grab_super(old))
 				goto retry;
 			if (s) {
@@ -477,7 +486,7 @@ retry:
 	}
 	if (!s) {
 		spin_unlock(&sb_lock);
-		s = alloc_super(type, flags);
+		s = alloc_super(type, flags, user_ns);
 		if (!s)
 			return ERR_PTR(-ENOMEM);
 		goto retry;
@@ -500,6 +509,25 @@ retry:
 	return s;
 }
 
+EXPORT_SYMBOL(sget_userns);
+
+/**
+ *	sget	-	find or create a superblock
+ *	@type:	  filesystem type superblock should belong to
+ *	@test:	  comparison callback
+ *	@set:	  setup callback
+ *	@flags:	  mount flags
+ *	@data:	  argument to each of them
+ */
+struct super_block *sget(struct file_system_type *type,
+			int (*test)(struct super_block *,void *),
+			int (*set)(struct super_block *,void *),
+			int flags,
+			void *data)
+{
+	return sget_userns(type, test, set, flags, current_user_ns(), data);
+}
+
 EXPORT_SYMBOL(sget);
 
 void drop_super(struct super_block *sb)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 72d8a844c692..79c15ab2159d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -31,6 +31,7 @@
 #include <linux/blk_types.h>
 #include <linux/workqueue.h>
 #include <linux/percpu-rwsem.h>
+#include <linux/user_namespace.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -1367,6 +1368,8 @@ struct super_block {
 	struct workqueue_struct *s_dio_done_wq;
 	struct hlist_head s_pins;
 
+	struct user_namespace *s_user_ns;
+
 	/*
 	 * Keep the lru lists last in the structure so they always sit on their
 	 * own individual cachelines.
@@ -1509,7 +1512,6 @@ static inline void sb_start_intwrite(struct super_block *sb)
 	__sb_start_write(sb, SB_FREEZE_FS, true);
 }
 
-
 extern bool inode_owner_or_capable(const struct inode *inode);
 
 /*
@@ -1984,6 +1986,11 @@ void deactivate_locked_super(struct super_block *sb);
 int set_anon_super(struct super_block *s, void *data);
 int get_anon_bdev(dev_t *);
 void free_anon_bdev(dev_t);
+struct super_block *sget_userns(struct file_system_type *type,
+			int (*test)(struct super_block *,void *),
+			int (*set)(struct super_block *,void *),
+			int flags, struct user_namespace *user_ns,
+			void *data);
 struct super_block *sget(struct file_system_type *type,
 			int (*test)(struct super_block *,void *),
 			int (*set)(struct super_block *,void *),
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 2/7] userns: Simpilify MNT_NODEV handling.
  2015-09-23 20:16 [PATCH v4 0/7] Initial support for user namespace owned mounts Seth Forshee
  2015-09-23 20:16 ` [PATCH v4 1/7] fs: Add user namesapace member to struct super_block Seth Forshee
@ 2015-09-23 20:16 ` Seth Forshee
  2015-09-23 20:16 ` [PATCH v4 3/7] fs: Verify access of user towards block device file when mounting Seth Forshee
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Seth Forshee @ 2015-09-23 20:16 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Jeff Layton, J. Bruce Fields
  Cc: Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	Seth Forshee

From: "Eric W. Biederman" <ebiederm@xmission.com>

- Consolidate the testing if a device node may be opened in a new
  function may_open_dev.

- Move the check for allowing access to device nodes on filesystems
  not mounted in the initial user namespace from mount time to open
  time and include it in may_open_dev.

This set of changes removes the implicit adding of MNT_NODEV which
simplifies the logic in fs/namespace.c and removes a potentially
problematic difference in how normal and unprivileged mount
namespaces work.  This is a user visible change in behavior for
remount in unpriviliged mount namespaces but is unlikely to cause
problems for existing software.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/block_dev.c     |  2 +-
 fs/namei.c         |  9 ++++++++-
 fs/namespace.c     | 18 ++++--------------
 include/linux/fs.h |  1 +
 4 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 073bb57adab1..46bd98482f71 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1737,7 +1737,7 @@ struct block_device *lookup_bdev(const char *pathname)
 	if (!S_ISBLK(inode->i_mode))
 		goto fail;
 	error = -EACCES;
-	if (path.mnt->mnt_flags & MNT_NODEV)
+	if (!may_open_dev(&path))
 		goto fail;
 	error = -ENOMEM;
 	bdev = bd_acquire(inode);
diff --git a/fs/namei.c b/fs/namei.c
index 726d211db484..fcc5751d6395 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2663,6 +2663,13 @@ int vfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
 }
 EXPORT_SYMBOL(vfs_create);
 
+bool may_open_dev(const struct path *path)
+{
+	return !(path->mnt->mnt_flags & MNT_NODEV) &&
+		((path->mnt->mnt_sb->s_user_ns == &init_user_ns) ||
+		 (path->mnt->mnt_sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT));
+}
+
 static int may_open(struct path *path, int acc_mode, int flag)
 {
 	struct dentry *dentry = path->dentry;
@@ -2685,7 +2692,7 @@ static int may_open(struct path *path, int acc_mode, int flag)
 		break;
 	case S_IFBLK:
 	case S_IFCHR:
-		if (path->mnt->mnt_flags & MNT_NODEV)
+		if (!may_open_dev(path))
 			return -EACCES;
 		/*FALLTHRU*/
 	case S_IFIFO:
diff --git a/fs/namespace.c b/fs/namespace.c
index d023a353dc63..da70f7c4ece1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2177,13 +2177,7 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	}
 	if ((mnt->mnt.mnt_flags & MNT_LOCK_NODEV) &&
 	    !(mnt_flags & MNT_NODEV)) {
-		/* Was the nodev implicitly added in mount? */
-		if ((mnt->mnt_ns->user_ns != &init_user_ns) &&
-		    !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
-			mnt_flags |= MNT_NODEV;
-		} else {
-			return -EPERM;
-		}
+		return -EPERM;
 	}
 	if ((mnt->mnt.mnt_flags & MNT_LOCK_NOSUID) &&
 	    !(mnt_flags & MNT_NOSUID)) {
@@ -2396,13 +2390,6 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 			put_filesystem(type);
 			return -EPERM;
 		}
-		/* Only in special cases allow devices from mounts
-		 * created outside the initial user namespace.
-		 */
-		if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
-			flags |= MS_NODEV;
-			mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
-		}
 		if (type->fs_flags & FS_USERNS_VISIBLE) {
 			if (!fs_fully_visible(type, &mnt_flags))
 				return -EPERM;
@@ -3238,6 +3225,9 @@ static bool fs_fully_visible(struct file_system_type *type, int *new_mnt_flags)
 		mnt_flags = mnt->mnt.mnt_flags;
 		if (mnt->mnt.mnt_sb->s_iflags & SB_I_NOEXEC)
 			mnt_flags &= ~(MNT_LOCK_NOSUID | MNT_LOCK_NOEXEC);
+		if (mnt->mnt.mnt_sb->s_user_ns != &init_user_ns &&
+		    !(mnt->mnt.mnt_sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT))
+			mnt_flags &= ~(MNT_LOCK_NODEV);
 
 		/* Verify the mount flags are equal to or more permissive
 		 * than the proposed new mount.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 79c15ab2159d..5ec201e8308c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1537,6 +1537,7 @@ extern void dentry_unhash(struct dentry *dentry);
  */
 extern void inode_init_owner(struct inode *inode, const struct inode *dir,
 			umode_t mode);
+extern bool may_open_dev(const struct path *path);
 /*
  * VFS FS_IOC_FIEMAP helper definitions.
  */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 3/7] fs: Verify access of user towards block device file when mounting
  2015-09-23 20:16 [PATCH v4 0/7] Initial support for user namespace owned mounts Seth Forshee
  2015-09-23 20:16 ` [PATCH v4 1/7] fs: Add user namesapace member to struct super_block Seth Forshee
  2015-09-23 20:16 ` [PATCH v4 2/7] userns: Simpilify MNT_NODEV handling Seth Forshee
@ 2015-09-23 20:16 ` Seth Forshee
  2015-09-24 21:53   ` Eric W. Biederman
  2015-09-23 20:16 ` [PATCH v4 4/7] fs: Limit file caps to the user namespace of the super block Seth Forshee
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Seth Forshee @ 2015-09-23 20:16 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, David Woodhouse, Brian Norris,
	Jeff Layton, J. Bruce Fields
  Cc: Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	Seth Forshee

When mounting a filesystem on a block device there is currently
no verification that the user has appropriate access to the
device file passed to mount. This has not been an issue so far
since the user in question has always been root, but this must
be changed before allowing unprivileged users to mount in user
namespaces.

To do this, a new version of lookup_bdev() is added named
lookup_bdev_perm(). Both of these functions become wrappers
around a common inner fucntion. The behavior of lookup_bdev() is
unchanged, but calling lookup_bdev_perm() will fail if the user
does not have the specified access rights to the supplied path.
The permission check is skipped if the user has CAP_SYS_ADMIN to
avoid any possible regressions in behavior.

blkdev_get_by_path() is updated to use lookup_bdev_perm(). This
is used by mount_bdev() and mount_mtd(), so this will cause
mounts on block devices to fail when the user lacks the required
permissions. Other calls to blkdev_get_by_path() will all happen
with root privileges, so these calls will be unaffected.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 drivers/mtd/mtdsuper.c |  7 ++++++-
 fs/block_dev.c         | 52 ++++++++++++++++++++++++++++++++++++++++----------
 include/linux/fs.h     |  1 +
 3 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
index 20c02a3b7417..a3970b0506b9 100644
--- a/drivers/mtd/mtdsuper.c
+++ b/drivers/mtd/mtdsuper.c
@@ -125,6 +125,7 @@ struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
 #ifdef CONFIG_BLOCK
 	struct block_device *bdev;
 	int ret, major;
+	int perm = 0;
 #endif
 	int mtdnr;
 
@@ -176,7 +177,11 @@ struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
 	/* try the old way - the hack where we allowed users to mount
 	 * /dev/mtdblock$(n) but didn't actually _use_ the blockdev
 	 */
-	bdev = lookup_bdev(dev_name);
+	if (flags & FMODE_READ)
+		perm |= MAY_READ;
+	if (flags & FMODE_WRITE)
+		perm |= MAY_WRITE;
+	bdev = lookup_bdev_perm(dev_name, perm);
 	if (IS_ERR(bdev)) {
 		ret = PTR_ERR(bdev);
 		pr_debug("MTDSB: lookup_bdev() returned %d\n", ret);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 46bd98482f71..7ca0f8911e97 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1401,9 +1401,14 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
 					void *holder)
 {
 	struct block_device *bdev;
+	int perm = 0;
 	int err;
 
-	bdev = lookup_bdev(path);
+	if (mode & FMODE_READ)
+		perm |= MAY_READ;
+	if (mode & FMODE_WRITE)
+		perm |= MAY_WRITE;
+	bdev = lookup_bdev_perm(path, perm);
 	if (IS_ERR(bdev))
 		return bdev;
 
@@ -1710,15 +1715,8 @@ int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
 
 EXPORT_SYMBOL(ioctl_by_bdev);
 
-/**
- * lookup_bdev  - lookup a struct block_device by name
- * @pathname:	special file representing the block device
- *
- * Get a reference to the blockdevice at @pathname in the current
- * namespace if possible and return it.  Return ERR_PTR(error)
- * otherwise.
- */
-struct block_device *lookup_bdev(const char *pathname)
+static struct block_device *__lookup_bdev(const char *pathname, int mask,
+					  bool check_perm)
 {
 	struct block_device *bdev;
 	struct inode *inode;
@@ -1733,6 +1731,11 @@ struct block_device *lookup_bdev(const char *pathname)
 		return ERR_PTR(error);
 
 	inode = d_backing_inode(path.dentry);
+	if (check_perm && !capable(CAP_SYS_ADMIN)) {
+		error = inode_permission(inode, mask);
+		if (error)
+			goto fail;
+	}
 	error = -ENOTBLK;
 	if (!S_ISBLK(inode->i_mode))
 		goto fail;
@@ -1750,6 +1753,35 @@ fail:
 	bdev = ERR_PTR(error);
 	goto out;
 }
+
+/*
+ * lookup_bdev_perm - lookup a struct block_device by name and check for
+ *		      access rights
+ * @pathname:	special file representing the block device
+ * @mask:	rights to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
+ *
+ * Get a reference to the blockdevice at @pathname in the current
+ * namespace if possible, check for the specified rights to the device
+ * at @pathname, and return it.  Return ERR_PTR(error) otherwise.
+ */
+struct block_device *lookup_bdev_perm(const char *pathname, int mask)
+{
+	return __lookup_bdev(pathname, mask, true);
+}
+EXPORT_SYMBOL(lookup_bdev_perm);
+
+/**
+ * lookup_bdev  - lookup a struct block_device by name
+ * @pathname:	special file representing the block device
+ *
+ * Get a reference to the blockdevice at @pathname in the current
+ * namespace if possible and return it.  Return ERR_PTR(error)
+ * otherwise.
+ */
+struct block_device *lookup_bdev(const char *pathname)
+{
+	return __lookup_bdev(pathname, 0, false);
+}
 EXPORT_SYMBOL(lookup_bdev);
 
 int __invalidate_device(struct block_device *bdev, bool kill_dirty)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5ec201e8308c..d03c2061bdd2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2383,6 +2383,7 @@ static inline void unregister_chrdev(unsigned int major, const char *name)
 #define BLKDEV_MAJOR_HASH_SIZE	255
 extern const char *__bdevname(dev_t, char *buffer);
 extern const char *bdevname(struct block_device *bdev, char *buffer);
+extern struct block_device *lookup_bdev_perm(const char *, int);
 extern struct block_device *lookup_bdev(const char *);
 extern void blkdev_show(struct seq_file *,off_t);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 4/7] fs: Limit file caps to the user namespace of the super block
  2015-09-23 20:16 [PATCH v4 0/7] Initial support for user namespace owned mounts Seth Forshee
                   ` (2 preceding siblings ...)
  2015-09-23 20:16 ` [PATCH v4 3/7] fs: Verify access of user towards block device file when mounting Seth Forshee
@ 2015-09-23 20:16 ` Seth Forshee
  2015-09-24 21:59   ` Eric W. Biederman
  2015-09-23 20:16 ` [PATCH v4 5/7] fs: Treat foreign mounts as nosuid Seth Forshee
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Seth Forshee @ 2015-09-23 20:16 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Serge Hallyn, James Morris,
	Serge E. Hallyn
  Cc: Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd, Seth Forshee

Capability sets attached to files must be ignored except in the
user namespaces where the mounter is privileged, i.e. s_user_ns
and its descendants. Otherwise a vector exists for gaining
privileges in namespaces where a user is not already privileged.

Add a new helper function, in_user_ns(), to test whether a user
namespace is the same as or a descendant of another namespace.
Use this helper to determine whether a file's capability set
should be applied to the caps constructed during exec.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 include/linux/user_namespace.h |  8 ++++++++
 kernel/user_namespace.c        | 14 ++++++++++++++
 security/commoncap.c           |  2 ++
 3 files changed, 24 insertions(+)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 8297e5b341d8..a43faa727124 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -72,6 +72,8 @@ extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t,
 extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
 extern int proc_setgroups_show(struct seq_file *m, void *v);
 extern bool userns_may_setgroups(const struct user_namespace *ns);
+extern bool in_userns(const struct user_namespace *ns,
+		      const struct user_namespace *target_ns);
 #else
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -100,6 +102,12 @@ static inline bool userns_may_setgroups(const struct user_namespace *ns)
 {
 	return true;
 }
+
+static inline bool in_userns(const struct user_namespace *ns,
+			     const struct user_namespace *target_ns)
+{
+	return true;
+}
 #endif
 
 #endif /* _LINUX_USER_H */
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 88fefa68c516..69fbc377357b 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -945,6 +945,20 @@ bool userns_may_setgroups(const struct user_namespace *ns)
 	return allowed;
 }
 
+/*
+ * Returns true if @ns is the same namespace as or a descendant of
+ * @target_ns.
+ */
+bool in_userns(const struct user_namespace *ns,
+	       const struct user_namespace *target_ns)
+{
+	for (; ns; ns = ns->parent) {
+		if (ns == target_ns)
+			return true;
+	}
+	return false;
+}
+
 static inline struct user_namespace *to_user_ns(struct ns_common *ns)
 {
 	return container_of(ns, struct user_namespace, ns);
diff --git a/security/commoncap.c b/security/commoncap.c
index 1832cf701c3d..400aa224b491 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -450,6 +450,8 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
 
 	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
 		return 0;
+	if (!in_userns(current_user_ns(), bprm->file->f_path.mnt->mnt_sb->s_user_ns))
+		return 0;
 
 	rc = get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
 	if (rc < 0) {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 5/7] fs: Treat foreign mounts as nosuid
  2015-09-23 20:16 [PATCH v4 0/7] Initial support for user namespace owned mounts Seth Forshee
                   ` (3 preceding siblings ...)
  2015-09-23 20:16 ` [PATCH v4 4/7] fs: Limit file caps to the user namespace of the super block Seth Forshee
@ 2015-09-23 20:16 ` Seth Forshee
  2015-09-23 20:16 ` [PATCH v4 6/7] Smack: Add support for unprivileged mounts from user namespaces Seth Forshee
  2015-09-23 20:16 ` [PATCH v4 7/7] selinux: " Seth Forshee
  6 siblings, 0 replies; 23+ messages in thread
From: Seth Forshee @ 2015-09-23 20:16 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Serge Hallyn, James Morris,
	Serge E. Hallyn, Paul Moore, Stephen Smalley, Eric Paris
  Cc: Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd, Seth Forshee

From: Andy Lutomirski <luto@amacapital.net>

If a process gets access to a mount from a different user
namespace, that process should not be able to take advantage of
setuid files or selinux entrypoints from that filesystem.  Prevent
this by treating mounts from other mount namespaces and those not
owned by current_user_ns() or an ancestor as nosuid.

This will make it safer to allow more complex filesystems to be
mounted in non-root user namespaces.

This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
setgid, and file capability bits can no longer be abused if code in
a user namespace were to clear nosuid on an untrusted filesystem,
but this patch, by itself, is insufficient to protect the system
from abuse of files that, when execed, would increase MAC privilege.

As a more concrete explanation, any task that can manipulate a
vfsmount associated with a given user namespace already has
capabilities in that namespace and all of its descendents.  If they
can cause a malicious setuid, setgid, or file-caps executable to
appear in that mount, then that executable will only allow them to
elevate privileges in exactly the set of namespaces in which they
are already privileges.

On the other hand, if they can cause a malicious executable to
appear with a dangerous MAC label, running it could change the
caller's security context in a way that should not have been
possible, even inside the namespace in which the task is confined.

As a hardening measure, this would have made CVE-2014-5207 much
more difficult to exploit.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 fs/exec.c                |  2 +-
 fs/namespace.c           | 13 +++++++++++++
 include/linux/mount.h    |  1 +
 security/commoncap.c     |  2 +-
 security/selinux/hooks.c |  2 +-
 5 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b06623a9347f..ea7311d72cc3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1295,7 +1295,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 	bprm->cred->euid = current_euid();
 	bprm->cred->egid = current_egid();
 
-	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
+	if (!mnt_may_suid(bprm->file->f_path.mnt))
 		return;
 
 	if (task_no_new_privs(current))
diff --git a/fs/namespace.c b/fs/namespace.c
index da70f7c4ece1..2101ce7b96ab 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3276,6 +3276,19 @@ found:
 	return visible;
 }
 
+bool mnt_may_suid(struct vfsmount *mnt)
+{
+	/*
+	 * Foreign mounts (accessed via fchdir or through /proc
+	 * symlinks) are always treated as if they are nosuid.  This
+	 * prevents namespaces from trusting potentially unsafe
+	 * suid/sgid bits, file caps, or security labels that originate
+	 * in other namespaces.
+	 */
+	return !(mnt->mnt_flags & MNT_NOSUID) && check_mnt(real_mount(mnt)) &&
+	       in_userns(current_user_ns(), mnt->mnt_sb->s_user_ns);
+}
+
 static struct ns_common *mntns_get(struct task_struct *task)
 {
 	struct ns_common *ns = NULL;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index f822c3c11377..54a594d49733 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -81,6 +81,7 @@ extern void mntput(struct vfsmount *mnt);
 extern struct vfsmount *mntget(struct vfsmount *mnt);
 extern struct vfsmount *mnt_clone_internal(struct path *path);
 extern int __mnt_is_readonly(struct vfsmount *mnt);
+extern bool mnt_may_suid(struct vfsmount *mnt);
 
 struct path;
 extern struct vfsmount *clone_private_mount(struct path *path);
diff --git a/security/commoncap.c b/security/commoncap.c
index 400aa224b491..6243aef5860e 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -448,7 +448,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
 	if (!file_caps_enabled)
 		return 0;
 
-	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
+	if (!mnt_may_suid(bprm->file->f_path.mnt))
 		return 0;
 	if (!in_userns(current_user_ns(), bprm->file->f_path.mnt->mnt_sb->s_user_ns))
 		return 0;
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index e4369d86e588..de05207eb665 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2171,7 +2171,7 @@ static int check_nnp_nosuid(const struct linux_binprm *bprm,
 			    const struct task_security_struct *new_tsec)
 {
 	int nnp = (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS);
-	int nosuid = (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID);
+	int nosuid = !mnt_may_suid(bprm->file->f_path.mnt);
 	int rc;
 
 	if (!nnp && !nosuid)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 6/7] Smack: Add support for unprivileged mounts from user namespaces
  2015-09-23 20:16 [PATCH v4 0/7] Initial support for user namespace owned mounts Seth Forshee
                   ` (4 preceding siblings ...)
  2015-09-23 20:16 ` [PATCH v4 5/7] fs: Treat foreign mounts as nosuid Seth Forshee
@ 2015-09-23 20:16 ` Seth Forshee
  2015-09-24 22:16   ` Eric W. Biederman
                     ` (2 more replies)
  2015-09-23 20:16 ` [PATCH v4 7/7] selinux: " Seth Forshee
  6 siblings, 3 replies; 23+ messages in thread
From: Seth Forshee @ 2015-09-23 20:16 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Casey Schaufler
  Cc: Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	Seth Forshee, James Morris, Serge E. Hallyn

Security labels from unprivileged mounts cannot be trusted.
Ideally for these mounts we would assign the objects in the
filesystem the same label as the inode for the backing device
passed to mount. Unfortunately it's currently impossible to
determine which inode this is from the LSM mount hooks, so we
settle for the label of the process doing the mount.

This label is assigned to s_root, and also to smk_default to
ensure that new inodes receive this label. The transmute property
is also set on s_root to make this behavior more explicit, even
though it is technically not necessary.

If a filesystem has existing security labels, access to inodes is
permitted if the label is the same as smk_root, otherwise access
is denied. The SMACK64EXEC xattr is completely ignored.

Explicit setting of security labels continues to require
CAP_MAC_ADMIN in init_user_ns.

Altogether, this ensures that filesystem objects are not
accessible to subjects which cannot already access the backing
store, that MAC is not violated for any objects in the fileystem
which are already labeled, and that a user cannot use an
unprivileged mount to gain elevated MAC privileges.

sysfs, tmpfs, and ramfs are already mountable from user
namespaces and support security labels. We can't rule out the
possibility that these filesystems may already be used in mounts
from user namespaces with security lables set from the init
namespace, so failing to trust lables in these filesystems may
introduce regressions. It is safe to trust labels from these
filesystems, since the unprivileged user does not control the
backing store and thus cannot supply security labels, so an
explicit exception is made to trust labels from these
filesystems.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 security/smack/smack.h     |  8 +++++++-
 security/smack/smack_lsm.c | 41 ++++++++++++++++++++++++++++++-----------
 2 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/security/smack/smack.h b/security/smack/smack.h
index fff0c612bbb7..f95759015f29 100644
--- a/security/smack/smack.h
+++ b/security/smack/smack.h
@@ -90,9 +90,15 @@ struct superblock_smack {
 	struct smack_known	*smk_floor;
 	struct smack_known	*smk_hat;
 	struct smack_known	*smk_default;
-	int			smk_initialized;
+	int			smk_flags;
 };
 
+/*
+ * Superblock flags
+ */
+#define SMK_SB_INITIALIZED	0x01
+#define SMK_SB_UNTRUSTED	0x02
+
 struct socket_smack {
 	struct smack_known	*smk_out;	/* outbound label */
 	struct smack_known	*smk_in;	/* inbound label */
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 996c88956438..621200f86b56 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -521,7 +521,7 @@ static int smack_sb_alloc_security(struct super_block *sb)
 	sbsp->smk_floor = &smack_known_floor;
 	sbsp->smk_hat = &smack_known_hat;
 	/*
-	 * smk_initialized will be zero from kzalloc.
+	 * SMK_SB_INITIALIZED will be zero from kzalloc.
 	 */
 	sb->s_security = sbsp;
 
@@ -738,10 +738,10 @@ static int smack_set_mnt_opts(struct super_block *sb,
 	int num_opts = opts->num_mnt_opts;
 	int transmute = 0;
 
-	if (sp->smk_initialized)
+	if (sp->smk_flags & SMK_SB_INITIALIZED)
 		return 0;
 
-	sp->smk_initialized = 1;
+	sp->smk_flags |= SMK_SB_INITIALIZED;
 
 	for (i = 0; i < num_opts; i++) {
 		switch (opts->mnt_opts_flags[i]) {
@@ -793,6 +793,17 @@ static int smack_set_mnt_opts(struct super_block *sb,
 		skp = smk_of_current();
 		sp->smk_root = skp;
 		sp->smk_default = skp;
+		/*
+		 * For a handful of fs types with no user-controlled
+		 * backing store it's okay to trust security labels
+		 * in the filesystem. The rest are untrusted.
+		 */
+		if (sb->s_user_ns != &init_user_ns &&
+		    sb->s_magic != SYSFS_MAGIC && sb->s_magic != TMPFS_MAGIC &&
+		    sb->s_magic != RAMFS_MAGIC) {
+			transmute = 1;
+			sp->smk_flags |= SMK_SB_UNTRUSTED;
+		}
 	}
 
 	/*
@@ -1175,6 +1186,7 @@ static int smack_inode_rename(struct inode *old_inode,
  */
 static int smack_inode_permission(struct inode *inode, int mask)
 {
+	struct superblock_smack *sbsp = inode->i_sb->s_security;
 	struct smk_audit_info ad;
 	int no_block = mask & MAY_NOT_BLOCK;
 	int rc;
@@ -1186,6 +1198,11 @@ static int smack_inode_permission(struct inode *inode, int mask)
 	if (mask == 0)
 		return 0;
 
+	if (sbsp->smk_flags & SMK_SB_UNTRUSTED) {
+		if (smk_of_inode(inode) != sbsp->smk_root)
+			return -EACCES;
+	}
+
 	/* May be droppable after audit */
 	if (no_block)
 		return -ECHILD;
@@ -3475,14 +3492,16 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
 			if (rc >= 0)
 				transflag = SMK_INODE_TRANSMUTE;
 		}
-		/*
-		 * Don't let the exec or mmap label be "*" or "@".
-		 */
-		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
-		if (IS_ERR(skp) || skp == &smack_known_star ||
-		    skp == &smack_known_web)
-			skp = NULL;
-		isp->smk_task = skp;
+		if (!(sbsp->smk_flags & SMK_SB_UNTRUSTED)) {
+			/*
+			 * Don't let the exec or mmap label be "*" or "@".
+			 */
+			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
+			if (IS_ERR(skp) || skp == &smack_known_star ||
+			    skp == &smack_known_web)
+				skp = NULL;
+			isp->smk_task = skp;
+		}
 
 		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
 		if (IS_ERR(skp) || skp == &smack_known_star ||
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 7/7] selinux: Add support for unprivileged mounts from user namespaces
  2015-09-23 20:16 [PATCH v4 0/7] Initial support for user namespace owned mounts Seth Forshee
                   ` (5 preceding siblings ...)
  2015-09-23 20:16 ` [PATCH v4 6/7] Smack: Add support for unprivileged mounts from user namespaces Seth Forshee
@ 2015-09-23 20:16 ` Seth Forshee
  6 siblings, 0 replies; 23+ messages in thread
From: Seth Forshee @ 2015-09-23 20:16 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Paul Moore, Stephen Smalley,
	Eric Paris
  Cc: Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	Seth Forshee, James Morris, Serge E. Hallyn

Security labels from unprivileged mounts in user namespaces must
be ignored. Force superblocks from user namespaces whose labeling
behavior is to use xattrs to use mountpoint labeling instead.
For the mountpoint label, default to converting the current task
context into a form suitable for file objects, but also allow the
policy writer to specify a different label through policy
transition rules.

Pieced together from code snippets provided by Stephen Smalley.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 security/selinux/hooks.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index de05207eb665..09be1dc21e58 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -756,6 +756,28 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 			goto out;
 		}
 	}
+
+	/*
+	 * If this is a user namespace mount, no contexts are allowed
+	 * on the command line and security labels must be ignored.
+	 */
+	if (sb->s_user_ns != &init_user_ns) {
+		if (context_sid || fscontext_sid || rootcontext_sid ||
+		    defcontext_sid) {
+			rc = -EACCES;
+			goto out;
+		}
+		if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
+			sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
+			rc = security_transition_sid(current_sid(), current_sid(),
+						     SECCLASS_FILE, NULL,
+						     &sbsec->mntpoint_sid);
+			if (rc)
+				goto out;
+		}
+		goto out_set_opts;
+	}
+
 	/* sets the context of the superblock for the fs being mounted. */
 	if (fscontext_sid) {
 		rc = may_context_mount_sb_relabel(fscontext_sid, sbsec, cred);
@@ -824,6 +846,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 		sbsec->def_sid = defcontext_sid;
 	}
 
+out_set_opts:
 	rc = sb_finish_set_opts(sb);
 out:
 	mutex_unlock(&sbsec->lock);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 1/7] fs: Add user namesapace member to struct super_block
  2015-09-23 20:16 ` [PATCH v4 1/7] fs: Add user namesapace member to struct super_block Seth Forshee
@ 2015-09-24 21:14   ` Eric W. Biederman
  2015-09-25 12:54     ` Seth Forshee
  0 siblings, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2015-09-24 21:14 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, Jeff Layton, J. Bruce Fields, Serge Hallyn,
	Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd

Seth Forshee <seth.forshee@canonical.com> writes:

> Initially this will be used to eliminate the implicit MNT_NODEV
> flag for mounts from user namespaces. In the future it will also
> be used for translating ids and checking capabilities for
> filesystems mounted from user namespaces.
>
> s_user_ns is initialized in alloc_super() and is generally set to
> current_user_ns(). To avoid security and corruption issues, two
> additional mount checks are also added:
>
>  - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
>    in current_user_ns().
>
>  - sget() will fail with EBUSY when the filesystem it's looking
>    for is already mounted from another user namespace.
>
> proc requires some special handling. The user namespace of
> current isn't appropriate when forking as a result of clone (2)
> with CLONE_NEWPID|CLONE_NEWUSER, as it will set s_user_ns to the
> namespace of the parent and make proc unmountable in the new user
> namespace. Instead, the user namespace which owns the new pid
> namespace is used. sget_userns() is allowed to allow passing in
> a namespace other than that of current, and sget becomes a
> wrapper around sget_userns() which passes current_user_ns().

Minor nits below.  I have fixed them up.

> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> ---
>  fs/namespace.c     |  3 +++
>  fs/proc/root.c     |  3 ++-
>  fs/super.c         | 38 +++++++++++++++++++++++++++++++++-----
>  include/linux/fs.h |  9 ++++++++-
>  4 files changed, 46 insertions(+), 7 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 0570729c87fd..d023a353dc63 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2381,6 +2381,9 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
>  	struct vfsmount *mnt;
>  	int err;
>  
> +	if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
> +		return -EPERM;
> +
>  	if (!fstype)
>  		return -EINVAL;
>  
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index 361ab4ee42fc..4b302cbf13f9 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -117,7 +117,8 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
>  			return ERR_PTR(-EPERM);
>  	}
>  
> -	sb = sget(fs_type, proc_test_super, proc_set_super, flags, ns);
> +	sb = sget_userns(fs_type, proc_test_super, proc_set_super, flags,
> +			 ns->user_ns, ns);
>  	if (IS_ERR(sb))
>  		return ERR_CAST(sb);
>  
> diff --git a/fs/super.c b/fs/super.c
> index 954aeb80e202..42837da7d641 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -33,6 +33,7 @@
>  #include <linux/cleancache.h>
>  #include <linux/fsnotify.h>
>  #include <linux/lockdep.h>
> +#include <linux/user_namespace.h>
>  #include "internal.h"
>  
>  
> @@ -163,6 +164,7 @@ static void destroy_super(struct super_block *s)
>  {
>  	list_lru_destroy(&s->s_dentry_lru);
>  	list_lru_destroy(&s->s_inode_lru);
> +	put_user_ns(s->s_user_ns);
>  	security_sb_free(s);
>  	WARN_ON(!list_empty(&s->s_mounts));
>  	kfree(s->s_subtype);
> @@ -178,7 +180,8 @@ static void destroy_super(struct super_block *s)
>   *	Allocates and initializes a new &struct super_block.  alloc_super()
>   *	returns a pointer new superblock or %NULL if allocation had failed.
>   */
> -static struct super_block *alloc_super(struct file_system_type *type, int flags)
> +static struct super_block *alloc_super(struct file_system_type *type, int flags,
> +				       struct user_namespace *user_ns)
>  {
>  	struct super_block *s = kzalloc(sizeof(struct super_block),  GFP_USER);
>  	static const struct super_operations default_op;
> @@ -246,6 +249,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
>  	s->s_shrink.count_objects = super_cache_count;
>  	s->s_shrink.batch = 1024;
>  	s->s_shrink.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE;
> +
> +	s->s_user_ns = get_user_ns(user_ns);
>  	return s;
>  
>  fail:
> @@ -442,17 +447,17 @@ void generic_shutdown_super(struct super_block *sb)
>  EXPORT_SYMBOL(generic_shutdown_super);
>  
>  /**
> - *	sget	-	find or create a superblock
> + *	sget_userns -	find or create a superblock
>   *	@type:	filesystem type superblock should belong to
>   *	@test:	comparison callback
>   *	@set:	setup callback
>   *	@flags:	mount flags

You don't mention the user namespace parameter here.  I have fixed that
as.

  + *     @user_ns: User namespace you need CAP_SYS_ADMIN over to mount this fs.

>   *	@data:	argument to each of them
>   */
> -struct super_block *sget(struct file_system_type *type,
> +struct super_block *sget_userns(struct file_system_type *type,
>  			int (*test)(struct super_block *,void *),
>  			int (*set)(struct super_block *,void *),
> -			int flags,
> +			int flags, struct user_namespace *user_ns,
>  			void *data)
>  {
>  	struct super_block *s = NULL;
> @@ -465,6 +470,10 @@ retry:
>  		hlist_for_each_entry(old, &type->fs_supers, s_instances) {
>  			if (!test(old, data))
>  				continue;
> +			if (user_ns != old->s_user_ns) {
> +				spin_unlock(&sb_lock);
> +				return ERR_PTR(-EBUSY);
> +			}
>  			if (!grab_super(old))
>  				goto retry;
>  			if (s) {
> @@ -477,7 +486,7 @@ retry:
>  	}
>  	if (!s) {
>  		spin_unlock(&sb_lock);
> -		s = alloc_super(type, flags);
> +		s = alloc_super(type, flags, user_ns);
>  		if (!s)
>  			return ERR_PTR(-ENOMEM);
>  		goto retry;
> @@ -500,6 +509,25 @@ retry:
>  	return s;
>  }
>  
> +EXPORT_SYMBOL(sget_userns);
> +
> +/**
> + *	sget	-	find or create a superblock
> + *	@type:	  filesystem type superblock should belong to
> + *	@test:	  comparison callback
> + *	@set:	  setup callback
> + *	@flags:	  mount flags
> + *	@data:	  argument to each of them
> + */
> +struct super_block *sget(struct file_system_type *type,
> +			int (*test)(struct super_block *,void *),
> +			int (*set)(struct super_block *,void *),
> +			int flags,
> +			void *data)
> +{
> +	return sget_userns(type, test, set, flags, current_user_ns(), data);
> +}
> +
>  EXPORT_SYMBOL(sget);
>  
>  void drop_super(struct super_block *sb)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 72d8a844c692..79c15ab2159d 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -31,6 +31,7 @@
>  #include <linux/blk_types.h>
>  #include <linux/workqueue.h>
>  #include <linux/percpu-rwsem.h>
> +#include <linux/user_namespace.h>
>  
>  #include <asm/byteorder.h>
>  #include <uapi/linux/fs.h>
> @@ -1367,6 +1368,8 @@ struct super_block {
>  	struct workqueue_struct *s_dio_done_wq;
>  	struct hlist_head s_pins;
>  
> +	struct user_namespace *s_user_ns;
> +
>  	/*
>  	 * Keep the lru lists last in the structure so they always sit on their
>  	 * own individual cachelines.
> @@ -1509,7 +1512,6 @@ static inline void sb_start_intwrite(struct super_block *sb)
>  	__sb_start_write(sb, SB_FREEZE_FS, true);
>  }
>  
> -

You are unncessarily deleting a line here.

>  extern bool inode_owner_or_capable(const struct inode *inode);
>  
>  /*
> @@ -1984,6 +1986,11 @@ void deactivate_locked_super(struct super_block *sb);
>  int set_anon_super(struct super_block *s, void *data);
>  int get_anon_bdev(dev_t *);
>  void free_anon_bdev(dev_t);
> +struct super_block *sget_userns(struct file_system_type *type,
> +			int (*test)(struct super_block *,void *),
> +			int (*set)(struct super_block *,void *),
> +			int flags, struct user_namespace *user_ns,
> +			void *data);
>  struct super_block *sget(struct file_system_type *type,
>  			int (*test)(struct super_block *,void *),
>  			int (*set)(struct super_block *,void *),

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 3/7] fs: Verify access of user towards block device file when mounting
  2015-09-23 20:16 ` [PATCH v4 3/7] fs: Verify access of user towards block device file when mounting Seth Forshee
@ 2015-09-24 21:53   ` Eric W. Biederman
  2015-09-25 12:48     ` Seth Forshee
  0 siblings, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2015-09-24 21:53 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, David Woodhouse, Brian Norris, Jeff Layton,
	J. Bruce Fields, Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd

Seth Forshee <seth.forshee@canonical.com> writes:

> When mounting a filesystem on a block device there is currently
> no verification that the user has appropriate access to the
> device file passed to mount. This has not been an issue so far
> since the user in question has always been root, but this must
> be changed before allowing unprivileged users to mount in user
> namespaces.
>
> To do this, a new version of lookup_bdev() is added named
> lookup_bdev_perm(). Both of these functions become wrappers
> around a common inner fucntion. The behavior of lookup_bdev() is
> unchanged, but calling lookup_bdev_perm() will fail if the user
> does not have the specified access rights to the supplied path.
> The permission check is skipped if the user has CAP_SYS_ADMIN to
> avoid any possible regressions in behavior.
>
> blkdev_get_by_path() is updated to use lookup_bdev_perm(). This
> is used by mount_bdev() and mount_mtd(), so this will cause
> mounts on block devices to fail when the user lacks the required
> permissions. Other calls to blkdev_get_by_path() will all happen
> with root privileges, so these calls will be unaffected.

Good but buggy patch.

In the mtd bits the flags are super flags, not file mode bits,
which makes testing them against FMODE_READ and FMODE_WRITE is
incorrect.

Further it looks like quite a few more possibly all of the lookup_bdev
instances could use inode level permission checking.

Certainly code such as quotactl makes me wonder.

Ugh.  Your code needs to be using __inode_permission and not
inode_permission.

The readability or writability of a device node has little or nothing to
do with the readability/writability of the superblock.   So I don't see
any reason we should be checking that.

Eric
>
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> ---
>  drivers/mtd/mtdsuper.c |  7 ++++++-
>  fs/block_dev.c         | 52 ++++++++++++++++++++++++++++++++++++++++----------
>  include/linux/fs.h     |  1 +
>  3 files changed, 49 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
> index 20c02a3b7417..a3970b0506b9 100644
> --- a/drivers/mtd/mtdsuper.c
> +++ b/drivers/mtd/mtdsuper.c
> @@ -125,6 +125,7 @@ struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
>  #ifdef CONFIG_BLOCK
>  	struct block_device *bdev;
>  	int ret, major;
> +	int perm = 0;
>  #endif
>  	int mtdnr;
>  
> @@ -176,7 +177,11 @@ struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
>  	/* try the old way - the hack where we allowed users to mount
>  	 * /dev/mtdblock$(n) but didn't actually _use_ the blockdev
>  	 */
> -	bdev = lookup_bdev(dev_name);
> +	if (flags & FMODE_READ)
> +		perm |= MAY_READ;
> +	if (flags & FMODE_WRITE)
> +		perm |= MAY_WRITE;
> +	bdev = lookup_bdev_perm(dev_name, perm);
>  	if (IS_ERR(bdev)) {
>  		ret = PTR_ERR(bdev);
>  		pr_debug("MTDSB: lookup_bdev() returned %d\n", ret);
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 46bd98482f71..7ca0f8911e97 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1401,9 +1401,14 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
>  					void *holder)
>  {
>  	struct block_device *bdev;
> +	int perm = 0;
>  	int err;
>  
> -	bdev = lookup_bdev(path);
> +	if (mode & FMODE_READ)
> +		perm |= MAY_READ;
> +	if (mode & FMODE_WRITE)
> +		perm |= MAY_WRITE;
> +	bdev = lookup_bdev_perm(path, perm);
>  	if (IS_ERR(bdev))
>  		return bdev;
>  
> @@ -1710,15 +1715,8 @@ int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
>  
>  EXPORT_SYMBOL(ioctl_by_bdev);
>  
> -/**
> - * lookup_bdev  - lookup a struct block_device by name
> - * @pathname:	special file representing the block device
> - *
> - * Get a reference to the blockdevice at @pathname in the current
> - * namespace if possible and return it.  Return ERR_PTR(error)
> - * otherwise.
> - */
> -struct block_device *lookup_bdev(const char *pathname)
> +static struct block_device *__lookup_bdev(const char *pathname, int mask,
> +					  bool check_perm)
>  {
>  	struct block_device *bdev;
>  	struct inode *inode;
> @@ -1733,6 +1731,11 @@ struct block_device *lookup_bdev(const char *pathname)
>  		return ERR_PTR(error);
>  
>  	inode = d_backing_inode(path.dentry);
> +	if (check_perm && !capable(CAP_SYS_ADMIN)) {
> +		error = inode_permission(inode, mask);
> +		if (error)
> +			goto fail;
> +	}
>  	error = -ENOTBLK;
>  	if (!S_ISBLK(inode->i_mode))
>  		goto fail;
> @@ -1750,6 +1753,35 @@ fail:
>  	bdev = ERR_PTR(error);
>  	goto out;
>  }
> +
> +/*
> + * lookup_bdev_perm - lookup a struct block_device by name and check for
> + *		      access rights
> + * @pathname:	special file representing the block device
> + * @mask:	rights to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
> + *
> + * Get a reference to the blockdevice at @pathname in the current
> + * namespace if possible, check for the specified rights to the device
> + * at @pathname, and return it.  Return ERR_PTR(error) otherwise.
> + */
> +struct block_device *lookup_bdev_perm(const char *pathname, int mask)
> +{
> +	return __lookup_bdev(pathname, mask, true);
> +}
> +EXPORT_SYMBOL(lookup_bdev_perm);
> +
> +/**
> + * lookup_bdev  - lookup a struct block_device by name
> + * @pathname:	special file representing the block device
> + *
> + * Get a reference to the blockdevice at @pathname in the current
> + * namespace if possible and return it.  Return ERR_PTR(error)
> + * otherwise.
> + */
> +struct block_device *lookup_bdev(const char *pathname)
> +{
> +	return __lookup_bdev(pathname, 0, false);
> +}
>  EXPORT_SYMBOL(lookup_bdev);
>  
>  int __invalidate_device(struct block_device *bdev, bool kill_dirty)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 5ec201e8308c..d03c2061bdd2 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2383,6 +2383,7 @@ static inline void unregister_chrdev(unsigned int major, const char *name)
>  #define BLKDEV_MAJOR_HASH_SIZE	255
>  extern const char *__bdevname(dev_t, char *buffer);
>  extern const char *bdevname(struct block_device *bdev, char *buffer);
> +extern struct block_device *lookup_bdev_perm(const char *, int);
>  extern struct block_device *lookup_bdev(const char *);
>  extern void blkdev_show(struct seq_file *,off_t);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 4/7] fs: Limit file caps to the user namespace of the super block
  2015-09-23 20:16 ` [PATCH v4 4/7] fs: Limit file caps to the user namespace of the super block Seth Forshee
@ 2015-09-24 21:59   ` Eric W. Biederman
  2015-09-25 12:49     ` Seth Forshee
  0 siblings, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2015-09-24 21:59 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, Serge Hallyn, James Morris, Serge E. Hallyn,
	Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd

Seth Forshee <seth.forshee@canonical.com> writes:

> Capability sets attached to files must be ignored except in the
> user namespaces where the mounter is privileged, i.e. s_user_ns
> and its descendants. Otherwise a vector exists for gaining
> privileges in namespaces where a user is not already privileged.
>
> Add a new helper function, in_user_ns(), to test whether a user
> namespace is the same as or a descendant of another namespace.
> Use this helper to determine whether a file's capability set
> should be applied to the caps constructed during exec.

No issues with this but given that we always pass current_user_ns()
we may want to simplify the users of in_user_ns by renaming it
current_in_user_ns() and hard codeing current_user_ns().

Eric


> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> ---
>  include/linux/user_namespace.h |  8 ++++++++
>  kernel/user_namespace.c        | 14 ++++++++++++++
>  security/commoncap.c           |  2 ++
>  3 files changed, 24 insertions(+)
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 8297e5b341d8..a43faa727124 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -72,6 +72,8 @@ extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t,
>  extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
>  extern int proc_setgroups_show(struct seq_file *m, void *v);
>  extern bool userns_may_setgroups(const struct user_namespace *ns);
> +extern bool in_userns(const struct user_namespace *ns,
> +		      const struct user_namespace *target_ns);
>  #else
>  
>  static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
> @@ -100,6 +102,12 @@ static inline bool userns_may_setgroups(const struct user_namespace *ns)
>  {
>  	return true;
>  }
> +
> +static inline bool in_userns(const struct user_namespace *ns,
> +			     const struct user_namespace *target_ns)
> +{
> +	return true;
> +}
>  #endif
>  
>  #endif /* _LINUX_USER_H */
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 88fefa68c516..69fbc377357b 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -945,6 +945,20 @@ bool userns_may_setgroups(const struct user_namespace *ns)
>  	return allowed;
>  }
>  
> +/*
> + * Returns true if @ns is the same namespace as or a descendant of
> + * @target_ns.
> + */
> +bool in_userns(const struct user_namespace *ns,
> +	       const struct user_namespace *target_ns)
> +{
> +	for (; ns; ns = ns->parent) {
> +		if (ns == target_ns)
> +			return true;
> +	}
> +	return false;
> +}
> +
>  static inline struct user_namespace *to_user_ns(struct ns_common *ns)
>  {
>  	return container_of(ns, struct user_namespace, ns);
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 1832cf701c3d..400aa224b491 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -450,6 +450,8 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
>  
>  	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
>  		return 0;
> +	if (!in_userns(current_user_ns(), bprm->file->f_path.mnt->mnt_sb->s_user_ns))
> +		return 0;
>  
>  	rc = get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
>  	if (rc < 0) {

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 6/7] Smack: Add support for unprivileged mounts from user namespaces
  2015-09-23 20:16 ` [PATCH v4 6/7] Smack: Add support for unprivileged mounts from user namespaces Seth Forshee
@ 2015-09-24 22:16   ` Eric W. Biederman
  2015-09-24 22:34   ` Casey Schaufler
  2015-09-27 19:30   ` Eric W. Biederman
  2 siblings, 0 replies; 23+ messages in thread
From: Eric W. Biederman @ 2015-09-24 22:16 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, Casey Schaufler, Serge Hallyn, Andy Lutomirski,
	linux-fsdevel, linux-security-module, selinux, linux-kernel,
	linux-mtd, James Morris, Serge E. Hallyn

Seth Forshee <seth.forshee@canonical.com> writes:

> Security labels from unprivileged mounts cannot be trusted.
> Ideally for these mounts we would assign the objects in the
> filesystem the same label as the inode for the backing device
> passed to mount. Unfortunately it's currently impossible to
> determine which inode this is from the LSM mount hooks, so we
> settle for the label of the process doing the mount.
>
> This label is assigned to s_root, and also to smk_default to
> ensure that new inodes receive this label. The transmute property
> is also set on s_root to make this behavior more explicit, even
> though it is technically not necessary.
>
> If a filesystem has existing security labels, access to inodes is
> permitted if the label is the same as smk_root, otherwise access
> is denied. The SMACK64EXEC xattr is completely ignored.
>
> Explicit setting of security labels continues to require
> CAP_MAC_ADMIN in init_user_ns.
>
> Altogether, this ensures that filesystem objects are not
> accessible to subjects which cannot already access the backing
> store, that MAC is not violated for any objects in the fileystem
> which are already labeled, and that a user cannot use an
> unprivileged mount to gain elevated MAC privileges.
>
> sysfs, tmpfs, and ramfs are already mountable from user
> namespaces and support security labels. We can't rule out the
> possibility that these filesystems may already be used in mounts
> from user namespaces with security lables set from the init
> namespace, so failing to trust lables in these filesystems may
> introduce regressions. It is safe to trust labels from these
> filesystems, since the unprivileged user does not control the
> backing store and thus cannot supply security labels, so an
> explicit exception is made to trust labels from these
> filesystems.

Casey can I get your ack on this patch?  Or do you still have concerns?

Eric

> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> ---
>  security/smack/smack.h     |  8 +++++++-
>  security/smack/smack_lsm.c | 41 ++++++++++++++++++++++++++++++-----------
>  2 files changed, 37 insertions(+), 12 deletions(-)
>
> diff --git a/security/smack/smack.h b/security/smack/smack.h
> index fff0c612bbb7..f95759015f29 100644
> --- a/security/smack/smack.h
> +++ b/security/smack/smack.h
> @@ -90,9 +90,15 @@ struct superblock_smack {
>  	struct smack_known	*smk_floor;
>  	struct smack_known	*smk_hat;
>  	struct smack_known	*smk_default;
> -	int			smk_initialized;
> +	int			smk_flags;
>  };
>  
> +/*
> + * Superblock flags
> + */
> +#define SMK_SB_INITIALIZED	0x01
> +#define SMK_SB_UNTRUSTED	0x02
> +
>  struct socket_smack {
>  	struct smack_known	*smk_out;	/* outbound label */
>  	struct smack_known	*smk_in;	/* inbound label */
> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index 996c88956438..621200f86b56 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -521,7 +521,7 @@ static int smack_sb_alloc_security(struct super_block *sb)
>  	sbsp->smk_floor = &smack_known_floor;
>  	sbsp->smk_hat = &smack_known_hat;
>  	/*
> -	 * smk_initialized will be zero from kzalloc.
> +	 * SMK_SB_INITIALIZED will be zero from kzalloc.
>  	 */
>  	sb->s_security = sbsp;
>  
> @@ -738,10 +738,10 @@ static int smack_set_mnt_opts(struct super_block *sb,
>  	int num_opts = opts->num_mnt_opts;
>  	int transmute = 0;
>  
> -	if (sp->smk_initialized)
> +	if (sp->smk_flags & SMK_SB_INITIALIZED)
>  		return 0;
>  
> -	sp->smk_initialized = 1;
> +	sp->smk_flags |= SMK_SB_INITIALIZED;
>  
>  	for (i = 0; i < num_opts; i++) {
>  		switch (opts->mnt_opts_flags[i]) {
> @@ -793,6 +793,17 @@ static int smack_set_mnt_opts(struct super_block *sb,
>  		skp = smk_of_current();
>  		sp->smk_root = skp;
>  		sp->smk_default = skp;
> +		/*
> +		 * For a handful of fs types with no user-controlled
> +		 * backing store it's okay to trust security labels
> +		 * in the filesystem. The rest are untrusted.
> +		 */
> +		if (sb->s_user_ns != &init_user_ns &&
> +		    sb->s_magic != SYSFS_MAGIC && sb->s_magic != TMPFS_MAGIC &&
> +		    sb->s_magic != RAMFS_MAGIC) {
> +			transmute = 1;
> +			sp->smk_flags |= SMK_SB_UNTRUSTED;
> +		}
>  	}
>  
>  	/*
> @@ -1175,6 +1186,7 @@ static int smack_inode_rename(struct inode *old_inode,
>   */
>  static int smack_inode_permission(struct inode *inode, int mask)
>  {
> +	struct superblock_smack *sbsp = inode->i_sb->s_security;
>  	struct smk_audit_info ad;
>  	int no_block = mask & MAY_NOT_BLOCK;
>  	int rc;
> @@ -1186,6 +1198,11 @@ static int smack_inode_permission(struct inode *inode, int mask)
>  	if (mask == 0)
>  		return 0;
>  
> +	if (sbsp->smk_flags & SMK_SB_UNTRUSTED) {
> +		if (smk_of_inode(inode) != sbsp->smk_root)
> +			return -EACCES;
> +	}
> +
>  	/* May be droppable after audit */
>  	if (no_block)
>  		return -ECHILD;
> @@ -3475,14 +3492,16 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
>  			if (rc >= 0)
>  				transflag = SMK_INODE_TRANSMUTE;
>  		}
> -		/*
> -		 * Don't let the exec or mmap label be "*" or "@".
> -		 */
> -		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> -		if (IS_ERR(skp) || skp == &smack_known_star ||
> -		    skp == &smack_known_web)
> -			skp = NULL;
> -		isp->smk_task = skp;
> +		if (!(sbsp->smk_flags & SMK_SB_UNTRUSTED)) {
> +			/*
> +			 * Don't let the exec or mmap label be "*" or "@".
> +			 */
> +			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> +			if (IS_ERR(skp) || skp == &smack_known_star ||
> +			    skp == &smack_known_web)
> +				skp = NULL;
> +			isp->smk_task = skp;
> +		}
>  
>  		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
>  		if (IS_ERR(skp) || skp == &smack_known_star ||

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 6/7] Smack: Add support for unprivileged mounts from user namespaces
  2015-09-23 20:16 ` [PATCH v4 6/7] Smack: Add support for unprivileged mounts from user namespaces Seth Forshee
  2015-09-24 22:16   ` Eric W. Biederman
@ 2015-09-24 22:34   ` Casey Schaufler
  2015-09-27 19:30   ` Eric W. Biederman
  2 siblings, 0 replies; 23+ messages in thread
From: Casey Schaufler @ 2015-09-24 22:34 UTC (permalink / raw)
  To: Seth Forshee, Eric W. Biederman, Alexander Viro
  Cc: Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	James Morris, Serge E. Hallyn, Casey Schaufler

On 9/23/2015 1:16 PM, Seth Forshee wrote:
> Security labels from unprivileged mounts cannot be trusted.
> Ideally for these mounts we would assign the objects in the
> filesystem the same label as the inode for the backing device
> passed to mount. Unfortunately it's currently impossible to
> determine which inode this is from the LSM mount hooks, so we
> settle for the label of the process doing the mount.
>
> This label is assigned to s_root, and also to smk_default to
> ensure that new inodes receive this label. The transmute property
> is also set on s_root to make this behavior more explicit, even
> though it is technically not necessary.
>
> If a filesystem has existing security labels, access to inodes is
> permitted if the label is the same as smk_root, otherwise access
> is denied. The SMACK64EXEC xattr is completely ignored.
>
> Explicit setting of security labels continues to require
> CAP_MAC_ADMIN in init_user_ns.
>
> Altogether, this ensures that filesystem objects are not
> accessible to subjects which cannot already access the backing
> store, that MAC is not violated for any objects in the fileystem
> which are already labeled, and that a user cannot use an
> unprivileged mount to gain elevated MAC privileges.
>
> sysfs, tmpfs, and ramfs are already mountable from user
> namespaces and support security labels. We can't rule out the
> possibility that these filesystems may already be used in mounts
> from user namespaces with security lables set from the init
> namespace, so failing to trust lables in these filesystems may
> introduce regressions. It is safe to trust labels from these
> filesystems, since the unprivileged user does not control the
> backing store and thus cannot supply security labels, so an
> explicit exception is made to trust labels from these
> filesystems.
>
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

Acked-by: Casey Schaufler <casey@schaufler-ca.com>

> ---
>  security/smack/smack.h     |  8 +++++++-
>  security/smack/smack_lsm.c | 41 ++++++++++++++++++++++++++++++-----------
>  2 files changed, 37 insertions(+), 12 deletions(-)
>
> diff --git a/security/smack/smack.h b/security/smack/smack.h
> index fff0c612bbb7..f95759015f29 100644
> --- a/security/smack/smack.h
> +++ b/security/smack/smack.h
> @@ -90,9 +90,15 @@ struct superblock_smack {
>  	struct smack_known	*smk_floor;
>  	struct smack_known	*smk_hat;
>  	struct smack_known	*smk_default;
> -	int			smk_initialized;
> +	int			smk_flags;
>  };
>  
> +/*
> + * Superblock flags
> + */
> +#define SMK_SB_INITIALIZED	0x01
> +#define SMK_SB_UNTRUSTED	0x02
> +
>  struct socket_smack {
>  	struct smack_known	*smk_out;	/* outbound label */
>  	struct smack_known	*smk_in;	/* inbound label */
> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index 996c88956438..621200f86b56 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -521,7 +521,7 @@ static int smack_sb_alloc_security(struct super_block *sb)
>  	sbsp->smk_floor = &smack_known_floor;
>  	sbsp->smk_hat = &smack_known_hat;
>  	/*
> -	 * smk_initialized will be zero from kzalloc.
> +	 * SMK_SB_INITIALIZED will be zero from kzalloc.
>  	 */
>  	sb->s_security = sbsp;
>  
> @@ -738,10 +738,10 @@ static int smack_set_mnt_opts(struct super_block *sb,
>  	int num_opts = opts->num_mnt_opts;
>  	int transmute = 0;
>  
> -	if (sp->smk_initialized)
> +	if (sp->smk_flags & SMK_SB_INITIALIZED)
>  		return 0;
>  
> -	sp->smk_initialized = 1;
> +	sp->smk_flags |= SMK_SB_INITIALIZED;
>  
>  	for (i = 0; i < num_opts; i++) {
>  		switch (opts->mnt_opts_flags[i]) {
> @@ -793,6 +793,17 @@ static int smack_set_mnt_opts(struct super_block *sb,
>  		skp = smk_of_current();
>  		sp->smk_root = skp;
>  		sp->smk_default = skp;
> +		/*
> +		 * For a handful of fs types with no user-controlled
> +		 * backing store it's okay to trust security labels
> +		 * in the filesystem. The rest are untrusted.
> +		 */
> +		if (sb->s_user_ns != &init_user_ns &&
> +		    sb->s_magic != SYSFS_MAGIC && sb->s_magic != TMPFS_MAGIC &&
> +		    sb->s_magic != RAMFS_MAGIC) {
> +			transmute = 1;
> +			sp->smk_flags |= SMK_SB_UNTRUSTED;
> +		}
>  	}
>  
>  	/*
> @@ -1175,6 +1186,7 @@ static int smack_inode_rename(struct inode *old_inode,
>   */
>  static int smack_inode_permission(struct inode *inode, int mask)
>  {
> +	struct superblock_smack *sbsp = inode->i_sb->s_security;
>  	struct smk_audit_info ad;
>  	int no_block = mask & MAY_NOT_BLOCK;
>  	int rc;
> @@ -1186,6 +1198,11 @@ static int smack_inode_permission(struct inode *inode, int mask)
>  	if (mask == 0)
>  		return 0;
>  
> +	if (sbsp->smk_flags & SMK_SB_UNTRUSTED) {
> +		if (smk_of_inode(inode) != sbsp->smk_root)
> +			return -EACCES;
> +	}
> +
>  	/* May be droppable after audit */
>  	if (no_block)
>  		return -ECHILD;
> @@ -3475,14 +3492,16 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
>  			if (rc >= 0)
>  				transflag = SMK_INODE_TRANSMUTE;
>  		}
> -		/*
> -		 * Don't let the exec or mmap label be "*" or "@".
> -		 */
> -		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> -		if (IS_ERR(skp) || skp == &smack_known_star ||
> -		    skp == &smack_known_web)
> -			skp = NULL;
> -		isp->smk_task = skp;
> +		if (!(sbsp->smk_flags & SMK_SB_UNTRUSTED)) {
> +			/*
> +			 * Don't let the exec or mmap label be "*" or "@".
> +			 */
> +			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> +			if (IS_ERR(skp) || skp == &smack_known_star ||
> +			    skp == &smack_known_web)
> +				skp = NULL;
> +			isp->smk_task = skp;
> +		}
>  
>  		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
>  		if (IS_ERR(skp) || skp == &smack_known_star ||


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 3/7] fs: Verify access of user towards block device file when mounting
  2015-09-24 21:53   ` Eric W. Biederman
@ 2015-09-25 12:48     ` Seth Forshee
  2015-09-25 17:16       ` Eric W. Biederman
  0 siblings, 1 reply; 23+ messages in thread
From: Seth Forshee @ 2015-09-25 12:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Viro, David Woodhouse, Brian Norris, Jeff Layton,
	J. Bruce Fields, Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd

On Thu, Sep 24, 2015 at 04:53:11PM -0500, Eric W. Biederman wrote:
> Seth Forshee <seth.forshee@canonical.com> writes:
> 
> > When mounting a filesystem on a block device there is currently
> > no verification that the user has appropriate access to the
> > device file passed to mount. This has not been an issue so far
> > since the user in question has always been root, but this must
> > be changed before allowing unprivileged users to mount in user
> > namespaces.
> >
> > To do this, a new version of lookup_bdev() is added named
> > lookup_bdev_perm(). Both of these functions become wrappers
> > around a common inner fucntion. The behavior of lookup_bdev() is
> > unchanged, but calling lookup_bdev_perm() will fail if the user
> > does not have the specified access rights to the supplied path.
> > The permission check is skipped if the user has CAP_SYS_ADMIN to
> > avoid any possible regressions in behavior.
> >
> > blkdev_get_by_path() is updated to use lookup_bdev_perm(). This
> > is used by mount_bdev() and mount_mtd(), so this will cause
> > mounts on block devices to fail when the user lacks the required
> > permissions. Other calls to blkdev_get_by_path() will all happen
> > with root privileges, so these calls will be unaffected.
> 
> Good but buggy patch.
> 
> In the mtd bits the flags are super flags, not file mode bits,
> which makes testing them against FMODE_READ and FMODE_WRITE is
> incorrect.

Bah, yes. Fixed.

> Further it looks like quite a few more possibly all of the lookup_bdev
> instances could use inode level permission checking.
> 
> Certainly code such as quotactl makes me wonder.

I opted to stick to places related to mounting, but let's take a look at
the other callers.

bcache calls it in the context of sysfs writes, and those attributes are
writable only by root. In that case the inode permission check will be
skipped anyway, so it makes no difference either way.

Device mapper calls it in dm_get_device, which is called from a bunch of
places. I had started trying to walk back through all the callers of
dm_get_device, but that rabbit hole got really deep really quickly so I
didn't feel confident that changing it wouldn't break anyone.

quotactl gave me pause, as it seems to have done for you too. I was
surprised that inode permissions aren't checked, but
check_quotactl_permission does get called before actually doing
anything. I fear that adding a check of inode permissions could end up
breaking someone.

> Ugh.  Your code needs to be using __inode_permission and not
> inode_permission.
> 
> The readability or writability of a device node has little or nothing to
> do with the readability/writability of the superblock.   So I don't see
> any reason we should be checking that.

Makes sense, fixed.

Thanks,
Seth


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 4/7] fs: Limit file caps to the user namespace of the super block
  2015-09-24 21:59   ` Eric W. Biederman
@ 2015-09-25 12:49     ` Seth Forshee
  2015-09-25 17:57       ` Eric W. Biederman
  0 siblings, 1 reply; 23+ messages in thread
From: Seth Forshee @ 2015-09-25 12:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Viro, Serge Hallyn, James Morris, Serge E. Hallyn,
	Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd

On Thu, Sep 24, 2015 at 04:59:35PM -0500, Eric W. Biederman wrote:
> Seth Forshee <seth.forshee@canonical.com> writes:
> 
> > Capability sets attached to files must be ignored except in the
> > user namespaces where the mounter is privileged, i.e. s_user_ns
> > and its descendants. Otherwise a vector exists for gaining
> > privileges in namespaces where a user is not already privileged.
> >
> > Add a new helper function, in_user_ns(), to test whether a user
> > namespace is the same as or a descendant of another namespace.
> > Use this helper to determine whether a file's capability set
> > should be applied to the caps constructed during exec.
> 
> No issues with this but given that we always pass current_user_ns()
> we may want to simplify the users of in_user_ns by renaming it
> current_in_user_ns() and hard codeing current_user_ns().

Sure, if that's what you prefer then I'll change it.

Seth

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 1/7] fs: Add user namesapace member to struct super_block
  2015-09-24 21:14   ` Eric W. Biederman
@ 2015-09-25 12:54     ` Seth Forshee
  2015-09-25 17:27       ` Eric W. Biederman
  0 siblings, 1 reply; 23+ messages in thread
From: Seth Forshee @ 2015-09-25 12:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Viro, Jeff Layton, J. Bruce Fields, Serge Hallyn,
	Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd

On Thu, Sep 24, 2015 at 04:14:33PM -0500, Eric W. Biederman wrote:
> Seth Forshee <seth.forshee@canonical.com> writes:
> 
> > Initially this will be used to eliminate the implicit MNT_NODEV
> > flag for mounts from user namespaces. In the future it will also
> > be used for translating ids and checking capabilities for
> > filesystems mounted from user namespaces.
> >
> > s_user_ns is initialized in alloc_super() and is generally set to
> > current_user_ns(). To avoid security and corruption issues, two
> > additional mount checks are also added:
> >
> >  - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
> >    in current_user_ns().
> >
> >  - sget() will fail with EBUSY when the filesystem it's looking
> >    for is already mounted from another user namespace.
> >
> > proc requires some special handling. The user namespace of
> > current isn't appropriate when forking as a result of clone (2)
> > with CLONE_NEWPID|CLONE_NEWUSER, as it will set s_user_ns to the
> > namespace of the parent and make proc unmountable in the new user
> > namespace. Instead, the user namespace which owns the new pid
> > namespace is used. sget_userns() is allowed to allow passing in
> > a namespace other than that of current, and sget becomes a
> > wrapper around sget_userns() which passes current_user_ns().
> 
> Minor nits below.  I have fixed them up.
> 
> > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> > ---
> >  fs/namespace.c     |  3 +++
> >  fs/proc/root.c     |  3 ++-
> >  fs/super.c         | 38 +++++++++++++++++++++++++++++++++-----
> >  include/linux/fs.h |  9 ++++++++-
> >  4 files changed, 46 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 0570729c87fd..d023a353dc63 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -2381,6 +2381,9 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
> >  	struct vfsmount *mnt;
> >  	int err;
> >  
> > +	if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
> > +		return -EPERM;
> > +
> >  	if (!fstype)
> >  		return -EINVAL;
> >  
> > diff --git a/fs/proc/root.c b/fs/proc/root.c
> > index 361ab4ee42fc..4b302cbf13f9 100644
> > --- a/fs/proc/root.c
> > +++ b/fs/proc/root.c
> > @@ -117,7 +117,8 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
> >  			return ERR_PTR(-EPERM);
> >  	}
> >  
> > -	sb = sget(fs_type, proc_test_super, proc_set_super, flags, ns);
> > +	sb = sget_userns(fs_type, proc_test_super, proc_set_super, flags,
> > +			 ns->user_ns, ns);
> >  	if (IS_ERR(sb))
> >  		return ERR_CAST(sb);
> >  
> > diff --git a/fs/super.c b/fs/super.c
> > index 954aeb80e202..42837da7d641 100644
> > --- a/fs/super.c
> > +++ b/fs/super.c
> > @@ -33,6 +33,7 @@
> >  #include <linux/cleancache.h>
> >  #include <linux/fsnotify.h>
> >  #include <linux/lockdep.h>
> > +#include <linux/user_namespace.h>
> >  #include "internal.h"
> >  
> >  
> > @@ -163,6 +164,7 @@ static void destroy_super(struct super_block *s)
> >  {
> >  	list_lru_destroy(&s->s_dentry_lru);
> >  	list_lru_destroy(&s->s_inode_lru);
> > +	put_user_ns(s->s_user_ns);
> >  	security_sb_free(s);
> >  	WARN_ON(!list_empty(&s->s_mounts));
> >  	kfree(s->s_subtype);
> > @@ -178,7 +180,8 @@ static void destroy_super(struct super_block *s)
> >   *	Allocates and initializes a new &struct super_block.  alloc_super()
> >   *	returns a pointer new superblock or %NULL if allocation had failed.
> >   */
> > -static struct super_block *alloc_super(struct file_system_type *type, int flags)
> > +static struct super_block *alloc_super(struct file_system_type *type, int flags,
> > +				       struct user_namespace *user_ns)
> >  {
> >  	struct super_block *s = kzalloc(sizeof(struct super_block),  GFP_USER);
> >  	static const struct super_operations default_op;
> > @@ -246,6 +249,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
> >  	s->s_shrink.count_objects = super_cache_count;
> >  	s->s_shrink.batch = 1024;
> >  	s->s_shrink.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE;
> > +
> > +	s->s_user_ns = get_user_ns(user_ns);
> >  	return s;
> >  
> >  fail:
> > @@ -442,17 +447,17 @@ void generic_shutdown_super(struct super_block *sb)
> >  EXPORT_SYMBOL(generic_shutdown_super);
> >  
> >  /**
> > - *	sget	-	find or create a superblock
> > + *	sget_userns -	find or create a superblock
> >   *	@type:	filesystem type superblock should belong to
> >   *	@test:	comparison callback
> >   *	@set:	setup callback
> >   *	@flags:	mount flags
> 
> You don't mention the user namespace parameter here.  I have fixed that
> as.
> 
>   + *     @user_ns: User namespace you need CAP_SYS_ADMIN over to mount this fs.

Looks good, thanks. Seems I also missed it in alloc_super though.

Seth


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 3/7] fs: Verify access of user towards block device file when mounting
  2015-09-25 12:48     ` Seth Forshee
@ 2015-09-25 17:16       ` Eric W. Biederman
  2015-09-25 17:39         ` Seth Forshee
  0 siblings, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2015-09-25 17:16 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, David Woodhouse, Brian Norris, Jeff Layton,
	J. Bruce Fields, Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd


Argh.  This looks like morning person meets night owl.

Seth Forshee <seth.forshee@canonical.com> writes:

> On Thu, Sep 24, 2015 at 04:53:11PM -0500, Eric W. Biederman wrote:
>> Seth Forshee <seth.forshee@canonical.com> writes:
>> 
>> > When mounting a filesystem on a block device there is currently
>> > no verification that the user has appropriate access to the
>> > device file passed to mount. This has not been an issue so far
>> > since the user in question has always been root, but this must
>> > be changed before allowing unprivileged users to mount in user
>> > namespaces.
>> >
>> > To do this, a new version of lookup_bdev() is added named
>> > lookup_bdev_perm(). Both of these functions become wrappers
>> > around a common inner fucntion. The behavior of lookup_bdev() is
>> > unchanged, but calling lookup_bdev_perm() will fail if the user
>> > does not have the specified access rights to the supplied path.
>> > The permission check is skipped if the user has CAP_SYS_ADMIN to
>> > avoid any possible regressions in behavior.
>> >
>> > blkdev_get_by_path() is updated to use lookup_bdev_perm(). This
>> > is used by mount_bdev() and mount_mtd(), so this will cause
>> > mounts on block devices to fail when the user lacks the required
>> > permissions. Other calls to blkdev_get_by_path() will all happen
>> > with root privileges, so these calls will be unaffected.
>> 
>> Good but buggy patch.
>> 
>> In the mtd bits the flags are super flags, not file mode bits,
>> which makes testing them against FMODE_READ and FMODE_WRITE is
>> incorrect.
>
> Bah, yes. Fixed.
>
>> Further it looks like quite a few more possibly all of the lookup_bdev
>> instances could use inode level permission checking.
>> 
>> Certainly code such as quotactl makes me wonder.
>
> I opted to stick to places related to mounting, but let's take a look at
> the other callers.
>
> bcache calls it in the context of sysfs writes, and those attributes are
> writable only by root. In that case the inode permission check will be
> skipped anyway, so it makes no difference either way.
>
> Device mapper calls it in dm_get_device, which is called from a bunch of
> places. I had started trying to walk back through all the callers of
> dm_get_device, but that rabbit hole got really deep really quickly so I
> didn't feel confident that changing it wouldn't break anyone.
>
> quotactl gave me pause, as it seems to have done for you too. I was
> surprised that inode permissions aren't checked, but
> check_quotactl_permission does get called before actually doing
> anything. I fear that adding a check of inode permissions could end up
> breaking someone.

My gut feel on all of this is that we should act like may_open and have
have a flag of 0 for access mode mean don't check permissions.

That way we can change all of the callers of lookup_bdev to pass an
additional argument and make it explicit what is going on but we don't
actually have to change the callers to actually perform an additional
check.

Leaving stones unturned is a good way to introduce a security hole by
accident so I don't want to leave dm_get_device unreviewed, but any
changes can be in later patches.

Eric


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 1/7] fs: Add user namesapace member to struct super_block
  2015-09-25 12:54     ` Seth Forshee
@ 2015-09-25 17:27       ` Eric W. Biederman
  0 siblings, 0 replies; 23+ messages in thread
From: Eric W. Biederman @ 2015-09-25 17:27 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, Jeff Layton, J. Bruce Fields, Serge Hallyn,
	Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd

Seth Forshee <seth.forshee@canonical.com> writes:

> On Thu, Sep 24, 2015 at 04:14:33PM -0500, Eric W. Biederman wrote:
>> Seth Forshee <seth.forshee@canonical.com> writes:
>> 
>> > Initially this will be used to eliminate the implicit MNT_NODEV
>> > flag for mounts from user namespaces. In the future it will also
>> > be used for translating ids and checking capabilities for
>> > filesystems mounted from user namespaces.
>> >
>> > s_user_ns is initialized in alloc_super() and is generally set to
>> > current_user_ns(). To avoid security and corruption issues, two
>> > additional mount checks are also added:
>> >
>> >  - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
>> >    in current_user_ns().
>> >
>> >  - sget() will fail with EBUSY when the filesystem it's looking
>> >    for is already mounted from another user namespace.
>> >
>> > proc requires some special handling. The user namespace of
>> > current isn't appropriate when forking as a result of clone (2)
>> > with CLONE_NEWPID|CLONE_NEWUSER, as it will set s_user_ns to the
>> > namespace of the parent and make proc unmountable in the new user
>> > namespace. Instead, the user namespace which owns the new pid
>> > namespace is used. sget_userns() is allowed to allow passing in
>> > a namespace other than that of current, and sget becomes a
>> > wrapper around sget_userns() which passes current_user_ns().
>> 
>> Minor nits below.  I have fixed them up.
>> 
>> > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
>> > ---
>> >  fs/namespace.c     |  3 +++
>> >  fs/proc/root.c     |  3 ++-
>> >  fs/super.c         | 38 +++++++++++++++++++++++++++++++++-----
>> >  include/linux/fs.h |  9 ++++++++-
>> >  4 files changed, 46 insertions(+), 7 deletions(-)
>> >
>> > diff --git a/fs/namespace.c b/fs/namespace.c
>> > index 0570729c87fd..d023a353dc63 100644
>> > --- a/fs/namespace.c
>> > +++ b/fs/namespace.c
>> > @@ -2381,6 +2381,9 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
>> >  	struct vfsmount *mnt;
>> >  	int err;
>> >  
>> > +	if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
>> > +		return -EPERM;
>> > +
>> >  	if (!fstype)
>> >  		return -EINVAL;
>> >  
>> > diff --git a/fs/proc/root.c b/fs/proc/root.c
>> > index 361ab4ee42fc..4b302cbf13f9 100644
>> > --- a/fs/proc/root.c
>> > +++ b/fs/proc/root.c
>> > @@ -117,7 +117,8 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
>> >  			return ERR_PTR(-EPERM);
>> >  	}
>> >  
>> > -	sb = sget(fs_type, proc_test_super, proc_set_super, flags, ns);
>> > +	sb = sget_userns(fs_type, proc_test_super, proc_set_super, flags,
>> > +			 ns->user_ns, ns);
>> >  	if (IS_ERR(sb))
>> >  		return ERR_CAST(sb);
>> >  
>> > diff --git a/fs/super.c b/fs/super.c
>> > index 954aeb80e202..42837da7d641 100644
>> > --- a/fs/super.c
>> > +++ b/fs/super.c
>> > @@ -33,6 +33,7 @@
>> >  #include <linux/cleancache.h>
>> >  #include <linux/fsnotify.h>
>> >  #include <linux/lockdep.h>
>> > +#include <linux/user_namespace.h>
>> >  #include "internal.h"
>> >  
>> >  
>> > @@ -163,6 +164,7 @@ static void destroy_super(struct super_block *s)
>> >  {
>> >  	list_lru_destroy(&s->s_dentry_lru);
>> >  	list_lru_destroy(&s->s_inode_lru);
>> > +	put_user_ns(s->s_user_ns);
>> >  	security_sb_free(s);
>> >  	WARN_ON(!list_empty(&s->s_mounts));
>> >  	kfree(s->s_subtype);
>> > @@ -178,7 +180,8 @@ static void destroy_super(struct super_block *s)
>> >   *	Allocates and initializes a new &struct super_block.  alloc_super()
>> >   *	returns a pointer new superblock or %NULL if allocation had failed.
>> >   */
>> > -static struct super_block *alloc_super(struct file_system_type *type, int flags)
>> > +static struct super_block *alloc_super(struct file_system_type *type, int flags,
>> > +				       struct user_namespace *user_ns)
>> >  {
>> >  	struct super_block *s = kzalloc(sizeof(struct super_block),  GFP_USER);
>> >  	static const struct super_operations default_op;
>> > @@ -246,6 +249,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
>> >  	s->s_shrink.count_objects = super_cache_count;
>> >  	s->s_shrink.batch = 1024;
>> >  	s->s_shrink.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE;
>> > +
>> > +	s->s_user_ns = get_user_ns(user_ns);
>> >  	return s;
>> >  
>> >  fail:
>> > @@ -442,17 +447,17 @@ void generic_shutdown_super(struct super_block *sb)
>> >  EXPORT_SYMBOL(generic_shutdown_super);
>> >  
>> >  /**
>> > - *	sget	-	find or create a superblock
>> > + *	sget_userns -	find or create a superblock
>> >   *	@type:	filesystem type superblock should belong to
>> >   *	@test:	comparison callback
>> >   *	@set:	setup callback
>> >   *	@flags:	mount flags
>> 
>> You don't mention the user namespace parameter here.  I have fixed that
>> as.
>> 
>>   + *     @user_ns: User namespace you need CAP_SYS_ADMIN over to mount this fs.
>
> Looks good, thanks. Seems I also missed it in alloc_super though.

FYI I have placed everything that I has made it through my review in my
for-testing branch up on kernel.org.  So you can see what I have merged,
and the build test bots can look and see if they find anything to
complain about.

git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing

Eric


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 3/7] fs: Verify access of user towards block device file when mounting
  2015-09-25 17:16       ` Eric W. Biederman
@ 2015-09-25 17:39         ` Seth Forshee
  2015-09-25 17:49           ` Eric W. Biederman
  0 siblings, 1 reply; 23+ messages in thread
From: Seth Forshee @ 2015-09-25 17:39 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Viro, David Woodhouse, Brian Norris, Jeff Layton,
	J. Bruce Fields, Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd

On Fri, Sep 25, 2015 at 12:16:59PM -0500, Eric W. Biederman wrote:
> 
> Argh.  This looks like morning person meets night owl.

Indded :-)

> Seth Forshee <seth.forshee@canonical.com> writes:
> 
> > On Thu, Sep 24, 2015 at 04:53:11PM -0500, Eric W. Biederman wrote:
> >> Seth Forshee <seth.forshee@canonical.com> writes:
> >> 
> >> > When mounting a filesystem on a block device there is currently
> >> > no verification that the user has appropriate access to the
> >> > device file passed to mount. This has not been an issue so far
> >> > since the user in question has always been root, but this must
> >> > be changed before allowing unprivileged users to mount in user
> >> > namespaces.
> >> >
> >> > To do this, a new version of lookup_bdev() is added named
> >> > lookup_bdev_perm(). Both of these functions become wrappers
> >> > around a common inner fucntion. The behavior of lookup_bdev() is
> >> > unchanged, but calling lookup_bdev_perm() will fail if the user
> >> > does not have the specified access rights to the supplied path.
> >> > The permission check is skipped if the user has CAP_SYS_ADMIN to
> >> > avoid any possible regressions in behavior.
> >> >
> >> > blkdev_get_by_path() is updated to use lookup_bdev_perm(). This
> >> > is used by mount_bdev() and mount_mtd(), so this will cause
> >> > mounts on block devices to fail when the user lacks the required
> >> > permissions. Other calls to blkdev_get_by_path() will all happen
> >> > with root privileges, so these calls will be unaffected.
> >> 
> >> Good but buggy patch.
> >> 
> >> In the mtd bits the flags are super flags, not file mode bits,
> >> which makes testing them against FMODE_READ and FMODE_WRITE is
> >> incorrect.
> >
> > Bah, yes. Fixed.
> >
> >> Further it looks like quite a few more possibly all of the lookup_bdev
> >> instances could use inode level permission checking.
> >> 
> >> Certainly code such as quotactl makes me wonder.
> >
> > I opted to stick to places related to mounting, but let's take a look at
> > the other callers.
> >
> > bcache calls it in the context of sysfs writes, and those attributes are
> > writable only by root. In that case the inode permission check will be
> > skipped anyway, so it makes no difference either way.
> >
> > Device mapper calls it in dm_get_device, which is called from a bunch of
> > places. I had started trying to walk back through all the callers of
> > dm_get_device, but that rabbit hole got really deep really quickly so I
> > didn't feel confident that changing it wouldn't break anyone.
> >
> > quotactl gave me pause, as it seems to have done for you too. I was
> > surprised that inode permissions aren't checked, but
> > check_quotactl_permission does get called before actually doing
> > anything. I fear that adding a check of inode permissions could end up
> > breaking someone.
> 
> My gut feel on all of this is that we should act like may_open and have
> have a flag of 0 for access mode mean don't check permissions.
> 
> That way we can change all of the callers of lookup_bdev to pass an
> additional argument and make it explicit what is going on but we don't
> actually have to change the callers to actually perform an additional
> check.

Sounds reasonable, I'll make that change.

> Leaving stones unturned is a good way to introduce a security hole by
> accident so I don't want to leave dm_get_device unreviewed, but any
> changes can be in later patches.

Unless I've made a mistake it shouldn't introduce security holes,
dm_get_device should behave exactly the same was as it behaves today.
Any security problems would already be present.

I can take another crack at reviewing, but it might also be good if
someone who already knows the code commented as well. As I recall I gave
up after getting several levels deep in indirect function calls where the
names of the struct members which held the function pointers were
identical at a couple of levels, so I was having a hard time knowing if
I was keeping everything straight.

Seth

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 3/7] fs: Verify access of user towards block device file when mounting
  2015-09-25 17:39         ` Seth Forshee
@ 2015-09-25 17:49           ` Eric W. Biederman
  0 siblings, 0 replies; 23+ messages in thread
From: Eric W. Biederman @ 2015-09-25 17:49 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, David Woodhouse, Brian Norris, Jeff Layton,
	J. Bruce Fields, Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd

Seth Forshee <seth.forshee@canonical.com> writes:

> On Fri, Sep 25, 2015 at 12:16:59PM -0500, Eric W. Biederman wrote:
>> 
>> Argh.  This looks like morning person meets night owl.
>
> Indded :-)
>
>> Seth Forshee <seth.forshee@canonical.com> writes:
>> 
>> > On Thu, Sep 24, 2015 at 04:53:11PM -0500, Eric W. Biederman wrote:
>> >> Seth Forshee <seth.forshee@canonical.com> writes:
>> >> 
>> >> > When mounting a filesystem on a block device there is currently
>> >> > no verification that the user has appropriate access to the
>> >> > device file passed to mount. This has not been an issue so far
>> >> > since the user in question has always been root, but this must
>> >> > be changed before allowing unprivileged users to mount in user
>> >> > namespaces.
>> >> >
>> >> > To do this, a new version of lookup_bdev() is added named
>> >> > lookup_bdev_perm(). Both of these functions become wrappers
>> >> > around a common inner fucntion. The behavior of lookup_bdev() is
>> >> > unchanged, but calling lookup_bdev_perm() will fail if the user
>> >> > does not have the specified access rights to the supplied path.
>> >> > The permission check is skipped if the user has CAP_SYS_ADMIN to
>> >> > avoid any possible regressions in behavior.
>> >> >
>> >> > blkdev_get_by_path() is updated to use lookup_bdev_perm(). This
>> >> > is used by mount_bdev() and mount_mtd(), so this will cause
>> >> > mounts on block devices to fail when the user lacks the required
>> >> > permissions. Other calls to blkdev_get_by_path() will all happen
>> >> > with root privileges, so these calls will be unaffected.
>> >> 
>> >> Good but buggy patch.
>> >> 
>> >> In the mtd bits the flags are super flags, not file mode bits,
>> >> which makes testing them against FMODE_READ and FMODE_WRITE is
>> >> incorrect.
>> >
>> > Bah, yes. Fixed.
>> >
>> >> Further it looks like quite a few more possibly all of the lookup_bdev
>> >> instances could use inode level permission checking.
>> >> 
>> >> Certainly code such as quotactl makes me wonder.
>> >
>> > I opted to stick to places related to mounting, but let's take a look at
>> > the other callers.
>> >
>> > bcache calls it in the context of sysfs writes, and those attributes are
>> > writable only by root. In that case the inode permission check will be
>> > skipped anyway, so it makes no difference either way.
>> >
>> > Device mapper calls it in dm_get_device, which is called from a bunch of
>> > places. I had started trying to walk back through all the callers of
>> > dm_get_device, but that rabbit hole got really deep really quickly so I
>> > didn't feel confident that changing it wouldn't break anyone.
>> >
>> > quotactl gave me pause, as it seems to have done for you too. I was
>> > surprised that inode permissions aren't checked, but
>> > check_quotactl_permission does get called before actually doing
>> > anything. I fear that adding a check of inode permissions could end up
>> > breaking someone.
>> 
>> My gut feel on all of this is that we should act like may_open and have
>> have a flag of 0 for access mode mean don't check permissions.
>> 
>> That way we can change all of the callers of lookup_bdev to pass an
>> additional argument and make it explicit what is going on but we don't
>> actually have to change the callers to actually perform an additional
>> check.
>
> Sounds reasonable, I'll make that change.
>
>> Leaving stones unturned is a good way to introduce a security hole by
>> accident so I don't want to leave dm_get_device unreviewed, but any
>> changes can be in later patches.
>
> Unless I've made a mistake it shouldn't introduce security holes,
> dm_get_device should behave exactly the same was as it behaves today.
> Any security problems would already be present.

*Nod*

There are two parts to this.  Splitting this into changes that are small
enough to be correct.  And just passing in a 0 for acc_mode seems good
enough in that case.  The other question is what happens when we start
allowing unprivileged users to mount things, and in general what happens
when we start relaxing permission checks.

Frankly we need to understand how everything interacts if we are to
relax permissions safely if we don't want to be chasing security issues
for the next several years as things come to lite that we have
overlooked.

> I can take another crack at reviewing, but it might also be good if
> someone who already knows the code commented as well. As I recall I gave
> up after getting several levels deep in indirect function calls where the
> names of the struct members which held the function pointers were
> identical at a couple of levels, so I was having a hard time knowing if
> I was keeping everything straight.

For this patch I am not concerned but we do need to understand how all
of the pieces interact and that is just plain challenging.

Eric


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 4/7] fs: Limit file caps to the user namespace of the super block
  2015-09-25 12:49     ` Seth Forshee
@ 2015-09-25 17:57       ` Eric W. Biederman
  0 siblings, 0 replies; 23+ messages in thread
From: Eric W. Biederman @ 2015-09-25 17:57 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, Serge Hallyn, James Morris, Serge E. Hallyn,
	Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd

Seth Forshee <seth.forshee@canonical.com> writes:

> On Thu, Sep 24, 2015 at 04:59:35PM -0500, Eric W. Biederman wrote:
>> Seth Forshee <seth.forshee@canonical.com> writes:
>> 
>> > Capability sets attached to files must be ignored except in the
>> > user namespaces where the mounter is privileged, i.e. s_user_ns
>> > and its descendants. Otherwise a vector exists for gaining
>> > privileges in namespaces where a user is not already privileged.
>> >
>> > Add a new helper function, in_user_ns(), to test whether a user
>> > namespace is the same as or a descendant of another namespace.
>> > Use this helper to determine whether a file's capability set
>> > should be applied to the caps constructed during exec.
>> 
>> No issues with this but given that we always pass current_user_ns()
>> we may want to simplify the users of in_user_ns by renaming it
>> current_in_user_ns() and hard codeing current_user_ns().
>
> Sure, if that's what you prefer then I'll change it.

I took your patch as is.  This is a suggestion for a possible
incremental improvement.

Eric

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 6/7] Smack: Add support for unprivileged mounts from user namespaces
  2015-09-23 20:16 ` [PATCH v4 6/7] Smack: Add support for unprivileged mounts from user namespaces Seth Forshee
  2015-09-24 22:16   ` Eric W. Biederman
  2015-09-24 22:34   ` Casey Schaufler
@ 2015-09-27 19:30   ` Eric W. Biederman
  2015-09-28 19:45     ` Seth Forshee
  2 siblings, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2015-09-27 19:30 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, Casey Schaufler, Serge Hallyn, Andy Lutomirski,
	linux-fsdevel, linux-security-module, selinux, linux-kernel,
	linux-mtd, James Morris, Serge E. Hallyn

Seth Forshee <seth.forshee@canonical.com> writes:

> Security labels from unprivileged mounts cannot be trusted.
> Ideally for these mounts we would assign the objects in the
> filesystem the same label as the inode for the backing device
> passed to mount. Unfortunately it's currently impossible to
> determine which inode this is from the LSM mount hooks, so we
> settle for the label of the process doing the mount.
>
> This label is assigned to s_root, and also to smk_default to
> ensure that new inodes receive this label. The transmute property
> is also set on s_root to make this behavior more explicit, even
> though it is technically not necessary.
>
> If a filesystem has existing security labels, access to inodes is
> permitted if the label is the same as smk_root, otherwise access
> is denied. The SMACK64EXEC xattr is completely ignored.
>
> Explicit setting of security labels continues to require
> CAP_MAC_ADMIN in init_user_ns.
>
> Altogether, this ensures that filesystem objects are not
> accessible to subjects which cannot already access the backing
> store, that MAC is not violated for any objects in the fileystem
> which are already labeled, and that a user cannot use an
> unprivileged mount to gain elevated MAC privileges.
>
> sysfs, tmpfs, and ramfs are already mountable from user
> namespaces and support security labels. We can't rule out the
> possibility that these filesystems may already be used in mounts
> from user namespaces with security lables set from the init
> namespace, so failing to trust lables in these filesystems may
> introduce regressions. It is safe to trust labels from these
> filesystems, since the unprivileged user does not control the
> backing store and thus cannot supply security labels, so an
> explicit exception is made to trust labels from these
> filesystems.

Hmm.

>
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> ---
>  security/smack/smack.h     |  8 +++++++-
>  security/smack/smack_lsm.c | 41 ++++++++++++++++++++++++++++++-----------
>  2 files changed, 37 insertions(+), 12 deletions(-)

[snip]

> @@ -3475,14 +3492,16 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
>  			if (rc >= 0)
>  				transflag = SMK_INODE_TRANSMUTE;
>  		}
> -		/*
> -		 * Don't let the exec or mmap label be "*" or "@".
> -		 */
> -		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> -		if (IS_ERR(skp) || skp == &smack_known_star ||
> -		    skp == &smack_known_web)
> -			skp = NULL;
> -		isp->smk_task = skp;
> +		if (!(sbsp->smk_flags & SMK_SB_UNTRUSTED)) {
> +			/*
> +			 * Don't let the exec or mmap label be "*" or "@".
> +			 */
> +			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> +			if (IS_ERR(skp) || skp == &smack_known_star ||
> +			    skp == &smack_known_web)
> +				skp = NULL;
> +			isp->smk_task = skp;

I have to stop and ask is this really what we want to do?

If I have permission I can get around this by explicitly setting the
XATTR_NAME_SMACKEXEC.  Perhaps that does not matter but I think it is
siginficant.

We don't do any filtering on the the smk_mmap label.

Given the policy as I understand it is to only honor labels that match
smk_root would we not be better off allowing anything to be set and
filtering the labels at use when SMK_SB_UNTRUSTED is set?

Having three different policies depending on the kind of label concerns
me.

> +		}
>  
>  		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
>  		if (IS_ERR(skp) || skp == &smack_known_star ||

Eric

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 6/7] Smack: Add support for unprivileged mounts from user namespaces
  2015-09-27 19:30   ` Eric W. Biederman
@ 2015-09-28 19:45     ` Seth Forshee
  0 siblings, 0 replies; 23+ messages in thread
From: Seth Forshee @ 2015-09-28 19:45 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Viro, Casey Schaufler, Serge Hallyn, Andy Lutomirski,
	linux-fsdevel, linux-security-module, selinux, linux-kernel,
	linux-mtd, James Morris, Serge E. Hallyn

On Sun, Sep 27, 2015 at 02:30:58PM -0500, Eric W. Biederman wrote:
> Seth Forshee <seth.forshee@canonical.com> writes:
> 
> > Security labels from unprivileged mounts cannot be trusted.
> > Ideally for these mounts we would assign the objects in the
> > filesystem the same label as the inode for the backing device
> > passed to mount. Unfortunately it's currently impossible to
> > determine which inode this is from the LSM mount hooks, so we
> > settle for the label of the process doing the mount.
> >
> > This label is assigned to s_root, and also to smk_default to
> > ensure that new inodes receive this label. The transmute property
> > is also set on s_root to make this behavior more explicit, even
> > though it is technically not necessary.
> >
> > If a filesystem has existing security labels, access to inodes is
> > permitted if the label is the same as smk_root, otherwise access
> > is denied. The SMACK64EXEC xattr is completely ignored.
> >
> > Explicit setting of security labels continues to require
> > CAP_MAC_ADMIN in init_user_ns.
> >
> > Altogether, this ensures that filesystem objects are not
> > accessible to subjects which cannot already access the backing
> > store, that MAC is not violated for any objects in the fileystem
> > which are already labeled, and that a user cannot use an
> > unprivileged mount to gain elevated MAC privileges.
> >
> > sysfs, tmpfs, and ramfs are already mountable from user
> > namespaces and support security labels. We can't rule out the
> > possibility that these filesystems may already be used in mounts
> > from user namespaces with security lables set from the init
> > namespace, so failing to trust lables in these filesystems may
> > introduce regressions. It is safe to trust labels from these
> > filesystems, since the unprivileged user does not control the
> > backing store and thus cannot supply security labels, so an
> > explicit exception is made to trust labels from these
> > filesystems.
> 
> Hmm.
> 
> >
> > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> > ---
> >  security/smack/smack.h     |  8 +++++++-
> >  security/smack/smack_lsm.c | 41 ++++++++++++++++++++++++++++++-----------
> >  2 files changed, 37 insertions(+), 12 deletions(-)
> 
> [snip]
> 
> > @@ -3475,14 +3492,16 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
> >  			if (rc >= 0)
> >  				transflag = SMK_INODE_TRANSMUTE;
> >  		}
> > -		/*
> > -		 * Don't let the exec or mmap label be "*" or "@".
> > -		 */
> > -		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> > -		if (IS_ERR(skp) || skp == &smack_known_star ||
> > -		    skp == &smack_known_web)
> > -			skp = NULL;
> > -		isp->smk_task = skp;
> > +		if (!(sbsp->smk_flags & SMK_SB_UNTRUSTED)) {
> > +			/*
> > +			 * Don't let the exec or mmap label be "*" or "@".
> > +			 */
> > +			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> > +			if (IS_ERR(skp) || skp == &smack_known_star ||
> > +			    skp == &smack_known_web)
> > +				skp = NULL;
> > +			isp->smk_task = skp;
> 
> I have to stop and ask is this really what we want to do?
> 
> If I have permission I can get around this by explicitly setting the
> XATTR_NAME_SMACKEXEC.  Perhaps that does not matter but I think it is
> siginficant.
> 
> We don't do any filtering on the the smk_mmap label.
> 
> Given the policy as I understand it is to only honor labels that match
> smk_root would we not be better off allowing anything to be set and
> filtering the labels at use when SMK_SB_UNTRUSTED is set?
> 
> Having three different policies depending on the kind of label concerns
> me.

The thinking behind the behavior was that since SMACK64EXEC is analagous
to suid or file caps it should be handled similarly, i.e. ignore it in
situations when the mount isn't trusted. You have a point about it being
incongruous with how the other Smack labels are handled though.

So let's say we read the label in an untrusted mount and handle it in
bprm_set_creds. If the label matches smk_root, we allow the transition.
I can't think of a problem with that, but I'd like to hear what Casey
has to say.

What if the label doesn't match? We have a couple of options, either
refuse to exec or exec without changing the label. I'd still favor the
latter, which would keep it consistent with what we do with suid or file
caps.

I'm okay with moving to something like this.

Seems like we probably do need to handle smk_mmap as well. Good catch.

Seth

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2015-09-28 19:45 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-23 20:16 [PATCH v4 0/7] Initial support for user namespace owned mounts Seth Forshee
2015-09-23 20:16 ` [PATCH v4 1/7] fs: Add user namesapace member to struct super_block Seth Forshee
2015-09-24 21:14   ` Eric W. Biederman
2015-09-25 12:54     ` Seth Forshee
2015-09-25 17:27       ` Eric W. Biederman
2015-09-23 20:16 ` [PATCH v4 2/7] userns: Simpilify MNT_NODEV handling Seth Forshee
2015-09-23 20:16 ` [PATCH v4 3/7] fs: Verify access of user towards block device file when mounting Seth Forshee
2015-09-24 21:53   ` Eric W. Biederman
2015-09-25 12:48     ` Seth Forshee
2015-09-25 17:16       ` Eric W. Biederman
2015-09-25 17:39         ` Seth Forshee
2015-09-25 17:49           ` Eric W. Biederman
2015-09-23 20:16 ` [PATCH v4 4/7] fs: Limit file caps to the user namespace of the super block Seth Forshee
2015-09-24 21:59   ` Eric W. Biederman
2015-09-25 12:49     ` Seth Forshee
2015-09-25 17:57       ` Eric W. Biederman
2015-09-23 20:16 ` [PATCH v4 5/7] fs: Treat foreign mounts as nosuid Seth Forshee
2015-09-23 20:16 ` [PATCH v4 6/7] Smack: Add support for unprivileged mounts from user namespaces Seth Forshee
2015-09-24 22:16   ` Eric W. Biederman
2015-09-24 22:34   ` Casey Schaufler
2015-09-27 19:30   ` Eric W. Biederman
2015-09-28 19:45     ` Seth Forshee
2015-09-23 20:16 ` [PATCH v4 7/7] selinux: " Seth Forshee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).