linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] User namespace mount updates
@ 2015-09-30 20:15 Seth Forshee
  2015-09-30 20:15 ` [PATCH 1/5] fs: Verify access of user towards block device file when mounting Seth Forshee
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Seth Forshee @ 2015-09-30 20:15 UTC (permalink / raw)
  To: Eric W. Biederman, linux-bcache, dm-devel, linux-raid, linux-mtd,
	linux-fsdevel, linux-security-module, selinux
  Cc: Alexander Viro, Serge Hallyn, Andy Lutomirski, linux-kernel,
	Seth Forshee

Hi Eric,

Here's a batch of updates for the unprivileged user namespace mount
patches based on your feedback. I think everything you mentioned should
be addressed here.

These are now based on your for-testing branch.

Updates include:
 - Fix for incorrect use of flags argument in mount_mtd.
 - Eliminate lookup_bdev_perm and instead add an access mode argument to
   lookup_bdev.
 - Use __inode_permission instead of inode_permission when checking for
   rights towards a block device inode.
 - Add a patch replacing in_user_ns with current_in_user_ns.
 - Add a patch to handle Smack security labels consistently.

Thanks,
Seth

Andy Lutomirski (1):
  fs: Treat foreign mounts as nosuid

Seth Forshee (4):
  fs: Verify access of user towards block device file when mounting
  selinux: Add support for unprivileged mounts from user namespaces
  userns: Replace in_userns with current_in_userns
  Smack: Handle labels consistently in untrusted mounts

 drivers/md/bcache/super.c      |  2 +-
 drivers/md/dm-table.c          |  2 +-
 drivers/mtd/mtdsuper.c         |  6 +++++-
 fs/block_dev.c                 | 18 +++++++++++++++---
 fs/exec.c                      |  2 +-
 fs/namespace.c                 | 13 +++++++++++++
 fs/quota/quota.c               |  2 +-
 include/linux/fs.h             |  2 +-
 include/linux/mount.h          |  1 +
 include/linux/user_namespace.h |  6 ++----
 kernel/user_namespace.c        |  6 +++---
 security/commoncap.c           |  4 ++--
 security/selinux/hooks.c       | 25 ++++++++++++++++++++++++-
 security/smack/smack_lsm.c     | 28 ++++++++++++++++++----------
 14 files changed, 88 insertions(+), 29 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/5] fs: Verify access of user towards block device file when mounting
  2015-09-30 20:15 [PATCH 0/5] User namespace mount updates Seth Forshee
@ 2015-09-30 20:15 ` Seth Forshee
  2015-09-30 23:42   ` Mike Snitzer
  2015-10-01 15:40   ` Eric W. Biederman
  2015-09-30 20:15 ` [PATCH 2/5] fs: Treat foreign mounts as nosuid Seth Forshee
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 16+ messages in thread
From: Seth Forshee @ 2015-09-30 20:15 UTC (permalink / raw)
  To: Eric W. Biederman, Kent Overstreet, Alasdair Kergon,
	Mike Snitzer, dm-devel, Neil Brown, David Woodhouse,
	Brian Norris, Alexander Viro, Jan Kara, Jeff Layton,
	J. Bruce Fields
  Cc: Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	linux-bcache, linux-raid, Seth Forshee

When mounting a filesystem on a block device there is currently
no verification that the user has appropriate access to the
device file passed to mount. This has not been an issue so far
since the user in question has always been root, but this must
be changed before allowing unprivileged users to mount in user
namespaces.

To fix this, add an argument to lookup_bdev() to specify the
required permissions. If the mask of permissions is zero, or
if the user has CAP_SYS_ADMIN, the permission check is skipped,
otherwise the lookup fails if the user does not have the
specified access rights for the inode at the supplied path.

Callers associated with mounting are updated to pass permission
masks to lookup_bdev() so that these mounts will fail for an
unprivileged user who lacks permissions for the block device
inode. All other callers pass 0 to maintain their current
behaviors.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 drivers/md/bcache/super.c |  2 +-
 drivers/md/dm-table.c     |  2 +-
 drivers/mtd/mtdsuper.c    |  6 +++++-
 fs/block_dev.c            | 18 +++++++++++++++---
 fs/quota/quota.c          |  2 +-
 include/linux/fs.h        |  2 +-
 6 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 679a093a3bf6..e8287b0d1dac 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1926,7 +1926,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 				  sb);
 	if (IS_ERR(bdev)) {
 		if (bdev == ERR_PTR(-EBUSY)) {
-			bdev = lookup_bdev(strim(path));
+			bdev = lookup_bdev(strim(path), 0);
 			mutex_lock(&bch_register_lock);
 			if (!IS_ERR(bdev) && bch_is_open(bdev))
 				err = "device already registered";
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index e76ed003769e..35bb3ea4cbe2 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -380,7 +380,7 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
 	BUG_ON(!t);
 
 	/* convert the path to a device */
-	bdev = lookup_bdev(path);
+	bdev = lookup_bdev(path, 0);
 	if (IS_ERR(bdev)) {
 		dev = name_to_dev_t(path);
 		if (!dev)
diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
index 20c02a3b7417..5d7e7705fed8 100644
--- a/drivers/mtd/mtdsuper.c
+++ b/drivers/mtd/mtdsuper.c
@@ -125,6 +125,7 @@ struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
 #ifdef CONFIG_BLOCK
 	struct block_device *bdev;
 	int ret, major;
+	int perm;
 #endif
 	int mtdnr;
 
@@ -176,7 +177,10 @@ struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
 	/* try the old way - the hack where we allowed users to mount
 	 * /dev/mtdblock$(n) but didn't actually _use_ the blockdev
 	 */
-	bdev = lookup_bdev(dev_name);
+	perm = MAY_READ;
+	if (!(flags & MS_RDONLY))
+		perm |= MAY_WRITE;
+	bdev = lookup_bdev(dev_name, perm);
 	if (IS_ERR(bdev)) {
 		ret = PTR_ERR(bdev);
 		pr_debug("MTDSB: lookup_bdev() returned %d\n", ret);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 26cee058dc02..54d94cd64577 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1394,9 +1394,14 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
 					void *holder)
 {
 	struct block_device *bdev;
+	int perm = 0;
 	int err;
 
-	bdev = lookup_bdev(path);
+	if (mode & FMODE_READ)
+		perm |= MAY_READ;
+	if (mode & FMODE_WRITE)
+		perm |= MAY_WRITE;
+	bdev = lookup_bdev(path, perm);
 	if (IS_ERR(bdev))
 		return bdev;
 
@@ -1706,12 +1711,14 @@ EXPORT_SYMBOL(ioctl_by_bdev);
 /**
  * lookup_bdev  - lookup a struct block_device by name
  * @pathname:	special file representing the block device
+ * @mask:	rights to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
  *
  * Get a reference to the blockdevice at @pathname in the current
  * namespace if possible and return it.  Return ERR_PTR(error)
- * otherwise.
+ * otherwise.  If @mask is non-zero, check for access rights to the
+ * inode at @pathname.
  */
-struct block_device *lookup_bdev(const char *pathname)
+struct block_device *lookup_bdev(const char *pathname, int mask)
 {
 	struct block_device *bdev;
 	struct inode *inode;
@@ -1726,6 +1733,11 @@ struct block_device *lookup_bdev(const char *pathname)
 		return ERR_PTR(error);
 
 	inode = d_backing_inode(path.dentry);
+	if (mask != 0 && !capable(CAP_SYS_ADMIN)) {
+		error = __inode_permission(inode, mask);
+		if (error)
+			goto fail;
+	}
 	error = -ENOTBLK;
 	if (!S_ISBLK(inode->i_mode))
 		goto fail;
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 3746367098fd..a40eaecbd5cc 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -733,7 +733,7 @@ static struct super_block *quotactl_block(const char __user *special, int cmd)
 
 	if (IS_ERR(tmp))
 		return ERR_CAST(tmp);
-	bdev = lookup_bdev(tmp->name);
+	bdev = lookup_bdev(tmp->name, 0);
 	putname(tmp);
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 458ee7b213be..cc18dfb0b98e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2388,7 +2388,7 @@ static inline void unregister_chrdev(unsigned int major, const char *name)
 #define BLKDEV_MAJOR_HASH_SIZE	255
 extern const char *__bdevname(dev_t, char *buffer);
 extern const char *bdevname(struct block_device *bdev, char *buffer);
-extern struct block_device *lookup_bdev(const char *);
+extern struct block_device *lookup_bdev(const char *, int mask);
 extern void blkdev_show(struct seq_file *,off_t);
 
 #else
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/5] fs: Treat foreign mounts as nosuid
  2015-09-30 20:15 [PATCH 0/5] User namespace mount updates Seth Forshee
  2015-09-30 20:15 ` [PATCH 1/5] fs: Verify access of user towards block device file when mounting Seth Forshee
@ 2015-09-30 20:15 ` Seth Forshee
  2015-09-30 20:15 ` [PATCH 3/5] selinux: Add support for unprivileged mounts from user namespaces Seth Forshee
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Seth Forshee @ 2015-09-30 20:15 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Serge Hallyn, James Morris,
	Serge E. Hallyn, Paul Moore, Stephen Smalley, Eric Paris
  Cc: Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd, linux-bcache, dm-devel, linux-raid,
	Seth Forshee

From: Andy Lutomirski <luto@amacapital.net>

If a process gets access to a mount from a different user
namespace, that process should not be able to take advantage of
setuid files or selinux entrypoints from that filesystem.  Prevent
this by treating mounts from other mount namespaces and those not
owned by current_user_ns() or an ancestor as nosuid.

This will make it safer to allow more complex filesystems to be
mounted in non-root user namespaces.

This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
setgid, and file capability bits can no longer be abused if code in
a user namespace were to clear nosuid on an untrusted filesystem,
but this patch, by itself, is insufficient to protect the system
from abuse of files that, when execed, would increase MAC privilege.

As a more concrete explanation, any task that can manipulate a
vfsmount associated with a given user namespace already has
capabilities in that namespace and all of its descendents.  If they
can cause a malicious setuid, setgid, or file-caps executable to
appear in that mount, then that executable will only allow them to
elevate privileges in exactly the set of namespaces in which they
are already privileges.

On the other hand, if they can cause a malicious executable to
appear with a dangerous MAC label, running it could change the
caller's security context in a way that should not have been
possible, even inside the namespace in which the task is confined.

As a hardening measure, this would have made CVE-2014-5207 much
more difficult to exploit.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 fs/exec.c                |  2 +-
 fs/namespace.c           | 13 +++++++++++++
 include/linux/mount.h    |  1 +
 security/commoncap.c     |  2 +-
 security/selinux/hooks.c |  2 +-
 5 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b06623a9347f..ea7311d72cc3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1295,7 +1295,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 	bprm->cred->euid = current_euid();
 	bprm->cred->egid = current_egid();
 
-	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
+	if (!mnt_may_suid(bprm->file->f_path.mnt))
 		return;
 
 	if (task_no_new_privs(current))
diff --git a/fs/namespace.c b/fs/namespace.c
index da70f7c4ece1..2101ce7b96ab 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3276,6 +3276,19 @@ found:
 	return visible;
 }
 
+bool mnt_may_suid(struct vfsmount *mnt)
+{
+	/*
+	 * Foreign mounts (accessed via fchdir or through /proc
+	 * symlinks) are always treated as if they are nosuid.  This
+	 * prevents namespaces from trusting potentially unsafe
+	 * suid/sgid bits, file caps, or security labels that originate
+	 * in other namespaces.
+	 */
+	return !(mnt->mnt_flags & MNT_NOSUID) && check_mnt(real_mount(mnt)) &&
+	       in_userns(current_user_ns(), mnt->mnt_sb->s_user_ns);
+}
+
 static struct ns_common *mntns_get(struct task_struct *task)
 {
 	struct ns_common *ns = NULL;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index f822c3c11377..54a594d49733 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -81,6 +81,7 @@ extern void mntput(struct vfsmount *mnt);
 extern struct vfsmount *mntget(struct vfsmount *mnt);
 extern struct vfsmount *mnt_clone_internal(struct path *path);
 extern int __mnt_is_readonly(struct vfsmount *mnt);
+extern bool mnt_may_suid(struct vfsmount *mnt);
 
 struct path;
 extern struct vfsmount *clone_private_mount(struct path *path);
diff --git a/security/commoncap.c b/security/commoncap.c
index 400aa224b491..6243aef5860e 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -448,7 +448,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
 	if (!file_caps_enabled)
 		return 0;
 
-	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
+	if (!mnt_may_suid(bprm->file->f_path.mnt))
 		return 0;
 	if (!in_userns(current_user_ns(), bprm->file->f_path.mnt->mnt_sb->s_user_ns))
 		return 0;
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index e4369d86e588..de05207eb665 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2171,7 +2171,7 @@ static int check_nnp_nosuid(const struct linux_binprm *bprm,
 			    const struct task_security_struct *new_tsec)
 {
 	int nnp = (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS);
-	int nosuid = (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID);
+	int nosuid = !mnt_may_suid(bprm->file->f_path.mnt);
 	int rc;
 
 	if (!nnp && !nosuid)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/5] selinux: Add support for unprivileged mounts from user namespaces
  2015-09-30 20:15 [PATCH 0/5] User namespace mount updates Seth Forshee
  2015-09-30 20:15 ` [PATCH 1/5] fs: Verify access of user towards block device file when mounting Seth Forshee
  2015-09-30 20:15 ` [PATCH 2/5] fs: Treat foreign mounts as nosuid Seth Forshee
@ 2015-09-30 20:15 ` Seth Forshee
  2015-09-30 20:15 ` [PATCH 4/5] userns: Replace in_userns with current_in_userns Seth Forshee
  2015-09-30 20:15 ` [PATCH 5/5] Smack: Handle labels consistently in untrusted mounts Seth Forshee
  4 siblings, 0 replies; 16+ messages in thread
From: Seth Forshee @ 2015-09-30 20:15 UTC (permalink / raw)
  To: Eric W. Biederman, Paul Moore, Stephen Smalley, Eric Paris
  Cc: Alexander Viro, Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	linux-bcache, dm-devel, linux-raid, Seth Forshee, James Morris,
	Serge E. Hallyn

Security labels from unprivileged mounts in user namespaces must
be ignored. Force superblocks from user namespaces whose labeling
behavior is to use xattrs to use mountpoint labeling instead.
For the mountpoint label, default to converting the current task
context into a form suitable for file objects, but also allow the
policy writer to specify a different label through policy
transition rules.

Pieced together from code snippets provided by Stephen Smalley.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 security/selinux/hooks.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index de05207eb665..09be1dc21e58 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -756,6 +756,28 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 			goto out;
 		}
 	}
+
+	/*
+	 * If this is a user namespace mount, no contexts are allowed
+	 * on the command line and security labels must be ignored.
+	 */
+	if (sb->s_user_ns != &init_user_ns) {
+		if (context_sid || fscontext_sid || rootcontext_sid ||
+		    defcontext_sid) {
+			rc = -EACCES;
+			goto out;
+		}
+		if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
+			sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
+			rc = security_transition_sid(current_sid(), current_sid(),
+						     SECCLASS_FILE, NULL,
+						     &sbsec->mntpoint_sid);
+			if (rc)
+				goto out;
+		}
+		goto out_set_opts;
+	}
+
 	/* sets the context of the superblock for the fs being mounted. */
 	if (fscontext_sid) {
 		rc = may_context_mount_sb_relabel(fscontext_sid, sbsec, cred);
@@ -824,6 +846,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 		sbsec->def_sid = defcontext_sid;
 	}
 
+out_set_opts:
 	rc = sb_finish_set_opts(sb);
 out:
 	mutex_unlock(&sbsec->lock);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/5] userns: Replace in_userns with current_in_userns
  2015-09-30 20:15 [PATCH 0/5] User namespace mount updates Seth Forshee
                   ` (2 preceding siblings ...)
  2015-09-30 20:15 ` [PATCH 3/5] selinux: Add support for unprivileged mounts from user namespaces Seth Forshee
@ 2015-09-30 20:15 ` Seth Forshee
  2015-09-30 20:15 ` [PATCH 5/5] Smack: Handle labels consistently in untrusted mounts Seth Forshee
  4 siblings, 0 replies; 16+ messages in thread
From: Seth Forshee @ 2015-09-30 20:15 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Serge Hallyn, James Morris,
	Serge E. Hallyn
  Cc: Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd, linux-bcache, dm-devel, linux-raid,
	Seth Forshee

All current callers of in_userns pass current_user_ns as the
first argument. Simplify by replacing in_userns with
current_in_userns which checks whether current_user_ns is in the
namespace supplied as an argument.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 fs/namespace.c                 | 2 +-
 include/linux/user_namespace.h | 6 ++----
 kernel/user_namespace.c        | 6 +++---
 security/commoncap.c           | 2 +-
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 2101ce7b96ab..18fc58760aec 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3286,7 +3286,7 @@ bool mnt_may_suid(struct vfsmount *mnt)
 	 * in other namespaces.
 	 */
 	return !(mnt->mnt_flags & MNT_NOSUID) && check_mnt(real_mount(mnt)) &&
-	       in_userns(current_user_ns(), mnt->mnt_sb->s_user_ns);
+	       current_in_userns(mnt->mnt_sb->s_user_ns);
 }
 
 static struct ns_common *mntns_get(struct task_struct *task)
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index a43faa727124..9217169c64cb 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -72,8 +72,7 @@ extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t,
 extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
 extern int proc_setgroups_show(struct seq_file *m, void *v);
 extern bool userns_may_setgroups(const struct user_namespace *ns);
-extern bool in_userns(const struct user_namespace *ns,
-		      const struct user_namespace *target_ns);
+extern bool current_in_userns(const struct user_namespace *target_ns);
 #else
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -103,8 +102,7 @@ static inline bool userns_may_setgroups(const struct user_namespace *ns)
 	return true;
 }
 
-static inline bool in_userns(const struct user_namespace *ns,
-			     const struct user_namespace *target_ns)
+static inline bool current_in_userns(const struct user_namespace *target_ns)
 {
 	return true;
 }
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 69fbc377357b..5960edc7e644 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -949,10 +949,10 @@ bool userns_may_setgroups(const struct user_namespace *ns)
  * Returns true if @ns is the same namespace as or a descendant of
  * @target_ns.
  */
-bool in_userns(const struct user_namespace *ns,
-	       const struct user_namespace *target_ns)
+bool current_in_userns(const struct user_namespace *target_ns)
 {
-	for (; ns; ns = ns->parent) {
+	struct user_namespace *ns;
+	for (ns = current_user_ns(); ns; ns = ns->parent) {
 		if (ns == target_ns)
 			return true;
 	}
diff --git a/security/commoncap.c b/security/commoncap.c
index 6243aef5860e..2119421613f6 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -450,7 +450,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
 
 	if (!mnt_may_suid(bprm->file->f_path.mnt))
 		return 0;
-	if (!in_userns(current_user_ns(), bprm->file->f_path.mnt->mnt_sb->s_user_ns))
+	if (!current_in_userns(bprm->file->f_path.mnt->mnt_sb->s_user_ns))
 		return 0;
 
 	rc = get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/5] Smack: Handle labels consistently in untrusted mounts
  2015-09-30 20:15 [PATCH 0/5] User namespace mount updates Seth Forshee
                   ` (3 preceding siblings ...)
  2015-09-30 20:15 ` [PATCH 4/5] userns: Replace in_userns with current_in_userns Seth Forshee
@ 2015-09-30 20:15 ` Seth Forshee
  4 siblings, 0 replies; 16+ messages in thread
From: Seth Forshee @ 2015-09-30 20:15 UTC (permalink / raw)
  To: Eric W. Biederman, Casey Schaufler
  Cc: Alexander Viro, Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	linux-bcache, dm-devel, linux-raid, Seth Forshee, James Morris,
	Serge E. Hallyn

The SMACK64, SMACK64EXEC, and SMACK64MMAP labels are all handled
differently in untrusted mounts. This is confusing and
potentically problematic. Change this to handle them all the same
way that SMACK64 is currently handled; that is, read the label
from disk and check it at use time. For SMACK64 and SMACK64MMAP
access is denied if the label does not match smk_root. To be
consistent with suid, a SMACK64EXEC label which does not match
smk_root will still allow execution of the file but will not run
with the label supplied in the xattr.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 security/smack/smack_lsm.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 621200f86b56..bee0b2652bf4 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -891,6 +891,7 @@ static int smack_bprm_set_creds(struct linux_binprm *bprm)
 	struct inode *inode = file_inode(bprm->file);
 	struct task_smack *bsp = bprm->cred->security;
 	struct inode_smack *isp;
+	struct superblock_smack *sbsp;
 	int rc;
 
 	if (bprm->cred_prepared)
@@ -900,6 +901,10 @@ static int smack_bprm_set_creds(struct linux_binprm *bprm)
 	if (isp->smk_task == NULL || isp->smk_task == bsp->smk_task)
 		return 0;
 
+	sbsp = inode->i_sb->s_security;
+	if (sbsp->smk_flags & SMK_SB_UNTRUSTED && isp->smk_task != sbsp->smk_root)
+		return 0;
+
 	if (bprm->unsafe & (LSM_UNSAFE_PTRACE | LSM_UNSAFE_PTRACE_CAP)) {
 		struct task_struct *tracer;
 		rc = 0;
@@ -1703,6 +1708,7 @@ static int smack_mmap_file(struct file *file,
 	struct task_smack *tsp;
 	struct smack_known *okp;
 	struct inode_smack *isp;
+	struct superblock_smack *sbsp;
 	int may;
 	int mmay;
 	int tmay;
@@ -1714,6 +1720,10 @@ static int smack_mmap_file(struct file *file,
 	isp = file_inode(file)->i_security;
 	if (isp->smk_mmap == NULL)
 		return 0;
+	sbsp = file_inode(file)->i_sb->s_security;
+	if (sbsp->smk_flags & SMK_SB_UNTRUSTED &&
+	    isp->smk_mmap != sbsp->smk_root)
+		return -EACCES;
 	mkp = isp->smk_mmap;
 
 	tsp = current_security();
@@ -3492,16 +3502,14 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
 			if (rc >= 0)
 				transflag = SMK_INODE_TRANSMUTE;
 		}
-		if (!(sbsp->smk_flags & SMK_SB_UNTRUSTED)) {
-			/*
-			 * Don't let the exec or mmap label be "*" or "@".
-			 */
-			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
-			if (IS_ERR(skp) || skp == &smack_known_star ||
-			    skp == &smack_known_web)
-				skp = NULL;
-			isp->smk_task = skp;
-		}
+		/*
+		 * Don't let the exec or mmap label be "*" or "@".
+		 */
+		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
+		if (IS_ERR(skp) || skp == &smack_known_star ||
+		    skp == &smack_known_web)
+			skp = NULL;
+		isp->smk_task = skp;
 
 		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
 		if (IS_ERR(skp) || skp == &smack_known_star ||
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] fs: Verify access of user towards block device file when mounting
  2015-09-30 20:15 ` [PATCH 1/5] fs: Verify access of user towards block device file when mounting Seth Forshee
@ 2015-09-30 23:42   ` Mike Snitzer
  2015-10-01 12:55     ` Seth Forshee
  2015-10-01 15:40   ` Eric W. Biederman
  1 sibling, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2015-09-30 23:42 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Eric W. Biederman, Kent Overstreet, Alasdair Kergon, dm-devel,
	Neil Brown, David Woodhouse, Brian Norris, Alexander Viro,
	Jan Kara, Jeff Layton, J. Bruce Fields, Serge Hallyn,
	Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd, linux-bcache, linux-raid

On Wed, Sep 30 2015 at  4:15pm -0400,
Seth Forshee <seth.forshee@canonical.com> wrote:

> When mounting a filesystem on a block device there is currently
> no verification that the user has appropriate access to the
> device file passed to mount. This has not been an issue so far
> since the user in question has always been root, but this must
> be changed before allowing unprivileged users to mount in user
> namespaces.
> 
> To fix this, add an argument to lookup_bdev() to specify the
> required permissions. If the mask of permissions is zero, or
> if the user has CAP_SYS_ADMIN, the permission check is skipped,
> otherwise the lookup fails if the user does not have the
> specified access rights for the inode at the supplied path.
> 
> Callers associated with mounting are updated to pass permission
> masks to lookup_bdev() so that these mounts will fail for an
> unprivileged user who lacks permissions for the block device
> inode. All other callers pass 0 to maintain their current
> behaviors.
> 
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> ---
>  drivers/md/bcache/super.c |  2 +-
>  drivers/md/dm-table.c     |  2 +-
>  drivers/mtd/mtdsuper.c    |  6 +++++-
>  fs/block_dev.c            | 18 +++++++++++++++---
>  fs/quota/quota.c          |  2 +-
>  include/linux/fs.h        |  2 +-
>  6 files changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index e76ed003769e..35bb3ea4cbe2 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -380,7 +380,7 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
>  	BUG_ON(!t);
>  
>  	/* convert the path to a device */
> -	bdev = lookup_bdev(path);
> +	bdev = lookup_bdev(path, 0);
>  	if (IS_ERR(bdev)) {
>  		dev = name_to_dev_t(path);
>  		if (!dev)

Given dm_get_device() is passed @mode why not have it do something like
you did in blkdev_get_by_path()? e.g.:

> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 26cee058dc02..54d94cd64577 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1394,9 +1394,14 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
>  					void *holder)
>  {
>  	struct block_device *bdev;
> +	int perm = 0;
>  	int err;
>  
> -	bdev = lookup_bdev(path);
> +	if (mode & FMODE_READ)
> +		perm |= MAY_READ;
> +	if (mode & FMODE_WRITE)
> +		perm |= MAY_WRITE;
> +	bdev = lookup_bdev(path, perm);
>  	if (IS_ERR(bdev))
>  		return bdev;
>  

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] fs: Verify access of user towards block device file when mounting
  2015-09-30 23:42   ` Mike Snitzer
@ 2015-10-01 12:55     ` Seth Forshee
  2015-10-01 13:40       ` Mike Snitzer
  0 siblings, 1 reply; 16+ messages in thread
From: Seth Forshee @ 2015-10-01 12:55 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Eric W. Biederman, Kent Overstreet, Alasdair Kergon, dm-devel,
	Neil Brown, David Woodhouse, Brian Norris, Alexander Viro,
	Jan Kara, Jeff Layton, J. Bruce Fields, Serge Hallyn,
	Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd, linux-bcache, linux-raid

On Wed, Sep 30, 2015 at 07:42:15PM -0400, Mike Snitzer wrote:
> On Wed, Sep 30 2015 at  4:15pm -0400,
> Seth Forshee <seth.forshee@canonical.com> wrote:
> 
> > When mounting a filesystem on a block device there is currently
> > no verification that the user has appropriate access to the
> > device file passed to mount. This has not been an issue so far
> > since the user in question has always been root, but this must
> > be changed before allowing unprivileged users to mount in user
> > namespaces.
> > 
> > To fix this, add an argument to lookup_bdev() to specify the
> > required permissions. If the mask of permissions is zero, or
> > if the user has CAP_SYS_ADMIN, the permission check is skipped,
> > otherwise the lookup fails if the user does not have the
> > specified access rights for the inode at the supplied path.
> > 
> > Callers associated with mounting are updated to pass permission
> > masks to lookup_bdev() so that these mounts will fail for an
> > unprivileged user who lacks permissions for the block device
> > inode. All other callers pass 0 to maintain their current
> > behaviors.
> > 
> > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> > ---
> >  drivers/md/bcache/super.c |  2 +-
> >  drivers/md/dm-table.c     |  2 +-
> >  drivers/mtd/mtdsuper.c    |  6 +++++-
> >  fs/block_dev.c            | 18 +++++++++++++++---
> >  fs/quota/quota.c          |  2 +-
> >  include/linux/fs.h        |  2 +-
> >  6 files changed, 24 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index e76ed003769e..35bb3ea4cbe2 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> > @@ -380,7 +380,7 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
> >  	BUG_ON(!t);
> >  
> >  	/* convert the path to a device */
> > -	bdev = lookup_bdev(path);
> > +	bdev = lookup_bdev(path, 0);
> >  	if (IS_ERR(bdev)) {
> >  		dev = name_to_dev_t(path);
> >  		if (!dev)
> 
> Given dm_get_device() is passed @mode why not have it do something like
> you did in blkdev_get_by_path()? e.g.:

I only dealt with code related to mounting in this patch since that's
what I'm working on. I have it on my TODO list to consider converting
other callers of lookup_bdev. But if you're sure doing so makes sense
for dm_get_device and that it won't cause regressions then I could add a
patch for it.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] fs: Verify access of user towards block device file when mounting
  2015-10-01 12:55     ` Seth Forshee
@ 2015-10-01 13:40       ` Mike Snitzer
  2015-10-01 14:41         ` Seth Forshee
  2015-10-01 15:55         ` Eric W. Biederman
  0 siblings, 2 replies; 16+ messages in thread
From: Mike Snitzer @ 2015-10-01 13:40 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Eric W. Biederman, Kent Overstreet, Alasdair Kergon, dm-devel,
	Neil Brown, David Woodhouse, Brian Norris, Alexander Viro,
	Jan Kara, Jeff Layton, J. Bruce Fields, Serge Hallyn,
	Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd, linux-bcache, linux-raid

On Thu, Oct 01 2015 at  8:55am -0400,
Seth Forshee <seth.forshee@canonical.com> wrote:

> On Wed, Sep 30, 2015 at 07:42:15PM -0400, Mike Snitzer wrote:
> > On Wed, Sep 30 2015 at  4:15pm -0400,
> > Seth Forshee <seth.forshee@canonical.com> wrote:
> > 
> > > When mounting a filesystem on a block device there is currently
> > > no verification that the user has appropriate access to the
> > > device file passed to mount. This has not been an issue so far
> > > since the user in question has always been root, but this must
> > > be changed before allowing unprivileged users to mount in user
> > > namespaces.
> > > 
> > > To fix this, add an argument to lookup_bdev() to specify the
> > > required permissions. If the mask of permissions is zero, or
> > > if the user has CAP_SYS_ADMIN, the permission check is skipped,
> > > otherwise the lookup fails if the user does not have the
> > > specified access rights for the inode at the supplied path.
> > > 
> > > Callers associated with mounting are updated to pass permission
> > > masks to lookup_bdev() so that these mounts will fail for an
> > > unprivileged user who lacks permissions for the block device
> > > inode. All other callers pass 0 to maintain their current
> > > behaviors.
> > > 
> > > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> > > ---
> > >  drivers/md/bcache/super.c |  2 +-
> > >  drivers/md/dm-table.c     |  2 +-
> > >  drivers/mtd/mtdsuper.c    |  6 +++++-
> > >  fs/block_dev.c            | 18 +++++++++++++++---
> > >  fs/quota/quota.c          |  2 +-
> > >  include/linux/fs.h        |  2 +-
> > >  6 files changed, 24 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > > index e76ed003769e..35bb3ea4cbe2 100644
> > > --- a/drivers/md/dm-table.c
> > > +++ b/drivers/md/dm-table.c
> > > @@ -380,7 +380,7 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
> > >  	BUG_ON(!t);
> > >  
> > >  	/* convert the path to a device */
> > > -	bdev = lookup_bdev(path);
> > > +	bdev = lookup_bdev(path, 0);
> > >  	if (IS_ERR(bdev)) {
> > >  		dev = name_to_dev_t(path);
> > >  		if (!dev)
> > 
> > Given dm_get_device() is passed @mode why not have it do something like
> > you did in blkdev_get_by_path()? e.g.:
> 
> I only dealt with code related to mounting in this patch since that's
> what I'm working on. I have it on my TODO list to consider converting
> other callers of lookup_bdev. But if you're sure doing so makes sense
> for dm_get_device and that it won't cause regressions then I could add a
> patch for it.

OK, dm_get_device() is called in DM device activation path (by tools
like lvm2).

After lookup_bdev() it goes on to call blkdev_get_by_dev() with this
call chain: 
  dm_get_device -> dm_get_table_device -> open_table_device -> blkdev_get_by_dev

Not immediately clear to me why we'd need to augment blkdev_get_by_dev()
to do this checking also.

However, thinking further: In a device stack (e.g. dm/lvm2, md, etc)
new virtual block devices are created that layer ontop of the
traditional block devices.  This level of indirection may cause your
lookup_bdev() check to go on to succeed (if access constraints were not
established on the upper level dm or md device?).  I'm just thinking
outloud here: but have you verified your changes work as intended on
devices created with either lvm2 or mdadm?

What layer establishes access rights to historically root-only
priviledged block devices?  Is it user namespaces?

I haven't kept up with user namespaces as it relates to stacking block
drivers like DM.  But I'm happy to come up to speed and at the same time
help you verify all works as expected with DM blocks devices...

Mike

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] fs: Verify access of user towards block device file when mounting
  2015-10-01 13:40       ` Mike Snitzer
@ 2015-10-01 14:41         ` Seth Forshee
  2015-10-08 15:41           ` Seth Forshee
  2015-10-01 15:55         ` Eric W. Biederman
  1 sibling, 1 reply; 16+ messages in thread
From: Seth Forshee @ 2015-10-01 14:41 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Eric W. Biederman, Kent Overstreet, Alasdair Kergon, dm-devel,
	Neil Brown, David Woodhouse, Brian Norris, Alexander Viro,
	Jan Kara, Jeff Layton, J. Bruce Fields, Serge Hallyn,
	Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd, linux-bcache, linux-raid

On Thu, Oct 01, 2015 at 09:40:52AM -0400, Mike Snitzer wrote:
> On Thu, Oct 01 2015 at  8:55am -0400,
> Seth Forshee <seth.forshee@canonical.com> wrote:
> 
> > On Wed, Sep 30, 2015 at 07:42:15PM -0400, Mike Snitzer wrote:
> > > On Wed, Sep 30 2015 at  4:15pm -0400,
> > > Seth Forshee <seth.forshee@canonical.com> wrote:
> > > 
> > > > When mounting a filesystem on a block device there is currently
> > > > no verification that the user has appropriate access to the
> > > > device file passed to mount. This has not been an issue so far
> > > > since the user in question has always been root, but this must
> > > > be changed before allowing unprivileged users to mount in user
> > > > namespaces.
> > > > 
> > > > To fix this, add an argument to lookup_bdev() to specify the
> > > > required permissions. If the mask of permissions is zero, or
> > > > if the user has CAP_SYS_ADMIN, the permission check is skipped,
> > > > otherwise the lookup fails if the user does not have the
> > > > specified access rights for the inode at the supplied path.
> > > > 
> > > > Callers associated with mounting are updated to pass permission
> > > > masks to lookup_bdev() so that these mounts will fail for an
> > > > unprivileged user who lacks permissions for the block device
> > > > inode. All other callers pass 0 to maintain their current
> > > > behaviors.
> > > > 
> > > > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> > > > ---
> > > >  drivers/md/bcache/super.c |  2 +-
> > > >  drivers/md/dm-table.c     |  2 +-
> > > >  drivers/mtd/mtdsuper.c    |  6 +++++-
> > > >  fs/block_dev.c            | 18 +++++++++++++++---
> > > >  fs/quota/quota.c          |  2 +-
> > > >  include/linux/fs.h        |  2 +-
> > > >  6 files changed, 24 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > > > index e76ed003769e..35bb3ea4cbe2 100644
> > > > --- a/drivers/md/dm-table.c
> > > > +++ b/drivers/md/dm-table.c
> > > > @@ -380,7 +380,7 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
> > > >  	BUG_ON(!t);
> > > >  
> > > >  	/* convert the path to a device */
> > > > -	bdev = lookup_bdev(path);
> > > > +	bdev = lookup_bdev(path, 0);
> > > >  	if (IS_ERR(bdev)) {
> > > >  		dev = name_to_dev_t(path);
> > > >  		if (!dev)
> > > 
> > > Given dm_get_device() is passed @mode why not have it do something like
> > > you did in blkdev_get_by_path()? e.g.:
> > 
> > I only dealt with code related to mounting in this patch since that's
> > what I'm working on. I have it on my TODO list to consider converting
> > other callers of lookup_bdev. But if you're sure doing so makes sense
> > for dm_get_device and that it won't cause regressions then I could add a
> > patch for it.
> 
> OK, dm_get_device() is called in DM device activation path (by tools
> like lvm2).
> 
> After lookup_bdev() it goes on to call blkdev_get_by_dev() with this
> call chain: 
>   dm_get_device -> dm_get_table_device -> open_table_device -> blkdev_get_by_dev
> 
> Not immediately clear to me why we'd need to augment blkdev_get_by_dev()
> to do this checking also.
> 
> However, thinking further: In a device stack (e.g. dm/lvm2, md, etc)
> new virtual block devices are created that layer ontop of the
> traditional block devices.  This level of indirection may cause your
> lookup_bdev() check to go on to succeed (if access constraints were not
> established on the upper level dm or md device?).  I'm just thinking
> outloud here: but have you verified your changes work as intended on
> devices created with either lvm2 or mdadm?
> 
> What layer establishes access rights to historically root-only
> priviledged block devices?  Is it user namespaces?

I'm going to start with this last question and work my way backwards.

Who determines access rights to the block devices varies to some degree.
Any normal user could unshare the user/mount namespaces and still see
all the block devices in /dev. If we're going to allow that user to then
mount block devices in their private namespaces, the kernel must verify
that the user has appropriate permissions for the block device inode.
That's the point of this patch. In a container context the host process
which sets up the container might share some block devices into the
container via bind mounts, in which case it would be responsible for
setting up access rights (both via inode permissions and potentially via
devcgroups). I also have some patches I'm working on for a loop psuedo
filesystem which would allow a container to allocate loop devices for
its own use (via the loop-control device), in which case the kernel
creates a device node in the loopfs filesystem from which the new loop
device was allocated, owned by the root user in the user namespace
associated with the loopfs superblock.

Obviously a DM block device is more complicated than a traditional block
device. This patch should ensure that if an unprivileged user tries to
mount a DM blkdev that it fails if the user lacks permissions for the DM
blkdev inode. There's also the blkdevs behind the DM blkdev, and I'm not
sure whether we need to additionally verify that the user has access to
those devices. Maybe it's enough that someone with access to them set up
the DM device, and someone with sufficient privileges gave that user
access to the DM device. Do you have any thoughts?

As for activating a DM device, I assume only root is permitted to do
this. In that case lookup_bdev will skip the inode permission check
even if a non-zero permission mask is passed.

> I haven't kept up with user namespaces as it relates to stacking block
> drivers like DM.  But I'm happy to come up to speed and at the same time
> help you verify all works as expected with DM blocks devices...

That's great, I really appreciate it.

Seth

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] fs: Verify access of user towards block device file when mounting
  2015-09-30 20:15 ` [PATCH 1/5] fs: Verify access of user towards block device file when mounting Seth Forshee
  2015-09-30 23:42   ` Mike Snitzer
@ 2015-10-01 15:40   ` Eric W. Biederman
  2015-10-01 15:55     ` Seth Forshee
  1 sibling, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2015-10-01 15:40 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Kent Overstreet, Alasdair Kergon, Mike Snitzer, dm-devel,
	Neil Brown, David Woodhouse, Brian Norris, Alexander Viro,
	Jan Kara, Jeff Layton, J. Bruce Fields, Serge Hallyn,
	Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd, linux-bcache, linux-raid

Seth Forshee <seth.forshee@canonical.com> writes:

> When mounting a filesystem on a block device there is currently
> no verification that the user has appropriate access to the
> device file passed to mount. This has not been an issue so far
> since the user in question has always been root, but this must
> be changed before allowing unprivileged users to mount in user
> namespaces.
>
> To fix this, add an argument to lookup_bdev() to specify the
> required permissions. If the mask of permissions is zero, or
> if the user has CAP_SYS_ADMIN, the permission check is skipped,
> otherwise the lookup fails if the user does not have the
> specified access rights for the inode at the supplied path.
>
> Callers associated with mounting are updated to pass permission
> masks to lookup_bdev() so that these mounts will fail for an
> unprivileged user who lacks permissions for the block device
> inode. All other callers pass 0 to maintain their current
> behaviors.
>

Seth can you split this patch?

One patch to add an argument to lookup_bdev,
and then for each kind of callsite a follow-on patch (if we are ready
for that).

That will separate the logical changes and make things easier to track
via bisect and more importantly easier to review things.

Eric


> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> ---
>  drivers/md/bcache/super.c |  2 +-
>  drivers/md/dm-table.c     |  2 +-
>  drivers/mtd/mtdsuper.c    |  6 +++++-
>  fs/block_dev.c            | 18 +++++++++++++++---
>  fs/quota/quota.c          |  2 +-
>  include/linux/fs.h        |  2 +-
>  6 files changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 679a093a3bf6..e8287b0d1dac 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1926,7 +1926,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
>  				  sb);
>  	if (IS_ERR(bdev)) {
>  		if (bdev == ERR_PTR(-EBUSY)) {
> -			bdev = lookup_bdev(strim(path));
> +			bdev = lookup_bdev(strim(path), 0);
>  			mutex_lock(&bch_register_lock);
>  			if (!IS_ERR(bdev) && bch_is_open(bdev))
>  				err = "device already registered";
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index e76ed003769e..35bb3ea4cbe2 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -380,7 +380,7 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
>  	BUG_ON(!t);
>  
>  	/* convert the path to a device */
> -	bdev = lookup_bdev(path);
> +	bdev = lookup_bdev(path, 0);
>  	if (IS_ERR(bdev)) {
>  		dev = name_to_dev_t(path);
>  		if (!dev)
> diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
> index 20c02a3b7417..5d7e7705fed8 100644
> --- a/drivers/mtd/mtdsuper.c
> +++ b/drivers/mtd/mtdsuper.c
> @@ -125,6 +125,7 @@ struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
>  #ifdef CONFIG_BLOCK
>  	struct block_device *bdev;
>  	int ret, major;
> +	int perm;
>  #endif
>  	int mtdnr;
>  
> @@ -176,7 +177,10 @@ struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
>  	/* try the old way - the hack where we allowed users to mount
>  	 * /dev/mtdblock$(n) but didn't actually _use_ the blockdev
>  	 */
> -	bdev = lookup_bdev(dev_name);
> +	perm = MAY_READ;
> +	if (!(flags & MS_RDONLY))
> +		perm |= MAY_WRITE;
> +	bdev = lookup_bdev(dev_name, perm);
>  	if (IS_ERR(bdev)) {
>  		ret = PTR_ERR(bdev);
>  		pr_debug("MTDSB: lookup_bdev() returned %d\n", ret);
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 26cee058dc02..54d94cd64577 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1394,9 +1394,14 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
>  					void *holder)
>  {
>  	struct block_device *bdev;
> +	int perm = 0;
>  	int err;
>  
> -	bdev = lookup_bdev(path);
> +	if (mode & FMODE_READ)
> +		perm |= MAY_READ;
> +	if (mode & FMODE_WRITE)
> +		perm |= MAY_WRITE;
> +	bdev = lookup_bdev(path, perm);
>  	if (IS_ERR(bdev))
>  		return bdev;
>  
> @@ -1706,12 +1711,14 @@ EXPORT_SYMBOL(ioctl_by_bdev);
>  /**
>   * lookup_bdev  - lookup a struct block_device by name
>   * @pathname:	special file representing the block device
> + * @mask:	rights to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
>   *
>   * Get a reference to the blockdevice at @pathname in the current
>   * namespace if possible and return it.  Return ERR_PTR(error)
> - * otherwise.
> + * otherwise.  If @mask is non-zero, check for access rights to the
> + * inode at @pathname.
>   */
> -struct block_device *lookup_bdev(const char *pathname)
> +struct block_device *lookup_bdev(const char *pathname, int mask)
>  {
>  	struct block_device *bdev;
>  	struct inode *inode;
> @@ -1726,6 +1733,11 @@ struct block_device *lookup_bdev(const char *pathname)
>  		return ERR_PTR(error);
>  
>  	inode = d_backing_inode(path.dentry);
> +	if (mask != 0 && !capable(CAP_SYS_ADMIN)) {
> +		error = __inode_permission(inode, mask);
> +		if (error)
> +			goto fail;
> +	}
>  	error = -ENOTBLK;
>  	if (!S_ISBLK(inode->i_mode))
>  		goto fail;
> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
> index 3746367098fd..a40eaecbd5cc 100644
> --- a/fs/quota/quota.c
> +++ b/fs/quota/quota.c
> @@ -733,7 +733,7 @@ static struct super_block *quotactl_block(const char __user *special, int cmd)
>  
>  	if (IS_ERR(tmp))
>  		return ERR_CAST(tmp);
> -	bdev = lookup_bdev(tmp->name);
> +	bdev = lookup_bdev(tmp->name, 0);
>  	putname(tmp);
>  	if (IS_ERR(bdev))
>  		return ERR_CAST(bdev);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 458ee7b213be..cc18dfb0b98e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2388,7 +2388,7 @@ static inline void unregister_chrdev(unsigned int major, const char *name)
>  #define BLKDEV_MAJOR_HASH_SIZE	255
>  extern const char *__bdevname(dev_t, char *buffer);
>  extern const char *bdevname(struct block_device *bdev, char *buffer);
> -extern struct block_device *lookup_bdev(const char *);
> +extern struct block_device *lookup_bdev(const char *, int mask);
>  extern void blkdev_show(struct seq_file *,off_t);
>  
>  #else

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] fs: Verify access of user towards block device file when mounting
  2015-10-01 15:40   ` Eric W. Biederman
@ 2015-10-01 15:55     ` Seth Forshee
  0 siblings, 0 replies; 16+ messages in thread
From: Seth Forshee @ 2015-10-01 15:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Kent Overstreet, Alasdair Kergon, Mike Snitzer, dm-devel,
	Neil Brown, David Woodhouse, Brian Norris, Alexander Viro,
	Jan Kara, Jeff Layton, J. Bruce Fields, Serge Hallyn,
	Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd, linux-bcache, linux-raid

On Thu, Oct 01, 2015 at 10:40:08AM -0500, Eric W. Biederman wrote:
> Seth Forshee <seth.forshee@canonical.com> writes:
> 
> > When mounting a filesystem on a block device there is currently
> > no verification that the user has appropriate access to the
> > device file passed to mount. This has not been an issue so far
> > since the user in question has always been root, but this must
> > be changed before allowing unprivileged users to mount in user
> > namespaces.
> >
> > To fix this, add an argument to lookup_bdev() to specify the
> > required permissions. If the mask of permissions is zero, or
> > if the user has CAP_SYS_ADMIN, the permission check is skipped,
> > otherwise the lookup fails if the user does not have the
> > specified access rights for the inode at the supplied path.
> >
> > Callers associated with mounting are updated to pass permission
> > masks to lookup_bdev() so that these mounts will fail for an
> > unprivileged user who lacks permissions for the block device
> > inode. All other callers pass 0 to maintain their current
> > behaviors.
> >
> 
> Seth can you split this patch?
> 
> One patch to add an argument to lookup_bdev,
> and then for each kind of callsite a follow-on patch (if we are ready
> for that).
> 
> That will separate the logical changes and make things easier to track
> via bisect and more importantly easier to review things.

Sure, I'll do that.

Seth

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] fs: Verify access of user towards block device file when mounting
  2015-10-01 13:40       ` Mike Snitzer
  2015-10-01 14:41         ` Seth Forshee
@ 2015-10-01 15:55         ` Eric W. Biederman
  2015-10-01 23:07           ` Jan Kara
  1 sibling, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2015-10-01 15:55 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Seth Forshee, Kent Overstreet, Alasdair Kergon, dm-devel,
	Neil Brown, David Woodhouse, Brian Norris, Alexander Viro,
	Jan Kara, Jeff Layton, J. Bruce Fields, Serge Hallyn,
	Andy Lutomirski, linux-fsdevel, linux-security-module, selinux,
	linux-kernel, linux-mtd, linux-bcache, linux-raid

Mike Snitzer <snitzer@redhat.com> writes:

> What layer establishes access rights to historically root-only
> priviledged block devices?  Is it user namespaces?

Block devices are weird.

Mounts historically have not checked the permissions on the block
devices because a mounter has CAP_SYS_ADMIN.

Unprivileged users are allowes to read/write block devices if
someone has given them permissions on the device node in the
filesystem.

The thinking with this patchset is to start performing the normal
block device access permission checks when mounting filesystems
when the mounter does not have the global CAP_SYS_ADMIN permission.

The truth is we are not much past the point of realizing that there were
no permission checks to use the actual block device passed in to mount,
so we could still be missing something. There is a lot going on with dm,
md, and lvm.  I don't know if the model of just look at the block device
inode and perform the permission checks is good enough.

> I haven't kept up with user namespaces as it relates to stacking block
> drivers like DM.  But I'm happy to come up to speed and at the same time
> help you verify all works as expected with DM blocks devices...

We are just getting there.  But if you can help that would be great.
The primary concern with dm is what happens when unprivileged users get
ahold of the code, and what happens when evil users corrupt the on-disk
format.

In principle dm like loop should be safe to use if there are not bugs
that make it unsafe for unprivileged users to access the code.

The goal if possible is to run things like docker without needed to be
root or even more fun to run docker in a container, and in general
enable nested containers.

Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] fs: Verify access of user towards block device file when mounting
  2015-10-01 15:55         ` Eric W. Biederman
@ 2015-10-01 23:07           ` Jan Kara
  2015-10-05 14:26             ` Seth Forshee
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2015-10-01 23:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Mike Snitzer, Seth Forshee, Kent Overstreet, Alasdair Kergon,
	dm-devel, Neil Brown, David Woodhouse, Brian Norris,
	Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	linux-bcache, linux-raid

On Thu 01-10-15 10:55:50, Eric W. Biederman wrote:
> The goal if possible is to run things like docker without needed to be
> root or even more fun to run docker in a container, and in general
> enable nested containers.

Frankly at the filesystem side we are rather far from being able to safely
mount untrusted device and I don't think we'll ever be robust enough to
tolerate e.g. user changing the disk while fs is using it. So would this be
FUSE-only thing or is someone still hoping that general purpose filesystems
will be robust enough in future?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] fs: Verify access of user towards block device file when mounting
  2015-10-01 23:07           ` Jan Kara
@ 2015-10-05 14:26             ` Seth Forshee
  0 siblings, 0 replies; 16+ messages in thread
From: Seth Forshee @ 2015-10-05 14:26 UTC (permalink / raw)
  To: Jan Kara
  Cc: Eric W. Biederman, Mike Snitzer, Kent Overstreet,
	Alasdair Kergon, dm-devel, Neil Brown, David Woodhouse,
	Brian Norris, Alexander Viro, Jan Kara, Jeff Layton,
	J. Bruce Fields, Serge Hallyn, Andy Lutomirski, linux-fsdevel,
	linux-security-module, selinux, linux-kernel, linux-mtd,
	linux-bcache, linux-raid

On Fri, Oct 02, 2015 at 01:07:00AM +0200, Jan Kara wrote:
> On Thu 01-10-15 10:55:50, Eric W. Biederman wrote:
> > The goal if possible is to run things like docker without needed to be
> > root or even more fun to run docker in a container, and in general
> > enable nested containers.
> 
> Frankly at the filesystem side we are rather far from being able to safely
> mount untrusted device and I don't think we'll ever be robust enough to
> tolerate e.g. user changing the disk while fs is using it. So would this be
> FUSE-only thing or is someone still hoping that general purpose filesystems
> will be robust enough in future?

FUSE will almost certainly be first. I've also been working with ext4,
and I would like to see that eventually supported to some degree.

Seth

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] fs: Verify access of user towards block device file when mounting
  2015-10-01 14:41         ` Seth Forshee
@ 2015-10-08 15:41           ` Seth Forshee
  0 siblings, 0 replies; 16+ messages in thread
From: Seth Forshee @ 2015-10-08 15:41 UTC (permalink / raw)
  To: Eric W. Biederman, Mike Snitzer
  Cc: Kent Overstreet, Alasdair Kergon, dm-devel, Neil Brown,
	David Woodhouse, Brian Norris, Alexander Viro, Jan Kara,
	Jeff Layton, J. Bruce Fields, Serge Hallyn, Andy Lutomirski,
	linux-fsdevel, linux-security-module, selinux, linux-kernel,
	linux-mtd, linux-bcache, linux-raid

On Thu, Oct 01, 2015 at 09:41:37AM -0500, Seth Forshee wrote:
> On Thu, Oct 01, 2015 at 09:40:52AM -0400, Mike Snitzer wrote:
> > On Thu, Oct 01 2015 at  8:55am -0400,
> > Seth Forshee <seth.forshee@canonical.com> wrote:
> > 
> > > On Wed, Sep 30, 2015 at 07:42:15PM -0400, Mike Snitzer wrote:
> > > > On Wed, Sep 30 2015 at  4:15pm -0400,
> > > > Seth Forshee <seth.forshee@canonical.com> wrote:
> > > > 
> > > > > When mounting a filesystem on a block device there is currently
> > > > > no verification that the user has appropriate access to the
> > > > > device file passed to mount. This has not been an issue so far
> > > > > since the user in question has always been root, but this must
> > > > > be changed before allowing unprivileged users to mount in user
> > > > > namespaces.
> > > > > 
> > > > > To fix this, add an argument to lookup_bdev() to specify the
> > > > > required permissions. If the mask of permissions is zero, or
> > > > > if the user has CAP_SYS_ADMIN, the permission check is skipped,
> > > > > otherwise the lookup fails if the user does not have the
> > > > > specified access rights for the inode at the supplied path.
> > > > > 
> > > > > Callers associated with mounting are updated to pass permission
> > > > > masks to lookup_bdev() so that these mounts will fail for an
> > > > > unprivileged user who lacks permissions for the block device
> > > > > inode. All other callers pass 0 to maintain their current
> > > > > behaviors.
> > > > > 
> > > > > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> > > > > ---
> > > > >  drivers/md/bcache/super.c |  2 +-
> > > > >  drivers/md/dm-table.c     |  2 +-
> > > > >  drivers/mtd/mtdsuper.c    |  6 +++++-
> > > > >  fs/block_dev.c            | 18 +++++++++++++++---
> > > > >  fs/quota/quota.c          |  2 +-
> > > > >  include/linux/fs.h        |  2 +-
> > > > >  6 files changed, 24 insertions(+), 8 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > > > > index e76ed003769e..35bb3ea4cbe2 100644
> > > > > --- a/drivers/md/dm-table.c
> > > > > +++ b/drivers/md/dm-table.c
> > > > > @@ -380,7 +380,7 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
> > > > >  	BUG_ON(!t);
> > > > >  
> > > > >  	/* convert the path to a device */
> > > > > -	bdev = lookup_bdev(path);
> > > > > +	bdev = lookup_bdev(path, 0);
> > > > >  	if (IS_ERR(bdev)) {
> > > > >  		dev = name_to_dev_t(path);
> > > > >  		if (!dev)
> > > > 
> > > > Given dm_get_device() is passed @mode why not have it do something like
> > > > you did in blkdev_get_by_path()? e.g.:
> > > 
> > > I only dealt with code related to mounting in this patch since that's
> > > what I'm working on. I have it on my TODO list to consider converting
> > > other callers of lookup_bdev. But if you're sure doing so makes sense
> > > for dm_get_device and that it won't cause regressions then I could add a
> > > patch for it.
> > 
> > OK, dm_get_device() is called in DM device activation path (by tools
> > like lvm2).
> > 
> > After lookup_bdev() it goes on to call blkdev_get_by_dev() with this
> > call chain: 
> >   dm_get_device -> dm_get_table_device -> open_table_device -> blkdev_get_by_dev
> > 
> > Not immediately clear to me why we'd need to augment blkdev_get_by_dev()
> > to do this checking also.
> > 
> > However, thinking further: In a device stack (e.g. dm/lvm2, md, etc)
> > new virtual block devices are created that layer ontop of the
> > traditional block devices.  This level of indirection may cause your
> > lookup_bdev() check to go on to succeed (if access constraints were not
> > established on the upper level dm or md device?).  I'm just thinking
> > outloud here: but have you verified your changes work as intended on
> > devices created with either lvm2 or mdadm?
> > 
> > What layer establishes access rights to historically root-only
> > priviledged block devices?  Is it user namespaces?
> 
> I'm going to start with this last question and work my way backwards.
> 
> Who determines access rights to the block devices varies to some degree.
> Any normal user could unshare the user/mount namespaces and still see
> all the block devices in /dev. If we're going to allow that user to then
> mount block devices in their private namespaces, the kernel must verify
> that the user has appropriate permissions for the block device inode.
> That's the point of this patch. In a container context the host process
> which sets up the container might share some block devices into the
> container via bind mounts, in which case it would be responsible for
> setting up access rights (both via inode permissions and potentially via
> devcgroups). I also have some patches I'm working on for a loop psuedo
> filesystem which would allow a container to allocate loop devices for
> its own use (via the loop-control device), in which case the kernel
> creates a device node in the loopfs filesystem from which the new loop
> device was allocated, owned by the root user in the user namespace
> associated with the loopfs superblock.
> 
> Obviously a DM block device is more complicated than a traditional block
> device. This patch should ensure that if an unprivileged user tries to
> mount a DM blkdev that it fails if the user lacks permissions for the DM
> blkdev inode. There's also the blkdevs behind the DM blkdev, and I'm not
> sure whether we need to additionally verify that the user has access to
> those devices. Maybe it's enough that someone with access to them set up
> the DM device, and someone with sufficient privileges gave that user
> access to the DM device. Do you have any thoughts?
> 
> As for activating a DM device, I assume only root is permitted to do
> this. In that case lookup_bdev will skip the inode permission check
> even if a non-zero permission mask is passed.

I've been looking into this more and doing some testing.

As far as mounting goes, it ends up behaving as I expected. As long as
the user has permission to access the inode for the DM block device, the
device can be mounted. Access to the block devices behind the DM block
device is not required. I think this makes sense - it's consistent with
how read/write access to the DM block device works already independent
of user namespaces.

One thing I noticed that might be a concern (I'm not sure) are ioctls on
DM block devices. It looks to me like often these will be passed through
to the underlying block device.

It appears that all code paths which call dm_get_device originate with
ioctls on /dev/mapper/control. Doing this requires CAP_SYS_ADMIN, in
which case the permission check in lookup_bdev is skipped. So there's no
benefit to having dm_get_device pass a permission mask to
blkdev_get_by_path, nor is there any harm. It will only matter if at
some point userns-root is allowed to set up DM devices.

Seth

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-10-08 15:41 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-30 20:15 [PATCH 0/5] User namespace mount updates Seth Forshee
2015-09-30 20:15 ` [PATCH 1/5] fs: Verify access of user towards block device file when mounting Seth Forshee
2015-09-30 23:42   ` Mike Snitzer
2015-10-01 12:55     ` Seth Forshee
2015-10-01 13:40       ` Mike Snitzer
2015-10-01 14:41         ` Seth Forshee
2015-10-08 15:41           ` Seth Forshee
2015-10-01 15:55         ` Eric W. Biederman
2015-10-01 23:07           ` Jan Kara
2015-10-05 14:26             ` Seth Forshee
2015-10-01 15:40   ` Eric W. Biederman
2015-10-01 15:55     ` Seth Forshee
2015-09-30 20:15 ` [PATCH 2/5] fs: Treat foreign mounts as nosuid Seth Forshee
2015-09-30 20:15 ` [PATCH 3/5] selinux: Add support for unprivileged mounts from user namespaces Seth Forshee
2015-09-30 20:15 ` [PATCH 4/5] userns: Replace in_userns with current_in_userns Seth Forshee
2015-09-30 20:15 ` [PATCH 5/5] Smack: Handle labels consistently in untrusted mounts Seth Forshee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).