* [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces
From: Seth Forshee @ 2016-01-04 18:03 UTC
  To: Eric W. Biederman, linux-bcache, dm-devel, linux-raid, linux-mtd,
	linux-fsdevel, fuse-devel, linux-security-module, selinux
  Cc: Alexander Viro, Serge Hallyn, Richard Weinberger,
	Austin S Hemmelgarn, Miklos Szeredi, linux-kernel, Seth Forshee

These patches implement support for mounting filesystems in user
namespaces using fuse. They are based on the patches in the for-testing
branch of
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git,
but I've rebased them onto 4.4-rc3. I've pushed all of this to:

 git://git.kernel.org/pub/scm/linux/kernel/git/sforshee/linux.git fuse-userns

The patches are organized into three high-level groups.

Patches 1-6 are related to security, adding restrictions for
unprivileged mounts and updating the LSMs as needed. Patches 1-2
(checking inode permissions for block device mounts) may not be strictly
necessary for fuseblk mounts since fuse doesn't do any IO on the block
device in the kernel, but it still seems like a good idea to fail the
mount if the user doesn't have the required permissions for the inode
(though this is a bit misleading with fuse since the mounts are done via
a suid-root helper).

Patches 7-14 update most of the vfs to translate ids correctly and deal
with inodes which may have invalid user/group ids. I've omitted patches
for anything not used by fuse - quota, fs freezing, some helper
functions, etc. - but if these are wanted for the sake of completeness I
can include them.

Patches 15-18 update fuse to deal with mounts from non-init pid and user
namespaces and enable mounting from user namespaces.

Changes since v1:
 - Drop patch for FIBMAP.
 - Use current_in_userns in fuse_allow_current_process.
 - Remove checks for uid/gid validity in fuse. Instead, ids from the
   backing store which do not map into s_user_ns will result in invalid
   ids in the vfs inode. Checks in the vfs will prevent unmappable ids
   from being passed in from above.
 - Update a couple of commit messages to provide more detail about
   changes.
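
As a rough illustration of the third item above, the sketch below models
how an id reported by the backing store could be translated through a
namespace's id map, coming back invalid when no mapping exists. This is a
userspace model under stated assumptions, not kernel code: struct
id_extent, model_make_kid() and INVALID_ID are hypothetical names that
only loosely stand in for the kernel's id maps, make_kuid()/make_kgid()
and INVALID_UID/INVALID_GID.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's INVALID_UID / INVALID_GID. */
#define INVALID_ID ((uint32_t)-1)

/*
 * One extent of an id map: ids [ns_first, ns_first + count) as seen in
 * the namespace correspond to [kernel_first, kernel_first + count).
 */
struct id_extent {
	uint32_t ns_first;
	uint32_t kernel_first;
	uint32_t count;
};

/*
 * Translate an id as seen in the namespace into a kernel-internal id.
 * An id that falls outside every extent has no mapping, so the caller
 * gets INVALID_ID back instead of an error.
 */
uint32_t model_make_kid(const struct id_extent *map, size_t n, uint32_t id)
{
	for (size_t i = 0; i < n; i++) {
		if (id >= map[i].ns_first &&
		    id - map[i].ns_first < map[i].count)
			return map[i].kernel_first + (id - map[i].ns_first);
	}
	return INVALID_ID;
}
```

In this model the filesystem never rejects an unmappable id itself; it
simply stores the invalid value and leaves it to generic checks to refuse
operations on it.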

Thanks,
Seth

Andy Lutomirski (1):
  fs: Treat foreign mounts as nosuid

Seth Forshee (17):
  block_dev: Support checking inode permissions in lookup_bdev()
  block_dev: Check permissions towards block device inode when mounting
  selinux: Add support for unprivileged mounts from user namespaces
  userns: Replace in_userns with current_in_userns
  Smack: Handle labels consistently in untrusted mounts
  fs: Check for invalid i_uid in may_follow_link()
  cred: Reject inodes with invalid ids in set_create_file_as()
  fs: Refuse uid/gid changes which don't map into s_user_ns
  fs: Update posix_acl support to handle user namespace mounts
  fs: Ensure the mounter of a filesystem is privileged towards its
    inodes
  fs: Don't remove suid for CAP_FSETID in s_user_ns
  fs: Allow superblock owner to access do_remount_sb()
  capabilities: Allow privileged user in s_user_ns to set security.*
    xattrs
  fuse: Add support for pid namespaces
  fuse: Support fuse filesystems outside of init_user_ns
  fuse: Restrict allow_other to the superblock's namespace or a
    descendant
  fuse: Allow user namespace mounts

 drivers/md/bcache/super.c       |  2 +-
 drivers/md/dm-table.c           |  2 +-
 drivers/mtd/mtdsuper.c          |  2 +-
 fs/attr.c                       | 11 +++++++
 fs/block_dev.c                  | 18 +++++++++--
 fs/exec.c                       |  2 +-
 fs/fuse/cuse.c                  |  3 +-
 fs/fuse/dev.c                   | 26 ++++++++++++----
 fs/fuse/dir.c                   | 16 +++++-----
 fs/fuse/file.c                  | 22 +++++++++++---
 fs/fuse/fuse_i.h                | 10 +++++-
 fs/fuse/inode.c                 | 42 +++++++++++++++++---------
 fs/inode.c                      |  6 +++-
 fs/namei.c                      |  2 +-
 fs/namespace.c                  | 17 +++++++++--
 fs/posix_acl.c                  | 67 ++++++++++++++++++++++++++---------------
 fs/quota/quota.c                |  2 +-
 fs/xattr.c                      | 19 +++++++++---
 include/linux/fs.h              |  2 +-
 include/linux/mount.h           |  1 +
 include/linux/posix_acl_xattr.h | 17 ++++++++---
 include/linux/uidgid.h          | 10 ++++++
 include/linux/user_namespace.h  |  6 ++--
 kernel/capability.c             | 13 +++++---
 kernel/cred.c                   |  2 ++
 kernel/user_namespace.c         |  6 ++--
 security/commoncap.c            | 16 ++++++----
 security/selinux/hooks.c        | 25 ++++++++++++++-
 security/smack/smack_lsm.c      | 29 ++++++++++++------
 29 files changed, 287 insertions(+), 109 deletions(-)

-- 
1.9.1



* [PATCH RESEND v2 01/18] block_dev: Support checking inode permissions in lookup_bdev()
From: Seth Forshee @ 2016-01-04 18:03 UTC
  To: Eric W. Biederman, Kent Overstreet, Alasdair Kergon,
	Mike Snitzer, dm-devel, Neil Brown, David Woodhouse,
	Brian Norris, Alexander Viro, Jan Kara, Jeff Layton,
	J. Bruce Fields
  Cc: Serge Hallyn, Richard Weinberger, Austin S Hemmelgarn,
	Miklos Szeredi, linux-kernel, linux-bcache, linux-raid,
	linux-mtd, linux-fsdevel, fuse-devel, linux-security-module,
	selinux, Seth Forshee

When looking up a block device by path, no permission check is
done to verify that the user has access to the block device inode
at the specified path. In some cases it may be necessary to
check permissions towards the inode, such as when allowing
unprivileged users to mount block devices in user namespaces.

Add an argument to lookup_bdev() to optionally perform this
permission check. A value of 0 skips the permission check and
behaves the same as before. A non-zero value specifies the mask
of access rights required towards the inode at the specified
path. The check is always skipped if the user has CAP_SYS_ADMIN.

All callers of lookup_bdev() currently pass a mask of 0, so this
patch results in no functional change. Subsequent patches will
add permission checks where appropriate.
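
The mask semantics described above can be modelled in userspace roughly
as follows. This is only a sketch: model_lookup_bdev() and its `granted`
parameter (the rights the inode would grant the caller) are hypothetical,
and the real check is whatever __inode_permission() decides. The MAY_*
values are the ones defined in include/linux/fs.h.

```c
#include <errno.h>
#include <stdbool.h>

/* Access-right bits, as defined in include/linux/fs.h. */
#define MAY_EXEC  0x1
#define MAY_WRITE 0x2
#define MAY_READ  0x4

/*
 * Hypothetical model of the new lookup_bdev() behaviour.
 *
 * mask              - rights requested by the caller; 0 skips the check
 * has_cap_sys_admin - models capable(CAP_SYS_ADMIN)
 * granted           - assumed set of rights the inode grants the caller
 *
 * Returns 0 if the lookup may proceed, -EACCES otherwise.
 */
int model_lookup_bdev(int mask, bool has_cap_sys_admin, int granted)
{
	if (mask != 0 && !has_cap_sys_admin) {
		/* Every requested right must be granted. */
		if ((mask & ~granted) != 0)
			return -EACCES;
	}
	return 0;
}
```

With a mask of 0 the `granted` argument is never consulted, which is why
existing callers see no functional change.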

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 drivers/md/bcache/super.c |  2 +-
 drivers/md/dm-table.c     |  2 +-
 drivers/mtd/mtdsuper.c    |  2 +-
 fs/block_dev.c            | 13 ++++++++++---
 fs/quota/quota.c          |  2 +-
 include/linux/fs.h        |  2 +-
 6 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 679a093a3bf6..e8287b0d1dac 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1926,7 +1926,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 				  sb);
 	if (IS_ERR(bdev)) {
 		if (bdev == ERR_PTR(-EBUSY)) {
-			bdev = lookup_bdev(strim(path));
+			bdev = lookup_bdev(strim(path), 0);
 			mutex_lock(&bch_register_lock);
 			if (!IS_ERR(bdev) && bch_is_open(bdev))
 				err = "device already registered";
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 061152a43730..81c60b2495ed 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -380,7 +380,7 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
 	BUG_ON(!t);
 
 	/* convert the path to a device */
-	bdev = lookup_bdev(path);
+	bdev = lookup_bdev(path, 0);
 	if (IS_ERR(bdev)) {
 		dev = name_to_dev_t(path);
 		if (!dev)
diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
index 20c02a3b7417..b5b60e1af31c 100644
--- a/drivers/mtd/mtdsuper.c
+++ b/drivers/mtd/mtdsuper.c
@@ -176,7 +176,7 @@ struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
 	/* try the old way - the hack where we allowed users to mount
 	 * /dev/mtdblock$(n) but didn't actually _use_ the blockdev
 	 */
-	bdev = lookup_bdev(dev_name);
+	bdev = lookup_bdev(dev_name, 0);
 	if (IS_ERR(bdev)) {
 		ret = PTR_ERR(bdev);
 		pr_debug("MTDSB: lookup_bdev() returned %d\n", ret);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index f90d91efa1b4..3ebbde85d898 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1426,7 +1426,7 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
 	struct block_device *bdev;
 	int err;
 
-	bdev = lookup_bdev(path);
+	bdev = lookup_bdev(path, 0);
 	if (IS_ERR(bdev))
 		return bdev;
 
@@ -1736,12 +1736,14 @@ EXPORT_SYMBOL(ioctl_by_bdev);
 /**
  * lookup_bdev  - lookup a struct block_device by name
  * @pathname:	special file representing the block device
+ * @mask:	rights to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
  *
  * Get a reference to the blockdevice at @pathname in the current
  * namespace if possible and return it.  Return ERR_PTR(error)
- * otherwise.
+ * otherwise.  If @mask is non-zero, check for access rights to the
+ * inode at @pathname.
  */
-struct block_device *lookup_bdev(const char *pathname)
+struct block_device *lookup_bdev(const char *pathname, int mask)
 {
 	struct block_device *bdev;
 	struct inode *inode;
@@ -1756,6 +1758,11 @@ struct block_device *lookup_bdev(const char *pathname)
 		return ERR_PTR(error);
 
 	inode = d_backing_inode(path.dentry);
+	if (mask != 0 && !capable(CAP_SYS_ADMIN)) {
+		error = __inode_permission(inode, mask);
+		if (error)
+			goto fail;
+	}
 	error = -ENOTBLK;
 	if (!S_ISBLK(inode->i_mode))
 		goto fail;
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 3746367098fd..a40eaecbd5cc 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -733,7 +733,7 @@ static struct super_block *quotactl_block(const char __user *special, int cmd)
 
 	if (IS_ERR(tmp))
 		return ERR_CAST(tmp);
-	bdev = lookup_bdev(tmp->name);
+	bdev = lookup_bdev(tmp->name, 0);
 	putname(tmp);
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8a17c5649ef2..879ec382fd88 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2373,7 +2373,7 @@ static inline void unregister_chrdev(unsigned int major, const char *name)
 #define BLKDEV_MAJOR_HASH_SIZE	255
 extern const char *__bdevname(dev_t, char *buffer);
 extern const char *bdevname(struct block_device *bdev, char *buffer);
-extern struct block_device *lookup_bdev(const char *);
+extern struct block_device *lookup_bdev(const char *, int mask);
 extern void blkdev_show(struct seq_file *,off_t);
 
 #else
-- 
1.9.1


* [PATCH RESEND v2 02/18] block_dev: Check permissions towards block device inode when mounting
From: Seth Forshee @ 2016-01-04 18:03 UTC
  To: Eric W. Biederman, Alexander Viro
  Cc: Serge Hallyn, Richard Weinberger, Austin S Hemmelgarn,
	Miklos Szeredi, linux-kernel, linux-bcache, dm-devel, linux-raid,
	linux-mtd, linux-fsdevel, fuse-devel, linux-security-module,
	selinux, Seth Forshee

Unprivileged users should not be able to mount block devices when
they lack sufficient privileges towards the block device inode.
Update blkdev_get_by_path() to validate that the user has the
required access to the inode at the specified path. The check
will be skipped for CAP_SYS_ADMIN, so privileged mounts will
continue working as before.
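
A small userspace sketch of the mode-to-mask translation this patch
performs is given below. The helper name fmode_to_may_mask() is
hypothetical; the FMODE_* and MAY_* values are those from
include/linux/fs.h. The point of the example is that the two sets of
flags use different bit values, so the open mode cannot simply be passed
through as the permission mask.

```c
/* Open-mode bits and access-right bits, as in include/linux/fs.h. */
#define FMODE_READ  0x1u
#define FMODE_WRITE 0x2u
#define MAY_WRITE   0x2
#define MAY_READ    0x4

/*
 * Hypothetical helper: translate an open mode into the access rights
 * that must be held on the block device inode.
 */
int fmode_to_may_mask(unsigned int mode)
{
	int perm = 0;

	if (mode & FMODE_READ)
		perm |= MAY_READ;
	if (mode & FMODE_WRITE)
		perm |= MAY_WRITE;
	return perm;
}
```

Note that a mode with neither bit set yields a mask of 0, which
lookup_bdev() treats as "skip the permission check".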

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 fs/block_dev.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 3ebbde85d898..4fdb6ab59816 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1424,9 +1424,14 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
 					void *holder)
 {
 	struct block_device *bdev;
+	int perm = 0;
 	int err;
 
-	bdev = lookup_bdev(path, 0);
+	if (mode & FMODE_READ)
+		perm |= MAY_READ;
+	if (mode & FMODE_WRITE)
+		perm |= MAY_WRITE;
+	bdev = lookup_bdev(path, perm);
 	if (IS_ERR(bdev))
 		return bdev;
 
-- 
1.9.1



* [PATCH RESEND v2 03/18] fs: Treat foreign mounts as nosuid
From: Seth Forshee @ 2016-01-04 18:03 UTC
  To: Eric W. Biederman, Alexander Viro, Serge Hallyn, James Morris,
	Serge E. Hallyn, Paul Moore, Stephen Smalley, Eric Paris
  Cc: Richard Weinberger, Austin S Hemmelgarn, Miklos Szeredi,
	linux-kernel, linux-bcache, dm-devel, linux-raid, linux-mtd,
	linux-fsdevel, fuse-devel, linux-security-module, selinux,
	Seth Forshee, Andy Lutomirski

From: Andy Lutomirski <luto@amacapital.net>

If a process gets access to a mount from a different user
namespace, that process should not be able to take advantage of
setuid files or selinux entrypoints from that filesystem.  Prevent
this by treating mounts from other mount namespaces and those not
owned by current_user_ns() or an ancestor as nosuid.

This will make it safer to allow more complex filesystems to be
mounted in non-root user namespaces.

This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
setgid, and file capability bits can no longer be abused if code in
a user namespace were to clear nosuid on an untrusted filesystem,
but this patch, by itself, is insufficient to protect the system
from abuse of files that, when execed, would increase MAC privilege.

As a more concrete explanation, any task that can manipulate a
vfsmount associated with a given user namespace already has
capabilities in that namespace and all of its descendants.  If they
can cause a malicious setuid, setgid, or file-caps executable to
appear in that mount, then that executable will only allow them to
elevate privileges in exactly the set of namespaces in which they
are already privileged.

On the other hand, if they can cause a malicious executable to
appear with a dangerous MAC label, running it could change the
caller's security context in a way that should not have been
possible, even inside the namespace in which the task is confined.

As a hardening measure, this would have made CVE-2014-5207 much
more difficult to exploit.
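
The rule described above combines three conditions, which the following
userspace sketch tries to make explicit. All names here (struct userns,
struct model_mount, model_in_userns(), model_mnt_may_suid()) are
hypothetical stand-ins; in particular, mount namespace membership is
reduced to comparing integer ids, whereas the patch uses check_mnt().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user namespace: only the parent link matters here. */
struct userns {
	const struct userns *parent;
};

/* True if ns is target itself or one of target's descendants. */
bool model_in_userns(const struct userns *ns, const struct userns *target)
{
	for (; ns; ns = ns->parent) {
		if (ns == target)
			return true;
	}
	return false;
}

/* Hypothetical, heavily reduced view of a mount. */
struct model_mount {
	bool nosuid;                     /* models MNT_NOSUID */
	int mnt_ns_id;                   /* mount namespace it belongs to */
	const struct userns *sb_user_ns; /* models mnt_sb->s_user_ns */
};

/*
 * A mount may honour setuid/setgid bits, file caps and security labels
 * only if it is not nosuid, it lives in the caller's mount namespace,
 * and its superblock is owned by the caller's user namespace or one of
 * that namespace's ancestors.
 */
bool model_mnt_may_suid(const struct model_mount *mnt, int cur_mnt_ns_id,
			const struct userns *cur_user_ns)
{
	return !mnt->nosuid &&
	       mnt->mnt_ns_id == cur_mnt_ns_id &&
	       model_in_userns(cur_user_ns, mnt->sb_user_ns);
}
```

Under this model, a task in the initial user namespace that reaches a
mount whose superblock is owned by a child namespace sees it as nosuid,
while a task in the child namespace still honours suid bits on mounts
owned by the initial namespace.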

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 fs/exec.c                |  2 +-
 fs/namespace.c           | 13 +++++++++++++
 include/linux/mount.h    |  1 +
 security/commoncap.c     |  2 +-
 security/selinux/hooks.c |  2 +-
 5 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b06623a9347f..ea7311d72cc3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1295,7 +1295,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 	bprm->cred->euid = current_euid();
 	bprm->cred->egid = current_egid();
 
-	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
+	if (!mnt_may_suid(bprm->file->f_path.mnt))
 		return;
 
 	if (task_no_new_privs(current))
diff --git a/fs/namespace.c b/fs/namespace.c
index da70f7c4ece1..2101ce7b96ab 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3276,6 +3276,19 @@ found:
 	return visible;
 }
 
+bool mnt_may_suid(struct vfsmount *mnt)
+{
+	/*
+	 * Foreign mounts (accessed via fchdir or through /proc
+	 * symlinks) are always treated as if they are nosuid.  This
+	 * prevents namespaces from trusting potentially unsafe
+	 * suid/sgid bits, file caps, or security labels that originate
+	 * in other namespaces.
+	 */
+	return !(mnt->mnt_flags & MNT_NOSUID) && check_mnt(real_mount(mnt)) &&
+	       in_userns(current_user_ns(), mnt->mnt_sb->s_user_ns);
+}
+
 static struct ns_common *mntns_get(struct task_struct *task)
 {
 	struct ns_common *ns = NULL;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index f822c3c11377..54a594d49733 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -81,6 +81,7 @@ extern void mntput(struct vfsmount *mnt);
 extern struct vfsmount *mntget(struct vfsmount *mnt);
 extern struct vfsmount *mnt_clone_internal(struct path *path);
 extern int __mnt_is_readonly(struct vfsmount *mnt);
+extern bool mnt_may_suid(struct vfsmount *mnt);
 
 struct path;
 extern struct vfsmount *clone_private_mount(struct path *path);
diff --git a/security/commoncap.c b/security/commoncap.c
index 400aa224b491..6243aef5860e 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -448,7 +448,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
 	if (!file_caps_enabled)
 		return 0;
 
-	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
+	if (!mnt_may_suid(bprm->file->f_path.mnt))
 		return 0;
 	if (!in_userns(current_user_ns(), bprm->file->f_path.mnt->mnt_sb->s_user_ns))
 		return 0;
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index d0cfaa9f19d0..a5b93df6553f 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2171,7 +2171,7 @@ static int check_nnp_nosuid(const struct linux_binprm *bprm,
 			    const struct task_security_struct *new_tsec)
 {
 	int nnp = (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS);
-	int nosuid = (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID);
+	int nosuid = !mnt_may_suid(bprm->file->f_path.mnt);
 	int rc;
 
 	if (!nnp && !nosuid)
-- 
1.9.1



* [PATCH RESEND v2 04/18] selinux: Add support for unprivileged mounts from user namespaces
From: Seth Forshee @ 2016-01-04 18:03 UTC
  To: Eric W. Biederman, Paul Moore, Stephen Smalley, Eric Paris
  Cc: Alexander Viro, Serge Hallyn, Richard Weinberger,
	Austin S Hemmelgarn, Miklos Szeredi, linux-kernel, linux-bcache,
	dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux, Seth Forshee, James Morris,
	Serge E. Hallyn

Security labels from unprivileged mounts in user namespaces must
be ignored. Force superblocks from user namespaces whose labeling
behavior is to use xattrs to use mountpoint labeling instead.
For the mountpoint label, default to converting the current task
context into a form suitable for file objects, but also allow the
policy writer to specify a different label through policy
transition rules.

Pieced together from code snippets provided by Stephen Smalley.
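
The decision made for user namespace mounts can be summarised with the
following userspace sketch. It is a simplified model, not the SELinux
code: struct mount_request, enum label_behavior and
model_userns_mount_labeling() are hypothetical, and the computation of
the mountpoint label via security_transition_sid() is left out entirely.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced set of labeling behaviors. */
enum label_behavior {
	USE_XATTR,     /* models SECURITY_FS_USE_XATTR */
	USE_MNTPOINT,  /* models SECURITY_FS_USE_MNTPOINT */
	USE_OTHER      /* any other behavior, left untouched */
};

/* Hypothetical description of a mount being set up. */
struct mount_request {
	bool in_init_user_ns;   /* sb->s_user_ns == &init_user_ns */
	bool has_context_opts;  /* any context= style option was given */
	enum label_behavior behavior;
};

/*
 * Returns 0 on success or -EACCES if context options were passed for a
 * user namespace mount. On success, xattr-based labeling is replaced by
 * mountpoint labeling so on-disk labels are ignored.
 */
int model_userns_mount_labeling(struct mount_request *req)
{
	if (req->in_init_user_ns)
		return 0;	/* existing handling applies unchanged */
	if (req->has_context_opts)
		return -EACCES;
	if (req->behavior == USE_XATTR)
		req->behavior = USE_MNTPOINT;
	return 0;
}
```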

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <james.l.morris@oracle.com>
---
 security/selinux/hooks.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index a5b93df6553f..5fedc36dd6b2 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -756,6 +756,28 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 			goto out;
 		}
 	}
+
+	/*
+	 * If this is a user namespace mount, no contexts are allowed
+	 * on the command line and security labels must be ignored.
+	 */
+	if (sb->s_user_ns != &init_user_ns) {
+		if (context_sid || fscontext_sid || rootcontext_sid ||
+		    defcontext_sid) {
+			rc = -EACCES;
+			goto out;
+		}
+		if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
+			sbsec->behavior = SECURITY_FS_USE_MNTPOINT;
+			rc = security_transition_sid(current_sid(), current_sid(),
+						     SECCLASS_FILE, NULL,
+						     &sbsec->mntpoint_sid);
+			if (rc)
+				goto out;
+		}
+		goto out_set_opts;
+	}
+
 	/* sets the context of the superblock for the fs being mounted. */
 	if (fscontext_sid) {
 		rc = may_context_mount_sb_relabel(fscontext_sid, sbsec, cred);
@@ -824,6 +846,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 		sbsec->def_sid = defcontext_sid;
 	}
 
+out_set_opts:
 	rc = sb_finish_set_opts(sb);
 out:
 	mutex_unlock(&sbsec->lock);
-- 
1.9.1



* [PATCH RESEND v2 05/18] userns: Replace in_userns with current_in_userns
@ 2016-01-04 18:03     ` Seth Forshee
  0 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-01-04 18:03 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Serge Hallyn, James Morris,
	Serge E. Hallyn
  Cc: Richard Weinberger, Austin S Hemmelgarn, Miklos Szeredi,
	linux-kernel, linux-bcache, dm-devel, linux-raid, linux-mtd,
	linux-fsdevel, fuse-devel, linux-security-module, selinux,
	Seth Forshee

All current callers of in_userns() pass current_user_ns() as the
first argument. Simplify by replacing in_userns() with
current_in_userns(), which checks whether current_user_ns() is in
the namespace supplied as an argument.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 fs/namespace.c                 | 2 +-
 include/linux/user_namespace.h | 6 ++----
 kernel/user_namespace.c        | 6 +++---
 security/commoncap.c           | 2 +-
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 2101ce7b96ab..18fc58760aec 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3286,7 +3286,7 @@ bool mnt_may_suid(struct vfsmount *mnt)
 	 * in other namespaces.
 	 */
 	return !(mnt->mnt_flags & MNT_NOSUID) && check_mnt(real_mount(mnt)) &&
-	       in_userns(current_user_ns(), mnt->mnt_sb->s_user_ns);
+	       current_in_userns(mnt->mnt_sb->s_user_ns);
 }
 
 static struct ns_common *mntns_get(struct task_struct *task)
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index a43faa727124..9217169c64cb 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -72,8 +72,7 @@ extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t,
 extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
 extern int proc_setgroups_show(struct seq_file *m, void *v);
 extern bool userns_may_setgroups(const struct user_namespace *ns);
-extern bool in_userns(const struct user_namespace *ns,
-		      const struct user_namespace *target_ns);
+extern bool current_in_userns(const struct user_namespace *target_ns);
 #else
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -103,8 +102,7 @@ static inline bool userns_may_setgroups(const struct user_namespace *ns)
 	return true;
 }
 
-static inline bool in_userns(const struct user_namespace *ns,
-			     const struct user_namespace *target_ns)
+static inline bool current_in_userns(const struct user_namespace *target_ns)
 {
 	return true;
 }
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 69fbc377357b..5960edc7e644 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -949,10 +949,10 @@ bool userns_may_setgroups(const struct user_namespace *ns)
  * Returns true if @ns is the same namespace as or a descendant of
  * @target_ns.
  */
-bool in_userns(const struct user_namespace *ns,
-	       const struct user_namespace *target_ns)
+bool current_in_userns(const struct user_namespace *target_ns)
 {
-	for (; ns; ns = ns->parent) {
+	struct user_namespace *ns;
+	for (ns = current_user_ns(); ns; ns = ns->parent) {
 		if (ns == target_ns)
 			return true;
 	}
diff --git a/security/commoncap.c b/security/commoncap.c
index 6243aef5860e..2119421613f6 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -450,7 +450,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
 
 	if (!mnt_may_suid(bprm->file->f_path.mnt))
 		return 0;
-	if (!in_userns(current_user_ns(), bprm->file->f_path.mnt->mnt_sb->s_user_ns))
+	if (!current_in_userns(bprm->file->f_path.mnt->mnt_sb->s_user_ns))
 		return 0;
 
 	rc = get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
-- 
1.9.1
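
The ancestor walk that current_in_userns() performs can be shown with a
self-contained userspace model. This is a sketch, not kernel code:
struct ns_model and ns_is_in() are hypothetical names, and the starting
namespace is passed explicitly because there is no "current" task
outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct user_namespace: only the parent link. */
struct ns_model {
	const struct ns_model *parent;	/* NULL for the initial namespace */
};

/* True if @ns is @target itself or one of @target's descendants. */
bool ns_is_in(const struct ns_model *ns, const struct ns_model *target)
{
	for (; ns; ns = ns->parent) {
		if (ns == target)
			return true;
	}
	return false;
}

/* Builds init -> child -> grandchild and checks both directions. */
void ns_is_in_example(void)
{
	struct ns_model init = { NULL };
	struct ns_model child = { &init };
	struct ns_model grandchild = { &child };

	assert(ns_is_in(&grandchild, &init));	/* descendant of init */
	assert(ns_is_in(&child, &child));	/* same namespace */
	assert(!ns_is_in(&init, &child));	/* an ancestor is not "in" */
}
```

The relation is one-directional: a task in a descendant namespace is
"in" every ancestor, but never the reverse.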


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RESEND v2 06/18] Smack: Handle labels consistently in untrusted mounts
@ 2016-01-04 18:03     ` Seth Forshee
  0 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-01-04 18:03 UTC (permalink / raw)
  To: Eric W. Biederman, Casey Schaufler
  Cc: Alexander Viro, Serge Hallyn, Richard Weinberger,
	Austin S Hemmelgarn, Miklos Szeredi, linux-kernel, linux-bcache,
	dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux, Seth Forshee, James Morris,
	Serge E. Hallyn

The SMACK64, SMACK64EXEC, and SMACK64MMAP labels are all handled
differently in untrusted mounts. This is confusing and
potentially problematic. Change this to handle them all the same
way that SMACK64 is currently handled; that is, read the label
from disk and check it at use time. For SMACK64 and SMACK64MMAP,
access is denied if the label does not match smk_root. To be
consistent with suid, a SMACK64EXEC label which does not match
smk_root will still allow execution of the file but will not run
with the label supplied in the xattr.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
---
 security/smack/smack_lsm.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 16cac04214e2..0e555f64ded0 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -921,6 +921,7 @@ static int smack_bprm_set_creds(struct linux_binprm *bprm)
 	struct inode *inode = file_inode(bprm->file);
 	struct task_smack *bsp = bprm->cred->security;
 	struct inode_smack *isp;
+	struct superblock_smack *sbsp;
 	int rc;
 
 	if (bprm->cred_prepared)
@@ -930,6 +931,11 @@ static int smack_bprm_set_creds(struct linux_binprm *bprm)
 	if (isp->smk_task == NULL || isp->smk_task == bsp->smk_task)
 		return 0;
 
+	sbsp = inode->i_sb->s_security;
+	if ((sbsp->smk_flags & SMK_SB_UNTRUSTED) &&
+	    isp->smk_task != sbsp->smk_root)
+		return 0;
+
 	if (bprm->unsafe & (LSM_UNSAFE_PTRACE | LSM_UNSAFE_PTRACE_CAP)) {
 		struct task_struct *tracer;
 		rc = 0;
@@ -1733,6 +1739,7 @@ static int smack_mmap_file(struct file *file,
 	struct task_smack *tsp;
 	struct smack_known *okp;
 	struct inode_smack *isp;
+	struct superblock_smack *sbsp;
 	int may;
 	int mmay;
 	int tmay;
@@ -1744,6 +1751,10 @@ static int smack_mmap_file(struct file *file,
 	isp = file_inode(file)->i_security;
 	if (isp->smk_mmap == NULL)
 		return 0;
+	sbsp = file_inode(file)->i_sb->s_security;
+	if (sbsp->smk_flags & SMK_SB_UNTRUSTED &&
+	    isp->smk_mmap != sbsp->smk_root)
+		return -EACCES;
 	mkp = isp->smk_mmap;
 
 	tsp = current_security();
@@ -3532,16 +3543,14 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
 			if (rc >= 0)
 				transflag = SMK_INODE_TRANSMUTE;
 		}
-		if (!(sbsp->smk_flags & SMK_SB_UNTRUSTED)) {
-			/*
-			 * Don't let the exec or mmap label be "*" or "@".
-			 */
-			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
-			if (IS_ERR(skp) || skp == &smack_known_star ||
-			    skp == &smack_known_web)
-				skp = NULL;
-			isp->smk_task = skp;
-		}
+		/*
+		 * Don't let the exec or mmap label be "*" or "@".
+		 */
+		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
+		if (IS_ERR(skp) || skp == &smack_known_star ||
+		    skp == &smack_known_web)
+			skp = NULL;
+		isp->smk_task = skp;
 
 		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
 		if (IS_ERR(skp) || skp == &smack_known_star ||
-- 
1.9.1
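
The two use-time checks described above can be modelled in userspace as
follows. This is a rough sketch under stated assumptions: the struct and
function names are hypothetical, labels are compared by pointer identity
as the patch does, and the ptrace and rule-set checks that follow in the
real hooks are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct smack_known. */
struct label_model {
	const char *name;
};

/* Hypothetical stand-in for struct superblock_smack. */
struct sb_model {
	bool untrusted;				/* SMK_SB_UNTRUSTED set? */
	const struct label_model *root;		/* smk_root */
};

/* mmap: a SMACK64MMAP label other than smk_root denies access. */
int model_mmap_check(const struct sb_model *sb,
		     const struct label_model *mmap_label)
{
	if (mmap_label == NULL)
		return 0;
	if (sb->untrusted && mmap_label != sb->root)
		return -EACCES;
	return 0;	/* the real hook continues with rule checks */
}

/* exec: execution proceeds, but a foreign SMACK64EXEC label is ignored. */
bool model_exec_uses_label(const struct sb_model *sb,
			   const struct label_model *exec_label,
			   const struct label_model *task_label)
{
	if (exec_label == NULL || exec_label == task_label)
		return false;
	if (sb->untrusted && exec_label != sb->root)
		return false;
	return true;
}

/* Shows both checks against an untrusted superblock. */
void model_smack_example(void)
{
	struct label_model root = { "root" }, other = { "other" };
	struct label_model task = { "task" };
	struct sb_model sb = { true, &root };

	assert(model_mmap_check(&sb, &other) == -EACCES);
	assert(model_mmap_check(&sb, &root) == 0);
	assert(!model_exec_uses_label(&sb, &other, &task));
	assert(model_exec_uses_label(&sb, &root, &task));
}
```

The asymmetry is intentional in the patch: mmap fails outright, while
exec succeeds without the label change, matching how suid is handled.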


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RESEND v2 07/18] fs: Check for invalid i_uid in may_follow_link()
@ 2016-01-04 18:03     ` Seth Forshee
  0 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-01-04 18:03 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro
  Cc: Serge Hallyn, Richard Weinberger, Austin S Hemmelgarn,
	Miklos Szeredi, linux-kernel, linux-bcache, dm-devel, linux-raid,
	linux-mtd, linux-fsdevel, fuse-devel, linux-security-module,
	selinux, Seth Forshee

Filesystem uids which don't map into a user namespace may result
in inode->i_uid being INVALID_UID. A symlink and its parent
directory which have different owners in the filesystem can both
get mapped to INVALID_UID, which may result in following a symlink
when this would not otherwise have been permitted with protected
symlinks enabled.

Add a new helper function, uid_valid_eq(), and use this to
validate that the ids in may_follow_link() are both equal and
valid. Also add an equivalent helper for gids, which is
currently unused.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 fs/namei.c             |  2 +-
 include/linux/uidgid.h | 10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 288e8a74bf88..4ccafd391697 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -902,7 +902,7 @@ static inline int may_follow_link(struct nameidata *nd)
 		return 0;
 
 	/* Allowed if parent directory and link owner match. */
-	if (uid_eq(parent->i_uid, inode->i_uid))
+	if (uid_valid_eq(parent->i_uid, inode->i_uid))
 		return 0;
 
 	if (nd->flags & LOOKUP_RCU)
diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h
index 03835522dfcb..e09529fe2668 100644
--- a/include/linux/uidgid.h
+++ b/include/linux/uidgid.h
@@ -117,6 +117,16 @@ static inline bool gid_valid(kgid_t gid)
 	return __kgid_val(gid) != (gid_t) -1;
 }
 
+static inline bool uid_valid_eq(kuid_t left, kuid_t right)
+{
+	return uid_eq(left, right) && uid_valid(left);
+}
+
+static inline bool gid_valid_eq(kgid_t left, kgid_t right)
+{
+	return gid_eq(left, right) && gid_valid(left);
+}
+
 #ifdef CONFIG_USER_NS
 
 extern kuid_t make_kuid(struct user_namespace *from, uid_t uid);
-- 
1.9.1
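
A userspace model makes the problem concrete. This is a sketch with
hypothetical names: plain uint32_t values stand in for kuid_t, and
MODEL_INVALID_UID stands in for INVALID_UID, which has the raw value
(uid_t)-1 in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for INVALID_UID, i.e. (uid_t)-1. */
#define MODEL_INVALID_UID ((uint32_t)-1)

bool model_uid_eq(uint32_t left, uint32_t right)
{
	return left == right;
}

bool model_uid_valid(uint32_t uid)
{
	return uid != MODEL_INVALID_UID;
}

/* Equal AND valid, mirroring uid_valid_eq() from the patch. */
bool model_uid_valid_eq(uint32_t left, uint32_t right)
{
	return model_uid_eq(left, right) && model_uid_valid(left);
}

/* Two different unmapped owners collapse to the same invalid value. */
void model_uid_valid_eq_example(void)
{
	uint32_t parent_owner = MODEL_INVALID_UID;	/* unmapped owner A */
	uint32_t link_owner = MODEL_INVALID_UID;	/* unmapped owner B */

	assert(model_uid_eq(parent_owner, link_owner));	/* false match */
	assert(!model_uid_valid_eq(parent_owner, link_owner));
	assert(model_uid_valid_eq(1000, 1000));
}
```

Checking validity of only one side is enough because the equality test
has already established that both sides hold the same value.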


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RESEND v2 08/18] cred: Reject inodes with invalid ids in set_create_files_as()
@ 2016-01-04 18:03     ` Seth Forshee
  0 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-01-04 18:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Viro, Serge Hallyn, Richard Weinberger,
	Austin S Hemmelgarn, Miklos Szeredi, linux-kernel, linux-bcache,
	dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux, Seth Forshee

Using INVALID_[UG]ID for the LSM file creation context doesn't
make sense, so return an error if the inode passed to
set_create_files_as() has an invalid id.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 kernel/cred.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/cred.c b/kernel/cred.c
index 71179a09c1d6..ff8606f77d90 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -689,6 +689,8 @@ EXPORT_SYMBOL(set_security_override_from_ctx);
  */
 int set_create_files_as(struct cred *new, struct inode *inode)
 {
+	if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid))
+		return -EINVAL;
 	new->fsuid = inode->i_uid;
 	new->fsgid = inode->i_gid;
 	return security_kernel_create_files_as(new, inode);
-- 
1.9.1
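
The guard can be modelled in userspace as below. This is an illustrative
sketch only: struct cred_model, struct inode_model and the function name
are hypothetical, ids are plain uint32_t values with (uint32_t)-1 as the
invalid marker, and the LSM hook called at the end of the real function
is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in for INVALID_UID / INVALID_GID. */
#define MODEL_INVALID_ID ((uint32_t)-1)

/* Hypothetical reduced views of struct cred and struct inode. */
struct cred_model {
	uint32_t fsuid;
	uint32_t fsgid;
};

struct inode_model {
	uint32_t i_uid;
	uint32_t i_gid;
};

/* Copy the inode's owner into the creds unless either id is invalid. */
int model_set_create_files_as(struct cred_model *new,
			      const struct inode_model *inode)
{
	if (inode->i_uid == MODEL_INVALID_ID ||
	    inode->i_gid == MODEL_INVALID_ID)
		return -EINVAL;
	new->fsuid = inode->i_uid;
	new->fsgid = inode->i_gid;
	return 0;
}

/* An unmapped owner is rejected and the creds stay unchanged. */
void model_set_create_files_as_example(void)
{
	struct cred_model cred = { 0, 0 };
	struct inode_model bad = { MODEL_INVALID_ID, 100 };
	struct inode_model good = { 1000, 100 };

	assert(model_set_create_files_as(&cred, &bad) == -EINVAL);
	assert(cred.fsuid == 0 && cred.fsgid == 0);
	assert(model_set_create_files_as(&cred, &good) == 0);
	assert(cred.fsuid == 1000 && cred.fsgid == 100);
}
```

As in the patch, the check runs before any field is written, so a
rejected call has no side effects on the creds.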


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RESEND v2 09/18] fs: Refuse uid/gid changes which don't map into s_user_ns
@ 2016-01-04 18:03     ` Seth Forshee
  0 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-01-04 18:03 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro
  Cc: Serge Hallyn, Richard Weinberger, Austin S Hemmelgarn,
	Miklos Szeredi, linux-kernel, linux-bcache, dm-devel, linux-raid,
	linux-mtd, linux-fsdevel, fuse-devel, linux-security-module,
	selinux, Seth Forshee

Add checks to inode_change_ok() to verify that uid and gid changes
will map into the superblock's user namespace. If they do not,
fail with -EOVERFLOW. This cannot be overridden with ATTR_FORCE.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 fs/attr.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/attr.c b/fs/attr.c
index 6530ced19697..55b46e3aa888 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -42,6 +42,17 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)
 			return error;
 	}
 
+	/*
+	 * Verify that uid/gid changes are valid in the target namespace
+	 * of the superblock. This cannot be overridden using ATTR_FORCE.
+	 */
+	if (ia_valid & ATTR_UID &&
+	    from_kuid(inode->i_sb->s_user_ns, attr->ia_uid) == (uid_t)-1)
+		return -EOVERFLOW;
+	if (ia_valid & ATTR_GID &&
+	    from_kgid(inode->i_sb->s_user_ns, attr->ia_gid) == (gid_t)-1)
+		return -EOVERFLOW;
+
 	/* If force is set do it anyway. */
 	if (ia_valid & ATTR_FORCE)
 		return 0;
-- 
1.9.1
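
The effect of the new check can be illustrated with a userspace model of
an id mapping. This is a sketch under simplifying assumptions: a single
extent replaces the kernel's uid_map/gid_map tables, and all struct,
macro and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for (uid_t)-1 / (gid_t)-1. */
#define MODEL_INVALID_ID ((uint32_t)-1)

/* One mapping extent: kernel ids [lower_first, lower_first + count). */
struct id_extent {
	uint32_t first;		/* first id as seen inside the namespace */
	uint32_t lower_first;	/* first kernel-side id */
	uint32_t count;		/* number of ids mapped */
};

/* Model of from_kuid()/from_kgid(): kernel id -> namespace id. */
uint32_t model_from_kid(const struct id_extent *map, uint32_t kid)
{
	if (kid >= map->lower_first && kid - map->lower_first < map->count)
		return map->first + (kid - map->lower_first);
	return MODEL_INVALID_ID;
}

/* Model of the added check: refuse owner changes that do not map. */
int model_owner_change_ok(const struct id_extent *sb_map,
			  bool set_uid, uint32_t kuid,
			  bool set_gid, uint32_t kgid)
{
	if (set_uid && model_from_kid(sb_map, kuid) == MODEL_INVALID_ID)
		return -EOVERFLOW;
	if (set_gid && model_from_kid(sb_map, kgid) == MODEL_INVALID_ID)
		return -EOVERFLOW;
	return 0;
}

/* Namespace ids 0..65535 backed by kernel ids 100000..165535. */
void model_owner_change_example(void)
{
	struct id_extent map = { 0, 100000, 65536 };

	assert(model_from_kid(&map, 100000) == 0);
	assert(model_owner_change_ok(&map, true, 100000, false, 0) == 0);
	/* Kernel uid 1000 has no representation in this namespace. */
	assert(model_owner_change_ok(&map, true, 1000, false, 0)
	       == -EOVERFLOW);
}
```

An id that is not being changed is never looked up, which matches the
ATTR_UID and ATTR_GID tests in the hunk.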


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RESEND v2 10/18] fs: Update posix_acl support to handle user namespace mounts
  2016-01-04 18:03 [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces Seth Forshee
  2016-01-04 18:03 ` [PATCH RESEND v2 01/18] block_dev: Support checking inode permissions in lookup_bdev() Seth Forshee
@ 2016-01-04 18:03 ` Seth Forshee
       [not found] ` <1451930639-94331-1-git-send-email-seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
  2016-01-25 19:47 ` [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces Seth Forshee
  3 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-01-04 18:03 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro
  Cc: Serge Hallyn, Richard Weinberger, Austin S Hemmelgarn,
	Miklos Szeredi, linux-kernel, linux-bcache, dm-devel, linux-raid,
	linux-mtd, linux-fsdevel, fuse-devel, linux-security-module,
	selinux, Seth Forshee

IDs in on-disk ACLs should be converted relative to s_user_ns
rather than init_user_ns, as is done now. This makes it possible
for id mappings to fail; when that happens, syscalls return
EOVERFLOW.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 fs/posix_acl.c                  | 67 ++++++++++++++++++++++++++---------------
 fs/xattr.c                      | 19 +++++++++---
 include/linux/posix_acl_xattr.h | 17 ++++++++---
 3 files changed, 70 insertions(+), 33 deletions(-)

diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index 4adde1e2cbec..a29442eb4af8 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -595,59 +595,77 @@ EXPORT_SYMBOL_GPL(posix_acl_create);
 /*
  * Fix up the uids and gids in posix acl extended attributes in place.
  */
-static void posix_acl_fix_xattr_userns(
+static int posix_acl_fix_xattr_userns(
 	struct user_namespace *to, struct user_namespace *from,
 	void *value, size_t size)
 {
 	posix_acl_xattr_header *header = (posix_acl_xattr_header *)value;
 	posix_acl_xattr_entry *entry = (posix_acl_xattr_entry *)(header+1), *end;
 	int count;
-	kuid_t uid;
-	kgid_t gid;
+	kuid_t kuid;
+	uid_t uid;
+	kgid_t kgid;
+	gid_t gid;
 
 	if (!value)
-		return;
+		return 0;
 	if (size < sizeof(posix_acl_xattr_header))
-		return;
+		return 0;
 	if (header->a_version != cpu_to_le32(POSIX_ACL_XATTR_VERSION))
-		return;
+		return 0;
 
 	count = posix_acl_xattr_count(size);
 	if (count < 0)
-		return;
+		return 0;
 	if (count == 0)
-		return;
+		return 0;
 
 	for (end = entry + count; entry != end; entry++) {
 		switch(le16_to_cpu(entry->e_tag)) {
 		case ACL_USER:
-			uid = make_kuid(from, le32_to_cpu(entry->e_id));
-			entry->e_id = cpu_to_le32(from_kuid(to, uid));
+			kuid = make_kuid(from, le32_to_cpu(entry->e_id));
+			if (!uid_valid(kuid))
+				return -EOVERFLOW;
+			uid = from_kuid(to, kuid);
+			if (uid == (uid_t)-1)
+				return -EOVERFLOW;
+			entry->e_id = cpu_to_le32(uid);
 			break;
 		case ACL_GROUP:
-			gid = make_kgid(from, le32_to_cpu(entry->e_id));
-			entry->e_id = cpu_to_le32(from_kgid(to, gid));
+			kgid = make_kgid(from, le32_to_cpu(entry->e_id));
+			if (!gid_valid(kgid))
+				return -EOVERFLOW;
+			gid = from_kgid(to, kgid);
+			if (gid == (gid_t)-1)
+				return -EOVERFLOW;
+			entry->e_id = cpu_to_le32(gid);
 			break;
 		default:
 			break;
 		}
 	}
+
+	return 0;
 }
 
-void posix_acl_fix_xattr_from_user(void *value, size_t size)
+int
+posix_acl_fix_xattr_from_user(struct user_namespace *target_ns, void *value,
+			      size_t size)
 {
-	struct user_namespace *user_ns = current_user_ns();
-	if (user_ns == &init_user_ns)
-		return;
-	posix_acl_fix_xattr_userns(&init_user_ns, user_ns, value, size);
+	struct user_namespace *source_ns = current_user_ns();
+	if (source_ns == target_ns)
+		return 0;
+	return posix_acl_fix_xattr_userns(target_ns, source_ns, value, size);
 }
 
-void posix_acl_fix_xattr_to_user(void *value, size_t size)
+int
+posix_acl_fix_xattr_to_user(struct user_namespace *source_ns, void *value,
+			    size_t size)
 {
-	struct user_namespace *user_ns = current_user_ns();
-	if (user_ns == &init_user_ns)
-		return;
-	posix_acl_fix_xattr_userns(user_ns, &init_user_ns, value, size);
+	struct user_namespace *target_ns = current_user_ns();
+	if (target_ns == source_ns)
+		return 0;
+	return posix_acl_fix_xattr_userns(target_ns, source_ns, value, size);
 }
 
 /*
@@ -782,7 +800,7 @@ posix_acl_xattr_get(const struct xattr_handler *handler,
 	if (acl == NULL)
 		return -ENODATA;
 
-	error = posix_acl_to_xattr(&init_user_ns, acl, value, size);
+	error = posix_acl_to_xattr(dentry->d_sb->s_user_ns, acl, value, size);
 	posix_acl_release(acl);
 
 	return error;
@@ -810,7 +828,8 @@ posix_acl_xattr_set(const struct xattr_handler *handler,
 		return -EPERM;
 
 	if (value) {
-		acl = posix_acl_from_xattr(&init_user_ns, value, size);
+		acl = posix_acl_from_xattr(dentry->d_sb->s_user_ns, value,
+					   size);
 		if (IS_ERR(acl))
 			return PTR_ERR(acl);
 
diff --git a/fs/xattr.c b/fs/xattr.c
index 9b932b95d74e..1268d8d5f74b 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -351,8 +351,12 @@ setxattr(struct dentry *d, const char __user *name, const void __user *value,
 			goto out;
 		}
 		if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
-		    (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
-			posix_acl_fix_xattr_from_user(kvalue, size);
+		    (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0)) {
+			error = posix_acl_fix_xattr_from_user(d->d_sb->s_user_ns,
+							      kvalue, size);
+			if (error)
+				goto out;
+		}
 	}
 
 	error = vfs_setxattr(d, kname, kvalue, size, flags);
@@ -452,9 +456,14 @@ getxattr(struct dentry *d, const char __user *name, void __user *value,
 	error = vfs_getxattr(d, kname, kvalue, size);
 	if (error > 0) {
 		if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
-		    (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
-			posix_acl_fix_xattr_to_user(kvalue, size);
-		if (size && copy_to_user(value, kvalue, error))
+		    (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0)) {
+			int ret;
+			ret = posix_acl_fix_xattr_to_user(d->d_sb->s_user_ns,
+							  kvalue, size);
+			if (ret)
+				error = ret;
+		}
+		if (error > 0 && size && copy_to_user(value, kvalue, error))
 			error = -EFAULT;
 	} else if (error == -ERANGE && size >= XATTR_SIZE_MAX) {
 		/* The file system tried to returned a value bigger
diff --git a/include/linux/posix_acl_xattr.h b/include/linux/posix_acl_xattr.h
index 6f14ee295822..db63c57357b4 100644
--- a/include/linux/posix_acl_xattr.h
+++ b/include/linux/posix_acl_xattr.h
@@ -53,14 +53,23 @@ posix_acl_xattr_count(size_t size)
 }
 
 #ifdef CONFIG_FS_POSIX_ACL
-void posix_acl_fix_xattr_from_user(void *value, size_t size);
-void posix_acl_fix_xattr_to_user(void *value, size_t size);
+int posix_acl_fix_xattr_from_user(struct user_namespace *target_ns,
+				  void *value, size_t size);
+int posix_acl_fix_xattr_to_user(struct user_namespace *source_ns, void *value,
+				size_t size);
 #else
-static inline void posix_acl_fix_xattr_from_user(void *value, size_t size)
+static inline int
+posix_acl_fix_xattr_from_user(struct user_namespace *target_ns, void *value,
+			      size_t size)
 {
+	return 0;
 }
-static inline void posix_acl_fix_xattr_to_user(void *value, size_t size)
+
+static inline int
+posix_acl_fix_xattr_to_user(struct user_namespace *source_ns, void *value,
+			    size_t size)
 {
+	return 0;
 }
 #endif
 
-- 
1.9.1
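
The id translation performed by posix_acl_fix_xattr_userns() in the patch
above can be modelled in userspace roughly as follows. This is only a
sketch under stated assumptions: struct toy_userns, toy_make_kid(),
toy_from_kid() and toy_fix_acl_id() are hypothetical names, and a
namespace is reduced to a single mapping extent. It shows why either
half of the translation can fail and why the entry is left untouched
when it does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_INVALID_ID ((uint32_t)-1)

/*
 * Hypothetical model of a user namespace: a single extent mapping ids
 * [first, first + count) inside the namespace to kernel ids
 * [lower_first, lower_first + count).
 */
struct toy_userns {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Namespace-local id -> kernel id, or TOY_INVALID_ID if unmapped. */
uint32_t toy_make_kid(const struct toy_userns *ns, uint32_t id)
{
	if (id < ns->first || id - ns->first >= ns->count)
		return TOY_INVALID_ID;
	return ns->lower_first + (id - ns->first);
}

/* Kernel id -> namespace-local id, or TOY_INVALID_ID if unmapped. */
uint32_t toy_from_kid(const struct toy_userns *ns, uint32_t kid)
{
	if (kid == TOY_INVALID_ID || kid < ns->lower_first ||
	    kid - ns->lower_first >= ns->count)
		return TOY_INVALID_ID;
	return ns->first + (kid - ns->lower_first);
}

/*
 * Translate *id from namespace "from" to namespace "to" in place.
 * Mirrors the patch: fail with -EOVERFLOW if the id is unmapped on
 * either side, and do not modify *id in that case.
 */
int toy_fix_acl_id(const struct toy_userns *to,
		   const struct toy_userns *from, uint32_t *id)
{
	uint32_t kid = toy_make_kid(from, *id);
	uint32_t out;

	if (kid == TOY_INVALID_ID)
		return -EOVERFLOW;
	out = toy_from_kid(to, kid);
	if (out == TOY_INVALID_ID)
		return -EOVERFLOW;
	*id = out;
	return 0;
}
```

With a container namespace mapping ids 0..65535 to kernel ids starting
at 100000, id 1000 in the container translates to 101000 in the initial
namespace, while id 70000 has no mapping and is rejected.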

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RESEND v2 11/18] fs: Ensure the mounter of a filesystem is privileged towards its inodes
  2016-01-04 18:03 [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces Seth Forshee
@ 2016-01-04 18:03     ` Seth Forshee
  2016-01-04 18:03 ` [PATCH RESEND v2 10/18] fs: Update posix_acl support to handle user namespace mounts Seth Forshee
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-01-04 18:03 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Serge Hallyn
  Cc: linux-security-module, Seth Forshee, dm-devel, Miklos Szeredi,
	linux-bcache, linux-kernel, linux-raid, fuse-devel,
	Austin S Hemmelgarn, linux-mtd, selinux, linux-fsdevel

The mounter of a filesystem should be privileged towards the
inodes of that filesystem. Extend the checks in
inode_owner_or_capable() and capable_wrt_inode_uidgid() to
permit access by users privileged in the user namespace of the
inode's superblock.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 fs/inode.c          |  3 +++
 kernel/capability.c | 13 +++++++++----
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 1be5f9003eb3..01c036fe1950 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1962,6 +1962,9 @@ bool inode_owner_or_capable(const struct inode *inode)
 	ns = current_user_ns();
 	if (ns_capable(ns, CAP_FOWNER) && kuid_has_mapping(ns, inode->i_uid))
 		return true;
+
+	if (ns_capable(inode->i_sb->s_user_ns, CAP_FOWNER))
+		return true;
 	return false;
 }
 EXPORT_SYMBOL(inode_owner_or_capable);
diff --git a/kernel/capability.c b/kernel/capability.c
index 45432b54d5c6..5137a38a5670 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -437,13 +437,18 @@ EXPORT_SYMBOL(file_ns_capable);
  *
  * Return true if the current task has the given capability targeted at
  * its own user namespace and that the given inode's uid and gid are
- * mapped into the current user namespace.
+ * mapped into the current user namespace, or if the current task has
+ * the capability towards the user namespace of the inode's superblock.
  */
 bool capable_wrt_inode_uidgid(const struct inode *inode, int cap)
 {
-	struct user_namespace *ns = current_user_ns();
+	struct user_namespace *ns;
 
-	return ns_capable(ns, cap) && kuid_has_mapping(ns, inode->i_uid) &&
-		kgid_has_mapping(ns, inode->i_gid);
+	ns = current_user_ns();
+	if (ns_capable(ns, cap) && kuid_has_mapping(ns, inode->i_uid) &&
+	    kgid_has_mapping(ns, inode->i_gid))
+		return true;
+
+	return ns_capable(inode->i_sb->s_user_ns, cap);
 }
 EXPORT_SYMBOL(capable_wrt_inode_uidgid);
-- 
1.9.1
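
The effect of the extra check in inode_owner_or_capable() can be
illustrated with a small userspace model. This is a sketch, not kernel
code: the toy_* types and functions are hypothetical, a namespace is
reduced to one mapping extent, and toy_ns_capable() only recognises a
capability held in exactly the target namespace (the real ns_capable()
also honours ancestor namespaces).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical namespace: kernel ids [lower_first, lower_first + count). */
struct toy_ns {
	uint32_t lower_first;
	uint32_t count;
};

struct toy_task {
	const struct toy_ns *ns;	/* namespace the task runs in */
	uint32_t fsuid;			/* kernel uid of the task */
	bool cap_fowner;		/* CAP_FOWNER held in its own ns */
};

struct toy_inode {
	uint32_t i_uid;			/* kernel uid owning the inode */
	const struct toy_ns *s_user_ns;	/* ns that mounted the superblock */
};

bool toy_kuid_has_mapping(const struct toy_ns *ns, uint32_t kuid)
{
	return kuid >= ns->lower_first && kuid - ns->lower_first < ns->count;
}

/* Simplified: the capability only counts in the task's own namespace. */
bool toy_ns_capable(const struct toy_task *t, const struct toy_ns *target)
{
	return t->cap_fowner && t->ns == target;
}

bool toy_inode_owner_or_capable(const struct toy_task *t,
				const struct toy_inode *inode)
{
	/* Owner of the inode. */
	if (t->fsuid == inode->i_uid)
		return true;
	/* Capable in own ns and the inode's uid is mapped there. */
	if (toy_ns_capable(t, t->ns) &&
	    toy_kuid_has_mapping(t->ns, inode->i_uid))
		return true;
	/* New in this patch: capable towards the superblock's ns. */
	return toy_ns_capable(t, inode->s_user_ns);
}
```

In this model a task with CAP_FOWNER inside a container is privileged
towards an inode with an unmapped uid only when the container's
namespace mounted the filesystem; the same uid on a filesystem mounted
from the initial namespace is still refused.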


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RESEND v2 12/18] fs: Don't remove suid for CAP_FSETID in s_user_ns
  2016-01-04 18:03 [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces Seth Forshee
@ 2016-01-04 18:03     ` Seth Forshee
  2016-01-04 18:03 ` [PATCH RESEND v2 10/18] fs: Update posix_acl support to handle user namespace mounts Seth Forshee
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-01-04 18:03 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro
  Cc: Serge Hallyn, Seth Forshee, dm-devel, Miklos Szeredi,
	linux-security-module, linux-bcache, linux-kernel, linux-raid,
	fuse-devel, Austin S Hemmelgarn, linux-mtd, selinux, linux-fsdevel

Change the check in should_remove_suid() so that the suid/sgid
bits are kept when the caller has CAP_FSETID in the superblock's
s_user_ns, rather than requiring the capability in init_user_ns.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 fs/inode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/inode.c b/fs/inode.c
index 01c036fe1950..3e7c74da9304 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1684,7 +1684,8 @@ int should_remove_suid(struct dentry *dentry)
 	if (unlikely((mode & S_ISGID) && (mode & S_IXGRP)))
 		kill |= ATTR_KILL_SGID;
 
-	if (unlikely(kill && !capable(CAP_FSETID) && S_ISREG(mode)))
+	if (unlikely(kill && !ns_capable(dentry->d_sb->s_user_ns, CAP_FSETID) &&
+		     S_ISREG(mode)))
 		return kill;
 
 	return 0;
-- 
1.9.1
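
The decision made by should_remove_suid() can be sketched in userspace
as below. The mode constants carry the usual Linux octal values, but
the TOY_* names and toy_should_remove_suid() are hypothetical, the
kill flags are arbitrary toy values, and the capability lookup is
reduced to a boolean supplied by the caller (standing in for
ns_capable(dentry->d_sb->s_user_ns, CAP_FSETID)).

```c
#include <assert.h>
#include <stdbool.h>

/* Mode bits, using the conventional Linux octal values. */
#define TOY_S_IFMT	0170000
#define TOY_S_IFREG	0100000
#define TOY_S_ISUID	0004000
#define TOY_S_ISGID	0002000
#define TOY_S_IXGRP	0000010

/* Toy stand-ins for ATTR_KILL_SUID / ATTR_KILL_SGID. */
#define TOY_KILL_SUID	1
#define TOY_KILL_SGID	2

/*
 * Return the set of bits that a write should strip, or 0 if none.
 * fsetid_in_sb_ns models whether the writer holds CAP_FSETID towards
 * the user namespace that mounted the filesystem.
 */
int toy_should_remove_suid(unsigned int mode, bool fsetid_in_sb_ns)
{
	int kill = 0;

	/* suid is always a candidate for removal */
	if (mode & TOY_S_ISUID)
		kill = TOY_KILL_SUID;

	/* sgid only counts together with group execute */
	if ((mode & TOY_S_ISGID) && (mode & TOY_S_IXGRP))
		kill |= TOY_KILL_SGID;

	if (kill && !fsetid_in_sb_ns && (mode & TOY_S_IFMT) == TOY_S_IFREG)
		return kill;

	return 0;
}
```

Only the source of the boolean changes with the patch: a writer that is
privileged in the mounting namespace keeps the bits even if it holds no
capabilities in the initial namespace.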


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RESEND v2 13/18] fs: Allow superblock owner to access do_remount_sb()
  2016-01-04 18:03 [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces Seth Forshee
@ 2016-01-04 18:03     ` Seth Forshee
  2016-01-04 18:03 ` [PATCH RESEND v2 10/18] fs: Update posix_acl support to handle user namespace mounts Seth Forshee
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-01-04 18:03 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro
  Cc: Serge Hallyn, Seth Forshee, dm-devel, Miklos Szeredi,
	linux-security-module, linux-bcache, linux-kernel, linux-raid,
	fuse-devel, Austin S Hemmelgarn, linux-mtd, selinux, linux-fsdevel

Superblock-level remounts are currently restricted to global
CAP_SYS_ADMIN, as is the path for changing the root mount to
read-only on umount. Loosen both of these permission checks to
also allow CAP_SYS_ADMIN in any namespace which is privileged
towards the userns which originally mounted the filesystem.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 fs/namespace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 18fc58760aec..b00a765895e7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1510,7 +1510,7 @@ static int do_umount(struct mount *mnt, int flags)
 		 * Special case for "unmounting" root ...
 		 * we just try to remount it readonly.
 		 */
-		if (!capable(CAP_SYS_ADMIN))
+		if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
 			return -EPERM;
 		down_write(&sb->s_umount);
 		if (!(sb->s_flags & MS_RDONLY))
@@ -2199,7 +2199,7 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	down_write(&sb->s_umount);
 	if (flags & MS_BIND)
 		err = change_mount_flags(path->mnt, flags);
-	else if (!capable(CAP_SYS_ADMIN))
+	else if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
 		err = -EPERM;
 	else
 		err = do_remount_sb(sb, flags, data, 0);
-- 
1.9.1
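
What "privileged towards the userns which originally mounted the
filesystem" means can be sketched as a walk up the namespace hierarchy.
The model below is an assumption-laden illustration: struct toy_ns,
toy_ns_is_same_or_ancestor() and toy_may_remount_sb() are hypothetical
names, and the rule that the owner of a namespace is also privileged
over it is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user namespace: only the parent link matters here. */
struct toy_ns {
	const struct toy_ns *parent;	/* NULL for the initial namespace */
};

/* True if candidate is ns itself or one of its ancestors. */
bool toy_ns_is_same_or_ancestor(const struct toy_ns *candidate,
				const struct toy_ns *ns)
{
	for (; ns; ns = ns->parent) {
		if (ns == candidate)
			return true;
	}
	return false;
}

/*
 * Model of the loosened check: CAP_SYS_ADMIN held in task_ns is good
 * enough if task_ns is the mounting namespace or an ancestor of it.
 */
bool toy_may_remount_sb(const struct toy_ns *task_ns, bool has_sys_admin,
			const struct toy_ns *sb_user_ns)
{
	return has_sys_admin &&
	       toy_ns_is_same_or_ancestor(task_ns, sb_user_ns);
}
```

Under this model the initial namespace keeps its ability to remount
everything, the mounting namespace gains it for its own superblocks,
and sibling or descendant namespaces are still refused.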


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RESEND v2 14/18] capabilities: Allow privileged user in s_user_ns to set security.* xattrs
  2016-01-04 18:03 [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces Seth Forshee
@ 2016-01-04 18:03     ` Seth Forshee
  2016-01-04 18:03 ` [PATCH RESEND v2 10/18] fs: Update posix_acl support to handle user namespace mounts Seth Forshee
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-01-04 18:03 UTC (permalink / raw)
  To: Eric W. Biederman, Serge Hallyn, James Morris, Serge E. Hallyn
  Cc: linux-bcache, linux-security-module, Seth Forshee, dm-devel,
	Miklos Szeredi, linux-kernel, linux-raid, fuse-devel,
	Austin S Hemmelgarn, linux-mtd, Alexander Viro, selinux,
	linux-fsdevel

A privileged user in s_user_ns will generally have the ability to
manipulate the backing store and insert security.* xattrs into
the filesystem directly. Therefore the kernel must be prepared to
handle these xattrs from unprivileged mounts, and it makes little
sense for commoncap to prevent writing these xattrs to the
filesystem. The capability and LSM code have already been updated
to appropriately handle xattrs from unprivileged mounts, so it
is safe to loosen this restriction on setting xattrs.

The exception to this logic is that writing xattrs to a mounted
filesystem may also cause the LSM inode_post_setxattr or
inode_setsecurity callbacks to be invoked. SELinux will deny the
xattr update by virtue of applying mountpoint labeling to
unprivileged userns mounts, and Smack will deny the writes for
any user without global CAP_MAC_ADMIN, so loosening the
capability check in commoncap is safe in this respect as well.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 security/commoncap.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index 2119421613f6..d6c80c19c449 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -653,15 +653,17 @@ int cap_bprm_secureexec(struct linux_binprm *bprm)
 int cap_inode_setxattr(struct dentry *dentry, const char *name,
 		       const void *value, size_t size, int flags)
 {
+	struct user_namespace *user_ns = dentry->d_sb->s_user_ns;
+
 	if (!strcmp(name, XATTR_NAME_CAPS)) {
-		if (!capable(CAP_SETFCAP))
+		if (!ns_capable(user_ns, CAP_SETFCAP))
 			return -EPERM;
 		return 0;
 	}
 
 	if (!strncmp(name, XATTR_SECURITY_PREFIX,
 		     sizeof(XATTR_SECURITY_PREFIX) - 1) &&
-	    !capable(CAP_SYS_ADMIN))
+	    !ns_capable(user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 	return 0;
 }
@@ -679,15 +681,17 @@ int cap_inode_setxattr(struct dentry *dentry, const char *name,
  */
 int cap_inode_removexattr(struct dentry *dentry, const char *name)
 {
+	struct user_namespace *user_ns = dentry->d_sb->s_user_ns;
+
 	if (!strcmp(name, XATTR_NAME_CAPS)) {
-		if (!capable(CAP_SETFCAP))
+		if (!ns_capable(user_ns, CAP_SETFCAP))
 			return -EPERM;
 		return 0;
 	}
 
 	if (!strncmp(name, XATTR_SECURITY_PREFIX,
 		     sizeof(XATTR_SECURITY_PREFIX) - 1) &&
-	    !capable(CAP_SYS_ADMIN))
+	    !ns_capable(user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 	return 0;
 }
-- 
1.9.1
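
The way cap_inode_setxattr() picks the capability to check depends only
on the xattr name, which the following userspace sketch tries to make
explicit. The string literals match the values of XATTR_NAME_CAPS and
XATTR_SECURITY_PREFIX; everything named toy_* or TOY_* is hypothetical,
and the two booleans stand in for ns_capable() calls against the
superblock's s_user_ns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

enum toy_cap {
	TOY_CAP_NONE,		/* no commoncap restriction */
	TOY_CAP_SETFCAP,	/* file capabilities xattr */
	TOY_CAP_SYS_ADMIN,	/* any other security.* xattr */
};

/* Which capability does commoncap require for this xattr name? */
enum toy_cap toy_cap_needed_for_xattr(const char *name)
{
	if (!strcmp(name, "security.capability"))
		return TOY_CAP_SETFCAP;
	if (!strncmp(name, "security.", sizeof("security.") - 1))
		return TOY_CAP_SYS_ADMIN;
	return TOY_CAP_NONE;
}

/*
 * has_setfcap / has_sys_admin model whether the caller holds the
 * capability towards the namespace that mounted the filesystem.
 */
int toy_cap_inode_setxattr(const char *name, bool has_setfcap,
			   bool has_sys_admin)
{
	switch (toy_cap_needed_for_xattr(name)) {
	case TOY_CAP_SETFCAP:
		return has_setfcap ? 0 : -EPERM;
	case TOY_CAP_SYS_ADMIN:
		return has_sys_admin ? 0 : -EPERM;
	default:
		return 0;
	}
}
```

Note that a pass here is not the final word in the kernel: as the
commit message says, SELinux and Smack apply their own checks on top.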


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RESEND v2 15/18] fuse: Add support for pid namespaces
  2016-01-04 18:03 [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces Seth Forshee
@ 2016-01-04 18:03     ` Seth Forshee
  2016-01-04 18:03 ` [PATCH RESEND v2 10/18] fs: Update posix_acl support to handle user namespace mounts Seth Forshee
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-01-04 18:03 UTC (permalink / raw)
  To: Eric W. Biederman, Miklos Szeredi
  Cc: Serge Hallyn, Seth Forshee, Miklos Szeredi, dm-devel,
	linux-security-module, linux-bcache, linux-kernel, linux-raid,
	fuse-devel, Austin S Hemmelgarn, linux-mtd, Alexander Viro,
	selinux, linux-fsdevel

If the userspace process servicing fuse requests is running in
a pid namespace, then pids passed via the fuse fd need to be
translated relative to that namespace. Capture the pid namespace
in use when the filesystem is mounted and use this for pid
translation.

Since no use case currently exists for changing namespaces, all
translations are done relative to the pid namespace in use when
/dev/fuse is opened. Mounting or /dev/fuse IO from another
namespace will return errors.

Requests from processes whose pid cannot be translated into the
target namespace are not permitted, except for requests
allocated via fuse_get_req_nofail_nopages(). For no-fail requests,
in.h.pid will be 0 if the pid translation fails.

The file locking changes are based on previous work done by Eric
Biederman.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/fuse/dev.c    | 19 +++++++++++++++----
 fs/fuse/file.c   | 22 +++++++++++++++++-----
 fs/fuse/fuse_i.h |  4 ++++
 fs/fuse/inode.c  |  3 +++
 4 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ebb5e37455a0..a4f6f30d6d86 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -19,6 +19,7 @@
 #include <linux/pipe_fs_i.h>
 #include <linux/swap.h>
 #include <linux/splice.h>
+#include <linux/sched.h>
 
 MODULE_ALIAS_MISCDEV(FUSE_MINOR);
 MODULE_ALIAS("devname:fuse");
@@ -124,11 +125,11 @@ static void __fuse_put_request(struct fuse_req *req)
 	atomic_dec(&req->count);
 }
 
-static void fuse_req_init_context(struct fuse_req *req)
+static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
 {
 	req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
 	req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
-	req->in.h.pid = current->pid;
+	req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
 }
 
 void fuse_set_initialized(struct fuse_conn *fc)
@@ -181,10 +182,14 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
 		goto out;
 	}
 
-	fuse_req_init_context(req);
+	fuse_req_init_context(fc, req);
 	__set_bit(FR_WAITING, &req->flags);
 	if (for_background)
 		__set_bit(FR_BACKGROUND, &req->flags);
+	if (req->in.h.pid == 0) {
+		fuse_put_request(fc, req);
+		return ERR_PTR(-EOVERFLOW);
+	}
 
 	return req;
 
@@ -274,7 +279,7 @@ struct fuse_req *fuse_get_req_nofail_nopages(struct fuse_conn *fc,
 	if (!req)
 		req = get_reserved_req(fc, file);
 
-	fuse_req_init_context(req);
+	fuse_req_init_context(fc, req);
 	__set_bit(FR_WAITING, &req->flags);
 	__clear_bit(FR_BACKGROUND, &req->flags);
 	return req;
@@ -1243,6 +1248,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 	struct fuse_in *in;
 	unsigned reqsize;
 
+	if (task_active_pid_ns(current) != fc->pid_ns)
+		return -EIO;
+
  restart:
 	spin_lock(&fiq->waitq.lock);
 	err = -EAGAIN;
@@ -1872,6 +1880,9 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 	struct fuse_req *req;
 	struct fuse_out_header oh;
 
+	if (task_active_pid_ns(current) != fc->pid_ns)
+		return -EIO;
+
 	if (nbytes < sizeof(struct fuse_out_header))
 		return -EINVAL;
 
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index e0faf8f2c868..a6c7484c94ee 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2061,7 +2061,8 @@ static int fuse_direct_mmap(struct file *file, struct vm_area_struct *vma)
 	return generic_file_mmap(file, vma);
 }
 
-static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
+static int convert_fuse_file_lock(struct fuse_conn *fc,
+				  const struct fuse_file_lock *ffl,
 				  struct file_lock *fl)
 {
 	switch (ffl->type) {
@@ -2076,7 +2077,14 @@ static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
 
 		fl->fl_start = ffl->start;
 		fl->fl_end = ffl->end;
-		fl->fl_pid = ffl->pid;
+
+		/*
+		 * Convert pid into the caller's pid namespace. If the pid
+		 * does not map into the namespace fl_pid will get set to 0.
+		 */
+		rcu_read_lock();
+		fl->fl_pid = pid_vnr(find_pid_ns(ffl->pid, fc->pid_ns));
+		rcu_read_unlock();
 		break;
 
 	default:
@@ -2125,7 +2133,7 @@ static int fuse_getlk(struct file *file, struct file_lock *fl)
 	args.out.args[0].value = &outarg;
 	err = fuse_simple_request(fc, &args);
 	if (!err)
-		err = convert_fuse_file_lock(&outarg.lk, fl);
+		err = convert_fuse_file_lock(fc, &outarg.lk, fl);
 
 	return err;
 }
@@ -2137,7 +2145,8 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock)
 	FUSE_ARGS(args);
 	struct fuse_lk_in inarg;
 	int opcode = (fl->fl_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK;
-	pid_t pid = fl->fl_type != F_UNLCK ? current->tgid : 0;
+	struct pid *pid = fl->fl_type != F_UNLCK ? task_tgid(current) : NULL;
+	pid_t pid_nr = pid_nr_ns(pid, fc->pid_ns);
 	int err;
 
 	if (fl->fl_lmops && fl->fl_lmops->lm_grant) {
@@ -2149,7 +2158,10 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock)
 	if (fl->fl_flags & FL_CLOSE)
 		return 0;
 
-	fuse_lk_fill(&args, file, fl, opcode, pid, flock, &inarg);
+	if (pid && pid_nr == 0)
+		return -EOVERFLOW;
+
+	fuse_lk_fill(&args, file, fl, opcode, pid_nr, flock, &inarg);
 	err = fuse_simple_request(fc, &args);
 
 	/* locking is restartable */
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 405113101db8..143b595197b6 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -22,6 +22,7 @@
 #include <linux/rbtree.h>
 #include <linux/poll.h>
 #include <linux/workqueue.h>
+#include <linux/pid_namespace.h>
 
 /** Max number of pages that can be used in a single read request */
 #define FUSE_MAX_PAGES_PER_REQ 32
@@ -456,6 +457,9 @@ struct fuse_conn {
 	/** The group id for this mount */
 	kgid_t group_id;
 
+	/** The pid namespace for this mount */
+	struct pid_namespace *pid_ns;
+
 	/** The fuse mount flags for this mount */
 	unsigned flags;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 2913db2a5b99..2f31874ea9db 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -20,6 +20,7 @@
 #include <linux/random.h>
 #include <linux/sched.h>
 #include <linux/exportfs.h>
+#include <linux/pid_namespace.h>
 
 MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
 MODULE_DESCRIPTION("Filesystem in Userspace");
@@ -609,6 +610,7 @@ void fuse_conn_init(struct fuse_conn *fc)
 	fc->connected = 1;
 	fc->attr_version = 1;
 	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
+	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
 }
 EXPORT_SYMBOL_GPL(fuse_conn_init);
 
@@ -617,6 +619,7 @@ void fuse_conn_put(struct fuse_conn *fc)
 	if (atomic_dec_and_test(&fc->count)) {
 		if (fc->destroy_req)
 			fuse_request_free(fc->destroy_req);
+		put_pid_ns(fc->pid_ns);
 		fc->release(fc);
 	}
 }
-- 
1.9.1
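
The pid translation rule described in the commit message above can be
sketched in userspace like this. The model is a simplification with
hypothetical names (struct toy_pid_ns, struct toy_pid,
toy_pid_nr_ns(), toy_fuse_req_pid()): a pid carries one number per
nesting level of the namespaces it is visible in, and the lookup
returns 0 when the pid is not visible in the requested namespace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LEVEL 4	/* assumed nesting limit for this sketch */

struct toy_pid_ns {
	int level;	/* 0 for the initial pid namespace */
};

struct toy_upid {
	int nr;				/* pid number seen in ns */
	const struct toy_pid_ns *ns;
};

struct toy_pid {
	int level;				/* deepest level the pid lives at */
	struct toy_upid numbers[TOY_MAX_LEVEL];	/* one entry per level */
};

/* Number of pid as seen from ns, or 0 if it is not visible there. */
int toy_pid_nr_ns(const struct toy_pid *pid, const struct toy_pid_ns *ns)
{
	if (pid && ns->level <= pid->level &&
	    pid->numbers[ns->level].ns == ns)
		return pid->numbers[ns->level].nr;
	return 0;
}

/*
 * Fill in the pid for a request on a connection owned by conn_ns.
 * Ordinary requests fail with -EOVERFLOW if the caller is not visible
 * in conn_ns; no-fail requests go through with pid 0 instead.
 */
int toy_fuse_req_pid(const struct toy_pid *caller,
		     const struct toy_pid_ns *conn_ns, bool nofail,
		     int *req_pid)
{
	int nr = toy_pid_nr_ns(caller, conn_ns);

	if (nr == 0 && !nofail)
		return -EOVERFLOW;
	*req_pid = nr;
	return 0;
}
```

A process inside the server's pid namespace is reported under its
namespace-local number, whereas a caller from outside that namespace is
refused unless the request was allocated on the no-fail path.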


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RESEND v2 16/18] fuse: Support fuse filesystems outside of init_user_ns
  2016-01-04 18:03 [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces Seth Forshee
@ 2016-01-04 18:03     ` Seth Forshee
  2016-01-04 18:03 ` [PATCH RESEND v2 10/18] fs: Update posix_acl support to handle user namespace mounts Seth Forshee
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-01-04 18:03 UTC (permalink / raw)
  To: Eric W. Biederman, Miklos Szeredi
  Cc: Serge Hallyn, Seth Forshee, dm-devel, linux-security-module,
	linux-bcache, linux-kernel, linux-raid, fuse-devel,
	Austin S Hemmelgarn, linux-mtd, Alexander Viro, selinux,
	linux-fsdevel

In order to support mounts from namespaces other than
init_user_ns, fuse must translate uids and gids to/from the
userns of the process servicing requests on /dev/fuse. This
patch does that, with a couple of restrictions on the namespace:

 - The userns for the fuse connection is fixed to the namespace
   from which /dev/fuse is opened.

 - The namespace must be the same as s_user_ns.

These restrictions simplify the implementation by avoiding the
need to pass around userns references and by allowing fuse to
rely on the checks in inode_change_ok for ownership changes.
Either restriction could be relaxed in the future if needed.

For cuse the namespace used for the connection is also simply
current_user_ns() at the time /dev/cuse is opened.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 fs/fuse/cuse.c   |  3 ++-
 fs/fuse/dev.c    | 13 ++++++++-----
 fs/fuse/dir.c    | 14 +++++++-------
 fs/fuse/fuse_i.h |  6 +++++-
 fs/fuse/inode.c  | 35 +++++++++++++++++++++++------------
 5 files changed, 45 insertions(+), 26 deletions(-)
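
Note on the id translation: what follows is a minimal userspace
sketch, not kernel code, of the mapping that make_kuid()/from_kuid()
perform and of the unmapped-id check this patch adds to
__fuse_get_req(). It assumes a single map extent per namespace;
struct id_extent and the model_* helpers are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define INVALID_ID ((uint32_t)-1)

/*
 * One extent of an id map: ids [first, first + count) inside the
 * namespace correspond to [lower_first, lower_first + count) in the
 * kernel-global id space.
 */
struct id_extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Namespace-local id -> global id (models make_kuid/make_kgid). */
static uint32_t model_make_kid(const struct id_extent *map, uint32_t id)
{
	if (id >= map->first && id - map->first < map->count)
		return map->lower_first + (id - map->first);
	return INVALID_ID;
}

/* Global id -> namespace-local id (models from_kuid/from_kgid). */
static uint32_t model_from_kid(const struct id_extent *map, uint32_t kid)
{
	if (kid >= map->lower_first && kid - map->lower_first < map->count)
		return map->first + (kid - map->lower_first);
	return INVALID_ID;
}

/*
 * Models the request setup: translate the caller's fsuid/fsgid into
 * the connection's namespace and refuse the request if either id has
 * no mapping there.
 */
static int model_req_init(const struct id_extent *uid_map,
			  const struct id_extent *gid_map,
			  uint32_t fsuid, uint32_t fsgid,
			  uint32_t *out_uid, uint32_t *out_gid)
{
	assert(out_uid && out_gid);
	*out_uid = model_from_kid(uid_map, fsuid);
	*out_gid = model_from_kid(gid_map, fsgid);
	if (*out_uid == INVALID_ID || *out_gid == INVALID_ID)
		return -EOVERFLOW;
	return 0;
}
```

With a map of 0 -> 100000 covering 65536 ids, global id 100005 is
reported to the server as 5, while global id 1000 has no mapping and
the request fails with -EOVERFLOW.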

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index eae2c11268bc..a10aca57bfe4 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -48,6 +48,7 @@
 #include <linux/stat.h>
 #include <linux/module.h>
 #include <linux/uio.h>
+#include <linux/user_namespace.h>
 
 #include "fuse_i.h"
 
@@ -498,7 +499,7 @@ static int cuse_channel_open(struct inode *inode, struct file *file)
 	if (!cc)
 		return -ENOMEM;
 
-	fuse_conn_init(&cc->fc);
+	fuse_conn_init(&cc->fc, current_user_ns());
 
 	fud = fuse_dev_alloc(&cc->fc);
 	if (!fud) {
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index a4f6f30d6d86..11b4cb0a0e2f 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -127,8 +127,8 @@ static void __fuse_put_request(struct fuse_req *req)
 
 static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
 {
-	req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
-	req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
+	req->in.h.uid = from_kuid(fc->user_ns, current_fsuid());
+	req->in.h.gid = from_kgid(fc->user_ns, current_fsgid());
 	req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
 }
 
@@ -186,7 +186,8 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
 	__set_bit(FR_WAITING, &req->flags);
 	if (for_background)
 		__set_bit(FR_BACKGROUND, &req->flags);
-	if (req->in.h.pid == 0) {
+	if (req->in.h.pid == 0 || req->in.h.uid == (uid_t)-1 ||
+	    req->in.h.gid == (gid_t)-1) {
 		fuse_put_request(fc, req);
 		return ERR_PTR(-EOVERFLOW);
 	}
@@ -1248,7 +1249,8 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 	struct fuse_in *in;
 	unsigned reqsize;
 
-	if (task_active_pid_ns(current) != fc->pid_ns)
+	if (task_active_pid_ns(current) != fc->pid_ns ||
+	    current_user_ns() != fc->user_ns)
 		return -EIO;
 
  restart:
@@ -1880,7 +1882,8 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 	struct fuse_req *req;
 	struct fuse_out_header oh;
 
-	if (task_active_pid_ns(current) != fc->pid_ns)
+	if (task_active_pid_ns(current) != fc->pid_ns ||
+	    current_user_ns() != fc->user_ns)
 		return -EIO;
 
 	if (nbytes < sizeof(struct fuse_out_header))
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 5e2e08712d3b..8fd9fe4dcd43 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -841,8 +841,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
 	stat->ino = attr->ino;
 	stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
 	stat->nlink = attr->nlink;
-	stat->uid = make_kuid(&init_user_ns, attr->uid);
-	stat->gid = make_kgid(&init_user_ns, attr->gid);
+	stat->uid = inode->i_uid;
+	stat->gid = inode->i_gid;
 	stat->rdev = inode->i_rdev;
 	stat->atime.tv_sec = attr->atime;
 	stat->atime.tv_nsec = attr->atimensec;
@@ -1455,17 +1455,17 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
 	return true;
 }
 
-static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
-			   bool trust_local_cmtime)
+static void iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr,
+			   struct fuse_setattr_in *arg, bool trust_local_cmtime)
 {
 	unsigned ivalid = iattr->ia_valid;
 
 	if (ivalid & ATTR_MODE)
 		arg->valid |= FATTR_MODE,   arg->mode = iattr->ia_mode;
 	if (ivalid & ATTR_UID)
-		arg->valid |= FATTR_UID,    arg->uid = from_kuid(&init_user_ns, iattr->ia_uid);
+		arg->valid |= FATTR_UID,    arg->uid = from_kuid(fc->user_ns, iattr->ia_uid);
 	if (ivalid & ATTR_GID)
-		arg->valid |= FATTR_GID,    arg->gid = from_kgid(&init_user_ns, iattr->ia_gid);
+		arg->valid |= FATTR_GID,    arg->gid = from_kgid(fc->user_ns, iattr->ia_gid);
 	if (ivalid & ATTR_SIZE)
 		arg->valid |= FATTR_SIZE,   arg->size = iattr->ia_size;
 	if (ivalid & ATTR_ATIME) {
@@ -1625,7 +1625,7 @@ int fuse_do_setattr(struct inode *inode, struct iattr *attr,
 
 	memset(&inarg, 0, sizeof(inarg));
 	memset(&outarg, 0, sizeof(outarg));
-	iattr_to_fattr(attr, &inarg, trust_local_cmtime);
+	iattr_to_fattr(fc, attr, &inarg, trust_local_cmtime);
 	if (file) {
 		struct fuse_file *ff = file->private_data;
 		inarg.valid |= FATTR_FH;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 143b595197b6..5897805405ba 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -23,6 +23,7 @@
 #include <linux/poll.h>
 #include <linux/workqueue.h>
 #include <linux/pid_namespace.h>
+#include <linux/user_namespace.h>
 
 /** Max number of pages that can be used in a single read request */
 #define FUSE_MAX_PAGES_PER_REQ 32
@@ -460,6 +461,9 @@ struct fuse_conn {
 	/** The pid namespace for this mount */
 	struct pid_namespace *pid_ns;
 
+	/** The user namespace for this mount */
+	struct user_namespace *user_ns;
+
 	/** The fuse mount flags for this mount */
 	unsigned flags;
 
@@ -855,7 +859,7 @@ struct fuse_conn *fuse_conn_get(struct fuse_conn *fc);
 /**
  * Initialize fuse_conn
  */
-void fuse_conn_init(struct fuse_conn *fc);
+void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns);
 
 /**
  * Release reference to fuse_conn
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 2f31874ea9db..b7bdfdac3521 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -167,8 +167,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
 	inode->i_ino     = fuse_squash_ino(attr->ino);
 	inode->i_mode    = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
 	set_nlink(inode, attr->nlink);
-	inode->i_uid     = make_kuid(&init_user_ns, attr->uid);
-	inode->i_gid     = make_kgid(&init_user_ns, attr->gid);
+	inode->i_uid     = make_kuid(fc->user_ns, attr->uid);
+	inode->i_gid     = make_kgid(fc->user_ns, attr->gid);
 	inode->i_blocks  = attr->blocks;
 	inode->i_atime.tv_sec   = attr->atime;
 	inode->i_atime.tv_nsec  = attr->atimensec;
@@ -467,12 +467,15 @@ static int fuse_match_uint(substring_t *s, unsigned int *res)
 	return err;
 }
 
-static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
+static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev,
+			  struct user_namespace *user_ns)
 {
 	char *p;
 	memset(d, 0, sizeof(struct fuse_mount_data));
 	d->max_read = ~0;
 	d->blksize = FUSE_DEFAULT_BLKSIZE;
+	d->user_id = make_kuid(user_ns, 0);
+	d->group_id = make_kgid(user_ns, 0);
 
 	while ((p = strsep(&opt, ",")) != NULL) {
 		int token;
@@ -503,7 +506,7 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
 		case OPT_USER_ID:
 			if (fuse_match_uint(&args[0], &uv))
 				return 0;
-			d->user_id = make_kuid(current_user_ns(), uv);
+			d->user_id = make_kuid(user_ns, uv);
 			if (!uid_valid(d->user_id))
 				return 0;
 			d->user_id_present = 1;
@@ -512,7 +515,7 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
 		case OPT_GROUP_ID:
 			if (fuse_match_uint(&args[0], &uv))
 				return 0;
-			d->group_id = make_kgid(current_user_ns(), uv);
+			d->group_id = make_kgid(user_ns, uv);
 			if (!gid_valid(d->group_id))
 				return 0;
 			d->group_id_present = 1;
@@ -555,8 +558,10 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
 	struct super_block *sb = root->d_sb;
 	struct fuse_conn *fc = get_fuse_conn_super(sb);
 
-	seq_printf(m, ",user_id=%u", from_kuid_munged(&init_user_ns, fc->user_id));
-	seq_printf(m, ",group_id=%u", from_kgid_munged(&init_user_ns, fc->group_id));
+	seq_printf(m, ",user_id=%u",
+		   from_kuid_munged(fc->user_ns, fc->user_id));
+	seq_printf(m, ",group_id=%u",
+		   from_kgid_munged(fc->user_ns, fc->group_id));
 	if (fc->flags & FUSE_DEFAULT_PERMISSIONS)
 		seq_puts(m, ",default_permissions");
 	if (fc->flags & FUSE_ALLOW_OTHER)
@@ -587,7 +592,7 @@ static void fuse_pqueue_init(struct fuse_pqueue *fpq)
 	fpq->connected = 1;
 }
 
-void fuse_conn_init(struct fuse_conn *fc)
+void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns)
 {
 	memset(fc, 0, sizeof(*fc));
 	spin_lock_init(&fc->lock);
@@ -611,6 +616,7 @@ void fuse_conn_init(struct fuse_conn *fc)
 	fc->attr_version = 1;
 	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
 	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
+	fc->user_ns = get_user_ns(user_ns);
 }
 EXPORT_SYMBOL_GPL(fuse_conn_init);
 
@@ -620,6 +626,7 @@ void fuse_conn_put(struct fuse_conn *fc)
 		if (fc->destroy_req)
 			fuse_request_free(fc->destroy_req);
 		put_pid_ns(fc->pid_ns);
+		put_user_ns(fc->user_ns);
 		fc->release(fc);
 	}
 }
@@ -1046,7 +1053,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 
 	sb->s_flags &= ~(MS_NOSEC | MS_I_VERSION);
 
-	if (!parse_fuse_opt(data, &d, is_bdev))
+	if (!parse_fuse_opt(data, &d, is_bdev, sb->s_user_ns))
 		goto err;
 
 	if (is_bdev) {
@@ -1070,8 +1077,12 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 	if (!file)
 		goto err;
 
-	if ((file->f_op != &fuse_dev_operations) ||
-	    (file->f_cred->user_ns != &init_user_ns))
+	/*
+	 * Require mount to happen from the same user namespace which
+	 * opened /dev/fuse to prevent potential attacks.
+	 */
+	if (file->f_op != &fuse_dev_operations ||
+	    file->f_cred->user_ns != sb->s_user_ns)
 		goto err_fput;
 
 	fc = kmalloc(sizeof(*fc), GFP_KERNEL);
@@ -1079,7 +1090,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 	if (!fc)
 		goto err_fput;
 
-	fuse_conn_init(fc);
+	fuse_conn_init(fc, sb->s_user_ns);
 	fc->release = fuse_free_conn;
 
 	fud = fuse_dev_alloc(fc);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RESEND v2 17/18] fuse: Restrict allow_other to the superblock's namespace or a descendant
  2016-01-04 18:03 [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces Seth Forshee
@ 2016-01-04 18:03     ` Seth Forshee
  2016-01-04 18:03 ` [PATCH RESEND v2 10/18] fs: Update posix_acl support to handle user namespace mounts Seth Forshee
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-01-04 18:03 UTC (permalink / raw)
  To: Eric W. Biederman, Miklos Szeredi
  Cc: Serge Hallyn, Seth Forshee, dm-devel, linux-security-module,
	linux-bcache, linux-kernel, linux-raid, fuse-devel,
	Austin S Hemmelgarn, linux-mtd, Alexander Viro, selinux,
	linux-fsdevel

Unprivileged users are normally restricted from mounting with the
allow_other option by system policy, but this could be bypassed
for a mount done with user namespace root permissions. In such
cases allow_other should not allow users outside the userns
to access the mount, as doing so would give the unprivileged user
the ability to manipulate processes it would otherwise be unable
to manipulate. Restrict allow_other to apply to users in the same
userns used at mount or a descendant of that namespace.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 fs/fuse/dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
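
Note on the check: what follows is a minimal userspace sketch, not
kernel code, of the ancestor-or-self test that this patch relies on
through current_in_userns(). struct model_userns and the model_*
helpers are hypothetical names used only for illustration, and the
namespace hierarchy is reduced to a bare parent pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_userns {
	struct model_userns *parent;	/* NULL for the initial namespace */
};

/* True if ns is target itself or a descendant of target. */
static bool model_in_userns(const struct model_userns *ns,
			    const struct model_userns *target)
{
	for (; ns; ns = ns->parent)
		if (ns == target)
			return true;
	return false;
}

/*
 * Models the allow_other branch: access is granted only to callers
 * whose user namespace is the mount's namespace or nested below it.
 */
static bool model_allow_other(const struct model_userns *caller_ns,
			      const struct model_userns *mount_ns)
{
	assert(mount_ns);
	return model_in_userns(caller_ns, mount_ns);
}
```

With a mount owned by a child namespace, callers in that namespace or
in one nested below it pass; callers in the initial namespace or in a
sibling namespace do not.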

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 8fd9fe4dcd43..24e4cdb554f1 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1015,7 +1015,7 @@ int fuse_allow_current_process(struct fuse_conn *fc)
 	const struct cred *cred;
 
 	if (fc->flags & FUSE_ALLOW_OTHER)
-		return 1;
+		return current_in_userns(fc->user_ns);
 
 	cred = current_cred();
 	if (uid_eq(cred->euid, fc->user_id) &&
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread
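
The check changed by patch 17/18 can be modelled in userspace as a rough
sketch. Everything named toy_* below is hypothetical and is not kernel
code: namespaces are reduced to a parent pointer, and the owner check,
which in fuse compares several ids, is assumed here to be a single uid
comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct user_namespace. */
struct toy_userns {
	const struct toy_userns *parent;	/* NULL for the initial namespace */
};

/* True if ns is target itself or one of target's descendants. */
bool toy_in_userns(const struct toy_userns *target,
		   const struct toy_userns *ns)
{
	for (; ns != NULL; ns = ns->parent)
		if (ns == target)
			return true;
	return false;
}

/* Hypothetical, heavily reduced stand-in for struct fuse_conn. */
struct toy_fuse_conn {
	bool allow_other;			/* models FUSE_ALLOW_OTHER */
	const struct toy_userns *user_ns;	/* namespace used at mount */
	unsigned int owner_uid;			/* models fc->user_id */
};

/*
 * Models the patched check: allow_other admits only callers in the
 * mount's namespace or a descendant. Without allow_other, only the
 * mount owner is admitted (assumption: simplified to one uid).
 */
bool toy_allow_process(const struct toy_fuse_conn *fc,
		       const struct toy_userns *caller_ns,
		       unsigned int caller_uid)
{
	if (fc->allow_other)
		return toy_in_userns(fc->user_ns, caller_ns);
	return caller_uid == fc->owner_uid;
}

/* Usage example: a mount made from a container's user namespace. */
void toy_allow_other_demo(void)
{
	struct toy_userns init_ns = { NULL };
	struct toy_userns container = { &init_ns };
	struct toy_userns nested = { &container };
	struct toy_fuse_conn fc = { true, &container, 1000 };

	assert(toy_allow_process(&fc, &container, 0));	/* same namespace */
	assert(toy_allow_process(&fc, &nested, 0));	/* descendant */
	assert(!toy_allow_process(&fc, &init_ns, 0));	/* outside: denied */
}
```

In this model a caller in the initial namespace is refused even though
allow_other is set, which is the behaviour the commit message asks for.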

* [PATCH RESEND v2 18/18] fuse: Allow user namespace mounts
  2016-01-04 18:03 [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces Seth Forshee
@ 2016-01-04 18:03     ` Seth Forshee
  2016-01-04 18:03 ` [PATCH RESEND v2 10/18] fs: Update posix_acl support to handle user namespace mounts Seth Forshee
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-01-04 18:03 UTC (permalink / raw)
  To: Eric W. Biederman, Miklos Szeredi
  Cc: Serge Hallyn, Seth Forshee, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Austin S Hemmelgarn,
	linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Alexander Viro,
	selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

Signed-off-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 fs/fuse/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b7bdfdac3521..2fd338c199ce 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1201,7 +1201,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
 static struct file_system_type fuse_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "fuse",
-	.fs_flags	= FS_HAS_SUBTYPE,
+	.fs_flags	= FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
 	.mount		= fuse_mount,
 	.kill_sb	= fuse_kill_sb_anon,
 };
@@ -1233,7 +1233,7 @@ static struct file_system_type fuseblk_fs_type = {
 	.name		= "fuseblk",
 	.mount		= fuse_mount_blk,
 	.kill_sb	= fuse_kill_sb_blk,
-	.fs_flags	= FS_REQUIRES_DEV | FS_HAS_SUBTYPE,
+	.fs_flags	= FS_REQUIRES_DEV | FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
 };
 MODULE_ALIAS_FS("fuseblk");
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread
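
Patch 18/18 carries no description, so the following is only a rough
sketch of what adding FS_USERNS_MOUNT is assumed to achieve: a filesystem
type opts in to being mounted by a user who is privileged only inside a
non-initial user namespace. The toy_* names and the flag values are
hypothetical and chosen for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits for this sketch only. */
#define TOY_FS_REQUIRES_DEV	0x1
#define TOY_FS_HAS_SUBTYPE	0x4
#define TOY_FS_USERNS_MOUNT	0x8

/* Hypothetical, reduced stand-in for struct file_system_type. */
struct toy_fs_type {
	const char *name;
	int fs_flags;
};

/*
 * Assumed policy: a mounter whose privilege comes only from a
 * non-initial user namespace may mount a filesystem type only if the
 * type opts in with the USERNS_MOUNT flag.
 */
bool toy_may_mount(const struct toy_fs_type *type,
		   bool mounter_in_init_userns)
{
	if (mounter_in_init_userns)
		return true;	/* ordinary privileged mount path */
	return (type->fs_flags & TOY_FS_USERNS_MOUNT) != 0;
}

/* Usage example: the "fuse" type before and after the flag is added. */
void toy_userns_mount_demo(void)
{
	struct toy_fs_type before = { "fuse", TOY_FS_HAS_SUBTYPE };
	struct toy_fs_type after = {
		"fuse", TOY_FS_HAS_SUBTYPE | TOY_FS_USERNS_MOUNT
	};

	assert(!toy_may_mount(&before, false));
	assert(toy_may_mount(&after, false));
	assert(toy_may_mount(&before, true));
}
```

The same reasoning would apply to "fuseblk", whose flags additionally
include the REQUIRES_DEV bit.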

* Re: [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces
  2016-01-04 18:03 [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces Seth Forshee
                   ` (2 preceding siblings ...)
       [not found] ` <1451930639-94331-1-git-send-email-seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
@ 2016-01-25 19:47 ` Seth Forshee
  2016-01-25 20:01     ` Eric W. Biederman
  3 siblings, 1 reply; 68+ messages in thread
From: Seth Forshee @ 2016-01-25 19:47 UTC (permalink / raw)
  To: Eric W. Biederman, linux-bcache, dm-devel, linux-raid, linux-mtd,
	linux-fsdevel, fuse-devel, linux-security-module, selinux
  Cc: Alexander Viro, Serge Hallyn, Richard Weinberger,
	Austin S Hemmelgarn, Miklos Szeredi, linux-kernel

On Mon, Jan 04, 2016 at 12:03:39PM -0600, Seth Forshee wrote:
> These patches implement support for mounting filesystems in user
> namespaces using fuse. They are based on the patches in the for-testing
> branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git,
> but I've rebased them onto 4.4-rc3. I've pushed all of this to:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/sforshee/linux.git fuse-userns
> 
> The patches are organized into three high-level groups.
> 
> Patches 1-6 are related to security, adding restrictions for
> unprivileged mounts and updating the LSMs as needed. Patches 1-2
> (checking inode permissions for block device mounts) may not be strictly
> necessary for fuseblk mounts since fuse doesn't do any IO on the block
> device in the kernel, but it still seems like a good idea to fail the
> mount if the user doesn't have the required permissions for the inode
> (though this is a bit misleading with fuse since the mounts are done via
> a suid-root helper).
> 
> Patches 7-14 update most of the vfs to translate ids correctly and deal
> with inodes which may have invalid user/group ids. I've omitted patches
> for anything not used by fuse - quota, fs freezing, some helper
> functions, etc. - but if these are wanted for the sake of completeness I
> can include them.
> 
> Patches 15-18 update fuse to deal with mounts from non-init pid and user
> namespaces and enable mounting from user namespaces.
> 
> Changes since v1:
>  - Drop patch for FIBMAP.
>  - Use current_in_userns in fuse_allow_current_process.
>  - Remove checks for uid/gid validity in fuse. Instead, ids from the
>    backing store which do not map into s_user_ns will result in invalid
>    ids in the vfs inode. Checks in the vfs will prevent unmappable ids
>    from being passed in from above.
>  - Update a couple of commit messages to provide more detail about
>    changes.

Now that the merge window is over, I'm wondering whether it might be
possible to get some feedback on these patches this cycle?

Thanks,
Seth

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces
  2016-01-25 19:47 ` [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces Seth Forshee
@ 2016-01-25 20:01     ` Eric W. Biederman
  0 siblings, 0 replies; 68+ messages in thread
From: Eric W. Biederman @ 2016-01-25 20:01 UTC (permalink / raw)
  To: Seth Forshee
  Cc: linux-bcache, dm-devel, linux-raid, linux-mtd, linux-fsdevel,
	fuse-devel, linux-security-module, selinux, Alexander Viro,
	Serge Hallyn, Richard Weinberger, Austin S Hemmelgarn,
	Miklos Szeredi, linux-kernel

Seth Forshee <seth.forshee@canonical.com> writes:

> On Mon, Jan 04, 2016 at 12:03:39PM -0600, Seth Forshee wrote:
>> These patches implement support for mounting filesystems in user
>> namespaces using fuse. They are based on the patches in the for-testing
>> branch of
>> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git,
>> but I've rebased them onto 4.4-rc3. I've pushed all of this to:
>> 
>>  git://git.kernel.org/pub/scm/linux/kernel/git/sforshee/linux.git fuse-userns
>> 
>> The patches are organized into three high-level groups.
>> 
>> Patches 1-6 are related to security, adding restrictions for
>> unprivileged mounts and updating the LSMs as needed. Patches 1-2
>> (checking inode permissions for block device mounts) may not be strictly
>> necessary for fuseblk mounts since fuse doesn't do any IO on the block
>> device in the kernel, but it still seems like a good idea to fail the
>> mount if the user doesn't have the required permissions for the inode
>> (though this is a bit misleading with fuse since the mounts are done via
>> a suid-root helper).
>> 
>> Patches 7-14 update most of the vfs to translate ids correctly and deal
>> with inodes which may have invalid user/group ids. I've omitted patches
>> for anything not used by fuse - quota, fs freezing, some helper
>> functions, etc. - but if these are wanted for the sake of completeness I
>> can include them.
>> 
>> Patches 15-18 update fuse to deal with mounts from non-init pid and user
>> namespaces and enable mounting from user namespaces.
>> 
>> Changes since v1:
>>  - Drop patch for FIBMAP.
>>  - Use current_in_userns in fuse_allow_current_process.
>>  - Remove checks for uid/gid validity in fuse. Instead, ids from the
>>    backing store which do not map into s_user_ns will result in invalid
>>    ids in the vfs inode. Checks in the vfs will prevent unmappable ids
>>    from being passed in from above.
>>  - Update a couple of commit messages to provide more detail about
>>    changes.
>
> Now that the merge window is over, I'm wondering whether it might be
> possible to get some feedback on these patches this cycle?

Definitely.  Apologies for not giving you much feedback earlier.

I had been hoping this was the kind of thing I could just double check
to be certain you weren't doing anything silly and just apply.  After my
last round of looking at this I realized that for me to be comfortable
with these patches I will have to give them very close scrutiny, and
check every detail.

Unfortunately, last cycle I failed to budget enough time to give these
patches the close scrutiny they need.

From a high level I am still very much in favor of this approach and
at least getting as far as safe unprivileged fuse mounts.

I have one or two little things to look at and then I hope to be going
through your patches one by one in detail.

Eric


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces
  2016-01-25 20:01     ` Eric W. Biederman
  (?)
@ 2016-01-25 20:36     ` Seth Forshee
  -1 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-01-25 20:36 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-bcache, dm-devel, linux-raid, linux-mtd, linux-fsdevel,
	fuse-devel, linux-security-module, selinux, Alexander Viro,
	Serge Hallyn, Richard Weinberger, Austin S Hemmelgarn,
	Miklos Szeredi, linux-kernel

On Mon, Jan 25, 2016 at 02:01:22PM -0600, Eric W. Biederman wrote:
> Seth Forshee <seth.forshee@canonical.com> writes:
> 
> > On Mon, Jan 04, 2016 at 12:03:39PM -0600, Seth Forshee wrote:
> >> These patches implement support for mounting filesystems in user
> >> namespaces using fuse. They are based on the patches in the for-testing
> >> branch of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git,
> >> but I've rebased them onto 4.4-rc3. I've pushed all of this to:
> >> 
> >>  git://git.kernel.org/pub/scm/linux/kernel/git/sforshee/linux.git fuse-userns
> >> 
> >> The patches are organized into three high-level groups.
> >> 
> >> Patches 1-6 are related to security, adding restrictions for
> >> unprivileged mounts and updating the LSMs as needed. Patches 1-2
> >> (checking inode permissions for block device mounts) may not be strictly
> >> necessary for fuseblk mounts since fuse doesn't do any IO on the block
> >> device in the kernel, but it still seems like a good idea to fail the
> >> mount if the user doesn't have the required permissions for the inode
> >> (though this is a bit misleading with fuse since the mounts are done via
> >> a suid-root helper).
> >> 
> >> Patches 7-14 update most of the vfs to translate ids correctly and deal
> >> with inodes which may have invalid user/group ids. I've omitted patches
> >> for anything not used by fuse - quota, fs freezing, some helper
> >> functions, etc. - but if these are wanted for the sake of completeness I
> >> can include them.
> >> 
> >> Patches 15-18 update fuse to deal with mounts from non-init pid and user
> >> namespaces and enable mounting from user namespaces.
> >> 
> >> Changes since v1:
> >>  - Drop patch for FIBMAP.
> >>  - Use current_in_userns in fuse_allow_current_process.
> >>  - Remove checks for uid/gid validity in fuse. Instead, ids from the
> >>    backing store which do not map into s_user_ns will result in invalid
> >>    ids in the vfs inode. Checks in the vfs will prevent unmappable ids
> >>    from being passed in from above.
> >>  - Update a couple of commit messages to provide more detail about
> >>    changes.
> >
> > Now that the merge window is over, I'm wondering whether it might be
> > possible to get some feedback on these patches this cycle?
> 
> Definitely.  Apologies for not giving you much feedback earlier.
> 
> I had been hoping this was the kind of thing I could just double check
> to be certain you weren't doing anything silly and just apply.  After my
> last round of looking at this I realized that for me to be comfortable
> with these patches I will have to give them very close scrutiny, and
> check every detail.
> 
> Unfortunately, last cycle I failed to budget enough time to give these
> patches the close scrutiny they need.
> 
> From a high level I am still very much in favor of this approach and
> at least getting as far as safe unprivileged fuse mounts.
> 
> I have one or two little things to look at and then I hope to be going
> through your patches one by one in detail.

Great. Thanks Eric.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 11/18] fs: Ensure the mounter of a filesystem is privileged towards its inodes
  2016-01-04 18:03     ` Seth Forshee
  (?)
@ 2016-03-03 17:02     ` Seth Forshee
  2016-03-04 22:43       ` Eric W. Biederman
  -1 siblings, 1 reply; 68+ messages in thread
From: Seth Forshee @ 2016-03-03 17:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Viro, Serge Hallyn, Richard Weinberger,
	Austin S Hemmelgarn, Miklos Szeredi, linux-kernel, linux-bcache,
	dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux

On Mon, Jan 04, 2016 at 12:03:50PM -0600, Seth Forshee wrote:
> The mounter of a filesystem should be privileged towards the
> inodes of that filesystem. Extend the checks in
> inode_owner_or_capable() and capable_wrt_inode_uidgid() to
> permit access by users privileged in the user namespace of the
> inode's superblock.

Eric - I've discovered a problem related to this patch. The patches
you've already applied to your testing branch make it so that s_user_ns
can be an unprivileged user for proc and kernfs-based mounts. In some
cases DAC is the only thing protecting files in these mounts (ignoring
MAC), and with this patch an unprivileged user could bypass DAC.

There's a simple solution - always set s_user_ns to &init_user_ns for
those filesystems. I think this is the right thing to do, since the
backing store behind these filesystems is really kernel objects.  But
this would break the assumption behind your patch "userns: Simplify
MNT_NODEV handling" and cause a regression in mounting behavior.

I've come up with several possible solutions for this conflict.

 1. Drop this patch and keep on setting s_user_ns to unprivileged users.
    This would be unfortunate because I think this patch does make sense
    for most filesystems.
 2. Restrict this patch so that a user privileged towards s_user_ns is
    only privileged towards the super block's inodes if s_user_ns has a
    mapping for both i_uid and i_gid. This is better than (1) but still
    not ideal in my mind.
 3. Drop your patch and maintain the current MNT_NODEV behavior.
 4. Add a new s_iflags flag to indicate a super block is from an
    unprivileged mount, and use this in your patch instead of s_user_ns.

Any preference, or any other ideas?

Thanks,
Seth

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 11/18] fs: Ensure the mounter of a filesystem is privileged towards its inodes
  2016-03-03 17:02     ` Seth Forshee
@ 2016-03-04 22:43       ` Eric W. Biederman
  2016-03-06 15:48         ` Seth Forshee
  2016-03-28 16:59         ` Seth Forshee
  0 siblings, 2 replies; 68+ messages in thread
From: Eric W. Biederman @ 2016-03-04 22:43 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, Serge Hallyn, Richard Weinberger,
	Austin S Hemmelgarn, Miklos Szeredi, linux-kernel, linux-bcache,
	dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux

Seth Forshee <seth.forshee@canonical.com> writes:

> On Mon, Jan 04, 2016 at 12:03:50PM -0600, Seth Forshee wrote:
>> The mounter of a filesystem should be privileged towards the
>> inodes of that filesystem. Extend the checks in
>> inode_owner_or_capable() and capable_wrt_inode_uidgid() to
>> permit access by users privileged in the user namespace of the
>> inode's superblock.
>
> Eric - I've discovered a problem related to this patch. The patches
> you've already applied to your testing branch make it so that s_user_ns
> can be an unprivileged user for proc and kernfs-based mounts. In some
> cases DAC is the only thing protecting files in these mounts (ignoring
> MAC), and with this patch an unprivileged user could bypass DAC.
>
> There's a simple solution - always set s_user_ns to &init_user_ns for
> those filesystems. I think this is the right thing to do, since the
> backing store behind these filesystems is really kernel objects.  But
> this would break the assumption behind your patch "userns: Simplify
> MNT_NODEV handling" and cause a regression in mounting behavior.
>
> I've come up with several possible solutions for this conflict.
>
>  1. Drop this patch and keep on setting s_user_ns to unprivileged users.
>     This would be unfortunate because I think this patch does make sense
>     for most filesystems.
>  2. Restrict this patch so that a user privileged towards s_user_ns is
>     only privileged towards the super block's inodes if s_user_ns has a
>     mapping for both i_uid and i_gid. This is better than (1) but still
>     not ideal in my mind.
>  3. Drop your patch and maintain the current MNT_NODEV behavior.
>  4. Add a new s_iflags flag to indicate a super block is from an
>     unprivileged mount, and use this in your patch instead of s_user_ns.
>
> Any preference, or any other ideas?

In general this is only an issue if uids and gids on the filesystem
do not map into the user namespace.

Therefore the general fix is to limit the logic of checking for
capabilities in s_user_ns if we are dealing with INVALID_UID and
INVALID_GID.  For proc and kernfs that should never be the case
so the problem becomes a non-issue.

Further I would look at limiting that relaxation to just
inode_change_ok, so that we can easily wrap that check per filesystem
and deny the relaxation for proc and kernfs.  proc and kernfs already
have wrappers for .setattr so denying changes when !uid_valid and
!gid_valid would be a trivial addition, and ensure calamity does
not ensue.

Furthermore, by limiting any additional checks to inode_change_ok we
keep the work of the additional tests off of the fast paths.

Eric

^ permalink raw reply	[flat|nested] 68+ messages in thread
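
The relaxation proposed in the message above can be put into a rough
userspace sketch. All toy_* names are hypothetical, only the extra rule
is modelled (the ordinary ownership and capability checks are left out),
and judging the uid and the gid separately is an assumption of this
sketch rather than something the message spells out.

```c
#include <assert.h>
#include <stdbool.h>

/* Models INVALID_UID / INVALID_GID: an id with no usable mapping. */
#define TOY_INVALID_ID	((unsigned int)-1)

/* Hypothetical, reduced stand-in for an inode and its filesystem. */
struct toy_inode {
	unsigned int uid;
	unsigned int gid;
	bool fs_denies_relaxation;	/* assumed opt-out, e.g. proc or kernfs */
};

/*
 * Extra permission to change the owner: only for a caller privileged in
 * the superblock's user namespace, only when the current uid is invalid,
 * and only if the filesystem has not opted out.
 */
bool toy_relaxed_chown_ok(const struct toy_inode *inode,
			  bool capable_in_s_user_ns)
{
	if (inode->fs_denies_relaxation || !capable_in_s_user_ns)
		return false;
	return inode->uid == TOY_INVALID_ID;
}

/* The same rule applied to the group. */
bool toy_relaxed_chgrp_ok(const struct toy_inode *inode,
			  bool capable_in_s_user_ns)
{
	if (inode->fs_denies_relaxation || !capable_in_s_user_ns)
		return false;
	return inode->gid == TOY_INVALID_ID;
}

/* Usage example: an unmappable owner on a fuse-like filesystem. */
void toy_relaxed_chown_demo(void)
{
	struct toy_inode unmapped = { TOY_INVALID_ID, 1000, false };
	struct toy_inode procfile = { TOY_INVALID_ID, TOY_INVALID_ID, true };

	assert(toy_relaxed_chown_ok(&unmapped, true));
	assert(!toy_relaxed_chgrp_ok(&unmapped, true));	/* gid is valid */
	assert(!toy_relaxed_chown_ok(&unmapped, false));
	assert(!toy_relaxed_chown_ok(&procfile, true));	/* opted out */
}
```

Because the rule lives in one setattr-time helper in this model, a
filesystem that wants no part of it only has to set the opt-out.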

* Re: [PATCH RESEND v2 11/18] fs: Ensure the mounter of a filesystem is privileged towards its inodes
  2016-03-04 22:43       ` Eric W. Biederman
@ 2016-03-06 15:48         ` Seth Forshee
  2016-03-06 22:07           ` Eric W. Biederman
  2016-03-28 16:59         ` Seth Forshee
  1 sibling, 1 reply; 68+ messages in thread
From: Seth Forshee @ 2016-03-06 15:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Viro, Serge Hallyn, Richard Weinberger,
	Austin S Hemmelgarn, Miklos Szeredi, linux-kernel, linux-bcache,
	dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux

On Fri, Mar 04, 2016 at 04:43:06PM -0600, Eric W. Biederman wrote:
> Seth Forshee <seth.forshee@canonical.com> writes:
> 
> > On Mon, Jan 04, 2016 at 12:03:50PM -0600, Seth Forshee wrote:
> >> The mounter of a filesystem should be privileged towards the
> >> inodes of that filesystem. Extend the checks in
> >> inode_owner_or_capable() and capable_wrt_inode_uidgid() to
> >> permit access by users privileged in the user namespace of the
> >> inode's superblock.
> >
> > Eric - I've discovered a problem related to this patch. The patches
> > you've already applied to your testing branch make it so that s_user_ns
> > can be an unprivileged user for proc and kernfs-based mounts. In some
> > cases DAC is the only thing protecting files in these mounts (ignoring
> > MAC), and with this patch an unprivileged user could bypass DAC.
> >
> > There's a simple solution - always set s_user_ns to &init_user_ns for
> > those filesystems. I think this is the right thing to do, since the
> > backing store behind these filesystems is really kernel objects.  But
> > this would break the assumption behind your patch "userns: Simplify
> > MNT_NODEV handling" and cause a regression in mounting behavior.
> >
> > I've come up with several possible solutions for this conflict.
> >
> >  1. Drop this patch and keep on setting s_user_ns to unprivileged users.
> >     This would be unfortunate because I think this patch does make sense
> >     for most filesystems.
> >  2. Restrict this patch so that a user privileged towards s_user_ns is
> >     only privileged towards the super block's inodes if s_user_ns has a
> >     mapping for both i_uid and i_gid. This is better than (1) but still
> >     not ideal in my mind.
> >  3. Drop your patch and maintain the current MNT_NODEV behavior.
> >  4. Add a new s_iflags flag to indicate a super block is from an
> >     unprivileged mount, and use this in your patch instead of s_user_ns.
> >
> > Any preference, or any other ideas?
> 
> In general this is only an issue if uids and gids on the filesystem
> do not map into the user namespace.

Yes, both capable_wrt_inode_uidgid and inode_owner_or_capable will
return true for a privileged user in the current namespace if the ids
map into that namespace.

> Therefore the general fix is to limit the logic of checking for
> capabilities in s_user_ns if we are dealing with INVALID_UID and
> INVALID_GID.  For proc and kernfs that should never be the case
> so the problem becomes a non-issue.
> 
> Further I would look at limiting that relaxation to just
> inode_change_ok, so that we can easily wrap that check per filesystem
> and deny the relaxation for proc and kernfs.  proc and kernfs already
> have wrappers for .setattr so denying changes when !uid_valid and
> !gid_valid would be a trivial addition, and ensure calamity does
> not ensue.
> 
> Furthermore, by limiting any additional checks to inode_change_ok we
> keep the work of the additional tests off of the fast paths.

So then the inode would need to be chowned before a privileged user in a
non-init namespace would be capable towards it. That seems workable. It
looks like INVALID_UID and INVALID_GID do map into init_user_ns (which
seems a bit odd) so real root remains capable towards those inodes.

That seems okay to me then.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 68+ messages in thread
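
The condition Seth describes for the two helpers, privilege in the
current namespace plus ids that map into it, can be sketched in userspace
as follows. The toy_* names are hypothetical, and each id map is assumed
to be a single extent, which is a simplification made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One mapping extent: count ids starting at kernel_first appear in the
 * namespace starting at ns_first. */
struct toy_id_extent {
	uint32_t ns_first;
	uint32_t kernel_first;
	uint32_t count;
};

/* Hypothetical stand-in for a user namespace's uid and gid maps. */
struct toy_userns_map {
	struct toy_id_extent uid_map;
	struct toy_id_extent gid_map;
};

/* True if the kernel-side id falls inside the extent. */
bool toy_id_has_mapping(const struct toy_id_extent *e, uint32_t kernel_id)
{
	return kernel_id >= e->kernel_first &&
	       kernel_id - e->kernel_first < e->count;
}

/*
 * Privileged towards the inode only if the caller holds the capability
 * in the namespace and both inode ids have a mapping in it.
 */
bool toy_capable_wrt_inode(const struct toy_userns_map *ns,
			   bool has_cap_in_ns,
			   uint32_t i_uid, uint32_t i_gid)
{
	return has_cap_in_ns &&
	       toy_id_has_mapping(&ns->uid_map, i_uid) &&
	       toy_id_has_mapping(&ns->gid_map, i_gid);
}

/* Usage example: a namespace mapping 65536 ids from 100000 upwards. */
void toy_capable_demo(void)
{
	struct toy_userns_map ns = {
		{ 0, 100000, 65536 },
		{ 0, 100000, 65536 },
	};

	assert(toy_capable_wrt_inode(&ns, true, 100500, 100500));
	assert(!toy_capable_wrt_inode(&ns, true, 0, 100500));	/* uid unmapped */
	assert(!toy_capable_wrt_inode(&ns, false, 100500, 100500));
}
```

An inode whose uid or gid has no mapping fails this check in the model,
which is the case the thread proposes to handle only at chown time.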

* Re: [PATCH RESEND v2 11/18] fs: Ensure the mounter of a filesystem is privileged towards its inodes
  2016-03-06 15:48         ` Seth Forshee
@ 2016-03-06 22:07           ` Eric W. Biederman
  2016-03-07 13:32             ` Seth Forshee
  0 siblings, 1 reply; 68+ messages in thread
From: Eric W. Biederman @ 2016-03-06 22:07 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, Serge Hallyn, Richard Weinberger,
	Austin S Hemmelgarn, Miklos Szeredi, linux-kernel, linux-bcache,
	dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux

Seth Forshee <seth.forshee@canonical.com> writes:

> On Fri, Mar 04, 2016 at 04:43:06PM -0600, Eric W. Biederman wrote:
>> Seth Forshee <seth.forshee@canonical.com> writes:
>> 
>> > On Mon, Jan 04, 2016 at 12:03:50PM -0600, Seth Forshee wrote:
>> >> The mounter of a filesystem should be privileged towards the
>> >> inodes of that filesystem. Extend the checks in
>> >> inode_owner_or_capable() and capable_wrt_inode_uidgid() to
>> >> permit access by users privileged in the user namespace of the
>> >> inode's superblock.
>> >
>> > Eric - I've discovered a problem related to this patch. The patches
>> > you've already applied to your testing branch make it so that s_user_ns
>> > can be an unprivileged user for proc and kernfs-based mounts. In some
>> > cases DAC is the only thing protecting files in these mounts (ignoring
>> > MAC), and with this patch an unprivileged user could bypass DAC.
>> >
>> > There's a simple solution - always set s_user_ns to &init_user_ns for
>> > those filesystems. I think this is the right thing to do, since the
>> > backing store behind these filesystems is really kernel objects.  But
>> > this would break the assumption behind your patch "userns: Simplify
>> > MNT_NODEV handling" and cause a regression in mounting behavior.
>> >
>> > I've come up with several possible solutions for this conflict.
>> >
>> >  1. Drop this patch and keep on setting s_user_ns to unprivileged users.
>> >     This would be unfortunate because I think this patch does make sense
>> >     for most filesystems.
>> >  2. Restrict this patch so that a user privileged towards s_user_ns is
>> >     only privileged towards the super block's inodes if s_user_ns has a
>> >     mapping for both i_uid and i_gid. This is better than (1) but still
>> >     not ideal in my mind.
>> >  3. Drop your patch and maintain the current MNT_NODEV behavior.
>> >  4. Add a new s_iflags flag to indicate a super block is from an
>> >     unprivileged mount, and use this in your patch instead of s_user_ns.
>> >
>> > Any preference, or any other ideas?
>> 
>> In general this is only an issue if uids and gids on the filesystem
>> do not map into the user namespace.
>
> Yes, both capable_wrt_inode_uidgid and inode_owner_or_capable will
> return true for a privileged user in the current namespace if the ids
> map into that namespace.
>
>> Therefore the general fix is to limit the logic of checking for
>> capabilities in s_user_ns if we are dealing with INVALID_UID and
>> INVALID_GID.  For proc and kernfs that should never be the case
>> so the problem becomes a non-issue.
>> 
>> Further I would look at limiting that relaxation to just
>> inode_change_ok, so that we can easily wrap that check per filesystem
>> and deny the relaxation for proc and kernfs.  proc and kernfs already
>> have wrappers for .setattr so denying changes when !uid_valid and
>> !gid_valid would be a trivial addition, and ensure calamity does
>> not ensue.
>> 
>> Furthermore, by limiting any additional checks to inode_change_ok we
>> keep the work of the additional tests off of the fast paths.
>
> So then the inode would need to be chowned before a privileged user in a
> non-init namespace would be capable towards it. That seems workable. It
> looks like INVALID_UID and INVALID_GID do map into init_user_ns (which
> seems a bit odd) so real root remains capable towards those inodes.
>
> That seems okay to me then.

If I was not clear, I was suggesting that we allow a sufficiently
privileged user in the filesystem's s_user_ns to chown files
with INVALID_UID and INVALID_GID.

The global root user would always be able to do that because, unless
capabilities are dropped, it is sufficiently privileged in every user
namespace.

Eric

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 11/18] fs: Ensure the mounter of a filesystem is privileged towards its inodes
  2016-03-06 22:07           ` Eric W. Biederman
@ 2016-03-07 13:32             ` Seth Forshee
  0 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-03-07 13:32 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Viro, Serge Hallyn, Richard Weinberger,
	Austin S Hemmelgarn, Miklos Szeredi, linux-kernel, linux-bcache,
	dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux

On Sun, Mar 06, 2016 at 04:07:49PM -0600, Eric W. Biederman wrote:
> Seth Forshee <seth.forshee@canonical.com> writes:
> 
> > On Fri, Mar 04, 2016 at 04:43:06PM -0600, Eric W. Biederman wrote:
> >> Seth Forshee <seth.forshee@canonical.com> writes:
> >> 
> >> > On Mon, Jan 04, 2016 at 12:03:50PM -0600, Seth Forshee wrote:
> >> >> The mounter of a filesystem should be privileged towards the
> >> >> inodes of that filesystem. Extend the checks in
> >> >> inode_owner_or_capable() and capable_wrt_inode_uidgid() to
> >> >> permit access by users privileged in the user namespace of the
> >> >> inode's superblock.
> >> >
> >> > Eric - I've discovered a problem related to this patch. The patches
> >> > you've already applied to your testing branch make it so that s_user_ns
> >> > can be an unprivileged user for proc and kernfs-based mounts. In some
> >> > cases DAC is the only thing protecting files in these mounts (ignoring
> >> > MAC), and with this patch an unprivileged user could bypass DAC.
> >> >
> >> > There's a simple solution - always set s_user_ns to &init_user_ns for
> >> > those filesystems. I think this is the right thing to do, since the
> >> > backing store behind these filesystems are really kernel objects.  But
> >> > this would break the assumption behind your patch "userns: Simpilify
> >> > MNT_NODEV handling" and cause a regression in mounting behavior.
> >> >
> >> > I've come up with several possible solutions for this conflict.
> >> >
> >> >  1. Drop this patch and keep on setting s_user_ns to unprivileged users.
> >> >     This would be unfortunate because I think this patch does make sense
> >> >     for most filesystems.
> >> >  2. Restrict this patch so that a user privileged towards s_user_ns is
> >> >     only privileged towards the super block's inodes if s_user_ns has a
> >> >     mapping for both i_uid and i_gid. This is better than (1) but still
> >> >     not ideal in my mind.
> >> >  3. Drop your patch and maintain the current MNT_NODEV behavior.
> >> >  4. Add a new s_iflags flag to indicate a super block is from an
> >> >     unprivileged mount, and use this in your patch instead of s_user_ns.
> >> >
> >> > Any preference, or any other ideas?
> >> 
> >> In general this is only an issue if uids and gids on the filesystem
> >> do not map into the user namespace.
> >
> > Yes, both capable_wrt_inode_uidgid and inode_owner_or_capable will
> > return true for a privileged user in the current namespace if the ids
> > map into that namespace.
> >
> >> Therefore the general fix is to limit the logic of checking for
> >> capabilities in s_user_ns if we are dealing with INVALID_UID and
> >> INVALID_GID.  For proc and kernfs that should never be the case
> >> so the problem becomes a non-issue.
> >> 
> >> Further I would look at limiting that relaxation to just
> >> inode_change_ok.  So that we can easily wrap that check per filesystem
> >> and deny the relaxation for proc and kernfs.  proc and kernfs already
> >> have wrappers for .setattr so denying changes when !uid_valid and
> >> !gid_valid would be a trivial addition, and ensure calamity does
> >> not ensue.
> >> 
> >> Furthermore, by limiting any additional tests to inode_change_ok we keep
> >> the work of those tests off of the fast paths.
> >
> > So then the inode would need to be chowned before a privileged user in a
> > non-init namespace would be capable towards it. That seems workable. It
> > looks like INVALID_UID and INVALID_GID do map into init_user_ns (which
> > seems a bit odd) so real root remains capable towards those inodes.
> >
> > That seems okay to me then.
> 
> If I was not clear, I was suggesting that we allow a sufficiently
> privileged user in the filesystem's s_user_ns to chown files
> with INVALID_UID and INVALID_GID.

Right, I got that.

> The global root user would always be able to do that because, unless
> capabilities are dropped, it is sufficiently privileged in every user
> namespace.

Sure. I was just commenting on one result - that ns-root has to chown
the file before being privileged wrt that file but global root does not,
on account of the fact that the invalid ids are mapped in init_user_ns.

Seth

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 15/18] fuse: Add support for pid namespaces
  2016-01-04 18:03     ` Seth Forshee
  (?)
@ 2016-03-09 10:53     ` Miklos Szeredi
  2016-03-09 14:17       ` Seth Forshee
  -1 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2016-03-09 10:53 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, linux-kernel,
	linux-bcache, dm-devel, linux-raid, linux-mtd, linux-fsdevel,
	fuse-devel, linux-security-module, selinux, Miklos Szeredi

On Mon, Jan 04, 2016 at 12:03:54PM -0600, Seth Forshee wrote:
> If the userspace process servicing fuse requests is running in
> a pid namespace then pids passed via the fuse fd need to be
> translated relative to that namespace. Capture the pid namespace
> in use when the filesystem is mounted and use this for pid
> translation.
> 
> Since no use case currently exists for changing namespaces all
> translations are done relative to the pid namespace in use when
> /dev/fuse is opened.

The above doesn't match what the patch does.

 - FUSE captures namespace at mount time

 - CUSE captures namespace at /dev/cuse open


>  Mounting or /dev/fuse IO from another
> namespace will return errors.
> 
> Requests from processes whose pid cannot be translated into the
> target namespace are not permitted, except for requests
> allocated via fuse_get_req_nofail_nopages. For no-fail requests
> in.h.pid will be 0 if the pid translation fails.
> 
> File locking changes based on previous work done by Eric
> Biederman.
> 
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

Not sure how my SOB got on this patch, use this instead:

Acked-by: Miklos Szeredi <mszeredi@redhat.com>

> ---
>  fs/fuse/dev.c    | 19 +++++++++++++++----
>  fs/fuse/file.c   | 22 +++++++++++++++++-----
>  fs/fuse/fuse_i.h |  4 ++++
>  fs/fuse/inode.c  |  3 +++
>  4 files changed, 39 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index ebb5e37455a0..a4f6f30d6d86 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -19,6 +19,7 @@
>  #include <linux/pipe_fs_i.h>
>  #include <linux/swap.h>
>  #include <linux/splice.h>
> +#include <linux/sched.h>
>  
>  MODULE_ALIAS_MISCDEV(FUSE_MINOR);
>  MODULE_ALIAS("devname:fuse");
> @@ -124,11 +125,11 @@ static void __fuse_put_request(struct fuse_req *req)
>  	atomic_dec(&req->count);
>  }
>  
> -static void fuse_req_init_context(struct fuse_req *req)
> +static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
>  {
>  	req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
>  	req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
> -	req->in.h.pid = current->pid;
> +	req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
>  }
>  
>  void fuse_set_initialized(struct fuse_conn *fc)
> @@ -181,10 +182,14 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
>  		goto out;
>  	}
>  
> -	fuse_req_init_context(req);
> +	fuse_req_init_context(fc, req);
>  	__set_bit(FR_WAITING, &req->flags);
>  	if (for_background)
>  		__set_bit(FR_BACKGROUND, &req->flags);
> +	if (req->in.h.pid == 0) {
> +		fuse_put_request(fc, req);
> +		return ERR_PTR(-EOVERFLOW);
> +	}
>  
>  	return req;
>  
> @@ -274,7 +279,7 @@ struct fuse_req *fuse_get_req_nofail_nopages(struct fuse_conn *fc,
>  	if (!req)
>  		req = get_reserved_req(fc, file);
>  
> -	fuse_req_init_context(req);
> +	fuse_req_init_context(fc, req);
>  	__set_bit(FR_WAITING, &req->flags);
>  	__clear_bit(FR_BACKGROUND, &req->flags);
>  	return req;
> @@ -1243,6 +1248,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>  	struct fuse_in *in;
>  	unsigned reqsize;
>  
> +	if (task_active_pid_ns(current) != fc->pid_ns)
> +		return -EIO;
> +
>   restart:
>  	spin_lock(&fiq->waitq.lock);
>  	err = -EAGAIN;
> @@ -1872,6 +1880,9 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
>  	struct fuse_req *req;
>  	struct fuse_out_header oh;
>  
> +	if (task_active_pid_ns(current) != fc->pid_ns)
> +		return -EIO;
> +
>  	if (nbytes < sizeof(struct fuse_out_header))
>  		return -EINVAL;
>  
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index e0faf8f2c868..a6c7484c94ee 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -2061,7 +2061,8 @@ static int fuse_direct_mmap(struct file *file, struct vm_area_struct *vma)
>  	return generic_file_mmap(file, vma);
>  }
>  
> -static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
> +static int convert_fuse_file_lock(struct fuse_conn *fc,
> +				  const struct fuse_file_lock *ffl,
>  				  struct file_lock *fl)
>  {
>  	switch (ffl->type) {
> @@ -2076,7 +2077,14 @@ static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
>  
>  		fl->fl_start = ffl->start;
>  		fl->fl_end = ffl->end;
> -		fl->fl_pid = ffl->pid;
> +
> +		/*
> +		 * Convert pid into the caller's pid namespace. If the pid
> +		 * does not map into the namespace fl_pid will get set to 0.
> +		 */
> +		rcu_read_lock();
> +		fl->fl_pid = pid_vnr(find_pid_ns(ffl->pid, fc->pid_ns));
> +		rcu_read_unlock();
>  		break;
>  
>  	default:
> @@ -2125,7 +2133,7 @@ static int fuse_getlk(struct file *file, struct file_lock *fl)
>  	args.out.args[0].value = &outarg;
>  	err = fuse_simple_request(fc, &args);
>  	if (!err)
> -		err = convert_fuse_file_lock(&outarg.lk, fl);
> +		err = convert_fuse_file_lock(fc, &outarg.lk, fl);
>  
>  	return err;
>  }
> @@ -2137,7 +2145,8 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock)
>  	FUSE_ARGS(args);
>  	struct fuse_lk_in inarg;
>  	int opcode = (fl->fl_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK;
> -	pid_t pid = fl->fl_type != F_UNLCK ? current->tgid : 0;
> +	struct pid *pid = fl->fl_type != F_UNLCK ? task_tgid(current) : NULL;
> +	pid_t pid_nr = pid_nr_ns(pid, fc->pid_ns);
>  	int err;
>  
>  	if (fl->fl_lmops && fl->fl_lmops->lm_grant) {
> @@ -2149,7 +2158,10 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock)
>  	if (fl->fl_flags & FL_CLOSE)
>  		return 0;
>  
> -	fuse_lk_fill(&args, file, fl, opcode, pid, flock, &inarg);
> +	if (pid && pid_nr == 0)
> +		return -EOVERFLOW;
> +
> +	fuse_lk_fill(&args, file, fl, opcode, pid_nr, flock, &inarg);
>  	err = fuse_simple_request(fc, &args);
>  
>  	/* locking is restartable */
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 405113101db8..143b595197b6 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -22,6 +22,7 @@
>  #include <linux/rbtree.h>
>  #include <linux/poll.h>
>  #include <linux/workqueue.h>
> +#include <linux/pid_namespace.h>
>  
>  /** Max number of pages that can be used in a single read request */
>  #define FUSE_MAX_PAGES_PER_REQ 32
> @@ -456,6 +457,9 @@ struct fuse_conn {
>  	/** The group id for this mount */
>  	kgid_t group_id;
>  
> +	/** The pid namespace for this mount */
> +	struct pid_namespace *pid_ns;
> +
>  	/** The fuse mount flags for this mount */
>  	unsigned flags;
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 2913db2a5b99..2f31874ea9db 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -20,6 +20,7 @@
>  #include <linux/random.h>
>  #include <linux/sched.h>
>  #include <linux/exportfs.h>
> +#include <linux/pid_namespace.h>
>  
>  MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
>  MODULE_DESCRIPTION("Filesystem in Userspace");
> @@ -609,6 +610,7 @@ void fuse_conn_init(struct fuse_conn *fc)
>  	fc->connected = 1;
>  	fc->attr_version = 1;
>  	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
> +	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
>  }
>  EXPORT_SYMBOL_GPL(fuse_conn_init);
>  
> @@ -617,6 +619,7 @@ void fuse_conn_put(struct fuse_conn *fc)
>  	if (atomic_dec_and_test(&fc->count)) {
>  		if (fc->destroy_req)
>  			fuse_request_free(fc->destroy_req);
> +		put_pid_ns(fc->pid_ns);
>  		fc->release(fc);
>  	}
>  }
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 16/18] fuse: Support fuse filesystems outside of init_user_ns
  2016-01-04 18:03     ` Seth Forshee
  (?)
@ 2016-03-09 11:29     ` Miklos Szeredi
  2016-03-09 14:18       ` Seth Forshee
  -1 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2016-03-09 11:29 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, linux-kernel,
	linux-bcache, dm-devel, linux-raid, linux-mtd, linux-fsdevel,
	fuse-devel, linux-security-module, selinux

On Mon, Jan 04, 2016 at 12:03:55PM -0600, Seth Forshee wrote:
> In order to support mounts from namespaces other than
> init_user_ns, fuse must translate uids and gids to/from the
> userns of the process servicing requests on /dev/fuse. This
> patch does that, with a couple of restrictions on the namespace:
> 
>  - The userns for the fuse connection is fixed to the namespace
>    from which /dev/fuse is opened.
> 
>  - The namespace must be the same as s_user_ns.
> 
> These restrictions simplify the implementation by avoiding the
> need to pass around userns references and by allowing fuse to
> rely on the checks in inode_change_ok for ownership changes.
> Either restriction could be relaxed in the future if needed.
> 
> For cuse the namespace used for the connection is also simply
> current_user_ns() at the time /dev/cuse is opened.
> 
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> ---
>  fs/fuse/cuse.c   |  3 ++-
>  fs/fuse/dev.c    | 13 ++++++++-----
>  fs/fuse/dir.c    | 14 +++++++-------
>  fs/fuse/fuse_i.h |  6 +++++-
>  fs/fuse/inode.c  | 35 +++++++++++++++++++++++------------
>  5 files changed, 45 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
> index eae2c11268bc..a10aca57bfe4 100644
> --- a/fs/fuse/cuse.c
> +++ b/fs/fuse/cuse.c
> @@ -48,6 +48,7 @@
>  #include <linux/stat.h>
>  #include <linux/module.h>
>  #include <linux/uio.h>
> +#include <linux/user_namespace.h>
>  
>  #include "fuse_i.h"
>  
> @@ -498,7 +499,7 @@ static int cuse_channel_open(struct inode *inode, struct file *file)
>  	if (!cc)
>  		return -ENOMEM;
>  
> -	fuse_conn_init(&cc->fc);
> +	fuse_conn_init(&cc->fc, current_user_ns());
>  
>  	fud = fuse_dev_alloc(&cc->fc);
>  	if (!fud) {
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index a4f6f30d6d86..11b4cb0a0e2f 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -127,8 +127,8 @@ static void __fuse_put_request(struct fuse_req *req)
>  
>  static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
>  {
> -	req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
> -	req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
> +	req->in.h.uid = from_kuid(fc->user_ns, current_fsuid());
> +	req->in.h.gid = from_kgid(fc->user_ns, current_fsgid());
>  	req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
>  }
>  
> @@ -186,7 +186,8 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
>  	__set_bit(FR_WAITING, &req->flags);
>  	if (for_background)
>  		__set_bit(FR_BACKGROUND, &req->flags);
> -	if (req->in.h.pid == 0) {
> +	if (req->in.h.pid == 0 || req->in.h.uid == (uid_t)-1 ||
> +	    req->in.h.gid == (gid_t)-1) {
>  		fuse_put_request(fc, req);
>  		return ERR_PTR(-EOVERFLOW);
>  	}
> @@ -1248,7 +1249,8 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>  	struct fuse_in *in;
>  	unsigned reqsize;
>  
> -	if (task_active_pid_ns(current) != fc->pid_ns)
> +	if (task_active_pid_ns(current) != fc->pid_ns ||
> +	    current_user_ns() != fc->user_ns)
>  		return -EIO;
>  
>   restart:
> @@ -1880,7 +1882,8 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
>  	struct fuse_req *req;
>  	struct fuse_out_header oh;
>  
> -	if (task_active_pid_ns(current) != fc->pid_ns)
> +	if (task_active_pid_ns(current) != fc->pid_ns ||
> +	    current_user_ns() != fc->user_ns)
>  		return -EIO;
>  
>  	if (nbytes < sizeof(struct fuse_out_header))
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 5e2e08712d3b..8fd9fe4dcd43 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -841,8 +841,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
>  	stat->ino = attr->ino;
>  	stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
>  	stat->nlink = attr->nlink;
> -	stat->uid = make_kuid(&init_user_ns, attr->uid);
> -	stat->gid = make_kgid(&init_user_ns, attr->gid);
> +	stat->uid = inode->i_uid;
> +	stat->gid = inode->i_gid;

This breaks the attr_version logic in fuse_change_attributes().

So just use make_k[ug]id() here as well.
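
One way to read this remark: fuse_change_attributes() can decline to apply a
reply whose attr_version is older than what the inode already holds, so
inode->i_uid is not guaranteed to match the attr that fuse_fillattr() was
handed. The userspace sketch below models that situation. The model_* names
and the simplified version bookkeeping are assumptions made for illustration;
this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; not the kernel structures. */
struct model_attr {
	uint32_t uid;		/* uid as reported by the fuse server */
};

struct model_inode {
	uint32_t i_uid;		/* uid currently cached in the inode */
	uint64_t attr_version;	/* version of the attributes it holds */
};

/*
 * Simplified staleness check: a reply tagged with an older version
 * than the inode already holds is dropped and the inode is left
 * untouched. Storing the reply's version on success is a
 * simplification of the kernel's per-connection counter.
 * Returns true if the reply was applied.
 */
static bool model_change_attributes(struct model_inode *inode,
				    const struct model_attr *attr,
				    uint64_t attr_version)
{
	assert(inode != NULL && attr != NULL);

	if (attr_version != 0 && inode->attr_version > attr_version)
		return false;

	inode->i_uid = attr->uid;
	inode->attr_version = attr_version;
	return true;
}

/* The patch as posted: report whatever the inode currently holds. */
static uint32_t model_fillattr_from_inode(const struct model_inode *inode)
{
	assert(inode != NULL);
	return inode->i_uid;
}

/* The reviewed alternative: report the attr that was just received. */
static uint32_t model_fillattr_from_attr(const struct model_attr *attr)
{
	assert(attr != NULL);
	return attr->uid;
}
```

When a reply is rejected as stale, the inode-based variant keeps reporting the
old uid while the attr-based variant reports the value carried in the reply.
In the kernel the attr-based variant would also translate the raw id with
make_kuid()/make_kgid().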

>  	stat->rdev = inode->i_rdev;
>  	stat->atime.tv_sec = attr->atime;
>  	stat->atime.tv_nsec = attr->atimensec;
> @@ -1455,17 +1455,17 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
>  	return true;
>  }
>  
> -static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
> -			   bool trust_local_cmtime)
> +static void iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr,
> +			   struct fuse_setattr_in *arg, bool trust_local_cmtime)
>  {
>  	unsigned ivalid = iattr->ia_valid;
>  
>  	if (ivalid & ATTR_MODE)
>  		arg->valid |= FATTR_MODE,   arg->mode = iattr->ia_mode;
>  	if (ivalid & ATTR_UID)
> -		arg->valid |= FATTR_UID,    arg->uid = from_kuid(&init_user_ns, iattr->ia_uid);
> +		arg->valid |= FATTR_UID,    arg->uid = from_kuid(fc->user_ns, iattr->ia_uid);
>  	if (ivalid & ATTR_GID)
> -		arg->valid |= FATTR_GID,    arg->gid = from_kgid(&init_user_ns, iattr->ia_gid);
> +		arg->valid |= FATTR_GID,    arg->gid = from_kgid(fc->user_ns, iattr->ia_gid);
>  	if (ivalid & ATTR_SIZE)
>  		arg->valid |= FATTR_SIZE,   arg->size = iattr->ia_size;
>  	if (ivalid & ATTR_ATIME) {
> @@ -1625,7 +1625,7 @@ int fuse_do_setattr(struct inode *inode, struct iattr *attr,
>  
>  	memset(&inarg, 0, sizeof(inarg));
>  	memset(&outarg, 0, sizeof(outarg));
> -	iattr_to_fattr(attr, &inarg, trust_local_cmtime);
> +	iattr_to_fattr(fc, attr, &inarg, trust_local_cmtime);
>  	if (file) {
>  		struct fuse_file *ff = file->private_data;
>  		inarg.valid |= FATTR_FH;
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 143b595197b6..5897805405ba 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -23,6 +23,7 @@
>  #include <linux/poll.h>
>  #include <linux/workqueue.h>
>  #include <linux/pid_namespace.h>
> +#include <linux/user_namespace.h>
>  
>  /** Max number of pages that can be used in a single read request */
>  #define FUSE_MAX_PAGES_PER_REQ 32
> @@ -460,6 +461,9 @@ struct fuse_conn {
>  	/** The pid namespace for this mount */
>  	struct pid_namespace *pid_ns;
>  
> +	/** The user namespace for this mount */
> +	struct user_namespace *user_ns;
> +
>  	/** The fuse mount flags for this mount */
>  	unsigned flags;
>  
> @@ -855,7 +859,7 @@ struct fuse_conn *fuse_conn_get(struct fuse_conn *fc);
>  /**
>   * Initialize fuse_conn
>   */
> -void fuse_conn_init(struct fuse_conn *fc);
> +void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns);
>  
>  /**
>   * Release reference to fuse_conn
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 2f31874ea9db..b7bdfdac3521 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -167,8 +167,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
>  	inode->i_ino     = fuse_squash_ino(attr->ino);
>  	inode->i_mode    = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
>  	set_nlink(inode, attr->nlink);
> -	inode->i_uid     = make_kuid(&init_user_ns, attr->uid);
> -	inode->i_gid     = make_kgid(&init_user_ns, attr->gid);
> +	inode->i_uid     = make_kuid(fc->user_ns, attr->uid);
> +	inode->i_gid     = make_kgid(fc->user_ns, attr->gid);
>  	inode->i_blocks  = attr->blocks;
>  	inode->i_atime.tv_sec   = attr->atime;
>  	inode->i_atime.tv_nsec  = attr->atimensec;
> @@ -467,12 +467,15 @@ static int fuse_match_uint(substring_t *s, unsigned int *res)
>  	return err;
>  }
>  
> -static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
> +static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev,
> +			  struct user_namespace *user_ns)
>  {
>  	char *p;
>  	memset(d, 0, sizeof(struct fuse_mount_data));
>  	d->max_read = ~0;
>  	d->blksize = FUSE_DEFAULT_BLKSIZE;
> +	d->user_id = make_kuid(user_ns, 0);
> +	d->group_id = make_kgid(user_ns, 0);

It is true that if the "user_id=" or "group_id=" options were omitted we used
the zero uid/gid values.  However, this isn't actually used by anybody AFAIK,
and generalizing it for userns doesn't seem to make much sense.

So I suggest that we instead return an error if mounting from a userns AND
neither "allow_other" nor both "user_id" and "group_id" are specified.
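
The suggested rule can be written down as a small predicate. This is only a
userspace sketch: the struct, the function name, and the choice of -EINVAL as
the error code are assumptions, since the review names none of them.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the parsed fuse mount options. */
struct model_mount_opts {
	bool allow_other;
	bool user_id_present;
	bool group_id_present;
};

/*
 * Returns 0 if the option set is acceptable, or a negative errno.
 * -EINVAL is an assumed error code; the review does not name one.
 */
static int model_check_mount_opts(const struct model_mount_opts *opts,
				  bool in_init_user_ns)
{
	assert(opts != NULL);

	/* Mounts from init_user_ns keep the existing zero-id defaults. */
	if (in_init_user_ns)
		return 0;

	/* From a userns, allow_other alone is sufficient... */
	if (opts->allow_other)
		return 0;

	/* ...otherwise both ids must have been given explicitly. */
	if (opts->user_id_present && opts->group_id_present)
		return 0;

	return -EINVAL;
}
```

Under this sketch a userns mount that passes only "user_id=" is rejected, while
the same options from init_user_ns are still accepted.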


>  
>  	while ((p = strsep(&opt, ",")) != NULL) {
>  		int token;
> @@ -503,7 +506,7 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
>  		case OPT_USER_ID:
>  			if (fuse_match_uint(&args[0], &uv))
>  				return 0;
> -			d->user_id = make_kuid(current_user_ns(), uv);
> +			d->user_id = make_kuid(user_ns, uv);
>  			if (!uid_valid(d->user_id))
>  				return 0;
>  			d->user_id_present = 1;
> @@ -512,7 +515,7 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
>  		case OPT_GROUP_ID:
>  			if (fuse_match_uint(&args[0], &uv))
>  				return 0;
> -			d->group_id = make_kgid(current_user_ns(), uv);
> +			d->group_id = make_kgid(user_ns, uv);
>  			if (!gid_valid(d->group_id))
>  				return 0;
>  			d->group_id_present = 1;
> @@ -555,8 +558,10 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
>  	struct super_block *sb = root->d_sb;
>  	struct fuse_conn *fc = get_fuse_conn_super(sb);
>  
> -	seq_printf(m, ",user_id=%u", from_kuid_munged(&init_user_ns, fc->user_id));
> -	seq_printf(m, ",group_id=%u", from_kgid_munged(&init_user_ns, fc->group_id));
> +	seq_printf(m, ",user_id=%u",
> +		   from_kuid_munged(fc->user_ns, fc->user_id));
> +	seq_printf(m, ",group_id=%u",
> +		   from_kgid_munged(fc->user_ns, fc->group_id));
>  	if (fc->flags & FUSE_DEFAULT_PERMISSIONS)
>  		seq_puts(m, ",default_permissions");
>  	if (fc->flags & FUSE_ALLOW_OTHER)
> @@ -587,7 +592,7 @@ static void fuse_pqueue_init(struct fuse_pqueue *fpq)
>  	fpq->connected = 1;
>  }
>  
> -void fuse_conn_init(struct fuse_conn *fc)
> +void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns)
>  {
>  	memset(fc, 0, sizeof(*fc));
>  	spin_lock_init(&fc->lock);
> @@ -611,6 +616,7 @@ void fuse_conn_init(struct fuse_conn *fc)
>  	fc->attr_version = 1;
>  	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
>  	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
> +	fc->user_ns = get_user_ns(user_ns);
>  }
>  EXPORT_SYMBOL_GPL(fuse_conn_init);
>  
> @@ -620,6 +626,7 @@ void fuse_conn_put(struct fuse_conn *fc)
>  		if (fc->destroy_req)
>  			fuse_request_free(fc->destroy_req);
>  		put_pid_ns(fc->pid_ns);
> +		put_user_ns(fc->user_ns);
>  		fc->release(fc);
>  	}
>  }
> @@ -1046,7 +1053,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
>  
>  	sb->s_flags &= ~(MS_NOSEC | MS_I_VERSION);
>  
> -	if (!parse_fuse_opt(data, &d, is_bdev))
> +	if (!parse_fuse_opt(data, &d, is_bdev, sb->s_user_ns))
>  		goto err;
>  
>  	if (is_bdev) {
> @@ -1070,8 +1077,12 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
>  	if (!file)
>  		goto err;
>  
> -	if ((file->f_op != &fuse_dev_operations) ||
> -	    (file->f_cred->user_ns != &init_user_ns))
> +	/*
> +	 * Require mount to happen from the same user namespace which
> +	 * opened /dev/fuse to prevent potential attacks.
> +	 */
> +	if (file->f_op != &fuse_dev_operations ||
> +	    file->f_cred->user_ns != sb->s_user_ns)
>  		goto err_fput;
>  
>  	fc = kmalloc(sizeof(*fc), GFP_KERNEL);
> @@ -1079,7 +1090,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
>  	if (!fc)
>  		goto err_fput;
>  
> -	fuse_conn_init(fc);
> +	fuse_conn_init(fc, sb->s_user_ns);
>  	fc->release = fuse_free_conn;
>  
>  	fud = fuse_dev_alloc(fc);
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 17/18] fuse: Restrict allow_other to the superblock's namespace or a descendant
  2016-01-04 18:03     ` Seth Forshee
  (?)
@ 2016-03-09 11:40     ` Miklos Szeredi
  -1 siblings, 0 replies; 68+ messages in thread
From: Miklos Szeredi @ 2016-03-09 11:40 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, linux-kernel,
	linux-bcache, dm-devel, linux-raid, linux-mtd, linux-fsdevel,
	fuse-devel, linux-security-module, selinux

On Mon, Jan 04, 2016 at 12:03:56PM -0600, Seth Forshee wrote:
> Unprivileged users are normally restricted from mounting with the
> allow_other option by system policy, but this could be bypassed
> for a mount done with user namespace root permissions. In such
> cases allow_other should not allow users outside the userns
> to access the mount as doing so would give the unprivileged user
> the ability to manipulate processes it would otherwise be unable
> to manipulate. Restrict allow_other to apply to users in the same
> userns used at mount or a descendant of that namespace.
> 
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

Acked-by: Miklos Szeredi <mszeredi@redhat.com>
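
For readers following along, the "same userns or a descendant" rule amounts to
walking the caller's namespace chain towards the root and looking for the
mount's namespace. A userspace sketch, assuming a hypothetical model_userns
type that carries only a parent pointer (the real struct user_namespace has
many more fields):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct user_namespace: parent link only. */
struct model_userns {
	const struct model_userns *parent;	/* NULL for the initial ns */
};

/* True if 'ns' is 'target' itself or one of its descendants. */
static bool model_in_userns(const struct model_userns *ns,
			    const struct model_userns *target)
{
	assert(target != NULL);

	for (; ns != NULL; ns = ns->parent) {
		if (ns == target)
			return true;
	}
	return false;
}

/* Models the allow_other branch after the change. */
static bool model_allow_other_permits(const struct model_userns *caller_ns,
				      const struct model_userns *mount_ns)
{
	return model_in_userns(caller_ns, mount_ns);
}
```

A caller in the mount's namespace or in a child of it is admitted; a caller in
the parent or in a sibling namespace is not.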

> ---
>  fs/fuse/dir.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 8fd9fe4dcd43..24e4cdb554f1 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1015,7 +1015,7 @@ int fuse_allow_current_process(struct fuse_conn *fc)
>  	const struct cred *cred;
>  
>  	if (fc->flags & FUSE_ALLOW_OTHER)
> -		return 1;
> +		return current_in_userns(fc->user_ns);
>  
>  	cred = current_cred();
>  	if (uid_eq(cred->euid, fc->user_id) &&
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 18/18] fuse: Allow user namespace mounts
  2016-01-04 18:03     ` Seth Forshee
  (?)
@ 2016-03-09 13:08     ` Miklos Szeredi
  -1 siblings, 0 replies; 68+ messages in thread
From: Miklos Szeredi @ 2016-03-09 13:08 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, linux-kernel,
	linux-bcache, dm-devel, linux-raid, linux-mtd, linux-fsdevel,
	fuse-devel, linux-security-module, selinux

On Mon, Jan 04, 2016 at 12:03:57PM -0600, Seth Forshee wrote:
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

Acked-by: Miklos Szeredi <mszeredi@redhat.com>

> ---
>  fs/fuse/inode.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index b7bdfdac3521..2fd338c199ce 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1201,7 +1201,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
>  static struct file_system_type fuse_fs_type = {
>  	.owner		= THIS_MODULE,
>  	.name		= "fuse",
> -	.fs_flags	= FS_HAS_SUBTYPE,
> +	.fs_flags	= FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
>  	.mount		= fuse_mount,
>  	.kill_sb	= fuse_kill_sb_anon,
>  };
> @@ -1233,7 +1233,7 @@ static struct file_system_type fuseblk_fs_type = {
>  	.name		= "fuseblk",
>  	.mount		= fuse_mount_blk,
>  	.kill_sb	= fuse_kill_sb_blk,
> -	.fs_flags	= FS_REQUIRES_DEV | FS_HAS_SUBTYPE,
> +	.fs_flags	= FS_REQUIRES_DEV | FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
>  };
>  MODULE_ALIAS_FS("fuseblk");
>  
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 15/18] fuse: Add support for pid namespaces
  2016-03-09 10:53     ` Miklos Szeredi
@ 2016-03-09 14:17       ` Seth Forshee
  0 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-03-09 14:17 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, linux-kernel,
	linux-bcache, dm-devel, linux-raid, linux-mtd, linux-fsdevel,
	fuse-devel, linux-security-module, selinux, Miklos Szeredi

On Wed, Mar 09, 2016 at 11:53:17AM +0100, Miklos Szeredi wrote:
> On Mon, Jan 04, 2016 at 12:03:54PM -0600, Seth Forshee wrote:
> > If the userspace process servicing fuse requests is running in
> > a pid namespace then pids passed via the fuse fd need to be
> > translated relative to that namespace. Capture the pid namespace
> > in use when the filesystem is mounted and use this for pid
> > translation.
> > 
> > Since no use case currently exists for changing namespaces all
> > translations are done relative to the pid namespace in use when
> > /dev/fuse is opened.
> 
> The above doesn't match what the patch does.
> 
>  - FUSE captures namespace at mount time
> 
>  - CUSE captures namespace at /dev/cuse open

Possibly an earlier version of the patch worked that way and I forgot to
update the description after it changed. Anyway, I'll fix it.

> >  Mounting or /dev/fuse IO from another
> > namespace will return errors.
> > 
> > Requests from processes whose pid cannot be translated into the
> > target namespace are not permitted, except for requests
> > allocated via fuse_get_req_nofail_nopages. For no-fail requests
> > in.h.pid will be 0 if the pid translation fails.
> > 
> > File locking changes based on previous work done by Eric
> > Biederman.
> > 
> > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> > Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
> 
> Not sure how my SOB got on this patch, use this instead:
> 
> Acked-by: Miklos Szeredi <mszeredi@redhat.com>

My memory is that you had sent a patch as a proposed alternative to one
of my earlier patches, and I squashed the two together and added your
SOB at that point. I'll change it.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 16/18] fuse: Support fuse filesystems outside of init_user_ns
  2016-03-09 11:29     ` Miklos Szeredi
@ 2016-03-09 14:18       ` Seth Forshee
  2016-03-09 14:48           ` Miklos Szeredi
  0 siblings, 1 reply; 68+ messages in thread
From: Seth Forshee @ 2016-03-09 14:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, linux-kernel,
	linux-bcache, dm-devel, linux-raid, linux-mtd, linux-fsdevel,
	fuse-devel, linux-security-module, selinux

On Wed, Mar 09, 2016 at 12:29:23PM +0100, Miklos Szeredi wrote:
> On Mon, Jan 04, 2016 at 12:03:55PM -0600, Seth Forshee wrote:
> > In order to support mounts from namespaces other than
> > init_user_ns, fuse must translate uids and gids to/from the
> > userns of the process servicing requests on /dev/fuse. This
> > patch does that, with a couple of restrictions on the namespace:
> > 
> >  - The userns for the fuse connection is fixed to the namespace
> >    from which /dev/fuse is opened.
> > 
> >  - The namespace must be the same as s_user_ns.
> > 
> > These restrictions simplify the implementation by avoiding the
> > need to pass around userns references and by allowing fuse to
> > rely on the checks in inode_change_ok for ownership changes.
> > Either restriction could be relaxed in the future if needed.
> > 
> > For cuse the namespace used for the connection is also simply
> > current_user_ns() at the time /dev/cuse is opened.
> > 
> > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> > ---
> >  fs/fuse/cuse.c   |  3 ++-
> >  fs/fuse/dev.c    | 13 ++++++++-----
> >  fs/fuse/dir.c    | 14 +++++++-------
> >  fs/fuse/fuse_i.h |  6 +++++-
> >  fs/fuse/inode.c  | 35 +++++++++++++++++++++++------------
> >  5 files changed, 45 insertions(+), 26 deletions(-)
> > 
> > diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
> > index eae2c11268bc..a10aca57bfe4 100644
> > --- a/fs/fuse/cuse.c
> > +++ b/fs/fuse/cuse.c
> > @@ -48,6 +48,7 @@
> >  #include <linux/stat.h>
> >  #include <linux/module.h>
> >  #include <linux/uio.h>
> > +#include <linux/user_namespace.h>
> >  
> >  #include "fuse_i.h"
> >  
> > @@ -498,7 +499,7 @@ static int cuse_channel_open(struct inode *inode, struct file *file)
> >  	if (!cc)
> >  		return -ENOMEM;
> >  
> > -	fuse_conn_init(&cc->fc);
> > +	fuse_conn_init(&cc->fc, current_user_ns());
> >  
> >  	fud = fuse_dev_alloc(&cc->fc);
> >  	if (!fud) {
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index a4f6f30d6d86..11b4cb0a0e2f 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> > @@ -127,8 +127,8 @@ static void __fuse_put_request(struct fuse_req *req)
> >  
> >  static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
> >  {
> > -	req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
> > -	req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
> > +	req->in.h.uid = from_kuid(fc->user_ns, current_fsuid());
> > +	req->in.h.gid = from_kgid(fc->user_ns, current_fsgid());
> >  	req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
> >  }
> >  
> > @@ -186,7 +186,8 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
> >  	__set_bit(FR_WAITING, &req->flags);
> >  	if (for_background)
> >  		__set_bit(FR_BACKGROUND, &req->flags);
> > -	if (req->in.h.pid == 0) {
> > +	if (req->in.h.pid == 0 || req->in.h.uid == (uid_t)-1 ||
> > +	    req->in.h.gid == (gid_t)-1) {
> >  		fuse_put_request(fc, req);
> >  		return ERR_PTR(-EOVERFLOW);
> >  	}
> > @@ -1248,7 +1249,8 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> >  	struct fuse_in *in;
> >  	unsigned reqsize;
> >  
> > -	if (task_active_pid_ns(current) != fc->pid_ns)
> > +	if (task_active_pid_ns(current) != fc->pid_ns ||
> > +	    current_user_ns() != fc->user_ns)
> >  		return -EIO;
> >  
> >   restart:
> > @@ -1880,7 +1882,8 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
> >  	struct fuse_req *req;
> >  	struct fuse_out_header oh;
> >  
> > -	if (task_active_pid_ns(current) != fc->pid_ns)
> > +	if (task_active_pid_ns(current) != fc->pid_ns ||
> > +	    current_user_ns() != fc->user_ns)
> >  		return -EIO;
> >  
> >  	if (nbytes < sizeof(struct fuse_out_header))
> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index 5e2e08712d3b..8fd9fe4dcd43 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -841,8 +841,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
> >  	stat->ino = attr->ino;
> >  	stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
> >  	stat->nlink = attr->nlink;
> > -	stat->uid = make_kuid(&init_user_ns, attr->uid);
> > -	stat->gid = make_kgid(&init_user_ns, attr->gid);
> > +	stat->uid = inode->i_uid;
> > +	stat->gid = inode->i_gid;
> 
> This breaks the attr_version logic in fuse_change_attributes().
> 
> So just use make_k[ug]id() here as well.

Okay.

> >  	stat->rdev = inode->i_rdev;
> >  	stat->atime.tv_sec = attr->atime;
> >  	stat->atime.tv_nsec = attr->atimensec;
> > @@ -1455,17 +1455,17 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
> >  	return true;
> >  }
> >  
> > -static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
> > -			   bool trust_local_cmtime)
> > +static void iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr,
> > +			   struct fuse_setattr_in *arg, bool trust_local_cmtime)
> >  {
> >  	unsigned ivalid = iattr->ia_valid;
> >  
> >  	if (ivalid & ATTR_MODE)
> >  		arg->valid |= FATTR_MODE,   arg->mode = iattr->ia_mode;
> >  	if (ivalid & ATTR_UID)
> > -		arg->valid |= FATTR_UID,    arg->uid = from_kuid(&init_user_ns, iattr->ia_uid);
> > +		arg->valid |= FATTR_UID,    arg->uid = from_kuid(fc->user_ns, iattr->ia_uid);
> >  	if (ivalid & ATTR_GID)
> > -		arg->valid |= FATTR_GID,    arg->gid = from_kgid(&init_user_ns, iattr->ia_gid);
> > +		arg->valid |= FATTR_GID,    arg->gid = from_kgid(fc->user_ns, iattr->ia_gid);
> >  	if (ivalid & ATTR_SIZE)
> >  		arg->valid |= FATTR_SIZE,   arg->size = iattr->ia_size;
> >  	if (ivalid & ATTR_ATIME) {
> > @@ -1625,7 +1625,7 @@ int fuse_do_setattr(struct inode *inode, struct iattr *attr,
> >  
> >  	memset(&inarg, 0, sizeof(inarg));
> >  	memset(&outarg, 0, sizeof(outarg));
> > -	iattr_to_fattr(attr, &inarg, trust_local_cmtime);
> > +	iattr_to_fattr(fc, attr, &inarg, trust_local_cmtime);
> >  	if (file) {
> >  		struct fuse_file *ff = file->private_data;
> >  		inarg.valid |= FATTR_FH;
> > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > index 143b595197b6..5897805405ba 100644
> > --- a/fs/fuse/fuse_i.h
> > +++ b/fs/fuse/fuse_i.h
> > @@ -23,6 +23,7 @@
> >  #include <linux/poll.h>
> >  #include <linux/workqueue.h>
> >  #include <linux/pid_namespace.h>
> > +#include <linux/user_namespace.h>
> >  
> >  /** Max number of pages that can be used in a single read request */
> >  #define FUSE_MAX_PAGES_PER_REQ 32
> > @@ -460,6 +461,9 @@ struct fuse_conn {
> >  	/** The pid namespace for this mount */
> >  	struct pid_namespace *pid_ns;
> >  
> > +	/** The user namespace for this mount */
> > +	struct user_namespace *user_ns;
> > +
> >  	/** The fuse mount flags for this mount */
> >  	unsigned flags;
> >  
> > @@ -855,7 +859,7 @@ struct fuse_conn *fuse_conn_get(struct fuse_conn *fc);
> >  /**
> >   * Initialize fuse_conn
> >   */
> > -void fuse_conn_init(struct fuse_conn *fc);
> > +void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns);
> >  
> >  /**
> >   * Release reference to fuse_conn
> > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> > index 2f31874ea9db..b7bdfdac3521 100644
> > --- a/fs/fuse/inode.c
> > +++ b/fs/fuse/inode.c
> > @@ -167,8 +167,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
> >  	inode->i_ino     = fuse_squash_ino(attr->ino);
> >  	inode->i_mode    = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
> >  	set_nlink(inode, attr->nlink);
> > -	inode->i_uid     = make_kuid(&init_user_ns, attr->uid);
> > -	inode->i_gid     = make_kgid(&init_user_ns, attr->gid);
> > +	inode->i_uid     = make_kuid(fc->user_ns, attr->uid);
> > +	inode->i_gid     = make_kgid(fc->user_ns, attr->gid);
> >  	inode->i_blocks  = attr->blocks;
> >  	inode->i_atime.tv_sec   = attr->atime;
> >  	inode->i_atime.tv_nsec  = attr->atimensec;
> > @@ -467,12 +467,15 @@ static int fuse_match_uint(substring_t *s, unsigned int *res)
> >  	return err;
> >  }
> >  
> > -static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
> > +static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev,
> > +			  struct user_namespace *user_ns)
> >  {
> >  	char *p;
> >  	memset(d, 0, sizeof(struct fuse_mount_data));
> >  	d->max_read = ~0;
> >  	d->blksize = FUSE_DEFAULT_BLKSIZE;
> > +	d->user_id = make_kuid(user_ns, 0);
> > +	d->group_id = make_kgid(user_ns, 0);
> 
> It is true that if "user_id=" or "group_id" options were omitted we used the
> zero uid/gid values.  However, this isn't actually used by anybody AFAIK, and
> generalizing it for userns doesn't seem to make much sense.
> 
> So I suggest that we instead return an error if mounting from a userns AND
> neither "allow_other" nor both "user_id" and "group_id" are specified.

But those are also used for ownership of the connection files in
fusectl. In an allow_other mount shouldn't those files be owned by
namespace root and not global root?
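
To make the namespace-root versus global-root distinction concrete, here
is a minimal userspace sketch of the id translation. It is a toy model,
not kernel code: struct uid_extent, model_make_kuid() and
model_from_kuid() are hypothetical names that only mimic the
single-extent behaviour of make_kuid() and from_kuid().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of a single uid_map extent: ids [first, first + count)
 * inside the namespace correspond to [lower_first, lower_first + count)
 * in the global id space. All names here are hypothetical.
 */
struct uid_extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Stand-in for the (uid_t)-1 "no mapping" value. */
#define MODEL_INVALID_ID ((uint32_t)-1)

/* Namespace-local id -> global id (the direction make_kuid() handles). */
uint32_t model_make_kuid(const struct uid_extent *e, uint32_t id)
{
	assert(e);
	if (id < e->first || id - e->first >= e->count)
		return MODEL_INVALID_ID;
	return e->lower_first + (id - e->first);
}

/* Global id -> namespace-local id (the direction from_kuid() handles). */
uint32_t model_from_kuid(const struct uid_extent *e, uint32_t kuid)
{
	assert(e);
	if (kuid < e->lower_first || kuid - e->lower_first >= e->count)
		return MODEL_INVALID_ID;
	return e->first + (kuid - e->lower_first);
}
```

With an assumed mapping of { 0, 100000, 65536 }, namespace uid 0 becomes
global id 100000, i.e. namespace root rather than global root. Global
id 0 has no mapping in that namespace and comes back as the invalid
value, which is the case the patch rejects with -EOVERFLOW in
__fuse_get_req().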

> >  	while ((p = strsep(&opt, ",")) != NULL) {
> >  		int token;
> > @@ -503,7 +506,7 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
> >  		case OPT_USER_ID:
> >  			if (fuse_match_uint(&args[0], &uv))
> >  				return 0;
> > -			d->user_id = make_kuid(current_user_ns(), uv);
> > +			d->user_id = make_kuid(user_ns, uv);
> >  			if (!uid_valid(d->user_id))
> >  				return 0;
> >  			d->user_id_present = 1;
> > @@ -512,7 +515,7 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
> >  		case OPT_GROUP_ID:
> >  			if (fuse_match_uint(&args[0], &uv))
> >  				return 0;
> > -			d->group_id = make_kgid(current_user_ns(), uv);
> > +			d->group_id = make_kgid(user_ns, uv);
> >  			if (!gid_valid(d->group_id))
> >  				return 0;
> >  			d->group_id_present = 1;
> > @@ -555,8 +558,10 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
> >  	struct super_block *sb = root->d_sb;
> >  	struct fuse_conn *fc = get_fuse_conn_super(sb);
> >  
> > -	seq_printf(m, ",user_id=%u", from_kuid_munged(&init_user_ns, fc->user_id));
> > -	seq_printf(m, ",group_id=%u", from_kgid_munged(&init_user_ns, fc->group_id));
> > +	seq_printf(m, ",user_id=%u",
> > +		   from_kuid_munged(fc->user_ns, fc->user_id));
> > +	seq_printf(m, ",group_id=%u",
> > +		   from_kgid_munged(fc->user_ns, fc->group_id));
> >  	if (fc->flags & FUSE_DEFAULT_PERMISSIONS)
> >  		seq_puts(m, ",default_permissions");
> >  	if (fc->flags & FUSE_ALLOW_OTHER)
> > @@ -587,7 +592,7 @@ static void fuse_pqueue_init(struct fuse_pqueue *fpq)
> >  	fpq->connected = 1;
> >  }
> >  
> > -void fuse_conn_init(struct fuse_conn *fc)
> > +void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns)
> >  {
> >  	memset(fc, 0, sizeof(*fc));
> >  	spin_lock_init(&fc->lock);
> > @@ -611,6 +616,7 @@ void fuse_conn_init(struct fuse_conn *fc)
> >  	fc->attr_version = 1;
> >  	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
> >  	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
> > +	fc->user_ns = get_user_ns(user_ns);
> >  }
> >  EXPORT_SYMBOL_GPL(fuse_conn_init);
> >  
> > @@ -620,6 +626,7 @@ void fuse_conn_put(struct fuse_conn *fc)
> >  		if (fc->destroy_req)
> >  			fuse_request_free(fc->destroy_req);
> >  		put_pid_ns(fc->pid_ns);
> > +		put_user_ns(fc->user_ns);
> >  		fc->release(fc);
> >  	}
> >  }
> > @@ -1046,7 +1053,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
> >  
> >  	sb->s_flags &= ~(MS_NOSEC | MS_I_VERSION);
> >  
> > -	if (!parse_fuse_opt(data, &d, is_bdev))
> > +	if (!parse_fuse_opt(data, &d, is_bdev, sb->s_user_ns))
> >  		goto err;
> >  
> >  	if (is_bdev) {
> > @@ -1070,8 +1077,12 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
> >  	if (!file)
> >  		goto err;
> >  
> > -	if ((file->f_op != &fuse_dev_operations) ||
> > -	    (file->f_cred->user_ns != &init_user_ns))
> > +	/*
> > +	 * Require mount to happen from the same user namespace which
> > +	 * opened /dev/fuse to prevent potential attacks.
> > +	 */
> > +	if (file->f_op != &fuse_dev_operations ||
> > +	    file->f_cred->user_ns != sb->s_user_ns)
> >  		goto err_fput;
> >  
> >  	fc = kmalloc(sizeof(*fc), GFP_KERNEL);
> > @@ -1079,7 +1090,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
> >  	if (!fc)
> >  		goto err_fput;
> >  
> > -	fuse_conn_init(fc);
> > +	fuse_conn_init(fc, sb->s_user_ns);
> >  	fc->release = fuse_free_conn;
> >  
> >  	fud = fuse_dev_alloc(fc);
> > -- 
> > 1.9.1
> > 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 16/18] fuse: Support fuse filesystems outside of init_user_ns
@ 2016-03-09 14:48           ` Miklos Szeredi
  0 siblings, 0 replies; 68+ messages in thread
From: Miklos Szeredi @ 2016-03-09 14:48 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, Kernel Mailing List,
	linux-bcache, dm-devel, linux-raid, linux-mtd, Linux-Fsdevel,
	fuse-devel, LSM, selinux

On Wed, Mar 9, 2016 at 3:18 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
> On Wed, Mar 09, 2016 at 12:29:23PM +0100, Miklos Szeredi wrote:
>> On Mon, Jan 04, 2016 at 12:03:55PM -0600, Seth Forshee wrote:

>> > -static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
>> > +static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev,
>> > +                     struct user_namespace *user_ns)
>> >  {
>> >     char *p;
>> >     memset(d, 0, sizeof(struct fuse_mount_data));
>> >     d->max_read = ~0;
>> >     d->blksize = FUSE_DEFAULT_BLKSIZE;
>> > +   d->user_id = make_kuid(user_ns, 0);
>> > +   d->group_id = make_kgid(user_ns, 0);
>>
>> It is true that if "user_id=" or "group_id" options were omitted we used the
>> zero uid/gid values.  However, this isn't actually used by anybody AFAIK, and
>> generalizing it for userns doesn't seem to make much sense.
>>
>> So I suggest that we instead return an error if mounting from a userns AND
>> neither "allow_other" nor both "user_id" and "group_id" are specified.
>
> But those are also used for ownership of the connection files in
> fusectl. In an allow_other mount shouldn't those files be owned by
> namespace root and not global root?

Yes.

Can't we use current_cred()->uid/gid? Or fsuid/fsgid maybe?

When we have true unprivileged mounts, the user_id/group_id options
become redundant anyway and we can just use the current credentials.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 16/18] fuse: Support fuse filesystems outside of init_user_ns
@ 2016-03-09 15:25               ` Seth Forshee
  0 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-03-09 15:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, Kernel Mailing List,
	linux-bcache, dm-devel, linux-raid, linux-mtd, Linux-Fsdevel,
	fuse-devel, LSM, selinux

On Wed, Mar 09, 2016 at 03:48:22PM +0100, Miklos Szeredi wrote:
> On Wed, Mar 9, 2016 at 3:18 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
> > On Wed, Mar 09, 2016 at 12:29:23PM +0100, Miklos Szeredi wrote:
> >> On Mon, Jan 04, 2016 at 12:03:55PM -0600, Seth Forshee wrote:
> 
> >> > -static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
> >> > +static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev,
> >> > +                     struct user_namespace *user_ns)
> >> >  {
> >> >     char *p;
> >> >     memset(d, 0, sizeof(struct fuse_mount_data));
> >> >     d->max_read = ~0;
> >> >     d->blksize = FUSE_DEFAULT_BLKSIZE;
> >> > +   d->user_id = make_kuid(user_ns, 0);
> >> > +   d->group_id = make_kgid(user_ns, 0);
> >>
> >> It is true that if "user_id=" or "group_id" options were omitted we used the
> >> zero uid/gid values.  However, this isn't actually used by anybody AFAIK, and
> >> generalizing it for userns doesn't seem to make much sense.
> >>
> >> So I suggest that we instead return an error if mounting from a userns AND
> >> neither "allow_other" nor both "user_id" and "group_id" are specified.
> >
> > But those are also used for ownership of the connection files in
> > fusectl. In an allow_other mount shouldn't those files be owned by
> > namespace root and not global root?
> 
> Yes.
> 
> Can't we use current_cred()->uid/gid? Or fsuid/fsgid maybe?

That would be a departure from the current behavior in the !allow_other
case for unprivileged users. Since those mounts are done by an suid
helper all of those ids would be root in the userns, wouldn't they?

> When we have true unprivileged mounts, the user_id/group_id options
> become redundant anyway and we can just use the current credentials.

True, but we don't yet have that.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 16/18] fuse: Support fuse filesystems outside of init_user_ns
  2016-03-09 15:25               ` Seth Forshee
  (?)
@ 2016-03-09 15:51               ` Miklos Szeredi
       [not found]                 ` <CAJfpegv5KR_Hi-79a8oyb+R+tv9W3RYqy5pngUKSyauVNk2ScQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  -1 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2016-03-09 15:51 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, Kernel Mailing List,
	linux-bcache, dm-devel, linux-raid, linux-mtd, Linux-Fsdevel,
	fuse-devel, LSM, selinux

On Wed, Mar 9, 2016 at 4:25 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
> On Wed, Mar 09, 2016 at 03:48:22PM +0100, Miklos Szeredi wrote:

>> Can't we use current_cred()->uid/gid? Or fsuid/fsgid maybe?
>
> That would be a departure from the current behavior in the !allow_other
> case for unprivileged users. Since those mounts are done by an suid
> helper all of those ids would be root in the userns, wouldn't they?

Well, actually this is what the helper does:

    sprintf(d, "fd=%i,rootmode=%o,user_id=%u,group_id=%u",
        fd, rootmode, getuid(), getgid());

So it just uses the current uid/gid.  Apparently no reason to do this
in userland, we could just as well set these in the kernel.  Except
for possible backward compatibility problems for things not using the
helper.

BUT if the mount is unprivileged or it's a userns mount, or anything
previously not possible, then we are not constrained by the backward
compatibility issues, and can go with the saner solution.

Does that not make sense?
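
As a self-contained sketch of the helper line quoted above (the
function names format_fuse_opts() and format_fuse_opts_for_caller() are
hypothetical, chosen only for illustration, and are not taken from the
helper's source):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Build the mount option string in the same format as the helper line
 * quoted above. Returns what snprintf() returns: the length the full
 * string needs, so a result >= len means the output was truncated.
 */
int format_fuse_opts(char *buf, size_t len, int fd, unsigned rootmode,
		     uid_t uid, gid_t gid)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "fd=%i,rootmode=%o,user_id=%u,group_id=%u",
			fd, rootmode, (unsigned)uid, (unsigned)gid);
}

/* Same, but filled in with the caller's real uid and gid. */
int format_fuse_opts_for_caller(char *buf, size_t len, int fd,
				unsigned rootmode)
{
	return format_fuse_opts(buf, len, fd, rootmode, getuid(), getgid());
}
```

In a setuid-root binary getuid() and getgid() still return the invoking
user's real ids, while geteuid() returns 0, so the options built this
way carry the unprivileged user's ids and not root's.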

>> When we have true unprivileged mounts, the user_id/group_id options
>> become redundant anyway and we can just use the current credentials.
>
> True, but we don't yet have that.

What's missing?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 16/18] fuse: Support fuse filesystems outside of init_user_ns
@ 2016-03-09 17:07                     ` Seth Forshee
  0 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-03-09 17:07 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, Kernel Mailing List,
	linux-bcache, dm-devel, linux-raid, linux-mtd, Linux-Fsdevel,
	fuse-devel, LSM, selinux

On Wed, Mar 09, 2016 at 04:51:42PM +0100, Miklos Szeredi wrote:
> On Wed, Mar 9, 2016 at 4:25 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
> > On Wed, Mar 09, 2016 at 03:48:22PM +0100, Miklos Szeredi wrote:
> 
> >> Can't we use current_cred()->uid/gid? Or fsuid/fsgid maybe?
> >
> > That would be a departure from the current behavior in the !allow_other
> > case for unprivileged users. Since those mounts are done by an suid
> > helper all of those ids would be root in the userns, wouldn't they?
> 
> Well, actually this is what the helper does:
> 
>     sprintf(d, "fd=%i,rootmode=%o,user_id=%u,group_id=%u",
>         fd, rootmode, getuid(), getgid());

Sorry, I was thinking of euid. So this may not be a problem.

> So it just uses the current uid/gid.  Apparently no reason to do this
> in userland, we could just as well set these in the kernel.  Except
> for possible backward compatibility problems for things not using the
> helper.
> 
> BUT if the mount is unprivileged or it's a userns mount, or anything
> previously not possible, then we are not constrained by the backward
> compatibility issues, and can go with the saner solution.
> 
> Does that not make sense?

But we generally do want backwards compatibility, and we want userspace
software to be able to expect the same behavior whether or not it's
running in a user namespaced container. Obviously we can't always have
things 100% identical, but we shouldn't break things unless we really
need to.

However it may be that this isn't actually going to break assumptions of
existing software like I had feared. My preference is still to not
change any userspace-visible behaviors since we never know what software
might have made assumptions based on those behaviors. But if you're
confident that it won't break anything I'm willing to give it a try.

> >> When we have true unprivileged mounts, the user_id/group_id options
> >> become redundant anyway and we can just use the current credentials.
> >
> > True, but we don't yet have that.
> 
> What's missing?

A user must still be privileged to mount, even if only towards their own
user and mount namespaces. Maybe that's what you meant though and I just
misunderstood.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RESEND v2 16/18] fuse: Support fuse filesystems outside of init_user_ns
  2016-03-09 17:07                     ` Seth Forshee
  (?)
@ 2016-03-14 20:58                     ` Miklos Szeredi
  2016-03-25 20:31                       ` Seth Forshee
  -1 siblings, 1 reply; 68+ messages in thread
From: Miklos Szeredi @ 2016-03-14 20:58 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, Kernel Mailing List,
	linux-bcache, dm-devel, linux-raid, linux-mtd, Linux-Fsdevel,
	fuse-devel, LSM, selinux

On Wed, Mar 9, 2016 at 6:07 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
> On Wed, Mar 09, 2016 at 04:51:42PM +0100, Miklos Szeredi wrote:
>> On Wed, Mar 9, 2016 at 4:25 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
>> > On Wed, Mar 09, 2016 at 03:48:22PM +0100, Miklos Szeredi wrote:
>>
>> >> Can't we use current_cred()->uid/gid? Or fsuid/fsgid maybe?
>> >
>> > That would be a departure from the current behavior in the !allow_other
>> > case for unprivileged users. Since those mounts are done by an suid
>> > helper all of those ids would be root in the userns, wouldn't they?
>>
>> Well, actually this is what the helper does:
>>
>>     sprintf(d, "fd=%i,rootmode=%o,user_id=%u,group_id=%u",
>>         fd, rootmode, getuid(), getgid());
>
> Sorry, I was thinking of euid. So this may not be a problem.
>
>> So it just uses the current uid/gid.  Apparently no reason to do this
>> in userland, we could just as well set these in the kernel.  Except
>> for possible backward compatibility problems for things not using the
>> helper.
>>
>> BUT if the mount is unprivileged or it's a userns mount, or anything
>> previously not possible, then we are not constrained by the backward
>> compatibility issues, and can go with the saner solution.
>>
>> Does that not make sense?
>
> But we generally do want backwards compatibility, and we want userspace
> software to be able to expect the same behavior whether or not it's
> running in a user namespaced container. Obviously we can't always have
> things 100% identical, but we shouldn't break things unless we really
> need to.
>
> However it may be that this isn't actually going to break assumptions of
> existing software like I had feared. My preference is still to not
> change any userspace-visible behaviors since we never know what software
> might have made assumptions based on those behaviors. But if you're
> confident that it won't break anything I'm willing to give it a try.

I'm quite confident it won't make a difference.

Thanks,
Miklos

* [PATCH] fs: remove excess check for in_userns
  2016-01-04 18:03     ` Seth Forshee
  (?)
@ 2016-03-15 12:09     ` Pavel Tikhomirov
  2016-03-15 13:45       ` Seth Forshee
  -1 siblings, 1 reply; 68+ messages in thread
From: Pavel Tikhomirov @ 2016-03-15 12:09 UTC (permalink / raw)
  To: Andy Lutomirski, Seth Forshee, Eric W. Biederman, devel
  Cc: Serge Hallyn, Alexander Viro, linux-security-module, linux-mtd,
	selinux, linux-fsdevel, Pavel Tikhomirov, Konstantin Khorenko,
	Pavel Emelyanov

If in_userns() returns false, mnt_may_suid() also returns false, so the
second (removed) check is reached only when in_userns() is true and can
never trigger. Remove it.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 security/commoncap.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index ca0c04ae..82f930c 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -445,8 +445,6 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
 
 	if (!mnt_may_suid(bprm->file->f_path.mnt))
 		return 0;
-	if (!in_userns(current_user_ns(), bprm->file->f_path.mnt->mnt_sb->s_user_ns))
-		return 0;
 
 	dentry = dget(bprm->file->f_dentry);
 
-- 
1.9.3


* Re: [PATCH] fs: remove excess check for in_userns
  2016-03-15 12:09     ` [PATCH] fs: remove excess check for in_userns Pavel Tikhomirov
@ 2016-03-15 13:45       ` Seth Forshee
  2016-03-15 14:19         ` Pavel Tikhomirov
                           ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Seth Forshee @ 2016-03-15 13:45 UTC (permalink / raw)
  To: Pavel Tikhomirov
  Cc: Andy Lutomirski, Eric W. Biederman, devel, Serge Hallyn,
	Alexander Viro, linux-security-module, linux-mtd, selinux,
	linux-fsdevel, Konstantin Khorenko, Pavel Emelyanov

On Tue, Mar 15, 2016 at 03:09:00PM +0300, Pavel Tikhomirov wrote:
> If in_userns returns false mnt_may_suid also returns false, and we
> will reach second(removed) if-check only in case it does not trigger,
> so remove it.

We had a somewhat lengthy discussion previously where one of the
conclusions was that we'd have that check in both places even though
it's redundant. Iirc the reason was that though they're doing the same
test they're doing so to answer different questions, so we should have
the test in both places (or something along those lines).

Thanks,
Seth

* Re: [PATCH] fs: remove excess check for in_userns
  2016-03-15 13:45       ` Seth Forshee
@ 2016-03-15 14:19         ` Pavel Tikhomirov
  2016-03-15 14:19         ` Pavel Tikhomirov
  2016-03-22 23:19         ` James Morris
  2 siblings, 0 replies; 68+ messages in thread
From: Pavel Tikhomirov @ 2016-03-15 14:19 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Andy Lutomirski, Eric W. Biederman, devel, Serge Hallyn,
	Alexander Viro, linux-security-module, linux-mtd, selinux,
	linux-fsdevel, Konstantin Khorenko, Pavel Emelyanov



On 03/15/2016 04:45 PM, Seth Forshee wrote:
> On Tue, Mar 15, 2016 at 03:09:00PM +0300, Pavel Tikhomirov wrote:
>> If in_userns returns false mnt_may_suid also returns false, and we
>> will reach second(removed) if-check only in case it does not trigger,
>> so remove it.
>
> We had a somewhat lengthy discussion previously where one of the
> conclusions was that we'd have that check in both places even though
> it's redundant. Iirc the reason was that though they're doing the same
> test they're doing so to answer different questions, so we should have
> the test in both places (or something along those lines).

Ok, that is reasonable. But from my POV the edge between the meaning of
those checks is quite blurred.

Thanks!

>
> Thanks,
> Seth
>

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.

* Re: [PATCH] fs: remove excess check for in_userns
  2016-03-15 13:45       ` Seth Forshee
  2016-03-15 14:19         ` Pavel Tikhomirov
  2016-03-15 14:19         ` Pavel Tikhomirov
@ 2016-03-22 23:19         ` James Morris
  2 siblings, 0 replies; 68+ messages in thread
From: James Morris @ 2016-03-22 23:19 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Pavel Tikhomirov, Andy Lutomirski, Eric W. Biederman, devel,
	Serge Hallyn, Alexander Viro, linux-security-module, linux-mtd,
	selinux, linux-fsdevel, Konstantin Khorenko, Pavel Emelyanov

On Tue, 15 Mar 2016, Seth Forshee wrote:

> On Tue, Mar 15, 2016 at 03:09:00PM +0300, Pavel Tikhomirov wrote:
> > If in_userns returns false mnt_may_suid also returns false, and we
> > will reach second(removed) if-check only in case it does not trigger,
> > so remove it.
> 
> We had a somewhat lengthy discussion previously where one of the
> conclusions was that we'd have that check in both places even though
> it's redundant. Iirc the reason was that though they're doing the same
> test they're doing so to answer different questions, so we should have
> the test in both places (or something along those lines).

A comment in the code might be useful here.

-- 
James Morris
<jmorris@namei.org>


* Re: [PATCH RESEND v2 16/18] fuse: Support fuse filesystems outside of init_user_ns
  2016-03-14 20:58                     ` Miklos Szeredi
@ 2016-03-25 20:31                       ` Seth Forshee
  0 siblings, 0 replies; 68+ messages in thread
From: Seth Forshee @ 2016-03-25 20:31 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Eric W. Biederman, Alexander Viro, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, Kernel Mailing List,
	linux-bcache, dm-devel, linux-raid, linux-mtd, Linux-Fsdevel,
	fuse-devel, LSM, selinux

On Mon, Mar 14, 2016 at 09:58:43PM +0100, Miklos Szeredi wrote:
> On Wed, Mar 9, 2016 at 6:07 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
> > On Wed, Mar 09, 2016 at 04:51:42PM +0100, Miklos Szeredi wrote:
> >> On Wed, Mar 9, 2016 at 4:25 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
> >> > On Wed, Mar 09, 2016 at 03:48:22PM +0100, Miklos Szeredi wrote:
> >>
> >> >> Can't we use current_cred()->uid/gid? Or fsuid/fsgid maybe?
> >> >
> >> > That would be a departure from the current behavior in the !allow_other
> >> > case for unprivileged users. Since those mounts are done by an suid
> >> > helper all of those ids would be root in the userns, wouldn't they?
> >>
> >> Well, actually this is what the helper does:
> >>
> >>     sprintf(d, "fd=%i,rootmode=%o,user_id=%u,group_id=%u",
> >>         fd, rootmode, getuid(), getgid());
> >
> > Sorry, I was thinking of euid. So this may not be a problem.
> >
> >> So it just uses the current uid/gid.  Apparently no reason to do this
> >> in userland, we could just as well set these in the kernel.  Except
> >> for possible backward compatibility problems for things not using the
> >> helper.
> >>
> >> BUT if the mount is unprivileged or it's a userns mount, or anything
> >> previously not possible, then we are not constrained by the backward
> >> compatibility issues, and can go with the saner solution.
> >>
> >> Does that not make sense?
> >
> > But we generally do want backwards compatibility, and we want userspace
> > software to be able to expect the same behavior whether or not it's
> > running in a user namespaced container. Obviously we can't always have
> > things 100% identical, but we shouldn't break things unless we really
> > need to.
> >
> > However it may be that this isn't actually going to break assumptions of
> > existing software like I had feared. My preference is still to not
> > change any userspace-visible behaviors since we never know what software
> > might have made assumptions based on those behaviors. But if you're
> > confident that it won't break anything I'm willing to give it a try.
> 
> I'm quite confident it won't make a difference.

I was just about to go make these changes and discovered that the
user_id and group_id options are already mandatory, due to this check at
the bottom of parse_fuse_opt():

        if (!d->fd_present || !d->rootmode_present ||
            !d->user_id_present || !d->group_id_present)
                return 0;

So I'll simply drop those two lines which supply default values for
these options.
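
The effect of that check can be modeled in userspace. The following is only a sketch under stated assumptions: model_parse_opts() and struct model_mount_data are hypothetical names, the parser handles just the four options named in the check, and it rejects anything else. It is not the kernel's parse_fuse_opt(); it only illustrates that a mount string lacking user_id or group_id is refused, so default values for them would never be used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the parsed mount data. */
struct model_mount_data {
	int fd;
	unsigned int rootmode;
	unsigned int user_id;
	unsigned int group_id;
	bool fd_present;
	bool rootmode_present;
	bool user_id_present;
	bool group_id_present;
};

/*
 * Parse a comma-separated option string.  Returns 1 on success and 0 on
 * failure, following the convention of the check quoted above.
 */
static int model_parse_opts(const char *opts, struct model_mount_data *d)
{
	const char *p = opts;

	assert(opts != NULL && d != NULL);
	memset(d, 0, sizeof(*d));

	while (*p) {
		char tok[64];
		size_t len = strcspn(p, ",");

		if (len == 0 || len >= sizeof(tok))
			return 0;
		memcpy(tok, p, len);
		tok[len] = '\0';

		if (sscanf(tok, "fd=%d", &d->fd) == 1)
			d->fd_present = true;
		else if (sscanf(tok, "rootmode=%o", &d->rootmode) == 1)
			d->rootmode_present = true;
		else if (sscanf(tok, "user_id=%u", &d->user_id) == 1)
			d->user_id_present = true;
		else if (sscanf(tok, "group_id=%u", &d->group_id) == 1)
			d->group_id_present = true;
		else
			return 0;	/* unknown or malformed option (assumption) */

		p += len;
		if (*p == ',')
			p++;
	}

	/* All four options are mandatory, so no defaults are ever applied. */
	if (!d->fd_present || !d->rootmode_present ||
	    !d->user_id_present || !d->group_id_present)
		return 0;

	return 1;
}
```

With this model, a string such as "fd=3,rootmode=40000" is rejected because user_id and group_id are absent, which is why defaults for those two fields are dead code.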

Thanks,
Seth

* Re: [PATCH RESEND v2 11/18] fs: Ensure the mounter of a filesystem is privileged towards its inodes
  2016-03-04 22:43       ` Eric W. Biederman
  2016-03-06 15:48         ` Seth Forshee
@ 2016-03-28 16:59         ` Seth Forshee
  2016-03-30  1:36           ` Eric W. Biederman
  1 sibling, 1 reply; 68+ messages in thread
From: Seth Forshee @ 2016-03-28 16:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Viro, Serge Hallyn, Richard Weinberger,
	Austin S Hemmelgarn, Miklos Szeredi, linux-kernel, linux-bcache,
	dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux

On Fri, Mar 04, 2016 at 04:43:06PM -0600, Eric W. Biederman wrote:
> In general this is only an issue if uids and gids on the filesystem
> do not map into the user namespace.
> 
> Therefore the general fix is to limit the logic of checking for
> capabilities in s_user_ns if we are dealing with INVALID_UID and
> INVALID_GID.  For proc and kernfs that should never be the case
> so the problem becomes a non-issue.
> 
> Further I would look at limiting that relaxation to just
> inode_change_ok. 

Finally got around to implementing this today; is the patch below what
you had in mind?

> So that we can easily wrap that check per filesystem
> and deny the relaxation for proc and kernfs.  proc and kernfs already
> have wrappers for .setattr so denying changes when !uid_valid and
> !gid_valid would be a trivial addition, and ensure calamity does
> not ensue.

I'm confused about this part though. As you say above, proc and kernfs
will never have inodes with invalid ids, so it's not an issue. Do you
just mean this to be extra insurance against problems?

Thanks,
Seth

---

diff --git a/fs/attr.c b/fs/attr.c
index 3cfaaac4a18e..f2bcd3f7dfbb 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -16,6 +16,31 @@
 #include <linux/evm.h>
 #include <linux/ima.h>
 
+static bool chown_ok(const struct inode *inode, kuid_t uid)
+{
+	if (uid_eq(current_fsuid(), inode->i_uid) && uid_eq(uid, inode->i_uid))
+		return true;
+	if (capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+		return true;
+	if (!uid_valid(inode->i_uid) &&
+	    ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
+		return true;
+	return false;
+}
+
+static bool chgrp_ok(const struct inode *inode, kgid_t gid)
+{
+	if (uid_eq(current_fsuid(), inode->i_uid) &&
+	    (in_group_p(gid) || gid_eq(gid, inode->i_gid)))
+		return true;
+	if (capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+		return true;
+	if (!gid_valid(inode->i_gid) &&
+	    ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
+		return true;
+	return false;
+}
+
 /**
  * inode_change_ok - check if attribute changes to an inode are allowed
  * @inode:	inode to check
@@ -58,17 +83,11 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)
 		return 0;
 
 	/* Make sure a caller can chown. */
-	if ((ia_valid & ATTR_UID) &&
-	    (!uid_eq(current_fsuid(), inode->i_uid) ||
-	     !uid_eq(attr->ia_uid, inode->i_uid)) &&
-	    !capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+	if ((ia_valid & ATTR_UID) && !chown_ok(inode, attr->ia_uid))
 		return -EPERM;
 
 	/* Make sure caller can chgrp. */
-	if ((ia_valid & ATTR_GID) &&
-	    (!uid_eq(current_fsuid(), inode->i_uid) ||
-	    (!in_group_p(attr->ia_gid) && !gid_eq(attr->ia_gid, inode->i_gid))) &&
-	    !capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+	if ((ia_valid & ATTR_GID) && !chgrp_ok(inode, attr->ia_gid))
 		return -EPERM;
 
 	/* Make sure a caller can chmod. */

* Re: [PATCH RESEND v2 11/18] fs: Ensure the mounter of a filesystem is privileged towards its inodes
  2016-03-28 16:59         ` Seth Forshee
@ 2016-03-30  1:36           ` Eric W. Biederman
  2016-03-30 14:58             ` Seth Forshee
  0 siblings, 1 reply; 68+ messages in thread
From: Eric W. Biederman @ 2016-03-30  1:36 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, Serge Hallyn, Richard Weinberger,
	Austin S Hemmelgarn, Miklos Szeredi, linux-kernel, linux-bcache,
	dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux

Seth Forshee <seth.forshee@canonical.com> writes:

> On Fri, Mar 04, 2016 at 04:43:06PM -0600, Eric W. Biederman wrote:
>> In general this is only an issue if uids and gids on the filesystem
>> do not map into the user namespace.
>> 
>> Therefore the general fix is to limit the logic of checking for
>> capabilities in s_user_ns if we are dealing with INVALID_UID and
>> INVALID_GID.  For proc and kernfs that should never be the case
>> so the problem becomes a non-issue.
>> 
>> Further I would look at limiting that relaxation to just
>> inode_change_ok. 
>
> Finally got around to implementing this today; is the patch below what
> you had in mind?

Pretty much.

For the same reason that capable_wrt_inode_uidgid(inode) had to look
at both inode->i_uid and inode->i_gid I think we need to look at
both inode->i_uid and inode->i_gid in those cases.

I am worried about chgrp_ok in cases such as inode->i_uid is valid
but unmapped.  I have a similar worry about chown_ok where
inode->i_gid is valid but unmapped (although that worry is less
serious).

>> So that we can easily wrap that check per filesystem
>> and deny the relaxation for proc and kernfs.  proc and kernfs already
>> have wrappers for .setattr so denying changes when !uid_valid and
>> !gid_valid would be a trivial addition, and ensure calamity does
>> not ensue.
>
> I'm confused about this part though. As you say above, proc and kernfs
> will never have inodes with invalid ids, so it's not an issue. Do you
> just mean this to be extra insurance against problems?

I meant two things.
1) As filesystems explicitly have to call inode_change_ok they can
   override the default if it is possible.

2) Because being paranoid about backward compatibility matters, it
   is almost certainly worth adding a check:
   "if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid)) return -EPERM"
   to proc and sysfs just before they call inode_change_ok just so we
   don't need to analyze them and confirm that they don't use
   INVALID_UID.

   That just makes the patch more robust.

   Then we could leave removing that code for a follow-on patch where
   someone takes the time to read through and audit all of the proc and
   sysfs code to ensure that the case does not arise, instead of just
   implicitly assuming it.

   That is the usual pattern when pushing down changes.  Do something
   that is easily guaranteed to work, and leave the careful looking for
   a patch all of its own.

Eric


> Thanks,
> Seth
>
> ---
>
> diff --git a/fs/attr.c b/fs/attr.c
> index 3cfaaac4a18e..f2bcd3f7dfbb 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -16,6 +16,31 @@
>  #include <linux/evm.h>
>  #include <linux/ima.h>
>  
> +static bool chown_ok(const struct inode *inode, kuid_t uid)
> +{
> +	if (uid_eq(current_fsuid(), inode->i_uid) && uid_eq(uid, inode->i_uid))
> +		return true;
> +	if (capable_wrt_inode_uidgid(inode, CAP_CHOWN))
> +		return true;
> +	if (!uid_valid(inode->i_uid) &&
> +	    ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
> +		return true;
> +	return false;
> +}
> +
> +static bool chgrp_ok(const struct inode *inode, kgid_t gid)
> +{
> +	if (uid_eq(current_fsuid(), inode->i_uid) &&
> +	    (in_group_p(gid) || gid_eq(gid, inode->i_gid)))
> +		return true;
> +	if (capable_wrt_inode_uidgid(inode, CAP_CHOWN))
> +		return true;
> +	if (!gid_valid(inode->i_gid) &&
> +	    ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
> +		return true;
> +	return false;
> +}
> +
>  /**
>   * inode_change_ok - check if attribute changes to an inode are allowed
>   * @inode:	inode to check
> @@ -58,17 +83,11 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)
>  		return 0;
>  
>  	/* Make sure a caller can chown. */
> -	if ((ia_valid & ATTR_UID) &&
> -	    (!uid_eq(current_fsuid(), inode->i_uid) ||
> -	     !uid_eq(attr->ia_uid, inode->i_uid)) &&
> -	    !capable_wrt_inode_uidgid(inode, CAP_CHOWN))
> +	if ((ia_valid & ATTR_UID) && !chown_ok(inode, attr->ia_uid))
>  		return -EPERM;
>  
>  	/* Make sure caller can chgrp. */
> -	if ((ia_valid & ATTR_GID) &&
> -	    (!uid_eq(current_fsuid(), inode->i_uid) ||
> -	    (!in_group_p(attr->ia_gid) && !gid_eq(attr->ia_gid, inode->i_gid))) &&
> -	    !capable_wrt_inode_uidgid(inode, CAP_CHOWN))
> +	if ((ia_valid & ATTR_GID) && !chgrp_ok(inode, attr->ia_gid))
>  		return -EPERM;
>  
>  	/* Make sure a caller can chmod. */

* Re: [PATCH RESEND v2 11/18] fs: Ensure the mounter of a filesystem is privileged towards its inodes
  2016-03-30  1:36           ` Eric W. Biederman
@ 2016-03-30 14:58             ` Seth Forshee
  2016-03-30 20:18               ` Eric W. Biederman
  0 siblings, 1 reply; 68+ messages in thread
From: Seth Forshee @ 2016-03-30 14:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Viro, Serge Hallyn, Richard Weinberger,
	Austin S Hemmelgarn, Miklos Szeredi, linux-kernel, linux-bcache,
	dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux

On Tue, Mar 29, 2016 at 08:36:09PM -0500, Eric W. Biederman wrote:
> Seth Forshee <seth.forshee@canonical.com> writes:
> 
> > On Fri, Mar 04, 2016 at 04:43:06PM -0600, Eric W. Biederman wrote:
> >> In general this is only an issue if uids and gids on the filesystem
> >> do not map into the user namespace.
> >> 
> >> Therefore the general fix is to limit the logic of checking for
> >> capabilities in s_user_ns if we are dealing with INVALID_UID and
> >> INVALID_GID.  For proc and kernfs that should never be the case
> >> so the problem becomes a non-issue.
> >> 
> >> Further I would look at limiting that relaxation to just
> >> inode_change_ok. 
> >
> > Finally got around to implementing this today; is the patch below what
> > you had in mind?
> 
> Pretty much.
> 
> For the same reason that capable_wrt_inode_uidgid(inode) had to look
> at both inode->i_uid and inode->i_gid I think we need to look at
> both inode->i_uid and inode->i_gid in those cases.
> 
> I am worried about chgrp_ok in cases such as inode->i_uid is valid
> but unmapped.  I have a similar worry about chown_ok where
> inode->i_gid is valid but unmapped (although that worry is less
> serious).

That makes sense.

So then what is wanted is to check that the other id is either invalid,
or else it maps into s_user_ns. So for chown_ok() something like this:

    if (!uid_valid(inode->i_uid) &&
        (!gid_valid(inode->i_gid) || kgid_has_mapping(inode->i_sb->s_user_ns, inode->i_gid)) &&
        ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
            return true;

and likewise for chgrp_ok(). Does that satisfy your concerns?

> >> So that we can easily wrap that check per filesystem
> >> and deny the relaxation for proc and kernfs.  proc and kernfs already
> >> have wrappers for .setattr so denying changes when !uid_valid and
> >> !gid_valid would be a trivial addition, and ensure calamity does
> >> not ensue.
> >
> > I'm confused about this part though. As you say above, proc and kernfs
> > will never have inodes with invalid ids, so it's not an issue. Do you
> > just mean this to be extra insurance against problems?
> 
> I meant two things.
> 1) As filesystems explicitly have to call inode_change_ok they can
>    override the default if it is possible.
> 
> 2) Because being paranoid about backward compatibility matters, it
>    is almost certainly worth adding a check:
>    "if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid)) return -EPERM"
>    to proc and sysfs just before they call inode_change_ok just so we
>    don't need to analyze them and confirm that they don't use
>    INVALID_UID.
> 
>    That just makes the patch more robust.
> 
>    Then we could leave removing that code for a follow-on patch where
>    someone takes the time to read through and audit all of the proc and
>    sysfs code to ensure that the case does not arise, instead of just
>    implicitly assuming it.
> 
>    That is the usual pattern when pushing down changes.  Do something
>    that is easily guaranteed to work, and leave the careful looking for
>    a patch all of its own.

Okay, I'll add checks.

Thanks,
Seth


* Re: [PATCH RESEND v2 11/18] fs: Ensure the mounter of a filesystem is privileged towards its inodes
  2016-03-30 14:58             ` Seth Forshee
@ 2016-03-30 20:18               ` Eric W. Biederman
  0 siblings, 0 replies; 68+ messages in thread
From: Eric W. Biederman @ 2016-03-30 20:18 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, Serge Hallyn, Richard Weinberger,
	Austin S Hemmelgarn, Miklos Szeredi, linux-kernel, linux-bcache,
	dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux

Seth Forshee <seth.forshee@canonical.com> writes:

> On Tue, Mar 29, 2016 at 08:36:09PM -0500, Eric W. Biederman wrote:
>> Seth Forshee <seth.forshee@canonical.com> writes:
>> 
>> > On Fri, Mar 04, 2016 at 04:43:06PM -0600, Eric W. Biederman wrote:
>> >> In general this is only an issue if uids and gids on the filesystem
>> >> do not map into the user namespace.
>> >> 
>> >> Therefore the general fix is to limit the logic of checking for
>> >> capabilities in s_user_ns if we are dealing with INVALID_UID and
>> >> INVALID_GID.  For proc and kernfs that should never be the case
>> >> so the problem becomes a non-issue.
>> >> 
>> >> Further I would look at limiting that relaxation to just
>> >> inode_change_ok. 
>> >
>> > Finally got around to implementing this today; is the patch below what
>> > you had in mind?
>> 
>> Pretty much.
>> 
>> For the same reason that capable_wrt_inode_uidgid(inode) had to look
>> at both inode->i_uid and inode->i_gid I think we need to look at
>> both inode->i_uid and inode->i_gid in those cases.
>> 
>> I am worried about chgrp_ok in cases such as inode->i_uid is valid
>> but unmapped.  I have a similar worry about chown_ok where
>> inode->i_gid is valid but unmapped (although that worry is less
>> serious).
>
> That makes sense.
>
> So then what is wanted is to check that the other id is either invalid,
> or else it maps into s_user_ns. So for chown_ok() something like this:
>
>     if (!uid_valid(inode->i_uid) &&
>         (!gid_valid(inode->i_gid) || kgid_has_mapping(inode->i_sb->s_user_ns, inode->i_gid)) &&
>         ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
>             return true;
>
> and likewise for chgrp_ok(). Does that satisfy your concerns?

Yes it does.

Eric

end of thread, other threads:[~2016-03-30 20:18 UTC | newest]

Thread overview: 68+ messages
2016-01-04 18:03 [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces Seth Forshee
2016-01-04 18:03 ` [PATCH RESEND v2 01/18] block_dev: Support checking inode permissions in lookup_bdev() Seth Forshee
2016-01-04 18:03 ` [PATCH RESEND v2 10/18] fs: Update posix_acl support to handle user namespace mounts Seth Forshee
     [not found] ` <1451930639-94331-1-git-send-email-seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
2016-01-04 18:03   ` [PATCH RESEND v2 02/18] block_dev: Check permissions towards block device inode when mounting Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-01-04 18:03   ` [PATCH RESEND v2 03/18] fs: Treat foreign mounts as nosuid Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-03-15 12:09     ` [PATCH] fs: remove excess check for in_userns Pavel Tikhomirov
2016-03-15 13:45       ` Seth Forshee
2016-03-15 14:19         ` Pavel Tikhomirov
2016-03-15 14:19         ` Pavel Tikhomirov
2016-03-22 23:19         ` James Morris
2016-01-04 18:03   ` [PATCH RESEND v2 04/18] selinux: Add support for unprivileged mounts from user namespaces Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-01-04 18:03   ` [PATCH RESEND v2 05/18] userns: Replace in_userns with current_in_userns Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-01-04 18:03   ` [PATCH RESEND v2 06/18] Smack: Handle labels consistently in untrusted mounts Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-01-04 18:03   ` [PATCH RESEND v2 07/18] fs: Check for invalid i_uid in may_follow_link() Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-01-04 18:03   ` [PATCH RESEND v2 08/18] cred: Reject inodes with invalid ids in set_create_file_as() Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-01-04 18:03   ` [PATCH RESEND v2 09/18] fs: Refuse uid/gid changes which don't map into s_user_ns Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-01-04 18:03   ` [PATCH RESEND v2 11/18] fs: Ensure the mounter of a filesystem is privileged towards its inodes Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-03-03 17:02     ` Seth Forshee
2016-03-04 22:43       ` Eric W. Biederman
2016-03-06 15:48         ` Seth Forshee
2016-03-06 22:07           ` Eric W. Biederman
2016-03-07 13:32             ` Seth Forshee
2016-03-28 16:59         ` Seth Forshee
2016-03-30  1:36           ` Eric W. Biederman
2016-03-30 14:58             ` Seth Forshee
2016-03-30 20:18               ` Eric W. Biederman
2016-01-04 18:03   ` [PATCH RESEND v2 12/18] fs: Don't remove suid for CAP_FSETID in s_user_ns Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-01-04 18:03   ` [PATCH RESEND v2 13/18] fs: Allow superblock owner to access do_remount_sb() Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-01-04 18:03   ` [PATCH RESEND v2 14/18] capabilities: Allow privileged user in s_user_ns to set security.* xattrs Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-01-04 18:03   ` [PATCH RESEND v2 15/18] fuse: Add support for pid namespaces Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-03-09 10:53     ` Miklos Szeredi
2016-03-09 14:17       ` Seth Forshee
2016-01-04 18:03   ` [PATCH RESEND v2 16/18] fuse: Support fuse filesystems outside of init_user_ns Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-03-09 11:29     ` Miklos Szeredi
2016-03-09 14:18       ` Seth Forshee
2016-03-09 14:48         ` Miklos Szeredi
2016-03-09 14:48           ` Miklos Szeredi
     [not found]           ` <CAJfpegv5JmB15yHpjYxVeOYdWWkoLMftr9-e_iS93Y_7m=t4Zw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-09 15:25             ` Seth Forshee
2016-03-09 15:25               ` Seth Forshee
2016-03-09 15:51               ` Miklos Szeredi
     [not found]                 ` <CAJfpegv5KR_Hi-79a8oyb+R+tv9W3RYqy5pngUKSyauVNk2ScQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-09 17:07                   ` Seth Forshee
2016-03-09 17:07                     ` Seth Forshee
2016-03-14 20:58                     ` Miklos Szeredi
2016-03-25 20:31                       ` Seth Forshee
2016-01-04 18:03   ` [PATCH RESEND v2 17/18] fuse: Restrict allow_other to the superblock's namespace or a descendant Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-03-09 11:40     ` Miklos Szeredi
2016-01-04 18:03   ` [PATCH RESEND v2 18/18] fuse: Allow user namespace mounts Seth Forshee
2016-01-04 18:03     ` Seth Forshee
2016-03-09 13:08     ` Miklos Szeredi
2016-01-25 19:47 ` [PATCH RESEND v2 00/19] Support fuse mounts in user namespaces Seth Forshee
2016-01-25 20:01   ` Eric W. Biederman
2016-01-25 20:01     ` Eric W. Biederman
2016-01-25 20:36     ` Seth Forshee
