linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/21] Support fuse mounts in user namespaces
@ 2016-04-26 19:30 Seth Forshee
       [not found] ` <1461699046-30485-1-git-send-email-seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
       [not found] ` <1461699046-30485-4-git-send-email-seth.forshee@canonical.com>
  0 siblings, 2 replies; 17+ messages in thread
From: Seth Forshee @ 2016-04-26 19:30 UTC (permalink / raw)
  To: Eric W. Biederman, linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	selinux-+05T5uksL2qpZYMLLGbcSA
  Cc: Serge Hallyn, Miklos Szeredi, Richard Weinberger,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Seth Forshee,
	Austin S Hemmelgarn, Alexander Viro, Pavel Tikhomirov

Hi Eric,

Here's another update to my patches for mouning with fuse from
unpivileged user namespaces. The main change here is a fix for a build
failure when fuse is built as a module. As usual the series is also
available at:

 git://git.kernel.org/pub/scm/linux/kernel/git/sforshee/linux.git fuse-userns

Changes since v3:

 * Export current_in_userns() to fix an error when fuse is built as a
   module.
 * Add comment explaining the conditions for allowing CAP_CHOWN in
   s_user_ns to change the owner or group of an inode.
 * Added acks from Serge.

Thanks,
Seth

---

Andy Lutomirski (1):
  fs: Treat foreign mounts as nosuid

Pavel Tikhomirov (1):
  fs: fix a posible leak of allocated superblock

Seth Forshee (19):
  fs: Remove check of s_user_ns for existing mounts in
    fs_fully_visible()
  fs: Allow sysfs and cgroupfs to share super blocks between user
    namespaces
  block_dev: Support checking inode permissions in lookup_bdev()
  block_dev: Check permissions towards block device inode when mounting
  selinux: Add support for unprivileged mounts from user namespaces
  userns: Replace in_userns with current_in_userns
  Smack: Handle labels consistently in untrusted mounts
  fs: Check for invalid i_uid in may_follow_link()
  cred: Reject inodes with invalid ids in set_create_file_as()
  fs: Refuse uid/gid changes which don't map into s_user_ns
  fs: Update posix_acl support to handle user namespace mounts
  fs: Allow superblock owner to change ownership of inodes with
    unmappable ids
  fs: Don't remove suid for CAP_FSETID in s_user_ns
  fs: Allow superblock owner to access do_remount_sb()
  capabilities: Allow privileged user in s_user_ns to set security.*
    xattrs
  fuse: Add support for pid namespaces
  fuse: Support fuse filesystems outside of init_user_ns
  fuse: Restrict allow_other to the superblock's namespace or a
    descendant
  fuse: Allow user namespace mounts

 drivers/md/bcache/super.c       |  2 +-
 drivers/md/dm-table.c           |  2 +-
 drivers/mtd/mtdsuper.c          |  2 +-
 fs/attr.c                       | 73 ++++++++++++++++++++++++++++++++++++-----
 fs/block_dev.c                  | 18 ++++++++--
 fs/exec.c                       |  2 +-
 fs/fuse/cuse.c                  |  3 +-
 fs/fuse/dev.c                   | 26 +++++++++++----
 fs/fuse/dir.c                   | 16 ++++-----
 fs/fuse/file.c                  | 22 ++++++++++---
 fs/fuse/fuse_i.h                | 10 +++++-
 fs/fuse/inode.c                 | 40 ++++++++++++++--------
 fs/inode.c                      |  3 +-
 fs/kernfs/inode.c               |  2 ++
 fs/namei.c                      |  2 +-
 fs/namespace.c                  | 20 ++++++++---
 fs/posix_acl.c                  | 67 +++++++++++++++++++++++--------------
 fs/proc/base.c                  |  2 ++
 fs/proc/generic.c               |  3 ++
 fs/proc/proc_sysctl.c           |  2 ++
 fs/quota/quota.c                |  2 +-
 fs/super.c                      |  7 +++-
 fs/sysfs/mount.c                |  3 +-
 fs/xattr.c                      | 19 ++++++++---
 include/linux/fs.h              |  3 +-
 include/linux/mount.h           |  1 +
 include/linux/posix_acl_xattr.h | 17 +++++++---
 include/linux/uidgid.h          | 10 ++++++
 include/linux/user_namespace.h  |  6 ++--
 kernel/cgroup.c                 |  4 +--
 kernel/cred.c                   |  2 ++
 kernel/user_namespace.c         |  7 ++--
 security/commoncap.c            | 22 +++++++++----
 security/selinux/hooks.c        | 25 +++++++++++++-
 security/smack/smack_lsm.c      | 29 ++++++++++------
 35 files changed, 355 insertions(+), 119 deletions(-)

_______________________________________________
Selinux mailing list
Selinux-+05T5uksL2qpZYMLLGbcSA@public.gmane.org
To unsubscribe, send email to Selinux-leave-+05T5uksL2pAGbPMOrvdOA@public.gmane.org
To get help, send an email containing "help" to Selinux-request-+05T5uksL2pAGbPMOrvdOA@public.gmane.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 02/21] fs: Remove check of s_user_ns for existing mounts in fs_fully_visible()
       [not found] ` <1461699046-30485-1-git-send-email-seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
@ 2016-04-26 19:30   ` Seth Forshee
  2016-04-26 19:30   ` [PATCH v4 04/21] block_dev: Support checking inode permissions in lookup_bdev() Seth Forshee
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Seth Forshee @ 2016-04-26 19:30 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA, Serge Hallyn, Seth Forshee,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, Miklos Szeredi,
	Richard Weinberger, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Austin S Hemmelgarn,
	linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Pavel Tikhomirov

fs_fully_visible() ignores MNT_LOCK_NODEV when FS_USERS_DEV_MOUNT
is not set for the filesystem, but there is a bug in the logic
that may cause mounting to fail. It is doing this only when the
existing mount is not in init_user_ns but should check the new
mount instead. But the new mount is always in a non-init
namespace when fs_fully_visible() is called, so that condition
can simply be removed.

Signed-off-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 fs/namespace.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index f20c82f91ecb..c133318bec35 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3234,8 +3234,7 @@ static bool fs_fully_visible(struct file_system_type *type, int *new_mnt_flags)
 		mnt_flags = mnt->mnt.mnt_flags;
 		if (mnt->mnt.mnt_sb->s_iflags & SB_I_NOEXEC)
 			mnt_flags &= ~(MNT_LOCK_NOSUID | MNT_LOCK_NOEXEC);
-		if (mnt->mnt.mnt_sb->s_user_ns != &init_user_ns &&
-		    !(mnt->mnt.mnt_sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT))
+		if (!(mnt->mnt.mnt_sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT))
 			mnt_flags &= ~(MNT_LOCK_NODEV);
 
 		/* Verify the mount flags are equal to or more permissive
-- 
2.7.4

_______________________________________________
Selinux mailing list
Selinux-+05T5uksL2qpZYMLLGbcSA@public.gmane.org
To unsubscribe, send email to Selinux-leave-+05T5uksL2pAGbPMOrvdOA@public.gmane.org
To get help, send an email containing "help" to Selinux-request-+05T5uksL2pAGbPMOrvdOA@public.gmane.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 04/21] block_dev: Support checking inode permissions in lookup_bdev()
       [not found] ` <1461699046-30485-1-git-send-email-seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
  2016-04-26 19:30   ` [PATCH v4 02/21] fs: Remove check of s_user_ns for existing mounts in fs_fully_visible() Seth Forshee
@ 2016-04-26 19:30   ` Seth Forshee
  2016-04-26 19:30   ` [PATCH v4 08/21] userns: Replace in_userns with current_in_userns Seth Forshee
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Seth Forshee @ 2016-04-26 19:30 UTC (permalink / raw)
  To: Eric W. Biederman, Kent Overstreet, Alasdair Kergon,
	Mike Snitzer, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, Shaohua Li,
	David Woodhouse, Brian Norris, Alexander Viro, Jan Kara,
	Jeff Layton, J. Bruce Fields
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA, Serge Hallyn, Seth Forshee,
	Miklos Szeredi, Richard Weinberger,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Austin S Hemmelgarn,
	linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Pavel Tikhomirov

When looking up a block device by path no permission check is
done to verify that the user has access to the block device inode
at the specified path. In some cases it may be necessary to
check permissions towards the inode, such as allowing
unprivileged users to mount block devices in user namespaces.

Add an argument to lookup_bdev() to optionally perform this
permission check. A value of 0 skips the permission check and
behaves the same as before. A non-zero value specifies the mask
of access rights required towards the inode at the specified
path. The check is always skipped if the user has CAP_SYS_ADMIN.

All callers of lookup_bdev() currently pass a mask of 0, so this
patch results in no functional change. Subsequent patches will
add permission checks where appropriate.

Signed-off-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 drivers/md/bcache/super.c |  2 +-
 drivers/md/dm-table.c     |  2 +-
 drivers/mtd/mtdsuper.c    |  2 +-
 fs/block_dev.c            | 13 ++++++++++---
 fs/quota/quota.c          |  2 +-
 include/linux/fs.h        |  2 +-
 6 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index a296425a7270..e169739a0253 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1950,7 +1950,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 				  sb);
 	if (IS_ERR(bdev)) {
 		if (bdev == ERR_PTR(-EBUSY)) {
-			bdev = lookup_bdev(strim(path));
+			bdev = lookup_bdev(strim(path), 0);
 			mutex_lock(&bch_register_lock);
 			if (!IS_ERR(bdev) && bch_is_open(bdev))
 				err = "device already registered";
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index f9e8f0bef332..13f568d527b5 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -372,7 +372,7 @@ dev_t dm_get_dev_t(const char *path)
 	dev_t uninitialized_var(dev);
 	struct block_device *bdev;
 
-	bdev = lookup_bdev(path);
+	bdev = lookup_bdev(path, 0);
 	if (IS_ERR(bdev))
 		dev = name_to_dev_t(path);
 	else {
diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
index 20c02a3b7417..b5b60e1af31c 100644
--- a/drivers/mtd/mtdsuper.c
+++ b/drivers/mtd/mtdsuper.c
@@ -176,7 +176,7 @@ struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
 	/* try the old way - the hack where we allowed users to mount
 	 * /dev/mtdblock$(n) but didn't actually _use_ the blockdev
 	 */
-	bdev = lookup_bdev(dev_name);
+	bdev = lookup_bdev(dev_name, 0);
 	if (IS_ERR(bdev)) {
 		ret = PTR_ERR(bdev);
 		pr_debug("MTDSB: lookup_bdev() returned %d\n", ret);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 3e84d62d0a25..e9b937845bdb 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1431,7 +1431,7 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
 	struct block_device *bdev;
 	int err;
 
-	bdev = lookup_bdev(path);
+	bdev = lookup_bdev(path, 0);
 	if (IS_ERR(bdev))
 		return bdev;
 
@@ -1821,12 +1821,14 @@ EXPORT_SYMBOL(ioctl_by_bdev);
 /**
  * lookup_bdev  - lookup a struct block_device by name
  * @pathname:	special file representing the block device
+ * @mask:	rights to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
  *
  * Get a reference to the blockdevice at @pathname in the current
  * namespace if possible and return it.  Return ERR_PTR(error)
- * otherwise.
+ * otherwise.  If @mask is non-zero, check for access rights to the
+ * inode at @pathname.
  */
-struct block_device *lookup_bdev(const char *pathname)
+struct block_device *lookup_bdev(const char *pathname, int mask)
 {
 	struct block_device *bdev;
 	struct inode *inode;
@@ -1841,6 +1843,11 @@ struct block_device *lookup_bdev(const char *pathname)
 		return ERR_PTR(error);
 
 	inode = d_backing_inode(path.dentry);
+	if (mask != 0 && !capable(CAP_SYS_ADMIN)) {
+		error = __inode_permission(inode, mask);
+		if (error)
+			goto fail;
+	}
 	error = -ENOTBLK;
 	if (!S_ISBLK(inode->i_mode))
 		goto fail;
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 0f10ee9892ce..59223384b22e 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -799,7 +799,7 @@ static struct super_block *quotactl_block(const char __user *special, int cmd)
 
 	if (IS_ERR(tmp))
 		return ERR_CAST(tmp);
-	bdev = lookup_bdev(tmp->name);
+	bdev = lookup_bdev(tmp->name, 0);
 	putname(tmp);
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 66a639ec1bc4..173b8adc6131 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2438,7 +2438,7 @@ static inline void unregister_chrdev(unsigned int major, const char *name)
 #define BLKDEV_MAJOR_HASH_SIZE	255
 extern const char *__bdevname(dev_t, char *buffer);
 extern const char *bdevname(struct block_device *bdev, char *buffer);
-extern struct block_device *lookup_bdev(const char *);
+extern struct block_device *lookup_bdev(const char *, int mask);
 extern void blkdev_show(struct seq_file *,off_t);
 
 #else
-- 
2.7.4

_______________________________________________
Selinux mailing list
Selinux-+05T5uksL2qpZYMLLGbcSA@public.gmane.org
To unsubscribe, send email to Selinux-leave-+05T5uksL2pAGbPMOrvdOA@public.gmane.org
To get help, send an email containing "help" to Selinux-request-+05T5uksL2pAGbPMOrvdOA@public.gmane.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 08/21] userns: Replace in_userns with current_in_userns
       [not found] ` <1461699046-30485-1-git-send-email-seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
  2016-04-26 19:30   ` [PATCH v4 02/21] fs: Remove check of s_user_ns for existing mounts in fs_fully_visible() Seth Forshee
  2016-04-26 19:30   ` [PATCH v4 04/21] block_dev: Support checking inode permissions in lookup_bdev() Seth Forshee
@ 2016-04-26 19:30   ` Seth Forshee
  2016-04-26 19:30   ` [PATCH v4 10/21] fs: Check for invalid i_uid in may_follow_link() Seth Forshee
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Seth Forshee @ 2016-04-26 19:30 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Serge Hallyn, James Morris,
	Serge E. Hallyn
  Cc: Miklos Szeredi, Seth Forshee, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Richard Weinberger,
	linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Austin S Hemmelgarn,
	linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Pavel Tikhomirov

All current callers of in_userns pass current_user_ns as the
first argument. Simplify by replacing in_userns with
current_in_userns which checks whether current_user_ns is in the
namespace supplied as an argument.

Signed-off-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Acked-by: James Morris <james.l.morris-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 fs/namespace.c                 | 2 +-
 include/linux/user_namespace.h | 6 ++----
 kernel/user_namespace.c        | 6 +++---
 security/commoncap.c           | 2 +-
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 6e9db4c166b4..0ad8e4a4f50b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3294,7 +3294,7 @@ bool mnt_may_suid(struct vfsmount *mnt)
 	 * in other namespaces.
 	 */
 	return !(mnt->mnt_flags & MNT_NOSUID) && check_mnt(real_mount(mnt)) &&
-	       in_userns(current_user_ns(), mnt->mnt_sb->s_user_ns);
+	       current_in_userns(mnt->mnt_sb->s_user_ns);
 }
 
 static struct ns_common *mntns_get(struct task_struct *task)
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index a43faa727124..9217169c64cb 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -72,8 +72,7 @@ extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t,
 extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
 extern int proc_setgroups_show(struct seq_file *m, void *v);
 extern bool userns_may_setgroups(const struct user_namespace *ns);
-extern bool in_userns(const struct user_namespace *ns,
-		      const struct user_namespace *target_ns);
+extern bool current_in_userns(const struct user_namespace *target_ns);
 #else
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -103,8 +102,7 @@ static inline bool userns_may_setgroups(const struct user_namespace *ns)
 	return true;
 }
 
-static inline bool in_userns(const struct user_namespace *ns,
-			     const struct user_namespace *target_ns)
+static inline bool current_in_userns(const struct user_namespace *target_ns)
 {
 	return true;
 }
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index dee3be5445da..68f594212759 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -942,10 +942,10 @@ bool userns_may_setgroups(const struct user_namespace *ns)
  * Returns true if @ns is the same namespace as or a descendant of
  * @target_ns.
  */
-bool in_userns(const struct user_namespace *ns,
-	       const struct user_namespace *target_ns)
+bool current_in_userns(const struct user_namespace *target_ns)
 {
-	for (; ns; ns = ns->parent) {
+	struct user_namespace *ns;
+	for (ns = current_user_ns(); ns; ns = ns->parent) {
 		if (ns == target_ns)
 			return true;
 	}
diff --git a/security/commoncap.c b/security/commoncap.c
index a306c5d90709..e657227d221e 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -461,7 +461,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
 	 * explicit that capability bits are limited to s_user_ns and its
 	 * descendants.
 	 */
-	if (!in_userns(current_user_ns(), bprm->file->f_path.mnt->mnt_sb->s_user_ns))
+	if (!current_in_userns(bprm->file->f_path.mnt->mnt_sb->s_user_ns))
 		return 0;
 
 	rc = get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
-- 
2.7.4

_______________________________________________
Selinux mailing list
Selinux-+05T5uksL2qpZYMLLGbcSA@public.gmane.org
To unsubscribe, send email to Selinux-leave-+05T5uksL2pAGbPMOrvdOA@public.gmane.org
To get help, send an email containing "help" to Selinux-request-+05T5uksL2pAGbPMOrvdOA@public.gmane.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 10/21] fs: Check for invalid i_uid in may_follow_link()
       [not found] ` <1461699046-30485-1-git-send-email-seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
                     ` (2 preceding siblings ...)
  2016-04-26 19:30   ` [PATCH v4 08/21] userns: Replace in_userns with current_in_userns Seth Forshee
@ 2016-04-26 19:30   ` Seth Forshee
  2016-04-26 19:30   ` [PATCH v4 11/21] cred: Reject inodes with invalid ids in set_create_file_as() Seth Forshee
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Seth Forshee @ 2016-04-26 19:30 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA, Serge Hallyn, Seth Forshee,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, Miklos Szeredi,
	Richard Weinberger, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Austin S Hemmelgarn,
	linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Pavel Tikhomirov

Filesystem uids which don't map into a user namespace may result
in inode->i_uid being INVALID_UID. A symlink and its parent
could have different owners in the filesystem can both get
mapped to INVALID_UID, which may result in following a symlink
when this would not have otherwise been permitted when protected
symlinks are enabled.

Add a new helper function, uid_valid_eq(), and use this to
validate that the ids in may_follow_link() are both equal and
valid. Also add an equivalent helper for gids, which is
currently unused.

Signed-off-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 fs/namei.c             |  2 +-
 include/linux/uidgid.h | 10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index a29094c6f4a1..6fe8b0d8ca90 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -915,7 +915,7 @@ static inline int may_follow_link(struct nameidata *nd)
 		return 0;
 
 	/* Allowed if parent directory and link owner match. */
-	if (uid_eq(parent->i_uid, inode->i_uid))
+	if (uid_valid_eq(parent->i_uid, inode->i_uid))
 		return 0;
 
 	if (nd->flags & LOOKUP_RCU)
diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h
index 03835522dfcb..e09529fe2668 100644
--- a/include/linux/uidgid.h
+++ b/include/linux/uidgid.h
@@ -117,6 +117,16 @@ static inline bool gid_valid(kgid_t gid)
 	return __kgid_val(gid) != (gid_t) -1;
 }
 
+static inline bool uid_valid_eq(kuid_t left, kuid_t right)
+{
+	return uid_eq(left, right) && uid_valid(left);
+}
+
+static inline bool gid_valid_eq(kgid_t left, kgid_t right)
+{
+	return gid_eq(left, right) && gid_valid(left);
+}
+
 #ifdef CONFIG_USER_NS
 
 extern kuid_t make_kuid(struct user_namespace *from, uid_t uid);
-- 
2.7.4

_______________________________________________
Selinux mailing list
Selinux-+05T5uksL2qpZYMLLGbcSA@public.gmane.org
To unsubscribe, send email to Selinux-leave-+05T5uksL2pAGbPMOrvdOA@public.gmane.org
To get help, send an email containing "help" to Selinux-request-+05T5uksL2pAGbPMOrvdOA@public.gmane.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 11/21] cred: Reject inodes with invalid ids in set_create_file_as()
       [not found] ` <1461699046-30485-1-git-send-email-seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
                     ` (3 preceding siblings ...)
  2016-04-26 19:30   ` [PATCH v4 10/21] fs: Check for invalid i_uid in may_follow_link() Seth Forshee
@ 2016-04-26 19:30   ` Seth Forshee
  2016-04-26 19:30   ` [PATCH v4 13/21] fs: Update posix_acl support to handle user namespace mounts Seth Forshee
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Seth Forshee @ 2016-04-26 19:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA, Serge Hallyn, Seth Forshee,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, Miklos Szeredi,
	Richard Weinberger, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Austin S Hemmelgarn,
	linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Alexander Viro,
	selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Pavel Tikhomirov

Using INVALID_[UG]ID for the LSM file creation context doesn't
make sense, so return an error if the inode passed to
set_create_file_as() has an invalid id.

Signed-off-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 kernel/cred.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/cred.c b/kernel/cred.c
index 0c0cd8a62285..5f264fb5737d 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -689,6 +689,8 @@ EXPORT_SYMBOL(set_security_override_from_ctx);
  */
 int set_create_files_as(struct cred *new, struct inode *inode)
 {
+	if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid))
+		return -EINVAL;
 	new->fsuid = inode->i_uid;
 	new->fsgid = inode->i_gid;
 	return security_kernel_create_files_as(new, inode);
-- 
2.7.4

_______________________________________________
Selinux mailing list
Selinux-+05T5uksL2qpZYMLLGbcSA@public.gmane.org
To unsubscribe, send email to Selinux-leave-+05T5uksL2pAGbPMOrvdOA@public.gmane.org
To get help, send an email containing "help" to Selinux-request-+05T5uksL2pAGbPMOrvdOA@public.gmane.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 13/21] fs: Update posix_acl support to handle user namespace mounts
       [not found] ` <1461699046-30485-1-git-send-email-seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
                     ` (4 preceding siblings ...)
  2016-04-26 19:30   ` [PATCH v4 11/21] cred: Reject inodes with invalid ids in set_create_file_as() Seth Forshee
@ 2016-04-26 19:30   ` Seth Forshee
  2016-04-26 19:30   ` [PATCH v4 14/21] fs: Allow superblock owner to change ownership of inodes with unmappable ids Seth Forshee
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Seth Forshee @ 2016-04-26 19:30 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA, Serge Hallyn, Seth Forshee,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, Miklos Szeredi,
	Richard Weinberger, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Austin S Hemmelgarn,
	linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Pavel Tikhomirov

ids in on-disk ACLs should be converted to s_user_ns instead of
init_user_ns as is done now. This introduces the possibility for
id mappings to fail, and when this happens syscalls will return
EOVERFLOW.

Signed-off-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 fs/posix_acl.c                  | 67 ++++++++++++++++++++++++++---------------
 fs/xattr.c                      | 19 +++++++++---
 include/linux/posix_acl_xattr.h | 17 ++++++++---
 3 files changed, 70 insertions(+), 33 deletions(-)

diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index 711dd5170376..dac2842dd4cb 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -595,59 +595,77 @@ EXPORT_SYMBOL_GPL(posix_acl_create);
 /*
  * Fix up the uids and gids in posix acl extended attributes in place.
  */
-static void posix_acl_fix_xattr_userns(
+static int posix_acl_fix_xattr_userns(
 	struct user_namespace *to, struct user_namespace *from,
 	void *value, size_t size)
 {
 	posix_acl_xattr_header *header = (posix_acl_xattr_header *)value;
 	posix_acl_xattr_entry *entry = (posix_acl_xattr_entry *)(header+1), *end;
 	int count;
-	kuid_t uid;
-	kgid_t gid;
+	kuid_t kuid;
+	uid_t uid;
+	kgid_t kgid;
+	gid_t gid;
 
 	if (!value)
-		return;
+		return 0;
 	if (size < sizeof(posix_acl_xattr_header))
-		return;
+		return 0;
 	if (header->a_version != cpu_to_le32(POSIX_ACL_XATTR_VERSION))
-		return;
+		return 0;
 
 	count = posix_acl_xattr_count(size);
 	if (count < 0)
-		return;
+		return 0;
 	if (count == 0)
-		return;
+		return 0;
 
 	for (end = entry + count; entry != end; entry++) {
 		switch(le16_to_cpu(entry->e_tag)) {
 		case ACL_USER:
-			uid = make_kuid(from, le32_to_cpu(entry->e_id));
-			entry->e_id = cpu_to_le32(from_kuid(to, uid));
+			kuid = make_kuid(from, le32_to_cpu(entry->e_id));
+			if (!uid_valid(kuid))
+				return -EOVERFLOW;
+			uid = from_kuid(to, kuid);
+			if (uid == (uid_t)-1)
+				return -EOVERFLOW;
+			entry->e_id = cpu_to_le32(uid);
 			break;
 		case ACL_GROUP:
-			gid = make_kgid(from, le32_to_cpu(entry->e_id));
-			entry->e_id = cpu_to_le32(from_kgid(to, gid));
+			kgid = make_kgid(from, le32_to_cpu(entry->e_id));
+			if (!gid_valid(kgid))
+				return -EOVERFLOW;
+			gid = from_kgid(to, kgid);
+			if (gid == (gid_t)-1)
+				return -EOVERFLOW;
+			entry->e_id = cpu_to_le32(gid);
 			break;
 		default:
 			break;
 		}
 	}
+
+	return 0;
 }
 
-void posix_acl_fix_xattr_from_user(void *value, size_t size)
+int
+posix_acl_fix_xattr_from_user(struct user_namespace *target_ns, void *value,
+			      size_t size)
 {
-	struct user_namespace *user_ns = current_user_ns();
-	if (user_ns == &init_user_ns)
-		return;
-	posix_acl_fix_xattr_userns(&init_user_ns, user_ns, value, size);
+	struct user_namespace *source_ns = current_user_ns();
+	if (source_ns == target_ns)
+		return 0;
+	return posix_acl_fix_xattr_userns(target_ns, source_ns, value, size);
 }
 
-void posix_acl_fix_xattr_to_user(void *value, size_t size)
+int
+posix_acl_fix_xattr_to_user(struct user_namespace *source_ns, void *value,
+			    size_t size)
 {
-	struct user_namespace *user_ns = current_user_ns();
-	if (user_ns == &init_user_ns)
-		return;
-	posix_acl_fix_xattr_userns(user_ns, &init_user_ns, value, size);
+	struct user_namespace *target_ns = current_user_ns();
+	if (target_ns == source_ns)
+		return 0;
+	return posix_acl_fix_xattr_userns(target_ns, source_ns, value, size);
 }
 
 /*
@@ -780,7 +798,7 @@ posix_acl_xattr_get(const struct xattr_handler *handler,
 	if (acl == NULL)
 		return -ENODATA;
 
-	error = posix_acl_to_xattr(&init_user_ns, acl, value, size);
+	error = posix_acl_to_xattr(dentry->d_sb->s_user_ns, acl, value, size);
 	posix_acl_release(acl);
 
 	return error;
@@ -806,7 +824,8 @@ posix_acl_xattr_set(const struct xattr_handler *handler,
 		return -EPERM;
 
 	if (value) {
-		acl = posix_acl_from_xattr(&init_user_ns, value, size);
+		acl = posix_acl_from_xattr(dentry->d_sb->s_user_ns, value,
+					   size);
 		if (IS_ERR(acl))
 			return PTR_ERR(acl);
 
diff --git a/fs/xattr.c b/fs/xattr.c
index 4861322e28e8..c541121945cd 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -330,8 +330,12 @@ setxattr(struct dentry *d, const char __user *name, const void __user *value,
 			goto out;
 		}
 		if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
-		    (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
-			posix_acl_fix_xattr_from_user(kvalue, size);
+		    (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0)) {
+			error = posix_acl_fix_xattr_from_user(d->d_sb->s_user_ns,
+							      kvalue, size);
+			if (error)
+				goto out;
+		}
 	}
 
 	error = vfs_setxattr(d, kname, kvalue, size, flags);
@@ -427,9 +431,14 @@ getxattr(struct dentry *d, const char __user *name, void __user *value,
 	error = vfs_getxattr(d, kname, kvalue, size);
 	if (error > 0) {
 		if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
-		    (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
-			posix_acl_fix_xattr_to_user(kvalue, size);
-		if (size && copy_to_user(value, kvalue, error))
+		    (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0)) {
+			int ret;
+			ret = posix_acl_fix_xattr_to_user(d->d_sb->s_user_ns,
+							  kvalue, size);
+			if (ret)
+				error = ret;
+		}
+		if (error > 0 && size && copy_to_user(value, kvalue, error))
 			error = -EFAULT;
 	} else if (error == -ERANGE && size >= XATTR_SIZE_MAX) {
 		/* The file system tried to returned a value bigger
diff --git a/include/linux/posix_acl_xattr.h b/include/linux/posix_acl_xattr.h
index e5e8ec40278d..5dec6b10951a 100644
--- a/include/linux/posix_acl_xattr.h
+++ b/include/linux/posix_acl_xattr.h
@@ -49,14 +49,23 @@ posix_acl_xattr_count(size_t size)
 }
 
 #ifdef CONFIG_FS_POSIX_ACL
-void posix_acl_fix_xattr_from_user(void *value, size_t size);
-void posix_acl_fix_xattr_to_user(void *value, size_t size);
+int posix_acl_fix_xattr_from_user(struct user_namespace *target_ns,
+				  void *value, size_t size);
+int posix_acl_fix_xattr_to_user(struct user_namespace *source_ns, void *value,
+				size_t size);
 #else
-static inline void posix_acl_fix_xattr_from_user(void *value, size_t size)
+static inline int
+posix_acl_fix_xattr_from_user(struct user_namespace *target_ns, void *value,
+			      size_t size)
 {
+	return 0;
 }
-static inline void posix_acl_fix_xattr_to_user(void *value, size_t size)
+
+static inline int
+posix_acl_fix_xattr_to_user(struct user_namespace *source_ns, void *value,
+			    size_t size)
 {
+	return 0;
 }
 #endif
 
-- 
2.7.4

_______________________________________________
Selinux mailing list
Selinux-+05T5uksL2qpZYMLLGbcSA@public.gmane.org
To unsubscribe, send email to Selinux-leave-+05T5uksL2pAGbPMOrvdOA@public.gmane.org
To get help, send an email containing "help" to Selinux-request-+05T5uksL2pAGbPMOrvdOA@public.gmane.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 14/21] fs: Allow superblock owner to change ownership of inodes with unmappable ids
       [not found] ` <1461699046-30485-1-git-send-email-seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
                     ` (5 preceding siblings ...)
  2016-04-26 19:30   ` [PATCH v4 13/21] fs: Update posix_acl support to handle user namespace mounts Seth Forshee
@ 2016-04-26 19:30   ` Seth Forshee
  2016-04-26 19:30   ` [PATCH v4 16/21] fs: Allow superblock owner to access do_remount_sb() Seth Forshee
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Seth Forshee @ 2016-04-26 19:30 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Greg Kroah-Hartman
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA, Serge Hallyn, Seth Forshee,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, Miklos Szeredi,
	Richard Weinberger, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Austin S Hemmelgarn,
	linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Pavel Tikhomirov

In a userns mount some on-disk inodes may have ids which do not
map into s_user_ns, in which case the in-kernel inodes are owned
by invalid users. The superblock owner should be able to change
attributes of these inodes but cannot. However it is unsafe to
grant the superblock owner privileged access to all inodes in the
superblock since proc, sysfs, etc. use DAC to protect files which
may not belong to s_user_ns. The problem is restricted to only
inodes where the owner or group is an invalid user.

We can work around this by allowing users with CAP_CHOWN in
s_user_ns to change an invalid owner or group id, so long as the
other id is either invalid or mappable in s_user_ns. After
changing ownership the user will be privileged towards the inode
and thus able to change other attributes.

As an precaution, checks for invalid ids are added to the proc
and kernfs setattr interfaces. These filesystems are not expected
to have inodes with invalid ids, but if it does happen any
setattr operations will return -EPERM.

Signed-off-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 fs/attr.c             | 62 ++++++++++++++++++++++++++++++++++++++++++++-------
 fs/kernfs/inode.c     |  2 ++
 fs/proc/base.c        |  2 ++
 fs/proc/generic.c     |  3 +++
 fs/proc/proc_sysctl.c |  2 ++
 5 files changed, 63 insertions(+), 8 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index 3cfaaac4a18e..06bb3f401559 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -16,6 +16,58 @@
 #include <linux/evm.h>
 #include <linux/ima.h>
 
+static bool chown_ok(const struct inode *inode, kuid_t uid)
+{
+	struct user_namespace *user_ns;
+
+	if (uid_eq(current_fsuid(), inode->i_uid) && uid_eq(uid, inode->i_uid))
+		return true;
+	if (capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+		return true;
+
+	/*
+	 * Inode uids/gids are of type kuid_t/kgid_t. As such, they can be
+	 * a) INVALID_UID/INVALID_GID, b) valid and mappable into
+	 * i_sb->s_user_ns, or c) valid but not mappable into
+	 * i_sb->s_user_ns.
+	 *
+	 * For filesystems on user-supplied media ids will either be (a) or
+	 * (b), so we permit CAP_CHOWN in s_user_ns to change INVALID_UID if
+	 * the gid meets these conditions (and vice versa for INVALID_GID).
+	 *
+	 * For psuedo-filesystems like proc or sysfs ids will be either (b)
+	 * or (c), so these conditions do not permit namespace-root to chown
+	 * in those filesystems.
+	 */
+	user_ns = inode->i_sb->s_user_ns;
+	if (!uid_valid(inode->i_uid) &&
+	    (!gid_valid(inode->i_gid) || kgid_has_mapping(user_ns, inode->i_gid)) &&
+	    ns_capable(user_ns, CAP_CHOWN))
+		return true;
+
+	return false;
+}
+
+static bool chgrp_ok(const struct inode *inode, kgid_t gid)
+{
+	struct user_namespace *user_ns;
+
+	if (uid_eq(current_fsuid(), inode->i_uid) &&
+	    (in_group_p(gid) || gid_eq(gid, inode->i_gid)))
+		return true;
+	if (capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+		return true;
+
+	/* Logic here is the same as in chown_ok(); see comment there. */
+	user_ns = inode->i_sb->s_user_ns;
+	if (!gid_valid(inode->i_gid) &&
+	    (!uid_valid(inode->i_uid) || kuid_has_mapping(user_ns, inode->i_uid)) &&
+	    ns_capable(user_ns, CAP_CHOWN))
+		return true;
+
+	return false;
+}
+
 /**
  * inode_change_ok - check if attribute changes to an inode are allowed
  * @inode:	inode to check
@@ -58,17 +110,11 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)
 		return 0;
 
 	/* Make sure a caller can chown. */
-	if ((ia_valid & ATTR_UID) &&
-	    (!uid_eq(current_fsuid(), inode->i_uid) ||
-	     !uid_eq(attr->ia_uid, inode->i_uid)) &&
-	    !capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+	if ((ia_valid & ATTR_UID) && !chown_ok(inode, attr->ia_uid))
 		return -EPERM;
 
 	/* Make sure caller can chgrp. */
-	if ((ia_valid & ATTR_GID) &&
-	    (!uid_eq(current_fsuid(), inode->i_uid) ||
-	    (!in_group_p(attr->ia_gid) && !gid_eq(attr->ia_gid, inode->i_gid))) &&
-	    !capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+	if ((ia_valid & ATTR_GID) && !chgrp_ok(inode, attr->ia_gid))
 		return -EPERM;
 
 	/* Make sure a caller can chmod. */
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index 16405ae88d2d..2e97a337ee5f 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -117,6 +117,8 @@ int kernfs_iop_setattr(struct dentry *dentry, struct iattr *iattr)
 
 	if (!kn)
 		return -EINVAL;
+	if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid))
+		return -EPERM;
 
 	mutex_lock(&kernfs_mutex);
 	error = inode_change_ok(inode, iattr);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index b1755b23893e..648d623e2158 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -711,6 +711,8 @@ int proc_setattr(struct dentry *dentry, struct iattr *attr)
 
 	if (attr->ia_valid & ATTR_MODE)
 		return -EPERM;
+	if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid))
+		return -EPERM;
 
 	error = inode_change_ok(inode, attr);
 	if (error)
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index ff3ffc76a937..1461570c552c 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -105,6 +105,9 @@ static int proc_notify_change(struct dentry *dentry, struct iattr *iattr)
 	struct proc_dir_entry *de = PDE(inode);
 	int error;
 
+	if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid))
+		return -EPERM;
+
 	error = inode_change_ok(inode, iattr);
 	if (error)
 		return error;
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index fe5b6e6c4671..f5d575157194 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -752,6 +752,8 @@ static int proc_sys_setattr(struct dentry *dentry, struct iattr *attr)
 
 	if (attr->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID))
 		return -EPERM;
+	if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid))
+		return -EPERM;
 
 	error = inode_change_ok(inode, attr);
 	if (error)
-- 
2.7.4

_______________________________________________
Selinux mailing list
Selinux-+05T5uksL2qpZYMLLGbcSA@public.gmane.org
To unsubscribe, send email to Selinux-leave-+05T5uksL2pAGbPMOrvdOA@public.gmane.org
To get help, send an email containing "help" to Selinux-request-+05T5uksL2pAGbPMOrvdOA@public.gmane.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 16/21] fs: Allow superblock owner to access do_remount_sb()
       [not found] ` <1461699046-30485-1-git-send-email-seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
                     ` (6 preceding siblings ...)
  2016-04-26 19:30   ` [PATCH v4 14/21] fs: Allow superblock owner to change ownership of inodes with unmappable ids Seth Forshee
@ 2016-04-26 19:30   ` Seth Forshee
  2016-04-26 19:30   ` [PATCH v4 18/21] fuse: Add support for pid namespaces Seth Forshee
  2016-04-26 19:30   ` [PATCH v4 19/21] fuse: Support fuse filesystems outside of init_user_ns Seth Forshee
  9 siblings, 0 replies; 17+ messages in thread
From: Seth Forshee @ 2016-04-26 19:30 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA, Serge Hallyn, Seth Forshee,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, Miklos Szeredi,
	Richard Weinberger, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Austin S Hemmelgarn,
	linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Pavel Tikhomirov

Superblock level remounts are currently restricted to global
CAP_SYS_ADMIN, as is the path for changing the root mount to
read only on umount. Loosen both of these permission checks to
also allow CAP_SYS_ADMIN in any namespace which is privileged
towards the userns which originally mounted the filesystem.

Signed-off-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Acked-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 fs/namespace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 0ad8e4a4f50b..575e3f8b34fd 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1510,7 +1510,7 @@ static int do_umount(struct mount *mnt, int flags)
 		 * Special case for "unmounting" root ...
 		 * we just try to remount it readonly.
 		 */
-		if (!capable(CAP_SYS_ADMIN))
+		if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
 			return -EPERM;
 		down_write(&sb->s_umount);
 		if (!(sb->s_flags & MS_RDONLY))
@@ -2207,7 +2207,7 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	down_write(&sb->s_umount);
 	if (flags & MS_BIND)
 		err = change_mount_flags(path->mnt, flags);
-	else if (!capable(CAP_SYS_ADMIN))
+	else if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
 		err = -EPERM;
 	else
 		err = do_remount_sb(sb, flags, data, 0);
-- 
2.7.4

_______________________________________________
Selinux mailing list
Selinux-+05T5uksL2qpZYMLLGbcSA@public.gmane.org
To unsubscribe, send email to Selinux-leave-+05T5uksL2pAGbPMOrvdOA@public.gmane.org
To get help, send an email containing "help" to Selinux-request-+05T5uksL2pAGbPMOrvdOA@public.gmane.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 18/21] fuse: Add support for pid namespaces
       [not found] ` <1461699046-30485-1-git-send-email-seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
                     ` (7 preceding siblings ...)
  2016-04-26 19:30   ` [PATCH v4 16/21] fs: Allow superblock owner to access do_remount_sb() Seth Forshee
@ 2016-04-26 19:30   ` Seth Forshee
  2016-04-26 19:30   ` [PATCH v4 19/21] fuse: Support fuse filesystems outside of init_user_ns Seth Forshee
  9 siblings, 0 replies; 17+ messages in thread
From: Seth Forshee @ 2016-04-26 19:30 UTC (permalink / raw)
  To: Eric W. Biederman, Miklos Szeredi
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA, Serge Hallyn, Seth Forshee,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, Miklos Szeredi,
	Richard Weinberger, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Austin S Hemmelgarn,
	linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Alexander Viro,
	selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Pavel Tikhomirov

When the userspace process servicing fuse requests is running in
a pid namespace then pids passed via the fuse fd are not being
translated into that process' namespace. Translation is necessary
for the pid to be useful to that process.

Since no use case currently exists for changing namespaces all
translations can be done relative to the pid namespace in use
when fuse_conn_init() is called. For fuse this translates to
mount time, and for cuse this is when /dev/cuse is opened. IO for
this connection from another namespace will return errors.

Requests from processes whose pid cannot be translated into the
target namespace are not permitted, except for requests
allocated via fuse_get_req_nofail_nopages. For no-fail requests
in.h.pid will be 0 if the pid translation fails.

File locking changes based on previous work done by Eric
Biederman.

Signed-off-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Acked-by: Miklos Szeredi <mszeredi-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/fuse/dev.c    | 19 +++++++++++++++----
 fs/fuse/file.c   | 22 +++++++++++++++++-----
 fs/fuse/fuse_i.h |  4 ++++
 fs/fuse/inode.c  |  3 +++
 4 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index cbece1221417..4e91b2ac25a7 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -19,6 +19,7 @@
 #include <linux/pipe_fs_i.h>
 #include <linux/swap.h>
 #include <linux/splice.h>
+#include <linux/sched.h>
 
 MODULE_ALIAS_MISCDEV(FUSE_MINOR);
 MODULE_ALIAS("devname:fuse");
@@ -124,11 +125,11 @@ static void __fuse_put_request(struct fuse_req *req)
 	atomic_dec(&req->count);
 }
 
-static void fuse_req_init_context(struct fuse_req *req)
+static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
 {
 	req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
 	req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
-	req->in.h.pid = current->pid;
+	req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
 }
 
 void fuse_set_initialized(struct fuse_conn *fc)
@@ -181,10 +182,14 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
 		goto out;
 	}
 
-	fuse_req_init_context(req);
+	fuse_req_init_context(fc, req);
 	__set_bit(FR_WAITING, &req->flags);
 	if (for_background)
 		__set_bit(FR_BACKGROUND, &req->flags);
+	if (req->in.h.pid == 0) {
+		fuse_put_request(fc, req);
+		return ERR_PTR(-EOVERFLOW);
+	}
 
 	return req;
 
@@ -274,7 +279,7 @@ struct fuse_req *fuse_get_req_nofail_nopages(struct fuse_conn *fc,
 	if (!req)
 		req = get_reserved_req(fc, file);
 
-	fuse_req_init_context(req);
+	fuse_req_init_context(fc, req);
 	__set_bit(FR_WAITING, &req->flags);
 	__clear_bit(FR_BACKGROUND, &req->flags);
 	return req;
@@ -1243,6 +1248,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 	struct fuse_in *in;
 	unsigned reqsize;
 
+	if (task_active_pid_ns(current) != fc->pid_ns)
+		return -EIO;
+
  restart:
 	spin_lock(&fiq->waitq.lock);
 	err = -EAGAIN;
@@ -1872,6 +1880,9 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 	struct fuse_req *req;
 	struct fuse_out_header oh;
 
+	if (task_active_pid_ns(current) != fc->pid_ns)
+		return -EIO;
+
 	if (nbytes < sizeof(struct fuse_out_header))
 		return -EINVAL;
 
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 719924d6c706..b5c616c5ec98 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2067,7 +2067,8 @@ static int fuse_direct_mmap(struct file *file, struct vm_area_struct *vma)
 	return generic_file_mmap(file, vma);
 }
 
-static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
+static int convert_fuse_file_lock(struct fuse_conn *fc,
+				  const struct fuse_file_lock *ffl,
 				  struct file_lock *fl)
 {
 	switch (ffl->type) {
@@ -2082,7 +2083,14 @@ static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
 
 		fl->fl_start = ffl->start;
 		fl->fl_end = ffl->end;
-		fl->fl_pid = ffl->pid;
+
+		/*
+		 * Convert pid into the caller's pid namespace. If the pid
+		 * does not map into the namespace fl_pid will get set to 0.
+		 */
+		rcu_read_lock();
+		fl->fl_pid = pid_vnr(find_pid_ns(ffl->pid, fc->pid_ns));
+		rcu_read_unlock();
 		break;
 
 	default:
@@ -2131,7 +2139,7 @@ static int fuse_getlk(struct file *file, struct file_lock *fl)
 	args.out.args[0].value = &outarg;
 	err = fuse_simple_request(fc, &args);
 	if (!err)
-		err = convert_fuse_file_lock(&outarg.lk, fl);
+		err = convert_fuse_file_lock(fc, &outarg.lk, fl);
 
 	return err;
 }
@@ -2143,7 +2151,8 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock)
 	FUSE_ARGS(args);
 	struct fuse_lk_in inarg;
 	int opcode = (fl->fl_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK;
-	pid_t pid = fl->fl_type != F_UNLCK ? current->tgid : 0;
+	struct pid *pid = fl->fl_type != F_UNLCK ? task_tgid(current) : NULL;
+	pid_t pid_nr = pid_nr_ns(pid, fc->pid_ns);
 	int err;
 
 	if (fl->fl_lmops && fl->fl_lmops->lm_grant) {
@@ -2155,7 +2164,10 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock)
 	if (fl->fl_flags & FL_CLOSE)
 		return 0;
 
-	fuse_lk_fill(&args, file, fl, opcode, pid, flock, &inarg);
+	if (pid && pid_nr == 0)
+		return -EOVERFLOW;
+
+	fuse_lk_fill(&args, file, fl, opcode, pid_nr, flock, &inarg);
 	err = fuse_simple_request(fc, &args);
 
 	/* locking is restartable */
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index eddbe02c4028..9145445a759a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -23,6 +23,7 @@
 #include <linux/poll.h>
 #include <linux/workqueue.h>
 #include <linux/kref.h>
+#include <linux/pid_namespace.h>
 
 /** Max number of pages that can be used in a single read request */
 #define FUSE_MAX_PAGES_PER_REQ 32
@@ -465,6 +466,9 @@ struct fuse_conn {
 	/** The group id for this mount */
 	kgid_t group_id;
 
+	/** The pid namespace for this mount */
+	struct pid_namespace *pid_ns;
+
 	/** The fuse mount flags for this mount */
 	unsigned flags;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 1ce67668a8e1..eade0bfa4488 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -20,6 +20,7 @@
 #include <linux/random.h>
 #include <linux/sched.h>
 #include <linux/exportfs.h>
+#include <linux/pid_namespace.h>
 
 MODULE_AUTHOR("Miklos Szeredi <miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org>");
 MODULE_DESCRIPTION("Filesystem in Userspace");
@@ -609,6 +610,7 @@ void fuse_conn_init(struct fuse_conn *fc)
 	fc->connected = 1;
 	fc->attr_version = 1;
 	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
+	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
 }
 EXPORT_SYMBOL_GPL(fuse_conn_init);
 
@@ -617,6 +619,7 @@ void fuse_conn_put(struct fuse_conn *fc)
 	if (atomic_dec_and_test(&fc->count)) {
 		if (fc->destroy_req)
 			fuse_request_free(fc->destroy_req);
+		put_pid_ns(fc->pid_ns);
 		fc->release(fc);
 	}
 }
-- 
2.7.4

_______________________________________________
Selinux mailing list
Selinux-+05T5uksL2qpZYMLLGbcSA@public.gmane.org
To unsubscribe, send email to Selinux-leave-+05T5uksL2pAGbPMOrvdOA@public.gmane.org
To get help, send an email containing "help" to Selinux-request-+05T5uksL2pAGbPMOrvdOA@public.gmane.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 19/21] fuse: Support fuse filesystems outside of init_user_ns
       [not found] ` <1461699046-30485-1-git-send-email-seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
                     ` (8 preceding siblings ...)
  2016-04-26 19:30   ` [PATCH v4 18/21] fuse: Add support for pid namespaces Seth Forshee
@ 2016-04-26 19:30   ` Seth Forshee
  9 siblings, 0 replies; 17+ messages in thread
From: Seth Forshee @ 2016-04-26 19:30 UTC (permalink / raw)
  To: Eric W. Biederman, Miklos Szeredi
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA, Serge Hallyn, Seth Forshee,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, Miklos Szeredi,
	Richard Weinberger, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Austin S Hemmelgarn,
	linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Alexander Viro,
	selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Pavel Tikhomirov

In order to support mounts from namespaces other than
init_user_ns, fuse must translate uids and gids to/from the
userns of the process servicing requests on /dev/fuse. This
patch does that, with a couple of restrictions on the namespace:

 - The userns for the fuse connection is fixed to the namespace
   from which /dev/fuse is opened.

 - The namespace must be the same as s_user_ns.

These restrictions simplify the implementation by avoiding the
need to pass around userns references and by allowing fuse to
rely on the checks in inode_change_ok for ownership changes.
Either restriction could be relaxed in the future if needed.

For cuse the namespace used for the connection is also simply
current_user_ns() at the time /dev/cuse is opened.

Signed-off-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 fs/fuse/cuse.c   |  3 ++-
 fs/fuse/dev.c    | 13 ++++++++-----
 fs/fuse/dir.c    | 14 +++++++-------
 fs/fuse/fuse_i.h |  6 +++++-
 fs/fuse/inode.c  | 33 +++++++++++++++++++++------------
 5 files changed, 43 insertions(+), 26 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index c5b6b7165489..98ebd0f4fd4c 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -48,6 +48,7 @@
 #include <linux/stat.h>
 #include <linux/module.h>
 #include <linux/uio.h>
+#include <linux/user_namespace.h>
 
 #include "fuse_i.h"
 
@@ -498,7 +499,7 @@ static int cuse_channel_open(struct inode *inode, struct file *file)
 	if (!cc)
 		return -ENOMEM;
 
-	fuse_conn_init(&cc->fc);
+	fuse_conn_init(&cc->fc, current_user_ns());
 
 	fud = fuse_dev_alloc(&cc->fc);
 	if (!fud) {
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 4e91b2ac25a7..8fa1ce934df3 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -127,8 +127,8 @@ static void __fuse_put_request(struct fuse_req *req)
 
 static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
 {
-	req->in.h.uid = from_kuid_munged(&init_user_ns, current_fsuid());
-	req->in.h.gid = from_kgid_munged(&init_user_ns, current_fsgid());
+	req->in.h.uid = from_kuid(fc->user_ns, current_fsuid());
+	req->in.h.gid = from_kgid(fc->user_ns, current_fsgid());
 	req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
 }
 
@@ -186,7 +186,8 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
 	__set_bit(FR_WAITING, &req->flags);
 	if (for_background)
 		__set_bit(FR_BACKGROUND, &req->flags);
-	if (req->in.h.pid == 0) {
+	if (req->in.h.pid == 0 || req->in.h.uid == (uid_t)-1 ||
+	    req->in.h.gid == (gid_t)-1) {
 		fuse_put_request(fc, req);
 		return ERR_PTR(-EOVERFLOW);
 	}
@@ -1248,7 +1249,8 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 	struct fuse_in *in;
 	unsigned reqsize;
 
-	if (task_active_pid_ns(current) != fc->pid_ns)
+	if (task_active_pid_ns(current) != fc->pid_ns ||
+	    current_user_ns() != fc->user_ns)
 		return -EIO;
 
  restart:
@@ -1880,7 +1882,8 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 	struct fuse_req *req;
 	struct fuse_out_header oh;
 
-	if (task_active_pid_ns(current) != fc->pid_ns)
+	if (task_active_pid_ns(current) != fc->pid_ns ||
+	    current_user_ns() != fc->user_ns)
 		return -EIO;
 
 	if (nbytes < sizeof(struct fuse_out_header))
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 4b855b65d457..ecba75bf6640 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -841,8 +841,8 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
 	stat->ino = attr->ino;
 	stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
 	stat->nlink = attr->nlink;
-	stat->uid = make_kuid(&init_user_ns, attr->uid);
-	stat->gid = make_kgid(&init_user_ns, attr->gid);
+	stat->uid = make_kuid(fc->user_ns, attr->uid);
+	stat->gid = make_kgid(fc->user_ns, attr->gid);
 	stat->rdev = inode->i_rdev;
 	stat->atime.tv_sec = attr->atime;
 	stat->atime.tv_nsec = attr->atimensec;
@@ -1459,17 +1459,17 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
 	return true;
 }
 
-static void iattr_to_fattr(struct iattr *iattr, struct fuse_setattr_in *arg,
-			   bool trust_local_cmtime)
+static void iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr,
+			   struct fuse_setattr_in *arg, bool trust_local_cmtime)
 {
 	unsigned ivalid = iattr->ia_valid;
 
 	if (ivalid & ATTR_MODE)
 		arg->valid |= FATTR_MODE,   arg->mode = iattr->ia_mode;
 	if (ivalid & ATTR_UID)
-		arg->valid |= FATTR_UID,    arg->uid = from_kuid(&init_user_ns, iattr->ia_uid);
+		arg->valid |= FATTR_UID,    arg->uid = from_kuid(fc->user_ns, iattr->ia_uid);
 	if (ivalid & ATTR_GID)
-		arg->valid |= FATTR_GID,    arg->gid = from_kgid(&init_user_ns, iattr->ia_gid);
+		arg->valid |= FATTR_GID,    arg->gid = from_kgid(fc->user_ns, iattr->ia_gid);
 	if (ivalid & ATTR_SIZE)
 		arg->valid |= FATTR_SIZE,   arg->size = iattr->ia_size;
 	if (ivalid & ATTR_ATIME) {
@@ -1629,7 +1629,7 @@ int fuse_do_setattr(struct inode *inode, struct iattr *attr,
 
 	memset(&inarg, 0, sizeof(inarg));
 	memset(&outarg, 0, sizeof(outarg));
-	iattr_to_fattr(attr, &inarg, trust_local_cmtime);
+	iattr_to_fattr(fc, attr, &inarg, trust_local_cmtime);
 	if (file) {
 		struct fuse_file *ff = file->private_data;
 		inarg.valid |= FATTR_FH;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 9145445a759a..9f4c3c82edd6 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -24,6 +24,7 @@
 #include <linux/workqueue.h>
 #include <linux/kref.h>
 #include <linux/pid_namespace.h>
+#include <linux/user_namespace.h>
 
 /** Max number of pages that can be used in a single read request */
 #define FUSE_MAX_PAGES_PER_REQ 32
@@ -469,6 +470,9 @@ struct fuse_conn {
 	/** The pid namespace for this mount */
 	struct pid_namespace *pid_ns;
 
+	/** The user namespace for this mount */
+	struct user_namespace *user_ns;
+
 	/** The fuse mount flags for this mount */
 	unsigned flags;
 
@@ -867,7 +871,7 @@ struct fuse_conn *fuse_conn_get(struct fuse_conn *fc);
 /**
  * Initialize fuse_conn
  */
-void fuse_conn_init(struct fuse_conn *fc);
+void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns);
 
 /**
  * Release reference to fuse_conn
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index eade0bfa4488..0a771145d853 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -167,8 +167,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
 	inode->i_ino     = fuse_squash_ino(attr->ino);
 	inode->i_mode    = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
 	set_nlink(inode, attr->nlink);
-	inode->i_uid     = make_kuid(&init_user_ns, attr->uid);
-	inode->i_gid     = make_kgid(&init_user_ns, attr->gid);
+	inode->i_uid     = make_kuid(fc->user_ns, attr->uid);
+	inode->i_gid     = make_kgid(fc->user_ns, attr->gid);
 	inode->i_blocks  = attr->blocks;
 	inode->i_atime.tv_sec   = attr->atime;
 	inode->i_atime.tv_nsec  = attr->atimensec;
@@ -467,7 +467,8 @@ static int fuse_match_uint(substring_t *s, unsigned int *res)
 	return err;
 }
 
-static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
+static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev,
+			  struct user_namespace *user_ns)
 {
 	char *p;
 	memset(d, 0, sizeof(struct fuse_mount_data));
@@ -503,7 +504,7 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
 		case OPT_USER_ID:
 			if (fuse_match_uint(&args[0], &uv))
 				return 0;
-			d->user_id = make_kuid(current_user_ns(), uv);
+			d->user_id = make_kuid(user_ns, uv);
 			if (!uid_valid(d->user_id))
 				return 0;
 			d->user_id_present = 1;
@@ -512,7 +513,7 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
 		case OPT_GROUP_ID:
 			if (fuse_match_uint(&args[0], &uv))
 				return 0;
-			d->group_id = make_kgid(current_user_ns(), uv);
+			d->group_id = make_kgid(user_ns, uv);
 			if (!gid_valid(d->group_id))
 				return 0;
 			d->group_id_present = 1;
@@ -555,8 +556,10 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
 	struct super_block *sb = root->d_sb;
 	struct fuse_conn *fc = get_fuse_conn_super(sb);
 
-	seq_printf(m, ",user_id=%u", from_kuid_munged(&init_user_ns, fc->user_id));
-	seq_printf(m, ",group_id=%u", from_kgid_munged(&init_user_ns, fc->group_id));
+	seq_printf(m, ",user_id=%u",
+		   from_kuid_munged(fc->user_ns, fc->user_id));
+	seq_printf(m, ",group_id=%u",
+		   from_kgid_munged(fc->user_ns, fc->group_id));
 	if (fc->flags & FUSE_DEFAULT_PERMISSIONS)
 		seq_puts(m, ",default_permissions");
 	if (fc->flags & FUSE_ALLOW_OTHER)
@@ -587,7 +590,7 @@ static void fuse_pqueue_init(struct fuse_pqueue *fpq)
 	fpq->connected = 1;
 }
 
-void fuse_conn_init(struct fuse_conn *fc)
+void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns)
 {
 	memset(fc, 0, sizeof(*fc));
 	spin_lock_init(&fc->lock);
@@ -611,6 +614,7 @@ void fuse_conn_init(struct fuse_conn *fc)
 	fc->attr_version = 1;
 	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
 	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
+	fc->user_ns = get_user_ns(user_ns);
 }
 EXPORT_SYMBOL_GPL(fuse_conn_init);
 
@@ -620,6 +624,7 @@ void fuse_conn_put(struct fuse_conn *fc)
 		if (fc->destroy_req)
 			fuse_request_free(fc->destroy_req);
 		put_pid_ns(fc->pid_ns);
+		put_user_ns(fc->user_ns);
 		fc->release(fc);
 	}
 }
@@ -1046,7 +1051,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 
 	sb->s_flags &= ~(MS_NOSEC | MS_I_VERSION);
 
-	if (!parse_fuse_opt(data, &d, is_bdev))
+	if (!parse_fuse_opt(data, &d, is_bdev, sb->s_user_ns))
 		goto err;
 
 	if (is_bdev) {
@@ -1070,8 +1075,12 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 	if (!file)
 		goto err;
 
-	if ((file->f_op != &fuse_dev_operations) ||
-	    (file->f_cred->user_ns != &init_user_ns))
+	/*
+	 * Require mount to happen from the same user namespace which
+	 * opened /dev/fuse to prevent potential attacks.
+	 */
+	if (file->f_op != &fuse_dev_operations ||
+	    file->f_cred->user_ns != sb->s_user_ns)
 		goto err_fput;
 
 	fc = kmalloc(sizeof(*fc), GFP_KERNEL);
@@ -1079,7 +1088,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 	if (!fc)
 		goto err_fput;
 
-	fuse_conn_init(fc);
+	fuse_conn_init(fc, sb->s_user_ns);
 	fc->release = fuse_free_conn;
 
 	fud = fuse_dev_alloc(fc);
-- 
2.7.4

_______________________________________________
Selinux mailing list
Selinux-+05T5uksL2qpZYMLLGbcSA@public.gmane.org
To unsubscribe, send email to Selinux-leave-+05T5uksL2pAGbPMOrvdOA@public.gmane.org
To get help, send an email containing "help" to Selinux-request-+05T5uksL2pAGbPMOrvdOA@public.gmane.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces
       [not found] ` <1461699046-30485-4-git-send-email-seth.forshee@canonical.com>
@ 2016-05-17 22:39   ` Eric W. Biederman
  2016-05-17 23:58     ` Seth Forshee
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2016-05-17 22:39 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, Greg Kroah-Hartman, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Li Zefan, Johannes Weiner, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, Miklos Szeredi,
	Pavel Tikhomirov, linux-kernel, linux-bcache, dm-devel,
	linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux, cgroups

Seth Forshee <seth.forshee@canonical.com> writes:

> Both of these filesystems already have use cases for mounting the
> same super block from multiple user namespaces. For sysfs this
> happens when using criu for snapshotting a container, where sysfs
> is mnounted in the containers network ns but the hosts user ns.
> The cgroup filesystem shares the same super block for all mounts
> of the same hierarchy regardless of the namespace.
>
> As a result, the restriction on mounting a super block from a
> single user namespace creates regressions for existing uses of
> these filesystems. For these specific filesystems this
> restriction isn't really necessary since the backing store is
> objects in kernel memory and thus the ids assigned from inodes
> is not subject to translation relative to s_user_ns.
>
> Add a new filesystem flag, FS_USERNS_SHARE_SB, which when set
> causes sget_userns() to skip the check of s_user_ns. Set this
> flag for the sysfs and cgroup filesystems to fix the
> regressions.

So this one needs to be sget_userns(..., &init_user_ns, ...).
And not a new special case.

Apologies for not catching this earlier.

I am looking at folding all of this into the patch that introduces
sget_userns so that even bisects won't have regresssions.

We loose the ability to call mount -o remount and actually affect
these filesystems (which we don't have without s_user_ns) but we gain a
whole lot of simplicity, and we don't break the xattr and security label
on sysfs code.

Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces
  2016-05-17 22:39   ` [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces Eric W. Biederman
@ 2016-05-17 23:58     ` Seth Forshee
  2016-05-18 15:45       ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: Seth Forshee @ 2016-05-17 23:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Viro, Greg Kroah-Hartman, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Li Zefan, Johannes Weiner, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, Miklos Szeredi,
	Pavel Tikhomirov, linux-kernel, linux-bcache, dm-devel,
	linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux, cgroups

On Tue, May 17, 2016 at 05:39:33PM -0500, Eric W. Biederman wrote:
> Seth Forshee <seth.forshee@canonical.com> writes:
> 
> > Both of these filesystems already have use cases for mounting the
> > same super block from multiple user namespaces. For sysfs this
> > happens when using criu for snapshotting a container, where sysfs
> > is mnounted in the containers network ns but the hosts user ns.
> > The cgroup filesystem shares the same super block for all mounts
> > of the same hierarchy regardless of the namespace.
> >
> > As a result, the restriction on mounting a super block from a
> > single user namespace creates regressions for existing uses of
> > these filesystems. For these specific filesystems this
> > restriction isn't really necessary since the backing store is
> > objects in kernel memory and thus the ids assigned from inodes
> > is not subject to translation relative to s_user_ns.
> >
> > Add a new filesystem flag, FS_USERNS_SHARE_SB, which when set
> > causes sget_userns() to skip the check of s_user_ns. Set this
> > flag for the sysfs and cgroup filesystems to fix the
> > regressions.
> 
> So this one needs to be sget_userns(..., &init_user_ns, ...).
> And not a new special case.

This is actually what I wanted to do, but based on a previous discussion
where I had suggested doing this (for a different reason) I came away
thinking you did not want it that way. So I'm happy with that change.

But if we do that it violates some of the assumptions of the patch to
rework MNT_NODEV on your testing branch (and also those behind patch 2
in this series). Something will need to be changed there to prevent a
regression in mount behavior when a user ns tries to mount without
MNT_NODEV when the mount inherited from its parent has it set.

> Apologies for not catching this earlier.

Actually this is a more recent patch, so you possibly hadn't seen it
before.

> I am looking at folding all of this into the patch that introduces
> sget_userns so that even bisects won't have regresssions.

That's fine with me.

Thanks,
Seth


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces
  2016-05-17 23:58     ` Seth Forshee
@ 2016-05-18 15:45       ` Eric W. Biederman
  2016-05-18 16:16         ` Seth Forshee
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2016-05-18 15:45 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, Greg Kroah-Hartman, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Li Zefan, Johannes Weiner, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, Miklos Szeredi,
	Pavel Tikhomirov, linux-kernel, linux-bcache, dm-devel,
	linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux, cgroups

Seth Forshee <seth.forshee@canonical.com> writes:

> On Tue, May 17, 2016 at 05:39:33PM -0500, Eric W. Biederman wrote:
>> Seth Forshee <seth.forshee@canonical.com> writes:
>> 
>> > Both of these filesystems already have use cases for mounting the
>> > same super block from multiple user namespaces. For sysfs this
>> > happens when using criu for snapshotting a container, where sysfs
>> > is mnounted in the containers network ns but the hosts user ns.
>> > The cgroup filesystem shares the same super block for all mounts
>> > of the same hierarchy regardless of the namespace.
>> >
>> > As a result, the restriction on mounting a super block from a
>> > single user namespace creates regressions for existing uses of
>> > these filesystems. For these specific filesystems this
>> > restriction isn't really necessary since the backing store is
>> > objects in kernel memory and thus the ids assigned from inodes
>> > is not subject to translation relative to s_user_ns.
>> >
>> > Add a new filesystem flag, FS_USERNS_SHARE_SB, which when set
>> > causes sget_userns() to skip the check of s_user_ns. Set this
>> > flag for the sysfs and cgroup filesystems to fix the
>> > regressions.
>> 
>> So this one needs to be sget_userns(..., &init_user_ns, ...).
>> And not a new special case.
>
> This is actually what I wanted to do, but based on a previous discussion
> where I had suggested doing this (for a different reason) I came away
> thinking you did not want it that way. So I'm happy with that change.

Yeah.  Somedays it seems like there are a lot of pieces in play here.
The security labels on sysfs seems to be a very compelling case.

> But if we do that it violates some of the assumptions of the patch to
> rework MNT_NODEV on your testing branch (and also those behind patch 2
> in this series). Something will need to be changed there to prevent a
> regression in mount behavior when a user ns tries to mount without
> MNT_NODEV when the mount inherited from its parent has it set.

Thank you for pointing that out.  I will look into that.

I believe I know exactly what you are talking about.  Of the choices I
think it is better to a minor localized change in the fs_fully_visible
logic than it is to cause problems elsewhere.

>> Apologies for not catching this earlier.
>
> Actually this is a more recent patch, so you possibly hadn't seen it
> before.
>
>> I am looking at folding all of this into the patch that introduces
>> sget_userns so that even bisects won't have regresssions.
>
> That's fine with me.

And thank you for keeping everything as separate patches.  That is at
least helping me catch up.  Even if I don't agree that these things
should be separate come merge time.

Eric


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces
  2016-05-18 15:45       ` Eric W. Biederman
@ 2016-05-18 16:16         ` Seth Forshee
  2016-05-18 16:27           ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: Seth Forshee @ 2016-05-18 16:16 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Viro, Greg Kroah-Hartman, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Li Zefan, Johannes Weiner, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, Miklos Szeredi,
	Pavel Tikhomirov, linux-kernel, linux-bcache, dm-devel,
	linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux, cgroups

On Wed, May 18, 2016 at 10:45:31AM -0500, Eric W. Biederman wrote:
> > But if we do that it violates some of the assumptions of the patch to
> > rework MNT_NODEV on your testing branch (and also those behind patch 2
> > in this series). Something will need to be changed there to prevent a
> > regression in mount behavior when a user ns tries to mount without
> > MNT_NODEV when the mount inherited from its parent has it set.
> 
> Thank you for pointing that out.  I will look into that.
> 
> I believe I know exactly what you are talking about.  Of the choices I
> think it is better to a minor localized change in the fs_fully_visible
> logic than it is to cause problems elsewhere.

Agreed.

> >> Apologies for not catching this earlier.
> >
> > Actually this is a more recent patch, so you possibly hadn't seen it
> > before.
> >
> >> I am looking at folding all of this into the patch that introduces
> >> sget_userns so that even bisects won't have regresssions.
> >
> > That's fine with me.
> 
> And thank you for keeping everything as separate patches.  That is at
> least helping me catch up.  Even if I don't agree that these things
> should be separate come merge time.

Honestly I probably would have squashed some of them into that first
patch myself if you hadn't already applied it to your testing branch, so
that's all just luck.

Keep in mind that I also have that patch for mqueue that isn't in this
series, and I haven't yet checked to see if the 4.7 merges introduce
anything which is going to require updating these patches. I was
planning to wait and send out updates after -rc1, but if you want that
stuff sooner just let me know.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces
  2016-05-18 16:16         ` Seth Forshee
@ 2016-05-18 16:27           ` Eric W. Biederman
  0 siblings, 0 replies; 17+ messages in thread
From: Eric W. Biederman @ 2016-05-18 16:27 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Alexander Viro, Greg Kroah-Hartman, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Li Zefan, Johannes Weiner, Serge Hallyn,
	Richard Weinberger, Austin S Hemmelgarn, Miklos Szeredi,
	Pavel Tikhomirov, linux-kernel, linux-bcache, dm-devel,
	linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux, cgroups

Seth Forshee <seth.forshee@canonical.com> writes:

> On Wed, May 18, 2016 at 10:45:31AM -0500, Eric W. Biederman wrote:
>> > But if we do that it violates some of the assumptions of the patch to
>> > rework MNT_NODEV on your testing branch (and also those behind patch 2
>> > in this series). Something will need to be changed there to prevent a
>> > regression in mount behavior when a user ns tries to mount without
>> > MNT_NODEV when the mount inherited from its parent has it set.
>> 
>> Thank you for pointing that out.  I will look into that.
>> 
>> I believe I know exactly what you are talking about.  Of the choices I
>> think it is better to a minor localized change in the fs_fully_visible
>> logic than it is to cause problems elsewhere.
>
> Agreed.
>
>> >> Apologies for not catching this earlier.
>> >
>> > Actually this is a more recent patch, so you possibly hadn't seen it
>> > before.
>> >
>> >> I am looking at folding all of this into the patch that introduces
>> >> sget_userns so that even bisects won't have regresssions.
>> >
>> > That's fine with me.
>> 
>> And thank you for keeping everything as separate patches.  That is at
>> least helping me catch up.  Even if I don't agree that these things
>> should be separate come merge time.
>
> Honestly I probably would have squashed some of them into that first
> patch myself if you hadn't already applied it to your testing branch, so
> that's all just luck.
>
> Keep in mind that I also have that patch for mqueue that isn't in this
> series, and I haven't yet checked to see if the 4.7 merges introduce
> anything which is going to require updating these patches. I was
> planning to wait and send out updates after -rc1, but if you want that
> stuff sooner just let me know.

As unfortunately I don't have anything going into -rc1 I am working on
this right now.

Let me finish sorting out the sget_userns and mnt nodev mess and I will
push something out and then we can compare notes.  I think I have mqueue
covered by other changes.  As it is in the set of filesystems that
should just use sget_userns.

I am sorting through the nodev corner of this now.  It should just be a
day or two.

Eric


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces
  2016-04-26 19:36 [PATCH v4 00/21] Support fuse mounts in " Seth Forshee
@ 2016-04-26 19:36 ` Seth Forshee
  0 siblings, 0 replies; 17+ messages in thread
From: Seth Forshee @ 2016-04-26 19:36 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, Greg Kroah-Hartman,
	Jeff Layton, J. Bruce Fields, Tejun Heo, Li Zefan,
	Johannes Weiner
  Cc: Serge Hallyn, Richard Weinberger, Austin S Hemmelgarn,
	Miklos Szeredi, Pavel Tikhomirov, linux-kernel, linux-bcache,
	dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel,
	linux-security-module, selinux, cgroups, Seth Forshee

Both of these filesystems already have use cases for mounting the
same super block from multiple user namespaces. For sysfs this
happens when using criu for snapshotting a container, where sysfs
is mounted in the containers network ns but the hosts user ns.
The cgroup filesystem shares the same super block for all mounts
of the same hierarchy regardless of the namespace.

As a result, the restriction on mounting a super block from a
single user namespace creates regressions for existing uses of
these filesystems. For these specific filesystems this
restriction isn't really necessary since the backing store is
objects in kernel memory and thus the ids assigned from inodes
is not subject to translation relative to s_user_ns.

Add a new filesystem flag, FS_USERNS_SHARE_SB, which when set
causes sget_userns() to skip the check of s_user_ns. Set this
flag for the sysfs and cgroup filesystems to fix the
regressions.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
---
 fs/super.c         | 3 ++-
 fs/sysfs/mount.c   | 3 ++-
 include/linux/fs.h | 1 +
 kernel/cgroup.c    | 4 ++--
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 092a7828442e..ead156b44bf8 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -472,7 +472,8 @@ retry:
 		hlist_for_each_entry(old, &type->fs_supers, s_instances) {
 			if (!test(old, data))
 				continue;
-			if (user_ns != old->s_user_ns) {
+			if (!(type->fs_flags & FS_USERNS_SHARE_SB) &&
+			    user_ns != old->s_user_ns) {
 				spin_unlock(&sb_lock);
 				if (s) {
 					up_write(&s->s_umount);
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index f3db82071cfb..9555accd4322 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -59,7 +59,8 @@ static struct file_system_type sysfs_fs_type = {
 	.name		= "sysfs",
 	.mount		= sysfs_mount,
 	.kill_sb	= sysfs_kill_sb,
-	.fs_flags	= FS_USERNS_VISIBLE | FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_VISIBLE | FS_USERNS_MOUNT |
+			  FS_USERNS_SHARE_SB,
 };
 
 int __init sysfs_init(void)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index be0f8023e28c..66a639ec1bc4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1988,6 +1988,7 @@ struct file_system_type {
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_USERNS_DEV_MOUNT	16 /* A userns mount does not imply MNT_NODEV */
 #define FS_USERNS_VISIBLE	32	/* FS must already be visible */
+#define FS_USERNS_SHARE_SB	64	/* Allow sharing sb between userns-es */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	struct dentry *(*mount) (struct file_system_type *, int,
 		       const char *, void *);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 671dc05c0b0f..9c9aa27e531a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2247,14 +2247,14 @@ static struct file_system_type cgroup_fs_type = {
 	.name = "cgroup",
 	.mount = cgroup_mount,
 	.kill_sb = cgroup_kill_sb,
-	.fs_flags = FS_USERNS_MOUNT,
+	.fs_flags = FS_USERNS_MOUNT | FS_USERNS_SHARE_SB,
 };
 
 static struct file_system_type cgroup2_fs_type = {
 	.name = "cgroup2",
 	.mount = cgroup_mount,
 	.kill_sb = cgroup_kill_sb,
-	.fs_flags = FS_USERNS_MOUNT,
+	.fs_flags = FS_USERNS_MOUNT | FS_USERNS_SHARE_SB,
 };
 
 static char *cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-05-18 16:38 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-26 19:30 [PATCH v4 00/21] Support fuse mounts in user namespaces Seth Forshee
     [not found] ` <1461699046-30485-1-git-send-email-seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
2016-04-26 19:30   ` [PATCH v4 02/21] fs: Remove check of s_user_ns for existing mounts in fs_fully_visible() Seth Forshee
2016-04-26 19:30   ` [PATCH v4 04/21] block_dev: Support checking inode permissions in lookup_bdev() Seth Forshee
2016-04-26 19:30   ` [PATCH v4 08/21] userns: Replace in_userns with current_in_userns Seth Forshee
2016-04-26 19:30   ` [PATCH v4 10/21] fs: Check for invalid i_uid in may_follow_link() Seth Forshee
2016-04-26 19:30   ` [PATCH v4 11/21] cred: Reject inodes with invalid ids in set_create_file_as() Seth Forshee
2016-04-26 19:30   ` [PATCH v4 13/21] fs: Update posix_acl support to handle user namespace mounts Seth Forshee
2016-04-26 19:30   ` [PATCH v4 14/21] fs: Allow superblock owner to change ownership of inodes with unmappable ids Seth Forshee
2016-04-26 19:30   ` [PATCH v4 16/21] fs: Allow superblock owner to access do_remount_sb() Seth Forshee
2016-04-26 19:30   ` [PATCH v4 18/21] fuse: Add support for pid namespaces Seth Forshee
2016-04-26 19:30   ` [PATCH v4 19/21] fuse: Support fuse filesystems outside of init_user_ns Seth Forshee
     [not found] ` <1461699046-30485-4-git-send-email-seth.forshee@canonical.com>
2016-05-17 22:39   ` [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces Eric W. Biederman
2016-05-17 23:58     ` Seth Forshee
2016-05-18 15:45       ` Eric W. Biederman
2016-05-18 16:16         ` Seth Forshee
2016-05-18 16:27           ` Eric W. Biederman
2016-04-26 19:36 [PATCH v4 00/21] Support fuse mounts in " Seth Forshee
2016-04-26 19:36 ` [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between " Seth Forshee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).