linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts
@ 2018-05-23 23:22 Eric W. Biederman
  2018-05-23 23:25 ` [REVIEW][PATCH 1/6] vfs: Don't allow changing the link count of an inode with an invalid uid or gid Eric W. Biederman
                   ` (7 more replies)
  0 siblings, 8 replies; 29+ messages in thread
From: Eric W. Biederman @ 2018-05-23 23:22 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Seth Forshee, Serge E. Hallyn, Christian Brauner,
	linux-kernel


Very slowly the work has been progressing to ensure the vfs has the
necessary support for mounting filesystems without privilege.

This patchset contains one more core piece of that work, ensuring a few
more operations that would write back an inode and confuse an exisiting
filesystem are denied.

The rest of the changes actually enable userns root to do things with
filesystems that the userns root has mounted.  Most of these have been
waiting in the wings a long time, held back because I wanted the core
of the patchset to be solid before I started allowing additional
behavor.

It is definitely time for these changes so the effect of s_user_ns
becomes less theoretical.

The change to allow mknod is new, but consistent with everything else
and harmless as device nodes on filesystems mounted without privilege
are ignored.

Unless problems show up in the during review I plan to merge these changes.

These changes are also available at:
  git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git userns-test

Eric W. Biederman (5):
      vfs: Don't allow changing the link count of an inode with an invalid uid or gid
      vfs: Allow userns root to call mknod on owned filesystems.
      fs: Allow superblock owner to replace invalid owners of inodes
      fs: Allow superblock owner to access do_remount_sb()
      capabilities: Allow privileged user in s_user_ns to set security.* xattrs

Seth Forshee (1):
      fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems

 fs/attr.c            | 36 ++++++++++++++++++++++++++++--------
 fs/ioctl.c           |  4 ++--
 fs/namei.c           | 16 ++++++++++++----
 fs/namespace.c       |  4 ++--
 security/commoncap.c |  8 ++++++--
 5 files changed, 50 insertions(+), 18 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [REVIEW][PATCH 1/6] vfs: Don't allow changing the link count of an inode with an invalid uid or gid
  2018-05-23 23:22 [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts Eric W. Biederman
@ 2018-05-23 23:25 ` Eric W. Biederman
  2018-05-24 12:58   ` Seth Forshee
  2018-05-23 23:25 ` [REVIEW][PATCH 2/6] vfs: Allow userns root to call mknod on owned filesystems Eric W. Biederman
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2018-05-23 23:25 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Seth Forshee, Serge E. Hallyn, Christian Brauner,
	linux-kernel, Eric W. Biederman

Changing the link count of an inode via unlink or link will cause a
write back of that inode.  If the uids or gids are invalid (aka not known
to the kernel) writing the inode back may change the uid or gid in the
filesystem.   To prevent possible filesystem and to avoid the need for
filesystem maintainers to worry about it don't allow operations on
inodes with an invalid uid or gid.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/namei.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 186bd2464fd5..942c1f096f6b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -984,13 +984,15 @@ static bool safe_hardlink_source(struct inode *inode)
  */
 static int may_linkat(struct path *link)
 {
-	struct inode *inode;
+	struct inode *inode = link->dentry->d_inode;
+
+	/* Inode writeback is not safe when the uid or gid are invalid. */
+	if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid))
+		return -EOVERFLOW;
 
 	if (!sysctl_protected_hardlinks)
 		return 0;
 
-	inode = link->dentry->d_inode;
-
 	/* Source inode owner (or CAP_FOWNER) can hardlink all they like,
 	 * otherwise, it must be a safe source.
 	 */
@@ -2749,6 +2751,11 @@ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
 	BUG_ON(!inode);
 
 	BUG_ON(victim->d_parent->d_inode != dir);
+
+	/* Inode writeback is not safe when the uid or gid are invalid. */
+	if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid))
+		return -EOVERFLOW;
+
 	audit_inode_child(dir, victim, AUDIT_TYPE_CHILD_DELETE);
 
 	error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
-- 
2.14.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [REVIEW][PATCH 2/6] vfs: Allow userns root to call mknod on owned filesystems.
  2018-05-23 23:22 [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts Eric W. Biederman
  2018-05-23 23:25 ` [REVIEW][PATCH 1/6] vfs: Don't allow changing the link count of an inode with an invalid uid or gid Eric W. Biederman
@ 2018-05-23 23:25 ` Eric W. Biederman
  2018-05-24 13:55   ` Seth Forshee
  2018-05-24 19:12   ` Christian Brauner
  2018-05-23 23:25 ` [REVIEW][PATCH 3/6] fs: Allow superblock owner to replace invalid owners of inodes Eric W. Biederman
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 29+ messages in thread
From: Eric W. Biederman @ 2018-05-23 23:25 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Seth Forshee, Serge E. Hallyn, Christian Brauner,
	linux-kernel, Eric W. Biederman

These filesystems already always set SB_I_NODEV so mknod will not be
useful for gaining control of any devices no matter their permissions.
This will allow overlayfs and applications to fakeroot to use device
nodes to represent things on disk.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/namei.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 942c1f096f6b..20335896dcce 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3679,7 +3679,8 @@ int vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
 	if (error)
 		return error;
 
-	if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
+	if ((S_ISCHR(mode) || S_ISBLK(mode)) &&
+	    !ns_capable(dentry->d_sb->s_user_ns, CAP_MKNOD))
 		return -EPERM;
 
 	if (!dir->i_op->mknod)
-- 
2.14.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [REVIEW][PATCH 3/6] fs: Allow superblock owner to replace invalid owners of inodes
  2018-05-23 23:22 [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts Eric W. Biederman
  2018-05-23 23:25 ` [REVIEW][PATCH 1/6] vfs: Don't allow changing the link count of an inode with an invalid uid or gid Eric W. Biederman
  2018-05-23 23:25 ` [REVIEW][PATCH 2/6] vfs: Allow userns root to call mknod on owned filesystems Eric W. Biederman
@ 2018-05-23 23:25 ` Eric W. Biederman
  2018-05-23 23:41   ` [REVIEW][PATCH v2 " Eric W. Biederman
  2018-05-23 23:25 ` [REVIEW][PATCH 4/6] fs: Allow superblock owner to access do_remount_sb() Eric W. Biederman
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2018-05-23 23:25 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Seth Forshee, Serge E. Hallyn, Christian Brauner,
	linux-kernel, Eric W. Biederman

Allow users with CAP_SYS_CHOWN over the superblock of a filesystem to
chown files when inode owner is invalid.  Ordinarily the
capable_wrt_inode_uidgid check is sufficient to allow access to files
but when the underlying filesystem has uids or gids that don't map to
the current user namespace it is not enough, so the chown permission
checks need to be extended to allow this case.

Calling chown on filesystem nodes whose uid or gid don't map is
necessary if those nodes are going to be modified as writing back
inodes which contain uids or gids that don't map is likely to cause
filesystem corruption of the uid or gid fields.

Once chown has been called the existing capable_wrt_inode_uidgid
checks are sufficient to allow the owner of a superblock to do anything
the global root user can do with an appropriate set of capabilities.

An ordinary filesystem mountable by a userns root will limit all uids
and gids in s_user_ns or the INVALID_UID and INVALID_GID to flag all
others.  So having this added permission limited to just INVALID_UID
and INVALID_GID is sufficient to handle every case on an ordinary filesystem.

Of the virtual filesystems at least proc is known to set s_user_ns to
something other than &init_user_ns, while at the same time presenting
some files owned by GLOBAL_ROOT_UID.  Those files the mounter of proc
in a user namespace should not be able to chown to get access to.
Limiting the relaxation in permission to just the minimum of allowing
changing INVALID_UID and INVALID_GID prevents problems with cases like
that.

The original version of this patch was written by: Seth Forshee.  I
have rewritten and rethought this patch enough so it's really not the
same thing (certainly it needs a different description), but he
deserves credit for getting out there and getting the conversation
started, and finding the potential gotcha's and putting up with my
semi-paranoid feedback.

Inspired-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/attr.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index 12ffdb6fb63c..4220c98f41a5 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -18,6 +18,32 @@
 #include <linux/evm.h>
 #include <linux/ima.h>
 
+static bool chown_ok(const struct inode *inode, kuid_t uid)
+{
+	if (uid_eq(current_fsuid(), inode->i_uid) &&
+	    uid_eq(uid, inode->i_uid))
+		return true;
+	if (capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+		return true;
+	if (uid_eq(inode->i_uid, INVALID_UID) &&
+	    ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN));
+		return true;
+	return false;
+}
+
+static bool chgrp_ok(const struct inode *inode, kgid_t gid)
+{
+	if (uid_eq(current_fsuid(), inode->i_uid) &&
+	    (in_group_p(gid) || gid_eq(gid, inode->i_gid)))
+		return true;
+	if (capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+		return true;
+	if (gid_eq(inode->i_gid, INVALID_GID) &&
+	    ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN));
+		return true;
+	return false;
+}
+
 /**
  * setattr_prepare - check if attribute changes to a dentry are allowed
  * @dentry:	dentry to check
@@ -52,17 +78,11 @@ int setattr_prepare(struct dentry *dentry, struct iattr *attr)
 		goto kill_priv;
 
 	/* Make sure a caller can chown. */
-	if ((ia_valid & ATTR_UID) &&
-	    (!uid_eq(current_fsuid(), inode->i_uid) ||
-	     !uid_eq(attr->ia_uid, inode->i_uid)) &&
-	    !capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+	if ((ia_valid & ATTR_UID) && !chown_ok(inode, attr->ia_uid))
 		return -EPERM;
 
 	/* Make sure caller can chgrp. */
-	if ((ia_valid & ATTR_GID) &&
-	    (!uid_eq(current_fsuid(), inode->i_uid) ||
-	    (!in_group_p(attr->ia_gid) && !gid_eq(attr->ia_gid, inode->i_gid))) &&
-	    !capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+	if ((ia_valid & ATTR_GID) && !chgrp_ok(inode, attr->ia_gid))
 		return -EPERM;
 
 	/* Make sure a caller can chmod. */
-- 
2.14.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [REVIEW][PATCH 4/6] fs: Allow superblock owner to access do_remount_sb()
  2018-05-23 23:22 [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts Eric W. Biederman
                   ` (2 preceding siblings ...)
  2018-05-23 23:25 ` [REVIEW][PATCH 3/6] fs: Allow superblock owner to replace invalid owners of inodes Eric W. Biederman
@ 2018-05-23 23:25 ` Eric W. Biederman
  2018-05-24 15:58   ` Christian Brauner
  2018-05-23 23:25 ` [REVIEW][PATCH 5/6] capabilities: Allow privileged user in s_user_ns to set security.* xattrs Eric W. Biederman
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2018-05-23 23:25 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Seth Forshee, Serge E. Hallyn, Christian Brauner,
	linux-kernel, Eric W. Biederman

Superblock level remounts are currently restricted to global
CAP_SYS_ADMIN, as is the path for changing the root mount to
read only on umount. Loosen both of these permission checks to
also allow CAP_SYS_ADMIN in any namespace which is privileged
towards the userns which originally mounted the filesystem.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/namespace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5f75969adff1..8ddd14806799 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1590,7 +1590,7 @@ static int do_umount(struct mount *mnt, int flags)
 		 * Special case for "unmounting" root ...
 		 * we just try to remount it readonly.
 		 */
-		if (!capable(CAP_SYS_ADMIN))
+		if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
 			return -EPERM;
 		down_write(&sb->s_umount);
 		if (!sb_rdonly(sb))
@@ -2333,7 +2333,7 @@ static int do_remount(struct path *path, int ms_flags, int sb_flags,
 	down_write(&sb->s_umount);
 	if (ms_flags & MS_BIND)
 		err = change_mount_flags(path->mnt, ms_flags);
-	else if (!capable(CAP_SYS_ADMIN))
+	else if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
 		err = -EPERM;
 	else
 		err = do_remount_sb(sb, sb_flags, data, 0);
-- 
2.14.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [REVIEW][PATCH 5/6] capabilities: Allow privileged user in s_user_ns to set security.* xattrs
  2018-05-23 23:22 [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts Eric W. Biederman
                   ` (3 preceding siblings ...)
  2018-05-23 23:25 ` [REVIEW][PATCH 4/6] fs: Allow superblock owner to access do_remount_sb() Eric W. Biederman
@ 2018-05-23 23:25 ` Eric W. Biederman
  2018-05-24 15:57   ` Christian Brauner
  2018-05-23 23:25 ` [REVIEW][PATCH 6/6] fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems Eric W. Biederman
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2018-05-23 23:25 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Seth Forshee, Serge E. Hallyn, Christian Brauner,
	linux-kernel, Eric W. Biederman

A privileged user in s_user_ns will generally have the ability to
manipulate the backing store and insert security.* xattrs into
the filesystem directly. Therefore the kernel must be prepared to
handle these xattrs from unprivileged mounts, and it makes little
sense for commoncap to prevent writing these xattrs to the
filesystem. The capability and LSM code have already been updated
to appropriately handle xattrs from unprivileged mounts, so it
is safe to loosen this restriction on setting xattrs.

The exception to this logic is that writing xattrs to a mounted
filesystem may also cause the LSM inode_post_setxattr or
inode_setsecurity callbacks to be invoked. SELinux will deny the
xattr update by virtue of applying mountpoint labeling to
unprivileged userns mounts, and Smack will deny the writes for
any user without global CAP_MAC_ADMIN, so loosening the
capability check in commoncap is safe in this respect as well.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 security/commoncap.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index 1ce701fcb3f3..f4c33abd9959 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -919,6 +919,8 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
 int cap_inode_setxattr(struct dentry *dentry, const char *name,
 		       const void *value, size_t size, int flags)
 {
+	struct user_namespace *user_ns = dentry->d_sb->s_user_ns;
+
 	/* Ignore non-security xattrs */
 	if (strncmp(name, XATTR_SECURITY_PREFIX,
 			sizeof(XATTR_SECURITY_PREFIX) - 1) != 0)
@@ -931,7 +933,7 @@ int cap_inode_setxattr(struct dentry *dentry, const char *name,
 	if (strcmp(name, XATTR_NAME_CAPS) == 0)
 		return 0;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 	return 0;
 }
@@ -949,6 +951,8 @@ int cap_inode_setxattr(struct dentry *dentry, const char *name,
  */
 int cap_inode_removexattr(struct dentry *dentry, const char *name)
 {
+	struct user_namespace *user_ns = dentry->d_sb->s_user_ns;
+
 	/* Ignore non-security xattrs */
 	if (strncmp(name, XATTR_SECURITY_PREFIX,
 			sizeof(XATTR_SECURITY_PREFIX) - 1) != 0)
@@ -964,7 +968,7 @@ int cap_inode_removexattr(struct dentry *dentry, const char *name)
 		return 0;
 	}
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 	return 0;
 }
-- 
2.14.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [REVIEW][PATCH 6/6] fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
  2018-05-23 23:22 [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts Eric W. Biederman
                   ` (4 preceding siblings ...)
  2018-05-23 23:25 ` [REVIEW][PATCH 5/6] capabilities: Allow privileged user in s_user_ns to set security.* xattrs Eric W. Biederman
@ 2018-05-23 23:25 ` Eric W. Biederman
  2018-05-24 15:59   ` Christian Brauner
  2018-05-24 21:46 ` [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts Theodore Y. Ts'o
  2018-05-29 15:40 ` Dongsu Park
  7 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2018-05-23 23:25 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Seth Forshee, Serge E. Hallyn, Christian Brauner,
	linux-kernel, Eric W . Biederman

From: Seth Forshee <seth.forshee@canonical.com>

The user in control of a super block should be allowed to freeze
and thaw it. Relax the restrictions on the FIFREEZE and FITHAW
ioctls to require CAP_SYS_ADMIN in s_user_ns.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/ioctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 4823431d1c9d..b445b13fc59b 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -549,7 +549,7 @@ static int ioctl_fsfreeze(struct file *filp)
 {
 	struct super_block *sb = file_inode(filp)->i_sb;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	/* If filesystem doesn't support freeze feature, return. */
@@ -566,7 +566,7 @@ static int ioctl_fsthaw(struct file *filp)
 {
 	struct super_block *sb = file_inode(filp)->i_sb;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	/* Thaw */
-- 
2.14.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [REVIEW][PATCH v2 3/6] fs: Allow superblock owner to replace invalid owners of inodes
  2018-05-23 23:25 ` [REVIEW][PATCH 3/6] fs: Allow superblock owner to replace invalid owners of inodes Eric W. Biederman
@ 2018-05-23 23:41   ` Eric W. Biederman
  2018-05-24 22:30     ` Christian Brauner
  0 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2018-05-23 23:41 UTC (permalink / raw)
  To: Linux Containers
  Cc: linux-fsdevel, Seth Forshee, Serge E. Hallyn, Christian Brauner,
	linux-kernel


Allow users with CAP_SYS_CHOWN over the superblock of a filesystem to
chown files when inode owner is invalid.  Ordinarily the
capable_wrt_inode_uidgid check is sufficient to allow access to files
but when the underlying filesystem has uids or gids that don't map to
the current user namespace it is not enough, so the chown permission
checks need to be extended to allow this case.

Calling chown on filesystem nodes whose uid or gid don't map is
necessary if those nodes are going to be modified as writing back
inodes which contain uids or gids that don't map is likely to cause
filesystem corruption of the uid or gid fields.

Once chown has been called the existing capable_wrt_inode_uidgid
checks are sufficient to allow the owner of a superblock to do anything
the global root user can do with an appropriate set of capabilities.

An ordinary filesystem mountable by a userns root will limit all uids
and gids in s_user_ns or the INVALID_UID and INVALID_GID to flag all
others.  So having this added permission limited to just INVALID_UID
and INVALID_GID is sufficient to handle every case on an ordinary filesystem.

Of the virtual filesystems at least proc is known to set s_user_ns to
something other than &init_user_ns, while at the same time presenting
some files owned by GLOBAL_ROOT_UID.  Those files the mounter of proc
in a user namespace should not be able to chown to get access to.
Limiting the relaxation in permission to just the minimum of allowing
changing INVALID_UID and INVALID_GID prevents problems with cases like
that.

The original version of this patch was written by: Seth Forshee.  I
have rewritten and rethought this patch enough so it's really not the
same thing (certainly it needs a different description), but he
deserves credit for getting out there and getting the conversation
started, and finding the potential gotcha's and putting up with my
semi-paranoid feedback.

Inspired-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---

Sigh.  In simplifying this change so it would not require a change to
proc (or any other similar filesystem) I accidentally introduced some
badly placed semicolons.  The kbuild test robot was very nice and found
those for me.  Resend with those unnecessary semicolons removed.

 fs/attr.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index 12ffdb6fb63c..d0b4d34878fb 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -18,6 +18,32 @@
 #include <linux/evm.h>
 #include <linux/ima.h>
 
+static bool chown_ok(const struct inode *inode, kuid_t uid)
+{
+	if (uid_eq(current_fsuid(), inode->i_uid) &&
+	    uid_eq(uid, inode->i_uid))
+		return true;
+	if (capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+		return true;
+	if (uid_eq(inode->i_uid, INVALID_UID) &&
+	    ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
+		return true;
+	return false;
+}
+
+static bool chgrp_ok(const struct inode *inode, kgid_t gid)
+{
+	if (uid_eq(current_fsuid(), inode->i_uid) &&
+	    (in_group_p(gid) || gid_eq(gid, inode->i_gid)))
+		return true;
+	if (capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+		return true;
+	if (gid_eq(inode->i_gid, INVALID_GID) &&
+	    ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
+		return true;
+	return false;
+}
+
 /**
  * setattr_prepare - check if attribute changes to a dentry are allowed
  * @dentry:	dentry to check
@@ -52,17 +78,11 @@ int setattr_prepare(struct dentry *dentry, struct iattr *attr)
 		goto kill_priv;
 
 	/* Make sure a caller can chown. */
-	if ((ia_valid & ATTR_UID) &&
-	    (!uid_eq(current_fsuid(), inode->i_uid) ||
-	     !uid_eq(attr->ia_uid, inode->i_uid)) &&
-	    !capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+	if ((ia_valid & ATTR_UID) && !chown_ok(inode, attr->ia_uid))
 		return -EPERM;
 
 	/* Make sure caller can chgrp. */
-	if ((ia_valid & ATTR_GID) &&
-	    (!uid_eq(current_fsuid(), inode->i_uid) ||
-	    (!in_group_p(attr->ia_gid) && !gid_eq(attr->ia_gid, inode->i_gid))) &&
-	    !capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+	if ((ia_valid & ATTR_GID) && !chgrp_ok(inode, attr->ia_gid))
 		return -EPERM;
 
 	/* Make sure a caller can chmod. */

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 1/6] vfs: Don't allow changing the link count of an inode with an invalid uid or gid
  2018-05-23 23:25 ` [REVIEW][PATCH 1/6] vfs: Don't allow changing the link count of an inode with an invalid uid or gid Eric W. Biederman
@ 2018-05-24 12:58   ` Seth Forshee
  2018-05-24 22:30     ` Christian Brauner
  0 siblings, 1 reply; 29+ messages in thread
From: Seth Forshee @ 2018-05-24 12:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Serge E. Hallyn,
	Christian Brauner, linux-kernel

On Wed, May 23, 2018 at 06:25:33PM -0500, Eric W. Biederman wrote:
> Changing the link count of an inode via unlink or link will cause a
> write back of that inode.  If the uids or gids are invalid (aka not known
> to the kernel) writing the inode back may change the uid or gid in the
> filesystem.   To prevent possible filesystem and to avoid the need for
> filesystem maintainers to worry about it don't allow operations on
> inodes with an invalid uid or gid.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Seth Forshee <seth.forshee@canonical.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 2/6] vfs: Allow userns root to call mknod on owned filesystems.
  2018-05-23 23:25 ` [REVIEW][PATCH 2/6] vfs: Allow userns root to call mknod on owned filesystems Eric W. Biederman
@ 2018-05-24 13:55   ` Seth Forshee
  2018-05-24 16:55     ` Eric W. Biederman
  2018-05-24 19:12   ` Christian Brauner
  1 sibling, 1 reply; 29+ messages in thread
From: Seth Forshee @ 2018-05-24 13:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Serge E. Hallyn,
	Christian Brauner, linux-kernel

On Wed, May 23, 2018 at 06:25:34PM -0500, Eric W. Biederman wrote:
> These filesystems already always set SB_I_NODEV so mknod will not be
> useful for gaining control of any devices no matter their permissions.
> This will allow overlayfs and applications to fakeroot to use device
> nodes to represent things on disk.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

For a normal filesystem this does seem safe enough.

However, I'd also like to see us allow unprivileged mounting for
overlayfs, and there we need to worry about whether this would allow a
mknod in an underlying filesystem which should not be allowed. That
mknod will be subject to this same check in the underlying filesystem
using the credentials of the user that mounted the overaly fs, which
should be sufficient to ensure that the mknod is permitted.

Thus this looks okay to me.

Acked-by: Seth Forshee <seth.forshee@canonical.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 5/6] capabilities: Allow privileged user in s_user_ns to set security.* xattrs
  2018-05-23 23:25 ` [REVIEW][PATCH 5/6] capabilities: Allow privileged user in s_user_ns to set security.* xattrs Eric W. Biederman
@ 2018-05-24 15:57   ` Christian Brauner
  0 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2018-05-24 15:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Seth Forshee, Serge E. Hallyn,
	linux-kernel

On Wed, May 23, 2018 at 06:25:37PM -0500, Eric W. Biederman wrote:
> A privileged user in s_user_ns will generally have the ability to
> manipulate the backing store and insert security.* xattrs into
> the filesystem directly. Therefore the kernel must be prepared to
> handle these xattrs from unprivileged mounts, and it makes little
> sense for commoncap to prevent writing these xattrs to the
> filesystem. The capability and LSM code have already been updated
> to appropriately handle xattrs from unprivileged mounts, so it
> is safe to loosen this restriction on setting xattrs.
> 
> The exception to this logic is that writing xattrs to a mounted
> filesystem may also cause the LSM inode_post_setxattr or
> inode_setsecurity callbacks to be invoked. SELinux will deny the
> xattr update by virtue of applying mountpoint labeling to
> unprivileged userns mounts, and Smack will deny the writes for
> any user without global CAP_MAC_ADMIN, so loosening the
> capability check in commoncap is safe in this respect as well.

Acked-by: Christian Brauner <christian@brauner.io>

> 
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

Note, I just talked to Serge. This should be Acked-by: Serge Hallyn <serge@hallyn.com>

> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  security/commoncap.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 1ce701fcb3f3..f4c33abd9959 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -919,6 +919,8 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
>  int cap_inode_setxattr(struct dentry *dentry, const char *name,
>  		       const void *value, size_t size, int flags)
>  {
> +	struct user_namespace *user_ns = dentry->d_sb->s_user_ns;
> +
>  	/* Ignore non-security xattrs */
>  	if (strncmp(name, XATTR_SECURITY_PREFIX,
>  			sizeof(XATTR_SECURITY_PREFIX) - 1) != 0)
> @@ -931,7 +933,7 @@ int cap_inode_setxattr(struct dentry *dentry, const char *name,
>  	if (strcmp(name, XATTR_NAME_CAPS) == 0)
>  		return 0;
>  
> -	if (!capable(CAP_SYS_ADMIN))
> +	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
>  		return -EPERM;
>  	return 0;
>  }
> @@ -949,6 +951,8 @@ int cap_inode_setxattr(struct dentry *dentry, const char *name,
>   */
>  int cap_inode_removexattr(struct dentry *dentry, const char *name)
>  {
> +	struct user_namespace *user_ns = dentry->d_sb->s_user_ns;
> +
>  	/* Ignore non-security xattrs */
>  	if (strncmp(name, XATTR_SECURITY_PREFIX,
>  			sizeof(XATTR_SECURITY_PREFIX) - 1) != 0)
> @@ -964,7 +968,7 @@ int cap_inode_removexattr(struct dentry *dentry, const char *name)
>  		return 0;
>  	}
>  
> -	if (!capable(CAP_SYS_ADMIN))
> +	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
>  		return -EPERM;
>  	return 0;
>  }
> -- 
> 2.14.1
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 4/6] fs: Allow superblock owner to access do_remount_sb()
  2018-05-23 23:25 ` [REVIEW][PATCH 4/6] fs: Allow superblock owner to access do_remount_sb() Eric W. Biederman
@ 2018-05-24 15:58   ` Christian Brauner
  2018-05-24 16:45     ` Eric W. Biederman
  0 siblings, 1 reply; 29+ messages in thread
From: Christian Brauner @ 2018-05-24 15:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel, Seth Forshee, linux-fsdevel

On Wed, May 23, 2018 at 06:25:36PM -0500, Eric W. Biederman wrote:
> Superblock level remounts are currently restricted to global
> CAP_SYS_ADMIN, as is the path for changing the root mount to
> read only on umount. Loosen both of these permission checks to
> also allow CAP_SYS_ADMIN in any namespace which is privileged
> towards the userns which originally mounted the filesystem.

Acked-by: Christian Brauner <christian@brauner.io>

> 
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

Note, I just talked to Serge. This should be Acked-by: Serge Hallyn <serge@hallyn.com>

> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  fs/namespace.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 5f75969adff1..8ddd14806799 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1590,7 +1590,7 @@ static int do_umount(struct mount *mnt, int flags)
>  		 * Special case for "unmounting" root ...
>  		 * we just try to remount it readonly.
>  		 */
> -		if (!capable(CAP_SYS_ADMIN))
> +		if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
>  			return -EPERM;
>  		down_write(&sb->s_umount);
>  		if (!sb_rdonly(sb))
> @@ -2333,7 +2333,7 @@ static int do_remount(struct path *path, int ms_flags, int sb_flags,
>  	down_write(&sb->s_umount);
>  	if (ms_flags & MS_BIND)
>  		err = change_mount_flags(path->mnt, ms_flags);
> -	else if (!capable(CAP_SYS_ADMIN))
> +	else if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
>  		err = -EPERM;
>  	else
>  		err = do_remount_sb(sb, sb_flags, data, 0);
> -- 
> 2.14.1
> 
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 6/6] fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
  2018-05-23 23:25 ` [REVIEW][PATCH 6/6] fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems Eric W. Biederman
@ 2018-05-24 15:59   ` Christian Brauner
  0 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2018-05-24 15:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Seth Forshee, Serge E. Hallyn,
	linux-kernel

On Wed, May 23, 2018 at 06:25:38PM -0500, Eric W. Biederman wrote:
> From: Seth Forshee <seth.forshee@canonical.com>
> 
> The user in control of a super block should be allowed to freeze
> and thaw it. Relax the restrictions on the FIFREEZE and FITHAW
> ioctls to require CAP_SYS_ADMIN in s_user_ns.

Acked-by: Christian Brauner <christian@brauner.io>

> 
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  fs/ioctl.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index 4823431d1c9d..b445b13fc59b 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -549,7 +549,7 @@ static int ioctl_fsfreeze(struct file *filp)
>  {
>  	struct super_block *sb = file_inode(filp)->i_sb;
>  
> -	if (!capable(CAP_SYS_ADMIN))
> +	if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
>  		return -EPERM;
>  
>  	/* If filesystem doesn't support freeze feature, return. */
> @@ -566,7 +566,7 @@ static int ioctl_fsthaw(struct file *filp)
>  {
>  	struct super_block *sb = file_inode(filp)->i_sb;
>  
> -	if (!capable(CAP_SYS_ADMIN))
> +	if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
>  		return -EPERM;
>  
>  	/* Thaw */
> -- 
> 2.14.1
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 4/6] fs: Allow superblock owner to access do_remount_sb()
  2018-05-24 15:58   ` Christian Brauner
@ 2018-05-24 16:45     ` Eric W. Biederman
  2018-05-24 17:28       ` Christian Brauner
  0 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2018-05-24 16:45 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linux Containers, linux-kernel, Seth Forshee, linux-fsdevel

Christian Brauner <christian@brauner.io> writes:

> On Wed, May 23, 2018 at 06:25:36PM -0500, Eric W. Biederman wrote:
>> Superblock level remounts are currently restricted to global
>> CAP_SYS_ADMIN, as is the path for changing the root mount to
>> read only on umount. Loosen both of these permission checks to
>> also allow CAP_SYS_ADMIN in any namespace which is privileged
>> towards the userns which originally mounted the filesystem.
>
> Acked-by: Christian Brauner <christian@brauner.io>
>
>> 
>> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
>> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
>
> Note, I just talked to Serge. This should be Acked-by: Serge Hallyn <serge@hallyn.com>

Now you know how long these patches have been sitting waiting to get
merged.

Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 2/6] vfs: Allow userns root to call mknod on owned filesystems.
  2018-05-24 13:55   ` Seth Forshee
@ 2018-05-24 16:55     ` Eric W. Biederman
  2018-05-24 17:22       ` Seth Forshee
  0 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2018-05-24 16:55 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Linux Containers, linux-fsdevel, Serge E. Hallyn,
	Christian Brauner, linux-kernel

Seth Forshee <seth.forshee@canonical.com> writes:

> On Wed, May 23, 2018 at 06:25:34PM -0500, Eric W. Biederman wrote:
>> These filesystems already always set SB_I_NODEV so mknod will not be
>> useful for gaining control of any devices no matter their permissions.
>> This will allow overlayfs and applications to fakeroot to use device
>> nodes to represent things on disk.
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> For a normal filesystem this does seem safe enough.
>
> However, I'd also like to see us allow unprivileged mounting for
> overlayfs, and there we need to worry about whether this would allow a
> mknod in an underlying filesystem which should not be allowed. That
> mknod will be subject to this same check in the underlying filesystem
> using the credentials of the user that mounted the overaly fs, which
> should be sufficient to ensure that the mknod is permitted.

Sufficient to ensure the mknod is not permitted on the underlying
filesystem.  I believe you mean.

> Thus this looks okay to me.
>
> Acked-by: Seth Forshee <seth.forshee@canonical.com>

Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 2/6] vfs: Allow userns root to call mknod on owned filesystems.
  2018-05-24 16:55     ` Eric W. Biederman
@ 2018-05-24 17:22       ` Seth Forshee
  0 siblings, 0 replies; 29+ messages in thread
From: Seth Forshee @ 2018-05-24 17:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Serge E. Hallyn,
	Christian Brauner, linux-kernel

On Thu, May 24, 2018 at 11:55:45AM -0500, Eric W. Biederman wrote:
> Seth Forshee <seth.forshee@canonical.com> writes:
> 
> > On Wed, May 23, 2018 at 06:25:34PM -0500, Eric W. Biederman wrote:
> >> These filesystems already always set SB_I_NODEV so mknod will not be
> >> useful for gaining control of any devices no matter their permissions.
> >> This will allow overlayfs and applications to fakeroot to use device
> >> nodes to represent things on disk.
> >> 
> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> >
> > For a normal filesystem this does seem safe enough.
> >
> > However, I'd also like to see us allow unprivileged mounting for
> > overlayfs, and there we need to worry about whether this would allow a
> > mknod in an underlying filesystem which should not be allowed. That
> > mknod will be subject to this same check in the underlying filesystem
> > using the credentials of the user that mounted the overaly fs, which
> > should be sufficient to ensure that the mknod is permitted.
> 
> Sufficient to ensure the mknod is not permitted on the underlying
> filesystem.  I believe you mean.

Right, or in other words with the relaxed capability check a user still
could not use an overlayfs mount in a user namespace to mknod in a
filesystem when that user couldn't otherwise mknod in that filesystem.
Sorry if I wasn't clear.

> 
> > Thus this looks okay to me.
> >
> > Acked-by: Seth Forshee <seth.forshee@canonical.com>
> 
> Eric
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 4/6] fs: Allow superblock owner to access do_remount_sb()
  2018-05-24 16:45     ` Eric W. Biederman
@ 2018-05-24 17:28       ` Christian Brauner
  0 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2018-05-24 17:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel, Seth Forshee, linux-fsdevel

On Thu, May 24, 2018 at 11:45:06AM -0500, Eric W. Biederman wrote:
> Christian Brauner <christian@brauner.io> writes:
> 
> > On Wed, May 23, 2018 at 06:25:36PM -0500, Eric W. Biederman wrote:
> >> Superblock level remounts are currently restricted to global
> >> CAP_SYS_ADMIN, as is the path for changing the root mount to
> >> read only on umount. Loosen both of these permission checks to
> >> also allow CAP_SYS_ADMIN in any namespace which is privileged
> >> towards the userns which originally mounted the filesystem.
> >
> > Acked-by: Christian Brauner <christian@brauner.io>
> >
> >> 
> >> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> >> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> >> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
> >
> > Note, I just talked to Serge. This should be Acked-by: Serge Hallyn <serge@hallyn.com>
> 
> Now you know how long these patches have been sitting waiting to get
> merged.

Indeed. :)

Christian

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 2/6] vfs: Allow userns root to call mknod on owned filesystems.
  2018-05-23 23:25 ` [REVIEW][PATCH 2/6] vfs: Allow userns root to call mknod on owned filesystems Eric W. Biederman
  2018-05-24 13:55   ` Seth Forshee
@ 2018-05-24 19:12   ` Christian Brauner
  1 sibling, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2018-05-24 19:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Seth Forshee, Serge E. Hallyn,
	linux-kernel

On Wed, May 23, 2018 at 06:25:34PM -0500, Eric W. Biederman wrote:
> These filesystems already always set SB_I_NODEV so mknod will not be
> useful for gaining control of any devices no matter their permissions.
> This will allow overlayfs and applications to fakeroot to use device
> nodes to represent things on disk.

Excellent.

Acked-by: Christian Brauner <christian@brauner.io>

> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  fs/namei.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 942c1f096f6b..20335896dcce 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3679,7 +3679,8 @@ int vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
>  	if (error)
>  		return error;
>  
> -	if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
> +	if ((S_ISCHR(mode) || S_ISBLK(mode)) &&
> +	    !ns_capable(dentry->d_sb->s_user_ns, CAP_MKNOD))
>  		return -EPERM;
>  
>  	if (!dir->i_op->mknod)
> -- 
> 2.14.1
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts
  2018-05-23 23:22 [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts Eric W. Biederman
                   ` (5 preceding siblings ...)
  2018-05-23 23:25 ` [REVIEW][PATCH 6/6] fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems Eric W. Biederman
@ 2018-05-24 21:46 ` Theodore Y. Ts'o
  2018-05-24 23:23   ` Eric W. Biederman
  2018-05-29 15:40 ` Dongsu Park
  7 siblings, 1 reply; 29+ messages in thread
From: Theodore Y. Ts'o @ 2018-05-24 21:46 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Seth Forshee, Serge E. Hallyn,
	Christian Brauner, linux-kernel

On Wed, May 23, 2018 at 06:22:56PM -0500, Eric W. Biederman wrote:
> 
> Very slowly the work has been progressing to ensure the vfs has the
> necessary support for mounting filesystems without privilege.

What's the thinking behind how system administrators and/or file
systems would configure whether or not a particular file system type
will be allowed to be mounted w/o privilege?

					- Ted

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH v2 3/6] fs: Allow superblock owner to replace invalid owners of inodes
  2018-05-23 23:41   ` [REVIEW][PATCH v2 " Eric W. Biederman
@ 2018-05-24 22:30     ` Christian Brauner
  0 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2018-05-24 22:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Seth Forshee, Serge E. Hallyn,
	linux-kernel

On Wed, May 23, 2018 at 06:41:29PM -0500, Eric W. Biederman wrote:
> 
> Allow users with CAP_SYS_CHOWN over the superblock of a filesystem to
> chown files when inode owner is invalid.  Ordinarily the
> capable_wrt_inode_uidgid check is sufficient to allow access to files
> but when the underlying filesystem has uids or gids that don't map to
> the current user namespace it is not enough, so the chown permission
> checks need to be extended to allow this case.
> 
> Calling chown on filesystem nodes whose uid or gid don't map is
> necessary if those nodes are going to be modified as writing back
> inodes which contain uids or gids that don't map is likely to cause
> filesystem corruption of the uid or gid fields.
> 
> Once chown has been called the existing capable_wrt_inode_uidgid
> checks are sufficient to allow the owner of a superblock to do anything
> the global root user can do with an appropriate set of capabilities.
> 
> An ordinary filesystem mountable by a userns root will limit all uids
> and gids in s_user_ns or the INVALID_UID and INVALID_GID to flag all
> others.  So having this added permission limited to just INVALID_UID
> and INVALID_GID is sufficient to handle every case on an ordinary filesystem.
> 
> Of the virtual filesystems at least proc is known to set s_user_ns to
> something other than &init_user_ns, while at the same time presenting
> some files owned by GLOBAL_ROOT_UID.  Those files the mounter of proc
> in a user namespace should not be able to chown to get access to.
> Limiting the relaxation in permission to just the minimum of allowing
> changing INVALID_UID and INVALID_GID prevents problems with cases like
> that.
> 
> The original version of this patch was written by: Seth Forshee.  I
> have rewritten and rethought this patch enough so it's really not the
> same thing (certainly it needs a different description), but he
> deserves credit for getting out there and getting the conversation
> started, and finding the potential gotcha's and putting up with my
> semi-paranoid feedback.

Ok, took me a little longer to reason about this.

Acked-by: Christian Brauner <christian@brauner.io>

> 
> Inspired-by: Seth Forshee <seth.forshee@canonical.com>
> Acked-by: Seth Forshee <seth.forshee@canonical.com>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
> 
> Sigh.  In simplifying this change so it would not require a change to
> proc (or any other similar filesystem) I accidentally introduced some
> badly placed semicolons.  The kbuild test robot was very nice and found
> those for me.  Resend with those unnecessary semicolons removed.
> 
>  fs/attr.c | 36 ++++++++++++++++++++++++++++--------
>  1 file changed, 28 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/attr.c b/fs/attr.c
> index 12ffdb6fb63c..d0b4d34878fb 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -18,6 +18,32 @@
>  #include <linux/evm.h>
>  #include <linux/ima.h>
>  
> +static bool chown_ok(const struct inode *inode, kuid_t uid)
> +{
> +	if (uid_eq(current_fsuid(), inode->i_uid) &&
> +	    uid_eq(uid, inode->i_uid))
> +		return true;
> +	if (capable_wrt_inode_uidgid(inode, CAP_CHOWN))
> +		return true;
> +	if (uid_eq(inode->i_uid, INVALID_UID) &&
> +	    ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
> +		return true;
> +	return false;
> +}
> +
> +static bool chgrp_ok(const struct inode *inode, kgid_t gid)
> +{
> +	if (uid_eq(current_fsuid(), inode->i_uid) &&
> +	    (in_group_p(gid) || gid_eq(gid, inode->i_gid)))
> +		return true;
> +	if (capable_wrt_inode_uidgid(inode, CAP_CHOWN))
> +		return true;
> +	if (gid_eq(inode->i_gid, INVALID_GID) &&
> +	    ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
> +		return true;
> +	return false;
> +}
> +
>  /**
>   * setattr_prepare - check if attribute changes to a dentry are allowed
>   * @dentry:	dentry to check
> @@ -52,17 +78,11 @@ int setattr_prepare(struct dentry *dentry, struct iattr *attr)
>  		goto kill_priv;
>  
>  	/* Make sure a caller can chown. */
> -	if ((ia_valid & ATTR_UID) &&
> -	    (!uid_eq(current_fsuid(), inode->i_uid) ||
> -	     !uid_eq(attr->ia_uid, inode->i_uid)) &&
> -	    !capable_wrt_inode_uidgid(inode, CAP_CHOWN))
> +	if ((ia_valid & ATTR_UID) && !chown_ok(inode, attr->ia_uid))
>  		return -EPERM;
>  
>  	/* Make sure caller can chgrp. */
> -	if ((ia_valid & ATTR_GID) &&
> -	    (!uid_eq(current_fsuid(), inode->i_uid) ||
> -	    (!in_group_p(attr->ia_gid) && !gid_eq(attr->ia_gid, inode->i_gid))) &&
> -	    !capable_wrt_inode_uidgid(inode, CAP_CHOWN))
> +	if ((ia_valid & ATTR_GID) && !chgrp_ok(inode, attr->ia_gid))
>  		return -EPERM;
>  
>  	/* Make sure a caller can chmod. */

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 1/6] vfs: Don't allow changing the link count of an inode with an invalid uid or gid
  2018-05-24 12:58   ` Seth Forshee
@ 2018-05-24 22:30     ` Christian Brauner
  0 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2018-05-24 22:30 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Eric W. Biederman, Linux Containers, linux-fsdevel,
	Serge E. Hallyn, linux-kernel

On Thu, May 24, 2018 at 07:58:32AM -0500, Seth Forshee wrote:
> On Wed, May 23, 2018 at 06:25:33PM -0500, Eric W. Biederman wrote:
> > Changing the link count of an inode via unlink or link will cause a
> > write back of that inode.  If the uids or gids are invalid (aka not known
> > to the kernel) writing the inode back may change the uid or gid in the
> > filesystem.   To prevent possible filesystem and to avoid the need for
> > filesystem maintainers to worry about it don't allow operations on
> > inodes with an invalid uid or gid.
> > 
> > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> Acked-by: Seth Forshee <seth.forshee@canonical.com>

Acked-by: Christian Brauner <christian@brauner.io>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts
  2018-05-24 21:46 ` [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts Theodore Y. Ts'o
@ 2018-05-24 23:23   ` Eric W. Biederman
  2018-05-25  3:57     ` Dave Chinner
  0 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2018-05-24 23:23 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Linux Containers, linux-fsdevel, Seth Forshee, Serge E. Hallyn,
	Christian Brauner, linux-kernel

"Theodore Y. Ts'o" <tytso@mit.edu> writes:

> On Wed, May 23, 2018 at 06:22:56PM -0500, Eric W. Biederman wrote:
>> 
>> Very slowly the work has been progressing to ensure the vfs has the
>> necessary support for mounting filesystems without privilege.
>
> What's the thinking behind how system administrators and/or file
> systems would configure whether or not a particular file system type
> will be allowed to be mounted w/o privilege?

The mechanism is .fs_flags in file_system_type.   If the FS_USERNS_MOUNT
flag is set then root in a user namespace (AKA an unprivileged user)
will be allowed to mount to mount the filesystem.

There are very real concerns about attacking a filesystem with an
invalid filesystem image, or by a malicious protocol speaker.  So I
don't want to enable anything without the file system maintainers
consent and without a reasonable expecation that neither a system wide
denial of service attack nor a privilege escalation attack is possible
from if the filesystem is enabled.

So at a practical level what we have in the vfs is the non-fuse specific
bits that enable unprivileged mounts of fuse.  Things like handling
of unmapped uid and gids, how normally trusted xattrs are dealt with,
etc.

A big practical one for me is that if either the uid or gid is not
mapped the vfs avoids writing to the inode.

Right now my practical goal is to be able to say: "Go run your
filesystem in userspace with fuse if you want stronger security
guarantees."  I think that will be enough to make removable media
reasonably safe from privilege escalation attacks.

There is enough code in most filesystems that I don't know what our
chances of locking down very many of them are.  But I figure a few more
of them are possible.

Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts
  2018-05-24 23:23   ` Eric W. Biederman
@ 2018-05-25  3:57     ` Dave Chinner
  2018-05-25  4:06       ` Darrick J. Wong
  2018-05-29 13:12       ` Eric W. Biederman
  0 siblings, 2 replies; 29+ messages in thread
From: Dave Chinner @ 2018-05-25  3:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Theodore Y. Ts'o, Linux Containers, linux-fsdevel,
	Seth Forshee, Serge E. Hallyn, Christian Brauner, linux-kernel

On Thu, May 24, 2018 at 06:23:30PM -0500, Eric W. Biederman wrote:
> "Theodore Y. Ts'o" <tytso@mit.edu> writes:
> 
> > On Wed, May 23, 2018 at 06:22:56PM -0500, Eric W. Biederman wrote:
> >> 
> >> Very slowly the work has been progressing to ensure the vfs has the
> >> necessary support for mounting filesystems without privilege.
> >
> > What's the thinking behind how system administrators and/or file
> > systems would configure whether or not a particular file system type
> > will be allowed to be mounted w/o privilege?
> 
> The mechanism is .fs_flags in file_system_type.   If the FS_USERNS_MOUNT
> flag is set then root in a user namespace (AKA an unprivileged user)
> will be allowed to mount to mount the filesystem.
> 
> There are very real concerns about attacking a filesystem with an
> invalid filesystem image, or by a malicious protocol speaker.  So I
> don't want to enable anything without the file system maintainers
> consent and without a reasonable expecation that neither a system wide
> denial of service attack nor a privilege escalation attack is possible
> from if the filesystem is enabled.
> 
> So at a practical level what we have in the vfs is the non-fuse specific
> bits that enable unprivileged mounts of fuse.  Things like handling
> of unmapped uid and gids, how normally trusted xattrs are dealt with,
> etc.
> 
> A big practical one for me is that if either the uid or gid is not
> mapped the vfs avoids writing to the inode.
> 
> Right now my practical goal is to be able to say: "Go run your
> filesystem in userspace with fuse if you want stronger security
> guarantees."  I think that will be enough to make removable media
> reasonably safe from privilege escalation attacks.
> 
> There is enough code in most filesystems that I don't know what our
> chances of locking down very many of them are.  But I figure a few more
> of them are possible.

I'm not sure we need to - fusefs-lkl gives users the ability to
mount any of the kernel filesystems via fuse without us needing to
support unprivileged kernel mounts for those filesystems.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts
  2018-05-25  3:57     ` Dave Chinner
@ 2018-05-25  4:06       ` Darrick J. Wong
  2018-05-29 13:12       ` Eric W. Biederman
  1 sibling, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2018-05-25  4:06 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Eric W. Biederman, Theodore Y. Ts'o, Linux Containers,
	linux-fsdevel, Seth Forshee, Serge E. Hallyn, Christian Brauner,
	linux-kernel

On Fri, May 25, 2018 at 01:57:16PM +1000, Dave Chinner wrote:
> On Thu, May 24, 2018 at 06:23:30PM -0500, Eric W. Biederman wrote:
> > "Theodore Y. Ts'o" <tytso@mit.edu> writes:
> > 
> > > On Wed, May 23, 2018 at 06:22:56PM -0500, Eric W. Biederman wrote:
> > >> 
> > >> Very slowly the work has been progressing to ensure the vfs has the
> > >> necessary support for mounting filesystems without privilege.
> > >
> > > What's the thinking behind how system administrators and/or file
> > > systems would configure whether or not a particular file system type
> > > will be allowed to be mounted w/o privilege?
> > 
> > The mechanism is .fs_flags in file_system_type.   If the FS_USERNS_MOUNT
> > flag is set then root in a user namespace (AKA an unprivileged user)
> > will be allowed to mount to mount the filesystem.
> > 
> > There are very real concerns about attacking a filesystem with an
> > invalid filesystem image, or by a malicious protocol speaker.  So I
> > don't want to enable anything without the file system maintainers
> > consent and without a reasonable expecation that neither a system wide
> > denial of service attack nor a privilege escalation attack is possible
> > from if the filesystem is enabled.
> > 
> > So at a practical level what we have in the vfs is the non-fuse specific
> > bits that enable unprivileged mounts of fuse.  Things like handling
> > of unmapped uid and gids, how normally trusted xattrs are dealt with,
> > etc.
> > 
> > A big practical one for me is that if either the uid or gid is not
> > mapped the vfs avoids writing to the inode.
> > 
> > Right now my practical goal is to be able to say: "Go run your
> > filesystem in userspace with fuse if you want stronger security
> > guarantees."  I think that will be enough to make removable media
> > reasonably safe from privilege escalation attacks.
> > 
> > There is enough code in most filesystems that I don't know what our
> > chances of locking down very many of them are.  But I figure a few more
> > of them are possible.
> 
> I'm not sure we need to - fusefs-lkl gives users the ability to
> mount any of the kernel filesystems via fuse without us needing to
> support unprivileged kernel mounts for those filesystems.

/me wonders, is there a fusefs-lkl package for Linux?

(He says, knowing that freebsd has one... :))

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts
  2018-05-25  3:57     ` Dave Chinner
  2018-05-25  4:06       ` Darrick J. Wong
@ 2018-05-29 13:12       ` Eric W. Biederman
  2018-05-29 22:17         ` Dave Chinner
  1 sibling, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2018-05-29 13:12 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Theodore Y. Ts'o, Linux Containers, linux-fsdevel,
	Seth Forshee, Serge E. Hallyn, Christian Brauner, linux-kernel

Dave Chinner <david@fromorbit.com> writes:

> On Thu, May 24, 2018 at 06:23:30PM -0500, Eric W. Biederman wrote:
>> "Theodore Y. Ts'o" <tytso@mit.edu> writes:
>> 
>> > On Wed, May 23, 2018 at 06:22:56PM -0500, Eric W. Biederman wrote:
>> >> 
>> >> Very slowly the work has been progressing to ensure the vfs has the
>> >> necessary support for mounting filesystems without privilege.
>> >
>> > What's the thinking behind how system administrators and/or file
>> > systems would configure whether or not a particular file system type
>> > will be allowed to be mounted w/o privilege?
>> 
>> The mechanism is .fs_flags in file_system_type.   If the FS_USERNS_MOUNT
>> flag is set then root in a user namespace (AKA an unprivileged user)
>> will be allowed to mount to mount the filesystem.
>> 
>> There are very real concerns about attacking a filesystem with an
>> invalid filesystem image, or by a malicious protocol speaker.  So I
>> don't want to enable anything without the file system maintainers
>> consent and without a reasonable expecation that neither a system wide
>> denial of service attack nor a privilege escalation attack is possible
>> from if the filesystem is enabled.
>> 
>> So at a practical level what we have in the vfs is the non-fuse specific
>> bits that enable unprivileged mounts of fuse.  Things like handling
>> of unmapped uid and gids, how normally trusted xattrs are dealt with,
>> etc.
>> 
>> A big practical one for me is that if either the uid or gid is not
>> mapped the vfs avoids writing to the inode.
>> 
>> Right now my practical goal is to be able to say: "Go run your
>> filesystem in userspace with fuse if you want stronger security
>> guarantees."  I think that will be enough to make removable media
>> reasonably safe from privilege escalation attacks.
>> 
>> There is enough code in most filesystems that I don't know what our
>> chances of locking down very many of them are.  But I figure a few more
>> of them are possible.
>
> I'm not sure we need to - fusefs-lkl gives users the ability to
> mount any of the kernel filesystems via fuse without us needing to
> support unprivileged kernel mounts for those filesystems.

Maybe.

That certainly seems like a good proof of concept for running
ordinary filesystems with fuse.  If we are going to rely on it
someone probably needs to do the work to merge arch/lkl into the
main tree.  My quick look suggests that the lkl port lags behind
a little bit and has just made it to 4.16.

Is fusefs-lkl valuable for testing filesystems?  If xfs-tests were to
have a mode that used that used the fuse protocol for testing and
fuzzing filesystems without the full weight of the kernel in the middle
that might encourage people to suppor this kind of things as well.

Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts
  2018-05-23 23:22 [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts Eric W. Biederman
                   ` (6 preceding siblings ...)
  2018-05-24 21:46 ` [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts Theodore Y. Ts'o
@ 2018-05-29 15:40 ` Dongsu Park
  7 siblings, 0 replies; 29+ messages in thread
From: Dongsu Park @ 2018-05-29 15:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-fsdevel, Seth Forshee, LKML, Christian Brauner

Hi,

On Thu, May 24, 2018 at 1:22 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Very slowly the work has been progressing to ensure the vfs has the
> necessary support for mounting filesystems without privilege.
>
> This patchset contains one more core piece of that work, ensuring a few
> more operations that would write back an inode and confuse an exisiting
> filesystem are denied.
>
> The rest of the changes actually enable userns root to do things with
> filesystems that the userns root has mounted.  Most of these have been
> waiting in the wings a long time, held back because I wanted the core
> of the patchset to be solid before I started allowing additional
> behavor.
>
> It is definitely time for these changes so the effect of s_user_ns
> becomes less theoretical.
>
> The change to allow mknod is new, but consistent with everything else
> and harmless as device nodes on filesystems mounted without privilege
> are ignored.
>
> Unless problems show up in the during review I plan to merge these changes.

Thank you for the great work. I have been looking forward to seeing it.
I have just gathered available relevant patches in my branch:

https://github.com/kinvolk/linux/tree/dongsu/fuse-userns-for-4.18

With this branch, I tested sshfs/fuse from non-init user namespace.
It works fine as expected. So you can add:

Tested-by: Dongsu Park <dongsu@kinvolk.io>

Thanks!
Dongsu

> These changes are also available at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git userns-test
>
> Eric W. Biederman (5):
>       vfs: Don't allow changing the link count of an inode with an invalid uid or gid
>       vfs: Allow userns root to call mknod on owned filesystems.
>       fs: Allow superblock owner to replace invalid owners of inodes
>       fs: Allow superblock owner to access do_remount_sb()
>       capabilities: Allow privileged user in s_user_ns to set security.* xattrs
>
> Seth Forshee (1):
>       fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
>
>  fs/attr.c            | 36 ++++++++++++++++++++++++++++--------
>  fs/ioctl.c           |  4 ++--
>  fs/namei.c           | 16 ++++++++++++----
>  fs/namespace.c       |  4 ++--
>  security/commoncap.c |  8 ++++++--
>  5 files changed, 50 insertions(+), 18 deletions(-)
>
> Eric
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts
  2018-05-29 13:12       ` Eric W. Biederman
@ 2018-05-29 22:17         ` Dave Chinner
  2018-05-30  2:34           ` Eric W. Biederman
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2018-05-29 22:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Theodore Y. Ts'o, Linux Containers, linux-fsdevel,
	Seth Forshee, Serge E. Hallyn, Christian Brauner, linux-kernel

On Tue, May 29, 2018 at 08:12:28AM -0500, Eric W. Biederman wrote:
> Dave Chinner <david@fromorbit.com> writes:
> 
> > On Thu, May 24, 2018 at 06:23:30PM -0500, Eric W. Biederman wrote:
> >> "Theodore Y. Ts'o" <tytso@mit.edu> writes:
> >> 
> >> > On Wed, May 23, 2018 at 06:22:56PM -0500, Eric W. Biederman wrote:
> >> >> 
> >> >> Very slowly the work has been progressing to ensure the vfs has the
> >> >> necessary support for mounting filesystems without privilege.
> >> >
> >> > What's the thinking behind how system administrators and/or file
> >> > systems would configure whether or not a particular file system type
> >> > will be allowed to be mounted w/o privilege?
> >> 
> >> The mechanism is .fs_flags in file_system_type.   If the FS_USERNS_MOUNT
> >> flag is set then root in a user namespace (AKA an unprivileged user)
> >> will be allowed to mount to mount the filesystem.
> >> 
> >> There are very real concerns about attacking a filesystem with an
> >> invalid filesystem image, or by a malicious protocol speaker.  So I
> >> don't want to enable anything without the file system maintainers
> >> consent and without a reasonable expecation that neither a system wide
> >> denial of service attack nor a privilege escalation attack is possible
> >> from if the filesystem is enabled.
> >> 
> >> So at a practical level what we have in the vfs is the non-fuse specific
> >> bits that enable unprivileged mounts of fuse.  Things like handling
> >> of unmapped uid and gids, how normally trusted xattrs are dealt with,
> >> etc.
> >> 
> >> A big practical one for me is that if either the uid or gid is not
> >> mapped the vfs avoids writing to the inode.
> >> 
> >> Right now my practical goal is to be able to say: "Go run your
> >> filesystem in userspace with fuse if you want stronger security
> >> guarantees."  I think that will be enough to make removable media
> >> reasonably safe from privilege escalation attacks.
> >> 
> >> There is enough code in most filesystems that I don't know what our
> >> chances of locking down very many of them are.  But I figure a few more
> >> of them are possible.
> >
> > I'm not sure we need to - fusefs-lkl gives users the ability to
> > mount any of the kernel filesystems via fuse without us needing to
> > support unprivileged kernel mounts for those filesystems.
> 
> Maybe.
> 
> That certainly seems like a good proof of concept for running
> ordinary filesystems with fuse.  If we are going to rely on it
> someone probably needs to do the work to merge arch/lkl into the
> main tree.  My quick look suggests that the lkl port lags behind
> a little bit and has just made it to 4.16.

Yeah, the are some fairly big process and policy things that need
to be decided here. Not just at the kernel level, but at distro and
app infrastructure level too.

I was originally sceptical of supporting kernel filesystems via lkl,
but the desire for unprivileged mounts has not gone away and so I'm
less worried about accessing filesystems that way than I am of
letting the kernel parse untrusted images from untrusted users...

I'm not sure what the correct forum for this is - wasn't this
something the Plumbers conference was supposed to facilitate?

> Is fusefs-lkl valuable for testing filesystems?  If xfs-tests were to
> have a mode that used that used the fuse protocol for testing and
> fuzzing filesystems without the full weight of the kernel in the middle
> that might encourage people to suppor this kind of things as well.

Getting lkl-fuse to run under fstests would be a great way to ensure
we have some level of confidence that it will do the right thing and
users can expect that it won't eat their data. I think this would
need to be a part of a recommendation for wider deploy of such a
solution...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts
  2018-05-29 22:17         ` Dave Chinner
@ 2018-05-30  2:34           ` Eric W. Biederman
  2018-05-30  4:34             ` Dave Chinner
  0 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2018-05-30  2:34 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Theodore Y. Ts'o, Linux Containers, linux-fsdevel,
	Seth Forshee, Serge E. Hallyn, Christian Brauner, linux-kernel

Dave Chinner <david@fromorbit.com> writes:

> Yeah, the are some fairly big process and policy things that need
> to be decided here. Not just at the kernel level, but at distro and
> app infrastructure level too.
>
> I was originally sceptical of supporting kernel filesystems via lkl,
> but the desire for unprivileged mounts has not gone away and so I'm
> less worried about accessing filesystems that way than I am of
> letting the kernel parse untrusted images from untrusted users...

There is also the more readily available libguestfs which doesn't
support as many filesystems but does seem available in most
linux distributions already.  It already has a fuse option available
with guestmount.  I may have to dig in there and see how to make
it available without using fusermount.

> I'm not sure what the correct forum for this is - wasn't this
> something the Plumbers conference was supposed to facilitate?

Yes.  If we all need to be in a room and talk about things.
It is early enough in the planning for Plumers that we could
definitely schedule a talk or a BOF for this.

>> Is fusefs-lkl valuable for testing filesystems?  If xfs-tests were to
>> have a mode that used that used the fuse protocol for testing and
>> fuzzing filesystems without the full weight of the kernel in the middle
>> that might encourage people to suppor this kind of things as well.
>
> Getting lkl-fuse to run under fstests would be a great way to ensure
> we have some level of confidence that it will do the right thing and
> users can expect that it won't eat their data. I think this would
> need to be a part of a recommendation for wider deploy of such a
> solution...

Good thought.  I will have to give that a look.  That does sound like a
good practical test.

Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts
  2018-05-30  2:34           ` Eric W. Biederman
@ 2018-05-30  4:34             ` Dave Chinner
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Chinner @ 2018-05-30  4:34 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Theodore Y. Ts'o, Linux Containers, linux-fsdevel,
	Seth Forshee, Serge E. Hallyn, Christian Brauner, linux-kernel

On Tue, May 29, 2018 at 09:34:35PM -0500, Eric W. Biederman wrote:
> Dave Chinner <david@fromorbit.com> writes:
> 
> > Yeah, the are some fairly big process and policy things that
> > need to be decided here. Not just at the kernel level, but at
> > distro and app infrastructure level too.
> >
> > I was originally sceptical of supporting kernel filesystems via
> > lkl, but the desire for unprivileged mounts has not gone away
> > and so I'm less worried about accessing filesystems that way
> > than I am of letting the kernel parse untrusted images from
> > untrusted users...
> 
> There is also the more readily available libguestfs which doesn't
> support as many filesystems but does seem available in most linux
> distributions already.  It already has a fuse option available
> with guestmount.  I may have to dig in there and see how to make
> it available without using fusermount.

That only provides host access to filesystems mounted inside guest
VMs, right?  AFAIA, libguestfs is not providing a FUSE
implementation that mounts and parses raw XFS images. e.g it barely
understands anything XFS, and that which it does is via running and
screen-scraping the output of XFS's userspace management tools...

> > I'm not sure what the correct forum for this is - wasn't this
> > something the Plumbers conference was supposed to facilitate?
> 
> Yes.  If we all need to be in a room and talk about things.
> It is early enough in the planning for Plumers that we could
> definitely schedule a talk or a BOF for this.

Ok. I have no idea if I'll be at plumbers - it's an awful long way
from where I am....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2018-05-30  4:34 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-23 23:22 [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts Eric W. Biederman
2018-05-23 23:25 ` [REVIEW][PATCH 1/6] vfs: Don't allow changing the link count of an inode with an invalid uid or gid Eric W. Biederman
2018-05-24 12:58   ` Seth Forshee
2018-05-24 22:30     ` Christian Brauner
2018-05-23 23:25 ` [REVIEW][PATCH 2/6] vfs: Allow userns root to call mknod on owned filesystems Eric W. Biederman
2018-05-24 13:55   ` Seth Forshee
2018-05-24 16:55     ` Eric W. Biederman
2018-05-24 17:22       ` Seth Forshee
2018-05-24 19:12   ` Christian Brauner
2018-05-23 23:25 ` [REVIEW][PATCH 3/6] fs: Allow superblock owner to replace invalid owners of inodes Eric W. Biederman
2018-05-23 23:41   ` [REVIEW][PATCH v2 " Eric W. Biederman
2018-05-24 22:30     ` Christian Brauner
2018-05-23 23:25 ` [REVIEW][PATCH 4/6] fs: Allow superblock owner to access do_remount_sb() Eric W. Biederman
2018-05-24 15:58   ` Christian Brauner
2018-05-24 16:45     ` Eric W. Biederman
2018-05-24 17:28       ` Christian Brauner
2018-05-23 23:25 ` [REVIEW][PATCH 5/6] capabilities: Allow privileged user in s_user_ns to set security.* xattrs Eric W. Biederman
2018-05-24 15:57   ` Christian Brauner
2018-05-23 23:25 ` [REVIEW][PATCH 6/6] fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems Eric W. Biederman
2018-05-24 15:59   ` Christian Brauner
2018-05-24 21:46 ` [REVIEW][PATCH 0/6] Wrapping up the vfs support for unprivileged mounts Theodore Y. Ts'o
2018-05-24 23:23   ` Eric W. Biederman
2018-05-25  3:57     ` Dave Chinner
2018-05-25  4:06       ` Darrick J. Wong
2018-05-29 13:12       ` Eric W. Biederman
2018-05-29 22:17         ` Dave Chinner
2018-05-30  2:34           ` Eric W. Biederman
2018-05-30  4:34             ` Dave Chinner
2018-05-29 15:40 ` Dongsu Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).