linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] fs: improve setgid stripping consistency even more
@ 2022-10-07 14:05 Christian Brauner
  2022-10-07 14:05 ` [PATCH v2 1/5] attr: add setattr_drop_sgid() Christian Brauner
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Christian Brauner @ 2022-10-07 14:05 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi, Darrick J . Wong
  Cc: Christian Brauner (Microsoft),
	Al Viro, Christoph Hellwig, Seth Forshee, Yang Xu, Filipe Manana,
	linux-unionfs, linux-fsdevel

From: "Christian Brauner (Microsoft)" <brauner@kernel.org>

Hey everyone,

A long while ago I found a few setgid inheritance bugs in overlayfs in
certain conditions. Amir recently picked this back up in
https://lore.kernel.org/linux-fsdevel/20221003123040.900827-1-amir73il@gmail.com
and I jumped on board to fix this more generally. This series should
make setgid stripping more consistent and fix the related overlayfs bugs.

Currently setgid stripping in file_remove_privs()'s should_remove_suid()
helper is inconsistent with other parts of the vfs. Specifically, it only
raises ATTR_KILL_SGID if the inode is S_ISGID and S_IXGRP but not if the
inode isn't in the caller's groups and the caller isn't privileged over the
inode although we require this already in setattr_prepare() and
setattr_copy() and so all filesystem implement this requirement implicitly
because they have to use setattr_{prepare,copy}() anyway.

But the inconsistency shows up in setgid stripping bugs for overlayfs in
xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
generic/687). For example, we test whether suid and setgid stripping works
correctly when performing various write-like operations as an unprivileged
user (fallocate, reflink, write, etc.):

echo "Test 1 - qa_user, non-exec file $verb"
setup_testfile
chmod a+rws $junk_file
commit_and_check "$qa_user" "$verb" 64k 64k

The test basically creates a file with 6666 permissions. While the file has
the S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. On a
regular filesystem like xfs what will happen is:

sys_fallocate()
-> vfs_fallocate()
   -> xfs_file_fallocate()
      -> file_modified()
         -> __file_remove_privs()
            -> dentry_needs_remove_privs()
               -> should_remove_suid()
            -> __remove_privs()
               newattrs.ia_valid = ATTR_FORCE | kill;
               -> notify_change()
                  -> setattr_copy()

In should_remove_suid() we can see that ATTR_KILL_SUID is raised
unconditionally because the file in the test has S_ISUID set.

But we also see that ATTR_KILL_SGID won't be set because while the file
is S_ISGID it is not S_IXGRP (see above) which is a condition for
ATTR_KILL_SGID being raised.

So by the time we call notify_change() we have attr->ia_valid set to
ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
ATTR_KILL_SUID is set and does:

ia_valid = attr->ia_valid |= ATTR_MODE
attr->ia_mode = (inode->i_mode & ~S_ISUID);

which means that when we call setattr_copy() later we will definitely
update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.

Now we call into the filesystem's ->setattr() inode operation which will
end up calling setattr_copy(). Since ATTR_MODE is set we will hit:

if (ia_valid & ATTR_MODE) {
        umode_t mode = attr->ia_mode;
        vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
        if (!vfsgid_in_group_p(vfsgid) &&
            !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                mode &= ~S_ISGID;
        inode->i_mode = mode;
}

and since the caller in the test is neither capable nor in the group of the
inode the S_ISGID bit is stripped.

But assume the file isn't suid then ATTR_KILL_SUID won't be raised which
has the consequence that neither the setgid nor the suid bits are stripped
even though it should be stripped because the inode isn't in the caller's
groups and the caller isn't privileged over the inode.

If overlayfs is in the mix things become a bit more complicated and the bug
shows up more clearly. When e.g., ovl_setattr() is hit from
ovl_fallocate()'s call to file_remove_privs() then ATTR_KILL_SUID and
ATTR_KILL_SGID might be raised but because the check in notify_change() is
questioning the ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be
stripped the S_ISGID bit isn't removed even though it should be stripped:

sys_fallocate()
-> vfs_fallocate()
   -> ovl_fallocate()
      -> file_remove_privs()
         -> dentry_needs_remove_privs()
            -> should_remove_suid()
         -> __remove_privs()
            newattrs.ia_valid = ATTR_FORCE | kill;
            -> notify_change()
               -> ovl_setattr()
                  // TAKE ON MOUNTER'S CREDS
                  -> ovl_do_notify_change()
                     -> notify_change()
                  // GIVE UP MOUNTER'S CREDS
     // TAKE ON MOUNTER'S CREDS
     -> vfs_fallocate()
        -> xfs_file_fallocate()
           -> file_modified()
              -> __file_remove_privs()
                 -> dentry_needs_remove_privs()
                    -> should_remove_suid()
                 -> __remove_privs()
                    newattrs.ia_valid = attr_force | kill;
                    -> notify_change()

The fix for all of this is to make file_remove_privs()'s
should_remove_suid() helper to perform the same checks as we already
require in setattr_prepare() and setattr_copy() and have notify_change()
not pointlessly requiring S_IXGRP again. It doesn't make any sense in the
first place because the caller must calculate the flags via
should_remove_suid() anyway which would raise ATTR_KILL_SGID.

Running xfstests with this doesn't report any regressions. We should really
try and use consistent checks.

Thanks!
Christian

Amir Goldstein (2):
  ovl: remove privs in ovl_copyfile()
  ovl: remove privs in ovl_fallocate()

Christian Brauner (3):
  attr: add setattr_drop_sgid()
  attr: add should_remove_sgid()
  attr: use consistent sgid stripping checks

 fs/attr.c           | 58 ++++++++++++++++++++++++++++++++++++++++-----
 fs/fuse/file.c      |  2 +-
 fs/inode.c          | 24 +++++++------------
 fs/internal.h       |  5 +++-
 fs/ocfs2/file.c     |  4 ++--
 fs/open.c           |  8 +++----
 fs/overlayfs/file.c | 28 +++++++++++++++++++---
 include/linux/fs.h  |  2 +-
 8 files changed, 97 insertions(+), 34 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/5] attr: add setattr_drop_sgid()
  2022-10-07 14:05 [PATCH v2 0/5] fs: improve setgid stripping consistency even more Christian Brauner
@ 2022-10-07 14:05 ` Christian Brauner
  2022-10-11  8:11   ` Amir Goldstein
  2022-10-07 14:05 ` [PATCH v2 2/5] attr: add should_remove_sgid() Christian Brauner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Christian Brauner @ 2022-10-07 14:05 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi, Darrick J . Wong
  Cc: Christian Brauner, Al Viro, Christoph Hellwig, Seth Forshee,
	Yang Xu, Filipe Manana, linux-unionfs, linux-fsdevel

In setattr_{copy,prepare}() we need to perform the same permission
checks to determine whether we need to drop the setgid bit or not.
Instead of open-coding it twice add a simple helper the encapsulates the
logic. We will reuse this helpers to make dropping the setgid bit during
write operations more consistent in a follow up patch.

Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---

Notes:
    /* v2 */
    patch added

 fs/attr.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index 1552a5f23d6b..b1cff6f5b715 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -18,6 +18,27 @@
 #include <linux/evm.h>
 #include <linux/ima.h>
 
+/**
+ * setattr_drop_sgid - check generic setgid permissions
+ * @mnt_userns:	User namespace of the mount the inode was created from
+ * @inode: inode to check
+ * @vfsgid: the new/current vfsgid of @inode
+ *
+ * This function determines whether the setgid bit needs to be removed because
+ * the caller lacks privileges over the inode.
+ *
+ * Return: true if the setgid bit needs to be removed, false if not.
+ */
+static bool setattr_drop_sgid(struct user_namespace *mnt_userns,
+			      const struct inode *inode, vfsgid_t vfsgid)
+{
+	if (vfsgid_in_group_p(vfsgid))
+		return false;
+	if (capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
+		return false;
+	return true;
+}
+
 /**
  * chown_ok - verify permissions to chown inode
  * @mnt_userns:	user namespace of the mount @inode was found from
@@ -140,8 +161,7 @@ int setattr_prepare(struct user_namespace *mnt_userns, struct dentry *dentry,
 			vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
 
 		/* Also check the setgid bit! */
-		if (!vfsgid_in_group_p(vfsgid) &&
-		    !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
+		if (setattr_drop_sgid(mnt_userns, inode, vfsgid))
 			attr->ia_mode &= ~S_ISGID;
 	}
 
@@ -251,9 +271,8 @@ void setattr_copy(struct user_namespace *mnt_userns, struct inode *inode,
 		inode->i_ctime = attr->ia_ctime;
 	if (ia_valid & ATTR_MODE) {
 		umode_t mode = attr->ia_mode;
-		vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
-		if (!vfsgid_in_group_p(vfsgid) &&
-		    !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
+		if (setattr_drop_sgid(mnt_userns, inode,
+				      i_gid_into_vfsgid(mnt_userns, inode)))
 			mode &= ~S_ISGID;
 		inode->i_mode = mode;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/5] attr: add should_remove_sgid()
  2022-10-07 14:05 [PATCH v2 0/5] fs: improve setgid stripping consistency even more Christian Brauner
  2022-10-07 14:05 ` [PATCH v2 1/5] attr: add setattr_drop_sgid() Christian Brauner
@ 2022-10-07 14:05 ` Christian Brauner
       [not found]   ` <202210080357.inSALqdT-lkp@intel.com>
  2022-10-11  8:18   ` Amir Goldstein
  2022-10-07 14:05 ` [PATCH v2 3/5] attr: use consistent sgid stripping checks Christian Brauner
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 15+ messages in thread
From: Christian Brauner @ 2022-10-07 14:05 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi, Darrick J . Wong
  Cc: Christian Brauner, Al Viro, Christoph Hellwig, Seth Forshee,
	Yang Xu, Filipe Manana, linux-unionfs, linux-fsdevel

The current setgid stripping logic during write and ownership change
operations is inconsistent and strewn over multiple places. In order to
consolidate it and make more consistent we'll add a new helper
should_remove_sgid(). The function retains the old behavior where we
remove the S_ISGID bit unconditionally when S_IXGRP is set but also when
it isn't set and the caller is neither in the group of the inode nor
privileged over the inode.

We will use this helper both in write operation permission removal such
as file_remove_privs() as well as in ownership change operations.

Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---

Notes:
    /* v2 */
    Dave Chinner <dchinner@redhat.com>:
    - Use easier to follow logic in the new helper.

 fs/attr.c     | 27 +++++++++++++++++++++++++++
 fs/internal.h |  2 ++
 2 files changed, 29 insertions(+)

diff --git a/fs/attr.c b/fs/attr.c
index b1cff6f5b715..d0bb1dae425e 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -39,6 +39,33 @@ static bool setattr_drop_sgid(struct user_namespace *mnt_userns,
 	return true;
 }
 
+/**
+ * should_remove_sgid - determine whether the setgid bit needs to be removed
+ * @mnt_userns:	User namespace of the mount the inode was created from
+ * @inode: inode to check
+ *
+ * This function determines whether the setgid bit needs to be removed.
+ * We retain backwards compatibility and require setgid bit to be removed
+ * unconditionally if S_IXGRP is set. Otherwise we have the exact same
+ * requirements as setattr_prepare() and setattr_copy().
+ *
+ * Return: ATTR_KILL_SGID if setgid bit needs to be removed, 0 otherwise.
+ */
+int should_remove_sgid(struct user_namespace *mnt_userns,
+		       const struct inode *inode)
+{
+	umode_t mode = inode->i_mode;
+
+	if (!(mode & S_ISGID))
+		return 0;
+	if (mode & S_IXGRP)
+		return ATTR_KILL_SGID;
+	if (setattr_drop_sgid(mnt_userns, inode,
+			      i_gid_into_vfsgid(mnt_userns, inode)))
+		return ATTR_KILL_SGID;
+	return 0;
+}
+
 /**
  * chown_ok - verify permissions to chown inode
  * @mnt_userns:	user namespace of the mount @inode was found from
diff --git a/fs/internal.h b/fs/internal.h
index 87e96b9024ce..9d165ab65a2a 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -221,3 +221,5 @@ ssize_t do_getxattr(struct user_namespace *mnt_userns,
 int setxattr_copy(const char __user *name, struct xattr_ctx *ctx);
 int do_setxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 		struct xattr_ctx *ctx);
+int should_remove_sgid(struct user_namespace *mnt_userns,
+		       const struct inode *inode);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/5] attr: use consistent sgid stripping checks
  2022-10-07 14:05 [PATCH v2 0/5] fs: improve setgid stripping consistency even more Christian Brauner
  2022-10-07 14:05 ` [PATCH v2 1/5] attr: add setattr_drop_sgid() Christian Brauner
  2022-10-07 14:05 ` [PATCH v2 2/5] attr: add should_remove_sgid() Christian Brauner
@ 2022-10-07 14:05 ` Christian Brauner
  2022-10-11  8:43   ` Amir Goldstein
  2022-10-07 14:05 ` [PATCH v2 4/5] ovl: remove privs in ovl_copyfile() Christian Brauner
  2022-10-07 14:05 ` [PATCH v2 5/5] ovl: remove privs in ovl_fallocate() Christian Brauner
  4 siblings, 1 reply; 15+ messages in thread
From: Christian Brauner @ 2022-10-07 14:05 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi, Darrick J . Wong
  Cc: Christian Brauner, Al Viro, Christoph Hellwig, Seth Forshee,
	Yang Xu, Filipe Manana, linux-unionfs, linux-fsdevel

Currently setgid stripping in file_remove_privs()'s should_remove_suid()
helper is inconsistent with other parts of the vfs. Specifically, it only
raises ATTR_KILL_SGID if the inode is S_ISGID and S_IXGRP but not if the
inode isn't in the caller's groups and the caller isn't privileged over the
inode although we require this already in setattr_prepare() and
setattr_copy() and so all filesystem implement this requirement implicitly
because they have to use setattr_{prepare,copy}() anyway.

But the inconsistency shows up in setgid stripping bugs for overlayfs in
xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
generic/687). For example, we test whether suid and setgid stripping works
correctly when performing various write-like operations as an unprivileged
user (fallocate, reflink, write, etc.):

echo "Test 1 - qa_user, non-exec file $verb"
setup_testfile
chmod a+rws $junk_file
commit_and_check "$qa_user" "$verb" 64k 64k

The test basically creates a file with 6666 permissions. While the file has
the S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. On a
regular filesystem like xfs what will happen is:

sys_fallocate()
-> vfs_fallocate()
   -> xfs_file_fallocate()
      -> file_modified()
         -> __file_remove_privs()
            -> dentry_needs_remove_privs()
               -> should_remove_suid()
            -> __remove_privs()
               newattrs.ia_valid = ATTR_FORCE | kill;
               -> notify_change()
                  -> setattr_copy()

In should_remove_suid() we can see that ATTR_KILL_SUID is raised
unconditionally because the file in the test has S_ISUID set.

But we also see that ATTR_KILL_SGID won't be set because while the file
is S_ISGID it is not S_IXGRP (see above) which is a condition for
ATTR_KILL_SGID being raised.

So by the time we call notify_change() we have attr->ia_valid set to
ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
ATTR_KILL_SUID is set and does:

ia_valid = attr->ia_valid |= ATTR_MODE
attr->ia_mode = (inode->i_mode & ~S_ISUID);

which means that when we call setattr_copy() later we will definitely
update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.

Now we call into the filesystem's ->setattr() inode operation which will
end up calling setattr_copy(). Since ATTR_MODE is set we will hit:

if (ia_valid & ATTR_MODE) {
        umode_t mode = attr->ia_mode;
        vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
        if (!vfsgid_in_group_p(vfsgid) &&
            !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                mode &= ~S_ISGID;
        inode->i_mode = mode;
}

and since the caller in the test is neither capable nor in the group of the
inode the S_ISGID bit is stripped.

But assume the file isn't suid then ATTR_KILL_SUID won't be raised which
has the consequence that neither the setgid nor the suid bits are stripped
even though it should be stripped because the inode isn't in the caller's
groups and the caller isn't privileged over the inode.

If overlayfs is in the mix things become a bit more complicated and the bug
shows up more clearly. When e.g., ovl_setattr() is hit from
ovl_fallocate()'s call to file_remove_privs() then ATTR_KILL_SUID and
ATTR_KILL_SGID might be raised but because the check in notify_change() is
questioning the ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be
stripped the S_ISGID bit isn't removed even though it should be stripped:

sys_fallocate()
-> vfs_fallocate()
   -> ovl_fallocate()
      -> file_remove_privs()
         -> dentry_needs_remove_privs()
            -> should_remove_suid()
         -> __remove_privs()
            newattrs.ia_valid = ATTR_FORCE | kill;
            -> notify_change()
               -> ovl_setattr()
                  // TAKE ON MOUNTER'S CREDS
                  -> ovl_do_notify_change()
                     -> notify_change()
                  // GIVE UP MOUNTER'S CREDS
     // TAKE ON MOUNTER'S CREDS
     -> vfs_fallocate()
        -> xfs_file_fallocate()
           -> file_modified()
              -> __file_remove_privs()
                 -> dentry_needs_remove_privs()
                    -> should_remove_suid()
                 -> __remove_privs()
                    newattrs.ia_valid = attr_force | kill;
                    -> notify_change()

The fix for all of this is to make file_remove_privs()'s
should_remove_suid() helper to perform the same checks as we already
require in setattr_prepare() and setattr_copy() and have notify_change()
not pointlessly requiring S_IXGRP again. It doesn't make any sense in the
first place because the caller must calculate the flags via
should_remove_suid() anyway which would raise ATTR_KILL_SGID.

Running xfstests with this doesn't report any regressions. We should really
try and use consistent checks.

Co-Developed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---

Notes:
    /* v2 */
    Amir Goldstein <amir73il@gmail.com>:
    - mention xfstests that failed prior to that
    
    Christian Brauner <brauner@kernel.org>:
    - Use should_remove_sgid() in chown_common() just like we do in do_truncate().

 fs/attr.c          |  2 +-
 fs/fuse/file.c     |  2 +-
 fs/inode.c         | 24 ++++++++----------------
 fs/internal.h      |  3 ++-
 fs/ocfs2/file.c    |  4 ++--
 fs/open.c          |  8 ++++----
 include/linux/fs.h |  2 +-
 7 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index d0bb1dae425e..888b34e8c268 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -421,7 +421,7 @@ int notify_change(struct user_namespace *mnt_userns, struct dentry *dentry,
 		}
 	}
 	if (ia_valid & ATTR_KILL_SGID) {
-		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
+		if (mode & S_ISGID) {
 			if (!(ia_valid & ATTR_MODE)) {
 				ia_valid = attr->ia_valid |= ATTR_MODE;
 				attr->ia_mode = inode->i_mode;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1a3afd469e3a..fccc2c7e88fd 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1313,7 +1313,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 			return err;
 
 		if (fc->handle_killpriv_v2 &&
-		    should_remove_suid(file_dentry(file))) {
+		    should_remove_suid(&init_user_ns, file_dentry(file))) {
 			goto writethrough;
 		}
 
diff --git a/fs/inode.c b/fs/inode.c
index ba1de23c13c1..092a66324c65 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1949,26 +1949,17 @@ void touch_atime(const struct path *path)
 }
 EXPORT_SYMBOL(touch_atime);
 
-/*
- * The logic we want is
- *
- *	if suid or (sgid and xgrp)
- *		remove privs
- */
-int should_remove_suid(struct dentry *dentry)
+int should_remove_suid(struct user_namespace *mnt_userns, struct dentry *dentry)
 {
-	umode_t mode = d_inode(dentry)->i_mode;
+	struct inode *inode = d_inode(dentry);
+	umode_t mode = inode->i_mode;
 	int kill = 0;
 
 	/* suid always must be killed */
 	if (unlikely(mode & S_ISUID))
 		kill = ATTR_KILL_SUID;
 
-	/*
-	 * sgid without any exec bits is just a mandatory locking mark; leave
-	 * it alone.  If some exec bits are set, it's a real sgid; kill it.
-	 */
-	if (unlikely((mode & S_ISGID) && (mode & S_IXGRP)))
+	if (unlikely(should_remove_sgid(mnt_userns, inode)))
 		kill |= ATTR_KILL_SGID;
 
 	if (unlikely(kill && !capable(CAP_FSETID) && S_ISREG(mode)))
@@ -1983,7 +1974,8 @@ EXPORT_SYMBOL(should_remove_suid);
  * response to write or truncate. Return 0 if nothing has to be changed.
  * Negative value on error (change should be denied).
  */
-int dentry_needs_remove_privs(struct dentry *dentry)
+int dentry_needs_remove_privs(struct user_namespace *mnt_userns,
+			      struct dentry *dentry)
 {
 	struct inode *inode = d_inode(dentry);
 	int mask = 0;
@@ -1992,7 +1984,7 @@ int dentry_needs_remove_privs(struct dentry *dentry)
 	if (IS_NOSEC(inode))
 		return 0;
 
-	mask = should_remove_suid(dentry);
+	mask = should_remove_suid(mnt_userns, dentry);
 	ret = security_inode_need_killpriv(dentry);
 	if (ret < 0)
 		return ret;
@@ -2024,7 +2016,7 @@ static int __file_remove_privs(struct file *file, unsigned int flags)
 	if (IS_NOSEC(inode) || !S_ISREG(inode->i_mode))
 		return 0;
 
-	kill = dentry_needs_remove_privs(dentry);
+	kill = dentry_needs_remove_privs(file_mnt_user_ns(file), dentry);
 	if (kill < 0)
 		return kill;
 
diff --git a/fs/internal.h b/fs/internal.h
index 9d165ab65a2a..b46881b7f8a0 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -139,7 +139,8 @@ extern int vfs_open(const struct path *, struct file *);
  * inode.c
  */
 extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc);
-extern int dentry_needs_remove_privs(struct dentry *dentry);
+extern int dentry_needs_remove_privs(struct user_namespace *,
+				     struct dentry *dentry);
 
 /*
  * fs-writeback.c
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 9c67edd215d5..e421491783c3 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1991,7 +1991,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
 		}
 	}
 
-	if (file && should_remove_suid(file->f_path.dentry)) {
+	if (file && should_remove_suid(&init_user_ns, file->f_path.dentry)) {
 		ret = __ocfs2_write_remove_suid(inode, di_bh);
 		if (ret) {
 			mlog_errno(ret);
@@ -2279,7 +2279,7 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
 		 * inode. There's also the dinode i_size state which
 		 * can be lost via setattr during extending writes (we
 		 * set inode->i_size at the end of a write. */
-		if (should_remove_suid(dentry)) {
+		if (should_remove_suid(&init_user_ns, dentry)) {
 			if (meta_level == 0) {
 				ocfs2_inode_unlock_for_extent_tree(inode,
 								   &di_bh,
diff --git a/fs/open.c b/fs/open.c
index 8a813fa5ca56..d955ecef758f 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -54,7 +54,7 @@ int do_truncate(struct user_namespace *mnt_userns, struct dentry *dentry,
 	}
 
 	/* Remove suid, sgid, and file capabilities on truncate too */
-	ret = dentry_needs_remove_privs(dentry);
+	ret = dentry_needs_remove_privs(mnt_userns, dentry);
 	if (ret < 0)
 		return ret;
 	if (ret)
@@ -721,10 +721,10 @@ int chown_common(const struct path *path, uid_t user, gid_t group)
 		return -EINVAL;
 	if ((group != (gid_t)-1) && !setattr_vfsgid(&newattrs, gid))
 		return -EINVAL;
-	if (!S_ISDIR(inode->i_mode))
-		newattrs.ia_valid |=
-			ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
 	inode_lock(inode);
+	if (!S_ISDIR(inode->i_mode))
+		newattrs.ia_valid |= ATTR_KILL_SUID | ATTR_KILL_PRIV |
+				     should_remove_sgid(mnt_userns, inode);
 	/* Continue to send actual fs values, not the mount values. */
 	error = security_path_chown(
 		path,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9eced4cc286e..993ab96af619 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3090,7 +3090,7 @@ extern void __destroy_inode(struct inode *);
 extern struct inode *new_inode_pseudo(struct super_block *sb);
 extern struct inode *new_inode(struct super_block *sb);
 extern void free_inode_nonrcu(struct inode *inode);
-extern int should_remove_suid(struct dentry *);
+extern int should_remove_suid(struct user_namespace *, struct dentry *);
 extern int file_remove_privs(struct file *);
 
 /*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/5] ovl: remove privs in ovl_copyfile()
  2022-10-07 14:05 [PATCH v2 0/5] fs: improve setgid stripping consistency even more Christian Brauner
                   ` (2 preceding siblings ...)
  2022-10-07 14:05 ` [PATCH v2 3/5] attr: use consistent sgid stripping checks Christian Brauner
@ 2022-10-07 14:05 ` Christian Brauner
  2022-10-07 14:05 ` [PATCH v2 5/5] ovl: remove privs in ovl_fallocate() Christian Brauner
  4 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2022-10-07 14:05 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi, Darrick J . Wong
  Cc: Al Viro, Christoph Hellwig, Seth Forshee, Yang Xu, Filipe Manana,
	Christian Brauner, linux-unionfs, linux-fsdevel, Miklos Szeredi

From: Amir Goldstein <amir73il@gmail.com>

Underlying fs doesn't remove privs because copy_range/remap_range are
called with privileged mounter credentials.

This fixes some failures in fstest generic/673.

Fixes: 8ede205541ff ("ovl: add reflink/copyfile/dedup support")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
---

Notes:
    /* v2 */
    Acked-by: Miklos Szeredi <miklos@szeredi.hu>

 fs/overlayfs/file.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index daff601b5c41..362a4eed92b5 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -567,14 +567,23 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 	const struct cred *old_cred;
 	loff_t ret;
 
+	inode_lock(inode_out);
+	if (op != OVL_DEDUPE) {
+		/* Update mode */
+		ovl_copyattr(inode_out);
+		ret = file_remove_privs(file_out);
+		if (ret)
+			goto out_unlock;
+	}
+
 	ret = ovl_real_fdget(file_out, &real_out);
 	if (ret)
-		return ret;
+		goto out_unlock;
 
 	ret = ovl_real_fdget(file_in, &real_in);
 	if (ret) {
 		fdput(real_out);
-		return ret;
+		goto out_unlock;
 	}
 
 	old_cred = ovl_override_creds(file_inode(file_out)->i_sb);
@@ -603,6 +612,9 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 	fdput(real_in);
 	fdput(real_out);
 
+out_unlock:
+	inode_unlock(inode_out);
+
 	return ret;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 5/5] ovl: remove privs in ovl_fallocate()
  2022-10-07 14:05 [PATCH v2 0/5] fs: improve setgid stripping consistency even more Christian Brauner
                   ` (3 preceding siblings ...)
  2022-10-07 14:05 ` [PATCH v2 4/5] ovl: remove privs in ovl_copyfile() Christian Brauner
@ 2022-10-07 14:05 ` Christian Brauner
  4 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2022-10-07 14:05 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi, Darrick J . Wong
  Cc: Al Viro, Christoph Hellwig, Seth Forshee, Yang Xu, Filipe Manana,
	Christian Brauner, linux-unionfs, linux-fsdevel, Miklos Szeredi

From: Amir Goldstein <amir73il@gmail.com>

Underlying fs doesn't remove privs because fallocate is called with
privileged mounter credentials.

This fixes some failure in fstests generic/683..687.

Fixes: aab8848cee5e ("ovl: add ovl_fallocate()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
---

Notes:
    /* v2 */
    Acked-by: Miklos Szeredi <miklos@szeredi.hu>

 fs/overlayfs/file.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 362a4eed92b5..a34f8042724c 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -517,9 +517,16 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 	const struct cred *old_cred;
 	int ret;
 
+	inode_lock(inode);
+	/* Update mode */
+	ovl_copyattr(inode);
+	ret = file_remove_privs(file);
+	if (ret)
+		goto out_unlock;
+
 	ret = ovl_real_fdget(file, &real);
 	if (ret)
-		return ret;
+		goto out_unlock;
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	ret = vfs_fallocate(real.file, mode, offset, len);
@@ -530,6 +537,9 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 
 	fdput(real);
 
+out_unlock:
+	inode_unlock(inode);
+
 	return ret;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/5] attr: add should_remove_sgid()
       [not found]   ` <202210080357.inSALqdT-lkp@intel.com>
@ 2022-10-08  5:56     ` Christian Brauner
  0 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2022-10-08  5:56 UTC (permalink / raw)
  To: kernel test robot
  Cc: Amir Goldstein, Miklos Szeredi, Darrick J . Wong, llvm,
	kbuild-all, Al Viro, Christoph Hellwig, Seth Forshee, Yang Xu,
	Filipe Manana, linux-unionfs, linux-fsdevel

On Sat, Oct 08, 2022 at 03:16:04AM +0800, kernel test robot wrote:
> Hi Christian,
> 
> I love your patch! Perhaps something to improve:
> 
> [auto build test WARNING on mszeredi-vfs/overlayfs-next]
> [cannot apply to linus/master mszeredi-fuse/for-next v6.0 next-20221007]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]

This isn't rebased and thus will fail to apply. There'll be a merge
conflict with changes to fs/internal.h for current master. Best to wait
until I have rebased this after v6.0-rc1 is out.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/5] attr: add setattr_drop_sgid()
  2022-10-07 14:05 ` [PATCH v2 1/5] attr: add setattr_drop_sgid() Christian Brauner
@ 2022-10-11  8:11   ` Amir Goldstein
  2022-10-11  8:57     ` Christian Brauner
  0 siblings, 1 reply; 15+ messages in thread
From: Amir Goldstein @ 2022-10-11  8:11 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Miklos Szeredi, Darrick J . Wong, Al Viro, Christoph Hellwig,
	Seth Forshee, Yang Xu, Filipe Manana, linux-unionfs,
	linux-fsdevel

On Fri, Oct 7, 2022 at 5:06 PM Christian Brauner <brauner@kernel.org> wrote:
>
> In setattr_{copy,prepare}() we need to perform the same permission
> checks to determine whether we need to drop the setgid bit or not.
> Instead of open-coding it twice add a simple helper the encapsulates the
> logic. We will reuse this helpers to make dropping the setgid bit during
> write operations more consistent in a follow up patch.
>
> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Looks good.
Some suggestions below - not  a must.

Thanks,
Amir.

> ---
>
> Notes:
>     /* v2 */
>     patch added
>
>  fs/attr.c | 29 ++++++++++++++++++++++++-----
>  1 file changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/fs/attr.c b/fs/attr.c
> index 1552a5f23d6b..b1cff6f5b715 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -18,6 +18,27 @@
>  #include <linux/evm.h>
>  #include <linux/ima.h>
>
> +/**
> + * setattr_drop_sgid - check generic setgid permissions

Helper name sounds like a directive, where it should sound like a question.
e.g. setattr_should_remove_sgid()

> + * @mnt_userns:        User namespace of the mount the inode was created from
> + * @inode: inode to check
> + * @vfsgid: the new/current vfsgid of @inode
> + *
> + * This function determines whether the setgid bit needs to be removed because
> + * the caller lacks privileges over the inode.
> + *
> + * Return: true if the setgid bit needs to be removed, false if not.

You may want to consider matching the return value to that of
should_remove_sgid(), that is 0 or ATTR_KILL_SGID to make all the family of
those helpers behave similarly.

> + */
> +static bool setattr_drop_sgid(struct user_namespace *mnt_userns,
> +                             const struct inode *inode, vfsgid_t vfsgid)
> +{
> +       if (vfsgid_in_group_p(vfsgid))
> +               return false;
> +       if (capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
> +               return false;
> +       return true;
> +}
> +
>  /**
>   * chown_ok - verify permissions to chown inode
>   * @mnt_userns:        user namespace of the mount @inode was found from
> @@ -140,8 +161,7 @@ int setattr_prepare(struct user_namespace *mnt_userns, struct dentry *dentry,
>                         vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
>
>                 /* Also check the setgid bit! */
> -               if (!vfsgid_in_group_p(vfsgid) &&
> -                   !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
> +               if (setattr_drop_sgid(mnt_userns, inode, vfsgid))
>                         attr->ia_mode &= ~S_ISGID;
>         }
>
> @@ -251,9 +271,8 @@ void setattr_copy(struct user_namespace *mnt_userns, struct inode *inode,
>                 inode->i_ctime = attr->ia_ctime;
>         if (ia_valid & ATTR_MODE) {
>                 umode_t mode = attr->ia_mode;
> -               vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
> -               if (!vfsgid_in_group_p(vfsgid) &&
> -                   !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
> +               if (setattr_drop_sgid(mnt_userns, inode,
> +                                     i_gid_into_vfsgid(mnt_userns, inode)))
>                         mode &= ~S_ISGID;
>                 inode->i_mode = mode;
>         }
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/5] attr: add should_remove_sgid()
  2022-10-07 14:05 ` [PATCH v2 2/5] attr: add should_remove_sgid() Christian Brauner
       [not found]   ` <202210080357.inSALqdT-lkp@intel.com>
@ 2022-10-11  8:18   ` Amir Goldstein
  2022-10-11  8:46     ` Christian Brauner
  1 sibling, 1 reply; 15+ messages in thread
From: Amir Goldstein @ 2022-10-11  8:18 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Miklos Szeredi, Darrick J . Wong, Al Viro, Christoph Hellwig,
	Seth Forshee, Yang Xu, Filipe Manana, linux-unionfs,
	linux-fsdevel

On Fri, Oct 7, 2022 at 5:06 PM Christian Brauner <brauner@kernel.org> wrote:
>
> The current setgid stripping logic during write and ownership change
> operations is inconsistent and strewn over multiple places. In order to
> consolidate it and make more consistent we'll add a new helper
> should_remove_sgid(). The function retains the old behavior where we
> remove the S_ISGID bit unconditionally when S_IXGRP is set but also when
> it isn't set and the caller is neither in the group of the inode nor
> privileged over the inode.
>
> We will use this helper both in write operation permission removal such
> as file_remove_privs() as well as in ownership change operations.
>
> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Looks good.
Some suggestions below.

> ---
>
> Notes:
>     /* v2 */
>     Dave Chinner <dchinner@redhat.com>:
>     - Use easier to follow logic in the new helper.
>
>  fs/attr.c     | 27 +++++++++++++++++++++++++++
>  fs/internal.h |  2 ++
>  2 files changed, 29 insertions(+)
>
> diff --git a/fs/attr.c b/fs/attr.c
> index b1cff6f5b715..d0bb1dae425e 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -39,6 +39,33 @@ static bool setattr_drop_sgid(struct user_namespace *mnt_userns,
>         return true;
>  }
>
> +/**
> + * should_remove_sgid - determine whether the setgid bit needs to be removed
> + * @mnt_userns:        User namespace of the mount the inode was created from
> + * @inode: inode to check
> + *
> + * This function determines whether the setgid bit needs to be removed.
> + * We retain backwards compatibility and require setgid bit to be removed
> + * unconditionally if S_IXGRP is set. Otherwise we have the exact same
> + * requirements as setattr_prepare() and setattr_copy().
> + *
> + * Return: ATTR_KILL_SGID if setgid bit needs to be removed, 0 otherwise.
> + */
> +int should_remove_sgid(struct user_namespace *mnt_userns,
> +                      const struct inode *inode)
> +{
> +       umode_t mode = inode->i_mode;
> +
> +       if (!(mode & S_ISGID))
> +               return 0;
> +       if (mode & S_IXGRP)
> +               return ATTR_KILL_SGID;
> +       if (setattr_drop_sgid(mnt_userns, inode,
> +                             i_gid_into_vfsgid(mnt_userns, inode)))
> +               return ATTR_KILL_SGID;
> +       return 0;

If you take my suggestion from patch 1/5, that would become:

    return setattr_should_remove_sgid(mnt_userns, inode,
                                 i_gid_into_vfsgid(mnt_userns, inode));

> +}
> +
>  /**
>   * chown_ok - verify permissions to chown inode
>   * @mnt_userns:        user namespace of the mount @inode was found from
> diff --git a/fs/internal.h b/fs/internal.h
> index 87e96b9024ce..9d165ab65a2a 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -221,3 +221,5 @@ ssize_t do_getxattr(struct user_namespace *mnt_userns,
>  int setxattr_copy(const char __user *name, struct xattr_ctx *ctx);
>  int do_setxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
>                 struct xattr_ctx *ctx);
> +int should_remove_sgid(struct user_namespace *mnt_userns,
> +                      const struct inode *inode);

I realize that you placed this helper in attr.c to make
setattr_drop_sgid() static, but IMO the code will be clearer to readers
if all the family of suig/sgid stripping helpers were clustered together
in inode.c where it will be easier to get the high level view.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/5] attr: use consistent sgid stripping checks
  2022-10-07 14:05 ` [PATCH v2 3/5] attr: use consistent sgid stripping checks Christian Brauner
@ 2022-10-11  8:43   ` Amir Goldstein
  2022-10-11  8:56     ` Christian Brauner
  0 siblings, 1 reply; 15+ messages in thread
From: Amir Goldstein @ 2022-10-11  8:43 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Miklos Szeredi, Darrick J . Wong, Al Viro, Christoph Hellwig,
	Seth Forshee, Yang Xu, Filipe Manana, linux-unionfs,
	linux-fsdevel

On Fri, Oct 7, 2022 at 5:06 PM Christian Brauner <brauner@kernel.org> wrote:
>
> Currently setgid stripping in file_remove_privs()'s should_remove_suid()
> helper is inconsistent with other parts of the vfs. Specifically, it only
> raises ATTR_KILL_SGID if the inode is S_ISGID and S_IXGRP but not if the
> inode isn't in the caller's groups and the caller isn't privileged over the
> inode although we require this already in setattr_prepare() and
> setattr_copy() and so all filesystem implement this requirement implicitly
> because they have to use setattr_{prepare,copy}() anyway.
>
> But the inconsistency shows up in setgid stripping bugs for overlayfs in
> xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
> generic/687). For example, we test whether suid and setgid stripping works
> correctly when performing various write-like operations as an unprivileged
> user (fallocate, reflink, write, etc.):
>
> echo "Test 1 - qa_user, non-exec file $verb"
> setup_testfile
> chmod a+rws $junk_file
> commit_and_check "$qa_user" "$verb" 64k 64k
>
> The test basically creates a file with 6666 permissions. While the file has
> the S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. On a
> regular filesystem like xfs what will happen is:
>
> sys_fallocate()
> -> vfs_fallocate()
>    -> xfs_file_fallocate()
>       -> file_modified()
>          -> __file_remove_privs()
>             -> dentry_needs_remove_privs()
>                -> should_remove_suid()
>             -> __remove_privs()
>                newattrs.ia_valid = ATTR_FORCE | kill;
>                -> notify_change()
>                   -> setattr_copy()
>
> In should_remove_suid() we can see that ATTR_KILL_SUID is raised
> unconditionally because the file in the test has S_ISUID set.
>
> But we also see that ATTR_KILL_SGID won't be set because while the file
> is S_ISGID it is not S_IXGRP (see above) which is a condition for
> ATTR_KILL_SGID being raised.
>
> So by the time we call notify_change() we have attr->ia_valid set to
> ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
> ATTR_KILL_SUID is set and does:
>
> ia_valid = attr->ia_valid |= ATTR_MODE
> attr->ia_mode = (inode->i_mode & ~S_ISUID);
>
> which means that when we call setattr_copy() later we will definitely
> update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.
>
> Now we call into the filesystem's ->setattr() inode operation which will
> end up calling setattr_copy(). Since ATTR_MODE is set we will hit:
>
> if (ia_valid & ATTR_MODE) {
>         umode_t mode = attr->ia_mode;
>         vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
>         if (!vfsgid_in_group_p(vfsgid) &&
>             !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
>                 mode &= ~S_ISGID;
>         inode->i_mode = mode;
> }
>
> and since the caller in the test is neither capable nor in the group of the
> inode the S_ISGID bit is stripped.
>
> But assume the file isn't suid then ATTR_KILL_SUID won't be raised which
> has the consequence that neither the setgid nor the suid bits are stripped
> even though it should be stripped because the inode isn't in the caller's
> groups and the caller isn't privileged over the inode.
>
> If overlayfs is in the mix things become a bit more complicated and the bug
> shows up more clearly. When e.g., ovl_setattr() is hit from
> ovl_fallocate()'s call to file_remove_privs() then ATTR_KILL_SUID and
> ATTR_KILL_SGID might be raised but because the check in notify_change() is
> questioning the ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be
> stripped the S_ISGID bit isn't removed even though it should be stripped:
>
> sys_fallocate()
> -> vfs_fallocate()
>    -> ovl_fallocate()
>       -> file_remove_privs()
>          -> dentry_needs_remove_privs()
>             -> should_remove_suid()
>          -> __remove_privs()
>             newattrs.ia_valid = ATTR_FORCE | kill;
>             -> notify_change()
>                -> ovl_setattr()
>                   // TAKE ON MOUNTER'S CREDS
>                   -> ovl_do_notify_change()
>                      -> notify_change()
>                   // GIVE UP MOUNTER'S CREDS
>      // TAKE ON MOUNTER'S CREDS
>      -> vfs_fallocate()
>         -> xfs_file_fallocate()
>            -> file_modified()
>               -> __file_remove_privs()
>                  -> dentry_needs_remove_privs()
>                     -> should_remove_suid()
>                  -> __remove_privs()
>                     newattrs.ia_valid = attr_force | kill;
>                     -> notify_change()
>
> The fix for all of this is to make file_remove_privs()'s
> should_remove_suid() helper to perform the same checks as we already
> require in setattr_prepare() and setattr_copy() and have notify_change()
> not pointlessly requiring S_IXGRP again. It doesn't make any sense in the
> first place because the caller must calculate the flags via
> should_remove_suid() anyway which would raise ATTR_KILL_SGID.
>
> Running xfstests with this doesn't report any regressions. We should really
> try and use consistent checks.
>
> Co-Developed-by: Amir Goldstein <amir73il@gmail.com>
> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> ---
>
> Notes:
>     /* v2 */
>     Amir Goldstein <amir73il@gmail.com>:
>     - mention xfstests that failed prior to that
>
>     Christian Brauner <brauner@kernel.org>:
>     - Use should_remove_sgid() in chown_common() just like we do in do_truncate().
>
>  fs/attr.c          |  2 +-
>  fs/fuse/file.c     |  2 +-
>  fs/inode.c         | 24 ++++++++----------------
>  fs/internal.h      |  3 ++-
>  fs/ocfs2/file.c    |  4 ++--
>  fs/open.c          |  8 ++++----
>  include/linux/fs.h |  2 +-
>  7 files changed, 19 insertions(+), 26 deletions(-)
>
> diff --git a/fs/attr.c b/fs/attr.c
> index d0bb1dae425e..888b34e8c268 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -421,7 +421,7 @@ int notify_change(struct user_namespace *mnt_userns, struct dentry *dentry,
>                 }
>         }
>         if (ia_valid & ATTR_KILL_SGID) {
> -               if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> +               if (mode & S_ISGID) {
>                         if (!(ia_valid & ATTR_MODE)) {
>                                 ia_valid = attr->ia_valid |= ATTR_MODE;
>                                 attr->ia_mode = inode->i_mode;
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 1a3afd469e3a..fccc2c7e88fd 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1313,7 +1313,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
>                         return err;
>
>                 if (fc->handle_killpriv_v2 &&
> -                   should_remove_suid(file_dentry(file))) {
> +                   should_remove_suid(&init_user_ns, file_dentry(file))) {
>                         goto writethrough;
>                 }
>
> diff --git a/fs/inode.c b/fs/inode.c
> index ba1de23c13c1..092a66324c65 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1949,26 +1949,17 @@ void touch_atime(const struct path *path)
>  }
>  EXPORT_SYMBOL(touch_atime);
>
> -/*
> - * The logic we want is
> - *
> - *     if suid or (sgid and xgrp)
> - *             remove privs
> - */
> -int should_remove_suid(struct dentry *dentry)
> +int should_remove_suid(struct user_namespace *mnt_userns, struct dentry *dentry)
>  {
> -       umode_t mode = d_inode(dentry)->i_mode;
> +       struct inode *inode = d_inode(dentry);
> +       umode_t mode = inode->i_mode;
>         int kill = 0;
>
>         /* suid always must be killed */
>         if (unlikely(mode & S_ISUID))
>                 kill = ATTR_KILL_SUID;
>
> -       /*
> -        * sgid without any exec bits is just a mandatory locking mark; leave
> -        * it alone.  If some exec bits are set, it's a real sgid; kill it.
> -        */
> -       if (unlikely((mode & S_ISGID) && (mode & S_IXGRP)))
> +       if (unlikely(should_remove_sgid(mnt_userns, inode)))
>                 kill |= ATTR_KILL_SGID;

   kill |= should_remove_sgid(mnt_userns, inode);

>
>         if (unlikely(kill && !capable(CAP_FSETID) && S_ISREG(mode)))
> @@ -1983,7 +1974,8 @@ EXPORT_SYMBOL(should_remove_suid);
>   * response to write or truncate. Return 0 if nothing has to be changed.
>   * Negative value on error (change should be denied).
>   */
> -int dentry_needs_remove_privs(struct dentry *dentry)
> +int dentry_needs_remove_privs(struct user_namespace *mnt_userns,
> +                             struct dentry *dentry)
>  {
>         struct inode *inode = d_inode(dentry);
>         int mask = 0;
> @@ -1992,7 +1984,7 @@ int dentry_needs_remove_privs(struct dentry *dentry)
>         if (IS_NOSEC(inode))
>                 return 0;
>
> -       mask = should_remove_suid(dentry);
> +       mask = should_remove_suid(mnt_userns, dentry);
>         ret = security_inode_need_killpriv(dentry);
>         if (ret < 0)
>                 return ret;
> @@ -2024,7 +2016,7 @@ static int __file_remove_privs(struct file *file, unsigned int flags)
>         if (IS_NOSEC(inode) || !S_ISREG(inode->i_mode))
>                 return 0;
>
> -       kill = dentry_needs_remove_privs(dentry);
> +       kill = dentry_needs_remove_privs(file_mnt_user_ns(file), dentry);
>         if (kill < 0)
>                 return kill;
>
> diff --git a/fs/internal.h b/fs/internal.h
> index 9d165ab65a2a..b46881b7f8a0 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -139,7 +139,8 @@ extern int vfs_open(const struct path *, struct file *);
>   * inode.c
>   */
>  extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc);
> -extern int dentry_needs_remove_privs(struct dentry *dentry);
> +extern int dentry_needs_remove_privs(struct user_namespace *,
> +                                    struct dentry *dentry);
>
>  /*
>   * fs-writeback.c
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 9c67edd215d5..e421491783c3 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1991,7 +1991,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>                 }
>         }
>
> -       if (file && should_remove_suid(file->f_path.dentry)) {
> +       if (file && should_remove_suid(&init_user_ns, file->f_path.dentry)) {
>                 ret = __ocfs2_write_remove_suid(inode, di_bh);
>                 if (ret) {
>                         mlog_errno(ret);
> @@ -2279,7 +2279,7 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
>                  * inode. There's also the dinode i_size state which
>                  * can be lost via setattr during extending writes (we
>                  * set inode->i_size at the end of a write. */
> -               if (should_remove_suid(dentry)) {
> +               if (should_remove_suid(&init_user_ns, dentry)) {
>                         if (meta_level == 0) {
>                                 ocfs2_inode_unlock_for_extent_tree(inode,
>                                                                    &di_bh,
> diff --git a/fs/open.c b/fs/open.c
> index 8a813fa5ca56..d955ecef758f 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -54,7 +54,7 @@ int do_truncate(struct user_namespace *mnt_userns, struct dentry *dentry,
>         }
>
>         /* Remove suid, sgid, and file capabilities on truncate too */
> -       ret = dentry_needs_remove_privs(dentry);
> +       ret = dentry_needs_remove_privs(mnt_userns, dentry);
>         if (ret < 0)
>                 return ret;
>         if (ret)
> @@ -721,10 +721,10 @@ int chown_common(const struct path *path, uid_t user, gid_t group)
>                 return -EINVAL;
>         if ((group != (gid_t)-1) && !setattr_vfsgid(&newattrs, gid))
>                 return -EINVAL;
> -       if (!S_ISDIR(inode->i_mode))
> -               newattrs.ia_valid |=
> -                       ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
>         inode_lock(inode);
> +       if (!S_ISDIR(inode->i_mode))
> +               newattrs.ia_valid |= ATTR_KILL_SUID | ATTR_KILL_PRIV |
> +                                    should_remove_sgid(mnt_userns, inode);

This is making me stop and wonder:
1. This has !S_ISDIR, should_remove_suid() has S_ISREG and
    setattr_drop_sgid() has neither - is this consistent?
2. SUID and PRIV are removed unconditionally and SGID is
    removed conditionally - this is not a change of behavior
    (at least for non-overlayfs), but is it desired???

Those questions could be dealt with in future patches if at all.

The change itself looks legit and solves a real problem, so you may add:

Reviewed-by: Amir Goldstein <amir73il@gmail.com>

Instead of Co-Developed-by ;-)

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/5] attr: add should_remove_sgid()
  2022-10-11  8:18   ` Amir Goldstein
@ 2022-10-11  8:46     ` Christian Brauner
  0 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2022-10-11  8:46 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Darrick J . Wong, Al Viro, Christoph Hellwig,
	Seth Forshee, Yang Xu, Filipe Manana, linux-unionfs,
	linux-fsdevel

On Tue, Oct 11, 2022 at 11:18:22AM +0300, Amir Goldstein wrote:
> On Fri, Oct 7, 2022 at 5:06 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > The current setgid stripping logic during write and ownership change
> > operations is inconsistent and strewn over multiple places. In order to
> > consolidate it and make more consistent we'll add a new helper
> > should_remove_sgid(). The function retains the old behavior where we
> > remove the S_ISGID bit unconditionally when S_IXGRP is set but also when
> > it isn't set and the caller is neither in the group of the inode nor
> > privileged over the inode.
> >
> > We will use this helper both in write operation permission removal such
> > as file_remove_privs() as well as in ownership change operations.
> >
> > Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> 
> Looks good.
> Some suggestions below.
> 
> > ---
> >
> > Notes:
> >     /* v2 */
> >     Dave Chinner <dchinner@redhat.com>:
> >     - Use easier to follow logic in the new helper.
> >
> >  fs/attr.c     | 27 +++++++++++++++++++++++++++
> >  fs/internal.h |  2 ++
> >  2 files changed, 29 insertions(+)
> >
> > diff --git a/fs/attr.c b/fs/attr.c
> > index b1cff6f5b715..d0bb1dae425e 100644
> > --- a/fs/attr.c
> > +++ b/fs/attr.c
> > @@ -39,6 +39,33 @@ static bool setattr_drop_sgid(struct user_namespace *mnt_userns,
> >         return true;
> >  }
> >
> > +/**
> > + * should_remove_sgid - determine whether the setgid bit needs to be removed
> > + * @mnt_userns:        User namespace of the mount the inode was created from
> > + * @inode: inode to check
> > + *
> > + * This function determines whether the setgid bit needs to be removed.
> > + * We retain backwards compatibility and require setgid bit to be removed
> > + * unconditionally if S_IXGRP is set. Otherwise we have the exact same
> > + * requirements as setattr_prepare() and setattr_copy().
> > + *
> > + * Return: ATTR_KILL_SGID if setgid bit needs to be removed, 0 otherwise.
> > + */
> > +int should_remove_sgid(struct user_namespace *mnt_userns,
> > +                      const struct inode *inode)
> > +{
> > +       umode_t mode = inode->i_mode;
> > +
> > +       if (!(mode & S_ISGID))
> > +               return 0;
> > +       if (mode & S_IXGRP)
> > +               return ATTR_KILL_SGID;
> > +       if (setattr_drop_sgid(mnt_userns, inode,
> > +                             i_gid_into_vfsgid(mnt_userns, inode)))
> > +               return ATTR_KILL_SGID;
> > +       return 0;
> 
> If you take my suggestion from patch 1/5, that would become:
> 
>     return setattr_should_remove_sgid(mnt_userns, inode,
>                                  i_gid_into_vfsgid(mnt_userns, inode));
> 
> > +}
> > +
> >  /**
> >   * chown_ok - verify permissions to chown inode
> >   * @mnt_userns:        user namespace of the mount @inode was found from
> > diff --git a/fs/internal.h b/fs/internal.h
> > index 87e96b9024ce..9d165ab65a2a 100644
> > --- a/fs/internal.h
> > +++ b/fs/internal.h
> > @@ -221,3 +221,5 @@ ssize_t do_getxattr(struct user_namespace *mnt_userns,
> >  int setxattr_copy(const char __user *name, struct xattr_ctx *ctx);
> >  int do_setxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
> >                 struct xattr_ctx *ctx);
> > +int should_remove_sgid(struct user_namespace *mnt_userns,
> > +                      const struct inode *inode);
> 
> I realize that you placed this helper in attr.c to make
> setattr_drop_sgid() static, but IMO the code will be clearer to readers
> if all the family of suig/sgid stripping helpers were clustered together
> in inode.c where it will be easier to get the high level view.

I think then we should rather move all those helpers into attr.c. After
all it's is setting/returning iattr flags. Then they can be called from
inode.c

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/5] attr: use consistent sgid stripping checks
  2022-10-11  8:43   ` Amir Goldstein
@ 2022-10-11  8:56     ` Christian Brauner
  2022-10-11 11:07       ` Amir Goldstein
  0 siblings, 1 reply; 15+ messages in thread
From: Christian Brauner @ 2022-10-11  8:56 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Darrick J . Wong, Al Viro, Christoph Hellwig,
	Seth Forshee, Yang Xu, Filipe Manana, linux-unionfs,
	linux-fsdevel

On Tue, Oct 11, 2022 at 11:43:22AM +0300, Amir Goldstein wrote:
> On Fri, Oct 7, 2022 at 5:06 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > Currently setgid stripping in file_remove_privs()'s should_remove_suid()
> > helper is inconsistent with other parts of the vfs. Specifically, it only
> > raises ATTR_KILL_SGID if the inode is S_ISGID and S_IXGRP but not if the
> > inode isn't in the caller's groups and the caller isn't privileged over the
> > inode although we require this already in setattr_prepare() and
> > setattr_copy() and so all filesystem implement this requirement implicitly
> > because they have to use setattr_{prepare,copy}() anyway.
> >
> > But the inconsistency shows up in setgid stripping bugs for overlayfs in
> > xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
> > generic/687). For example, we test whether suid and setgid stripping works
> > correctly when performing various write-like operations as an unprivileged
> > user (fallocate, reflink, write, etc.):
> >
> > echo "Test 1 - qa_user, non-exec file $verb"
> > setup_testfile
> > chmod a+rws $junk_file
> > commit_and_check "$qa_user" "$verb" 64k 64k
> >
> > The test basically creates a file with 6666 permissions. While the file has
> > the S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. On a
> > regular filesystem like xfs what will happen is:
> >
> > sys_fallocate()
> > -> vfs_fallocate()
> >    -> xfs_file_fallocate()
> >       -> file_modified()
> >          -> __file_remove_privs()
> >             -> dentry_needs_remove_privs()
> >                -> should_remove_suid()
> >             -> __remove_privs()
> >                newattrs.ia_valid = ATTR_FORCE | kill;
> >                -> notify_change()
> >                   -> setattr_copy()
> >
> > In should_remove_suid() we can see that ATTR_KILL_SUID is raised
> > unconditionally because the file in the test has S_ISUID set.
> >
> > But we also see that ATTR_KILL_SGID won't be set because while the file
> > is S_ISGID it is not S_IXGRP (see above) which is a condition for
> > ATTR_KILL_SGID being raised.
> >
> > So by the time we call notify_change() we have attr->ia_valid set to
> > ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
> > ATTR_KILL_SUID is set and does:
> >
> > ia_valid = attr->ia_valid |= ATTR_MODE
> > attr->ia_mode = (inode->i_mode & ~S_ISUID);
> >
> > which means that when we call setattr_copy() later we will definitely
> > update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.
> >
> > Now we call into the filesystem's ->setattr() inode operation which will
> > end up calling setattr_copy(). Since ATTR_MODE is set we will hit:
> >
> > if (ia_valid & ATTR_MODE) {
> >         umode_t mode = attr->ia_mode;
> >         vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
> >         if (!vfsgid_in_group_p(vfsgid) &&
> >             !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
> >                 mode &= ~S_ISGID;
> >         inode->i_mode = mode;
> > }
> >
> > and since the caller in the test is neither capable nor in the group of the
> > inode the S_ISGID bit is stripped.
> >
> > But assume the file isn't suid then ATTR_KILL_SUID won't be raised which
> > has the consequence that neither the setgid nor the suid bits are stripped
> > even though it should be stripped because the inode isn't in the caller's
> > groups and the caller isn't privileged over the inode.
> >
> > If overlayfs is in the mix things become a bit more complicated and the bug
> > shows up more clearly. When e.g., ovl_setattr() is hit from
> > ovl_fallocate()'s call to file_remove_privs() then ATTR_KILL_SUID and
> > ATTR_KILL_SGID might be raised but because the check in notify_change() is
> > questioning the ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be
> > stripped the S_ISGID bit isn't removed even though it should be stripped:
> >
> > sys_fallocate()
> > -> vfs_fallocate()
> >    -> ovl_fallocate()
> >       -> file_remove_privs()
> >          -> dentry_needs_remove_privs()
> >             -> should_remove_suid()
> >          -> __remove_privs()
> >             newattrs.ia_valid = ATTR_FORCE | kill;
> >             -> notify_change()
> >                -> ovl_setattr()
> >                   // TAKE ON MOUNTER'S CREDS
> >                   -> ovl_do_notify_change()
> >                      -> notify_change()
> >                   // GIVE UP MOUNTER'S CREDS
> >      // TAKE ON MOUNTER'S CREDS
> >      -> vfs_fallocate()
> >         -> xfs_file_fallocate()
> >            -> file_modified()
> >               -> __file_remove_privs()
> >                  -> dentry_needs_remove_privs()
> >                     -> should_remove_suid()
> >                  -> __remove_privs()
> >                     newattrs.ia_valid = attr_force | kill;
> >                     -> notify_change()
> >
> > The fix for all of this is to make file_remove_privs()'s
> > should_remove_suid() helper to perform the same checks as we already
> > require in setattr_prepare() and setattr_copy() and have notify_change()
> > not pointlessly requiring S_IXGRP again. It doesn't make any sense in the
> > first place because the caller must calculate the flags via
> > should_remove_suid() anyway which would raise ATTR_KILL_SGID.
> >
> > Running xfstests with this doesn't report any regressions. We should really
> > try and use consistent checks.
> >
> > Co-Developed-by: Amir Goldstein <amir73il@gmail.com>
> > Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> > ---
> >
> > Notes:
> >     /* v2 */
> >     Amir Goldstein <amir73il@gmail.com>:
> >     - mention xfstests that failed prior to that
> >
> >     Christian Brauner <brauner@kernel.org>:
> >     - Use should_remove_sgid() in chown_common() just like we do in do_truncate().
> >
> >  fs/attr.c          |  2 +-
> >  fs/fuse/file.c     |  2 +-
> >  fs/inode.c         | 24 ++++++++----------------
> >  fs/internal.h      |  3 ++-
> >  fs/ocfs2/file.c    |  4 ++--
> >  fs/open.c          |  8 ++++----
> >  include/linux/fs.h |  2 +-
> >  7 files changed, 19 insertions(+), 26 deletions(-)
> >
> > diff --git a/fs/attr.c b/fs/attr.c
> > index d0bb1dae425e..888b34e8c268 100644
> > --- a/fs/attr.c
> > +++ b/fs/attr.c
> > @@ -421,7 +421,7 @@ int notify_change(struct user_namespace *mnt_userns, struct dentry *dentry,
> >                 }
> >         }
> >         if (ia_valid & ATTR_KILL_SGID) {
> > -               if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> > +               if (mode & S_ISGID) {
> >                         if (!(ia_valid & ATTR_MODE)) {
> >                                 ia_valid = attr->ia_valid |= ATTR_MODE;
> >                                 attr->ia_mode = inode->i_mode;
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 1a3afd469e3a..fccc2c7e88fd 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -1313,7 +1313,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >                         return err;
> >
> >                 if (fc->handle_killpriv_v2 &&
> > -                   should_remove_suid(file_dentry(file))) {
> > +                   should_remove_suid(&init_user_ns, file_dentry(file))) {
> >                         goto writethrough;
> >                 }
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index ba1de23c13c1..092a66324c65 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -1949,26 +1949,17 @@ void touch_atime(const struct path *path)
> >  }
> >  EXPORT_SYMBOL(touch_atime);
> >
> > -/*
> > - * The logic we want is
> > - *
> > - *     if suid or (sgid and xgrp)
> > - *             remove privs
> > - */
> > -int should_remove_suid(struct dentry *dentry)
> > +int should_remove_suid(struct user_namespace *mnt_userns, struct dentry *dentry)
> >  {
> > -       umode_t mode = d_inode(dentry)->i_mode;
> > +       struct inode *inode = d_inode(dentry);
> > +       umode_t mode = inode->i_mode;
> >         int kill = 0;
> >
> >         /* suid always must be killed */
> >         if (unlikely(mode & S_ISUID))
> >                 kill = ATTR_KILL_SUID;
> >
> > -       /*
> > -        * sgid without any exec bits is just a mandatory locking mark; leave
> > -        * it alone.  If some exec bits are set, it's a real sgid; kill it.
> > -        */
> > -       if (unlikely((mode & S_ISGID) && (mode & S_IXGRP)))
> > +       if (unlikely(should_remove_sgid(mnt_userns, inode)))
> >                 kill |= ATTR_KILL_SGID;
> 
>    kill |= should_remove_sgid(mnt_userns, inode);
> 
> >
> >         if (unlikely(kill && !capable(CAP_FSETID) && S_ISREG(mode)))
> > @@ -1983,7 +1974,8 @@ EXPORT_SYMBOL(should_remove_suid);
> >   * response to write or truncate. Return 0 if nothing has to be changed.
> >   * Negative value on error (change should be denied).
> >   */
> > -int dentry_needs_remove_privs(struct dentry *dentry)
> > +int dentry_needs_remove_privs(struct user_namespace *mnt_userns,
> > +                             struct dentry *dentry)
> >  {
> >         struct inode *inode = d_inode(dentry);
> >         int mask = 0;
> > @@ -1992,7 +1984,7 @@ int dentry_needs_remove_privs(struct dentry *dentry)
> >         if (IS_NOSEC(inode))
> >                 return 0;
> >
> > -       mask = should_remove_suid(dentry);
> > +       mask = should_remove_suid(mnt_userns, dentry);
> >         ret = security_inode_need_killpriv(dentry);
> >         if (ret < 0)
> >                 return ret;
> > @@ -2024,7 +2016,7 @@ static int __file_remove_privs(struct file *file, unsigned int flags)
> >         if (IS_NOSEC(inode) || !S_ISREG(inode->i_mode))
> >                 return 0;
> >
> > -       kill = dentry_needs_remove_privs(dentry);
> > +       kill = dentry_needs_remove_privs(file_mnt_user_ns(file), dentry);
> >         if (kill < 0)
> >                 return kill;
> >
> > diff --git a/fs/internal.h b/fs/internal.h
> > index 9d165ab65a2a..b46881b7f8a0 100644
> > --- a/fs/internal.h
> > +++ b/fs/internal.h
> > @@ -139,7 +139,8 @@ extern int vfs_open(const struct path *, struct file *);
> >   * inode.c
> >   */
> >  extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc);
> > -extern int dentry_needs_remove_privs(struct dentry *dentry);
> > +extern int dentry_needs_remove_privs(struct user_namespace *,
> > +                                    struct dentry *dentry);
> >
> >  /*
> >   * fs-writeback.c
> > diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> > index 9c67edd215d5..e421491783c3 100644
> > --- a/fs/ocfs2/file.c
> > +++ b/fs/ocfs2/file.c
> > @@ -1991,7 +1991,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
> >                 }
> >         }
> >
> > -       if (file && should_remove_suid(file->f_path.dentry)) {
> > +       if (file && should_remove_suid(&init_user_ns, file->f_path.dentry)) {
> >                 ret = __ocfs2_write_remove_suid(inode, di_bh);
> >                 if (ret) {
> >                         mlog_errno(ret);
> > @@ -2279,7 +2279,7 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
> >                  * inode. There's also the dinode i_size state which
> >                  * can be lost via setattr during extending writes (we
> >                  * set inode->i_size at the end of a write. */
> > -               if (should_remove_suid(dentry)) {
> > +               if (should_remove_suid(&init_user_ns, dentry)) {
> >                         if (meta_level == 0) {
> >                                 ocfs2_inode_unlock_for_extent_tree(inode,
> >                                                                    &di_bh,
> > diff --git a/fs/open.c b/fs/open.c
> > index 8a813fa5ca56..d955ecef758f 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -54,7 +54,7 @@ int do_truncate(struct user_namespace *mnt_userns, struct dentry *dentry,
> >         }
> >
> >         /* Remove suid, sgid, and file capabilities on truncate too */
> > -       ret = dentry_needs_remove_privs(dentry);
> > +       ret = dentry_needs_remove_privs(mnt_userns, dentry);
> >         if (ret < 0)
> >                 return ret;
> >         if (ret)
> > @@ -721,10 +721,10 @@ int chown_common(const struct path *path, uid_t user, gid_t group)
> >                 return -EINVAL;
> >         if ((group != (gid_t)-1) && !setattr_vfsgid(&newattrs, gid))
> >                 return -EINVAL;
> > -       if (!S_ISDIR(inode->i_mode))
> > -               newattrs.ia_valid |=
> > -                       ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
> >         inode_lock(inode);
> > +       if (!S_ISDIR(inode->i_mode))
> > +               newattrs.ia_valid |= ATTR_KILL_SUID | ATTR_KILL_PRIV |
> > +                                    should_remove_sgid(mnt_userns, inode);
> 
> This is making me stop and wonder:
> 1. This has !S_ISDIR, should_remove_suid() has S_ISREG and
>     setattr_drop_sgid() has neither - is this consistent?

I thought about that. It'very likely redundant since we deal with that
in other parts but I need to verify all callers before we can remove
that.

> 2. SUID and PRIV are removed unconditionally and SGID is
>     removed conditionally - this is not a change of behavior
>     (at least for non-overlayfs), but is it desired???

It looks that way but it isn't. The setgid bit was only killed
unconditionally for S_IXGRP. We continue to do that. But it was always
removed conditionally for ~S_IXGRP. The difference between this patchset
and earlier is that it was done in settattr_prepare() or setattr_copy()
before this change.

IOW, we raised ATTR_KILL_SGID unconditionally but then only
conditionally obeyed it in setattr_{prepare,copy}() whereas now we
conditionally raise ATTR_KILL_SGID. That's surely a slight change but it
just means that we don't cause bugs for filesystems that roll their own
prepare or copy helpers and is just nicer overall.

> 
> Those questions could be dealt with in future patches if at all.

Yes, I think we're not past cleaning this up but moving slowly here
can't hurt tbh.

> 
> The change itself looks legit and solves a real problem, so you may add:
> 
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> 
> Instead of Co-Developed-by ;-)

Well, you did suggest the notify_change() bit.
But I'll take the RVB. ;)
Thank you!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/5] attr: add setattr_drop_sgid()
  2022-10-11  8:11   ` Amir Goldstein
@ 2022-10-11  8:57     ` Christian Brauner
  0 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2022-10-11  8:57 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Darrick J . Wong, Al Viro, Christoph Hellwig,
	Seth Forshee, Yang Xu, Filipe Manana, linux-unionfs,
	linux-fsdevel

On Tue, Oct 11, 2022 at 11:11:37AM +0300, Amir Goldstein wrote:
> On Fri, Oct 7, 2022 at 5:06 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > In setattr_{copy,prepare}() we need to perform the same permission
> > checks to determine whether we need to drop the setgid bit or not.
> > Instead of open-coding it twice add a simple helper the encapsulates the
> > logic. We will reuse this helpers to make dropping the setgid bit during
> > write operations more consistent in a follow up patch.
> >
> > Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> 
> Looks good.
> Some suggestions below - not  a must.
> 
> Thanks,
> Amir.
> 
> > ---
> >
> > Notes:
> >     /* v2 */
> >     patch added
> >
> >  fs/attr.c | 29 ++++++++++++++++++++++++-----
> >  1 file changed, 24 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/attr.c b/fs/attr.c
> > index 1552a5f23d6b..b1cff6f5b715 100644
> > --- a/fs/attr.c
> > +++ b/fs/attr.c
> > @@ -18,6 +18,27 @@
> >  #include <linux/evm.h>
> >  #include <linux/ima.h>
> >
> > +/**
> > + * setattr_drop_sgid - check generic setgid permissions
> 
> Helper name sounds like a directive, where it should sound like a question.
> e.g. setattr_should_remove_sgid()

sounds good

> 
> > + * @mnt_userns:        User namespace of the mount the inode was created from
> > + * @inode: inode to check
> > + * @vfsgid: the new/current vfsgid of @inode
> > + *
> > + * This function determines whether the setgid bit needs to be removed because
> > + * the caller lacks privileges over the inode.
> > + *
> > + * Return: true if the setgid bit needs to be removed, false if not.
> 
> You may want to consider matching the return value to that of
> should_remove_sgid(), that is 0 or ATTR_KILL_SGID to make all the family of
> those helpers behave similarly.

fine by me

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/5] attr: use consistent sgid stripping checks
  2022-10-11  8:56     ` Christian Brauner
@ 2022-10-11 11:07       ` Amir Goldstein
  2022-10-11 13:48         ` Christian Brauner
  0 siblings, 1 reply; 15+ messages in thread
From: Amir Goldstein @ 2022-10-11 11:07 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Miklos Szeredi, Darrick J . Wong, Al Viro, Christoph Hellwig,
	Seth Forshee, Yang Xu, Filipe Manana, linux-unionfs,
	linux-fsdevel

> > > @@ -721,10 +721,10 @@ int chown_common(const struct path *path, uid_t user, gid_t group)
> > >                 return -EINVAL;
> > >         if ((group != (gid_t)-1) && !setattr_vfsgid(&newattrs, gid))
> > >                 return -EINVAL;
> > > -       if (!S_ISDIR(inode->i_mode))
> > > -               newattrs.ia_valid |=
> > > -                       ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
> > >         inode_lock(inode);
> > > +       if (!S_ISDIR(inode->i_mode))
> > > +               newattrs.ia_valid |= ATTR_KILL_SUID | ATTR_KILL_PRIV |
> > > +                                    should_remove_sgid(mnt_userns, inode);
> >
> > This is making me stop and wonder:
> > 1. This has !S_ISDIR, should_remove_suid() has S_ISREG and
> >     setattr_drop_sgid() has neither - is this consistent?
>
> I thought about that. It'very likely redundant since we deal with that
> in other parts but I need to verify all callers before we can remove
> that.
>
> > 2. SUID and PRIV are removed unconditionally and SGID is
> >     removed conditionally - this is not a change of behavior
> >     (at least for non-overlayfs), but is it desired???
>
> It looks that way but it isn't. The setgid bit was only killed
> unconditionally for S_IXGRP. We continue to do that. But it was always
> removed conditionally for ~S_IXGRP. The difference between this patchset
> and earlier is that it was done in settattr_prepare() or setattr_copy()
> before this change.
>
> IOW, we raised ATTR_KILL_SGID unconditionally but then only
> conditionally obeyed it in setattr_{prepare,copy}() whereas now we
> conditionally raise ATTR_KILL_SGID. That's surely a slight change but it
> just means that we don't cause bugs for filesystems that roll their own
> prepare or copy helpers and is just nicer overall.
>

Yes, that sounds right.

The point that I was trying to make and failed to articulate myself was
that chown_common() raises ATTR_KILL_SUID unconditionally,
while should_remove_suid() raises ATTR_KILL_SUID conditional
to !capable(CAP_FSETID).

Is this inconsistency in stripping SUID desired?

According to man page (I think that) it is:

"When the owner or group of an executable file is changed by an
 unprivileged user, the S_ISUID and S_ISGID mode bits are cleared.
 POSIX does not specify whether this also  should  happen  when  root
 does the chown(); the Linux behavior depends on the kernel version,
 and since Linux 2.2.13, root is treated like other users..."

So special casing SUID stripping in chown() looks intentional,
but maybe it is worth a comment.

The paragraph above *may* be interpreted that chown() should strip
S_SGID|S_IXGRP regardless of CAP_FSETID, which, as you say,
has not been the case for a while.

"...In case of a non-group-executable file (i.e., one for which the
 S_IXGRP bit is not set) the S_ISGID bit indicates mandatory locking,
 and is not cleared by a chown().
 When the owner or group of an executable file is changed (by any user),
 all capability sets for the file are cleared."

This part sounds aligned with the code.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/5] attr: use consistent sgid stripping checks
  2022-10-11 11:07       ` Amir Goldstein
@ 2022-10-11 13:48         ` Christian Brauner
  0 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2022-10-11 13:48 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Darrick J . Wong, Al Viro, Christoph Hellwig,
	Seth Forshee, Yang Xu, Filipe Manana, linux-unionfs,
	linux-fsdevel

On Tue, Oct 11, 2022 at 02:07:10PM +0300, Amir Goldstein wrote:
> > > > @@ -721,10 +721,10 @@ int chown_common(const struct path *path, uid_t user, gid_t group)
> > > >                 return -EINVAL;
> > > >         if ((group != (gid_t)-1) && !setattr_vfsgid(&newattrs, gid))
> > > >                 return -EINVAL;
> > > > -       if (!S_ISDIR(inode->i_mode))
> > > > -               newattrs.ia_valid |=
> > > > -                       ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
> > > >         inode_lock(inode);
> > > > +       if (!S_ISDIR(inode->i_mode))
> > > > +               newattrs.ia_valid |= ATTR_KILL_SUID | ATTR_KILL_PRIV |
> > > > +                                    should_remove_sgid(mnt_userns, inode);
> > >
> > > This is making me stop and wonder:
> > > 1. This has !S_ISDIR, should_remove_suid() has S_ISREG and
> > >     setattr_drop_sgid() has neither - is this consistent?
> >
> > I thought about that. It'very likely redundant since we deal with that
> > in other parts but I need to verify all callers before we can remove
> > that.
> >
> > > 2. SUID and PRIV are removed unconditionally and SGID is
> > >     removed conditionally - this is not a change of behavior
> > >     (at least for non-overlayfs), but is it desired???
> >
> > It looks that way but it isn't. The setgid bit was only killed
> > unconditionally for S_IXGRP. We continue to do that. But it was always
> > removed conditionally for ~S_IXGRP. The difference between this patchset
> > and earlier is that it was done in settattr_prepare() or setattr_copy()
> > before this change.
> >
> > IOW, we raised ATTR_KILL_SGID unconditionally but then only
> > conditionally obeyed it in setattr_{prepare,copy}() whereas now we
> > conditionally raise ATTR_KILL_SGID. That's surely a slight change but it
> > just means that we don't cause bugs for filesystems that roll their own
> > prepare or copy helpers and is just nicer overall.
> >
> 
> Yes, that sounds right.
> 
> The point that I was trying to make and failed to articulate myself was
> that chown_common() raises ATTR_KILL_SUID unconditionally,
> while should_remove_suid() raises ATTR_KILL_SUID conditional
> to !capable(CAP_FSETID).
> 
> Is this inconsistency in stripping SUID desired?

I looked at this before and it likely isn't intentional. But I need to
do pre-git archeology to determine that after I'm back from PTO. It
likely is something we can tackle.

> 
> According to man page (I think that) it is:
> 
> "When the owner or group of an executable file is changed by an
>  unprivileged user, the S_ISUID and S_ISGID mode bits are cleared.
>  POSIX does not specify whether this also  should  happen  when  root
>  does the chown(); the Linux behavior depends on the kernel version,
>  and since Linux 2.2.13, root is treated like other users..."
> 
> So special casing SUID stripping in chown() looks intentional,
> but maybe it is worth a comment.

It definitely is worth a comment but I think instead we should in the
future risk changing this for the write path as well. Because right now
losing the S_ISGID bit during chown() for regular files unconditionally
is important to not accidently have root create a situation where they
open a way for an unprivileged user to escalate privileges when chowning
a non-root owned setuid binary to a root-owned setuid binary:

touch aaa
chown 1000:1000
chmod u+s aaa
sudo chown aaa

and if the setuid bit would be retained then an unpriv user can now
abuse the setuid binary - if they can execute ofc. So that's why it's
dropped unconditionally. However, if that is a valid attack scenario
then a write should also drop setuid unconditionally since a non-harmful
setuid binary could be changed to a harmful one.

> 
> The paragraph above *may* be interpreted that chown() should strip
> S_SGID|S_IXGRP regardless of CAP_FSETID, which, as you say,
> has not been the case for a while.

Yeah, for the setgid bit we've been dropping it implicitly currently.

Thanks!
Christian

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-10-11 13:48 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-07 14:05 [PATCH v2 0/5] fs: improve setgid stripping consistency even more Christian Brauner
2022-10-07 14:05 ` [PATCH v2 1/5] attr: add setattr_drop_sgid() Christian Brauner
2022-10-11  8:11   ` Amir Goldstein
2022-10-11  8:57     ` Christian Brauner
2022-10-07 14:05 ` [PATCH v2 2/5] attr: add should_remove_sgid() Christian Brauner
     [not found]   ` <202210080357.inSALqdT-lkp@intel.com>
2022-10-08  5:56     ` Christian Brauner
2022-10-11  8:18   ` Amir Goldstein
2022-10-11  8:46     ` Christian Brauner
2022-10-07 14:05 ` [PATCH v2 3/5] attr: use consistent sgid stripping checks Christian Brauner
2022-10-11  8:43   ` Amir Goldstein
2022-10-11  8:56     ` Christian Brauner
2022-10-11 11:07       ` Amir Goldstein
2022-10-11 13:48         ` Christian Brauner
2022-10-07 14:05 ` [PATCH v2 4/5] ovl: remove privs in ovl_copyfile() Christian Brauner
2022-10-07 14:05 ` [PATCH v2 5/5] ovl: remove privs in ovl_fallocate() Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).