[PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
@ 2022-02-21 18:22 Andrey Zhadchenko
  2022-02-22  8:33 ` Christoph Hellwig
  2022-02-25  1:57 ` Darrick J. Wong
  0 siblings, 2 replies; 16+ messages in thread
From: Andrey Zhadchenko @ 2022-02-21 18:22 UTC (permalink / raw)
  To: linux-xfs; +Cc: christian.brauner, hch, djwong

xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
bits.
Unfortunately chown syscall results in different callstask:
i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
has CAP_FSETID capable in init_user_ns rather than mntns userns.

Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
---
 fs/xfs/xfs_iops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 09211e1d08ad..5b1fe635d153 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -774,7 +774,7 @@ xfs_setattr_nonsize(
 		 * cleared upon successful return from chown()
 		 */
 		if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
-		    !capable(CAP_FSETID))
+		    !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
 			inode->i_mode &= ~(S_ISUID|S_ISGID);
 
 		/*
-- 
2.35.0.rc2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
  2022-02-21 18:22 [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts Andrey Zhadchenko
@ 2022-02-22  8:33 ` Christoph Hellwig
  2022-02-22  9:25   ` Andrey Zhadchenko
  2022-02-22 10:24   ` Christian Brauner
  2022-02-25  1:57 ` Darrick J. Wong
  1 sibling, 2 replies; 16+ messages in thread
From: Christoph Hellwig @ 2022-02-22  8:33 UTC (permalink / raw)
  To: Andrey Zhadchenko; +Cc: linux-xfs, christian.brauner, hch, djwong

On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
> xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
> bits.
> Unfortunately chown syscall results in different callstask:
> i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
> has CAP_FSETID capable in init_user_ns rather than mntns userns.

Can you add an xfstests the exercises this path?

The fix itself looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
  2022-02-22  8:33 ` Christoph Hellwig
@ 2022-02-22  9:25   ` Andrey Zhadchenko
  2022-02-22 10:24   ` Christian Brauner
  1 sibling, 0 replies; 16+ messages in thread
From: Andrey Zhadchenko @ 2022-02-22  9:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, christian.brauner, djwong


On 2/22/22 11:33, Christoph Hellwig wrote:
> On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
>> xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
>> bits.
>> Unfortunately chown syscall results in different callstask:
>> i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
>> has CAP_FSETID capable in init_user_ns rather than mntns userns.
> 
> Can you add an xfstests the exercises this path?

Will do

> 
> The fix itself looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
  2022-02-22  8:33 ` Christoph Hellwig
  2022-02-22  9:25   ` Andrey Zhadchenko
@ 2022-02-22 10:24   ` Christian Brauner
  2022-02-22 11:19     ` Andrey Zhadchenko
  1 sibling, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2022-02-22 10:24 UTC (permalink / raw)
  To: Christoph Hellwig, Andrey Zhadchenko
  Cc: Andrey Zhadchenko, linux-xfs, brauner, djwong

On Tue, Feb 22, 2022 at 09:33:40AM +0100, Christoph Hellwig wrote:
> On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
> > xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
> > bits.
> > Unfortunately chown syscall results in different callstask:
> > i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
> > has CAP_FSETID capable in init_user_ns rather than mntns userns.
> 
> Can you add an xfstests the exercises this path?
> 
> The fix itself looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

So for anything other than directories the s{g,u}id bits are cleared on
every chown in notify_change() by the vfs; even for the root user (Also
documented on chown(2) manpage).

So the only scenario were this change would be relevant is for
directories afaict:

1. So ext4 has the behavior:

   ubuntu@f2-vm|~
   > mkdir suid.dir
   ubuntu@f2-vm|~
   > perms ./suid.dir
   drwxrwxr-x 775 (1000:1000) ./suid.dir
   ubuntu@f2-vm|~
   > chmod u+s ./suid.dir/
   ubuntu@f2-vm|~
   > perms ./suid.dir
   drwsrwxr-x 4775 (1000:1000) ./suid.dir
   ubuntu@f2-vm|~
   > chmod g+s ./suid.dir/
   ubuntu@f2-vm|~
   > perms ./suid.dir
   drwsrwsr-x 6775 (1000:1000) ./suid.dir
   ubuntu@f2-vm|~
   > chown 1000:1000 ./suid.dir/
   ubuntu@f2-vm|~
   > perms ./suid.dir/
   drwsrwsr-x 6775 (1000:1000) ./suid.dir/
   
   meaning that both s{g,u}id bits are retained for directories. (Just to
   make this explicit: changing {g,u}id to the same {g,u}id still ends up
   calling into the filesystem.)

2. Whereas xfs currently has:

   brauner@wittgenstein|~
   > mkdir suid.dir
   brauner@wittgenstein|~
   > perms ./suid.dir
   drwxrwxr-x 775 ./suid.dir
   brauner@wittgenstein|~
   > chmod u+s ./suid.dir/
   brauner@wittgenstein|~
   > perms ./suid.dir
   drwsrwxr-x 4775 ./suid.dir
   brauner@wittgenstein|~
   > chmod g+s ./suid.dir/
   brauner@wittgenstein|~
   > perms ./suid.dir
   drwsrwsr-x 6775 ./suid.dir
   brauner@wittgenstein|~
   > chown 1000:1000 ./suid.dir/
   brauner@wittgenstein|~
   > perms ./suid.dir/
   drwxrwxr-x 775 ./suid.dir/
   
   meaning that both s{g,u}id bits are cleared for directories.

Since the vfs will always ensure that s{g,u}id bits are stripped for
anything that isn't a directory in the vfs:
- ATTR_KILL_S{G,U}ID is raised in chown_common():

	if (!S_ISDIR(inode->i_mode))
		newattrs.ia_valid |=
			ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;

- and then in notify_change() we'll get the bits stripped and ATTR_MODE
  raised:

	if (ia_valid & ATTR_KILL_SUID) {
		if (mode & S_ISUID) {
			ia_valid = attr->ia_valid |= ATTR_MODE;
			attr->ia_mode = (inode->i_mode & ~S_ISUID);
		}
	}
	if (ia_valid & ATTR_KILL_SGID) {
		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
			if (!(ia_valid & ATTR_MODE)) {
				ia_valid = attr->ia_valid |= ATTR_MODE;
				attr->ia_mode = inode->i_mode;
			}
			attr->ia_mode &= ~S_ISGID;
		}
	}

we can change this codepath to just mirror setattr_copy() or switch
fully to setattr_copy() (if feasible).

Because as of right now the code seems to imply that the xfs code itself
is responsible for stripping s{g,u}id bits for all files whereas it is
the vfs that does it for any non-directory. So I'd propose to either try
and switch that code to setattr_copy() or to do open-code the
setattr_copy() check:

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index b79b3846e71b..ff55b31521a2 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -748,9 +748,13 @@ xfs_setattr_nonsize(
                 * The set-user-ID and set-group-ID bits of a file will be
                 * cleared upon successful return from chown()
                 */
-               if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
-                   !capable(CAP_FSETID))
-                       inode->i_mode &= ~(S_ISUID|S_ISGID);
+               if (iattr->ia_valid & ATTR_MODE) {
+                       umode_t mode = iattr->ia_mode;
+                       if (!in_group_p(i_gid_into_mnt(mnt_userns, inode)) &&
+                           !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
+                               mode &= ~S_ISGID;
+                       inode->i_mode = mode;
+               }

                /*
                 * Change the ownerships and register quota modifications

which aligns xfs with ext4 and any other filesystem. Any thoughts on
this?

For @Andrey specifically: the tests these should go into:

https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/idmapped-mounts/idmapped-mounts.c

Note that there are already setgid inheritance tests and set*id
execution tests in there.
You should be able to copy a lot of code from them. Could you please add
the test I sketched above and ideally also a test that the set{g,u}id
bits are stripped during chown for regular files?

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
  2022-02-22 10:24   ` Christian Brauner
@ 2022-02-22 11:19     ` Andrey Zhadchenko
  2022-02-22 12:23       ` Christian Brauner
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Zhadchenko @ 2022-02-22 11:19 UTC (permalink / raw)
  To: Christian Brauner, Christoph Hellwig; +Cc: linux-xfs, djwong

On 2/22/22 13:24, Christian Brauner wrote:
> On Tue, Feb 22, 2022 at 09:33:40AM +0100, Christoph Hellwig wrote:
>> On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
>>> xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
>>> bits.
>>> Unfortunately chown syscall results in different callstask:
>>> i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
>>> has CAP_FSETID capable in init_user_ns rather than mntns userns.
>>
>> Can you add an xfstests the exercises this path?
>>
>> The fix itself looks good:
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> So for anything other than directories the s{g,u}id bits are cleared on
> every chown in notify_change() by the vfs; even for the root user (Also
> documented on chown(2) manpage).

Only exception - chown preserves setgid bit set on a 
non-group-executable file (also documented there) but do not take root 
privileges into account at vfs level.

> 
> So the only scenario were this change would be relevant is for
> directories afaict:
> 
> 1. So ext4 has the behavior:
> 
>     ubuntu@f2-vm|~
>     > mkdir suid.dir
>     ubuntu@f2-vm|~
>     > perms ./suid.dir
>     drwxrwxr-x 775 (1000:1000) ./suid.dir
>     ubuntu@f2-vm|~
>     > chmod u+s ./suid.dir/
>     ubuntu@f2-vm|~
>     > perms ./suid.dir
>     drwsrwxr-x 4775 (1000:1000) ./suid.dir
>     ubuntu@f2-vm|~
>     > chmod g+s ./suid.dir/
>     ubuntu@f2-vm|~
>     > perms ./suid.dir
>     drwsrwsr-x 6775 (1000:1000) ./suid.dir
>     ubuntu@f2-vm|~
>     > chown 1000:1000 ./suid.dir/
>     ubuntu@f2-vm|~
>     > perms ./suid.dir/
>     drwsrwsr-x 6775 (1000:1000) ./suid.dir/
>     
>     meaning that both s{g,u}id bits are retained for directories. (Just to
>     make this explicit: changing {g,u}id to the same {g,u}id still ends up
>     calling into the filesystem.)
> 
> 2. Whereas xfs currently has:
> 
>     brauner@wittgenstein|~
>     > mkdir suid.dir
>     brauner@wittgenstein|~
>     > perms ./suid.dir
>     drwxrwxr-x 775 ./suid.dir
>     brauner@wittgenstein|~
>     > chmod u+s ./suid.dir/
>     brauner@wittgenstein|~
>     > perms ./suid.dir
>     drwsrwxr-x 4775 ./suid.dir
>     brauner@wittgenstein|~
>     > chmod g+s ./suid.dir/
>     brauner@wittgenstein|~
>     > perms ./suid.dir
>     drwsrwsr-x 6775 ./suid.dir
>     brauner@wittgenstein|~
>     > chown 1000:1000 ./suid.dir/
>     brauner@wittgenstein|~
>     > perms ./suid.dir/
>     drwxrwxr-x 775 ./suid.dir/
>     
>     meaning that both s{g,u}id bits are cleared for directories.
> 
> Since the vfs will always ensure that s{g,u}id bits are stripped for
> anything that isn't a directory in the vfs:
> - ATTR_KILL_S{G,U}ID is raised in chown_common():
> 
> 	if (!S_ISDIR(inode->i_mode))
> 		newattrs.ia_valid |=
> 			ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
> 
> - and then in notify_change() we'll get the bits stripped and ATTR_MODE
>    raised:
> 
> 	if (ia_valid & ATTR_KILL_SUID) {
> 		if (mode & S_ISUID) {
> 			ia_valid = attr->ia_valid |= ATTR_MODE;
> 			attr->ia_mode = (inode->i_mode & ~S_ISUID);
> 		}
> 	}
> 	if (ia_valid & ATTR_KILL_SGID) {
> 		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {

So SGID is not killed if there is no S_IXGRP (yet no capability check)

Actually I do not really understand why do kernel expects filesystems to 
further apply restrictions with CAP_FSETID. Why not kill it here since 
we have all info?

> 			if (!(ia_valid & ATTR_MODE)) {
> 				ia_valid = attr->ia_valid |= ATTR_MODE;
> 				attr->ia_mode = inode->i_mode;
> 			}
> 			attr->ia_mode &= ~S_ISGID;
> 		}
> 	}
> 
> we can change this codepath to just mirror setattr_copy() or switch
> fully to setattr_copy() (if feasible).
> 
> Because as of right now the code seems to imply that the xfs code itself
> is responsible for stripping s{g,u}id bits for all files whereas it is
> the vfs that does it for any non-directory. So I'd propose to either try
> and switch that code to setattr_copy() or to do open-code the
> setattr_copy() check:
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index b79b3846e71b..ff55b31521a2 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -748,9 +748,13 @@ xfs_setattr_nonsize(
>                   * The set-user-ID and set-group-ID bits of a file will be
>                   * cleared upon successful return from chown()
>                   */
> -               if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
> -                   !capable(CAP_FSETID))
> -                       inode->i_mode &= ~(S_ISUID|S_ISGID);
> +               if (iattr->ia_valid & ATTR_MODE) {
> +                       umode_t mode = iattr->ia_mode;
> +                       if (!in_group_p(i_gid_into_mnt(mnt_userns, inode)) &&
> +                           !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
> +                               mode &= ~S_ISGID;
> +                       inode->i_mode = mode;
> +               }
> 
>                  /*
>                   * Change the ownerships and register quota modifications
> 
> which aligns xfs with ext4 and any other filesystem. Any thoughts on
> this?
> 
> For @Andrey specifically: the tests these should go into:
> 
> https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/idmapped-mounts/idmapped-mounts.c
> 
> Note that there are already setgid inheritance tests and set*id
> execution tests in there.
> You should be able to copy a lot of code from them. Could you please add
> the test I sketched above and ideally also a test that the set{g,u}id
> bits are stripped during chown for regular files?
Thanks for the link. To clarify what tests and result you expect:
- for directory chown we expect to preserve s{g,u}id
- for regfile chown we expect to preserve S_ISGID only if S_IXGRP is 
absent and CAP_FSETID is present

JFYI: I found out this problem while running LTP (specifically 
syscalls/chown02 test) on idmapped XFS. Maybe I will be able to find 
more, who knows.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
  2022-02-22 11:19     ` Andrey Zhadchenko
@ 2022-02-22 12:23       ` Christian Brauner
  2022-02-22 12:36         ` Christian Brauner
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2022-02-22 12:23 UTC (permalink / raw)
  To: Andrey Zhadchenko; +Cc: Christoph Hellwig, linux-xfs, djwong

On Tue, Feb 22, 2022 at 02:19:16PM +0300, Andrey Zhadchenko wrote:
> On 2/22/22 13:24, Christian Brauner wrote:
> > On Tue, Feb 22, 2022 at 09:33:40AM +0100, Christoph Hellwig wrote:
> > > On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
> > > > xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
> > > > bits.
> > > > Unfortunately chown syscall results in different callstask:
> > > > i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
> > > > has CAP_FSETID capable in init_user_ns rather than mntns userns.
> > > 
> > > Can you add an xfstests the exercises this path?
> > > 
> > > The fix itself looks good:
> > > 
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > 
> > So for anything other than directories the s{g,u}id bits are cleared on
> > every chown in notify_change() by the vfs; even for the root user (Also
> > documented on chown(2) manpage).
> 
> Only exception - chown preserves setgid bit set on a non-group-executable
> file (also documented there) but do not take root privileges into account at
> vfs level.
> 
> > 
> > So the only scenario were this change would be relevant is for
> > directories afaict:
> > 
> > 1. So ext4 has the behavior:
> > 
> >     ubuntu@f2-vm|~
> >     > mkdir suid.dir
> >     ubuntu@f2-vm|~
> >     > perms ./suid.dir
> >     drwxrwxr-x 775 (1000:1000) ./suid.dir
> >     ubuntu@f2-vm|~
> >     > chmod u+s ./suid.dir/
> >     ubuntu@f2-vm|~
> >     > perms ./suid.dir
> >     drwsrwxr-x 4775 (1000:1000) ./suid.dir
> >     ubuntu@f2-vm|~
> >     > chmod g+s ./suid.dir/
> >     ubuntu@f2-vm|~
> >     > perms ./suid.dir
> >     drwsrwsr-x 6775 (1000:1000) ./suid.dir
> >     ubuntu@f2-vm|~
> >     > chown 1000:1000 ./suid.dir/
> >     ubuntu@f2-vm|~
> >     > perms ./suid.dir/
> >     drwsrwsr-x 6775 (1000:1000) ./suid.dir/
> >     meaning that both s{g,u}id bits are retained for directories. (Just to
> >     make this explicit: changing {g,u}id to the same {g,u}id still ends up
> >     calling into the filesystem.)
> > 
> > 2. Whereas xfs currently has:
> > 
> >     brauner@wittgenstein|~
> >     > mkdir suid.dir
> >     brauner@wittgenstein|~
> >     > perms ./suid.dir
> >     drwxrwxr-x 775 ./suid.dir
> >     brauner@wittgenstein|~
> >     > chmod u+s ./suid.dir/
> >     brauner@wittgenstein|~
> >     > perms ./suid.dir
> >     drwsrwxr-x 4775 ./suid.dir
> >     brauner@wittgenstein|~
> >     > chmod g+s ./suid.dir/
> >     brauner@wittgenstein|~
> >     > perms ./suid.dir
> >     drwsrwsr-x 6775 ./suid.dir
> >     brauner@wittgenstein|~
> >     > chown 1000:1000 ./suid.dir/
> >     brauner@wittgenstein|~
> >     > perms ./suid.dir/
> >     drwxrwxr-x 775 ./suid.dir/
> >     meaning that both s{g,u}id bits are cleared for directories.
> > 
> > Since the vfs will always ensure that s{g,u}id bits are stripped for
> > anything that isn't a directory in the vfs:
> > - ATTR_KILL_S{G,U}ID is raised in chown_common():
> > 
> > 	if (!S_ISDIR(inode->i_mode))
> > 		newattrs.ia_valid |=
> > 			ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
> > 
> > - and then in notify_change() we'll get the bits stripped and ATTR_MODE
> >    raised:
> > 
> > 	if (ia_valid & ATTR_KILL_SUID) {
> > 		if (mode & S_ISUID) {
> > 			ia_valid = attr->ia_valid |= ATTR_MODE;
> > 			attr->ia_mode = (inode->i_mode & ~S_ISUID);
> > 		}
> > 	}
> > 	if (ia_valid & ATTR_KILL_SGID) {
> > 		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> 
> So SGID is not killed if there is no S_IXGRP (yet no capability check)
> 
> Actually I do not really understand why do kernel expects filesystems to
> further apply restrictions with CAP_FSETID. Why not kill it here since we
> have all info?

Some filesystems do treat the sgid behavior of directories special (some
network filesystems do where they send that information to the server
before updating the inode afair). So I'd rather not do that in there as
we're risking breaking expectations and it's a very sensitive change.

Plus, the logic is encapsulated in the vfs generic setattr_copy() helper
which nearly all filesystems call.

> 
> > 			if (!(ia_valid & ATTR_MODE)) {
> > 				ia_valid = attr->ia_valid |= ATTR_MODE;
> > 				attr->ia_mode = inode->i_mode;
> > 			}
> > 			attr->ia_mode &= ~S_ISGID;
> > 		}
> > 	}
> > 
> > we can change this codepath to just mirror setattr_copy() or switch
> > fully to setattr_copy() (if feasible).
> > 
> > Because as of right now the code seems to imply that the xfs code itself
> > is responsible for stripping s{g,u}id bits for all files whereas it is
> > the vfs that does it for any non-directory. So I'd propose to either try
> > and switch that code to setattr_copy() or to do open-code the
> > setattr_copy() check:
> > 
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index b79b3846e71b..ff55b31521a2 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -748,9 +748,13 @@ xfs_setattr_nonsize(
> >                   * The set-user-ID and set-group-ID bits of a file will be
> >                   * cleared upon successful return from chown()
> >                   */
> > -               if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
> > -                   !capable(CAP_FSETID))
> > -                       inode->i_mode &= ~(S_ISUID|S_ISGID);
> > +               if (iattr->ia_valid & ATTR_MODE) {
> > +                       umode_t mode = iattr->ia_mode;
> > +                       if (!in_group_p(i_gid_into_mnt(mnt_userns, inode)) &&
> > +                           !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
> > +                               mode &= ~S_ISGID;
> > +                       inode->i_mode = mode;
> > +               }
> > 
> >                  /*
> >                   * Change the ownerships and register quota modifications
> > 
> > which aligns xfs with ext4 and any other filesystem. Any thoughts on
> > this?
> > 
> > For @Andrey specifically: the tests these should go into:
> > 
> > https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/idmapped-mounts/idmapped-mounts.c
> > 
> > Note that there are already setgid inheritance tests and set*id
> > execution tests in there.
> > You should be able to copy a lot of code from them. Could you please add
> > the test I sketched above and ideally also a test that the set{g,u}id
> > bits are stripped during chown for regular files?
> Thanks for the link. To clarify what tests and result you expect:
> - for directory chown we expect to preserve s{g,u}id
> - for regfile chown we expect to preserve S_ISGID only if S_IXGRP is absent
> and CAP_FSETID is present

So specifically for chown():
1. if regfile
   -> strip suid bit unconditionally
   -> strip sgid bit if inode has sgid bit and file is group-executable
2. if directory
   -> strip sgid bit if inode's gid is neither among the caller's groups
      nor is the caller capable wrt to that inode
The behavior described in 2. is encoded in the vfs generic
setattr_copy() helper. And that is what we should see.

The behavior of ext4 and btrfs is what we should see afaict as both use
setattr_copy().

> 
> JFYI: I found out this problem while running LTP (specifically
> syscalls/chown02 test) on idmapped XFS. Maybe I will be able to find more,
> who knows.

Hm, if you look above, then you can see that the failure (or difference
in behavior) you're reporting is independent of idmapped mounts. An
ext4 directory shows different behavior than an xfs directory on a
regular system without any idmapped mounts used. So I'm not clear how
that's specifically related to idmapped mounts yet.

Fwiw, one part in your commit message is a bit misleading:

> > > > has CAP_FSETID capable in init_user_ns rather than mntns userns.

that's not what capable_wrt_to_inode_uidgid() does. What it does is to
check whether the caller is capable in their current user namespace.
That's how capable_wrt_to_inode_uidgid() has always worked.
The mnt_userns is only used to idmap the inode's {g,u}id. So if the
caller has CAP_FSETID in its current userns and the inode's {g,u}id have
a valid mapping in the mnt's userns the caller is considered privileged
over that inode.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
  2022-02-22 12:23       ` Christian Brauner
@ 2022-02-22 12:36         ` Christian Brauner
  2022-02-22 12:44           ` Christian Brauner
  2022-02-22 14:54           ` Andrey Zhadchenko
  0 siblings, 2 replies; 16+ messages in thread
From: Christian Brauner @ 2022-02-22 12:36 UTC (permalink / raw)
  To: Andrey Zhadchenko; +Cc: Christoph Hellwig, linux-xfs, djwong

On Tue, Feb 22, 2022 at 01:23:31PM +0100, Christian Brauner wrote:
> On Tue, Feb 22, 2022 at 02:19:16PM +0300, Andrey Zhadchenko wrote:
> > On 2/22/22 13:24, Christian Brauner wrote:
> > > On Tue, Feb 22, 2022 at 09:33:40AM +0100, Christoph Hellwig wrote:
> > > > On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
> > > > > xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
> > > > > bits.
> > > > > Unfortunately chown syscall results in different callstask:
> > > > > i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
> > > > > has CAP_FSETID capable in init_user_ns rather than mntns userns.
> > > > 
> > > > Can you add an xfstests the exercises this path?
> > > > 
> > > > The fix itself looks good:
> > > > 
> > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > 
> > > So for anything other than directories the s{g,u}id bits are cleared on
> > > every chown in notify_change() by the vfs; even for the root user (Also
> > > documented on chown(2) manpage).
> > 
> > Only exception - chown preserves setgid bit set on a non-group-executable
> > file (also documented there) but do not take root privileges into account at
> > vfs level.
> > 
> > > 
> > > So the only scenario were this change would be relevant is for
> > > directories afaict:
> > > 
> > > 1. So ext4 has the behavior:
> > > 
> > >     ubuntu@f2-vm|~
> > >     > mkdir suid.dir
> > >     ubuntu@f2-vm|~
> > >     > perms ./suid.dir
> > >     drwxrwxr-x 775 (1000:1000) ./suid.dir
> > >     ubuntu@f2-vm|~
> > >     > chmod u+s ./suid.dir/
> > >     ubuntu@f2-vm|~
> > >     > perms ./suid.dir
> > >     drwsrwxr-x 4775 (1000:1000) ./suid.dir
> > >     ubuntu@f2-vm|~
> > >     > chmod g+s ./suid.dir/
> > >     ubuntu@f2-vm|~
> > >     > perms ./suid.dir
> > >     drwsrwsr-x 6775 (1000:1000) ./suid.dir
> > >     ubuntu@f2-vm|~
> > >     > chown 1000:1000 ./suid.dir/
> > >     ubuntu@f2-vm|~
> > >     > perms ./suid.dir/
> > >     drwsrwsr-x 6775 (1000:1000) ./suid.dir/
> > >     meaning that both s{g,u}id bits are retained for directories. (Just to
> > >     make this explicit: changing {g,u}id to the same {g,u}id still ends up
> > >     calling into the filesystem.)
> > > 
> > > 2. Whereas xfs currently has:
> > > 
> > >     brauner@wittgenstein|~
> > >     > mkdir suid.dir
> > >     brauner@wittgenstein|~
> > >     > perms ./suid.dir
> > >     drwxrwxr-x 775 ./suid.dir
> > >     brauner@wittgenstein|~
> > >     > chmod u+s ./suid.dir/
> > >     brauner@wittgenstein|~
> > >     > perms ./suid.dir
> > >     drwsrwxr-x 4775 ./suid.dir
> > >     brauner@wittgenstein|~
> > >     > chmod g+s ./suid.dir/
> > >     brauner@wittgenstein|~
> > >     > perms ./suid.dir
> > >     drwsrwsr-x 6775 ./suid.dir
> > >     brauner@wittgenstein|~
> > >     > chown 1000:1000 ./suid.dir/
> > >     brauner@wittgenstein|~
> > >     > perms ./suid.dir/
> > >     drwxrwxr-x 775 ./suid.dir/
> > >     meaning that both s{g,u}id bits are cleared for directories.
> > > 
> > > Since the vfs will always ensure that s{g,u}id bits are stripped for
> > > anything that isn't a directory in the vfs:
> > > - ATTR_KILL_S{G,U}ID is raised in chown_common():
> > > 
> > > 	if (!S_ISDIR(inode->i_mode))
> > > 		newattrs.ia_valid |=
> > > 			ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
> > > 
> > > - and then in notify_change() we'll get the bits stripped and ATTR_MODE
> > >    raised:
> > > 
> > > 	if (ia_valid & ATTR_KILL_SUID) {
> > > 		if (mode & S_ISUID) {
> > > 			ia_valid = attr->ia_valid |= ATTR_MODE;
> > > 			attr->ia_mode = (inode->i_mode & ~S_ISUID);
> > > 		}
> > > 	}
> > > 	if (ia_valid & ATTR_KILL_SGID) {
> > > 		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> > 
> > So SGID is not killed if there is no S_IXGRP (yet no capability check)
> > 
> > Actually I do not really understand why do kernel expects filesystems to
> > further apply restrictions with CAP_FSETID. Why not kill it here since we
> > have all info?
> 
> Some filesystems do treat the sgid behavior of directories special (some
> network filesystems do where they send that information to the server
> before updating the inode afair). So I'd rather not do that in there as
> we're risking breaking expectations and it's a very sensitive change.
> 
> Plus, the logic is encapsulated in the vfs generic setattr_copy() helper
> which nearly all filesystems call.
> 
> > 
> > > 			if (!(ia_valid & ATTR_MODE)) {
> > > 				ia_valid = attr->ia_valid |= ATTR_MODE;
> > > 				attr->ia_mode = inode->i_mode;
> > > 			}
> > > 			attr->ia_mode &= ~S_ISGID;
> > > 		}
> > > 	}
> > > 
> > > we can change this codepath to just mirror setattr_copy() or switch
> > > fully to setattr_copy() (if feasible).
> > > 
> > > Because as of right now the code seems to imply that the xfs code itself
> > > is responsible for stripping s{g,u}id bits for all files whereas it is
> > > the vfs that does it for any non-directory. So I'd propose to either try
> > > and switch that code to setattr_copy() or to do open-code the
> > > setattr_copy() check:
> > > 
> > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > index b79b3846e71b..ff55b31521a2 100644
> > > --- a/fs/xfs/xfs_iops.c
> > > +++ b/fs/xfs/xfs_iops.c
> > > @@ -748,9 +748,13 @@ xfs_setattr_nonsize(
> > >                   * The set-user-ID and set-group-ID bits of a file will be
> > >                   * cleared upon successful return from chown()
> > >                   */
> > > -               if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
> > > -                   !capable(CAP_FSETID))
> > > -                       inode->i_mode &= ~(S_ISUID|S_ISGID);
> > > +               if (iattr->ia_valid & ATTR_MODE) {
> > > +                       umode_t mode = iattr->ia_mode;
> > > +                       if (!in_group_p(i_gid_into_mnt(mnt_userns, inode)) &&
> > > +                           !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
> > > +                               mode &= ~S_ISGID;
> > > +                       inode->i_mode = mode;
> > > +               }
> > > 
> > >                  /*
> > >                   * Change the ownerships and register quota modifications
> > > 
> > > which aligns xfs with ext4 and any other filesystem. Any thoughts on
> > > this?
> > > 
> > > For @Andrey specifically: the tests these should go into:
> > > 
> > > https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/idmapped-mounts/idmapped-mounts.c
> > > 
> > > Note that there are already setgid inheritance tests and set*id
> > > execution tests in there.
> > > You should be able to copy a lot of code from them. Could you please add
> > > the test I sketched above and ideally also a test that the set{g,u}id
> > > bits are stripped during chown for regular files?
> > Thanks for the link. To clarify what tests and result you expect:
> > - for directory chown we expect to preserve s{g,u}id
> > - for regfile chown we expect to preserve S_ISGID only if S_IXGRP is absent
> > and CAP_FSETID is present
> 
> So specifically for chown():
> 1. if regfile
>    -> strip suid bit unconditionally
>    -> strip sgid bit if inode has sgid bit and file is group-executable
> 2. if directory
>    -> strip sgid bit if inode's gid is neither among the caller's groups
>       nor is the caller capable wrt to that inode
> The behavior described in 2. is encoded in the vfs generic
> setattr_copy() helper. And that is what we should see.
> 
> The behavior of ext4 and btrfs is what we should see afaict as both use
> setattr_copy().
> 
> > 
> > JFYI: I found out this problem while running LTP (specifically
> > syscalls/chown02 test) on idmapped XFS. Maybe I will be able to find more,
> > who knows.
> 
> Hm, if you look above, then you can see that the failure (or difference
> in behavior) you're reporting is independent of idmapped mounts. An
> ext4 directory shows different behavior than an xfs directory on a
> regular system without any idmapped mounts used. So I'm not clear how
> that's specifically related to idmapped mounts yet.

So for example, in order to cause the sgid bit stripped while it should
be preserved if xfs were to use setattr_copy() I can simply do:

brauner@wittgenstein|~/src/git/linux/ltp/testcases/kernel/syscalls/chown|master %=
> unshare -U --map-root
root@wittgenstein|~/src/git/linux/ltp/testcases/kernel/syscalls/chown|master %=
> PATH=$PATH:$PWD ./chown02
tst_memutils.c:157: TWARN: Can't adjust score, even with capabilities!?
tst_test.c:1455: TINFO: Timeout per run is 0h 05m 00s
chown02.c:45: TPASS: chown(testfile1, 0, 0) passed
chown02.c:45: TPASS: chown(testfile2, 0, 0) passed
chown02.c:57: TFAIL: testfile2: wrong mode permissions 0100700, expected 0102700

Summary:
passed   2
failed   1
broken   0
skipped  0
warnings 1

There's no idmapped mounts here in play. The caller simply has been
placed in a new user namespace and thus they fail the current
capable(CAP_FSETID) check which will cause xfs to strip the sgid bit.

Now trying the same with ext4:

ubuntu@f2-vm:~/src/git/linux/ltp/testcases/kernel/syscalls/chown$ unshare -U --map-root
root@f2-vm:~/src/git/linux/ltp/testcases/kernel/syscalls/chown# PATH=$PATH:$PWD ./chown02
tst_memutils.c:157: TWARN: Can't adjust score, even with capabilities!?
tst_test.c:1455: TINFO: Timeout per run is 0h 05m 00s
chown02.c:45: TPASS: chown(testfile1, 0, 0) passed
chown02.c:45: TPASS: chown(testfile2, 0, 0) passed

Summary:
passed   2
failed   0
broken   0
skipped  0
warnings 1

it passes since ext4 uses setattr_copy() and thus the capability is
checked for in the caller's user namespace.

> 
> Fwiw, one part in your commit message is a bit misleading:
> 
> > > > > has CAP_FSETID capable in init_user_ns rather than mntns userns.
> 
> that's not what capable_wrt_to_inode_uidgid() does. What it does is to
> check whether the caller is capable in their current user namespace.
> That's how capable_wrt_to_inode_uidgid() has always worked.
> The mnt_userns is only used to idmap the inode's {g,u}id. So if the
> caller has CAP_FSETID in its current userns and the inode's {g,u}id have
> a valid mapping in the mnt's userns the caller is considered privileged
> over that inode.
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
  2022-02-22 12:36         ` Christian Brauner
@ 2022-02-22 12:44           ` Christian Brauner
  2022-02-22 14:54           ` Andrey Zhadchenko
  1 sibling, 0 replies; 16+ messages in thread
From: Christian Brauner @ 2022-02-22 12:44 UTC (permalink / raw)
  To: Andrey Zhadchenko; +Cc: Christoph Hellwig, linux-xfs, djwong

On Tue, Feb 22, 2022 at 01:36:56PM +0100, Christian Brauner wrote:
> On Tue, Feb 22, 2022 at 01:23:31PM +0100, Christian Brauner wrote:
> > On Tue, Feb 22, 2022 at 02:19:16PM +0300, Andrey Zhadchenko wrote:
> > > On 2/22/22 13:24, Christian Brauner wrote:
> > > > On Tue, Feb 22, 2022 at 09:33:40AM +0100, Christoph Hellwig wrote:
> > > > > On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
> > > > > > xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
> > > > > > bits.
> > > > > > Unfortunately chown syscall results in different callstask:
> > > > > > i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
> > > > > > has CAP_FSETID capable in init_user_ns rather than mntns userns.
> > > > > 
> > > > > Can you add an xfstests the exercises this path?
> > > > > 
> > > > > The fix itself looks good:
> > > > > 
> > > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > 
> > > > So for anything other than directories the s{g,u}id bits are cleared on
> > > > every chown in notify_change() by the vfs; even for the root user (Also
> > > > documented on chown(2) manpage).
> > > 
> > > Only exception - chown preserves setgid bit set on a non-group-executable
> > > file (also documented there) but do not take root privileges into account at
> > > vfs level.
> > > 
> > > > 
> > > > So the only scenario were this change would be relevant is for
> > > > directories afaict:
> > > > 
> > > > 1. So ext4 has the behavior:
> > > > 
> > > >     ubuntu@f2-vm|~
> > > >     > mkdir suid.dir
> > > >     ubuntu@f2-vm|~
> > > >     > perms ./suid.dir
> > > >     drwxrwxr-x 775 (1000:1000) ./suid.dir
> > > >     ubuntu@f2-vm|~
> > > >     > chmod u+s ./suid.dir/
> > > >     ubuntu@f2-vm|~
> > > >     > perms ./suid.dir
> > > >     drwsrwxr-x 4775 (1000:1000) ./suid.dir
> > > >     ubuntu@f2-vm|~
> > > >     > chmod g+s ./suid.dir/
> > > >     ubuntu@f2-vm|~
> > > >     > perms ./suid.dir
> > > >     drwsrwsr-x 6775 (1000:1000) ./suid.dir
> > > >     ubuntu@f2-vm|~
> > > >     > chown 1000:1000 ./suid.dir/
> > > >     ubuntu@f2-vm|~
> > > >     > perms ./suid.dir/
> > > >     drwsrwsr-x 6775 (1000:1000) ./suid.dir/
> > > >     meaning that both s{g,u}id bits are retained for directories. (Just to
> > > >     make this explicit: changing {g,u}id to the same {g,u}id still ends up
> > > >     calling into the filesystem.)
> > > > 
> > > > 2. Whereas xfs currently has:
> > > > 
> > > >     brauner@wittgenstein|~
> > > >     > mkdir suid.dir
> > > >     brauner@wittgenstein|~
> > > >     > perms ./suid.dir
> > > >     drwxrwxr-x 775 ./suid.dir
> > > >     brauner@wittgenstein|~
> > > >     > chmod u+s ./suid.dir/
> > > >     brauner@wittgenstein|~
> > > >     > perms ./suid.dir
> > > >     drwsrwxr-x 4775 ./suid.dir
> > > >     brauner@wittgenstein|~
> > > >     > chmod g+s ./suid.dir/
> > > >     brauner@wittgenstein|~
> > > >     > perms ./suid.dir
> > > >     drwsrwsr-x 6775 ./suid.dir
> > > >     brauner@wittgenstein|~
> > > >     > chown 1000:1000 ./suid.dir/
> > > >     brauner@wittgenstein|~
> > > >     > perms ./suid.dir/
> > > >     drwxrwxr-x 775 ./suid.dir/
> > > >     meaning that both s{g,u}id bits are cleared for directories.
> > > > 
> > > > Since the vfs will always ensure that s{g,u}id bits are stripped for
> > > > anything that isn't a directory in the vfs:
> > > > - ATTR_KILL_S{G,U}ID is raised in chown_common():
> > > > 
> > > > 	if (!S_ISDIR(inode->i_mode))
> > > > 		newattrs.ia_valid |=
> > > > 			ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
> > > > 
> > > > - and then in notify_change() we'll get the bits stripped and ATTR_MODE
> > > >    raised:
> > > > 
> > > > 	if (ia_valid & ATTR_KILL_SUID) {
> > > > 		if (mode & S_ISUID) {
> > > > 			ia_valid = attr->ia_valid |= ATTR_MODE;
> > > > 			attr->ia_mode = (inode->i_mode & ~S_ISUID);
> > > > 		}
> > > > 	}
> > > > 	if (ia_valid & ATTR_KILL_SGID) {
> > > > 		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> > > 
> > > So SGID is not killed if there is no S_IXGRP (yet no capability check)
> > > 
> > > Actually I do not really understand why do kernel expects filesystems to
> > > further apply restrictions with CAP_FSETID. Why not kill it here since we
> > > have all info?
> > 
> > Some filesystems do treat the sgid behavior of directories special (some
> > network filesystems do where they send that information to the server
> > before updating the inode afair). So I'd rather not do that in there as
> > we're risking breaking expectations and it's a very sensitive change.
> > 
> > Plus, the logic is encapsulated in the vfs generic setattr_copy() helper
> > which nearly all filesystems call.
> > 
> > > 
> > > > 			if (!(ia_valid & ATTR_MODE)) {
> > > > 				ia_valid = attr->ia_valid |= ATTR_MODE;
> > > > 				attr->ia_mode = inode->i_mode;
> > > > 			}
> > > > 			attr->ia_mode &= ~S_ISGID;
> > > > 		}
> > > > 	}
> > > > 
> > > > we can change this codepath to just mirror setattr_copy() or switch
> > > > fully to setattr_copy() (if feasible).
> > > > 
> > > > Because as of right now the code seems to imply that the xfs code itself
> > > > is responsible for stripping s{g,u}id bits for all files whereas it is
> > > > the vfs that does it for any non-directory. So I'd propose to either try
> > > > and switch that code to setattr_copy() or to do open-code the
> > > > setattr_copy() check:
> > > > 
> > > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > > index b79b3846e71b..ff55b31521a2 100644
> > > > --- a/fs/xfs/xfs_iops.c
> > > > +++ b/fs/xfs/xfs_iops.c
> > > > @@ -748,9 +748,13 @@ xfs_setattr_nonsize(
> > > >                   * The set-user-ID and set-group-ID bits of a file will be
> > > >                   * cleared upon successful return from chown()
> > > >                   */
> > > > -               if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
> > > > -                   !capable(CAP_FSETID))
> > > > -                       inode->i_mode &= ~(S_ISUID|S_ISGID);
> > > > +               if (iattr->ia_valid & ATTR_MODE) {
> > > > +                       umode_t mode = iattr->ia_mode;
> > > > +                       if (!in_group_p(i_gid_into_mnt(mnt_userns, inode)) &&
> > > > +                           !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
> > > > +                               mode &= ~S_ISGID;
> > > > +                       inode->i_mode = mode;
> > > > +               }
> > > > 
> > > >                  /*
> > > >                   * Change the ownerships and register quota modifications
> > > > 
> > > > which aligns xfs with ext4 and any other filesystem. Any thoughts on
> > > > this?
> > > > 
> > > > For @Andrey specifically: the tests these should go into:
> > > > 
> > > > https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/idmapped-mounts/idmapped-mounts.c
> > > > 
> > > > Note that there are already setgid inheritance tests and set*id
> > > > execution tests in there.
> > > > You should be able to copy a lot of code from them. Could you please add
> > > > the test I sketched above and ideally also a test that the set{g,u}id
> > > > bits are stripped during chown for regular files?
> > > Thanks for the link. To clarify what tests and result you expect:
> > > - for directory chown we expect to preserve s{g,u}id
> > > - for regfile chown we expect to preserve S_ISGID only if S_IXGRP is absent
> > > and CAP_FSETID is present
> > 
> > So specifically for chown():
> > 1. if regfile
> >    -> strip suid bit unconditionally
> >    -> strip sgid bit if inode has sgid bit and file is group-executable
> > 2. if directory
> >    -> strip sgid bit if inode's gid is neither among the caller's groups
> >       nor is the caller capable wrt to that inode
> > The behavior described in 2. is encoded in the vfs generic
> > setattr_copy() helper. And that is what we should see.
> > 
> > The behavior of ext4 and btrfs is what we should see afaict as both use
> > setattr_copy().
> > 
> > > 
> > > JFYI: I found out this problem while running LTP (specifically
> > > syscalls/chown02 test) on idmapped XFS. Maybe I will be able to find more,
> > > who knows.
> > 
> > Hm, if you look above, then you can see that the failure (or difference
> > in behavior) you're reporting is independent of idmapped mounts. An
> > ext4 directory shows different behavior than an xfs directory on a
> > regular system without any idmapped mounts used. So I'm not clear how
> > that's specifically related to idmapped mounts yet.
> 
> So for example, in order to cause the sgid bit stripped while it should
> be preserved if xfs were to use setattr_copy() I can simply do:
> 
> brauner@wittgenstein|~/src/git/linux/ltp/testcases/kernel/syscalls/chown|master %=
> > unshare -U --map-root
> root@wittgenstein|~/src/git/linux/ltp/testcases/kernel/syscalls/chown|master %=
> > PATH=$PATH:$PWD ./chown02
> tst_memutils.c:157: TWARN: Can't adjust score, even with capabilities!?
> tst_test.c:1455: TINFO: Timeout per run is 0h 05m 00s
> chown02.c:45: TPASS: chown(testfile1, 0, 0) passed
> chown02.c:45: TPASS: chown(testfile2, 0, 0) passed
> chown02.c:57: TFAIL: testfile2: wrong mode permissions 0100700, expected 0102700
> 
> Summary:
> passed   2
> failed   1
> broken   0
> skipped  0
> warnings 1
> 
> There's no idmapped mounts here in play. The caller simply has been
> placed in a new user namespace and thus they fail the current
> capable(CAP_FSETID) check which will cause xfs to strip the sgid bit.
> 
> Now trying the same with ext4:
> 
> ubuntu@f2-vm:~/src/git/linux/ltp/testcases/kernel/syscalls/chown$ unshare -U --map-root
> root@f2-vm:~/src/git/linux/ltp/testcases/kernel/syscalls/chown# PATH=$PATH:$PWD ./chown02
> tst_memutils.c:157: TWARN: Can't adjust score, even with capabilities!?
> tst_test.c:1455: TINFO: Timeout per run is 0h 05m 00s
> chown02.c:45: TPASS: chown(testfile1, 0, 0) passed
> chown02.c:45: TPASS: chown(testfile2, 0, 0) passed
> 
> Summary:
> passed   2
> failed   0
> broken   0
> skipped  0
> warnings 1
> 
> it passes since ext4 uses setattr_copy() and thus the capability is
> checked for in the caller's user namespace.

Fwiw, the xfstests I linked to always try to test 4 scenarios for every
vfs syscall/ioctl:
1. host without idmapped mounts
2. userns without idmapped mounts
3. host with idmapped mounts
4. userns with idmapped mounts

> 
> > 
> > Fwiw, one part in your commit message is a bit misleading:
> > 
> > > > > > has CAP_FSETID capable in init_user_ns rather than mntns userns.
> > 
> > that's not what capable_wrt_to_inode_uidgid() does. What it does is to
> > check whether the caller is capable in their current user namespace.
> > That's how capable_wrt_to_inode_uidgid() has always worked.
> > The mnt_userns is only used to idmap the inode's {g,u}id. So if the
> > caller has CAP_FSETID in its current userns and the inode's {g,u}id have
> > a valid mapping in the mnt's userns the caller is considered privileged
> > over that inode.
> > 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
  2022-02-22 12:36         ` Christian Brauner
  2022-02-22 12:44           ` Christian Brauner
@ 2022-02-22 14:54           ` Andrey Zhadchenko
  2022-02-22 15:03             ` Christian Brauner
                               ` (2 more replies)
  1 sibling, 3 replies; 16+ messages in thread
From: Andrey Zhadchenko @ 2022-02-22 14:54 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Christoph Hellwig, linux-xfs, djwong



On 2/22/22 15:36, Christian Brauner wrote:
> On Tue, Feb 22, 2022 at 01:23:31PM +0100, Christian Brauner wrote:
>> On Tue, Feb 22, 2022 at 02:19:16PM +0300, Andrey Zhadchenko wrote:
>>> On 2/22/22 13:24, Christian Brauner wrote:
>>>> On Tue, Feb 22, 2022 at 09:33:40AM +0100, Christoph Hellwig wrote:
>>>>> On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
>>>>>> xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
>>>>>> bits.
>>>>>> Unfortunately chown syscall results in different callstask:
>>>>>> i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
>>>>>> has CAP_FSETID capable in init_user_ns rather than mntns userns.
>>>>>
>>>>> Can you add an xfstests the exercises this path?
>>>>>
>>>>> The fix itself looks good:
>>>>>
>>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>>>
>>>> So for anything other than directories the s{g,u}id bits are cleared on
>>>> every chown in notify_change() by the vfs; even for the root user (Also
>>>> documented on chown(2) manpage).
>>>
>>> Only exception - chown preserves setgid bit set on a non-group-executable
>>> file (also documented there) but do not take root privileges into account at
>>> vfs level.
>>>
>>>>
>>>> So the only scenario were this change would be relevant is for
>>>> directories afaict:
>>>>
>>>> 1. So ext4 has the behavior:
>>>>
>>>>      ubuntu@f2-vm|~
>>>>      > mkdir suid.dir
>>>>      ubuntu@f2-vm|~
>>>>      > perms ./suid.dir
>>>>      drwxrwxr-x 775 (1000:1000) ./suid.dir
>>>>      ubuntu@f2-vm|~
>>>>      > chmod u+s ./suid.dir/
>>>>      ubuntu@f2-vm|~
>>>>      > perms ./suid.dir
>>>>      drwsrwxr-x 4775 (1000:1000) ./suid.dir
>>>>      ubuntu@f2-vm|~
>>>>      > chmod g+s ./suid.dir/
>>>>      ubuntu@f2-vm|~
>>>>      > perms ./suid.dir
>>>>      drwsrwsr-x 6775 (1000:1000) ./suid.dir
>>>>      ubuntu@f2-vm|~
>>>>      > chown 1000:1000 ./suid.dir/
>>>>      ubuntu@f2-vm|~
>>>>      > perms ./suid.dir/
>>>>      drwsrwsr-x 6775 (1000:1000) ./suid.dir/
>>>>      meaning that both s{g,u}id bits are retained for directories. (Just to
>>>>      make this explicit: changing {g,u}id to the same {g,u}id still ends up
>>>>      calling into the filesystem.)
>>>>
>>>> 2. Whereas xfs currently has:
>>>>
>>>>      brauner@wittgenstein|~
>>>>      > mkdir suid.dir
>>>>      brauner@wittgenstein|~
>>>>      > perms ./suid.dir
>>>>      drwxrwxr-x 775 ./suid.dir
>>>>      brauner@wittgenstein|~
>>>>      > chmod u+s ./suid.dir/
>>>>      brauner@wittgenstein|~
>>>>      > perms ./suid.dir
>>>>      drwsrwxr-x 4775 ./suid.dir
>>>>      brauner@wittgenstein|~
>>>>      > chmod g+s ./suid.dir/
>>>>      brauner@wittgenstein|~
>>>>      > perms ./suid.dir
>>>>      drwsrwsr-x 6775 ./suid.dir
>>>>      brauner@wittgenstein|~
>>>>      > chown 1000:1000 ./suid.dir/
>>>>      brauner@wittgenstein|~
>>>>      > perms ./suid.dir/
>>>>      drwxrwxr-x 775 ./suid.dir/
>>>>      meaning that both s{g,u}id bits are cleared for directories.
>>>>
>>>> Since the vfs will always ensure that s{g,u}id bits are stripped for
>>>> anything that isn't a directory in the vfs:
>>>> - ATTR_KILL_S{G,U}ID is raised in chown_common():
>>>>
>>>> 	if (!S_ISDIR(inode->i_mode))
>>>> 		newattrs.ia_valid |=
>>>> 			ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
>>>>
>>>> - and then in notify_change() we'll get the bits stripped and ATTR_MODE
>>>>     raised:
>>>>
>>>> 	if (ia_valid & ATTR_KILL_SUID) {
>>>> 		if (mode & S_ISUID) {
>>>> 			ia_valid = attr->ia_valid |= ATTR_MODE;
>>>> 			attr->ia_mode = (inode->i_mode & ~S_ISUID);
>>>> 		}
>>>> 	}
>>>> 	if (ia_valid & ATTR_KILL_SGID) {
>>>> 		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
>>>
>>> So SGID is not killed if there is no S_IXGRP (yet no capability check)
>>>
>>> Actually I do not really understand why do kernel expects filesystems to
>>> further apply restrictions with CAP_FSETID. Why not kill it here since we
>>> have all info?
>>
>> Some filesystems do treat the sgid behavior of directories special (some
>> network filesystems do where they send that information to the server
>> before updating the inode afair). So I'd rather not do that in there as
>> we're risking breaking expectations and it's a very sensitive change.
>>
>> Plus, the logic is encapsulated in the vfs generic setattr_copy() helper
>> which nearly all filesystems call.
>>
>>>
>>>> 			if (!(ia_valid & ATTR_MODE)) {
>>>> 				ia_valid = attr->ia_valid |= ATTR_MODE;
>>>> 				attr->ia_mode = inode->i_mode;
>>>> 			}
>>>> 			attr->ia_mode &= ~S_ISGID;
>>>> 		}
>>>> 	}
>>>>
>>>> we can change this codepath to just mirror setattr_copy() or switch
>>>> fully to setattr_copy() (if feasible).
>>>>
>>>> Because as of right now the code seems to imply that the xfs code itself
>>>> is responsible for stripping s{g,u}id bits for all files whereas it is
>>>> the vfs that does it for any non-directory. So I'd propose to either try
>>>> and switch that code to setattr_copy() or to do open-code the
>>>> setattr_copy() check:

I did some more research on it and seems like modes are already stripped 
enough.

notify_change() -> inode->i_op->setattr() -> xfs_vn_setattr() -> 
xfs_vn_change_ok() -> prepare_setattr()
which has the following:
         if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
                          i_gid_into_mnt(mnt_userns, inode)) &&
              !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                  attr->ia_mode &= ~S_ISGID;

After xfs_vn_change_ok() xfs_setattr_nonsize() is finally called and 
additionally strips sgid and suid.

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 09211e1d08ad..7fda5ff3ef17 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -767,16 +767,6 @@ xfs_setattr_nonsize(
                 gid = (mask & ATTR_GID) ? iattr->ia_gid : igid;
                 uid = (mask & ATTR_UID) ? iattr->ia_uid : iuid;

-               /*
-                * CAP_FSETID overrides the following restrictions:
-                *
-                * The set-user-ID and set-group-ID bits of a file will be
-                * cleared upon successful return from chown()
-                */
-               if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
-                   !capable(CAP_FSETID))
-                       inode->i_mode &= ~(S_ISUID|S_ISGID);
-
                 /*
                  * Change the ownerships and register quota modifications
                  * in the transaction.


root@debian:/mnt/xfs# unshare -U --map-root
root@debian:/mnt/xfs# touch testfile
root@debian:/mnt/xfs# chmod g+s testfile
root@debian:/mnt/xfs# ls -la testfile
-rw-r-Sr-- 1 root root 0 Feb 22 14:46 testfile
root@debian:/mnt/xfs# chown 0:0 testfile
root@debian:/mnt/xfs# ls -la testfile
-rw-r-Sr-- 1 root root 0 Feb 22 14:46 testfile
root@debian:/mnt/xfs#

root@debian:/mnt/xfs# mkdir testdir
root@debian:/mnt/xfs# chmod u+s testdir
root@debian:/mnt/xfs# chmod u+g testdir
root@debian:/mnt/xfs#
root@debian:/mnt/xfs# ls -la
total 4
drwxr-xr-x 4 root root   50 Feb 22 14:47 .
drwxr-xr-x 4 root root 4096 Feb 22 13:45 ..
drwsr-sr-x 2 root root    6 Feb 22 14:42 test1
drwsr-xr-x 2 root root    6 Feb 22 14:47 testdir
-rw-r-Sr-- 1 root root    0 Feb 22 14:46 testfile
root@debian:/mnt/xfs# chown 0:0 testdir
root@debian:/mnt/xfs# ls -la
total 4
drwxr-xr-x 4 root root   50 Feb 22 14:47 .
drwxr-xr-x 4 root root 4096 Feb 22 13:45 ..
drwsr-sr-x 2 root root    6 Feb 22 14:42 test1
drwsr-xr-x 2 root root    6 Feb 22 14:47 testdir
-rw-r-Sr-- 1 root root    0 Feb 22 14:46 testfile

>>>>
>>>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>>>> index b79b3846e71b..ff55b31521a2 100644
>>>> --- a/fs/xfs/xfs_iops.c
>>>> +++ b/fs/xfs/xfs_iops.c
>>>> @@ -748,9 +748,13 @@ xfs_setattr_nonsize(
>>>>                    * The set-user-ID and set-group-ID bits of a file will be
>>>>                    * cleared upon successful return from chown()
>>>>                    */
>>>> -               if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
>>>> -                   !capable(CAP_FSETID))
>>>> -                       inode->i_mode &= ~(S_ISUID|S_ISGID);
>>>> +               if (iattr->ia_valid & ATTR_MODE) {
>>>> +                       umode_t mode = iattr->ia_mode;
>>>> +                       if (!in_group_p(i_gid_into_mnt(mnt_userns, inode)) &&
>>>> +                           !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
>>>> +                               mode &= ~S_ISGID;
>>>> +                       inode->i_mode = mode;
>>>> +               }
>>>>
>>>>                   /*
>>>>                    * Change the ownerships and register quota modifications
>>>>
>>>> which aligns xfs with ext4 and any other filesystem. Any thoughts on
>>>> this?
>>>>
>>>> For @Andrey specifically: the tests these should go into:
>>>>
>>>> https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/idmapped-mounts/idmapped-mounts.c
>>>>
>>>> Note that there are already setgid inheritance tests and set*id
>>>> execution tests in there.
>>>> You should be able to copy a lot of code from them. Could you please add
>>>> the test I sketched above and ideally also a test that the set{g,u}id
>>>> bits are stripped during chown for regular files?
>>> Thanks for the link. To clarify what tests and result you expect:
>>> - for directory chown we expect to preserve s{g,u}id
>>> - for regfile chown we expect to preserve S_ISGID only if S_IXGRP is absent
>>> and CAP_FSETID is present
>>
>> So specifically for chown():
>> 1. if regfile
>>     -> strip suid bit unconditionally
>>     -> strip sgid bit if inode has sgid bit and file is group-executable
>> 2. if directory
>>     -> strip sgid bit if inode's gid is neither among the caller's groups
>>        nor is the caller capable wrt to that inode
>> The behavior described in 2. is encoded in the vfs generic
>> setattr_copy() helper. And that is what we should see.
>>
>> The behavior of ext4 and btrfs is what we should see afaict as both use
>> setattr_copy().
>>
>>>
>>> JFYI: I found out this problem while running LTP (specifically
>>> syscalls/chown02 test) on idmapped XFS. Maybe I will be able to find more,
>>> who knows.
>>
>> Hm, if you look above, then you can see that the failure (or difference
>> in behavior) you're reporting is independent of idmapped mounts. An
>> ext4 directory shows different behavior than an xfs directory on a
>> regular system without any idmapped mounts used. So I'm not clear how
>> that's specifically related to idmapped mounts yet.

I guess my commit message is pretty poor. Initially I found out that 
chown() on idmapped xfs (+userns) drops sgid unconditionally on 
regfiles. I did not thought about directories at all. It is good that 
you pointed it out.
The problem is indeed independent from idmapping. However I thought this 
belonged to it since most of the checks were updated with idmapped 
series. Thanks for the explanation

> 
> So for example, in order to cause the sgid bit stripped while it should
> be preserved if xfs were to use setattr_copy() I can simply do:
> 
> brauner@wittgenstein|~/src/git/linux/ltp/testcases/kernel/syscalls/chown|master %=
>> unshare -U --map-root
> root@wittgenstein|~/src/git/linux/ltp/testcases/kernel/syscalls/chown|master %=
>> PATH=$PATH:$PWD ./chown02
> tst_memutils.c:157: TWARN: Can't adjust score, even with capabilities!?
> tst_test.c:1455: TINFO: Timeout per run is 0h 05m 00s
> chown02.c:45: TPASS: chown(testfile1, 0, 0) passed
> chown02.c:45: TPASS: chown(testfile2, 0, 0) passed
> chown02.c:57: TFAIL: testfile2: wrong mode permissions 0100700, expected 0102700
> 
> Summary:
> passed   2
> failed   1
> broken   0
> skipped  0
> warnings 1
> 
> There's no idmapped mounts here in play. The caller simply has been
> placed in a new user namespace and thus they fail the current
> capable(CAP_FSETID) check which will cause xfs to strip the sgid bit >
> Now trying the same with ext4:
> 
> ubuntu@f2-vm:~/src/git/linux/ltp/testcases/kernel/syscalls/chown$ unshare -U --map-root
> root@f2-vm:~/src/git/linux/ltp/testcases/kernel/syscalls/chown# PATH=$PATH:$PWD ./chown02
> tst_memutils.c:157: TWARN: Can't adjust score, even with capabilities!?
> tst_test.c:1455: TINFO: Timeout per run is 0h 05m 00s
> chown02.c:45: TPASS: chown(testfile1, 0, 0) passed
> chown02.c:45: TPASS: chown(testfile2, 0, 0) passed
> 
> Summary:
> passed   2
> failed   0
> broken   0
> skipped  0
> warnings 1
> 
> it passes since ext4 uses setattr_copy() and thus the capability is
> checked for in the caller's user namespace.
> 
>>
>> Fwiw, one part in your commit message is a bit misleading:
>>
>>>>>> has CAP_FSETID capable in init_user_ns rather than mntns userns.
>>
>> that's not what capable_wrt_to_inode_uidgid() does. What it does is to
>> check whether the caller is capable in their current user namespace.
>> That's how capable_wrt_to_inode_uidgid() has always worked.
>> The mnt_userns is only used to idmap the inode's {g,u}id. So if the
>> caller has CAP_FSETID in its current userns and the inode's {g,u}id have
>> a valid mapping in the mnt's userns the caller is considered privileged
>> over that inode.
>>

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
  2022-02-22 14:54           ` Andrey Zhadchenko
@ 2022-02-22 15:03             ` Christian Brauner
  2022-02-22 21:40             ` Dave Chinner
  2022-02-23  8:11             ` Christian Brauner
  2 siblings, 0 replies; 16+ messages in thread
From: Christian Brauner @ 2022-02-22 15:03 UTC (permalink / raw)
  To: Andrey Zhadchenko; +Cc: Christoph Hellwig, linux-xfs, djwong

On Tue, Feb 22, 2022 at 05:54:07PM +0300, Andrey Zhadchenko wrote:
> 
> 
> On 2/22/22 15:36, Christian Brauner wrote:
> > On Tue, Feb 22, 2022 at 01:23:31PM +0100, Christian Brauner wrote:
> > > On Tue, Feb 22, 2022 at 02:19:16PM +0300, Andrey Zhadchenko wrote:
> > > > On 2/22/22 13:24, Christian Brauner wrote:
> > > > > On Tue, Feb 22, 2022 at 09:33:40AM +0100, Christoph Hellwig wrote:
> > > > > > On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
> > > > > > > xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
> > > > > > > bits.
> > > > > > > Unfortunately chown syscall results in different callstask:
> > > > > > > i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
> > > > > > > has CAP_FSETID capable in init_user_ns rather than mntns userns.
> > > > > > 
> > > > > > Can you add an xfstests the exercises this path?
> > > > > > 
> > > > > > The fix itself looks good:
> > > > > > 
> > > > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > > 
> > > > > So for anything other than directories the s{g,u}id bits are cleared on
> > > > > every chown in notify_change() by the vfs; even for the root user (Also
> > > > > documented on chown(2) manpage).
> > > > 
> > > > Only exception - chown preserves setgid bit set on a non-group-executable
> > > > file (also documented there) but do not take root privileges into account at
> > > > vfs level.
> > > > 
> > > > > 
> > > > > So the only scenario were this change would be relevant is for
> > > > > directories afaict:
> > > > > 
> > > > > 1. So ext4 has the behavior:
> > > > > 
> > > > >      ubuntu@f2-vm|~
> > > > >      > mkdir suid.dir
> > > > >      ubuntu@f2-vm|~
> > > > >      > perms ./suid.dir
> > > > >      drwxrwxr-x 775 (1000:1000) ./suid.dir
> > > > >      ubuntu@f2-vm|~
> > > > >      > chmod u+s ./suid.dir/
> > > > >      ubuntu@f2-vm|~
> > > > >      > perms ./suid.dir
> > > > >      drwsrwxr-x 4775 (1000:1000) ./suid.dir
> > > > >      ubuntu@f2-vm|~
> > > > >      > chmod g+s ./suid.dir/
> > > > >      ubuntu@f2-vm|~
> > > > >      > perms ./suid.dir
> > > > >      drwsrwsr-x 6775 (1000:1000) ./suid.dir
> > > > >      ubuntu@f2-vm|~
> > > > >      > chown 1000:1000 ./suid.dir/
> > > > >      ubuntu@f2-vm|~
> > > > >      > perms ./suid.dir/
> > > > >      drwsrwsr-x 6775 (1000:1000) ./suid.dir/
> > > > >      meaning that both s{g,u}id bits are retained for directories. (Just to
> > > > >      make this explicit: changing {g,u}id to the same {g,u}id still ends up
> > > > >      calling into the filesystem.)
> > > > > 
> > > > > 2. Whereas xfs currently has:
> > > > > 
> > > > >      brauner@wittgenstein|~
> > > > >      > mkdir suid.dir
> > > > >      brauner@wittgenstein|~
> > > > >      > perms ./suid.dir
> > > > >      drwxrwxr-x 775 ./suid.dir
> > > > >      brauner@wittgenstein|~
> > > > >      > chmod u+s ./suid.dir/
> > > > >      brauner@wittgenstein|~
> > > > >      > perms ./suid.dir
> > > > >      drwsrwxr-x 4775 ./suid.dir
> > > > >      brauner@wittgenstein|~
> > > > >      > chmod g+s ./suid.dir/
> > > > >      brauner@wittgenstein|~
> > > > >      > perms ./suid.dir
> > > > >      drwsrwsr-x 6775 ./suid.dir
> > > > >      brauner@wittgenstein|~
> > > > >      > chown 1000:1000 ./suid.dir/
> > > > >      brauner@wittgenstein|~
> > > > >      > perms ./suid.dir/
> > > > >      drwxrwxr-x 775 ./suid.dir/
> > > > >      meaning that both s{g,u}id bits are cleared for directories.
> > > > > 
> > > > > Since the vfs will always ensure that s{g,u}id bits are stripped for
> > > > > anything that isn't a directory in the vfs:
> > > > > - ATTR_KILL_S{G,U}ID is raised in chown_common():
> > > > > 
> > > > > 	if (!S_ISDIR(inode->i_mode))
> > > > > 		newattrs.ia_valid |=
> > > > > 			ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
> > > > > 
> > > > > - and then in notify_change() we'll get the bits stripped and ATTR_MODE
> > > > >     raised:
> > > > > 
> > > > > 	if (ia_valid & ATTR_KILL_SUID) {
> > > > > 		if (mode & S_ISUID) {
> > > > > 			ia_valid = attr->ia_valid |= ATTR_MODE;
> > > > > 			attr->ia_mode = (inode->i_mode & ~S_ISUID);
> > > > > 		}
> > > > > 	}
> > > > > 	if (ia_valid & ATTR_KILL_SGID) {
> > > > > 		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> > > > 
> > > > So SGID is not killed if there is no S_IXGRP (yet no capability check)
> > > > 
> > > > Actually I do not really understand why do kernel expects filesystems to
> > > > further apply restrictions with CAP_FSETID. Why not kill it here since we
> > > > have all info?
> > > 
> > > Some filesystems do treat the sgid behavior of directories special (some
> > > network filesystems do where they send that information to the server
> > > before updating the inode afair). So I'd rather not do that in there as
> > > we're risking breaking expectations and it's a very sensitive change.
> > > 
> > > Plus, the logic is encapsulated in the vfs generic setattr_copy() helper
> > > which nearly all filesystems call.
> > > 
> > > > 
> > > > > 			if (!(ia_valid & ATTR_MODE)) {
> > > > > 				ia_valid = attr->ia_valid |= ATTR_MODE;
> > > > > 				attr->ia_mode = inode->i_mode;
> > > > > 			}
> > > > > 			attr->ia_mode &= ~S_ISGID;
> > > > > 		}
> > > > > 	}
> > > > > 
> > > > > we can change this codepath to just mirror setattr_copy() or switch
> > > > > fully to setattr_copy() (if feasible).
> > > > > 
> > > > > Because as of right now the code seems to imply that the xfs code itself
> > > > > is responsible for stripping s{g,u}id bits for all files whereas it is
> > > > > the vfs that does it for any non-directory. So I'd propose to either try
> > > > > and switch that code to setattr_copy() or to do open-code the
> > > > > setattr_copy() check:
> 
> I did some more research on it and seems like modes are already stripped
> enough.
> 
> notify_change() -> inode->i_op->setattr() -> xfs_vn_setattr() ->
> xfs_vn_change_ok() -> prepare_setattr()
> which has the following:
>         if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
>                          i_gid_into_mnt(mnt_userns, inode)) &&
>              !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
>                  attr->ia_mode &= ~S_ISGID;
> 
> After xfs_vn_change_ok() xfs_setattr_nonsize() is finally called and
> additionally strips sgid and suid.
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 09211e1d08ad..7fda5ff3ef17 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -767,16 +767,6 @@ xfs_setattr_nonsize(
>                 gid = (mask & ATTR_GID) ? iattr->ia_gid : igid;
>                 uid = (mask & ATTR_UID) ? iattr->ia_uid : iuid;
> 
> -               /*
> -                * CAP_FSETID overrides the following restrictions:
> -                *
> -                * The set-user-ID and set-group-ID bits of a file will be
> -                * cleared upon successful return from chown()
> -                */
> -               if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
> -                   !capable(CAP_FSETID))
> -                       inode->i_mode &= ~(S_ISUID|S_ISGID);
> -
>                 /*
>                  * Change the ownerships and register quota modifications
>                  * in the transaction.
> 
> 
> root@debian:/mnt/xfs# unshare -U --map-root
> root@debian:/mnt/xfs# touch testfile
> root@debian:/mnt/xfs# chmod g+s testfile
> root@debian:/mnt/xfs# ls -la testfile
> -rw-r-Sr-- 1 root root 0 Feb 22 14:46 testfile
> root@debian:/mnt/xfs# chown 0:0 testfile
> root@debian:/mnt/xfs# ls -la testfile
> -rw-r-Sr-- 1 root root 0 Feb 22 14:46 testfile
> root@debian:/mnt/xfs#
> 
> root@debian:/mnt/xfs# mkdir testdir
> root@debian:/mnt/xfs# chmod u+s testdir
> root@debian:/mnt/xfs# chmod u+g testdir
> root@debian:/mnt/xfs#
> root@debian:/mnt/xfs# ls -la
> total 4
> drwxr-xr-x 4 root root   50 Feb 22 14:47 .
> drwxr-xr-x 4 root root 4096 Feb 22 13:45 ..
> drwsr-sr-x 2 root root    6 Feb 22 14:42 test1
> drwsr-xr-x 2 root root    6 Feb 22 14:47 testdir
> -rw-r-Sr-- 1 root root    0 Feb 22 14:46 testfile
> root@debian:/mnt/xfs# chown 0:0 testdir
> root@debian:/mnt/xfs# ls -la
> total 4
> drwxr-xr-x 4 root root   50 Feb 22 14:47 .
> drwxr-xr-x 4 root root 4096 Feb 22 13:45 ..
> drwsr-sr-x 2 root root    6 Feb 22 14:42 test1
> drwsr-xr-x 2 root root    6 Feb 22 14:47 testdir
> -rw-r-Sr-- 1 root root    0 Feb 22 14:46 testfile
> 
> > > > > 
> > > > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > > > index b79b3846e71b..ff55b31521a2 100644
> > > > > --- a/fs/xfs/xfs_iops.c
> > > > > +++ b/fs/xfs/xfs_iops.c
> > > > > @@ -748,9 +748,13 @@ xfs_setattr_nonsize(
> > > > >                    * The set-user-ID and set-group-ID bits of a file will be
> > > > >                    * cleared upon successful return from chown()
> > > > >                    */
> > > > > -               if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
> > > > > -                   !capable(CAP_FSETID))
> > > > > -                       inode->i_mode &= ~(S_ISUID|S_ISGID);
> > > > > +               if (iattr->ia_valid & ATTR_MODE) {
> > > > > +                       umode_t mode = iattr->ia_mode;
> > > > > +                       if (!in_group_p(i_gid_into_mnt(mnt_userns, inode)) &&
> > > > > +                           !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
> > > > > +                               mode &= ~S_ISGID;
> > > > > +                       inode->i_mode = mode;
> > > > > +               }
> > > > > 
> > > > >                   /*
> > > > >                    * Change the ownerships and register quota modifications
> > > > > 
> > > > > which aligns xfs with ext4 and any other filesystem. Any thoughts on
> > > > > this?
> > > > > 
> > > > > For @Andrey specifically: the tests these should go into:
> > > > > 
> > > > > https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/idmapped-mounts/idmapped-mounts.c
> > > > > 
> > > > > Note that there are already setgid inheritance tests and set*id
> > > > > execution tests in there.
> > > > > You should be able to copy a lot of code from them. Could you please add
> > > > > the test I sketched above and ideally also a test that the set{g,u}id
> > > > > bits are stripped during chown for regular files?
> > > > Thanks for the link. To clarify what tests and result you expect:
> > > > - for directory chown we expect to preserve s{g,u}id
> > > > - for regfile chown we expect to preserve S_ISGID only if S_IXGRP is absent
> > > > and CAP_FSETID is present
> > > 
> > > So specifically for chown():
> > > 1. if regfile
> > >     -> strip suid bit unconditionally
> > >     -> strip sgid bit if inode has sgid bit and file is group-executable
> > > 2. if directory
> > >     -> strip sgid bit if inode's gid is neither among the caller's groups
> > >        nor is the caller capable wrt to that inode
> > > The behavior described in 2. is encoded in the vfs generic
> > > setattr_copy() helper. And that is what we should see.
> > > 
> > > The behavior of ext4 and btrfs is what we should see afaict as both use
> > > setattr_copy().
> > > 
> > > > 
> > > > JFYI: I found out this problem while running LTP (specifically
> > > > syscalls/chown02 test) on idmapped XFS. Maybe I will be able to find more,
> > > > who knows.
> > > 
> > > Hm, if you look above, then you can see that the failure (or difference
> > > in behavior) you're reporting is independent of idmapped mounts. An
> > > ext4 directory shows different behavior than an xfs directory on a
> > > regular system without any idmapped mounts used. So I'm not clear how
> > > that's specifically related to idmapped mounts yet.
> 
> I guess my commit message is pretty poor. Initially I found out that chown()
> on idmapped xfs (+userns) drops sgid unconditionally on regfiles. I did not
> thought about directories at all. It is good that you pointed it out.
> The problem is indeed independent from idmapping. However I thought this
> belonged to it since most of the checks were updated with idmapped series.
> Thanks for the explanation

No, not at all. It's perfectly fine to not know all the details right of
the bat! Thank your for bringing this up so I could take a closer look
at this.

> 
> > 
> > So for example, in order to cause the sgid bit stripped while it should
> > be preserved if xfs were to use setattr_copy() I can simply do:
> > 
> > brauner@wittgenstein|~/src/git/linux/ltp/testcases/kernel/syscalls/chown|master %=
> > > unshare -U --map-root
> > root@wittgenstein|~/src/git/linux/ltp/testcases/kernel/syscalls/chown|master %=
> > > PATH=$PATH:$PWD ./chown02
> > tst_memutils.c:157: TWARN: Can't adjust score, even with capabilities!?
> > tst_test.c:1455: TINFO: Timeout per run is 0h 05m 00s
> > chown02.c:45: TPASS: chown(testfile1, 0, 0) passed
> > chown02.c:45: TPASS: chown(testfile2, 0, 0) passed
> > chown02.c:57: TFAIL: testfile2: wrong mode permissions 0100700, expected 0102700
> > 
> > Summary:
> > passed   2
> > failed   1
> > broken   0
> > skipped  0
> > warnings 1
> > 
> > There's no idmapped mounts here in play. The caller simply has been
> > placed in a new user namespace and thus they fail the current
> > capable(CAP_FSETID) check which will cause xfs to strip the sgid bit >
> > Now trying the same with ext4:
> > 
> > ubuntu@f2-vm:~/src/git/linux/ltp/testcases/kernel/syscalls/chown$ unshare -U --map-root
> > root@f2-vm:~/src/git/linux/ltp/testcases/kernel/syscalls/chown# PATH=$PATH:$PWD ./chown02
> > tst_memutils.c:157: TWARN: Can't adjust score, even with capabilities!?
> > tst_test.c:1455: TINFO: Timeout per run is 0h 05m 00s
> > chown02.c:45: TPASS: chown(testfile1, 0, 0) passed
> > chown02.c:45: TPASS: chown(testfile2, 0, 0) passed
> > 
> > Summary:
> > passed   2
> > failed   0
> > broken   0
> > skipped  0
> > warnings 1
> > 
> > it passes since ext4 uses setattr_copy() and thus the capability is
> > checked for in the caller's user namespace.
> > 
> > > 
> > > Fwiw, one part in your commit message is a bit misleading:
> > > 
> > > > > > > has CAP_FSETID capable in init_user_ns rather than mntns userns.
> > > 
> > > that's not what capable_wrt_to_inode_uidgid() does. What it does is to
> > > check whether the caller is capable in their current user namespace.
> > > That's how capable_wrt_to_inode_uidgid() has always worked.
> > > The mnt_userns is only used to idmap the inode's {g,u}id. So if the
> > > caller has CAP_FSETID in its current userns and the inode's {g,u}id have
> > > a valid mapping in the mnt's userns the caller is considered privileged
> > > over that inode.
> > > 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
  2022-02-22 14:54           ` Andrey Zhadchenko
  2022-02-22 15:03             ` Christian Brauner
@ 2022-02-22 21:40             ` Dave Chinner
  2022-02-23  8:11             ` Christian Brauner
  2 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2022-02-22 21:40 UTC (permalink / raw)
  To: Andrey Zhadchenko; +Cc: Christian Brauner, Christoph Hellwig, linux-xfs, djwong

On Tue, Feb 22, 2022 at 05:54:07PM +0300, Andrey Zhadchenko wrote:
> On 2/22/22 15:36, Christian Brauner wrote:
> > > > > Because as of right now the code seems to imply that the xfs code itself
> > > > > is responsible for stripping s{g,u}id bits for all files whereas it is
> > > > > the vfs that does it for any non-directory. So I'd propose to either try
> > > > > and switch that code to setattr_copy() or to do open-code the
> > > > > setattr_copy() check:
> 
> I did some more research on it and seems like modes are already stripped
> enough.
> 
> notify_change() -> inode->i_op->setattr() -> xfs_vn_setattr() ->
> xfs_vn_change_ok() -> prepare_setattr()
> which has the following:
>         if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
>                          i_gid_into_mnt(mnt_userns, inode)) &&
>              !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
>                  attr->ia_mode &= ~S_ISGID;
> 
> After xfs_vn_change_ok() xfs_setattr_nonsize() is finally called and
> additionally strips sgid and suid.
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 09211e1d08ad..7fda5ff3ef17 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -767,16 +767,6 @@ xfs_setattr_nonsize(
>                 gid = (mask & ATTR_GID) ? iattr->ia_gid : igid;
>                 uid = (mask & ATTR_UID) ? iattr->ia_uid : iuid;
> 
> -               /*
> -                * CAP_FSETID overrides the following restrictions:
> -                *
> -                * The set-user-ID and set-group-ID bits of a file will be
> -                * cleared upon successful return from chown()
> -                */
> -               if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
> -                   !capable(CAP_FSETID))
> -                       inode->i_mode &= ~(S_ISUID|S_ISGID);

THis code has been in XFS since 1997 - it addressed shortcomings in
the Irix chown implementation w.r.t. the requirements of CAP_CHOWN
and CAP_FSETID in _POSIX_CHOWN_RESTRICTED configurations.

If the VFS handles all this correctly these days then, yes, we can
just get rid of this code - it's legacy code and we should behave
consistently across all filesystems w.r.t. su/gid files.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
  2022-02-22 14:54           ` Andrey Zhadchenko
  2022-02-22 15:03             ` Christian Brauner
  2022-02-22 21:40             ` Dave Chinner
@ 2022-02-23  8:11             ` Christian Brauner
  2 siblings, 0 replies; 16+ messages in thread
From: Christian Brauner @ 2022-02-23  8:11 UTC (permalink / raw)
  To: Andrey Zhadchenko; +Cc: Christoph Hellwig, linux-xfs, djwong

On Tue, Feb 22, 2022 at 05:54:07PM +0300, Andrey Zhadchenko wrote:
> 
> 
> On 2/22/22 15:36, Christian Brauner wrote:
> > On Tue, Feb 22, 2022 at 01:23:31PM +0100, Christian Brauner wrote:
> > > On Tue, Feb 22, 2022 at 02:19:16PM +0300, Andrey Zhadchenko wrote:
> > > > On 2/22/22 13:24, Christian Brauner wrote:
> > > > > On Tue, Feb 22, 2022 at 09:33:40AM +0100, Christoph Hellwig wrote:
> > > > > > On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
> > > > > > > xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
> > > > > > > bits.
> > > > > > > Unfortunately chown syscall results in different callstask:
> > > > > > > i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
> > > > > > > has CAP_FSETID capable in init_user_ns rather than mntns userns.
> > > > > > 
> > > > > > Can you add an xfstests the exercises this path?
> > > > > > 
> > > > > > The fix itself looks good:
> > > > > > 
> > > > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > > 
> > > > > So for anything other than directories the s{g,u}id bits are cleared on
> > > > > every chown in notify_change() by the vfs; even for the root user (Also
> > > > > documented on chown(2) manpage).
> > > > 
> > > > Only exception - chown preserves setgid bit set on a non-group-executable
> > > > file (also documented there) but do not take root privileges into account at
> > > > vfs level.
> > > > 
> > > > > 
> > > > > So the only scenario were this change would be relevant is for
> > > > > directories afaict:
> > > > > 
> > > > > 1. So ext4 has the behavior:
> > > > > 
> > > > >      ubuntu@f2-vm|~
> > > > >      > mkdir suid.dir
> > > > >      ubuntu@f2-vm|~
> > > > >      > perms ./suid.dir
> > > > >      drwxrwxr-x 775 (1000:1000) ./suid.dir
> > > > >      ubuntu@f2-vm|~
> > > > >      > chmod u+s ./suid.dir/
> > > > >      ubuntu@f2-vm|~
> > > > >      > perms ./suid.dir
> > > > >      drwsrwxr-x 4775 (1000:1000) ./suid.dir
> > > > >      ubuntu@f2-vm|~
> > > > >      > chmod g+s ./suid.dir/
> > > > >      ubuntu@f2-vm|~
> > > > >      > perms ./suid.dir
> > > > >      drwsrwsr-x 6775 (1000:1000) ./suid.dir
> > > > >      ubuntu@f2-vm|~
> > > > >      > chown 1000:1000 ./suid.dir/
> > > > >      ubuntu@f2-vm|~
> > > > >      > perms ./suid.dir/
> > > > >      drwsrwsr-x 6775 (1000:1000) ./suid.dir/
> > > > >      meaning that both s{g,u}id bits are retained for directories. (Just to
> > > > >      make this explicit: changing {g,u}id to the same {g,u}id still ends up
> > > > >      calling into the filesystem.)
> > > > > 
> > > > > 2. Whereas xfs currently has:
> > > > > 
> > > > >      brauner@wittgenstein|~
> > > > >      > mkdir suid.dir
> > > > >      brauner@wittgenstein|~
> > > > >      > perms ./suid.dir
> > > > >      drwxrwxr-x 775 ./suid.dir
> > > > >      brauner@wittgenstein|~
> > > > >      > chmod u+s ./suid.dir/
> > > > >      brauner@wittgenstein|~
> > > > >      > perms ./suid.dir
> > > > >      drwsrwxr-x 4775 ./suid.dir
> > > > >      brauner@wittgenstein|~
> > > > >      > chmod g+s ./suid.dir/
> > > > >      brauner@wittgenstein|~
> > > > >      > perms ./suid.dir
> > > > >      drwsrwsr-x 6775 ./suid.dir
> > > > >      brauner@wittgenstein|~
> > > > >      > chown 1000:1000 ./suid.dir/
> > > > >      brauner@wittgenstein|~
> > > > >      > perms ./suid.dir/
> > > > >      drwxrwxr-x 775 ./suid.dir/
> > > > >      meaning that both s{g,u}id bits are cleared for directories.
> > > > > 
> > > > > Since the vfs will always ensure that s{g,u}id bits are stripped for
> > > > > anything that isn't a directory in the vfs:
> > > > > - ATTR_KILL_S{G,U}ID is raised in chown_common():
> > > > > 
> > > > > 	if (!S_ISDIR(inode->i_mode))
> > > > > 		newattrs.ia_valid |=
> > > > > 			ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
> > > > > 
> > > > > - and then in notify_change() we'll get the bits stripped and ATTR_MODE
> > > > >     raised:
> > > > > 
> > > > > 	if (ia_valid & ATTR_KILL_SUID) {
> > > > > 		if (mode & S_ISUID) {
> > > > > 			ia_valid = attr->ia_valid |= ATTR_MODE;
> > > > > 			attr->ia_mode = (inode->i_mode & ~S_ISUID);
> > > > > 		}
> > > > > 	}
> > > > > 	if (ia_valid & ATTR_KILL_SGID) {
> > > > > 		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> > > > 
> > > > So SGID is not killed if there is no S_IXGRP (yet no capability check)
> > > > 
> > > > Actually I do not really understand why do kernel expects filesystems to
> > > > further apply restrictions with CAP_FSETID. Why not kill it here since we
> > > > have all info?
> > > 
> > > Some filesystems do treat the sgid behavior of directories special (some
> > > network filesystems do where they send that information to the server
> > > before updating the inode afair). So I'd rather not do that in there as
> > > we're risking breaking expectations and it's a very sensitive change.
> > > 
> > > Plus, the logic is encapsulated in the vfs generic setattr_copy() helper
> > > which nearly all filesystems call.
> > > 
> > > > 
> > > > > 			if (!(ia_valid & ATTR_MODE)) {
> > > > > 				ia_valid = attr->ia_valid |= ATTR_MODE;
> > > > > 				attr->ia_mode = inode->i_mode;
> > > > > 			}
> > > > > 			attr->ia_mode &= ~S_ISGID;
> > > > > 		}
> > > > > 	}
> > > > > 
> > > > > we can change this codepath to just mirror setattr_copy() or switch
> > > > > fully to setattr_copy() (if feasible).
> > > > > 
> > > > > Because as of right now the code seems to imply that the xfs code itself
> > > > > is responsible for stripping s{g,u}id bits for all files whereas it is
> > > > > the vfs that does it for any non-directory. So I'd propose to either try
> > > > > and switch that code to setattr_copy() or to do open-code the
> > > > > setattr_copy() check:
> 
> I did some more research on it and seems like modes are already stripped
> enough.
> 
> notify_change() -> inode->i_op->setattr() -> xfs_vn_setattr() ->
> xfs_vn_change_ok() -> prepare_setattr()
> which has the following:
>         if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
>                          i_gid_into_mnt(mnt_userns, inode)) &&
>              !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
>                  attr->ia_mode &= ~S_ISGID;
> 
> After xfs_vn_change_ok() xfs_setattr_nonsize() is finally called and
> additionally strips sgid and suid.

Ok, good. Can you send a patch that removes this code and add the tests
we talked about? That would be great!

Thanks!
Christian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
  2022-02-21 18:22 [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts Andrey Zhadchenko
  2022-02-22  8:33 ` Christoph Hellwig
@ 2022-02-25  1:57 ` Darrick J. Wong
  2022-02-25  9:45   ` Christian Brauner
  1 sibling, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2022-02-25  1:57 UTC (permalink / raw)
  To: Andrey Zhadchenko; +Cc: linux-xfs, christian.brauner, hch

On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
> xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
> bits.
> Unfortunately chown syscall results in different callstask:
> i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
> has CAP_FSETID capable in init_user_ns rather than mntns userns.
> 
> Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>

LGTM...
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_iops.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 09211e1d08ad..5b1fe635d153 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -774,7 +774,7 @@ xfs_setattr_nonsize(
>  		 * cleared upon successful return from chown()
>  		 */
>  		if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
> -		    !capable(CAP_FSETID))
> +		    !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
>  			inode->i_mode &= ~(S_ISUID|S_ISGID);
>  
>  		/*
> -- 
> 2.35.0.rc2
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
  2022-02-25  1:57 ` Darrick J. Wong
@ 2022-02-25  9:45   ` Christian Brauner
  2022-02-25 10:42     ` Andrey Zhadchenko
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2022-02-25  9:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Andrey Zhadchenko, linux-xfs, christian.brauner, hch

On Thu, Feb 24, 2022 at 05:57:38PM -0800, Darrick J. Wong wrote:
> On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
> > xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
> > bits.
> > Unfortunately chown syscall results in different callstask:
> > i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
> > has CAP_FSETID capable in init_user_ns rather than mntns userns.
> > 
> > Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
> 
> LGTM...
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Darrick, could I ask you to please wait with applying.
The correct fix for this is either to simply remove the check here
altogether as we figured out in the thread or to switch to a generic vfs
helper setattr_copy().
Andrey will send a new patch in the not too distant future afaict
including tests.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
  2022-02-25  9:45   ` Christian Brauner
@ 2022-02-25 10:42     ` Andrey Zhadchenko
  2022-02-25 17:11       ` Darrick J. Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Zhadchenko @ 2022-02-25 10:42 UTC (permalink / raw)
  To: Christian Brauner, Darrick J. Wong; +Cc: linux-xfs, christian.brauner, hch



On 2/25/22 12:45, Christian Brauner wrote:
> On Thu, Feb 24, 2022 at 05:57:38PM -0800, Darrick J. Wong wrote:
>> On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
>>> xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
>>> bits.
>>> Unfortunately chown syscall results in different callstask:
>>> i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
>>> has CAP_FSETID capable in init_user_ns rather than mntns userns.
>>>
>>> Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
>>
>> LGTM...
>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> 
> Darrick, could I ask you to please wait with applying.
> The correct fix for this is either to simply remove the check here
> altogether as we figured out in the thread or to switch to a generic vfs
> helper setattr_copy().
> Andrey will send a new patch in the not too distant future afaict
> including tests.

Yes, please do not apply this patch for now. I am currently working on 
next version, however it is postponed a bit due to my personal affairs.
Also I intend to add a second patch for xfs_fileattr_set() since it has 
the similar flaw - it may drop S_ISUID|S_ISGID for directories and may 
not drop S_ISUID for executable files.
I expect to send patches next week alongside with new xfstests.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
  2022-02-25 10:42     ` Andrey Zhadchenko
@ 2022-02-25 17:11       ` Darrick J. Wong
  0 siblings, 0 replies; 16+ messages in thread
From: Darrick J. Wong @ 2022-02-25 17:11 UTC (permalink / raw)
  To: Andrey Zhadchenko; +Cc: Christian Brauner, linux-xfs, christian.brauner, hch

On Fri, Feb 25, 2022 at 01:42:26PM +0300, Andrey Zhadchenko wrote:
> 
> 
> On 2/25/22 12:45, Christian Brauner wrote:
> > On Thu, Feb 24, 2022 at 05:57:38PM -0800, Darrick J. Wong wrote:
> > > On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
> > > > xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
> > > > bits.
> > > > Unfortunately chown syscall results in different callstask:
> > > > i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
> > > > has CAP_FSETID capable in init_user_ns rather than mntns userns.
> > > > 
> > > > Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
> > > 
> > > LGTM...
> > > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > 
> > Darrick, could I ask you to please wait with applying.
> > The correct fix for this is either to simply remove the check here
> > altogether as we figured out in the thread or to switch to a generic vfs
> > helper setattr_copy().
> > Andrey will send a new patch in the not too distant future afaict
> > including tests.
> 
> Yes, please do not apply this patch for now. I am currently working on next
> version, however it is postponed a bit due to my personal affairs.
> Also I intend to add a second patch for xfs_fileattr_set() since it has the
> similar flaw - it may drop S_ISUID|S_ISGID for directories and may not drop
> S_ISUID for executable files.
> I expect to send patches next week alongside with new xfstests.

Ah, fair enough.  Felipe noticed that generic/673 produced inconsistent
outputs between btrfs and xfs last week and had asked about why the
setuid/setgid dropping behavior was unique to xfs.  We discovered that
xfs' omission of the setattr_copy logic was behind that...

--D

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-02-25 17:11 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-21 18:22 [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts Andrey Zhadchenko
2022-02-22  8:33 ` Christoph Hellwig
2022-02-22  9:25   ` Andrey Zhadchenko
2022-02-22 10:24   ` Christian Brauner
2022-02-22 11:19     ` Andrey Zhadchenko
2022-02-22 12:23       ` Christian Brauner
2022-02-22 12:36         ` Christian Brauner
2022-02-22 12:44           ` Christian Brauner
2022-02-22 14:54           ` Andrey Zhadchenko
2022-02-22 15:03             ` Christian Brauner
2022-02-22 21:40             ` Dave Chinner
2022-02-23  8:11             ` Christian Brauner
2022-02-25  1:57 ` Darrick J. Wong
2022-02-25  9:45   ` Christian Brauner
2022-02-25 10:42     ` Andrey Zhadchenko
2022-02-25 17:11       ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.