From: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
To: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>,
linux-xfs@vger.kernel.org, djwong@kernel.org
Subject: Re: [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts
Date: Tue, 22 Feb 2022 17:54:07 +0300 [thread overview]
Message-ID: <48bcd8ac-f9e5-a83c-604c-5af602cb362a@virtuozzo.com> (raw)
In-Reply-To: <20220222123656.433l67bxhv3s2vbo@wittgenstein>
On 2/22/22 15:36, Christian Brauner wrote:
> On Tue, Feb 22, 2022 at 01:23:31PM +0100, Christian Brauner wrote:
>> On Tue, Feb 22, 2022 at 02:19:16PM +0300, Andrey Zhadchenko wrote:
>>> On 2/22/22 13:24, Christian Brauner wrote:
>>>> On Tue, Feb 22, 2022 at 09:33:40AM +0100, Christoph Hellwig wrote:
>>>>> On Mon, Feb 21, 2022 at 09:22:18PM +0300, Andrey Zhadchenko wrote:
>>>>>> xfs_fileattr_set() handles idmapped mounts correctly and do not drop this
>>>>>> bits.
>>>>>> Unfortunately chown syscall results in different callstask:
>>>>>> i_op->xfs_vn_setattr()->...->xfs_setattr_nonsize() which checks if process
>>>>>> has CAP_FSETID capable in init_user_ns rather than mntns userns.
>>>>>
>>>>> Can you add an xfstests the exercises this path?
>>>>>
>>>>> The fix itself looks good:
>>>>>
>>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>>>
>>>> So for anything other than directories the s{g,u}id bits are cleared on
>>>> every chown in notify_change() by the vfs; even for the root user (Also
>>>> documented on chown(2) manpage).
>>>
>>> Only exception - chown preserves setgid bit set on a non-group-executable
>>> file (also documented there) but do not take root privileges into account at
>>> vfs level.
>>>
>>>>
>>>> So the only scenario were this change would be relevant is for
>>>> directories afaict:
>>>>
>>>> 1. So ext4 has the behavior:
>>>>
>>>> ubuntu@f2-vm|~
>>>> > mkdir suid.dir
>>>> ubuntu@f2-vm|~
>>>> > perms ./suid.dir
>>>> drwxrwxr-x 775 (1000:1000) ./suid.dir
>>>> ubuntu@f2-vm|~
>>>> > chmod u+s ./suid.dir/
>>>> ubuntu@f2-vm|~
>>>> > perms ./suid.dir
>>>> drwsrwxr-x 4775 (1000:1000) ./suid.dir
>>>> ubuntu@f2-vm|~
>>>> > chmod g+s ./suid.dir/
>>>> ubuntu@f2-vm|~
>>>> > perms ./suid.dir
>>>> drwsrwsr-x 6775 (1000:1000) ./suid.dir
>>>> ubuntu@f2-vm|~
>>>> > chown 1000:1000 ./suid.dir/
>>>> ubuntu@f2-vm|~
>>>> > perms ./suid.dir/
>>>> drwsrwsr-x 6775 (1000:1000) ./suid.dir/
>>>> meaning that both s{g,u}id bits are retained for directories. (Just to
>>>> make this explicit: changing {g,u}id to the same {g,u}id still ends up
>>>> calling into the filesystem.)
>>>>
>>>> 2. Whereas xfs currently has:
>>>>
>>>> brauner@wittgenstein|~
>>>> > mkdir suid.dir
>>>> brauner@wittgenstein|~
>>>> > perms ./suid.dir
>>>> drwxrwxr-x 775 ./suid.dir
>>>> brauner@wittgenstein|~
>>>> > chmod u+s ./suid.dir/
>>>> brauner@wittgenstein|~
>>>> > perms ./suid.dir
>>>> drwsrwxr-x 4775 ./suid.dir
>>>> brauner@wittgenstein|~
>>>> > chmod g+s ./suid.dir/
>>>> brauner@wittgenstein|~
>>>> > perms ./suid.dir
>>>> drwsrwsr-x 6775 ./suid.dir
>>>> brauner@wittgenstein|~
>>>> > chown 1000:1000 ./suid.dir/
>>>> brauner@wittgenstein|~
>>>> > perms ./suid.dir/
>>>> drwxrwxr-x 775 ./suid.dir/
>>>> meaning that both s{g,u}id bits are cleared for directories.
>>>>
>>>> Since the vfs will always ensure that s{g,u}id bits are stripped for
>>>> anything that isn't a directory in the vfs:
>>>> - ATTR_KILL_S{G,U}ID is raised in chown_common():
>>>>
>>>> if (!S_ISDIR(inode->i_mode))
>>>> newattrs.ia_valid |=
>>>> ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
>>>>
>>>> - and then in notify_change() we'll get the bits stripped and ATTR_MODE
>>>> raised:
>>>>
>>>> if (ia_valid & ATTR_KILL_SUID) {
>>>> if (mode & S_ISUID) {
>>>> ia_valid = attr->ia_valid |= ATTR_MODE;
>>>> attr->ia_mode = (inode->i_mode & ~S_ISUID);
>>>> }
>>>> }
>>>> if (ia_valid & ATTR_KILL_SGID) {
>>>> if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
>>>
>>> So SGID is not killed if there is no S_IXGRP (yet no capability check)
>>>
>>> Actually I do not really understand why do kernel expects filesystems to
>>> further apply restrictions with CAP_FSETID. Why not kill it here since we
>>> have all info?
>>
>> Some filesystems do treat the sgid behavior of directories special (some
>> network filesystems do where they send that information to the server
>> before updating the inode afair). So I'd rather not do that in there as
>> we're risking breaking expectations and it's a very sensitive change.
>>
>> Plus, the logic is encapsulated in the vfs generic setattr_copy() helper
>> which nearly all filesystems call.
>>
>>>
>>>> if (!(ia_valid & ATTR_MODE)) {
>>>> ia_valid = attr->ia_valid |= ATTR_MODE;
>>>> attr->ia_mode = inode->i_mode;
>>>> }
>>>> attr->ia_mode &= ~S_ISGID;
>>>> }
>>>> }
>>>>
>>>> we can change this codepath to just mirror setattr_copy() or switch
>>>> fully to setattr_copy() (if feasible).
>>>>
>>>> Because as of right now the code seems to imply that the xfs code itself
>>>> is responsible for stripping s{g,u}id bits for all files whereas it is
>>>> the vfs that does it for any non-directory. So I'd propose to either try
>>>> and switch that code to setattr_copy() or to do open-code the
>>>> setattr_copy() check:
I did some more research on it and seems like modes are already stripped
enough.
notify_change() -> inode->i_op->setattr() -> xfs_vn_setattr() ->
xfs_vn_change_ok() -> prepare_setattr()
which has the following:
if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
i_gid_into_mnt(mnt_userns, inode)) &&
!capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
attr->ia_mode &= ~S_ISGID;
After xfs_vn_change_ok() xfs_setattr_nonsize() is finally called and
additionally strips sgid and suid.
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 09211e1d08ad..7fda5ff3ef17 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -767,16 +767,6 @@ xfs_setattr_nonsize(
gid = (mask & ATTR_GID) ? iattr->ia_gid : igid;
uid = (mask & ATTR_UID) ? iattr->ia_uid : iuid;
- /*
- * CAP_FSETID overrides the following restrictions:
- *
- * The set-user-ID and set-group-ID bits of a file will be
- * cleared upon successful return from chown()
- */
- if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
- !capable(CAP_FSETID))
- inode->i_mode &= ~(S_ISUID|S_ISGID);
-
/*
* Change the ownerships and register quota modifications
* in the transaction.
root@debian:/mnt/xfs# unshare -U --map-root
root@debian:/mnt/xfs# touch testfile
root@debian:/mnt/xfs# chmod g+s testfile
root@debian:/mnt/xfs# ls -la testfile
-rw-r-Sr-- 1 root root 0 Feb 22 14:46 testfile
root@debian:/mnt/xfs# chown 0:0 testfile
root@debian:/mnt/xfs# ls -la testfile
-rw-r-Sr-- 1 root root 0 Feb 22 14:46 testfile
root@debian:/mnt/xfs#
root@debian:/mnt/xfs# mkdir testdir
root@debian:/mnt/xfs# chmod u+s testdir
root@debian:/mnt/xfs# chmod u+g testdir
root@debian:/mnt/xfs#
root@debian:/mnt/xfs# ls -la
total 4
drwxr-xr-x 4 root root 50 Feb 22 14:47 .
drwxr-xr-x 4 root root 4096 Feb 22 13:45 ..
drwsr-sr-x 2 root root 6 Feb 22 14:42 test1
drwsr-xr-x 2 root root 6 Feb 22 14:47 testdir
-rw-r-Sr-- 1 root root 0 Feb 22 14:46 testfile
root@debian:/mnt/xfs# chown 0:0 testdir
root@debian:/mnt/xfs# ls -la
total 4
drwxr-xr-x 4 root root 50 Feb 22 14:47 .
drwxr-xr-x 4 root root 4096 Feb 22 13:45 ..
drwsr-sr-x 2 root root 6 Feb 22 14:42 test1
drwsr-xr-x 2 root root 6 Feb 22 14:47 testdir
-rw-r-Sr-- 1 root root 0 Feb 22 14:46 testfile
>>>>
>>>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>>>> index b79b3846e71b..ff55b31521a2 100644
>>>> --- a/fs/xfs/xfs_iops.c
>>>> +++ b/fs/xfs/xfs_iops.c
>>>> @@ -748,9 +748,13 @@ xfs_setattr_nonsize(
>>>> * The set-user-ID and set-group-ID bits of a file will be
>>>> * cleared upon successful return from chown()
>>>> */
>>>> - if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
>>>> - !capable(CAP_FSETID))
>>>> - inode->i_mode &= ~(S_ISUID|S_ISGID);
>>>> + if (iattr->ia_valid & ATTR_MODE) {
>>>> + umode_t mode = iattr->ia_mode;
>>>> + if (!in_group_p(i_gid_into_mnt(mnt_userns, inode)) &&
>>>> + !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
>>>> + mode &= ~S_ISGID;
>>>> + inode->i_mode = mode;
>>>> + }
>>>>
>>>> /*
>>>> * Change the ownerships and register quota modifications
>>>>
>>>> which aligns xfs with ext4 and any other filesystem. Any thoughts on
>>>> this?
>>>>
>>>> For @Andrey specifically: the tests these should go into:
>>>>
>>>> https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/idmapped-mounts/idmapped-mounts.c
>>>>
>>>> Note that there are already setgid inheritance tests and set*id
>>>> execution tests in there.
>>>> You should be able to copy a lot of code from them. Could you please add
>>>> the test I sketched above and ideally also a test that the set{g,u}id
>>>> bits are stripped during chown for regular files?
>>> Thanks for the link. To clarify what tests and result you expect:
>>> - for directory chown we expect to preserve s{g,u}id
>>> - for regfile chown we expect to preserve S_ISGID only if S_IXGRP is absent
>>> and CAP_FSETID is present
>>
>> So specifically for chown():
>> 1. if regfile
>> -> strip suid bit unconditionally
>> -> strip sgid bit if inode has sgid bit and file is group-executable
>> 2. if directory
>> -> strip sgid bit if inode's gid is neither among the caller's groups
>> nor is the caller capable wrt to that inode
>> The behavior described in 2. is encoded in the vfs generic
>> setattr_copy() helper. And that is what we should see.
>>
>> The behavior of ext4 and btrfs is what we should see afaict as both use
>> setattr_copy().
>>
>>>
>>> JFYI: I found out this problem while running LTP (specifically
>>> syscalls/chown02 test) on idmapped XFS. Maybe I will be able to find more,
>>> who knows.
>>
>> Hm, if you look above, then you can see that the failure (or difference
>> in behavior) you're reporting is independent of idmapped mounts. An
>> ext4 directory shows different behavior than an xfs directory on a
>> regular system without any idmapped mounts used. So I'm not clear how
>> that's specifically related to idmapped mounts yet.
I guess my commit message is pretty poor. Initially I found out that
chown() on idmapped xfs (+userns) drops sgid unconditionally on
regfiles. I did not thought about directories at all. It is good that
you pointed it out.
The problem is indeed independent from idmapping. However I thought this
belonged to it since most of the checks were updated with idmapped
series. Thanks for the explanation
>
> So for example, in order to cause the sgid bit stripped while it should
> be preserved if xfs were to use setattr_copy() I can simply do:
>
> brauner@wittgenstein|~/src/git/linux/ltp/testcases/kernel/syscalls/chown|master %=
>> unshare -U --map-root
> root@wittgenstein|~/src/git/linux/ltp/testcases/kernel/syscalls/chown|master %=
>> PATH=$PATH:$PWD ./chown02
> tst_memutils.c:157: TWARN: Can't adjust score, even with capabilities!?
> tst_test.c:1455: TINFO: Timeout per run is 0h 05m 00s
> chown02.c:45: TPASS: chown(testfile1, 0, 0) passed
> chown02.c:45: TPASS: chown(testfile2, 0, 0) passed
> chown02.c:57: TFAIL: testfile2: wrong mode permissions 0100700, expected 0102700
>
> Summary:
> passed 2
> failed 1
> broken 0
> skipped 0
> warnings 1
>
> There's no idmapped mounts here in play. The caller simply has been
> placed in a new user namespace and thus they fail the current
> capable(CAP_FSETID) check which will cause xfs to strip the sgid bit >
> Now trying the same with ext4:
>
> ubuntu@f2-vm:~/src/git/linux/ltp/testcases/kernel/syscalls/chown$ unshare -U --map-root
> root@f2-vm:~/src/git/linux/ltp/testcases/kernel/syscalls/chown# PATH=$PATH:$PWD ./chown02
> tst_memutils.c:157: TWARN: Can't adjust score, even with capabilities!?
> tst_test.c:1455: TINFO: Timeout per run is 0h 05m 00s
> chown02.c:45: TPASS: chown(testfile1, 0, 0) passed
> chown02.c:45: TPASS: chown(testfile2, 0, 0) passed
>
> Summary:
> passed 2
> failed 0
> broken 0
> skipped 0
> warnings 1
>
> it passes since ext4 uses setattr_copy() and thus the capability is
> checked for in the caller's user namespace.
>
>>
>> Fwiw, one part in your commit message is a bit misleading:
>>
>>>>>> has CAP_FSETID capable in init_user_ns rather than mntns userns.
>>
>> that's not what capable_wrt_to_inode_uidgid() does. What it does is to
>> check whether the caller is capable in their current user namespace.
>> That's how capable_wrt_to_inode_uidgid() has always worked.
>> The mnt_userns is only used to idmap the inode's {g,u}id. So if the
>> caller has CAP_FSETID in its current userns and the inode's {g,u}id have
>> a valid mapping in the mnt's userns the caller is considered privileged
>> over that inode.
>>
next prev parent reply other threads:[~2022-02-22 14:54 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-21 18:22 [PATCH] xfs: do not clear S_ISUID|S_ISGID for idmapped mounts Andrey Zhadchenko
2022-02-22 8:33 ` Christoph Hellwig
2022-02-22 9:25 ` Andrey Zhadchenko
2022-02-22 10:24 ` Christian Brauner
2022-02-22 11:19 ` Andrey Zhadchenko
2022-02-22 12:23 ` Christian Brauner
2022-02-22 12:36 ` Christian Brauner
2022-02-22 12:44 ` Christian Brauner
2022-02-22 14:54 ` Andrey Zhadchenko [this message]
2022-02-22 15:03 ` Christian Brauner
2022-02-22 21:40 ` Dave Chinner
2022-02-23 8:11 ` Christian Brauner
2022-02-25 1:57 ` Darrick J. Wong
2022-02-25 9:45 ` Christian Brauner
2022-02-25 10:42 ` Andrey Zhadchenko
2022-02-25 17:11 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48bcd8ac-f9e5-a83c-604c-5af602cb362a@virtuozzo.com \
--to=andrey.zhadchenko@virtuozzo.com \
--cc=brauner@kernel.org \
--cc=djwong@kernel.org \
--cc=hch@lst.de \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).