linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Some fixes for overlayfs remove privs
@ 2022-10-03 12:30 Amir Goldstein
  2022-10-03 12:30 ` [PATCH 1/2] ovl: remove privs in ovl_copyfile() Amir Goldstein
  2022-10-03 12:30 ` [PATCH 2/2] ovl: remove privs in ovl_fallocate() Amir Goldstein
  0 siblings, 2 replies; 13+ messages in thread
From: Amir Goldstein @ 2022-10-03 12:30 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Christian Brauner, Yang Xu, Darrick J . Wong, Filipe Manana,
	linux-unionfs, linux-fsdevel

Miklos,

While running latest fstests on overlayfs-next, I noticed these
failures:
generic/673 generic/683 generic/684 generic/685 generic/686 generic/687

Christian has also reported those failures earlier.

Those are not regressions, those are 5 new tests added to fstests and
one test whose expected result was modified by fstests commit b3a59bb6
("generic/673: fix golden output to reflect vfs setgid behavior").

The following two patches aim to fix those test failures, but they are
incomplete - without those patches, the tests fail miserably in all test
cases, because no privs are stripped.

With those two patches, only two test cases are failing, which are the
two test cases whose expectation was changed by fstests commit b3a59bb6.
The reason was explained in [1] and the issue was fixed for xfs by kernel
commit e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes").

Trying to figure out how to fix this hurts my brain, so I'll need
suggestions how to proceed.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-xfs/CAL3q7H47iNQ=Wmk83WcGB-KBJVOEtR9+qGczzCeXJ9Y2KCV25Q@mail.gmail.com/

Amir Goldstein (2):
  ovl: remove privs in ovl_copyfile()
  ovl: remove privs in ovl_fallocate()

 fs/overlayfs/file.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] ovl: remove privs in ovl_copyfile()
  2022-10-03 12:30 [PATCH 0/2] Some fixes for overlayfs remove privs Amir Goldstein
@ 2022-10-03 12:30 ` Amir Goldstein
  2022-10-05 14:37   ` Miklos Szeredi
  2022-10-03 12:30 ` [PATCH 2/2] ovl: remove privs in ovl_fallocate() Amir Goldstein
  1 sibling, 1 reply; 13+ messages in thread
From: Amir Goldstein @ 2022-10-03 12:30 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Christian Brauner, Yang Xu, Darrick J . Wong, Filipe Manana,
	linux-unionfs, linux-fsdevel

Underlying fs doesn't remove privs because copy_range/remap_range are
called with privileged mounter credentials.

This fixes some failures in fstest generic/673.

Fixes: 8ede205541ff ("ovl: add reflink/copyfile/dedup support")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index d17faeb014e5..c8308da8909a 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -567,14 +567,23 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 	const struct cred *old_cred;
 	loff_t ret;
 
+	inode_lock(inode_out);
+	if (op != OVL_DEDUPE) {
+		/* Update mode */
+		ovl_copyattr(inode_out);
+		ret = file_remove_privs(file_out);
+		if (ret)
+			goto out_unlock;
+	}
+
 	ret = ovl_real_fdget(file_out, &real_out);
 	if (ret)
-		return ret;
+		goto out_unlock;
 
 	ret = ovl_real_fdget(file_in, &real_in);
 	if (ret) {
 		fdput(real_out);
-		return ret;
+		goto out_unlock;
 	}
 
 	old_cred = ovl_override_creds(file_inode(file_out)->i_sb);
@@ -603,6 +612,9 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 	fdput(real_in);
 	fdput(real_out);
 
+out_unlock:
+	inode_unlock(inode_out);
+
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] ovl: remove privs in ovl_fallocate()
  2022-10-03 12:30 [PATCH 0/2] Some fixes for overlayfs remove privs Amir Goldstein
  2022-10-03 12:30 ` [PATCH 1/2] ovl: remove privs in ovl_copyfile() Amir Goldstein
@ 2022-10-03 12:30 ` Amir Goldstein
  2022-10-04 10:59   ` Christian Brauner
  2022-10-05 14:40   ` Miklos Szeredi
  1 sibling, 2 replies; 13+ messages in thread
From: Amir Goldstein @ 2022-10-03 12:30 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Christian Brauner, Yang Xu, Darrick J . Wong, Filipe Manana,
	linux-unionfs, linux-fsdevel

Underlying fs doesn't remove privs because fallocate is called with
privileged mounter credentials.

This fixes some failure in fstests generic/683..687.

Fixes: aab8848cee5e ("ovl: add ovl_fallocate()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index c8308da8909a..e90ac5376456 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -517,9 +517,16 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 	const struct cred *old_cred;
 	int ret;
 
+	inode_lock(inode);
+	/* Update mode */
+	ovl_copyattr(inode);
+	ret = file_remove_privs(file);
+	if (ret)
+		goto out_unlock;
+
 	ret = ovl_real_fdget(file, &real);
 	if (ret)
-		return ret;
+		goto out_unlock;
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	ret = vfs_fallocate(real.file, mode, offset, len);
@@ -530,6 +537,9 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 
 	fdput(real);
 
+out_unlock:
+	inode_unlock(inode);
+
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] ovl: remove privs in ovl_fallocate()
  2022-10-03 12:30 ` [PATCH 2/2] ovl: remove privs in ovl_fallocate() Amir Goldstein
@ 2022-10-04 10:59   ` Christian Brauner
  2022-10-04 13:20     ` Christian Brauner
  2022-10-04 14:01     ` Amir Goldstein
  2022-10-05 14:40   ` Miklos Szeredi
  1 sibling, 2 replies; 13+ messages in thread
From: Christian Brauner @ 2022-10-04 10:59 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Yang Xu, Darrick J . Wong, Filipe Manana,
	linux-unionfs, linux-fsdevel

On Mon, Oct 03, 2022 at 03:30:40PM +0300, Amir Goldstein wrote:
> Underlying fs doesn't remove privs because fallocate is called with
> privileged mounter credentials.
> 
> This fixes some failure in fstests generic/683..687.
> 
> Fixes: aab8848cee5e ("ovl: add ovl_fallocate()")
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/file.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index c8308da8909a..e90ac5376456 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -517,9 +517,16 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
>  	const struct cred *old_cred;
>  	int ret;
>  
> +	inode_lock(inode);
> +	/* Update mode */
> +	ovl_copyattr(inode);
> +	ret = file_remove_privs(file);

First, thank you for picking this up!

Let me analyze generic/683 failure of Test1 to see why you still see
failures in this test:

echo "Test 1 - qa_user, non-exec file $verb"
setup_testfile
chmod a+rws $junk_file
commit_and_check "$qa_user" "$verb" 64k 64k

So this creates a file with 6666 permissions. While the file has the
S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. This is
important in a little bit.

On a regular filesystem like xfs what will happen is:

sys_fallocate()
-> vfs_fallocate()
   -> xfs_file_fallocate()
      -> file_modified()
         -> __file_remove_privs()
            -> dentry_needs_remove_privs()
               -> should_remove_suid()
            -> __remove_privs()
               newattrs.ia_valid = ATTR_FORCE | kill;
               -> notify_change()

In should_remove_suid() we can see that ATTR_KILL_SUID is raised
unconditionally because the file in the test has S_ISUID set.

But we also see that ATTR_KILL_SGID won't be set because while the file
is S_ISGID it is not S_IXGRP (see above) which is a condition for
ATTR_KILL_SGID being raised.

So by the time we call notify_change() we have attr->ia_valid set to
ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
ATTR_KILL_SUID is set and does:

ia_valid = attr->ia_valid |= ATTR_MODE
attr->ia_mode = (inode->i_mode & ~S_ISUID);

which means that when we call setattr_copy() later we will definitely
update inode->i_mode. Note that attr->ia_mode still contain S_ISGID.

Now we call into the filesystem's ->setattr() inode operation which will end up
calling setattr_copy(). Since ATTR_MODE is set we will hit:

if (ia_valid & ATTR_MODE) {
        umode_t mode = attr->ia_mode;
        vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
        if (!vfsgid_in_group_p(vfsgid) &&
            !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                mode &= ~S_ISGID;
        inode->i_mode = mode;
}

and since the caller in the test is neither capable nor in the group of the
inode the S_ISGID bit is stripped.

But now contrast this with overlayfs even after your changes. When
ovl_setattr() is hit from ovl_fallocate()'s call to file_remove_privs()
and calls ovl_do_notify_change() then we are doing this under the
mounter's creds and so the S_ISGID bit is retained:

sys_fallocate()
-> vfs_fallocate()
   -> ovl_fallocate()
      -> file_remove_privs()
         -> dentry_needs_remove_privs()
            -> should_remove_suid()
         -> __remove_privs()
            newattrs.ia_valid = attr_force | kill;
            -> notify_change()
	       -> ovl_setattr()
	          // TAKE ON MOUNTER'S CREDS
	          -> ovl_do_notify_change()
	          // GIVE UP MOUNTER'S CREDS
     // TAKE ON MOUNTER'S CREDS
     -> vfs_fallocate()
        -> xfs_file_fallocate()
           -> file_modified()
              -> __file_remove_privs()
                 -> dentry_needs_remove_privs()
                    -> should_remove_suid()
                 -> __remove_privs()
                    newattrs.ia_valid = attr_force | kill;
                    -> notify_change()

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] ovl: remove privs in ovl_fallocate()
  2022-10-04 10:59   ` Christian Brauner
@ 2022-10-04 13:20     ` Christian Brauner
  2022-10-04 14:01     ` Amir Goldstein
  1 sibling, 0 replies; 13+ messages in thread
From: Christian Brauner @ 2022-10-04 13:20 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Yang Xu, Darrick J . Wong, Filipe Manana,
	linux-unionfs, linux-fsdevel

On Tue, Oct 04, 2022 at 12:59:37PM +0200, Christian Brauner wrote:
> On Mon, Oct 03, 2022 at 03:30:40PM +0300, Amir Goldstein wrote:
> > Underlying fs doesn't remove privs because fallocate is called with
> > privileged mounter credentials.
> > 
> > This fixes some failure in fstests generic/683..687.
> > 
> > Fixes: aab8848cee5e ("ovl: add ovl_fallocate()")
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/overlayfs/file.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > index c8308da8909a..e90ac5376456 100644
> > --- a/fs/overlayfs/file.c
> > +++ b/fs/overlayfs/file.c
> > @@ -517,9 +517,16 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
> >  	const struct cred *old_cred;
> >  	int ret;
> >  
> > +	inode_lock(inode);
> > +	/* Update mode */
> > +	ovl_copyattr(inode);
> > +	ret = file_remove_privs(file);
> 
> First, thank you for picking this up!
> 
> Let me analyze generic/683 failure of Test1 to see why you still see
> failures in this test:
> 
> echo "Test 1 - qa_user, non-exec file $verb"
> setup_testfile
> chmod a+rws $junk_file
> commit_and_check "$qa_user" "$verb" 64k 64k
> 
> So this creates a file with 6666 permissions. While the file has the
> S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. This is
> important in a little bit.
> 
> On a regular filesystem like xfs what will happen is:
> 
> sys_fallocate()
> -> vfs_fallocate()
>    -> xfs_file_fallocate()
>       -> file_modified()
>          -> __file_remove_privs()
>             -> dentry_needs_remove_privs()
>                -> should_remove_suid()
>             -> __remove_privs()
>                newattrs.ia_valid = ATTR_FORCE | kill;
>                -> notify_change()
> 
> In should_remove_suid() we can see that ATTR_KILL_SUID is raised
> unconditionally because the file in the test has S_ISUID set.
> 
> But we also see that ATTR_KILL_SGID won't be set because while the file
> is S_ISGID it is not S_IXGRP (see above) which is a condition for
> ATTR_KILL_SGID being raised.
> 
> So by the time we call notify_change() we have attr->ia_valid set to
> ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
> ATTR_KILL_SUID is set and does:
> 
> ia_valid = attr->ia_valid |= ATTR_MODE
> attr->ia_mode = (inode->i_mode & ~S_ISUID);
> 
> which means that when we call setattr_copy() later we will definitely
> update inode->i_mode. Note that attr->ia_mode still contain S_ISGID.
> 
> Now we call into the filesystem's ->setattr() inode operation which will end up
> calling setattr_copy(). Since ATTR_MODE is set we will hit:
> 
> if (ia_valid & ATTR_MODE) {
>         umode_t mode = attr->ia_mode;
>         vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
>         if (!vfsgid_in_group_p(vfsgid) &&
>             !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
>                 mode &= ~S_ISGID;
>         inode->i_mode = mode;
> }
> 
> and since the caller in the test is neither capable nor in the group of the
> inode the S_ISGID bit is stripped.
> 
> But now contrast this with overlayfs even after your changes. When
> ovl_setattr() is hit from ovl_fallocate()'s call to file_remove_privs()
> and calls ovl_do_notify_change() then we are doing this under the
> mounter's creds and so the S_ISGID bit is retained:
> 
> sys_fallocate()
> -> vfs_fallocate()
>    -> ovl_fallocate()
>       -> file_remove_privs()
>          -> dentry_needs_remove_privs()
>             -> should_remove_suid()
>          -> __remove_privs()
>             newattrs.ia_valid = attr_force | kill;
>             -> notify_change()
> 	       -> ovl_setattr()

Note also that the setattr_prepare() call skips setgid checks because
ATTR_FORCE is passed from file_remove_privs().

> 	          // TAKE ON MOUNTER'S CREDS
> 	          -> ovl_do_notify_change()
> 	          // GIVE UP MOUNTER'S CREDS
>      // TAKE ON MOUNTER'S CREDS
>      -> vfs_fallocate()
>         -> xfs_file_fallocate()
>            -> file_modified()
>               -> __file_remove_privs()
>                  -> dentry_needs_remove_privs()
>                     -> should_remove_suid()
>                  -> __remove_privs()
>                     newattrs.ia_valid = attr_force | kill;
>                     -> notify_change()

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] ovl: remove privs in ovl_fallocate()
  2022-10-04 10:59   ` Christian Brauner
  2022-10-04 13:20     ` Christian Brauner
@ 2022-10-04 14:01     ` Amir Goldstein
  2022-10-04 14:12       ` Christian Brauner
  1 sibling, 1 reply; 13+ messages in thread
From: Amir Goldstein @ 2022-10-04 14:01 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Miklos Szeredi, Yang Xu, Darrick J . Wong, Filipe Manana,
	linux-unionfs, linux-fsdevel

On Tue, Oct 4, 2022 at 1:59 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Mon, Oct 03, 2022 at 03:30:40PM +0300, Amir Goldstein wrote:
> > Underlying fs doesn't remove privs because fallocate is called with
> > privileged mounter credentials.
> >
> > This fixes some failure in fstests generic/683..687.
> >
> > Fixes: aab8848cee5e ("ovl: add ovl_fallocate()")
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/overlayfs/file.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > index c8308da8909a..e90ac5376456 100644
> > --- a/fs/overlayfs/file.c
> > +++ b/fs/overlayfs/file.c
> > @@ -517,9 +517,16 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
> >       const struct cred *old_cred;
> >       int ret;
> >
> > +     inode_lock(inode);
> > +     /* Update mode */
> > +     ovl_copyattr(inode);
> > +     ret = file_remove_privs(file);
>
> First, thank you for picking this up!
>
> Let me analyze generic/683 failure of Test1 to see why you still see
> failures in this test:
>
> echo "Test 1 - qa_user, non-exec file $verb"
> setup_testfile
> chmod a+rws $junk_file
> commit_and_check "$qa_user" "$verb" 64k 64k
>
> So this creates a file with 6666 permissions. While the file has the
> S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. This is
> important in a little bit.
>
> On a regular filesystem like xfs what will happen is:
>
> sys_fallocate()
> -> vfs_fallocate()
>    -> xfs_file_fallocate()
>       -> file_modified()
>          -> __file_remove_privs()
>             -> dentry_needs_remove_privs()
>                -> should_remove_suid()
>             -> __remove_privs()
>                newattrs.ia_valid = ATTR_FORCE | kill;
>                -> notify_change()
>
> In should_remove_suid() we can see that ATTR_KILL_SUID is raised
> unconditionally because the file in the test has S_ISUID set.
>
> But we also see that ATTR_KILL_SGID won't be set because while the file
> is S_ISGID it is not S_IXGRP (see above) which is a condition for
> ATTR_KILL_SGID being raised.
>
> So by the time we call notify_change() we have attr->ia_valid set to
> ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
> ATTR_KILL_SUID is set and does:
>
> ia_valid = attr->ia_valid |= ATTR_MODE
> attr->ia_mode = (inode->i_mode & ~S_ISUID);
>
> which means that when we call setattr_copy() later we will definitely
> update inode->i_mode. Note that attr->ia_mode still contain S_ISGID.
>
> Now we call into the filesystem's ->setattr() inode operation which will end up
> calling setattr_copy(). Since ATTR_MODE is set we will hit:
>
> if (ia_valid & ATTR_MODE) {
>         umode_t mode = attr->ia_mode;
>         vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
>         if (!vfsgid_in_group_p(vfsgid) &&
>             !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
>                 mode &= ~S_ISGID;
>         inode->i_mode = mode;
> }
>

Can you think of a reason why the above should not be done
in notify_change() before even calling to ->setattr()?

Although, it wouldn't help because ovl_setattr() does:

    if (attr->ia_valid & (ATTR_KILL_SUID|ATTR_KILL_SGID))
        attr->ia_valid &= ~ATTR_MODE;

> and since the caller in the test is neither capable nor in the group of the
> inode the S_ISGID bit is stripped.
>
> But now contrast this with overlayfs even after your changes. When
> ovl_setattr() is hit from ovl_fallocate()'s call to file_remove_privs()
> and calls ovl_do_notify_change() then we are doing this under the
> mounter's creds and so the S_ISGID bit is retained:
>
> sys_fallocate()
> -> vfs_fallocate()
>    -> ovl_fallocate()
>       -> file_remove_privs()
>          -> dentry_needs_remove_privs()
>             -> should_remove_suid()
>          -> __remove_privs()
>             newattrs.ia_valid = attr_force | kill;
>             -> notify_change()
>                -> ovl_setattr()
>                   // TAKE ON MOUNTER'S CREDS
>                   -> ovl_do_notify_change()
>                   // GIVE UP MOUNTER'S CREDS
>      // TAKE ON MOUNTER'S CREDS
>      -> vfs_fallocate()
>         -> xfs_file_fallocate()
>            -> file_modified()
>               -> __file_remove_privs()
>                  -> dentry_needs_remove_privs()
>                     -> should_remove_suid()
>                  -> __remove_privs()
>                     newattrs.ia_valid = attr_force | kill;
>                     -> notify_change()

The model in overlayfs is that security is checked twice
once on overlay inode with caller creds and once again
on xfs inode with mounter creds. Either of these checks
could result in clearing SUID/SGID bits.

In the call stack above, the outer should_remove_suid()
with caller creds sets ATTR_KILL_SUID and then the outer
notify_change() clears SUID and sets ATTR_MODE,
but ovl_setattr() clears ATTR_MODE and then the inner
notify_change() re-clears SUID and sets ATTR_MODE again.

If the outer notify_change() would have checked the in_group_p()
condition, clear SGID and set a flag ATTR_KILL_SGID_FORCE
then the inner notify_change() would see this flag and re-clear
SGID bit, just the same as it does with SUID bit in the stack stace
above.

Is this making any sense?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] ovl: remove privs in ovl_fallocate()
  2022-10-04 14:01     ` Amir Goldstein
@ 2022-10-04 14:12       ` Christian Brauner
  2022-10-04 15:52         ` Christian Brauner
  0 siblings, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2022-10-04 14:12 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Yang Xu, Darrick J . Wong, Filipe Manana,
	linux-unionfs, linux-fsdevel

On Tue, Oct 04, 2022 at 05:01:06PM +0300, Amir Goldstein wrote:
> On Tue, Oct 4, 2022 at 1:59 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Mon, Oct 03, 2022 at 03:30:40PM +0300, Amir Goldstein wrote:
> > > Underlying fs doesn't remove privs because fallocate is called with
> > > privileged mounter credentials.
> > >
> > > This fixes some failure in fstests generic/683..687.
> > >
> > > Fixes: aab8848cee5e ("ovl: add ovl_fallocate()")
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > >  fs/overlayfs/file.c | 12 +++++++++++-
> > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > index c8308da8909a..e90ac5376456 100644
> > > --- a/fs/overlayfs/file.c
> > > +++ b/fs/overlayfs/file.c
> > > @@ -517,9 +517,16 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
> > >       const struct cred *old_cred;
> > >       int ret;
> > >
> > > +     inode_lock(inode);
> > > +     /* Update mode */
> > > +     ovl_copyattr(inode);
> > > +     ret = file_remove_privs(file);
> >
> > First, thank you for picking this up!
> >
> > Let me analyze generic/683 failure of Test1 to see why you still see
> > failures in this test:
> >
> > echo "Test 1 - qa_user, non-exec file $verb"
> > setup_testfile
> > chmod a+rws $junk_file
> > commit_and_check "$qa_user" "$verb" 64k 64k
> >
> > So this creates a file with 6666 permissions. While the file has the
> > S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. This is
> > important in a little bit.
> >
> > On a regular filesystem like xfs what will happen is:
> >
> > sys_fallocate()
> > -> vfs_fallocate()
> >    -> xfs_file_fallocate()
> >       -> file_modified()
> >          -> __file_remove_privs()
> >             -> dentry_needs_remove_privs()
> >                -> should_remove_suid()
> >             -> __remove_privs()
> >                newattrs.ia_valid = ATTR_FORCE | kill;
> >                -> notify_change()
> >
> > In should_remove_suid() we can see that ATTR_KILL_SUID is raised
> > unconditionally because the file in the test has S_ISUID set.
> >
> > But we also see that ATTR_KILL_SGID won't be set because while the file
> > is S_ISGID it is not S_IXGRP (see above) which is a condition for
> > ATTR_KILL_SGID being raised.
> >
> > So by the time we call notify_change() we have attr->ia_valid set to
> > ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
> > ATTR_KILL_SUID is set and does:
> >
> > ia_valid = attr->ia_valid |= ATTR_MODE
> > attr->ia_mode = (inode->i_mode & ~S_ISUID);
> >
> > which means that when we call setattr_copy() later we will definitely
> > update inode->i_mode. Note that attr->ia_mode still contain S_ISGID.
> >
> > Now we call into the filesystem's ->setattr() inode operation which will end up
> > calling setattr_copy(). Since ATTR_MODE is set we will hit:
> >
> > if (ia_valid & ATTR_MODE) {
> >         umode_t mode = attr->ia_mode;
> >         vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
> >         if (!vfsgid_in_group_p(vfsgid) &&
> >             !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
> >                 mode &= ~S_ISGID;
> >         inode->i_mode = mode;
> > }
> >
> 
> Can you think of a reason why the above should not be done
> in notify_change() before even calling to ->setattr()?
> 
> Although, it wouldn't help because ovl_setattr() does:
> 
>     if (attr->ia_valid & (ATTR_KILL_SUID|ATTR_KILL_SGID))
>         attr->ia_valid &= ~ATTR_MODE;
> 
> > and since the caller in the test is neither capable nor in the group of the
> > inode the S_ISGID bit is stripped.
> >
> > But now contrast this with overlayfs even after your changes. When
> > ovl_setattr() is hit from ovl_fallocate()'s call to file_remove_privs()
> > and calls ovl_do_notify_change() then we are doing this under the
> > mounter's creds and so the S_ISGID bit is retained:
> >
> > sys_fallocate()
> > -> vfs_fallocate()
> >    -> ovl_fallocate()
> >       -> file_remove_privs()
> >          -> dentry_needs_remove_privs()
> >             -> should_remove_suid()
> >          -> __remove_privs()
> >             newattrs.ia_valid = attr_force | kill;
> >             -> notify_change()
> >                -> ovl_setattr()
> >                   // TAKE ON MOUNTER'S CREDS
> >                   -> ovl_do_notify_change()
> >                   // GIVE UP MOUNTER'S CREDS
> >      // TAKE ON MOUNTER'S CREDS
> >      -> vfs_fallocate()
> >         -> xfs_file_fallocate()
> >            -> file_modified()
> >               -> __file_remove_privs()
> >                  -> dentry_needs_remove_privs()
> >                     -> should_remove_suid()
> >                  -> __remove_privs()
> >                     newattrs.ia_valid = attr_force | kill;
> >                     -> notify_change()
> 
> The model in overlayfs is that security is checked twice
> once on overlay inode with caller creds and once again
> on xfs inode with mounter creds. Either of these checks
> could result in clearing SUID/SGID bits.

Yep.

> 
> In the call stack above, the outer should_remove_suid()
> with caller creds sets ATTR_KILL_SUID and then the outer
> notify_change() clears SUID and sets ATTR_MODE,

Yes.

> but ovl_setattr() clears ATTR_MODE and then the inner
> notify_change() re-clears SUID and sets ATTR_MODE again.

Yes.

> 
> If the outer notify_change() would have checked the in_group_p()
> condition, clear SGID and set a flag ATTR_KILL_SGID_FORCE
> then the inner notify_change() would see this flag and re-clear
> SGID bit, just the same as it does with SUID bit in the stack stace
> above.
> 
> Is this making any sense?

What I kept thinking was sm along the lines of:

diff --git a/fs/inode.c b/fs/inode.c
index ba1de23c13c1..e62a564201b7 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1968,8 +1968,12 @@ int should_remove_suid(struct dentry *dentry)
         * sgid without any exec bits is just a mandatory locking mark; leave
         * it alone.  If some exec bits are set, it's a real sgid; kill it.
         */
-       if (unlikely((mode & S_ISGID) && (mode & S_IXGRP)))
-               kill |= ATTR_KILL_SGID;
+       if (unlikely(mode & S_ISGID)) {
+               if ((mode & S_IXGRP) ||
+                   (!vfsgid_in_group_p(vfsgid) &&
+                    !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID)))
+                       kill |= ATTR_KILL_SGID;
+       }

        if (unlikely(kill && !capable(CAP_FSETID) && S_ISREG(mode)))
                return kill;

mandatory locks have been removed as well so that remark seems pointless
as well?

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] ovl: remove privs in ovl_fallocate()
  2022-10-04 14:12       ` Christian Brauner
@ 2022-10-04 15:52         ` Christian Brauner
  2022-10-05 13:56           ` Amir Goldstein
  0 siblings, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2022-10-04 15:52 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Yang Xu, Darrick J . Wong, Filipe Manana,
	linux-unionfs, linux-fsdevel

On Tue, Oct 04, 2022 at 04:13:05PM +0200, Christian Brauner wrote:
> On Tue, Oct 04, 2022 at 05:01:06PM +0300, Amir Goldstein wrote:
> > On Tue, Oct 4, 2022 at 1:59 PM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Mon, Oct 03, 2022 at 03:30:40PM +0300, Amir Goldstein wrote:
> > > > Underlying fs doesn't remove privs because fallocate is called with
> > > > privileged mounter credentials.
> > > >
> > > > This fixes some failure in fstests generic/683..687.
> > > >
> > > > Fixes: aab8848cee5e ("ovl: add ovl_fallocate()")
> > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > ---
> > > >  fs/overlayfs/file.c | 12 +++++++++++-
> > > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > > index c8308da8909a..e90ac5376456 100644
> > > > --- a/fs/overlayfs/file.c
> > > > +++ b/fs/overlayfs/file.c
> > > > @@ -517,9 +517,16 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
> > > >       const struct cred *old_cred;
> > > >       int ret;
> > > >
> > > > +     inode_lock(inode);
> > > > +     /* Update mode */
> > > > +     ovl_copyattr(inode);
> > > > +     ret = file_remove_privs(file);
> > >
> > > First, thank you for picking this up!
> > >
> > > Let me analyze generic/683 failure of Test1 to see why you still see
> > > failures in this test:
> > >
> > > echo "Test 1 - qa_user, non-exec file $verb"
> > > setup_testfile
> > > chmod a+rws $junk_file
> > > commit_and_check "$qa_user" "$verb" 64k 64k
> > >
> > > So this creates a file with 6666 permissions. While the file has the
> > > S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. This is
> > > important in a little bit.
> > >
> > > On a regular filesystem like xfs what will happen is:
> > >
> > > sys_fallocate()
> > > -> vfs_fallocate()
> > >    -> xfs_file_fallocate()
> > >       -> file_modified()
> > >          -> __file_remove_privs()
> > >             -> dentry_needs_remove_privs()
> > >                -> should_remove_suid()
> > >             -> __remove_privs()
> > >                newattrs.ia_valid = ATTR_FORCE | kill;
> > >                -> notify_change()
> > >
> > > In should_remove_suid() we can see that ATTR_KILL_SUID is raised
> > > unconditionally because the file in the test has S_ISUID set.
> > >
> > > But we also see that ATTR_KILL_SGID won't be set because while the file
> > > is S_ISGID it is not S_IXGRP (see above) which is a condition for
> > > ATTR_KILL_SGID being raised.
> > >
> > > So by the time we call notify_change() we have attr->ia_valid set to
> > > ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
> > > ATTR_KILL_SUID is set and does:
> > >
> > > ia_valid = attr->ia_valid |= ATTR_MODE
> > > attr->ia_mode = (inode->i_mode & ~S_ISUID);
> > >
> > > which means that when we call setattr_copy() later we will definitely
> > > update inode->i_mode. Note that attr->ia_mode still contain S_ISGID.
> > >
> > > Now we call into the filesystem's ->setattr() inode operation which will end up
> > > calling setattr_copy(). Since ATTR_MODE is set we will hit:
> > >
> > > if (ia_valid & ATTR_MODE) {
> > >         umode_t mode = attr->ia_mode;
> > >         vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
> > >         if (!vfsgid_in_group_p(vfsgid) &&
> > >             !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
> > >                 mode &= ~S_ISGID;
> > >         inode->i_mode = mode;
> > > }
> > >
> > 
> > Can you think of a reason why the above should not be done
> > in notify_change() before even calling to ->setattr()?
> > 
> > Although, it wouldn't help because ovl_setattr() does:
> > 
> >     if (attr->ia_valid & (ATTR_KILL_SUID|ATTR_KILL_SGID))
> >         attr->ia_valid &= ~ATTR_MODE;
> > 
> > > and since the caller in the test is neither capable nor in the group of the
> > > inode the S_ISGID bit is stripped.
> > >
> > > But now contrast this with overlayfs even after your changes. When
> > > ovl_setattr() is hit from ovl_fallocate()'s call to file_remove_privs()
> > > and calls ovl_do_notify_change() then we are doing this under the
> > > mounter's creds and so the S_ISGID bit is retained:
> > >
> > > sys_fallocate()
> > > -> vfs_fallocate()
> > >    -> ovl_fallocate()
> > >       -> file_remove_privs()
> > >          -> dentry_needs_remove_privs()
> > >             -> should_remove_suid()
> > >          -> __remove_privs()
> > >             newattrs.ia_valid = attr_force | kill;
> > >             -> notify_change()
> > >                -> ovl_setattr()
> > >                   // TAKE ON MOUNTER'S CREDS
> > >                   -> ovl_do_notify_change()
> > >                   // GIVE UP MOUNTER'S CREDS
> > >      // TAKE ON MOUNTER'S CREDS
> > >      -> vfs_fallocate()
> > >         -> xfs_file_fallocate()
> > >            -> file_modified()
> > >               -> __file_remove_privs()
> > >                  -> dentry_needs_remove_privs()
> > >                     -> should_remove_suid()
> > >                  -> __remove_privs()
> > >                     newattrs.ia_valid = attr_force | kill;
> > >                     -> notify_change()
> > 
> > The model in overlayfs is that security is checked twice
> > once on overlay inode with caller creds and once again
> > on xfs inode with mounter creds. Either of these checks
> > could result in clearing SUID/SGID bits.
> 
> Yep.
> 
> > 
> > In the call stack above, the outer should_remove_suid()
> > with caller creds sets ATTR_KILL_SUID and then the outer
> > notify_change() clears SUID and sets ATTR_MODE,
> 
> Yes.
> 
> > but ovl_setattr() clears ATTR_MODE and then the inner
> > notify_change() re-clears SUID and sets ATTR_MODE again.
> 
> Yes.
> 
> > 
> > If the outer notify_change() would have checked the in_group_p()
> > condition, clear SGID and set a flag ATTR_KILL_SGID_FORCE
> > then the inner notify_change() would see this flag and re-clear
> > SGID bit, just the same as it does with SUID bit in the stack stace
> > above.
> > 
> > Is this making any sense?
> 
> What I kept thinking was sm along the lines of:
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index ba1de23c13c1..e62a564201b7 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1968,8 +1968,12 @@ int should_remove_suid(struct dentry *dentry)
>          * sgid without any exec bits is just a mandatory locking mark; leave
>          * it alone.  If some exec bits are set, it's a real sgid; kill it.
>          */
> -       if (unlikely((mode & S_ISGID) && (mode & S_IXGRP)))
> -               kill |= ATTR_KILL_SGID;
> +       if (unlikely(mode & S_ISGID)) {
> +               if ((mode & S_IXGRP) ||
> +                   (!vfsgid_in_group_p(vfsgid) &&
> +                    !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID)))
> +                       kill |= ATTR_KILL_SGID;
> +       }
> 
>         if (unlikely(kill && !capable(CAP_FSETID) && S_ISREG(mode)))
>                 return kill;
> 
> mandatory locks have been removed as well so that remark seems pointless
> as well?

My feeling here is that both should_remove_suid() and notify_change()
need to apply the same permission checks as setattr_prepare() and
setattr_copy() instead of all this special casing them. I don't see a
good reason to not require the same checks. So sm like (__completely
untested__):

From 922f9f123ab6531c29bf05585ef88c17fe65dba3 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner@kernel.org>
Date: Tue, 4 Oct 2022 16:13:34 +0200
Subject: [UNTESTED PATCH] attr: use consistent sgid stripping checks

Require the same permissions as setattr_prepare() and setattr_copy() have
instead of all these special cases. We can probably consolidate this even
more...

Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/attr.c          |  2 +-
 fs/fuse/file.c     |  2 +-
 fs/inode.c         | 41 ++++++++++++++++++++++++-----------------
 fs/internal.h      |  4 +++-
 fs/ocfs2/file.c    |  4 ++--
 fs/open.c          |  2 +-
 include/linux/fs.h |  2 +-
 7 files changed, 33 insertions(+), 24 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index 1552a5f23d6b..9262c6b31c26 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -375,7 +375,7 @@ int notify_change(struct user_namespace *mnt_userns, struct dentry *dentry,
 		}
 	}
 	if (ia_valid & ATTR_KILL_SGID) {
-		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
+		if (should_remove_sgid(mnt_userns, dentry)) {
 			if (!(ia_valid & ATTR_MODE)) {
 				ia_valid = attr->ia_valid |= ATTR_MODE;
 				attr->ia_mode = inode->i_mode;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1a3afd469e3a..fccc2c7e88fd 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1313,7 +1313,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 			return err;
 
 		if (fc->handle_killpriv_v2 &&
-		    should_remove_suid(file_dentry(file))) {
+		    should_remove_suid(&init_user_ns, file_dentry(file))) {
 			goto writethrough;
 		}
 
diff --git a/fs/inode.c b/fs/inode.c
index ba1de23c13c1..c639aefe01c3 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1949,27 +1949,33 @@ void touch_atime(const struct path *path)
 }
 EXPORT_SYMBOL(touch_atime);
 
-/*
- * The logic we want is
- *
- *	if suid or (sgid and xgrp)
- *		remove privs
- */
-int should_remove_suid(struct dentry *dentry)
+bool should_remove_sgid(struct user_namespace *mnt_userns, struct dentry *dentry)
+{
+	struct inode *inode = d_inode(dentry);
+	umode_t mode = inode->i_mode;
+
+	if (unlikely(mode & S_ISGID)) {
+		if ((mode & S_IXGRP) ||
+		    (!vfsgid_in_group_p(i_gid_into_vfsgid(mnt_userns, inode)) &&
+		     !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID)))
+			return true;
+	}
+
+	return false;
+}
+
+int should_remove_suid(struct user_namespace *mnt_userns, struct dentry *dentry)
 {
-	umode_t mode = d_inode(dentry)->i_mode;
+	struct inode *inode = d_inode(dentry);
+	umode_t mode = inode->i_mode;
 	int kill = 0;
 
 	/* suid always must be killed */
 	if (unlikely(mode & S_ISUID))
 		kill = ATTR_KILL_SUID;
 
-	/*
-	 * sgid without any exec bits is just a mandatory locking mark; leave
-	 * it alone.  If some exec bits are set, it's a real sgid; kill it.
-	 */
-	if (unlikely((mode & S_ISGID) && (mode & S_IXGRP)))
-		kill |= ATTR_KILL_SGID;
+	if (should_remove_sgid(mnt_userns, dentry))
+		kill = ATTR_KILL_SGID;
 
 	if (unlikely(kill && !capable(CAP_FSETID) && S_ISREG(mode)))
 		return kill;
@@ -1983,7 +1989,8 @@ EXPORT_SYMBOL(should_remove_suid);
  * response to write or truncate. Return 0 if nothing has to be changed.
  * Negative value on error (change should be denied).
  */
-int dentry_needs_remove_privs(struct dentry *dentry)
+int dentry_needs_remove_privs(struct user_namespace *mnt_userns,
+			      struct dentry *dentry)
 {
 	struct inode *inode = d_inode(dentry);
 	int mask = 0;
@@ -1992,7 +1999,7 @@ int dentry_needs_remove_privs(struct dentry *dentry)
 	if (IS_NOSEC(inode))
 		return 0;
 
-	mask = should_remove_suid(dentry);
+	mask = should_remove_suid(mnt_userns, dentry);
 	ret = security_inode_need_killpriv(dentry);
 	if (ret < 0)
 		return ret;
@@ -2024,7 +2031,7 @@ static int __file_remove_privs(struct file *file, unsigned int flags)
 	if (IS_NOSEC(inode) || !S_ISREG(inode->i_mode))
 		return 0;
 
-	kill = dentry_needs_remove_privs(dentry);
+	kill = dentry_needs_remove_privs(file_mnt_user_ns(file), dentry);
 	if (kill < 0)
 		return kill;
 
diff --git a/fs/internal.h b/fs/internal.h
index 1e67b4b9a4d1..ae152ded227c 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -139,7 +139,8 @@ extern int vfs_open(const struct path *, struct file *);
  * inode.c
  */
 extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc);
-extern int dentry_needs_remove_privs(struct dentry *dentry);
+extern int dentry_needs_remove_privs(struct user_namespace *,
+				     struct dentry *dentry);
 
 /*
  * fs-writeback.c
@@ -226,3 +227,4 @@ int do_set_acl(struct user_namespace *mnt_userns, struct dentry *dentry,
 	       const char *acl_name, const void *kvalue, size_t size);
 ssize_t do_get_acl(struct user_namespace *mnt_userns, struct dentry *dentry,
 		   const char *acl_name, void *kvalue, size_t size);
+int should_remove_sgid(struct user_namespace *mnt_userns, struct dentry *dentry);
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index af900aaa9275..0a33096a3852 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1991,7 +1991,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
 		}
 	}
 
-	if (file && should_remove_suid(file->f_path.dentry)) {
+	if (file && should_remove_suid(&init_user_ns, file->f_path.dentry)) {
 		ret = __ocfs2_write_remove_suid(inode, di_bh);
 		if (ret) {
 			mlog_errno(ret);
@@ -2279,7 +2279,7 @@ static int ocfs2_prepare_inode_for_write(struct file *file,
 		 * inode. There's also the dinode i_size state which
 		 * can be lost via setattr during extending writes (we
 		 * set inode->i_size at the end of a write. */
-		if (should_remove_suid(dentry)) {
+		if (should_remove_suid(&init_user_ns, dentry)) {
 			if (meta_level == 0) {
 				ocfs2_inode_unlock_for_extent_tree(inode,
 								   &di_bh,
diff --git a/fs/open.c b/fs/open.c
index 8a813fa5ca56..ecb7b8a58275 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -54,7 +54,7 @@ int do_truncate(struct user_namespace *mnt_userns, struct dentry *dentry,
 	}
 
 	/* Remove suid, sgid, and file capabilities on truncate too */
-	ret = dentry_needs_remove_privs(dentry);
+	ret = dentry_needs_remove_privs(mnt_userns, dentry);
 	if (ret < 0)
 		return ret;
 	if (ret)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index badff81b9dde..58347514421e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3092,7 +3092,7 @@ extern void __destroy_inode(struct inode *);
 extern struct inode *new_inode_pseudo(struct super_block *sb);
 extern struct inode *new_inode(struct super_block *sb);
 extern void free_inode_nonrcu(struct inode *inode);
-extern int should_remove_suid(struct dentry *);
+extern int should_remove_suid(struct user_namespace *, struct dentry *);
 extern int file_remove_privs(struct file *);
 
 /*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] ovl: remove privs in ovl_fallocate()
  2022-10-04 15:52         ` Christian Brauner
@ 2022-10-05 13:56           ` Amir Goldstein
  0 siblings, 0 replies; 13+ messages in thread
From: Amir Goldstein @ 2022-10-05 13:56 UTC (permalink / raw)
  To: Christian Brauner, Miklos Szeredi
  Cc: Yang Xu, Darrick J . Wong, Filipe Manana, linux-unionfs, linux-fsdevel

On Tue, Oct 4, 2022 at 6:52 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Oct 04, 2022 at 04:13:05PM +0200, Christian Brauner wrote:
> > On Tue, Oct 04, 2022 at 05:01:06PM +0300, Amir Goldstein wrote:
> > > On Tue, Oct 4, 2022 at 1:59 PM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Mon, Oct 03, 2022 at 03:30:40PM +0300, Amir Goldstein wrote:
> > > > > Underlying fs doesn't remove privs because fallocate is called with
> > > > > privileged mounter credentials.
> > > > >
> > > > > This fixes some failure in fstests generic/683..687.
> > > > >
> > > > > Fixes: aab8848cee5e ("ovl: add ovl_fallocate()")
> > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > > ---
> > > > >  fs/overlayfs/file.c | 12 +++++++++++-
> > > > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > > > index c8308da8909a..e90ac5376456 100644
> > > > > --- a/fs/overlayfs/file.c
> > > > > +++ b/fs/overlayfs/file.c
> > > > > @@ -517,9 +517,16 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
> > > > >       const struct cred *old_cred;
> > > > >       int ret;
> > > > >
> > > > > +     inode_lock(inode);
> > > > > +     /* Update mode */
> > > > > +     ovl_copyattr(inode);
> > > > > +     ret = file_remove_privs(file);
> > > >
> > > > First, thank you for picking this up!
> > > >
> > > > Let me analyze generic/683 failure of Test1 to see why you still see
> > > > failures in this test:
> > > >
> > > > echo "Test 1 - qa_user, non-exec file $verb"
> > > > setup_testfile
> > > > chmod a+rws $junk_file
> > > > commit_and_check "$qa_user" "$verb" 64k 64k
> > > >
> > > > So this creates a file with 6666 permissions. While the file has the
> > > > S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. This is
> > > > important in a little bit.
> > > >
> > > > On a regular filesystem like xfs what will happen is:
> > > >
> > > > sys_fallocate()
> > > > -> vfs_fallocate()
> > > >    -> xfs_file_fallocate()
> > > >       -> file_modified()
> > > >          -> __file_remove_privs()
> > > >             -> dentry_needs_remove_privs()
> > > >                -> should_remove_suid()
> > > >             -> __remove_privs()
> > > >                newattrs.ia_valid = ATTR_FORCE | kill;
> > > >                -> notify_change()
> > > >
> > > > In should_remove_suid() we can see that ATTR_KILL_SUID is raised
> > > > unconditionally because the file in the test has S_ISUID set.
> > > >
> > > > But we also see that ATTR_KILL_SGID won't be set because while the file
> > > > is S_ISGID it is not S_IXGRP (see above) which is a condition for
> > > > ATTR_KILL_SGID being raised.
> > > >
> > > > So by the time we call notify_change() we have attr->ia_valid set to
> > > > ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
> > > > ATTR_KILL_SUID is set and does:
> > > >
> > > > ia_valid = attr->ia_valid |= ATTR_MODE
> > > > attr->ia_mode = (inode->i_mode & ~S_ISUID);
> > > >
> > > > which means that when we call setattr_copy() later we will definitely
> > > > update inode->i_mode. Note that attr->ia_mode still contain S_ISGID.
> > > >
> > > > Now we call into the filesystem's ->setattr() inode operation which will end up
> > > > calling setattr_copy(). Since ATTR_MODE is set we will hit:
> > > >
> > > > if (ia_valid & ATTR_MODE) {
> > > >         umode_t mode = attr->ia_mode;
> > > >         vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
> > > >         if (!vfsgid_in_group_p(vfsgid) &&
> > > >             !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
> > > >                 mode &= ~S_ISGID;
> > > >         inode->i_mode = mode;
> > > > }
> > > >
> > >
> > > Can you think of a reason why the above should not be done
> > > in notify_change() before even calling to ->setattr()?
> > >
> > > Although, it wouldn't help because ovl_setattr() does:
> > >
> > >     if (attr->ia_valid & (ATTR_KILL_SUID|ATTR_KILL_SGID))
> > >         attr->ia_valid &= ~ATTR_MODE;
> > >
> > > > and since the caller in the test is neither capable nor in the group of the
> > > > inode the S_ISGID bit is stripped.
> > > >
> > > > But now contrast this with overlayfs even after your changes. When
> > > > ovl_setattr() is hit from ovl_fallocate()'s call to file_remove_privs()
> > > > and calls ovl_do_notify_change() then we are doing this under the
> > > > mounter's creds and so the S_ISGID bit is retained:
> > > >
> > > > sys_fallocate()
> > > > -> vfs_fallocate()
> > > >    -> ovl_fallocate()
> > > >       -> file_remove_privs()
> > > >          -> dentry_needs_remove_privs()
> > > >             -> should_remove_suid()
> > > >          -> __remove_privs()
> > > >             newattrs.ia_valid = attr_force | kill;
> > > >             -> notify_change()
> > > >                -> ovl_setattr()
> > > >                   // TAKE ON MOUNTER'S CREDS
> > > >                   -> ovl_do_notify_change()
> > > >                   // GIVE UP MOUNTER'S CREDS
> > > >      // TAKE ON MOUNTER'S CREDS
> > > >      -> vfs_fallocate()
> > > >         -> xfs_file_fallocate()
> > > >            -> file_modified()
> > > >               -> __file_remove_privs()
> > > >                  -> dentry_needs_remove_privs()
> > > >                     -> should_remove_suid()
> > > >                  -> __remove_privs()
> > > >                     newattrs.ia_valid = attr_force | kill;
> > > >                     -> notify_change()
> > >
> > > The model in overlayfs is that security is checked twice
> > > once on overlay inode with caller creds and once again
> > > on xfs inode with mounter creds. Either of these checks
> > > could result in clearing SUID/SGID bits.
> >
> > Yep.
> >
> > >
> > > In the call stack above, the outer should_remove_suid()
> > > with caller creds sets ATTR_KILL_SUID and then the outer
> > > notify_change() clears SUID and sets ATTR_MODE,
> >
> > Yes.
> >
> > > but ovl_setattr() clears ATTR_MODE and then the inner
> > > notify_change() re-clears SUID and sets ATTR_MODE again.
> >
> > Yes.
> >
> > >
> > > If the outer notify_change() would have checked the in_group_p()
> > > condition, clear SGID and set a flag ATTR_KILL_SGID_FORCE
> > > then the inner notify_change() would see this flag and re-clear
> > > SGID bit, just the same as it does with SUID bit in the stack stace
> > > above.
> > >
> > > Is this making any sense?
> >
> > What I kept thinking was sm along the lines of:
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index ba1de23c13c1..e62a564201b7 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -1968,8 +1968,12 @@ int should_remove_suid(struct dentry *dentry)
> >          * sgid without any exec bits is just a mandatory locking mark; leave
> >          * it alone.  If some exec bits are set, it's a real sgid; kill it.
> >          */
> > -       if (unlikely((mode & S_ISGID) && (mode & S_IXGRP)))
> > -               kill |= ATTR_KILL_SGID;
> > +       if (unlikely(mode & S_ISGID)) {
> > +               if ((mode & S_IXGRP) ||
> > +                   (!vfsgid_in_group_p(vfsgid) &&
> > +                    !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID)))
> > +                       kill |= ATTR_KILL_SGID;
> > +       }
> >
> >         if (unlikely(kill && !capable(CAP_FSETID) && S_ISREG(mode)))
> >                 return kill;
> >
> > mandatory locks have been removed as well so that remark seems pointless
> > as well?
>
> My feeling here is that both should_remove_suid() and notify_change()
> need to apply the same permission checks as setattr_prepare() and
> setattr_copy() instead of all this special casing them. I don't see a
> good reason to not require the same checks. So sm like (__completely
> untested__):

I like it :)
Few small nits. Some you may have already noticed after testing...

>
> From 922f9f123ab6531c29bf05585ef88c17fe65dba3 Mon Sep 17 00:00:00 2001
> From: Christian Brauner <brauner@kernel.org>
> Date: Tue, 4 Oct 2022 16:13:34 +0200
> Subject: [UNTESTED PATCH] attr: use consistent sgid stripping checks
>
> Require the same permissions as setattr_prepare() and setattr_copy() have
> instead of all these special cases. We can probably consolidate this even
> more...
>
> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> ---
>  fs/attr.c          |  2 +-
>  fs/fuse/file.c     |  2 +-
>  fs/inode.c         | 41 ++++++++++++++++++++++++-----------------
>  fs/internal.h      |  4 +++-
>  fs/ocfs2/file.c    |  4 ++--
>  fs/open.c          |  2 +-
>  include/linux/fs.h |  2 +-
>  7 files changed, 33 insertions(+), 24 deletions(-)
>
> diff --git a/fs/attr.c b/fs/attr.c
> index 1552a5f23d6b..9262c6b31c26 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -375,7 +375,7 @@ int notify_change(struct user_namespace *mnt_userns, struct dentry *dentry,
>                 }
>         }
>         if (ia_valid & ATTR_KILL_SGID) {
> -               if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> +               if (should_remove_sgid(mnt_userns, dentry)) {

This should just be:
+               if (mode & S_ISGID) {

Just like the case with ATTR_KILL_SUID,
notify_change() should follow ATTR_KILL_* hints issued by
dentry_needs_remove_privs() without re-checking the conditions.
This is what makes overlayfs SUID stripping work in current code.

>                         if (!(ia_valid & ATTR_MODE)) {
>                                 ia_valid = attr->ia_valid |= ATTR_MODE;
>                                 attr->ia_mode = inode->i_mode;
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 1a3afd469e3a..fccc2c7e88fd 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1313,7 +1313,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
>                         return err;
>
>                 if (fc->handle_killpriv_v2 &&
> -                   should_remove_suid(file_dentry(file))) {
> +                   should_remove_suid(&init_user_ns, file_dentry(file))) {
>                         goto writethrough;
>                 }
>
> diff --git a/fs/inode.c b/fs/inode.c
> index ba1de23c13c1..c639aefe01c3 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1949,27 +1949,33 @@ void touch_atime(const struct path *path)
>  }
>  EXPORT_SYMBOL(touch_atime);
>
> -/*
> - * The logic we want is
> - *
> - *     if suid or (sgid and xgrp)
> - *             remove privs
> - */
> -int should_remove_suid(struct dentry *dentry)
> +bool should_remove_sgid(struct user_namespace *mnt_userns, struct dentry *dentry)
> +{
> +       struct inode *inode = d_inode(dentry);
> +       umode_t mode = inode->i_mode;
> +
> +       if (unlikely(mode & S_ISGID)) {
> +               if ((mode & S_IXGRP) ||
> +                   (!vfsgid_in_group_p(i_gid_into_vfsgid(mnt_userns, inode)) &&
> +                    !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID)))
> +                       return true;
> +       }
> +
> +       return false;
> +}
> +
> +int should_remove_suid(struct user_namespace *mnt_userns, struct dentry *dentry)
>  {
> -       umode_t mode = d_inode(dentry)->i_mode;
> +       struct inode *inode = d_inode(dentry);
> +       umode_t mode = inode->i_mode;
>         int kill = 0;
>
>         /* suid always must be killed */
>         if (unlikely(mode & S_ISUID))
>                 kill = ATTR_KILL_SUID;
>
> -       /*
> -        * sgid without any exec bits is just a mandatory locking mark; leave
> -        * it alone.  If some exec bits are set, it's a real sgid; kill it.
> -        */
> -       if (unlikely((mode & S_ISGID) && (mode & S_IXGRP)))
> -               kill |= ATTR_KILL_SGID;
> +       if (should_remove_sgid(mnt_userns, dentry))
> +               kill = ATTR_KILL_SGID;

kill |= ATTR_KILL_SGID;

>
>         if (unlikely(kill && !capable(CAP_FSETID) && S_ISREG(mode)))
>                 return kill;
> @@ -1983,7 +1989,8 @@ EXPORT_SYMBOL(should_remove_suid);
>   * response to write or truncate. Return 0 if nothing has to be changed.
>   * Negative value on error (change should be denied).
>   */
> -int dentry_needs_remove_privs(struct dentry *dentry)
> +int dentry_needs_remove_privs(struct user_namespace *mnt_userns,
> +                             struct dentry *dentry)
>  {
>         struct inode *inode = d_inode(dentry);
>         int mask = 0;
> @@ -1992,7 +1999,7 @@ int dentry_needs_remove_privs(struct dentry *dentry)
>         if (IS_NOSEC(inode))
>                 return 0;
>
> -       mask = should_remove_suid(dentry);
> +       mask = should_remove_suid(mnt_userns, dentry);
>         ret = security_inode_need_killpriv(dentry);
>         if (ret < 0)
>                 return ret;
> @@ -2024,7 +2031,7 @@ static int __file_remove_privs(struct file *file, unsigned int flags)
>         if (IS_NOSEC(inode) || !S_ISREG(inode->i_mode))
>                 return 0;
>
> -       kill = dentry_needs_remove_privs(dentry);
> +       kill = dentry_needs_remove_privs(file_mnt_user_ns(file), dentry);
>         if (kill < 0)
>                 return kill;
>
> diff --git a/fs/internal.h b/fs/internal.h
> index 1e67b4b9a4d1..ae152ded227c 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -139,7 +139,8 @@ extern int vfs_open(const struct path *, struct file *);
>   * inode.c
>   */
>  extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc);
> -extern int dentry_needs_remove_privs(struct dentry *dentry);
> +extern int dentry_needs_remove_privs(struct user_namespace *,
> +                                    struct dentry *dentry);
>
>  /*
>   * fs-writeback.c
> @@ -226,3 +227,4 @@ int do_set_acl(struct user_namespace *mnt_userns, struct dentry *dentry,
>                const char *acl_name, const void *kvalue, size_t size);
>  ssize_t do_get_acl(struct user_namespace *mnt_userns, struct dentry *dentry,
>                    const char *acl_name, void *kvalue, size_t size);
> +int should_remove_sgid(struct user_namespace *mnt_userns, struct dentry *dentry);

Pls move this up to /* inode.c */ section

Miklos,

Do you want to ACK the ovl_fallocate() and ovl_copyfile() patches
so Chritain can pick them to his tree or do you prefer to take them
through your tree? They fix SUID stripping bugs regardless of
Christain's extra patch for fixing vfs SGID stripping.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] ovl: remove privs in ovl_copyfile()
  2022-10-03 12:30 ` [PATCH 1/2] ovl: remove privs in ovl_copyfile() Amir Goldstein
@ 2022-10-05 14:37   ` Miklos Szeredi
  0 siblings, 0 replies; 13+ messages in thread
From: Miklos Szeredi @ 2022-10-05 14:37 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, Yang Xu, Darrick J . Wong, Filipe Manana,
	linux-unionfs, linux-fsdevel

On Mon, 3 Oct 2022 at 14:30, Amir Goldstein <amir73il@gmail.com> wrote:
>
> Underlying fs doesn't remove privs because copy_range/remap_range are
> called with privileged mounter credentials.
>
> This fixes some failures in fstest generic/673.
>
> Fixes: 8ede205541ff ("ovl: add reflink/copyfile/dedup support")
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Acked-by: Miklos Szeredi <mszeredi@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] ovl: remove privs in ovl_fallocate()
  2022-10-03 12:30 ` [PATCH 2/2] ovl: remove privs in ovl_fallocate() Amir Goldstein
  2022-10-04 10:59   ` Christian Brauner
@ 2022-10-05 14:40   ` Miklos Szeredi
  2022-10-05 15:16     ` Christian Brauner
  1 sibling, 1 reply; 13+ messages in thread
From: Miklos Szeredi @ 2022-10-05 14:40 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, Yang Xu, Darrick J . Wong, Filipe Manana,
	linux-unionfs, linux-fsdevel

On Mon, 3 Oct 2022 at 14:30, Amir Goldstein <amir73il@gmail.com> wrote:
>
> Underlying fs doesn't remove privs because fallocate is called with
> privileged mounter credentials.
>
> This fixes some failure in fstests generic/683..687.
>
> Fixes: aab8848cee5e ("ovl: add ovl_fallocate()")
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Acked-by: Miklos Szeredi <mszeredi@redhat.com>

Christian, please feel free to take these if you already have a bunch
of related patches.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] ovl: remove privs in ovl_fallocate()
  2022-10-05 14:40   ` Miklos Szeredi
@ 2022-10-05 15:16     ` Christian Brauner
  0 siblings, 0 replies; 13+ messages in thread
From: Christian Brauner @ 2022-10-05 15:16 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Amir Goldstein, Yang Xu, Darrick J . Wong, Filipe Manana,
	linux-unionfs, linux-fsdevel

On Wed, Oct 05, 2022 at 04:40:10PM +0200, Miklos Szeredi wrote:
> On Mon, 3 Oct 2022 at 14:30, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > Underlying fs doesn't remove privs because fallocate is called with
> > privileged mounter credentials.
> >
> > This fixes some failure in fstests generic/683..687.
> >
> > Fixes: aab8848cee5e ("ovl: add ovl_fallocate()")
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> 
> Acked-by: Miklos Szeredi <mszeredi@redhat.com>
> 
> Christian, please feel free to take these if you already have a bunch
> of related patches.

Thank you, Miklos! Will do. I resend the series with the vfs related
change. If you have an opinion on it I'd appreciate it.A

Thanks!
Christian

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/2] ovl: remove privs in ovl_fallocate()
  2022-10-03 12:21 [PATCH 0/2] Some fixes for overlayfs remove privs Amir Goldstein
@ 2022-10-03 12:21 ` Amir Goldstein
  0 siblings, 0 replies; 13+ messages in thread
From: Amir Goldstein @ 2022-10-03 12:21 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Christian Brauner, Yang Xu, Darrick J . Wong, Filipe Manana,
	linux-unionfs, linux-fsdevel

Underlying fs doesn't remove privs because fallocate is called with
privileged mounter credentials.

This fixes some failure in fstests generic/683..687.

Fixes: aab8848cee5e ("ovl: add ovl_fallocate()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index c8308da8909a..e90ac5376456 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -517,9 +517,16 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 	const struct cred *old_cred;
 	int ret;
 
+	inode_lock(inode);
+	/* Update mode */
+	ovl_copyattr(inode);
+	ret = file_remove_privs(file);
+	if (ret)
+		goto out_unlock;
+
 	ret = ovl_real_fdget(file, &real);
 	if (ret)
-		return ret;
+		goto out_unlock;
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	ret = vfs_fallocate(real.file, mode, offset, len);
@@ -530,6 +537,9 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 
 	fdput(real);
 
+out_unlock:
+	inode_unlock(inode);
+
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-10-05 15:16 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-03 12:30 [PATCH 0/2] Some fixes for overlayfs remove privs Amir Goldstein
2022-10-03 12:30 ` [PATCH 1/2] ovl: remove privs in ovl_copyfile() Amir Goldstein
2022-10-05 14:37   ` Miklos Szeredi
2022-10-03 12:30 ` [PATCH 2/2] ovl: remove privs in ovl_fallocate() Amir Goldstein
2022-10-04 10:59   ` Christian Brauner
2022-10-04 13:20     ` Christian Brauner
2022-10-04 14:01     ` Amir Goldstein
2022-10-04 14:12       ` Christian Brauner
2022-10-04 15:52         ` Christian Brauner
2022-10-05 13:56           ` Amir Goldstein
2022-10-05 14:40   ` Miklos Szeredi
2022-10-05 15:16     ` Christian Brauner
  -- strict thread matches above, loose matches on Subject: below --
2022-10-03 12:21 [PATCH 0/2] Some fixes for overlayfs remove privs Amir Goldstein
2022-10-03 12:21 ` [PATCH 2/2] ovl: remove privs in ovl_fallocate() Amir Goldstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).