All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/3] vfs: Add inode_sgid_strip() api
@ 2022-04-14  7:57 Yang Xu
  2022-04-14  7:57 ` [PATCH v2 2/3] vfs: strip file's S_ISGID mode on vfs instead of on underlying filesystem Yang Xu
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Yang Xu @ 2022-04-14  7:57 UTC (permalink / raw)
  To: linux-fsdevel, ceph-devel, ocfs2-devel
  Cc: viro, david, brauner, djwong, jlayton, Yang Xu

inode_sgid_strip() function is used to strip S_ISGID mode
when creat/open/mknod file.

Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
---
 fs/inode.c         | 18 ++++++++++++++++++
 include/linux/fs.h |  3 ++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/inode.c b/fs/inode.c
index 9d9b422504d1..d63264998855 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2405,3 +2405,21 @@ struct timespec64 current_time(struct inode *inode)
 	return timestamp_truncate(now, inode);
 }
 EXPORT_SYMBOL(current_time);
+
+void inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir,
+		      umode_t *mode)
+{
+	if (!dir || !(dir->i_mode & S_ISGID))
+		return;
+	if ((*mode & (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
+		return;
+	if (S_ISDIR(*mode))
+		return;
+	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
+		return;
+	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
+		return;
+
+	*mode &= ~S_ISGID;
+}
+EXPORT_SYMBOL(inode_sgid_strip);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bbde95387a23..94d94219fe7c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1897,7 +1897,8 @@ extern long compat_ptr_ioctl(struct file *file, unsigned int cmd,
 void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
 		      const struct inode *dir, umode_t mode);
 extern bool may_open_dev(const struct path *path);
-
+void inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir,
+		      umode_t *mode);
 /*
  * This is the "filldir" function type, used by readdir() to let
  * the kernel specify what kind of dirent layout it wants to have.
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/3] vfs: strip file's S_ISGID mode on vfs instead of on underlying filesystem
  2022-04-14  7:57 [PATCH v2 1/3] vfs: Add inode_sgid_strip() api Yang Xu
@ 2022-04-14  7:57 ` Yang Xu
  2022-04-14 12:45   ` Christian Brauner
  2022-04-14  7:57 ` [PATCH v2 3/3] ceph: Remove S_ISGID clear code in ceph_finish_async_create Yang Xu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Yang Xu @ 2022-04-14  7:57 UTC (permalink / raw)
  To: linux-fsdevel, ceph-devel, ocfs2-devel
  Cc: viro, david, brauner, djwong, jlayton, Yang Xu

Currently, vfs only passes mode argument to filesystem, then use inode_init_owner()
to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call inode_init_owner
firstly, then posxi acl setup, but xfs uses the contrary order. It will affect
S_ISGID clear especially we filter S_IXGRP by umask or acl.

Regardless of which filesystem is in use, failure to strip the SGID correctly is
considered a security failure that needs to be fixed. The current VFS infrastructure
requires the filesystem to do everything right and not step on any landmines to
strip the SGID bit, when in fact it can easily be done at the VFS and the filesystems
then don't even need to be aware that the SGID needs to be (or has been stripped) by
the operation the user asked to be done.

Vfs has all the info it needs - it doesn't need the filesystems to do everything
correctly with the mode and ensuring that they order things like posix acl setup
functions correctly with inode_init_owner() to strip the SGID bit.

Just strip the SGID bit at the VFS, and then the filesystems can't get it wrong.

Also, the inode_sgid_strip() api should be used before IS_POSIXACL() because
this api may change mode.

Only the following places use inode_init_owner
"hugetlbfs/inode.c:846:          inode_init_owner(&init_user_ns, inode, dir, mode);
 nilfs2/inode.c:354:     inode_init_owner(&init_user_ns, inode, dir, mode);
 zonefs/super.c:1289:    inode_init_owner(&init_user_ns, inode, parent, S_IFDIR | 0555);
 reiserfs/namei.c:619:   inode_init_owner(&init_user_ns, inode, dir, mode);
 jfs/jfs_inode.c:67:     inode_init_owner(&init_user_ns, inode, parent, mode);
 f2fs/namei.c:50:        inode_init_owner(mnt_userns, inode, dir, mode);
 ext2/ialloc.c:549:              inode_init_owner(&init_user_ns, inode, dir, mode);
 overlayfs/dir.c:643:    inode_init_owner(&init_user_ns, inode, dentry->d_parent->d_inode, mode);
 ufs/ialloc.c:292:       inode_init_owner(&init_user_ns, inode, dir, mode);
 ntfs3/inode.c:1283:     inode_init_owner(mnt_userns, inode, dir, mode);
 ramfs/inode.c:64:               inode_init_owner(&init_user_ns, inode, dir, mode);
 9p/vfs_inode.c:263:     inode_init_owner(&init_user_ns, inode, NULL, mode);
 btrfs/tests/btrfs-tests.c:65:   inode_init_owner(&init_user_ns, inode, NULL, S_IFREG);
 btrfs/inode.c:6215:     inode_init_owner(mnt_userns, inode, dir, mode);
 sysv/ialloc.c:166:      inode_init_owner(&init_user_ns, inode, dir, mode);
 omfs/inode.c:51:        inode_init_owner(&init_user_ns, inode, NULL, mode);
 ubifs/dir.c:97: inode_init_owner(&init_user_ns, inode, dir, mode);
 udf/ialloc.c:108:       inode_init_owner(&init_user_ns, inode, dir, mode);
 ext4/ialloc.c:979:              inode_init_owner(mnt_userns, inode, dir, mode);
 hfsplus/inode.c:393:    inode_init_owner(&init_user_ns, inode, dir, mode);
 xfs/xfs_inode.c:840:            inode_init_owner(mnt_userns, inode, dir, mode);
 ocfs2/dlmfs/dlmfs.c:331:                inode_init_owner(&init_user_ns, inode, NULL, mode);
 ocfs2/dlmfs/dlmfs.c:354:        inode_init_owner(&init_user_ns, inode, parent, mode);
 ocfs2/namei.c:200:      inode_init_owner(&init_user_ns, inode, dir, mode);
 minix/bitmap.c:255:     inode_init_owner(&init_user_ns, inode, dir, mode);
 bfs/dir.c:99:   inode_init_owner(&init_user_ns, inode, dir, mode);
"

They are used in filesystem init new inode function and these init inode functions are used
by following operations:
mkdir
symlink
mknod
create
tmpfile
rename

We don't care about mkdir because we don't strip SGID bit for directory except fs.xfs.irix_sgid_inherit.
symlink and rename only use valid mode that doesn't have SGID bit.

We have added inode_sgid_strip api for the remaining operations.

In addition to the above six operations, two filesystems has a little difference
1) btrfs has btrfs_create_subvol_root to create new inode but used non SGID bit mode and can ignore
2) ocfs2 reflink function should add inode_sgid_strip api manually because we don't add it in vfs

Last but not least, this patch also changed grpid behaviour for ext4/xfs because the mode passed to
them may been changed by inode_sgid_strip.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
---
 fs/inode.c       | 4 ----
 fs/namei.c       | 5 ++++-
 fs/ocfs2/namei.c | 1 +
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index d63264998855..b08bdd73e116 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2246,10 +2246,6 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
 		/* Directories are special, and always inherit S_ISGID */
 		if (S_ISDIR(mode))
 			mode |= S_ISGID;
-		else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
-			 !in_group_p(i_gid_into_mnt(mnt_userns, dir)) &&
-			 !capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
-			mode &= ~S_ISGID;
 	} else
 		inode_fsgid_set(inode, mnt_userns);
 	inode->i_mode = mode;
diff --git a/fs/namei.c b/fs/namei.c
index 3f1829b3ab5b..e03f7defdd30 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3287,6 +3287,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	if (open_flag & O_CREAT) {
 		if (open_flag & O_EXCL)
 			open_flag &= ~O_TRUNC;
+		inode_sgid_strip(mnt_userns, dir->d_inode, &mode);
 		if (!IS_POSIXACL(dir->d_inode))
 			mode &= ~current_umask();
 		if (likely(got_write))
@@ -3521,6 +3522,7 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns,
 	child = d_alloc(dentry, &slash_name);
 	if (unlikely(!child))
 		goto out_err;
+	inode_sgid_strip(mnt_userns, dir, &mode);
 	error = dir->i_op->tmpfile(mnt_userns, dir, child, mode);
 	if (error)
 		goto out_err;
@@ -3850,13 +3852,14 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
 	if (IS_ERR(dentry))
 		goto out1;
 
+	mnt_userns = mnt_user_ns(path.mnt);
+	inode_sgid_strip(mnt_userns, path.dentry->d_inode, &mode);
 	if (!IS_POSIXACL(path.dentry->d_inode))
 		mode &= ~current_umask();
 	error = security_path_mknod(&path, dentry, mode, dev);
 	if (error)
 		goto out2;
 
-	mnt_userns = mnt_user_ns(path.mnt);
 	switch (mode & S_IFMT) {
 		case 0: case S_IFREG:
 			error = vfs_create(mnt_userns, path.dentry->d_inode,
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index c75fd54b9185..f1d626697302 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -197,6 +197,7 @@ static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode)
 	 * callers. */
 	if (S_ISDIR(mode))
 		set_nlink(inode, 2);
+	inode_sgid_strip(&init_user_ns, dir, &mode);
 	inode_init_owner(&init_user_ns, inode, dir, mode);
 	status = dquot_initialize(inode);
 	if (status)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 3/3] ceph: Remove S_ISGID clear code in ceph_finish_async_create
  2022-04-14  7:57 [PATCH v2 1/3] vfs: Add inode_sgid_strip() api Yang Xu
  2022-04-14  7:57 ` [PATCH v2 2/3] vfs: strip file's S_ISGID mode on vfs instead of on underlying filesystem Yang Xu
@ 2022-04-14  7:57 ` Yang Xu
  2022-04-14 12:02 ` [PATCH v2 1/3] vfs: Add inode_sgid_strip() api Christian Brauner
  2022-04-14 15:57   ` Darrick J. Wong
  3 siblings, 0 replies; 16+ messages in thread
From: Yang Xu @ 2022-04-14  7:57 UTC (permalink / raw)
  To: linux-fsdevel, ceph-devel, ocfs2-devel
  Cc: viro, david, brauner, djwong, jlayton, Yang Xu

Since vfs has stripped S_ISGID, we don't need this code any more.

Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
---
 fs/ceph/file.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 6c9e837aa1d3..8e3b99853333 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -651,10 +651,6 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
 		/* Directories always inherit the setgid bit. */
 		if (S_ISDIR(mode))
 			mode |= S_ISGID;
-		else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
-			 !in_group_p(dir->i_gid) &&
-			 !capable_wrt_inode_uidgid(&init_user_ns, dir, CAP_FSETID))
-			mode &= ~S_ISGID;
 	} else {
 		in.gid = cpu_to_le32(from_kgid(&init_user_ns, current_fsgid()));
 	}
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/3] vfs: Add inode_sgid_strip() api
  2022-04-14  7:57 [PATCH v2 1/3] vfs: Add inode_sgid_strip() api Yang Xu
  2022-04-14  7:57 ` [PATCH v2 2/3] vfs: strip file's S_ISGID mode on vfs instead of on underlying filesystem Yang Xu
  2022-04-14  7:57 ` [PATCH v2 3/3] ceph: Remove S_ISGID clear code in ceph_finish_async_create Yang Xu
@ 2022-04-14 12:02 ` Christian Brauner
  2022-04-15  1:39   ` xuyang2018.jy
  2022-04-14 15:57   ` Darrick J. Wong
  3 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2022-04-14 12:02 UTC (permalink / raw)
  To: Yang Xu
  Cc: linux-fsdevel, ceph-devel, ocfs2-devel, viro, david, djwong, jlayton

On Thu, Apr 14, 2022 at 03:57:17PM +0800, Yang Xu wrote:
> inode_sgid_strip() function is used to strip S_ISGID mode
> when creat/open/mknod file.
> 
> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
> ---
>  fs/inode.c         | 18 ++++++++++++++++++
>  include/linux/fs.h |  3 ++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 9d9b422504d1..d63264998855 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2405,3 +2405,21 @@ struct timespec64 current_time(struct inode *inode)
>  	return timestamp_truncate(now, inode);
>  }
>  EXPORT_SYMBOL(current_time);
> +
> +void inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir,
> +		      umode_t *mode)
> +{
> +	if (!dir || !(dir->i_mode & S_ISGID))
> +		return;
> +	if ((*mode & (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
> +		return;
> +	if (S_ISDIR(*mode))
> +		return;
> +	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
> +		return;
> +	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
> +		return;
> +
> +	*mode &= ~S_ISGID;
> +}
> +EXPORT_SYMBOL(inode_sgid_strip);


I still think this should return umode_t with the setgid bit stripped
instead of modifying the mode directly. I may have misunderstood Dave,
but I thought he preferred to return umode_t too?

umode_t inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir, umode_t mode)
{
	if (S_ISDIR(mode))
		return mode;

	if (!dir || !(dir->i_mode & S_ISGID))
		return;

	if ((mode & (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
		return;

	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
		return;

	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
		return;

	return mode & ~S_ISGID;
}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] vfs: strip file's S_ISGID mode on vfs instead of on underlying filesystem
  2022-04-14  7:57 ` [PATCH v2 2/3] vfs: strip file's S_ISGID mode on vfs instead of on underlying filesystem Yang Xu
@ 2022-04-14 12:45   ` Christian Brauner
  2022-04-15  3:14     ` xuyang2018.jy
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2022-04-14 12:45 UTC (permalink / raw)
  To: Yang Xu
  Cc: linux-fsdevel, ceph-devel, ocfs2-devel, viro, david, djwong, jlayton

On Thu, Apr 14, 2022 at 03:57:18PM +0800, Yang Xu wrote:
> Currently, vfs only passes mode argument to filesystem, then use inode_init_owner()
> to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call inode_init_owner
> firstly, then posxi acl setup, but xfs uses the contrary order. It will affect
> S_ISGID clear especially we filter S_IXGRP by umask or acl.
> 
> Regardless of which filesystem is in use, failure to strip the SGID correctly is
> considered a security failure that needs to be fixed. The current VFS infrastructure
> requires the filesystem to do everything right and not step on any landmines to
> strip the SGID bit, when in fact it can easily be done at the VFS and the filesystems
> then don't even need to be aware that the SGID needs to be (or has been stripped) by
> the operation the user asked to be done.
> 
> Vfs has all the info it needs - it doesn't need the filesystems to do everything
> correctly with the mode and ensuring that they order things like posix acl setup
> functions correctly with inode_init_owner() to strip the SGID bit.
> 
> Just strip the SGID bit at the VFS, and then the filesystems can't get it wrong.
> 
> Also, the inode_sgid_strip() api should be used before IS_POSIXACL() because
> this api may change mode.
> 
> Only the following places use inode_init_owner
> "hugetlbfs/inode.c:846:          inode_init_owner(&init_user_ns, inode, dir, mode);
>  nilfs2/inode.c:354:     inode_init_owner(&init_user_ns, inode, dir, mode);
>  zonefs/super.c:1289:    inode_init_owner(&init_user_ns, inode, parent, S_IFDIR | 0555);
>  reiserfs/namei.c:619:   inode_init_owner(&init_user_ns, inode, dir, mode);
>  jfs/jfs_inode.c:67:     inode_init_owner(&init_user_ns, inode, parent, mode);
>  f2fs/namei.c:50:        inode_init_owner(mnt_userns, inode, dir, mode);
>  ext2/ialloc.c:549:              inode_init_owner(&init_user_ns, inode, dir, mode);
>  overlayfs/dir.c:643:    inode_init_owner(&init_user_ns, inode, dentry->d_parent->d_inode, mode);
>  ufs/ialloc.c:292:       inode_init_owner(&init_user_ns, inode, dir, mode);
>  ntfs3/inode.c:1283:     inode_init_owner(mnt_userns, inode, dir, mode);
>  ramfs/inode.c:64:               inode_init_owner(&init_user_ns, inode, dir, mode);
>  9p/vfs_inode.c:263:     inode_init_owner(&init_user_ns, inode, NULL, mode);
>  btrfs/tests/btrfs-tests.c:65:   inode_init_owner(&init_user_ns, inode, NULL, S_IFREG);
>  btrfs/inode.c:6215:     inode_init_owner(mnt_userns, inode, dir, mode);
>  sysv/ialloc.c:166:      inode_init_owner(&init_user_ns, inode, dir, mode);
>  omfs/inode.c:51:        inode_init_owner(&init_user_ns, inode, NULL, mode);
>  ubifs/dir.c:97: inode_init_owner(&init_user_ns, inode, dir, mode);
>  udf/ialloc.c:108:       inode_init_owner(&init_user_ns, inode, dir, mode);
>  ext4/ialloc.c:979:              inode_init_owner(mnt_userns, inode, dir, mode);
>  hfsplus/inode.c:393:    inode_init_owner(&init_user_ns, inode, dir, mode);
>  xfs/xfs_inode.c:840:            inode_init_owner(mnt_userns, inode, dir, mode);
>  ocfs2/dlmfs/dlmfs.c:331:                inode_init_owner(&init_user_ns, inode, NULL, mode);
>  ocfs2/dlmfs/dlmfs.c:354:        inode_init_owner(&init_user_ns, inode, parent, mode);
>  ocfs2/namei.c:200:      inode_init_owner(&init_user_ns, inode, dir, mode);
>  minix/bitmap.c:255:     inode_init_owner(&init_user_ns, inode, dir, mode);
>  bfs/dir.c:99:   inode_init_owner(&init_user_ns, inode, dir, mode);
> "

For completeness sake, there's also spufs which doesn't really go
through the regular VFS callpath because it has separate system calls
like:

SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
	umode_t, mode, int, neighbor_fd)

but looking through the code it only allows the creation of directories and only
allows bits in 0777.

> 
> They are used in filesystem init new inode function and these init inode functions are used
> by following operations:
> mkdir
> symlink
> mknod
> create
> tmpfile
> rename
> 
> We don't care about mkdir because we don't strip SGID bit for directory except fs.xfs.irix_sgid_inherit.
> symlink and rename only use valid mode that doesn't have SGID bit.
> 
> We have added inode_sgid_strip api for the remaining operations.
> 
> In addition to the above six operations, two filesystems has a little difference
> 1) btrfs has btrfs_create_subvol_root to create new inode but used non SGID bit mode and can ignore
> 2) ocfs2 reflink function should add inode_sgid_strip api manually because we don't add it in vfs
> 
> Last but not least, this patch also changed grpid behaviour for ext4/xfs because the mode passed to
> them may been changed by inode_sgid_strip.

I think the patch itself is useful as it would move a security sensitive
operation that is currently burried in individual filesystems into the
vfs layer. But it has a decent regression potential since it might trip
filesystems that have so far relied on getting the S_ISGID bit with a
mode argument. The example being network filesystems that Jeff brought
up earlier. So this needs a lot of testing and long exposure in -next
for at least one full kernel cycle imho.

> 
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
> ---
>  fs/inode.c       | 4 ----
>  fs/namei.c       | 5 ++++-
>  fs/ocfs2/namei.c | 1 +
>  3 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index d63264998855..b08bdd73e116 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2246,10 +2246,6 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
>  		/* Directories are special, and always inherit S_ISGID */
>  		if (S_ISDIR(mode))
>  			mode |= S_ISGID;
> -		else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
> -			 !in_group_p(i_gid_into_mnt(mnt_userns, dir)) &&
> -			 !capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
> -			mode &= ~S_ISGID;
>  	} else
>  		inode_fsgid_set(inode, mnt_userns);
>  	inode->i_mode = mode;
> diff --git a/fs/namei.c b/fs/namei.c
> index 3f1829b3ab5b..e03f7defdd30 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3287,6 +3287,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
>  	if (open_flag & O_CREAT) {
>  		if (open_flag & O_EXCL)
>  			open_flag &= ~O_TRUNC;
> +		inode_sgid_strip(mnt_userns, dir->d_inode, &mode);
>  		if (!IS_POSIXACL(dir->d_inode))
>  			mode &= ~current_umask();
>  		if (likely(got_write))
> @@ -3521,6 +3522,7 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns,
>  	child = d_alloc(dentry, &slash_name);
>  	if (unlikely(!child))
>  		goto out_err;
> +	inode_sgid_strip(mnt_userns, dir, &mode);

Hm, an additional question: how is umask stripping currently handled in
vfs_tmpfile()? I don't see it anywhere. That seems like a bug?

>  	error = dir->i_op->tmpfile(mnt_userns, dir, child, mode);
>  	if (error)
>  		goto out_err;
> @@ -3850,13 +3852,14 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
>  	if (IS_ERR(dentry))
>  		goto out1;
>  
> +	mnt_userns = mnt_user_ns(path.mnt);
> +	inode_sgid_strip(mnt_userns, path.dentry->d_inode, &mode);
>  	if (!IS_POSIXACL(path.dentry->d_inode))
>  		mode &= ~current_umask();

It would be worth to add another helper prepare_mode() that calls
inode_sgid_strip() and does the umask stripping as well and then call it
in all these places. You should even call it in do_mkdirat() since
inode_sgid_strip() will skip directories anyway. This will enforce the
same ordering for all relevant operations and it will make the code more
uniform and easier to understand.

>  	error = security_path_mknod(&path, dentry, mode, dev);
>  	if (error)
>  		goto out2;
>  
> -	mnt_userns = mnt_user_ns(path.mnt);
>  	switch (mode & S_IFMT) {
>  		case 0: case S_IFREG:
>  			error = vfs_create(mnt_userns, path.dentry->d_inode,
> diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
> index c75fd54b9185..f1d626697302 100644
> --- a/fs/ocfs2/namei.c
> +++ b/fs/ocfs2/namei.c
> @@ -197,6 +197,7 @@ static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode)
>  	 * callers. */
>  	if (S_ISDIR(mode))
>  		set_nlink(inode, 2);
> +	inode_sgid_strip(&init_user_ns, dir, &mode);
>  	inode_init_owner(&init_user_ns, inode, dir, mode);
>  	status = dquot_initialize(inode);
>  	if (status)
> -- 
> 2.27.0
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Ocfs2-devel] [PATCH v2 1/3] vfs: Add inode_sgid_strip() api
  2022-04-14  7:57 [PATCH v2 1/3] vfs: Add inode_sgid_strip() api Yang Xu
@ 2022-04-14 15:57   ` Darrick J. Wong
  2022-04-14  7:57 ` [PATCH v2 3/3] ceph: Remove S_ISGID clear code in ceph_finish_async_create Yang Xu
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Darrick J. Wong via Ocfs2-devel @ 2022-04-14 15:57 UTC (permalink / raw)
  To: Yang Xu
  Cc: brauner, jlayton, david, viro, linux-fsdevel, ceph-devel, ocfs2-devel

On Thu, Apr 14, 2022 at 03:57:17PM +0800, Yang Xu wrote:
> inode_sgid_strip() function is used to strip S_ISGID mode
> when creat/open/mknod file.
> 
> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
> ---
>  fs/inode.c         | 18 ++++++++++++++++++
>  include/linux/fs.h |  3 ++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 9d9b422504d1..d63264998855 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2405,3 +2405,21 @@ struct timespec64 current_time(struct inode *inode)
>  	return timestamp_truncate(now, inode);
>  }
>  EXPORT_SYMBOL(current_time);
> +
> +void inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir,
> +		      umode_t *mode)
> +{
> +	if (!dir || !(dir->i_mode & S_ISGID))
> +		return;
> +	if ((*mode & (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
> +		return;
> +	if (S_ISDIR(*mode))
> +		return;
> +	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
> +		return;
> +	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
> +		return;
> +
> +	*mode &= ~S_ISGID;
> +}

Thanks for cleaning up the multiple if statements from last time.

I still would like to see patch 1 replace the code in inode_init_owner
so that we can compare before and after in the same patch.  Patch 2 can
then be solely about moving the callsite around the VFS.

--D

> +EXPORT_SYMBOL(inode_sgid_strip);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index bbde95387a23..94d94219fe7c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1897,7 +1897,8 @@ extern long compat_ptr_ioctl(struct file *file, unsigned int cmd,
>  void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
>  		      const struct inode *dir, umode_t mode);
>  extern bool may_open_dev(const struct path *path);
> -
> +void inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir,
> +		      umode_t *mode);
>  /*
>   * This is the "filldir" function type, used by readdir() to let
>   * the kernel specify what kind of dirent layout it wants to have.
> -- 
> 2.27.0
> 

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/3] vfs: Add inode_sgid_strip() api
@ 2022-04-14 15:57   ` Darrick J. Wong
  0 siblings, 0 replies; 16+ messages in thread
From: Darrick J. Wong @ 2022-04-14 15:57 UTC (permalink / raw)
  To: Yang Xu
  Cc: linux-fsdevel, ceph-devel, ocfs2-devel, viro, david, brauner, jlayton

On Thu, Apr 14, 2022 at 03:57:17PM +0800, Yang Xu wrote:
> inode_sgid_strip() function is used to strip S_ISGID mode
> when creat/open/mknod file.
> 
> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
> ---
>  fs/inode.c         | 18 ++++++++++++++++++
>  include/linux/fs.h |  3 ++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 9d9b422504d1..d63264998855 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2405,3 +2405,21 @@ struct timespec64 current_time(struct inode *inode)
>  	return timestamp_truncate(now, inode);
>  }
>  EXPORT_SYMBOL(current_time);
> +
> +void inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir,
> +		      umode_t *mode)
> +{
> +	if (!dir || !(dir->i_mode & S_ISGID))
> +		return;
> +	if ((*mode & (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
> +		return;
> +	if (S_ISDIR(*mode))
> +		return;
> +	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
> +		return;
> +	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
> +		return;
> +
> +	*mode &= ~S_ISGID;
> +}

Thanks for cleaning up the multiple if statements from last time.

I still would like to see patch 1 replace the code in inode_init_owner
so that we can compare before and after in the same patch.  Patch 2 can
then be solely about moving the callsite around the VFS.

--D

> +EXPORT_SYMBOL(inode_sgid_strip);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index bbde95387a23..94d94219fe7c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1897,7 +1897,8 @@ extern long compat_ptr_ioctl(struct file *file, unsigned int cmd,
>  void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
>  		      const struct inode *dir, umode_t mode);
>  extern bool may_open_dev(const struct path *path);
> -
> +void inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir,
> +		      umode_t *mode);
>  /*
>   * This is the "filldir" function type, used by readdir() to let
>   * the kernel specify what kind of dirent layout it wants to have.
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/3] vfs: Add inode_sgid_strip() api
  2022-04-14 15:57   ` Darrick J. Wong
  (?)
@ 2022-04-15  1:18   ` xuyang2018.jy
  2022-04-15  1:40       ` [Ocfs2-devel] " Darrick J. Wong via Ocfs2-devel
  -1 siblings, 1 reply; 16+ messages in thread
From: xuyang2018.jy @ 2022-04-15  1:18 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-fsdevel, ceph-devel, ocfs2-devel, viro, david, brauner, jlayton

on 2022/4/14 23:57, Darrick J. Wong wrote:
> On Thu, Apr 14, 2022 at 03:57:17PM +0800, Yang Xu wrote:
>> inode_sgid_strip() function is used to strip S_ISGID mode
>> when creat/open/mknod file.
>>
>> Reviewed-by: Christian Brauner (Microsoft)<brauner@kernel.org>
>> Signed-off-by: Yang Xu<xuyang2018.jy@fujitsu.com>
>> ---
>>   fs/inode.c         | 18 ++++++++++++++++++
>>   include/linux/fs.h |  3 ++-
>>   2 files changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/inode.c b/fs/inode.c
>> index 9d9b422504d1..d63264998855 100644
>> --- a/fs/inode.c
>> +++ b/fs/inode.c
>> @@ -2405,3 +2405,21 @@ struct timespec64 current_time(struct inode *inode)
>>   	return timestamp_truncate(now, inode);
>>   }
>>   EXPORT_SYMBOL(current_time);
>> +
>> +void inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir,
>> +		      umode_t *mode)
>> +{
>> +	if (!dir || !(dir->i_mode&  S_ISGID))
>> +		return;
>> +	if ((*mode&  (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
>> +		return;
>> +	if (S_ISDIR(*mode))
>> +		return;
>> +	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
>> +		return;
>> +	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
>> +		return;
>> +
>> +	*mode&= ~S_ISGID;
>> +}
>
> Thanks for cleaning up the multiple if statements from last time.
>
> I still would like to see patch 1 replace the code in inode_init_owner
> so that we can compare before and after in the same patch.  Patch 2 can
> then be solely about moving the callsite around the VFS.
>
Ok, then patch 1 can named as"fs/inode: move inode sgid strip operation 
from inode_init_owner into inode_sgid_strip".  What do you think about it?


Best Regards
Yang Xu
> --D
>
>> +EXPORT_SYMBOL(inode_sgid_strip);
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index bbde95387a23..94d94219fe7c 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -1897,7 +1897,8 @@ extern long compat_ptr_ioctl(struct file *file, unsigned int cmd,
>>   void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
>>   		      const struct inode *dir, umode_t mode);
>>   extern bool may_open_dev(const struct path *path);
>> -
>> +void inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir,
>> +		      umode_t *mode);
>>   /*
>>    * This is the "filldir" function type, used by readdir() to let
>>    * the kernel specify what kind of dirent layout it wants to have.
>> --
>> 2.27.0
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/3] vfs: Add inode_sgid_strip() api
  2022-04-14 12:02 ` [PATCH v2 1/3] vfs: Add inode_sgid_strip() api Christian Brauner
@ 2022-04-15  1:39   ` xuyang2018.jy
  0 siblings, 0 replies; 16+ messages in thread
From: xuyang2018.jy @ 2022-04-15  1:39 UTC (permalink / raw)
  To: Christian Brauner, david
  Cc: linux-fsdevel, ceph-devel, ocfs2-devel, viro, djwong, jlayton

on 2022/4/14 20:02, Christian Brauner wrote:
> On Thu, Apr 14, 2022 at 03:57:17PM +0800, Yang Xu wrote:
>> inode_sgid_strip() function is used to strip S_ISGID mode
>> when creat/open/mknod file.
>>
>> Reviewed-by: Christian Brauner (Microsoft)<brauner@kernel.org>
>> Signed-off-by: Yang Xu<xuyang2018.jy@fujitsu.com>
>> ---
>>   fs/inode.c         | 18 ++++++++++++++++++
>>   include/linux/fs.h |  3 ++-
>>   2 files changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/inode.c b/fs/inode.c
>> index 9d9b422504d1..d63264998855 100644
>> --- a/fs/inode.c
>> +++ b/fs/inode.c
>> @@ -2405,3 +2405,21 @@ struct timespec64 current_time(struct inode *inode)
>>   	return timestamp_truncate(now, inode);
>>   }
>>   EXPORT_SYMBOL(current_time);
>> +
>> +void inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir,
>> +		      umode_t *mode)
>> +{
>> +	if (!dir || !(dir->i_mode&  S_ISGID))
>> +		return;
>> +	if ((*mode&  (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
>> +		return;
>> +	if (S_ISDIR(*mode))
>> +		return;
>> +	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
>> +		return;
>> +	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
>> +		return;
>> +
>> +	*mode&= ~S_ISGID;
>> +}
>> +EXPORT_SYMBOL(inode_sgid_strip);
>
>
> I still think this should return umode_t with the setgid bit stripped
> instead of modifying the mode directly. I may have misunderstood Dave,
> but I thought he preferred to return umode_t too?
Dave's comment as below:
"
Agreed, that's a much nicer API for this function - it makes it
clear that it can modifying the mode that is passed in.
"

So I think Dave should like modify mode directly instead of returning a 
umode_t value.

@Dave  So which way do you mean?

Best Regards
Yang Xu
>
> umode_t inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir, umode_t mode)
> {
> 	if (S_ISDIR(mode))
> 		return mode;
>
> 	if (!dir || !(dir->i_mode&  S_ISGID))
> 		return;
>
> 	if ((mode&  (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
> 		return;
>
> 	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
> 		return;
>
> 	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
> 		return;
>
> 	return mode&  ~S_ISGID;
> }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/3] vfs: Add inode_sgid_strip() api
  2022-04-15  1:18   ` xuyang2018.jy
@ 2022-04-15  1:40       ` Darrick J. Wong via Ocfs2-devel
  0 siblings, 0 replies; 16+ messages in thread
From: Darrick J. Wong @ 2022-04-15  1:40 UTC (permalink / raw)
  To: xuyang2018.jy
  Cc: linux-fsdevel, ceph-devel, ocfs2-devel, viro, david, brauner, jlayton

On Fri, Apr 15, 2022 at 01:18:57AM +0000, xuyang2018.jy@fujitsu.com wrote:
> on 2022/4/14 23:57, Darrick J. Wong wrote:
> > On Thu, Apr 14, 2022 at 03:57:17PM +0800, Yang Xu wrote:
> >> inode_sgid_strip() function is used to strip S_ISGID mode
> >> when creat/open/mknod file.
> >>
> >> Reviewed-by: Christian Brauner (Microsoft)<brauner@kernel.org>
> >> Signed-off-by: Yang Xu<xuyang2018.jy@fujitsu.com>
> >> ---
> >>   fs/inode.c         | 18 ++++++++++++++++++
> >>   include/linux/fs.h |  3 ++-
> >>   2 files changed, 20 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/inode.c b/fs/inode.c
> >> index 9d9b422504d1..d63264998855 100644
> >> --- a/fs/inode.c
> >> +++ b/fs/inode.c
> >> @@ -2405,3 +2405,21 @@ struct timespec64 current_time(struct inode *inode)
> >>   	return timestamp_truncate(now, inode);
> >>   }
> >>   EXPORT_SYMBOL(current_time);
> >> +
> >> +void inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir,
> >> +		      umode_t *mode)
> >> +{
> >> +	if (!dir || !(dir->i_mode&  S_ISGID))
> >> +		return;
> >> +	if ((*mode&  (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
> >> +		return;
> >> +	if (S_ISDIR(*mode))
> >> +		return;
> >> +	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
> >> +		return;
> >> +	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
> >> +		return;
> >> +
> >> +	*mode&= ~S_ISGID;
> >> +}
> >
> > Thanks for cleaning up the multiple if statements from last time.
> >
> > I still would like to see patch 1 replace the code in inode_init_owner
> > so that we can compare before and after in the same patch.  Patch 2 can
> > then be solely about moving the callsite around the VFS.
> >
> Ok, then patch 1 can named as"fs/inode: move inode sgid strip operation 
> from inode_init_owner into inode_sgid_strip".  What do you think about it?

Sounds good to me.

--D

> 
> Best Regards
> Yang Xu
> > --D
> >
> >> +EXPORT_SYMBOL(inode_sgid_strip);
> >> diff --git a/include/linux/fs.h b/include/linux/fs.h
> >> index bbde95387a23..94d94219fe7c 100644
> >> --- a/include/linux/fs.h
> >> +++ b/include/linux/fs.h
> >> @@ -1897,7 +1897,8 @@ extern long compat_ptr_ioctl(struct file *file, unsigned int cmd,
> >>   void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
> >>   		      const struct inode *dir, umode_t mode);
> >>   extern bool may_open_dev(const struct path *path);
> >> -
> >> +void inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir,
> >> +		      umode_t *mode);
> >>   /*
> >>    * This is the "filldir" function type, used by readdir() to let
> >>    * the kernel specify what kind of dirent layout it wants to have.
> >> --
> >> 2.27.0
> >>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Ocfs2-devel] [PATCH v2 1/3] vfs: Add inode_sgid_strip() api
@ 2022-04-15  1:40       ` Darrick J. Wong via Ocfs2-devel
  0 siblings, 0 replies; 16+ messages in thread
From: Darrick J. Wong via Ocfs2-devel @ 2022-04-15  1:40 UTC (permalink / raw)
  To: xuyang2018.jy
  Cc: brauner, jlayton, david, viro, linux-fsdevel, ceph-devel, ocfs2-devel

On Fri, Apr 15, 2022 at 01:18:57AM +0000, xuyang2018.jy@fujitsu.com wrote:
> on 2022/4/14 23:57, Darrick J. Wong wrote:
> > On Thu, Apr 14, 2022 at 03:57:17PM +0800, Yang Xu wrote:
> >> inode_sgid_strip() function is used to strip S_ISGID mode
> >> when creat/open/mknod file.
> >>
> >> Reviewed-by: Christian Brauner (Microsoft)<brauner@kernel.org>
> >> Signed-off-by: Yang Xu<xuyang2018.jy@fujitsu.com>
> >> ---
> >>   fs/inode.c         | 18 ++++++++++++++++++
> >>   include/linux/fs.h |  3 ++-
> >>   2 files changed, 20 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/inode.c b/fs/inode.c
> >> index 9d9b422504d1..d63264998855 100644
> >> --- a/fs/inode.c
> >> +++ b/fs/inode.c
> >> @@ -2405,3 +2405,21 @@ struct timespec64 current_time(struct inode *inode)
> >>   	return timestamp_truncate(now, inode);
> >>   }
> >>   EXPORT_SYMBOL(current_time);
> >> +
> >> +void inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir,
> >> +		      umode_t *mode)
> >> +{
> >> +	if (!dir || !(dir->i_mode&  S_ISGID))
> >> +		return;
> >> +	if ((*mode&  (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
> >> +		return;
> >> +	if (S_ISDIR(*mode))
> >> +		return;
> >> +	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
> >> +		return;
> >> +	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
> >> +		return;
> >> +
> >> +	*mode&= ~S_ISGID;
> >> +}
> >
> > Thanks for cleaning up the multiple if statements from last time.
> >
> > I still would like to see patch 1 replace the code in inode_init_owner
> > so that we can compare before and after in the same patch.  Patch 2 can
> > then be solely about moving the callsite around the VFS.
> >
> Ok, then patch 1 can named as"fs/inode: move inode sgid strip operation 
> from inode_init_owner into inode_sgid_strip".  What do you think about it?

Sounds good to me.

--D

> 
> Best Regards
> Yang Xu
> > --D
> >
> >> +EXPORT_SYMBOL(inode_sgid_strip);
> >> diff --git a/include/linux/fs.h b/include/linux/fs.h
> >> index bbde95387a23..94d94219fe7c 100644
> >> --- a/include/linux/fs.h
> >> +++ b/include/linux/fs.h
> >> @@ -1897,7 +1897,8 @@ extern long compat_ptr_ioctl(struct file *file, unsigned int cmd,
> >>   void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
> >>   		      const struct inode *dir, umode_t mode);
> >>   extern bool may_open_dev(const struct path *path);
> >> -
> >> +void inode_sgid_strip(struct user_namespace *mnt_userns, struct inode *dir,
> >> +		      umode_t *mode);
> >>   /*
> >>    * This is the "filldir" function type, used by readdir() to let
> >>    * the kernel specify what kind of dirent layout it wants to have.
> >> --
> >> 2.27.0
> >>

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] vfs: strip file's S_ISGID mode on vfs instead of on underlying filesystem
  2022-04-14 12:45   ` Christian Brauner
@ 2022-04-15  3:14     ` xuyang2018.jy
  2022-04-15  9:06       ` xuyang2018.jy
  2022-04-15 14:02       ` Christian Brauner
  0 siblings, 2 replies; 16+ messages in thread
From: xuyang2018.jy @ 2022-04-15  3:14 UTC (permalink / raw)
  To: Christian Brauner, djwong
  Cc: linux-fsdevel, ceph-devel, ocfs2-devel, viro, david, jlayton

on 2022/4/14 20:45, Christian Brauner wrote:
> On Thu, Apr 14, 2022 at 03:57:18PM +0800, Yang Xu wrote:
>> Currently, vfs only passes mode argument to filesystem, then use inode_init_owner()
>> to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call inode_init_owner
>> firstly, then posxi acl setup, but xfs uses the contrary order. It will affect
>> S_ISGID clear especially we filter S_IXGRP by umask or acl.
>>
>> Regardless of which filesystem is in use, failure to strip the SGID correctly is
>> considered a security failure that needs to be fixed. The current VFS infrastructure
>> requires the filesystem to do everything right and not step on any landmines to
>> strip the SGID bit, when in fact it can easily be done at the VFS and the filesystems
>> then don't even need to be aware that the SGID needs to be (or has been stripped) by
>> the operation the user asked to be done.
>>
>> Vfs has all the info it needs - it doesn't need the filesystems to do everything
>> correctly with the mode and ensuring that they order things like posix acl setup
>> functions correctly with inode_init_owner() to strip the SGID bit.
>>
>> Just strip the SGID bit at the VFS, and then the filesystems can't get it wrong.
>>
>> Also, the inode_sgid_strip() api should be used before IS_POSIXACL() because
>> this api may change mode.
>>
>> Only the following places use inode_init_owner
>> "hugetlbfs/inode.c:846:          inode_init_owner(&init_user_ns, inode, dir, mode);
>>   nilfs2/inode.c:354:     inode_init_owner(&init_user_ns, inode, dir, mode);
>>   zonefs/super.c:1289:    inode_init_owner(&init_user_ns, inode, parent, S_IFDIR | 0555);
>>   reiserfs/namei.c:619:   inode_init_owner(&init_user_ns, inode, dir, mode);
>>   jfs/jfs_inode.c:67:     inode_init_owner(&init_user_ns, inode, parent, mode);
>>   f2fs/namei.c:50:        inode_init_owner(mnt_userns, inode, dir, mode);
>>   ext2/ialloc.c:549:              inode_init_owner(&init_user_ns, inode, dir, mode);
>>   overlayfs/dir.c:643:    inode_init_owner(&init_user_ns, inode, dentry->d_parent->d_inode, mode);
>>   ufs/ialloc.c:292:       inode_init_owner(&init_user_ns, inode, dir, mode);
>>   ntfs3/inode.c:1283:     inode_init_owner(mnt_userns, inode, dir, mode);
>>   ramfs/inode.c:64:               inode_init_owner(&init_user_ns, inode, dir, mode);
>>   9p/vfs_inode.c:263:     inode_init_owner(&init_user_ns, inode, NULL, mode);
>>   btrfs/tests/btrfs-tests.c:65:   inode_init_owner(&init_user_ns, inode, NULL, S_IFREG);
>>   btrfs/inode.c:6215:     inode_init_owner(mnt_userns, inode, dir, mode);
>>   sysv/ialloc.c:166:      inode_init_owner(&init_user_ns, inode, dir, mode);
>>   omfs/inode.c:51:        inode_init_owner(&init_user_ns, inode, NULL, mode);
>>   ubifs/dir.c:97: inode_init_owner(&init_user_ns, inode, dir, mode);
>>   udf/ialloc.c:108:       inode_init_owner(&init_user_ns, inode, dir, mode);
>>   ext4/ialloc.c:979:              inode_init_owner(mnt_userns, inode, dir, mode);
>>   hfsplus/inode.c:393:    inode_init_owner(&init_user_ns, inode, dir, mode);
>>   xfs/xfs_inode.c:840:            inode_init_owner(mnt_userns, inode, dir, mode);
>>   ocfs2/dlmfs/dlmfs.c:331:                inode_init_owner(&init_user_ns, inode, NULL, mode);
>>   ocfs2/dlmfs/dlmfs.c:354:        inode_init_owner(&init_user_ns, inode, parent, mode);
>>   ocfs2/namei.c:200:      inode_init_owner(&init_user_ns, inode, dir, mode);
>>   minix/bitmap.c:255:     inode_init_owner(&init_user_ns, inode, dir, mode);
>>   bfs/dir.c:99:   inode_init_owner(&init_user_ns, inode, dir, mode);
>> "
>
> For completeness sake, there's also spufs which doesn't really go
> through the regular VFS callpath because it has separate system calls
> like:
>
> SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
> 	umode_t, mode, int, neighbor_fd)
>
> but looking through the code it only allows the creation of directories and only
> allows bits in 0777.
IMO, this fs also doesn't use inode_init_owner, so it should be not 
affected. We add indo_sgid_strip into vfs, IMO, it only happen new sgid 
strip situation and doesn't happen to remove old sgid strip situation.
So I think it is "safe".
>
>>
>> They are used in filesystem init new inode function and these init inode functions are used
>> by following operations:
>> mkdir
>> symlink
>> mknod
>> create
>> tmpfile
>> rename
>>
>> We don't care about mkdir because we don't strip SGID bit for directory except fs.xfs.irix_sgid_inherit.
>> symlink and rename only use valid mode that doesn't have SGID bit.
>>
>> We have added inode_sgid_strip api for the remaining operations.
>>
>> In addition to the above six operations, two filesystems has a little difference
>> 1) btrfs has btrfs_create_subvol_root to create new inode but used non SGID bit mode and can ignore
>> 2) ocfs2 reflink function should add inode_sgid_strip api manually because we don't add it in vfs
>>
>> Last but not least, this patch also changed grpid behaviour for ext4/xfs because the mode passed to
>> them may been changed by inode_sgid_strip.
>
> I think the patch itself is useful as it would move a security sensitive
> operation that is currently burried in individual filesystems into the
> vfs layer. But it has a decent regression potential since it might trip
> filesystems that have so far relied on getting the S_ISGID bit with a
> mode argument. The example being network filesystems that Jeff brought
> up earlier. So this needs a lot of testing and long exposure in -next
> for at least one full kernel cycle imho.
Agreed.
>
>>
>> Suggested-by: Dave Chinner<david@fromorbit.com>
>> Signed-off-by: Yang Xu<xuyang2018.jy@fujitsu.com>
>> ---
>>   fs/inode.c       | 4 ----
>>   fs/namei.c       | 5 ++++-
>>   fs/ocfs2/namei.c | 1 +
>>   3 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/inode.c b/fs/inode.c
>> index d63264998855..b08bdd73e116 100644
>> --- a/fs/inode.c
>> +++ b/fs/inode.c
>> @@ -2246,10 +2246,6 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
>>   		/* Directories are special, and always inherit S_ISGID */
>>   		if (S_ISDIR(mode))
>>   			mode |= S_ISGID;
>> -		else if ((mode&  (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)&&
>> -			 !in_group_p(i_gid_into_mnt(mnt_userns, dir))&&
>> -			 !capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
>> -			mode&= ~S_ISGID;
>>   	} else
>>   		inode_fsgid_set(inode, mnt_userns);
>>   	inode->i_mode = mode;
>> diff --git a/fs/namei.c b/fs/namei.c
>> index 3f1829b3ab5b..e03f7defdd30 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -3287,6 +3287,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
>>   	if (open_flag&  O_CREAT) {
>>   		if (open_flag&  O_EXCL)
>>   			open_flag&= ~O_TRUNC;
>> +		inode_sgid_strip(mnt_userns, dir->d_inode,&mode);
>>   		if (!IS_POSIXACL(dir->d_inode))
>>   			mode&= ~current_umask();
>>   		if (likely(got_write))
>> @@ -3521,6 +3522,7 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns,
>>   	child = d_alloc(dentry,&slash_name);
>>   	if (unlikely(!child))
>>   		goto out_err;
>> +	inode_sgid_strip(mnt_userns, dir,&mode);
>
> Hm, an additional question: how is umask stripping currently handled in
> vfs_tmpfile()? I don't see it anywhere. That seems like a bug?
Yes, I think it is a bug.

You can verify this by setting
export MOUNT_OPTIONS='-o noacl'
in your xfstests config.

then in my setgid_create_umask test, you can add is_ixgrp check for 
tmpfile and it still have S_IXGRP mode(So yesterday, add additional 
check for this mode is meaningful).

For xfs, it seems doesn't have noacl mount options, just disable 
CONFIG_XFS_POSIX_ACL.
But, xfs doesn't have this problem becuase it doesn't put 
posix_acl_create under CONFIG_XFS_POSIX_ACL situation.

I think we should add this umask stripping  here and also should
put xfs posix_acl_create code into  CONFIG_XFS_POSIX_ACL situation.

@Darrick  What do you think the xfs code change ?

commit mesage may as below:
xfs/xfs_iops: Only do posix acl setup operation under CONFIG_XFS_POSIX_ACL

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index b34e8e4344a8..61c1d4e85891 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -146,10 +146,13 @@ xfs_create_need_xattr(
         struct posix_acl *default_acl,
         struct posix_acl *acl)
  {
+#ifdef CONFIG_XFS_POSIX_ACL
         if (acl)
                 return true;
         if (default_acl)
                 return true;
+#endif
+
  #if IS_ENABLED(CONFIG_SECURITY)
         if (dir->i_sb->s_security)
                 return true;
@@ -183,10 +186,11 @@ xfs_generic_create(
         } else {
                 rdev = 0;
         }
-
+#ifdef CONFIG_XFS_POSIX_ACL
         error = posix_acl_create(dir, &mode, &default_acl, &acl);
         if (error)
                 return error;
+#endif

         /* Verify mode is valid also for tmpfile case */
         error = xfs_dentry_mode_to_name(&name, dentry, mode);


Best Regards
Yang Xu
>
>>   	error = dir->i_op->tmpfile(mnt_userns, dir, child, mode);
>>   	if (error)
>>   		goto out_err;
>> @@ -3850,13 +3852,14 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
>>   	if (IS_ERR(dentry))
>>   		goto out1;
>>
>> +	mnt_userns = mnt_user_ns(path.mnt);
>> +	inode_sgid_strip(mnt_userns, path.dentry->d_inode,&mode);
>>   	if (!IS_POSIXACL(path.dentry->d_inode))
>>   		mode&= ~current_umask();
>
> It would be worth to add another helper prepare_mode() that calls
> inode_sgid_strip() and does the umask stripping as well and then call it
> in all these places. You should even call it in do_mkdirat() since
> inode_sgid_strip() will skip directories anyway. This will enforce the
> same ordering for all relevant operations and it will make the code more
> uniform and easier to understand.
Sound reasonable. I will think of it.
>
>>   	error = security_path_mknod(&path, dentry, mode, dev);
>>   	if (error)
>>   		goto out2;
>>
>> -	mnt_userns = mnt_user_ns(path.mnt);
>>   	switch (mode&  S_IFMT) {
>>   		case 0: case S_IFREG:
>>   			error = vfs_create(mnt_userns, path.dentry->d_inode,
>> diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
>> index c75fd54b9185..f1d626697302 100644
>> --- a/fs/ocfs2/namei.c
>> +++ b/fs/ocfs2/namei.c
>> @@ -197,6 +197,7 @@ static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode)
>>   	 * callers. */
>>   	if (S_ISDIR(mode))
>>   		set_nlink(inode, 2);
>> +	inode_sgid_strip(&init_user_ns, dir,&mode);
>>   	inode_init_owner(&init_user_ns, inode, dir, mode);
>>   	status = dquot_initialize(inode);
>>   	if (status)
>> --
>> 2.27.0
>>
>>

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] vfs: strip file's S_ISGID mode on vfs instead of on underlying filesystem
  2022-04-15  3:14     ` xuyang2018.jy
@ 2022-04-15  9:06       ` xuyang2018.jy
  2022-04-15 14:03         ` Christian Brauner
  2022-04-15 14:02       ` Christian Brauner
  1 sibling, 1 reply; 16+ messages in thread
From: xuyang2018.jy @ 2022-04-15  9:06 UTC (permalink / raw)
  To: Christian Brauner, djwong
  Cc: linux-fsdevel, ceph-devel, ocfs2-devel, viro, david, jlayton

on 2022/4/15 12:15, Yang Xu wrote:
> on 2022/4/14 20:45, Christian Brauner wrote:
>> On Thu, Apr 14, 2022 at 03:57:18PM +0800, Yang Xu wrote:
>>> Currently, vfs only passes mode argument to filesystem, then use
>>> inode_init_owner()
>>> to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call
>>> inode_init_owner
>>> firstly, then posxi acl setup, but xfs uses the contrary order. It
>>> will affect
>>> S_ISGID clear especially we filter S_IXGRP by umask or acl.
>>>
>>> Regardless of which filesystem is in use, failure to strip the SGID
>>> correctly is
>>> considered a security failure that needs to be fixed. The current VFS
>>> infrastructure
>>> requires the filesystem to do everything right and not step on any
>>> landmines to
>>> strip the SGID bit, when in fact it can easily be done at the VFS and
>>> the filesystems
>>> then don't even need to be aware that the SGID needs to be (or has
>>> been stripped) by
>>> the operation the user asked to be done.
>>>
>>> Vfs has all the info it needs - it doesn't need the filesystems to do
>>> everything
>>> correctly with the mode and ensuring that they order things like
>>> posix acl setup
>>> functions correctly with inode_init_owner() to strip the SGID bit.
>>>
>>> Just strip the SGID bit at the VFS, and then the filesystems can't
>>> get it wrong.
>>>
>>> Also, the inode_sgid_strip() api should be used before IS_POSIXACL()
>>> because
>>> this api may change mode.
>>>
>>> Only the following places use inode_init_owner
>>> "hugetlbfs/inode.c:846: inode_init_owner(&init_user_ns, inode, dir,
>>> mode);
>>> nilfs2/inode.c:354: inode_init_owner(&init_user_ns, inode, dir, mode);
>>> zonefs/super.c:1289: inode_init_owner(&init_user_ns, inode, parent,
>>> S_IFDIR | 0555);
>>> reiserfs/namei.c:619: inode_init_owner(&init_user_ns, inode, dir, mode);
>>> jfs/jfs_inode.c:67: inode_init_owner(&init_user_ns, inode, parent,
>>> mode);
>>> f2fs/namei.c:50: inode_init_owner(mnt_userns, inode, dir, mode);
>>> ext2/ialloc.c:549: inode_init_owner(&init_user_ns, inode, dir, mode);
>>> overlayfs/dir.c:643: inode_init_owner(&init_user_ns, inode,
>>> dentry->d_parent->d_inode, mode);
>>> ufs/ialloc.c:292: inode_init_owner(&init_user_ns, inode, dir, mode);
>>> ntfs3/inode.c:1283: inode_init_owner(mnt_userns, inode, dir, mode);
>>> ramfs/inode.c:64: inode_init_owner(&init_user_ns, inode, dir, mode);
>>> 9p/vfs_inode.c:263: inode_init_owner(&init_user_ns, inode, NULL, mode);
>>> btrfs/tests/btrfs-tests.c:65: inode_init_owner(&init_user_ns, inode,
>>> NULL, S_IFREG);
>>> btrfs/inode.c:6215: inode_init_owner(mnt_userns, inode, dir, mode);
>>> sysv/ialloc.c:166: inode_init_owner(&init_user_ns, inode, dir, mode);
>>> omfs/inode.c:51: inode_init_owner(&init_user_ns, inode, NULL, mode);
>>> ubifs/dir.c:97: inode_init_owner(&init_user_ns, inode, dir, mode);
>>> udf/ialloc.c:108: inode_init_owner(&init_user_ns, inode, dir, mode);
>>> ext4/ialloc.c:979: inode_init_owner(mnt_userns, inode, dir, mode);
>>> hfsplus/inode.c:393: inode_init_owner(&init_user_ns, inode, dir, mode);
>>> xfs/xfs_inode.c:840: inode_init_owner(mnt_userns, inode, dir, mode);
>>> ocfs2/dlmfs/dlmfs.c:331: inode_init_owner(&init_user_ns, inode, NULL,
>>> mode);
>>> ocfs2/dlmfs/dlmfs.c:354: inode_init_owner(&init_user_ns, inode,
>>> parent, mode);
>>> ocfs2/namei.c:200: inode_init_owner(&init_user_ns, inode, dir, mode);
>>> minix/bitmap.c:255: inode_init_owner(&init_user_ns, inode, dir, mode);
>>> bfs/dir.c:99: inode_init_owner(&init_user_ns, inode, dir, mode);
>>> "
>>
>> For completeness sake, there's also spufs which doesn't really go
>> through the regular VFS callpath because it has separate system calls
>> like:
>>
>> SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int,
>> flags,
>> umode_t, mode, int, neighbor_fd)
>>
>> but looking through the code it only allows the creation of
>> directories and only
>> allows bits in 0777.
> IMO, this fs also doesn't use inode_init_owner, so it should be not
> affected. We add indo_sgid_strip into vfs, IMO, it only happen new sgid
> strip situation and doesn't happen to remove old sgid strip situation.
> So I think it is "safe".
>>
>>>
>>> They are used in filesystem init new inode function and these init
>>> inode functions are used
>>> by following operations:
>>> mkdir
>>> symlink
>>> mknod
>>> create
>>> tmpfile
>>> rename
>>>
>>> We don't care about mkdir because we don't strip SGID bit for
>>> directory except fs.xfs.irix_sgid_inherit.
>>> symlink and rename only use valid mode that doesn't have SGID bit.
>>>
>>> We have added inode_sgid_strip api for the remaining operations.
>>>
>>> In addition to the above six operations, two filesystems has a little
>>> difference
>>> 1) btrfs has btrfs_create_subvol_root to create new inode but used
>>> non SGID bit mode and can ignore
>>> 2) ocfs2 reflink function should add inode_sgid_strip api manually
>>> because we don't add it in vfs
>>>
>>> Last but not least, this patch also changed grpid behaviour for
>>> ext4/xfs because the mode passed to
>>> them may been changed by inode_sgid_strip.
>>
>> I think the patch itself is useful as it would move a security sensitive
>> operation that is currently burried in individual filesystems into the
>> vfs layer. But it has a decent regression potential since it might trip
>> filesystems that have so far relied on getting the S_ISGID bit with a
>> mode argument. The example being network filesystems that Jeff brought
>> up earlier. So this needs a lot of testing and long exposure in -next
>> for at least one full kernel cycle imho.
> Agreed.
>>
>>>
>>> Suggested-by: Dave Chinner<david@fromorbit.com>
>>> Signed-off-by: Yang Xu<xuyang2018.jy@fujitsu.com>
>>> ---
>>> fs/inode.c | 4 ----
>>> fs/namei.c | 5 ++++-
>>> fs/ocfs2/namei.c | 1 +
>>> 3 files changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/inode.c b/fs/inode.c
>>> index d63264998855..b08bdd73e116 100644
>>> --- a/fs/inode.c
>>> +++ b/fs/inode.c
>>> @@ -2246,10 +2246,6 @@ void inode_init_owner(struct user_namespace
>>> *mnt_userns, struct inode *inode,
>>> /* Directories are special, and always inherit S_ISGID */
>>> if (S_ISDIR(mode))
>>> mode |= S_ISGID;
>>> - else if ((mode& (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)&&
>>> - !in_group_p(i_gid_into_mnt(mnt_userns, dir))&&
>>> - !capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
>>> - mode&= ~S_ISGID;
>>> } else
>>> inode_fsgid_set(inode, mnt_userns);
>>> inode->i_mode = mode;
>>> diff --git a/fs/namei.c b/fs/namei.c
>>> index 3f1829b3ab5b..e03f7defdd30 100644
>>> --- a/fs/namei.c
>>> +++ b/fs/namei.c
>>> @@ -3287,6 +3287,7 @@ static struct dentry *lookup_open(struct
>>> nameidata *nd, struct file *file,
>>> if (open_flag& O_CREAT) {
>>> if (open_flag& O_EXCL)
>>> open_flag&= ~O_TRUNC;
>>> + inode_sgid_strip(mnt_userns, dir->d_inode,&mode);
>>> if (!IS_POSIXACL(dir->d_inode))
>>> mode&= ~current_umask();
>>> if (likely(got_write))
>>> @@ -3521,6 +3522,7 @@ struct dentry *vfs_tmpfile(struct
>>> user_namespace *mnt_userns,
>>> child = d_alloc(dentry,&slash_name);
>>> if (unlikely(!child))
>>> goto out_err;
>>> + inode_sgid_strip(mnt_userns, dir,&mode);
>>
>> Hm, an additional question: how is umask stripping currently handled in
>> vfs_tmpfile()? I don't see it anywhere. That seems like a bug?
> Yes, I think it is a bug.
Since you found this bug and I have finished my v3 kernel patch set(also 
fix this tmpfile umask problem and add your reported-by, also two 
patches about other problem in xfs/nfs ), so do you will fix this kernel 
bug or I send a v3 directly?

Best Regards
Yang Xu
>
> You can verify this by setting
> export MOUNT_OPTIONS='-o noacl'
> in your xfstests config.
>
> then in my setgid_create_umask test, you can add is_ixgrp check for
> tmpfile and it still have S_IXGRP mode(So yesterday, add additional
> check for this mode is meaningful).
>
> For xfs, it seems doesn't have noacl mount options, just disable
> CONFIG_XFS_POSIX_ACL.
> But, xfs doesn't have this problem becuase it doesn't put
> posix_acl_create under CONFIG_XFS_POSIX_ACL situation.
>
> I think we should add this umask stripping here and also should
> put xfs posix_acl_create code into CONFIG_XFS_POSIX_ACL situation.
>
> @Darrick What do you think the xfs code change ?
>
> commit mesage may as below:
> xfs/xfs_iops: Only do posix acl setup operation under CONFIG_XFS_POSIX_ACL
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index b34e8e4344a8..61c1d4e85891 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -146,10 +146,13 @@ xfs_create_need_xattr(
> struct posix_acl *default_acl,
> struct posix_acl *acl)
> {
> +#ifdef CONFIG_XFS_POSIX_ACL
> if (acl)
> return true;
> if (default_acl)
> return true;
> +#endif
> +
> #if IS_ENABLED(CONFIG_SECURITY)
> if (dir->i_sb->s_security)
> return true;
> @@ -183,10 +186,11 @@ xfs_generic_create(
> } else {
> rdev = 0;
> }
> -
> +#ifdef CONFIG_XFS_POSIX_ACL
> error = posix_acl_create(dir, &mode, &default_acl, &acl);
> if (error)
> return error;
> +#endif
>
> /* Verify mode is valid also for tmpfile case */
> error = xfs_dentry_mode_to_name(&name, dentry, mode);
>
>
> Best Regards
> Yang Xu
>>
>>> error = dir->i_op->tmpfile(mnt_userns, dir, child, mode);
>>> if (error)
>>> goto out_err;
>>> @@ -3850,13 +3852,14 @@ static int do_mknodat(int dfd, struct
>>> filename *name, umode_t mode,
>>> if (IS_ERR(dentry))
>>> goto out1;
>>>
>>> + mnt_userns = mnt_user_ns(path.mnt);
>>> + inode_sgid_strip(mnt_userns, path.dentry->d_inode,&mode);
>>> if (!IS_POSIXACL(path.dentry->d_inode))
>>> mode&= ~current_umask();
>>
>> It would be worth to add another helper prepare_mode() that calls
>> inode_sgid_strip() and does the umask stripping as well and then call it
>> in all these places. You should even call it in do_mkdirat() since
>> inode_sgid_strip() will skip directories anyway. This will enforce the
>> same ordering for all relevant operations and it will make the code more
>> uniform and easier to understand.
> Sound reasonable. I will think of it.
>>
>>> error = security_path_mknod(&path, dentry, mode, dev);
>>> if (error)
>>> goto out2;
>>>
>>> - mnt_userns = mnt_user_ns(path.mnt);
>>> switch (mode& S_IFMT) {
>>> case 0: case S_IFREG:
>>> error = vfs_create(mnt_userns, path.dentry->d_inode,
>>> diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
>>> index c75fd54b9185..f1d626697302 100644
>>> --- a/fs/ocfs2/namei.c
>>> +++ b/fs/ocfs2/namei.c
>>> @@ -197,6 +197,7 @@ static struct inode *ocfs2_get_init_inode(struct
>>> inode *dir, umode_t mode)
>>> * callers. */
>>> if (S_ISDIR(mode))
>>> set_nlink(inode, 2);
>>> + inode_sgid_strip(&init_user_ns, dir,&mode);
>>> inode_init_owner(&init_user_ns, inode, dir, mode);
>>> status = dquot_initialize(inode);
>>> if (status)
>>> --
>>> 2.27.0
>>>
>>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] vfs: strip file's S_ISGID mode on vfs instead of on underlying filesystem
  2022-04-15  3:14     ` xuyang2018.jy
  2022-04-15  9:06       ` xuyang2018.jy
@ 2022-04-15 14:02       ` Christian Brauner
  2022-04-19  5:44         ` xuyang2018.jy
  1 sibling, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2022-04-15 14:02 UTC (permalink / raw)
  To: xuyang2018.jy
  Cc: djwong, linux-fsdevel, ceph-devel, ocfs2-devel, viro, david, jlayton

On Fri, Apr 15, 2022 at 03:14:53AM +0000, xuyang2018.jy@fujitsu.com wrote:
> on 2022/4/14 20:45, Christian Brauner wrote:
> > On Thu, Apr 14, 2022 at 03:57:18PM +0800, Yang Xu wrote:
> >> Currently, vfs only passes mode argument to filesystem, then use inode_init_owner()
> >> to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call inode_init_owner
> >> firstly, then posxi acl setup, but xfs uses the contrary order. It will affect
> >> S_ISGID clear especially we filter S_IXGRP by umask or acl.
> >>
> >> Regardless of which filesystem is in use, failure to strip the SGID correctly is
> >> considered a security failure that needs to be fixed. The current VFS infrastructure
> >> requires the filesystem to do everything right and not step on any landmines to
> >> strip the SGID bit, when in fact it can easily be done at the VFS and the filesystems
> >> then don't even need to be aware that the SGID needs to be (or has been stripped) by
> >> the operation the user asked to be done.
> >>
> >> Vfs has all the info it needs - it doesn't need the filesystems to do everything
> >> correctly with the mode and ensuring that they order things like posix acl setup
> >> functions correctly with inode_init_owner() to strip the SGID bit.
> >>
> >> Just strip the SGID bit at the VFS, and then the filesystems can't get it wrong.
> >>
> >> Also, the inode_sgid_strip() api should be used before IS_POSIXACL() because
> >> this api may change mode.
> >>
> >> Only the following places use inode_init_owner
> >> "hugetlbfs/inode.c:846:          inode_init_owner(&init_user_ns, inode, dir, mode);
> >>   nilfs2/inode.c:354:     inode_init_owner(&init_user_ns, inode, dir, mode);
> >>   zonefs/super.c:1289:    inode_init_owner(&init_user_ns, inode, parent, S_IFDIR | 0555);
> >>   reiserfs/namei.c:619:   inode_init_owner(&init_user_ns, inode, dir, mode);
> >>   jfs/jfs_inode.c:67:     inode_init_owner(&init_user_ns, inode, parent, mode);
> >>   f2fs/namei.c:50:        inode_init_owner(mnt_userns, inode, dir, mode);
> >>   ext2/ialloc.c:549:              inode_init_owner(&init_user_ns, inode, dir, mode);
> >>   overlayfs/dir.c:643:    inode_init_owner(&init_user_ns, inode, dentry->d_parent->d_inode, mode);
> >>   ufs/ialloc.c:292:       inode_init_owner(&init_user_ns, inode, dir, mode);
> >>   ntfs3/inode.c:1283:     inode_init_owner(mnt_userns, inode, dir, mode);
> >>   ramfs/inode.c:64:               inode_init_owner(&init_user_ns, inode, dir, mode);
> >>   9p/vfs_inode.c:263:     inode_init_owner(&init_user_ns, inode, NULL, mode);
> >>   btrfs/tests/btrfs-tests.c:65:   inode_init_owner(&init_user_ns, inode, NULL, S_IFREG);
> >>   btrfs/inode.c:6215:     inode_init_owner(mnt_userns, inode, dir, mode);
> >>   sysv/ialloc.c:166:      inode_init_owner(&init_user_ns, inode, dir, mode);
> >>   omfs/inode.c:51:        inode_init_owner(&init_user_ns, inode, NULL, mode);
> >>   ubifs/dir.c:97: inode_init_owner(&init_user_ns, inode, dir, mode);
> >>   udf/ialloc.c:108:       inode_init_owner(&init_user_ns, inode, dir, mode);
> >>   ext4/ialloc.c:979:              inode_init_owner(mnt_userns, inode, dir, mode);
> >>   hfsplus/inode.c:393:    inode_init_owner(&init_user_ns, inode, dir, mode);
> >>   xfs/xfs_inode.c:840:            inode_init_owner(mnt_userns, inode, dir, mode);
> >>   ocfs2/dlmfs/dlmfs.c:331:                inode_init_owner(&init_user_ns, inode, NULL, mode);
> >>   ocfs2/dlmfs/dlmfs.c:354:        inode_init_owner(&init_user_ns, inode, parent, mode);
> >>   ocfs2/namei.c:200:      inode_init_owner(&init_user_ns, inode, dir, mode);
> >>   minix/bitmap.c:255:     inode_init_owner(&init_user_ns, inode, dir, mode);
> >>   bfs/dir.c:99:   inode_init_owner(&init_user_ns, inode, dir, mode);
> >> "
> >
> > For completeness sake, there's also spufs which doesn't really go
> > through the regular VFS callpath because it has separate system calls
> > like:
> >
> > SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
> > 	umode_t, mode, int, neighbor_fd)
> >
> > but looking through the code it only allows the creation of directories and only
> > allows bits in 0777.
> IMO, this fs also doesn't use inode_init_owner, so it should be not 
> affected. We add indo_sgid_strip into vfs, IMO, it only happen new sgid 
> strip situation and doesn't happen to remove old sgid strip situation.
> So I think it is "safe".

It does. The callchain spu_create() with SP_CREATE_GANG ends up in
spufs_mkgang() which calls inode_init_owner(). But as I said it's not a
problem since this only creates directories anyway.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] vfs: strip file's S_ISGID mode on vfs instead of on underlying filesystem
  2022-04-15  9:06       ` xuyang2018.jy
@ 2022-04-15 14:03         ` Christian Brauner
  0 siblings, 0 replies; 16+ messages in thread
From: Christian Brauner @ 2022-04-15 14:03 UTC (permalink / raw)
  To: xuyang2018.jy
  Cc: djwong, linux-fsdevel, ceph-devel, ocfs2-devel, viro, david, jlayton

On Fri, Apr 15, 2022 at 09:06:25AM +0000, xuyang2018.jy@fujitsu.com wrote:
> on 2022/4/15 12:15, Yang Xu wrote:
> > on 2022/4/14 20:45, Christian Brauner wrote:
> >> On Thu, Apr 14, 2022 at 03:57:18PM +0800, Yang Xu wrote:
> >>> Currently, vfs only passes mode argument to filesystem, then use
> >>> inode_init_owner()
> >>> to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call
> >>> inode_init_owner
> >>> firstly, then posxi acl setup, but xfs uses the contrary order. It
> >>> will affect
> >>> S_ISGID clear especially we filter S_IXGRP by umask or acl.
> >>>
> >>> Regardless of which filesystem is in use, failure to strip the SGID
> >>> correctly is
> >>> considered a security failure that needs to be fixed. The current VFS
> >>> infrastructure
> >>> requires the filesystem to do everything right and not step on any
> >>> landmines to
> >>> strip the SGID bit, when in fact it can easily be done at the VFS and
> >>> the filesystems
> >>> then don't even need to be aware that the SGID needs to be (or has
> >>> been stripped) by
> >>> the operation the user asked to be done.
> >>>
> >>> Vfs has all the info it needs - it doesn't need the filesystems to do
> >>> everything
> >>> correctly with the mode and ensuring that they order things like
> >>> posix acl setup
> >>> functions correctly with inode_init_owner() to strip the SGID bit.
> >>>
> >>> Just strip the SGID bit at the VFS, and then the filesystems can't
> >>> get it wrong.
> >>>
> >>> Also, the inode_sgid_strip() api should be used before IS_POSIXACL()
> >>> because
> >>> this api may change mode.
> >>>
> >>> Only the following places use inode_init_owner
> >>> "hugetlbfs/inode.c:846: inode_init_owner(&init_user_ns, inode, dir,
> >>> mode);
> >>> nilfs2/inode.c:354: inode_init_owner(&init_user_ns, inode, dir, mode);
> >>> zonefs/super.c:1289: inode_init_owner(&init_user_ns, inode, parent,
> >>> S_IFDIR | 0555);
> >>> reiserfs/namei.c:619: inode_init_owner(&init_user_ns, inode, dir, mode);
> >>> jfs/jfs_inode.c:67: inode_init_owner(&init_user_ns, inode, parent,
> >>> mode);
> >>> f2fs/namei.c:50: inode_init_owner(mnt_userns, inode, dir, mode);
> >>> ext2/ialloc.c:549: inode_init_owner(&init_user_ns, inode, dir, mode);
> >>> overlayfs/dir.c:643: inode_init_owner(&init_user_ns, inode,
> >>> dentry->d_parent->d_inode, mode);
> >>> ufs/ialloc.c:292: inode_init_owner(&init_user_ns, inode, dir, mode);
> >>> ntfs3/inode.c:1283: inode_init_owner(mnt_userns, inode, dir, mode);
> >>> ramfs/inode.c:64: inode_init_owner(&init_user_ns, inode, dir, mode);
> >>> 9p/vfs_inode.c:263: inode_init_owner(&init_user_ns, inode, NULL, mode);
> >>> btrfs/tests/btrfs-tests.c:65: inode_init_owner(&init_user_ns, inode,
> >>> NULL, S_IFREG);
> >>> btrfs/inode.c:6215: inode_init_owner(mnt_userns, inode, dir, mode);
> >>> sysv/ialloc.c:166: inode_init_owner(&init_user_ns, inode, dir, mode);
> >>> omfs/inode.c:51: inode_init_owner(&init_user_ns, inode, NULL, mode);
> >>> ubifs/dir.c:97: inode_init_owner(&init_user_ns, inode, dir, mode);
> >>> udf/ialloc.c:108: inode_init_owner(&init_user_ns, inode, dir, mode);
> >>> ext4/ialloc.c:979: inode_init_owner(mnt_userns, inode, dir, mode);
> >>> hfsplus/inode.c:393: inode_init_owner(&init_user_ns, inode, dir, mode);
> >>> xfs/xfs_inode.c:840: inode_init_owner(mnt_userns, inode, dir, mode);
> >>> ocfs2/dlmfs/dlmfs.c:331: inode_init_owner(&init_user_ns, inode, NULL,
> >>> mode);
> >>> ocfs2/dlmfs/dlmfs.c:354: inode_init_owner(&init_user_ns, inode,
> >>> parent, mode);
> >>> ocfs2/namei.c:200: inode_init_owner(&init_user_ns, inode, dir, mode);
> >>> minix/bitmap.c:255: inode_init_owner(&init_user_ns, inode, dir, mode);
> >>> bfs/dir.c:99: inode_init_owner(&init_user_ns, inode, dir, mode);
> >>> "
> >>
> >> For completeness sake, there's also spufs which doesn't really go
> >> through the regular VFS callpath because it has separate system calls
> >> like:
> >>
> >> SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int,
> >> flags,
> >> umode_t, mode, int, neighbor_fd)
> >>
> >> but looking through the code it only allows the creation of
> >> directories and only
> >> allows bits in 0777.
> > IMO, this fs also doesn't use inode_init_owner, so it should be not
> > affected. We add indo_sgid_strip into vfs, IMO, it only happen new sgid
> > strip situation and doesn't happen to remove old sgid strip situation.
> > So I think it is "safe".
> >>
> >>>
> >>> They are used in filesystem init new inode function and these init
> >>> inode functions are used
> >>> by following operations:
> >>> mkdir
> >>> symlink
> >>> mknod
> >>> create
> >>> tmpfile
> >>> rename
> >>>
> >>> We don't care about mkdir because we don't strip SGID bit for
> >>> directory except fs.xfs.irix_sgid_inherit.
> >>> symlink and rename only use valid mode that doesn't have SGID bit.
> >>>
> >>> We have added inode_sgid_strip api for the remaining operations.
> >>>
> >>> In addition to the above six operations, two filesystems has a little
> >>> difference
> >>> 1) btrfs has btrfs_create_subvol_root to create new inode but used
> >>> non SGID bit mode and can ignore
> >>> 2) ocfs2 reflink function should add inode_sgid_strip api manually
> >>> because we don't add it in vfs
> >>>
> >>> Last but not least, this patch also changed grpid behaviour for
> >>> ext4/xfs because the mode passed to
> >>> them may been changed by inode_sgid_strip.
> >>
> >> I think the patch itself is useful as it would move a security sensitive
> >> operation that is currently burried in individual filesystems into the
> >> vfs layer. But it has a decent regression potential since it might trip
> >> filesystems that have so far relied on getting the S_ISGID bit with a
> >> mode argument. The example being network filesystems that Jeff brought
> >> up earlier. So this needs a lot of testing and long exposure in -next
> >> for at least one full kernel cycle imho.
> > Agreed.
> >>
> >>>
> >>> Suggested-by: Dave Chinner<david@fromorbit.com>
> >>> Signed-off-by: Yang Xu<xuyang2018.jy@fujitsu.com>
> >>> ---
> >>> fs/inode.c | 4 ----
> >>> fs/namei.c | 5 ++++-
> >>> fs/ocfs2/namei.c | 1 +
> >>> 3 files changed, 5 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/fs/inode.c b/fs/inode.c
> >>> index d63264998855..b08bdd73e116 100644
> >>> --- a/fs/inode.c
> >>> +++ b/fs/inode.c
> >>> @@ -2246,10 +2246,6 @@ void inode_init_owner(struct user_namespace
> >>> *mnt_userns, struct inode *inode,
> >>> /* Directories are special, and always inherit S_ISGID */
> >>> if (S_ISDIR(mode))
> >>> mode |= S_ISGID;
> >>> - else if ((mode& (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)&&
> >>> - !in_group_p(i_gid_into_mnt(mnt_userns, dir))&&
> >>> - !capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
> >>> - mode&= ~S_ISGID;
> >>> } else
> >>> inode_fsgid_set(inode, mnt_userns);
> >>> inode->i_mode = mode;
> >>> diff --git a/fs/namei.c b/fs/namei.c
> >>> index 3f1829b3ab5b..e03f7defdd30 100644
> >>> --- a/fs/namei.c
> >>> +++ b/fs/namei.c
> >>> @@ -3287,6 +3287,7 @@ static struct dentry *lookup_open(struct
> >>> nameidata *nd, struct file *file,
> >>> if (open_flag& O_CREAT) {
> >>> if (open_flag& O_EXCL)
> >>> open_flag&= ~O_TRUNC;
> >>> + inode_sgid_strip(mnt_userns, dir->d_inode,&mode);
> >>> if (!IS_POSIXACL(dir->d_inode))
> >>> mode&= ~current_umask();
> >>> if (likely(got_write))
> >>> @@ -3521,6 +3522,7 @@ struct dentry *vfs_tmpfile(struct
> >>> user_namespace *mnt_userns,
> >>> child = d_alloc(dentry,&slash_name);
> >>> if (unlikely(!child))
> >>> goto out_err;
> >>> + inode_sgid_strip(mnt_userns, dir,&mode);
> >>
> >> Hm, an additional question: how is umask stripping currently handled in
> >> vfs_tmpfile()? I don't see it anywhere. That seems like a bug?
> > Yes, I think it is a bug.
> Since you found this bug and I have finished my v3 kernel patch set(also 
> fix this tmpfile umask problem and add your reported-by, also two 
> patches about other problem in xfs/nfs ), so do you will fix this kernel 
> bug or I send a v3 directly?

You can fix it as part of your series and I see you've already included
it in v3 anyway.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] vfs: strip file's S_ISGID mode on vfs instead of on underlying filesystem
  2022-04-15 14:02       ` Christian Brauner
@ 2022-04-19  5:44         ` xuyang2018.jy
  0 siblings, 0 replies; 16+ messages in thread
From: xuyang2018.jy @ 2022-04-19  5:44 UTC (permalink / raw)
  To: Christian Brauner
  Cc: djwong, linux-fsdevel, ceph-devel, ocfs2-devel, viro, david, jlayton

on 2022/4/15 22:02, Christian Brauner wrote:
> On Fri, Apr 15, 2022 at 03:14:53AM +0000, xuyang2018.jy@fujitsu.com wrote:
>> on 2022/4/14 20:45, Christian Brauner wrote:
>>> On Thu, Apr 14, 2022 at 03:57:18PM +0800, Yang Xu wrote:
>>>> Currently, vfs only passes mode argument to filesystem, then use inode_init_owner()
>>>> to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call inode_init_owner
>>>> firstly, then posxi acl setup, but xfs uses the contrary order. It will affect
>>>> S_ISGID clear especially we filter S_IXGRP by umask or acl.
>>>>
>>>> Regardless of which filesystem is in use, failure to strip the SGID correctly is
>>>> considered a security failure that needs to be fixed. The current VFS infrastructure
>>>> requires the filesystem to do everything right and not step on any landmines to
>>>> strip the SGID bit, when in fact it can easily be done at the VFS and the filesystems
>>>> then don't even need to be aware that the SGID needs to be (or has been stripped) by
>>>> the operation the user asked to be done.
>>>>
>>>> Vfs has all the info it needs - it doesn't need the filesystems to do everything
>>>> correctly with the mode and ensuring that they order things like posix acl setup
>>>> functions correctly with inode_init_owner() to strip the SGID bit.
>>>>
>>>> Just strip the SGID bit at the VFS, and then the filesystems can't get it wrong.
>>>>
>>>> Also, the inode_sgid_strip() api should be used before IS_POSIXACL() because
>>>> this api may change mode.
>>>>
>>>> Only the following places use inode_init_owner
>>>> "hugetlbfs/inode.c:846:          inode_init_owner(&init_user_ns, inode, dir, mode);
>>>>    nilfs2/inode.c:354:     inode_init_owner(&init_user_ns, inode, dir, mode);
>>>>    zonefs/super.c:1289:    inode_init_owner(&init_user_ns, inode, parent, S_IFDIR | 0555);
>>>>    reiserfs/namei.c:619:   inode_init_owner(&init_user_ns, inode, dir, mode);
>>>>    jfs/jfs_inode.c:67:     inode_init_owner(&init_user_ns, inode, parent, mode);
>>>>    f2fs/namei.c:50:        inode_init_owner(mnt_userns, inode, dir, mode);
>>>>    ext2/ialloc.c:549:              inode_init_owner(&init_user_ns, inode, dir, mode);
>>>>    overlayfs/dir.c:643:    inode_init_owner(&init_user_ns, inode, dentry->d_parent->d_inode, mode);
>>>>    ufs/ialloc.c:292:       inode_init_owner(&init_user_ns, inode, dir, mode);
>>>>    ntfs3/inode.c:1283:     inode_init_owner(mnt_userns, inode, dir, mode);
>>>>    ramfs/inode.c:64:               inode_init_owner(&init_user_ns, inode, dir, mode);
>>>>    9p/vfs_inode.c:263:     inode_init_owner(&init_user_ns, inode, NULL, mode);
>>>>    btrfs/tests/btrfs-tests.c:65:   inode_init_owner(&init_user_ns, inode, NULL, S_IFREG);
>>>>    btrfs/inode.c:6215:     inode_init_owner(mnt_userns, inode, dir, mode);
>>>>    sysv/ialloc.c:166:      inode_init_owner(&init_user_ns, inode, dir, mode);
>>>>    omfs/inode.c:51:        inode_init_owner(&init_user_ns, inode, NULL, mode);
>>>>    ubifs/dir.c:97: inode_init_owner(&init_user_ns, inode, dir, mode);
>>>>    udf/ialloc.c:108:       inode_init_owner(&init_user_ns, inode, dir, mode);
>>>>    ext4/ialloc.c:979:              inode_init_owner(mnt_userns, inode, dir, mode);
>>>>    hfsplus/inode.c:393:    inode_init_owner(&init_user_ns, inode, dir, mode);
>>>>    xfs/xfs_inode.c:840:            inode_init_owner(mnt_userns, inode, dir, mode);
>>>>    ocfs2/dlmfs/dlmfs.c:331:                inode_init_owner(&init_user_ns, inode, NULL, mode);
>>>>    ocfs2/dlmfs/dlmfs.c:354:        inode_init_owner(&init_user_ns, inode, parent, mode);
>>>>    ocfs2/namei.c:200:      inode_init_owner(&init_user_ns, inode, dir, mode);
>>>>    minix/bitmap.c:255:     inode_init_owner(&init_user_ns, inode, dir, mode);
>>>>    bfs/dir.c:99:   inode_init_owner(&init_user_ns, inode, dir, mode);
>>>> "
>>>
>>> For completeness sake, there's also spufs which doesn't really go
>>> through the regular VFS callpath because it has separate system calls
>>> like:
>>>
>>> SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
>>> 	umode_t, mode, int, neighbor_fd)
>>>
>>> but looking through the code it only allows the creation of directories and only
>>> allows bits in 0777.
>> IMO, this fs also doesn't use inode_init_owner, so it should be not
>> affected. We add indo_sgid_strip into vfs, IMO, it only happen new sgid
>> strip situation and doesn't happen to remove old sgid strip situation.
>> So I think it is "safe".
>
> It does. The callchain spu_create() with SP_CREATE_GANG ends up in
> spufs_mkgang() which calls inode_init_owner(). But as I said it's not a
> problem since this only creates directories anyway.
Sorry, I miss this message before.
Oh, Yes, I only search inode_init_owner in linux/fs directory instead of 
the whole linux source directory.

It seems I also miss bpf and shmem.

kernel/bpf/inode.c:     inode_init_owner(&init_user_ns, inode, dir, mode);
mm/shmem.c:             inode_init_owner(&init_user_ns, inode, dir, mode);
arch/powerpc/platforms/cell/spufs/inode.c: 
inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR);
arch/powerpc/platforms/cell/spufs/inode.c: 
inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR)

bpf use vfs_mkobj in bpf_obj_do_pin with "S_IFREG | ((S_IRUSR | S_IWUSR) 
& ~current_umask()) mode and use bpf_mkobj_ops in 
bpf_iter_link_pin_kernel with S_IFREG | S_IRUSR; , so bpf is also not 
affected.

Also shmem used standard vfs api, so it is not affected. I will add the 
three missing things(spufs, bpf, shmem) in my commit message.

Best Regards
Yang Xu
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-04-19  5:44 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-14  7:57 [PATCH v2 1/3] vfs: Add inode_sgid_strip() api Yang Xu
2022-04-14  7:57 ` [PATCH v2 2/3] vfs: strip file's S_ISGID mode on vfs instead of on underlying filesystem Yang Xu
2022-04-14 12:45   ` Christian Brauner
2022-04-15  3:14     ` xuyang2018.jy
2022-04-15  9:06       ` xuyang2018.jy
2022-04-15 14:03         ` Christian Brauner
2022-04-15 14:02       ` Christian Brauner
2022-04-19  5:44         ` xuyang2018.jy
2022-04-14  7:57 ` [PATCH v2 3/3] ceph: Remove S_ISGID clear code in ceph_finish_async_create Yang Xu
2022-04-14 12:02 ` [PATCH v2 1/3] vfs: Add inode_sgid_strip() api Christian Brauner
2022-04-15  1:39   ` xuyang2018.jy
2022-04-14 15:57 ` [Ocfs2-devel] " Darrick J. Wong via Ocfs2-devel
2022-04-14 15:57   ` Darrick J. Wong
2022-04-15  1:18   ` xuyang2018.jy
2022-04-15  1:40     ` Darrick J. Wong
2022-04-15  1:40       ` [Ocfs2-devel] " Darrick J. Wong via Ocfs2-devel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.