All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip
  2022-04-25  3:09 [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip Yang Xu
@ 2022-04-25  2:45 ` Matthew Wilcox
  2022-04-25  3:08   ` xuyang2018.jy
  2022-04-25  3:09 ` [PATCH v6 2/4] fs: Add missing umask strip in vfs_tmpfile Yang Xu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2022-04-25  2:45 UTC (permalink / raw)
  To: Yang Xu; +Cc: linux-fsdevel, ceph-devel, viro, david, djwong, brauner, jlayton

On Mon, Apr 25, 2022 at 11:09:38AM +0800, Yang Xu wrote:
> This has no functional change. Just create and export inode_sgid_strip
> api for the subsequent patch. This function is used to strip inode's
> S_ISGID mode when init a new inode.

Why would you call this inode_sgid_strip() instead of
inode_strip_sgid()?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip
  2022-04-25  2:45 ` Matthew Wilcox
@ 2022-04-25  3:08   ` xuyang2018.jy
  2022-04-25 11:29     ` Christian Brauner
  0 siblings, 1 reply; 13+ messages in thread
From: xuyang2018.jy @ 2022-04-25  3:08 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-fsdevel, ceph-devel, viro, david, djwong, brauner, jlayton

on 2022/4/25 10:45, Matthew Wilcox wrote:
> On Mon, Apr 25, 2022 at 11:09:38AM +0800, Yang Xu wrote:
>> This has no functional change. Just create and export inode_sgid_strip
>> api for the subsequent patch. This function is used to strip inode's
>> S_ISGID mode when init a new inode.
>
> Why would you call this inode_sgid_strip() instead of
> inode_strip_sgid()?

Because I treated "inode sgid(inode's sgid)" as a whole.

inode_strip_sgid sounds also ok, but now seems strip_inode_sgid seem 
more clear because we strip inode sgid depend on not only inode's 
condition but also depend on parent directory's condition.

What do you think about this?

ps: I can aceept the above several way, so if you insist, I can change 
it to inode_strip_sgid.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip
@ 2022-04-25  3:09 Yang Xu
  2022-04-25  2:45 ` Matthew Wilcox
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Yang Xu @ 2022-04-25  3:09 UTC (permalink / raw)
  To: linux-fsdevel, ceph-devel
  Cc: viro, david, djwong, brauner, willy, jlayton, Yang Xu

This has no functional change. Just create and export inode_sgid_strip
api for the subsequent patch. This function is used to strip inode's
S_ISGID mode when init a new inode.

Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
---
 fs/inode.c         | 37 +++++++++++++++++++++++++++++++++----
 include/linux/fs.h |  2 ++
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 9d9b422504d1..78e7ef567e04 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2246,10 +2246,8 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
 		/* Directories are special, and always inherit S_ISGID */
 		if (S_ISDIR(mode))
 			mode |= S_ISGID;
-		else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
-			 !in_group_p(i_gid_into_mnt(mnt_userns, dir)) &&
-			 !capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
-			mode &= ~S_ISGID;
+		else
+			mode = inode_sgid_strip(mnt_userns, dir, mode);
 	} else
 		inode_fsgid_set(inode, mnt_userns);
 	inode->i_mode = mode;
@@ -2405,3 +2403,34 @@ struct timespec64 current_time(struct inode *inode)
 	return timestamp_truncate(now, inode);
 }
 EXPORT_SYMBOL(current_time);
+
+/**
+ * inode_sgid_strip - handle the sgid bit for non-directories
+ * @mnt_userns: User namespace of the mount the inode was created from
+ * @dir: parent directory inode
+ * @mode: mode of the file to be created in @dir
+ *
+ * If the @mode of the new file has both the S_ISGID and S_IXGRP bit
+ * raised and @dir has the S_ISGID bit raised ensure that the caller is
+ * either in the group of the parent directory or they have CAP_FSETID
+ * in their user namespace and are privileged over the parent directory.
+ * In all other cases, strip the S_ISGID bit from @mode.
+ *
+ * Return: the new mode to use for the file
+ */
+umode_t inode_sgid_strip(struct user_namespace *mnt_userns,
+			 const struct inode *dir, umode_t mode)
+{
+	if (S_ISDIR(mode) || !dir || !(dir->i_mode & S_ISGID))
+		return mode;
+	if ((mode & (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
+		return mode;
+	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
+		return mode;
+	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
+		return mode;
+
+	mode &= ~S_ISGID;
+	return mode;
+}
+EXPORT_SYMBOL(inode_sgid_strip);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bbde95387a23..532de76c9b91 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1897,6 +1897,8 @@ extern long compat_ptr_ioctl(struct file *file, unsigned int cmd,
 void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
 		      const struct inode *dir, umode_t mode);
 extern bool may_open_dev(const struct path *path);
+umode_t inode_sgid_strip(struct user_namespace *mnt_userns,
+			 const struct inode *dir, umode_t mode);
 
 /*
  * This is the "filldir" function type, used by readdir() to let
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 2/4] fs: Add missing umask strip in vfs_tmpfile
  2022-04-25  3:09 [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip Yang Xu
  2022-04-25  2:45 ` Matthew Wilcox
@ 2022-04-25  3:09 ` Yang Xu
  2022-04-25 16:54   ` Darrick J. Wong
  2022-04-25  3:09 ` [PATCH v6 3/4] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem Yang Xu
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Yang Xu @ 2022-04-25  3:09 UTC (permalink / raw)
  To: linux-fsdevel, ceph-devel
  Cc: viro, david, djwong, brauner, willy, jlayton, Yang Xu, stable

All creation paths except for O_TMPFILE handle umask in the vfs directly
if the filesystem doesn't support or enable POSIX ACLs. If the filesystem
does then umask handling is deferred until posix_acl_create().
Because, O_TMPFILE misses umask handling in the vfs it will not honor
umask settings. Fix this by adding the missing umask handling.

Fixes: 60545d0d4610 ("[O_TMPFILE] it's still short a few helpers, but infrastructure should be OK now...")
Cc: <stable@vger.kernel.org> # 4.19+
Reported-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
---
 fs/namei.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index 509657fdf4f5..73646e28fae0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3521,6 +3521,8 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns,
 	child = d_alloc(dentry, &slash_name);
 	if (unlikely(!child))
 		goto out_err;
+	if (!IS_POSIXACL(dir))
+		mode &= ~current_umask();
 	error = dir->i_op->tmpfile(mnt_userns, dir, child, mode);
 	if (error)
 		goto out_err;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 3/4] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem
  2022-04-25  3:09 [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip Yang Xu
  2022-04-25  2:45 ` Matthew Wilcox
  2022-04-25  3:09 ` [PATCH v6 2/4] fs: Add missing umask strip in vfs_tmpfile Yang Xu
@ 2022-04-25  3:09 ` Yang Xu
  2022-04-25 16:51   ` Darrick J. Wong
  2022-04-25  3:09 ` [PATCH v6 4/4] ceph: Remove S_ISGID stripping code in ceph_finish_async_create Yang Xu
  2022-04-25 16:53 ` [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip Darrick J. Wong
  4 siblings, 1 reply; 13+ messages in thread
From: Yang Xu @ 2022-04-25  3:09 UTC (permalink / raw)
  To: linux-fsdevel, ceph-devel
  Cc: viro, david, djwong, brauner, willy, jlayton, Yang Xu

Currently, vfs only passes mode argument to filesystem, then use inode_init_owner()
to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call inode_init_owner
firstly, then posxi acl setup, but xfs uses the contrary order. It will
affect S_ISGID clear especially we filter S_IXGRP by umask or acl.

Regardless of which filesystem is in use, failure to strip the SGID correctly
is considered a security failure that needs to be fixed. The current VFS
infrastructure requires the filesystem to do everything right and not step on
any landmines to strip the SGID bit, when in fact it can easily be done at the
VFS and the filesystems then don't even need to be aware that the SGID needs
to be (or has been stripped) by the operation the user asked to be done.

Vfs has all the info it needs - it doesn't need the filesystems to do everything
correctly with the mode and ensuring that they order things like posix acl setup
functions correctly with inode_init_owner() to strip the SGID bit.

Just strip the SGID bit at the VFS, and then the filesystem can't get it wrong.

Also, the inode_sgid_strip() api should be used before IS_POSIXACL() because
this api may change mode.

Only the following places use inode_init_owner
"
arch/powerpc/platforms/cell/spufs/inode.c:      inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR);
arch/powerpc/platforms/cell/spufs/inode.c:      inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR);
fs/9p/vfs_inode.c:      inode_init_owner(&init_user_ns, inode, NULL, mode);
fs/bfs/dir.c:   inode_init_owner(&init_user_ns, inode, dir, mode);
fs/btrfs/inode.c:       inode_init_owner(mnt_userns, inode, dir, mode);
fs/btrfs/tests/btrfs-tests.c:   inode_init_owner(&init_user_ns, inode, NULL, S_IFREG);
fs/ext2/ialloc.c:               inode_init_owner(&init_user_ns, inode, dir, mode);
fs/ext4/ialloc.c:               inode_init_owner(mnt_userns, inode, dir, mode);
fs/f2fs/namei.c:        inode_init_owner(mnt_userns, inode, dir, mode);
fs/hfsplus/inode.c:     inode_init_owner(&init_user_ns, inode, dir, mode);
fs/hugetlbfs/inode.c:           inode_init_owner(&init_user_ns, inode, dir, mode);
fs/jfs/jfs_inode.c:     inode_init_owner(&init_user_ns, inode, parent, mode);
fs/minix/bitmap.c:      inode_init_owner(&init_user_ns, inode, dir, mode);
fs/nilfs2/inode.c:      inode_init_owner(&init_user_ns, inode, dir, mode);
fs/ntfs3/inode.c:       inode_init_owner(mnt_userns, inode, dir, mode);
fs/ocfs2/dlmfs/dlmfs.c:         inode_init_owner(&init_user_ns, inode, NULL, mode);
fs/ocfs2/dlmfs/dlmfs.c: inode_init_owner(&init_user_ns, inode, parent, mode);
fs/ocfs2/namei.c:       inode_init_owner(&init_user_ns, inode, dir, mode);
fs/omfs/inode.c:        inode_init_owner(&init_user_ns, inode, NULL, mode);
fs/overlayfs/dir.c:     inode_init_owner(&init_user_ns, inode, dentry->d_parent->d_inode, mode);
fs/ramfs/inode.c:               inode_init_owner(&init_user_ns, inode, dir, mode);
fs/reiserfs/namei.c:    inode_init_owner(&init_user_ns, inode, dir, mode);
fs/sysv/ialloc.c:       inode_init_owner(&init_user_ns, inode, dir, mode);
fs/ubifs/dir.c: inode_init_owner(&init_user_ns, inode, dir, mode);
fs/udf/ialloc.c:        inode_init_owner(&init_user_ns, inode, dir, mode);
fs/ufs/ialloc.c:        inode_init_owner(&init_user_ns, inode, dir, mode);
fs/xfs/xfs_inode.c:             inode_init_owner(mnt_userns, inode, dir, mode);
fs/zonefs/super.c:      inode_init_owner(&init_user_ns, inode, parent, S_IFDIR | 0555);
kernel/bpf/inode.c:     inode_init_owner(&init_user_ns, inode, dir, mode);
mm/shmem.c:             inode_init_owner(&init_user_ns, inode, dir, mode);
"

They are used in filesystem to init new inode function and these init inode
functions are used by following operations:
mkdir
symlink
mknod
create
tmpfile
rename

We don't care about mkdir because we don't strip SGID bit for directory except
fs.xfs.irix_sgid_inherit. But we even call prepare_mode() in do_mkdirat() since
inode_sgid_strip() will skip directories anyway. This will enforce the same
ordering for all relevant operations and it will make the code more uniform and
easier to understand by using new helper prepare_mode().

symlink and rename only use valid mode that doesn't have SGID bit.

We have added inode_sgid_strip api for the remaining operations.

In addition to the above six operations, four filesystems has a little difference
1) btrfs has btrfs_create_subvol_root to create new inode but used non SGID bit
   mode and can ignore
2) ocfs2 reflink function should add inode_sgid_strip api manually because this ioctl
   is only useful when backport reflink features to old kernels. ocfs2 still use vfs
   remap_range code to do reflink.
3) spufs which doesn't really go hrough the regular VFS callpath because it has
   separate system call spu_create, but it t only allows the creation of
   directories and only allows bits in 0777 and can ignore
4) bpf use vfs_mkobj in bpf_obj_do_pin with
   "S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask()) mode and
   use bpf_mkobj_ops in bpf_iter_link_pin_kernel with S_IFREG | S_IRUSR mode,
   so bpf is also not affected

This patch also changed grpid behaviour for ext4/xfs because the mode passed to
them may been changed by inode_sgid_strip.

Also as Christian Brauner said"
The patch itself is useful as it would move a security sensitive operation that is
currently burried in individual filesystems into the vfs layer. But it has a decent
regression  potential since it might strip filesystems that have so far relied on
getting the S_ISGID bit with a mode argument. So this needs a lot of testing and
long exposure in -next for at least one full kernel cycle."

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
---
 fs/inode.c         |  2 --
 fs/namei.c         | 22 +++++++++-------------
 fs/ocfs2/namei.c   |  1 +
 include/linux/fs.h | 11 +++++++++++
 4 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 78e7ef567e04..041c0837f248 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2246,8 +2246,6 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
 		/* Directories are special, and always inherit S_ISGID */
 		if (S_ISDIR(mode))
 			mode |= S_ISGID;
-		else
-			mode = inode_sgid_strip(mnt_userns, dir, mode);
 	} else
 		inode_fsgid_set(inode, mnt_userns);
 	inode->i_mode = mode;
diff --git a/fs/namei.c b/fs/namei.c
index 73646e28fae0..5b8e6288d503 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3287,8 +3287,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	if (open_flag & O_CREAT) {
 		if (open_flag & O_EXCL)
 			open_flag &= ~O_TRUNC;
-		if (!IS_POSIXACL(dir->d_inode))
-			mode &= ~current_umask();
+		mode = prepare_mode(mnt_userns, dir->d_inode, mode);
 		if (likely(got_write))
 			create_error = may_o_create(mnt_userns, &nd->path,
 						    dentry, mode);
@@ -3521,8 +3520,7 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns,
 	child = d_alloc(dentry, &slash_name);
 	if (unlikely(!child))
 		goto out_err;
-	if (!IS_POSIXACL(dir))
-		mode &= ~current_umask();
+	mode = prepare_mode(mnt_userns, dir, mode);
 	error = dir->i_op->tmpfile(mnt_userns, dir, child, mode);
 	if (error)
 		goto out_err;
@@ -3850,13 +3848,12 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
 	if (IS_ERR(dentry))
 		goto out1;
 
-	if (!IS_POSIXACL(path.dentry->d_inode))
-		mode &= ~current_umask();
+	mnt_userns = mnt_user_ns(path.mnt);
+	mode = prepare_mode(mnt_userns, path.dentry->d_inode, mode);
 	error = security_path_mknod(&path, dentry, mode, dev);
 	if (error)
 		goto out2;
 
-	mnt_userns = mnt_user_ns(path.mnt);
 	switch (mode & S_IFMT) {
 		case 0: case S_IFREG:
 			error = vfs_create(mnt_userns, path.dentry->d_inode,
@@ -3943,6 +3940,7 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
 	struct path path;
 	int error;
 	unsigned int lookup_flags = LOOKUP_DIRECTORY;
+	struct user_namespace *mnt_userns;
 
 retry:
 	dentry = filename_create(dfd, name, &path, lookup_flags);
@@ -3950,15 +3948,13 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
 	if (IS_ERR(dentry))
 		goto out_putname;
 
-	if (!IS_POSIXACL(path.dentry->d_inode))
-		mode &= ~current_umask();
+	mnt_userns = mnt_user_ns(path.mnt);
+	mode = prepare_mode(mnt_userns, path.dentry->d_inode, mode);
 	error = security_path_mkdir(&path, dentry, mode);
-	if (!error) {
-		struct user_namespace *mnt_userns;
-		mnt_userns = mnt_user_ns(path.mnt);
+	if (!error)
 		error = vfs_mkdir(mnt_userns, path.dentry->d_inode, dentry,
 				  mode);
-	}
+
 	done_path_create(&path, dentry);
 	if (retry_estale(error, lookup_flags)) {
 		lookup_flags |= LOOKUP_REVAL;
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index c75fd54b9185..21f3da2e66c9 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -197,6 +197,7 @@ static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode)
 	 * callers. */
 	if (S_ISDIR(mode))
 		set_nlink(inode, 2);
+	mode = inode_sgid_strip(&init_user_ns, dir, mode);
 	inode_init_owner(&init_user_ns, inode, dir, mode);
 	status = dquot_initialize(inode);
 	if (status)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 532de76c9b91..ca70cdf9c9e2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3459,6 +3459,17 @@ static inline bool dir_relax_shared(struct inode *inode)
 	return !IS_DEADDIR(inode);
 }
 
+static inline umode_t prepare_mode(struct user_namespace *mnt_userns,
+				   const struct inode *dir, umode_t mode)
+{
+	mode = inode_sgid_strip(mnt_userns, dir, mode);
+
+	if (!IS_POSIXACL(dir))
+		mode &= ~current_umask();
+
+	return mode;
+}
+
 extern bool path_noexec(const struct path *path);
 extern void inode_nohighmem(struct inode *inode);
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 4/4] ceph: Remove S_ISGID stripping code in ceph_finish_async_create
  2022-04-25  3:09 [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip Yang Xu
                   ` (2 preceding siblings ...)
  2022-04-25  3:09 ` [PATCH v6 3/4] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem Yang Xu
@ 2022-04-25  3:09 ` Yang Xu
  2022-04-25 16:53 ` [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip Darrick J. Wong
  4 siblings, 0 replies; 13+ messages in thread
From: Yang Xu @ 2022-04-25  3:09 UTC (permalink / raw)
  To: linux-fsdevel, ceph-devel
  Cc: viro, david, djwong, brauner, willy, jlayton, Yang Xu

Previous patches moved sgid stripping exclusively into the vfs. So
manual sgid stripping by the filesystem isn't needed anymore.

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Christian Brauner (Microsoft)<brauner@kernel.org>
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
---
 fs/ceph/file.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 6c9e837aa1d3..8e3b99853333 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -651,10 +651,6 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
 		/* Directories always inherit the setgid bit. */
 		if (S_ISDIR(mode))
 			mode |= S_ISGID;
-		else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
-			 !in_group_p(dir->i_gid) &&
-			 !capable_wrt_inode_uidgid(&init_user_ns, dir, CAP_FSETID))
-			mode &= ~S_ISGID;
 	} else {
 		in.gid = cpu_to_le32(from_kgid(&init_user_ns, current_fsgid()));
 	}
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip
  2022-04-25  3:08   ` xuyang2018.jy
@ 2022-04-25 11:29     ` Christian Brauner
  2022-04-25 17:33       ` Matthew Wilcox
  0 siblings, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2022-04-25 11:29 UTC (permalink / raw)
  To: xuyang2018.jy
  Cc: Matthew Wilcox, linux-fsdevel, ceph-devel, viro, david, djwong, jlayton

On Mon, Apr 25, 2022 at 03:08:36AM +0000, xuyang2018.jy@fujitsu.com wrote:
> on 2022/4/25 10:45, Matthew Wilcox wrote:
> > On Mon, Apr 25, 2022 at 11:09:38AM +0800, Yang Xu wrote:
> >> This has no functional change. Just create and export inode_sgid_strip
> >> api for the subsequent patch. This function is used to strip inode's
> >> S_ISGID mode when init a new inode.
> >
> > Why would you call this inode_sgid_strip() instead of
> > inode_strip_sgid()?
> 
> Because I treated "inode sgid(inode's sgid)" as a whole.
> 
> inode_strip_sgid sounds also ok, but now seems strip_inode_sgid seem 
> more clear because we strip inode sgid depend on not only inode's 
> condition but also depend on parent directory's condition.
> 
> What do you think about this?
> 
> ps: I can aceept the above several way, so if you insist, I can change 
> it to inode_strip_sgid.

I agree with Willy. I think inode_strip_sgid() is better. It'll be in
good company as <object>_<verb>_<what?> is pretty common:

inode_update_atime()
inode_init_once()
inode_init_owner()
inode_init_early()
inode_add_lru()
inode_needs_sync()
inode_set_flags()

Maybe mode_remove_sgid() is even better because it makes it clear that
the change happens to @mode and not @dir. But I'm fine with
inode_strip_sgid() or inode_remove_sgid() too.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/4] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem
  2022-04-25  3:09 ` [PATCH v6 3/4] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem Yang Xu
@ 2022-04-25 16:51   ` Darrick J. Wong
  2022-04-26  1:25     ` xuyang2018.jy
  0 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2022-04-25 16:51 UTC (permalink / raw)
  To: Yang Xu; +Cc: linux-fsdevel, ceph-devel, viro, david, brauner, willy, jlayton

On Mon, Apr 25, 2022 at 11:09:40AM +0800, Yang Xu wrote:
> Currently, vfs only passes mode argument to filesystem, then use inode_init_owner()
> to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call inode_init_owner
> firstly, then posxi acl setup, but xfs uses the contrary order. It will
> affect S_ISGID clear especially we filter S_IXGRP by umask or acl.
> 
> Regardless of which filesystem is in use, failure to strip the SGID correctly
> is considered a security failure that needs to be fixed. The current VFS
> infrastructure requires the filesystem to do everything right and not step on
> any landmines to strip the SGID bit, when in fact it can easily be done at the
> VFS and the filesystems then don't even need to be aware that the SGID needs
> to be (or has been stripped) by the operation the user asked to be done.
> 
> Vfs has all the info it needs - it doesn't need the filesystems to do everything
> correctly with the mode and ensuring that they order things like posix acl setup
> functions correctly with inode_init_owner() to strip the SGID bit.
> 
> Just strip the SGID bit at the VFS, and then the filesystem can't get it wrong.
> 
> Also, the inode_sgid_strip() api should be used before IS_POSIXACL() because
> this api may change mode.
> 
> Only the following places use inode_init_owner
> "
> arch/powerpc/platforms/cell/spufs/inode.c:      inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR);
> arch/powerpc/platforms/cell/spufs/inode.c:      inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR);
> fs/9p/vfs_inode.c:      inode_init_owner(&init_user_ns, inode, NULL, mode);
> fs/bfs/dir.c:   inode_init_owner(&init_user_ns, inode, dir, mode);
> fs/btrfs/inode.c:       inode_init_owner(mnt_userns, inode, dir, mode);
> fs/btrfs/tests/btrfs-tests.c:   inode_init_owner(&init_user_ns, inode, NULL, S_IFREG);
> fs/ext2/ialloc.c:               inode_init_owner(&init_user_ns, inode, dir, mode);
> fs/ext4/ialloc.c:               inode_init_owner(mnt_userns, inode, dir, mode);
> fs/f2fs/namei.c:        inode_init_owner(mnt_userns, inode, dir, mode);
> fs/hfsplus/inode.c:     inode_init_owner(&init_user_ns, inode, dir, mode);
> fs/hugetlbfs/inode.c:           inode_init_owner(&init_user_ns, inode, dir, mode);
> fs/jfs/jfs_inode.c:     inode_init_owner(&init_user_ns, inode, parent, mode);
> fs/minix/bitmap.c:      inode_init_owner(&init_user_ns, inode, dir, mode);
> fs/nilfs2/inode.c:      inode_init_owner(&init_user_ns, inode, dir, mode);
> fs/ntfs3/inode.c:       inode_init_owner(mnt_userns, inode, dir, mode);
> fs/ocfs2/dlmfs/dlmfs.c:         inode_init_owner(&init_user_ns, inode, NULL, mode);
> fs/ocfs2/dlmfs/dlmfs.c: inode_init_owner(&init_user_ns, inode, parent, mode);
> fs/ocfs2/namei.c:       inode_init_owner(&init_user_ns, inode, dir, mode);
> fs/omfs/inode.c:        inode_init_owner(&init_user_ns, inode, NULL, mode);
> fs/overlayfs/dir.c:     inode_init_owner(&init_user_ns, inode, dentry->d_parent->d_inode, mode);
> fs/ramfs/inode.c:               inode_init_owner(&init_user_ns, inode, dir, mode);
> fs/reiserfs/namei.c:    inode_init_owner(&init_user_ns, inode, dir, mode);
> fs/sysv/ialloc.c:       inode_init_owner(&init_user_ns, inode, dir, mode);
> fs/ubifs/dir.c: inode_init_owner(&init_user_ns, inode, dir, mode);
> fs/udf/ialloc.c:        inode_init_owner(&init_user_ns, inode, dir, mode);
> fs/ufs/ialloc.c:        inode_init_owner(&init_user_ns, inode, dir, mode);
> fs/xfs/xfs_inode.c:             inode_init_owner(mnt_userns, inode, dir, mode);
> fs/zonefs/super.c:      inode_init_owner(&init_user_ns, inode, parent, S_IFDIR | 0555);
> kernel/bpf/inode.c:     inode_init_owner(&init_user_ns, inode, dir, mode);
> mm/shmem.c:             inode_init_owner(&init_user_ns, inode, dir, mode);
> "
> 
> They are used in filesystem to init new inode function and these init inode
> functions are used by following operations:
> mkdir
> symlink
> mknod
> create
> tmpfile
> rename
> 
> We don't care about mkdir because we don't strip SGID bit for directory except
> fs.xfs.irix_sgid_inherit. But we even call prepare_mode() in do_mkdirat() since
> inode_sgid_strip() will skip directories anyway. This will enforce the same
> ordering for all relevant operations and it will make the code more uniform and
> easier to understand by using new helper prepare_mode().
> 
> symlink and rename only use valid mode that doesn't have SGID bit.
> 
> We have added inode_sgid_strip api for the remaining operations.
> 
> In addition to the above six operations, four filesystems has a little difference
> 1) btrfs has btrfs_create_subvol_root to create new inode but used non SGID bit
>    mode and can ignore
> 2) ocfs2 reflink function should add inode_sgid_strip api manually because this ioctl
>    is only useful when backport reflink features to old kernels. ocfs2 still use vfs
>    remap_range code to do reflink.
> 3) spufs which doesn't really go hrough the regular VFS callpath because it has
>    separate system call spu_create, but it t only allows the creation of
>    directories and only allows bits in 0777 and can ignore
> 4) bpf use vfs_mkobj in bpf_obj_do_pin with
>    "S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask()) mode and
>    use bpf_mkobj_ops in bpf_iter_link_pin_kernel with S_IFREG | S_IRUSR mode,
>    so bpf is also not affected
> 
> This patch also changed grpid behaviour for ext4/xfs because the mode passed to
> them may been changed by inode_sgid_strip.
> 
> Also as Christian Brauner said"
> The patch itself is useful as it would move a security sensitive operation that is
> currently burried in individual filesystems into the vfs layer. But it has a decent
> regression  potential since it might strip filesystems that have so far relied on
> getting the S_ISGID bit with a mode argument. So this needs a lot of testing and
> long exposure in -next for at least one full kernel cycle."
> 
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
> ---
>  fs/inode.c         |  2 --
>  fs/namei.c         | 22 +++++++++-------------
>  fs/ocfs2/namei.c   |  1 +
>  include/linux/fs.h | 11 +++++++++++
>  4 files changed, 21 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 78e7ef567e04..041c0837f248 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2246,8 +2246,6 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
>  		/* Directories are special, and always inherit S_ISGID */
>  		if (S_ISDIR(mode))
>  			mode |= S_ISGID;
> -		else
> -			mode = inode_sgid_strip(mnt_userns, dir, mode);
>  	} else
>  		inode_fsgid_set(inode, mnt_userns);
>  	inode->i_mode = mode;
> diff --git a/fs/namei.c b/fs/namei.c
> index 73646e28fae0..5b8e6288d503 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3287,8 +3287,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
>  	if (open_flag & O_CREAT) {
>  		if (open_flag & O_EXCL)
>  			open_flag &= ~O_TRUNC;
> -		if (!IS_POSIXACL(dir->d_inode))
> -			mode &= ~current_umask();
> +		mode = prepare_mode(mnt_userns, dir->d_inode, mode);
>  		if (likely(got_write))
>  			create_error = may_o_create(mnt_userns, &nd->path,
>  						    dentry, mode);
> @@ -3521,8 +3520,7 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns,
>  	child = d_alloc(dentry, &slash_name);
>  	if (unlikely(!child))
>  		goto out_err;
> -	if (!IS_POSIXACL(dir))
> -		mode &= ~current_umask();
> +	mode = prepare_mode(mnt_userns, dir, mode);
>  	error = dir->i_op->tmpfile(mnt_userns, dir, child, mode);
>  	if (error)
>  		goto out_err;
> @@ -3850,13 +3848,12 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
>  	if (IS_ERR(dentry))
>  		goto out1;
>  
> -	if (!IS_POSIXACL(path.dentry->d_inode))
> -		mode &= ~current_umask();
> +	mnt_userns = mnt_user_ns(path.mnt);
> +	mode = prepare_mode(mnt_userns, path.dentry->d_inode, mode);
>  	error = security_path_mknod(&path, dentry, mode, dev);
>  	if (error)
>  		goto out2;
>  
> -	mnt_userns = mnt_user_ns(path.mnt);
>  	switch (mode & S_IFMT) {
>  		case 0: case S_IFREG:
>  			error = vfs_create(mnt_userns, path.dentry->d_inode,
> @@ -3943,6 +3940,7 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
>  	struct path path;
>  	int error;
>  	unsigned int lookup_flags = LOOKUP_DIRECTORY;
> +	struct user_namespace *mnt_userns;
>  
>  retry:
>  	dentry = filename_create(dfd, name, &path, lookup_flags);
> @@ -3950,15 +3948,13 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
>  	if (IS_ERR(dentry))
>  		goto out_putname;
>  
> -	if (!IS_POSIXACL(path.dentry->d_inode))
> -		mode &= ~current_umask();
> +	mnt_userns = mnt_user_ns(path.mnt);
> +	mode = prepare_mode(mnt_userns, path.dentry->d_inode, mode);
>  	error = security_path_mkdir(&path, dentry, mode);
> -	if (!error) {
> -		struct user_namespace *mnt_userns;
> -		mnt_userns = mnt_user_ns(path.mnt);
> +	if (!error)
>  		error = vfs_mkdir(mnt_userns, path.dentry->d_inode, dentry,
>  				  mode);
> -	}
> +
>  	done_path_create(&path, dentry);
>  	if (retry_estale(error, lookup_flags)) {
>  		lookup_flags |= LOOKUP_REVAL;
> diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
> index c75fd54b9185..21f3da2e66c9 100644
> --- a/fs/ocfs2/namei.c
> +++ b/fs/ocfs2/namei.c
> @@ -197,6 +197,7 @@ static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode)
>  	 * callers. */
>  	if (S_ISDIR(mode))
>  		set_nlink(inode, 2);
> +	mode = inode_sgid_strip(&init_user_ns, dir, mode);
>  	inode_init_owner(&init_user_ns, inode, dir, mode);
>  	status = dquot_initialize(inode);
>  	if (status)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 532de76c9b91..ca70cdf9c9e2 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3459,6 +3459,17 @@ static inline bool dir_relax_shared(struct inode *inode)
>  	return !IS_DEADDIR(inode);
>  }
>  
> +static inline umode_t prepare_mode(struct user_namespace *mnt_userns,

You probably ought to make this more obviously a *file* mode preparation
function by naming it "vfs_prepare_mode" or something.

--D

> +				   const struct inode *dir, umode_t mode)
> +{
> +	mode = inode_sgid_strip(mnt_userns, dir, mode);
> +
> +	if (!IS_POSIXACL(dir))
> +		mode &= ~current_umask();
> +
> +	return mode;
> +}
> +
>  extern bool path_noexec(const struct path *path);
>  extern void inode_nohighmem(struct inode *inode);
>  
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip
  2022-04-25  3:09 [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip Yang Xu
                   ` (3 preceding siblings ...)
  2022-04-25  3:09 ` [PATCH v6 4/4] ceph: Remove S_ISGID stripping code in ceph_finish_async_create Yang Xu
@ 2022-04-25 16:53 ` Darrick J. Wong
  4 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2022-04-25 16:53 UTC (permalink / raw)
  To: Yang Xu; +Cc: linux-fsdevel, ceph-devel, viro, david, brauner, willy, jlayton

On Mon, Apr 25, 2022 at 11:09:38AM +0800, Yang Xu wrote:
> This has no functional change. Just create and export inode_sgid_strip
> api for the subsequent patch. This function is used to strip inode's
> S_ISGID mode when init a new inode.
> 
> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>

I've a slight preference for inode_strip_sgid() as well, but otherwise
this looks like a reasonable refactoring.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/inode.c         | 37 +++++++++++++++++++++++++++++++++----
>  include/linux/fs.h |  2 ++
>  2 files changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 9d9b422504d1..78e7ef567e04 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2246,10 +2246,8 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
>  		/* Directories are special, and always inherit S_ISGID */
>  		if (S_ISDIR(mode))
>  			mode |= S_ISGID;
> -		else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
> -			 !in_group_p(i_gid_into_mnt(mnt_userns, dir)) &&
> -			 !capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
> -			mode &= ~S_ISGID;
> +		else
> +			mode = inode_sgid_strip(mnt_userns, dir, mode);
>  	} else
>  		inode_fsgid_set(inode, mnt_userns);
>  	inode->i_mode = mode;
> @@ -2405,3 +2403,34 @@ struct timespec64 current_time(struct inode *inode)
>  	return timestamp_truncate(now, inode);
>  }
>  EXPORT_SYMBOL(current_time);
> +
> +/**
> + * inode_sgid_strip - handle the sgid bit for non-directories
> + * @mnt_userns: User namespace of the mount the inode was created from
> + * @dir: parent directory inode
> + * @mode: mode of the file to be created in @dir
> + *
> + * If the @mode of the new file has both the S_ISGID and S_IXGRP bit
> + * raised and @dir has the S_ISGID bit raised ensure that the caller is
> + * either in the group of the parent directory or they have CAP_FSETID
> + * in their user namespace and are privileged over the parent directory.
> + * In all other cases, strip the S_ISGID bit from @mode.
> + *
> + * Return: the new mode to use for the file
> + */
> +umode_t inode_sgid_strip(struct user_namespace *mnt_userns,
> +			 const struct inode *dir, umode_t mode)
> +{
> +	if (S_ISDIR(mode) || !dir || !(dir->i_mode & S_ISGID))
> +		return mode;
> +	if ((mode & (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
> +		return mode;
> +	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
> +		return mode;
> +	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
> +		return mode;
> +
> +	mode &= ~S_ISGID;
> +	return mode;
> +}
> +EXPORT_SYMBOL(inode_sgid_strip);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index bbde95387a23..532de76c9b91 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1897,6 +1897,8 @@ extern long compat_ptr_ioctl(struct file *file, unsigned int cmd,
>  void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
>  		      const struct inode *dir, umode_t mode);
>  extern bool may_open_dev(const struct path *path);
> +umode_t inode_sgid_strip(struct user_namespace *mnt_userns,
> +			 const struct inode *dir, umode_t mode);
>  
>  /*
>   * This is the "filldir" function type, used by readdir() to let
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 2/4] fs: Add missing umask strip in vfs_tmpfile
  2022-04-25  3:09 ` [PATCH v6 2/4] fs: Add missing umask strip in vfs_tmpfile Yang Xu
@ 2022-04-25 16:54   ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2022-04-25 16:54 UTC (permalink / raw)
  To: Yang Xu
  Cc: linux-fsdevel, ceph-devel, viro, david, brauner, willy, jlayton, stable

On Mon, Apr 25, 2022 at 11:09:39AM +0800, Yang Xu wrote:
> All creation paths except for O_TMPFILE handle umask in the vfs directly
> if the filesystem doesn't support or enable POSIX ACLs. If the filesystem
> does then umask handling is deferred until posix_acl_create().
> Because, O_TMPFILE misses umask handling in the vfs it will not honor
> umask settings. Fix this by adding the missing umask handling.
> 
> Fixes: 60545d0d4610 ("[O_TMPFILE] it's still short a few helpers, but infrastructure should be OK now...")

If I had a nickel for every time I felt like I was short a few
helpers... ;)

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> Cc: <stable@vger.kernel.org> # 4.19+
> Reported-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
> ---
>  fs/namei.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 509657fdf4f5..73646e28fae0 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3521,6 +3521,8 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns,
>  	child = d_alloc(dentry, &slash_name);
>  	if (unlikely(!child))
>  		goto out_err;
> +	if (!IS_POSIXACL(dir))
> +		mode &= ~current_umask();
>  	error = dir->i_op->tmpfile(mnt_userns, dir, child, mode);
>  	if (error)
>  		goto out_err;
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip
  2022-04-25 11:29     ` Christian Brauner
@ 2022-04-25 17:33       ` Matthew Wilcox
  2022-04-26  1:22         ` xuyang2018.jy
  0 siblings, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2022-04-25 17:33 UTC (permalink / raw)
  To: Christian Brauner
  Cc: xuyang2018.jy, linux-fsdevel, ceph-devel, viro, david, djwong, jlayton

On Mon, Apr 25, 2022 at 01:29:47PM +0200, Christian Brauner wrote:
> On Mon, Apr 25, 2022 at 03:08:36AM +0000, xuyang2018.jy@fujitsu.com wrote:
> > on 2022/4/25 10:45, Matthew Wilcox wrote:
> > > On Mon, Apr 25, 2022 at 11:09:38AM +0800, Yang Xu wrote:
> > >> This has no functional change. Just create and export inode_sgid_strip
> > >> api for the subsequent patch. This function is used to strip inode's
> > >> S_ISGID mode when init a new inode.
> > >
> > > Why would you call this inode_sgid_strip() instead of
> > > inode_strip_sgid()?
> > 
> > Because I treated "inode sgid(inode's sgid)" as a whole.
> > 
> > inode_strip_sgid sounds also ok, but now seems strip_inode_sgid seem 
> > more clear because we strip inode sgid depend on not only inode's 
> > condition but also depend on parent directory's condition.
> > 
> > What do you think about this?
> > 
> > ps: I can aceept the above several way, so if you insist, I can change 
> > it to inode_strip_sgid.
> 
> I agree with Willy. I think inode_strip_sgid() is better. It'll be in
> good company as <object>_<verb>_<what?> is pretty common:
> 
> inode_update_atime()
> inode_init_once()
> inode_init_owner()
> inode_init_early()
> inode_add_lru()
> inode_needs_sync()
> inode_set_flags()
> 
> Maybe mode_remove_sgid() is even better because it makes it clear that
> the change happens to @mode and not @dir. But I'm fine with
> inode_strip_sgid() or inode_remove_sgid() too.

Oh!  Yes, mode_strip_sgid() is better.  We're operating on the mode,
not the inode.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip
  2022-04-25 17:33       ` Matthew Wilcox
@ 2022-04-26  1:22         ` xuyang2018.jy
  0 siblings, 0 replies; 13+ messages in thread
From: xuyang2018.jy @ 2022-04-26  1:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christian Brauner, linux-fsdevel, ceph-devel, viro, david,
	djwong, jlayton

on 2022/4/26 1:33, Matthew Wilcox wrote:
> On Mon, Apr 25, 2022 at 01:29:47PM +0200, Christian Brauner wrote:
>> On Mon, Apr 25, 2022 at 03:08:36AM +0000, xuyang2018.jy@fujitsu.com wrote:
>>> on 2022/4/25 10:45, Matthew Wilcox wrote:
>>>> On Mon, Apr 25, 2022 at 11:09:38AM +0800, Yang Xu wrote:
>>>>> This has no functional change. Just create and export inode_sgid_strip
>>>>> api for the subsequent patch. This function is used to strip inode's
>>>>> S_ISGID mode when init a new inode.
>>>>
>>>> Why would you call this inode_sgid_strip() instead of
>>>> inode_strip_sgid()?
>>>
>>> Because I treated "inode sgid(inode's sgid)" as a whole.
>>>
>>> inode_strip_sgid sounds also ok, but now seems strip_inode_sgid seem
>>> more clear because we strip inode sgid depend on not only inode's
>>> condition but also depend on parent directory's condition.
>>>
>>> What do you think about this?
>>>
>>> ps: I can aceept the above several way, so if you insist, I can change
>>> it to inode_strip_sgid.
>>
>> I agree with Willy. I think inode_strip_sgid() is better. It'll be in
>> good company as<object>_<verb>_<what?>  is pretty common:
>>
>> inode_update_atime()
>> inode_init_once()
>> inode_init_owner()
>> inode_init_early()
>> inode_add_lru()
>> inode_needs_sync()
>> inode_set_flags()
>>
>> Maybe mode_remove_sgid() is even better because it makes it clear that
>> the change happens to @mode and not @dir. But I'm fine with
>> inode_strip_sgid() or inode_remove_sgid() too.
>
> Oh!  Yes, mode_strip_sgid() is better.  We're operating on the mode,
> not the inode.

OK, I will use mode_strip_sgid().


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/4] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem
  2022-04-25 16:51   ` Darrick J. Wong
@ 2022-04-26  1:25     ` xuyang2018.jy
  0 siblings, 0 replies; 13+ messages in thread
From: xuyang2018.jy @ 2022-04-26  1:25 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-fsdevel, ceph-devel, viro, david, brauner, willy, jlayton

on 2022/4/26 0:51, Darrick J. Wong wrote:
> On Mon, Apr 25, 2022 at 11:09:40AM +0800, Yang Xu wrote:
>> Currently, vfs only passes mode argument to filesystem, then use inode_init_owner()
>> to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call inode_init_owner
>> firstly, then posxi acl setup, but xfs uses the contrary order. It will
>> affect S_ISGID clear especially we filter S_IXGRP by umask or acl.
>>
>> Regardless of which filesystem is in use, failure to strip the SGID correctly
>> is considered a security failure that needs to be fixed. The current VFS
>> infrastructure requires the filesystem to do everything right and not step on
>> any landmines to strip the SGID bit, when in fact it can easily be done at the
>> VFS and the filesystems then don't even need to be aware that the SGID needs
>> to be (or has been stripped) by the operation the user asked to be done.
>>
>> Vfs has all the info it needs - it doesn't need the filesystems to do everything
>> correctly with the mode and ensuring that they order things like posix acl setup
>> functions correctly with inode_init_owner() to strip the SGID bit.
>>
>> Just strip the SGID bit at the VFS, and then the filesystem can't get it wrong.
>>
>> Also, the inode_sgid_strip() api should be used before IS_POSIXACL() because
>> this api may change mode.
>>
>> Only the following places use inode_init_owner
>> "
>> arch/powerpc/platforms/cell/spufs/inode.c:      inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR);
>> arch/powerpc/platforms/cell/spufs/inode.c:      inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR);
>> fs/9p/vfs_inode.c:      inode_init_owner(&init_user_ns, inode, NULL, mode);
>> fs/bfs/dir.c:   inode_init_owner(&init_user_ns, inode, dir, mode);
>> fs/btrfs/inode.c:       inode_init_owner(mnt_userns, inode, dir, mode);
>> fs/btrfs/tests/btrfs-tests.c:   inode_init_owner(&init_user_ns, inode, NULL, S_IFREG);
>> fs/ext2/ialloc.c:               inode_init_owner(&init_user_ns, inode, dir, mode);
>> fs/ext4/ialloc.c:               inode_init_owner(mnt_userns, inode, dir, mode);
>> fs/f2fs/namei.c:        inode_init_owner(mnt_userns, inode, dir, mode);
>> fs/hfsplus/inode.c:     inode_init_owner(&init_user_ns, inode, dir, mode);
>> fs/hugetlbfs/inode.c:           inode_init_owner(&init_user_ns, inode, dir, mode);
>> fs/jfs/jfs_inode.c:     inode_init_owner(&init_user_ns, inode, parent, mode);
>> fs/minix/bitmap.c:      inode_init_owner(&init_user_ns, inode, dir, mode);
>> fs/nilfs2/inode.c:      inode_init_owner(&init_user_ns, inode, dir, mode);
>> fs/ntfs3/inode.c:       inode_init_owner(mnt_userns, inode, dir, mode);
>> fs/ocfs2/dlmfs/dlmfs.c:         inode_init_owner(&init_user_ns, inode, NULL, mode);
>> fs/ocfs2/dlmfs/dlmfs.c: inode_init_owner(&init_user_ns, inode, parent, mode);
>> fs/ocfs2/namei.c:       inode_init_owner(&init_user_ns, inode, dir, mode);
>> fs/omfs/inode.c:        inode_init_owner(&init_user_ns, inode, NULL, mode);
>> fs/overlayfs/dir.c:     inode_init_owner(&init_user_ns, inode, dentry->d_parent->d_inode, mode);
>> fs/ramfs/inode.c:               inode_init_owner(&init_user_ns, inode, dir, mode);
>> fs/reiserfs/namei.c:    inode_init_owner(&init_user_ns, inode, dir, mode);
>> fs/sysv/ialloc.c:       inode_init_owner(&init_user_ns, inode, dir, mode);
>> fs/ubifs/dir.c: inode_init_owner(&init_user_ns, inode, dir, mode);
>> fs/udf/ialloc.c:        inode_init_owner(&init_user_ns, inode, dir, mode);
>> fs/ufs/ialloc.c:        inode_init_owner(&init_user_ns, inode, dir, mode);
>> fs/xfs/xfs_inode.c:             inode_init_owner(mnt_userns, inode, dir, mode);
>> fs/zonefs/super.c:      inode_init_owner(&init_user_ns, inode, parent, S_IFDIR | 0555);
>> kernel/bpf/inode.c:     inode_init_owner(&init_user_ns, inode, dir, mode);
>> mm/shmem.c:             inode_init_owner(&init_user_ns, inode, dir, mode);
>> "
>>
>> They are used in filesystem to init new inode function and these init inode
>> functions are used by following operations:
>> mkdir
>> symlink
>> mknod
>> create
>> tmpfile
>> rename
>>
>> We don't care about mkdir because we don't strip SGID bit for directory except
>> fs.xfs.irix_sgid_inherit. But we even call prepare_mode() in do_mkdirat() since
>> inode_sgid_strip() will skip directories anyway. This will enforce the same
>> ordering for all relevant operations and it will make the code more uniform and
>> easier to understand by using new helper prepare_mode().
>>
>> symlink and rename only use valid mode that doesn't have SGID bit.
>>
>> We have added inode_sgid_strip api for the remaining operations.
>>
>> In addition to the above six operations, four filesystems has a little difference
>> 1) btrfs has btrfs_create_subvol_root to create new inode but used non SGID bit
>>     mode and can ignore
>> 2) ocfs2 reflink function should add inode_sgid_strip api manually because this ioctl
>>     is only useful when backport reflink features to old kernels. ocfs2 still use vfs
>>     remap_range code to do reflink.
>> 3) spufs which doesn't really go hrough the regular VFS callpath because it has
>>     separate system call spu_create, but it t only allows the creation of
>>     directories and only allows bits in 0777 and can ignore
>> 4) bpf use vfs_mkobj in bpf_obj_do_pin with
>>     "S_IFREG | ((S_IRUSR | S_IWUSR)&  ~current_umask()) mode and
>>     use bpf_mkobj_ops in bpf_iter_link_pin_kernel with S_IFREG | S_IRUSR mode,
>>     so bpf is also not affected
>>
>> This patch also changed grpid behaviour for ext4/xfs because the mode passed to
>> them may been changed by inode_sgid_strip.
>>
>> Also as Christian Brauner said"
>> The patch itself is useful as it would move a security sensitive operation that is
>> currently burried in individual filesystems into the vfs layer. But it has a decent
>> regression  potential since it might strip filesystems that have so far relied on
>> getting the S_ISGID bit with a mode argument. So this needs a lot of testing and
>> long exposure in -next for at least one full kernel cycle."
>>
>> Suggested-by: Dave Chinner<david@fromorbit.com>
>> Signed-off-by: Yang Xu<xuyang2018.jy@fujitsu.com>
>> ---
>>   fs/inode.c         |  2 --
>>   fs/namei.c         | 22 +++++++++-------------
>>   fs/ocfs2/namei.c   |  1 +
>>   include/linux/fs.h | 11 +++++++++++
>>   4 files changed, 21 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/inode.c b/fs/inode.c
>> index 78e7ef567e04..041c0837f248 100644
>> --- a/fs/inode.c
>> +++ b/fs/inode.c
>> @@ -2246,8 +2246,6 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
>>   		/* Directories are special, and always inherit S_ISGID */
>>   		if (S_ISDIR(mode))
>>   			mode |= S_ISGID;
>> -		else
>> -			mode = inode_sgid_strip(mnt_userns, dir, mode);
>>   	} else
>>   		inode_fsgid_set(inode, mnt_userns);
>>   	inode->i_mode = mode;
>> diff --git a/fs/namei.c b/fs/namei.c
>> index 73646e28fae0..5b8e6288d503 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -3287,8 +3287,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
>>   	if (open_flag&  O_CREAT) {
>>   		if (open_flag&  O_EXCL)
>>   			open_flag&= ~O_TRUNC;
>> -		if (!IS_POSIXACL(dir->d_inode))
>> -			mode&= ~current_umask();
>> +		mode = prepare_mode(mnt_userns, dir->d_inode, mode);
>>   		if (likely(got_write))
>>   			create_error = may_o_create(mnt_userns,&nd->path,
>>   						    dentry, mode);
>> @@ -3521,8 +3520,7 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns,
>>   	child = d_alloc(dentry,&slash_name);
>>   	if (unlikely(!child))
>>   		goto out_err;
>> -	if (!IS_POSIXACL(dir))
>> -		mode&= ~current_umask();
>> +	mode = prepare_mode(mnt_userns, dir, mode);
>>   	error = dir->i_op->tmpfile(mnt_userns, dir, child, mode);
>>   	if (error)
>>   		goto out_err;
>> @@ -3850,13 +3848,12 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
>>   	if (IS_ERR(dentry))
>>   		goto out1;
>>
>> -	if (!IS_POSIXACL(path.dentry->d_inode))
>> -		mode&= ~current_umask();
>> +	mnt_userns = mnt_user_ns(path.mnt);
>> +	mode = prepare_mode(mnt_userns, path.dentry->d_inode, mode);
>>   	error = security_path_mknod(&path, dentry, mode, dev);
>>   	if (error)
>>   		goto out2;
>>
>> -	mnt_userns = mnt_user_ns(path.mnt);
>>   	switch (mode&  S_IFMT) {
>>   		case 0: case S_IFREG:
>>   			error = vfs_create(mnt_userns, path.dentry->d_inode,
>> @@ -3943,6 +3940,7 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
>>   	struct path path;
>>   	int error;
>>   	unsigned int lookup_flags = LOOKUP_DIRECTORY;
>> +	struct user_namespace *mnt_userns;
>>
>>   retry:
>>   	dentry = filename_create(dfd, name,&path, lookup_flags);
>> @@ -3950,15 +3948,13 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
>>   	if (IS_ERR(dentry))
>>   		goto out_putname;
>>
>> -	if (!IS_POSIXACL(path.dentry->d_inode))
>> -		mode&= ~current_umask();
>> +	mnt_userns = mnt_user_ns(path.mnt);
>> +	mode = prepare_mode(mnt_userns, path.dentry->d_inode, mode);
>>   	error = security_path_mkdir(&path, dentry, mode);
>> -	if (!error) {
>> -		struct user_namespace *mnt_userns;
>> -		mnt_userns = mnt_user_ns(path.mnt);
>> +	if (!error)
>>   		error = vfs_mkdir(mnt_userns, path.dentry->d_inode, dentry,
>>   				  mode);
>> -	}
>> +
>>   	done_path_create(&path, dentry);
>>   	if (retry_estale(error, lookup_flags)) {
>>   		lookup_flags |= LOOKUP_REVAL;
>> diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
>> index c75fd54b9185..21f3da2e66c9 100644
>> --- a/fs/ocfs2/namei.c
>> +++ b/fs/ocfs2/namei.c
>> @@ -197,6 +197,7 @@ static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode)
>>   	 * callers. */
>>   	if (S_ISDIR(mode))
>>   		set_nlink(inode, 2);
>> +	mode = inode_sgid_strip(&init_user_ns, dir, mode);
>>   	inode_init_owner(&init_user_ns, inode, dir, mode);
>>   	status = dquot_initialize(inode);
>>   	if (status)
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 532de76c9b91..ca70cdf9c9e2 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -3459,6 +3459,17 @@ static inline bool dir_relax_shared(struct inode *inode)
>>   	return !IS_DEADDIR(inode);
>>   }
>>
>> +static inline umode_t prepare_mode(struct user_namespace *mnt_userns,
>
> You probably ought to make this more obviously a *file* mode preparation
> function by naming it "vfs_prepare_mode" or something.

I will use vfs_prepare_mode().

>
> --D
>
>> +				   const struct inode *dir, umode_t mode)
>> +{
>> +	mode = inode_sgid_strip(mnt_userns, dir, mode);
>> +
>> +	if (!IS_POSIXACL(dir))
>> +		mode&= ~current_umask();
>> +
>> +	return mode;
>> +}
>> +
>>   extern bool path_noexec(const struct path *path);
>>   extern void inode_nohighmem(struct inode *inode);
>>
>> --
>> 2.27.0
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-04-26  1:25 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-25  3:09 [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip Yang Xu
2022-04-25  2:45 ` Matthew Wilcox
2022-04-25  3:08   ` xuyang2018.jy
2022-04-25 11:29     ` Christian Brauner
2022-04-25 17:33       ` Matthew Wilcox
2022-04-26  1:22         ` xuyang2018.jy
2022-04-25  3:09 ` [PATCH v6 2/4] fs: Add missing umask strip in vfs_tmpfile Yang Xu
2022-04-25 16:54   ` Darrick J. Wong
2022-04-25  3:09 ` [PATCH v6 3/4] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem Yang Xu
2022-04-25 16:51   ` Darrick J. Wong
2022-04-26  1:25     ` xuyang2018.jy
2022-04-25  3:09 ` [PATCH v6 4/4] ceph: Remove S_ISGID stripping code in ceph_finish_async_create Yang Xu
2022-04-25 16:53 ` [PATCH v6 1/4] fs: move sgid strip operation from inode_init_owner into inode_sgid_strip Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.