* [PATCH v7 1/4] fs: move sgid stripping operation from inode_init_owner into mode_strip_sgid @ 2022-04-26 4:19 Yang Xu 2022-04-26 4:19 ` [PATCH v7 2/4] fs: Add missing umask strip in vfs_tmpfile Yang Xu ` (3 more replies) 0 siblings, 4 replies; 12+ messages in thread From: Yang Xu @ 2022-04-26 4:19 UTC (permalink / raw) To: linux-fsdevel, ceph-devel Cc: viro, david, djwong, brauner, willy, jlayton, Yang Xu This has no functional change. Just create and export mode_strip_sgid api for the subsequent patch. This function is used to strip S_ISGID mode when init a new inode. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> --- fs/inode.c | 37 +++++++++++++++++++++++++++++++++---- include/linux/fs.h | 2 ++ 2 files changed, 35 insertions(+), 4 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 9d9b422504d1..e9a5f2ec2f89 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -2246,10 +2246,8 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode, /* Directories are special, and always inherit S_ISGID */ if (S_ISDIR(mode)) mode |= S_ISGID; - else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) && - !in_group_p(i_gid_into_mnt(mnt_userns, dir)) && - !capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID)) - mode &= ~S_ISGID; + else + mode = mode_strip_sgid(mnt_userns, dir, mode); } else inode_fsgid_set(inode, mnt_userns); inode->i_mode = mode; @@ -2405,3 +2403,34 @@ struct timespec64 current_time(struct inode *inode) return timestamp_truncate(now, inode); } EXPORT_SYMBOL(current_time); + +/** + * mode_strip_sgid - handle the sgid bit for non-directories + * @mnt_userns: User namespace of the mount the inode was created from + * @dir: parent directory inode + * @mode: mode of the file to be created in @dir + * + * If the @mode of the new file has both the S_ISGID and S_IXGRP bit + * raised and @dir has the S_ISGID bit raised ensure that the caller is + * either in the group of the parent directory or they have CAP_FSETID + * in their user namespace and are privileged over the parent directory. + * In all other cases, strip the S_ISGID bit from @mode. + * + * Return: the new mode to use for the file + */ +umode_t mode_strip_sgid(struct user_namespace *mnt_userns, + const struct inode *dir, umode_t mode) +{ + if (S_ISDIR(mode) || !dir || !(dir->i_mode & S_ISGID)) + return mode; + if ((mode & (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP)) + return mode; + if (in_group_p(i_gid_into_mnt(mnt_userns, dir))) + return mode; + if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID)) + return mode; + + mode &= ~S_ISGID; + return mode; +} +EXPORT_SYMBOL(mode_strip_sgid); diff --git a/include/linux/fs.h b/include/linux/fs.h index bbde95387a23..98b44a2732f5 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1897,6 +1897,8 @@ extern long compat_ptr_ioctl(struct file *file, unsigned int cmd, void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode, const struct inode *dir, umode_t mode); extern bool may_open_dev(const struct path *path); +umode_t mode_strip_sgid(struct user_namespace *mnt_userns, + const struct inode *dir, umode_t mode); /* * This is the "filldir" function type, used by readdir() to let -- 2.27.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v7 2/4] fs: Add missing umask strip in vfs_tmpfile 2022-04-26 4:19 [PATCH v7 1/4] fs: move sgid stripping operation from inode_init_owner into mode_strip_sgid Yang Xu @ 2022-04-26 4:19 ` Yang Xu 2022-04-26 4:19 ` [PATCH v7 3/4] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem Yang Xu ` (2 subsequent siblings) 3 siblings, 0 replies; 12+ messages in thread From: Yang Xu @ 2022-04-26 4:19 UTC (permalink / raw) To: linux-fsdevel, ceph-devel Cc: viro, david, djwong, brauner, willy, jlayton, Yang Xu, stable All creation paths except for O_TMPFILE handle umask in the vfs directly if the filesystem doesn't support or enable POSIX ACLs. If the filesystem does then umask handling is deferred until posix_acl_create(). Because, O_TMPFILE misses umask handling in the vfs it will not honor umask settings. Fix this by adding the missing umask handling. Fixes: 60545d0d4610 ("[O_TMPFILE] it's still short a few helpers, but infrastructure should be OK now...") Cc: <stable@vger.kernel.org> # 4.19+ Reported-by: Christian Brauner (Microsoft) <brauner@kernel.org> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> --- fs/namei.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/namei.c b/fs/namei.c index 509657fdf4f5..73646e28fae0 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3521,6 +3521,8 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns, child = d_alloc(dentry, &slash_name); if (unlikely(!child)) goto out_err; + if (!IS_POSIXACL(dir)) + mode &= ~current_umask(); error = dir->i_op->tmpfile(mnt_userns, dir, child, mode); if (error) goto out_err; -- 2.27.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v7 3/4] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem 2022-04-26 4:19 [PATCH v7 1/4] fs: move sgid stripping operation from inode_init_owner into mode_strip_sgid Yang Xu 2022-04-26 4:19 ` [PATCH v7 2/4] fs: Add missing umask strip in vfs_tmpfile Yang Xu @ 2022-04-26 4:19 ` Yang Xu 2022-04-26 5:21 ` Darrick J. Wong 2022-04-26 4:19 ` [PATCH v7 4/4] ceph: Remove S_ISGID stripping code in ceph_finish_async_create Yang Xu 2022-04-26 7:06 ` [PATCH v7 1/4] fs: move sgid stripping operation from inode_init_owner into mode_strip_sgid Christian Brauner 3 siblings, 1 reply; 12+ messages in thread From: Yang Xu @ 2022-04-26 4:19 UTC (permalink / raw) To: linux-fsdevel, ceph-devel Cc: viro, david, djwong, brauner, willy, jlayton, Yang Xu Currently, vfs only passes mode parameter to filesystem, then use inode_init_owner() to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call inode_init_owner firstly, then posxi acl setup, but xfs uses the contrary order. It will affect S_ISGID clear especially we filter S_IXGRP by umask or acl. Regardless of which filesystem is in use, failure to strip the SGID correctly is considered a security failure that needs to be fixed. The current VFS infrastructure requires the filesystem to do everything right and not step on any landmines to strip the SGID bit, when in fact it can easily be done at the VFS and the filesystems then don't even need to be aware that the SGID needs to be (or has been stripped) by the operation the user asked to be done. Vfs has all the info it needs - it doesn't need the filesystems to do everything correctly with the mode and ensuring that they order things like posix acl setup functions correctly with inode_init_owner() to strip the SGID bit. Just strip the SGID bit at the VFS, and then the filesystem can't get it wrong. Also, the mode_strip_sgid() api should be used before IS_POSIXACL() because this api may change mode. Only the following places use inode_init_owner " arch/powerpc/platforms/cell/spufs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR); arch/powerpc/platforms/cell/spufs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR); fs/9p/vfs_inode.c: inode_init_owner(&init_user_ns, inode, NULL, mode); fs/bfs/dir.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/btrfs/inode.c: inode_init_owner(mnt_userns, inode, dir, mode); fs/btrfs/tests/btrfs-tests.c: inode_init_owner(&init_user_ns, inode, NULL, S_IFREG); fs/ext2/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/ext4/ialloc.c: inode_init_owner(mnt_userns, inode, dir, mode); fs/f2fs/namei.c: inode_init_owner(mnt_userns, inode, dir, mode); fs/hfsplus/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/hugetlbfs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/jfs/jfs_inode.c: inode_init_owner(&init_user_ns, inode, parent, mode); fs/minix/bitmap.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/nilfs2/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/ntfs3/inode.c: inode_init_owner(mnt_userns, inode, dir, mode); fs/ocfs2/dlmfs/dlmfs.c: inode_init_owner(&init_user_ns, inode, NULL, mode); fs/ocfs2/dlmfs/dlmfs.c: inode_init_owner(&init_user_ns, inode, parent, mode); fs/ocfs2/namei.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/omfs/inode.c: inode_init_owner(&init_user_ns, inode, NULL, mode); fs/overlayfs/dir.c: inode_init_owner(&init_user_ns, inode, dentry->d_parent->d_inode, mode); fs/ramfs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/reiserfs/namei.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/sysv/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/ubifs/dir.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/udf/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/ufs/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/xfs/xfs_inode.c: inode_init_owner(mnt_userns, inode, dir, mode); fs/zonefs/super.c: inode_init_owner(&init_user_ns, inode, parent, S_IFDIR | 0555); kernel/bpf/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); mm/shmem.c: inode_init_owner(&init_user_ns, inode, dir, mode); " They are used in filesystem to init new inode function and these init inode functions are used by following operations: mkdir symlink mknod create tmpfile rename We don't care about mkdir because we don't strip SGID bit for directory except fs.xfs.irix_sgid_inherit. But we even call vfs_prepare_mode() in do_mkdirat() since mode_strip_sgid() will skip directories anyway. This will enforce the same ordering for all relevant operations and it will make the code more uniform and easier to understand by using new helper vfs_prepare_mode(). symlink and rename only use valid mode that doesn't have SGID bit. We have added mode_strip_sgid() api for the remaining operations. In addition to the above six operations, four filesystems has a little difference 1) btrfs has btrfs_create_subvol_root to create new inode but used non SGID bit mode and can ignore 2) ocfs2 reflink function should add mode_strip_sgid api manually because this ioctl is unique and not added into vfs. It may use S_ISGID modd. 3) spufs which doesn't really go hrough the regular VFS callpath because it has separate system call spu_create, but it t only allows the creation of directories and only allows bits in 0777 and can ignore 4) bpf use vfs_mkobj in bpf_obj_do_pin with "S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask()) mode and use bpf_mkobj_ops in bpf_iter_link_pin_kernel with S_IFREG | S_IRUSR mode, so bpf is also not affected This patch also changed grpid behaviour for ext4/xfs because the mode passed to them may been changed by vfs_prepare_mode. Also as Christian Brauner said" The patch itself is useful as it would move a security sensitive operation that is currently burried in individual filesystems into the vfs layer. But it has a decent regression potential since it might strip filesystems that have so far relied on getting the S_ISGID bit with a mode argument. So this needs a lot of testing and long exposure in -next for at least one full kernel cycle." Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> --- fs/inode.c | 2 -- fs/namei.c | 22 +++++++++------------- fs/ocfs2/namei.c | 1 + include/linux/fs.h | 11 +++++++++++ 4 files changed, 21 insertions(+), 15 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index e9a5f2ec2f89..dd357f4b556d 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -2246,8 +2246,6 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode, /* Directories are special, and always inherit S_ISGID */ if (S_ISDIR(mode)) mode |= S_ISGID; - else - mode = mode_strip_sgid(mnt_userns, dir, mode); } else inode_fsgid_set(inode, mnt_userns); inode->i_mode = mode; diff --git a/fs/namei.c b/fs/namei.c index 73646e28fae0..5dbf00704ae8 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3287,8 +3287,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file, if (open_flag & O_CREAT) { if (open_flag & O_EXCL) open_flag &= ~O_TRUNC; - if (!IS_POSIXACL(dir->d_inode)) - mode &= ~current_umask(); + mode = vfs_prepare_mode(mnt_userns, dir->d_inode, mode); if (likely(got_write)) create_error = may_o_create(mnt_userns, &nd->path, dentry, mode); @@ -3521,8 +3520,7 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns, child = d_alloc(dentry, &slash_name); if (unlikely(!child)) goto out_err; - if (!IS_POSIXACL(dir)) - mode &= ~current_umask(); + mode = vfs_prepare_mode(mnt_userns, dir, mode); error = dir->i_op->tmpfile(mnt_userns, dir, child, mode); if (error) goto out_err; @@ -3850,13 +3848,12 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode, if (IS_ERR(dentry)) goto out1; - if (!IS_POSIXACL(path.dentry->d_inode)) - mode &= ~current_umask(); + mnt_userns = mnt_user_ns(path.mnt); + mode = vfs_prepare_mode(mnt_userns, path.dentry->d_inode, mode); error = security_path_mknod(&path, dentry, mode, dev); if (error) goto out2; - mnt_userns = mnt_user_ns(path.mnt); switch (mode & S_IFMT) { case 0: case S_IFREG: error = vfs_create(mnt_userns, path.dentry->d_inode, @@ -3943,6 +3940,7 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode) struct path path; int error; unsigned int lookup_flags = LOOKUP_DIRECTORY; + struct user_namespace *mnt_userns; retry: dentry = filename_create(dfd, name, &path, lookup_flags); @@ -3950,15 +3948,13 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode) if (IS_ERR(dentry)) goto out_putname; - if (!IS_POSIXACL(path.dentry->d_inode)) - mode &= ~current_umask(); + mnt_userns = mnt_user_ns(path.mnt); + mode = vfs_prepare_mode(mnt_userns, path.dentry->d_inode, mode); error = security_path_mkdir(&path, dentry, mode); - if (!error) { - struct user_namespace *mnt_userns; - mnt_userns = mnt_user_ns(path.mnt); + if (!error) error = vfs_mkdir(mnt_userns, path.dentry->d_inode, dentry, mode); - } + done_path_create(&path, dentry); if (retry_estale(error, lookup_flags)) { lookup_flags |= LOOKUP_REVAL; diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c index c75fd54b9185..961d1cf54388 100644 --- a/fs/ocfs2/namei.c +++ b/fs/ocfs2/namei.c @@ -197,6 +197,7 @@ static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode) * callers. */ if (S_ISDIR(mode)) set_nlink(inode, 2); + mode = mode_strip_sgid(&init_user_ns, dir, mode); inode_init_owner(&init_user_ns, inode, dir, mode); status = dquot_initialize(inode); if (status) diff --git a/include/linux/fs.h b/include/linux/fs.h index 98b44a2732f5..914c8f28bb02 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3459,6 +3459,17 @@ static inline bool dir_relax_shared(struct inode *inode) return !IS_DEADDIR(inode); } +static inline umode_t vfs_prepare_mode(struct user_namespace *mnt_userns, + const struct inode *dir, umode_t mode) +{ + mode = mode_strip_sgid(mnt_userns, dir, mode); + + if (!IS_POSIXACL(dir)) + mode &= ~current_umask(); + + return mode; +} + extern bool path_noexec(const struct path *path); extern void inode_nohighmem(struct inode *inode); -- 2.27.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v7 3/4] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem 2022-04-26 4:19 ` [PATCH v7 3/4] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem Yang Xu @ 2022-04-26 5:21 ` Darrick J. Wong 2022-04-26 8:34 ` Christian Brauner 0 siblings, 1 reply; 12+ messages in thread From: Darrick J. Wong @ 2022-04-26 5:21 UTC (permalink / raw) To: Yang Xu; +Cc: linux-fsdevel, ceph-devel, viro, david, brauner, willy, jlayton On Tue, Apr 26, 2022 at 12:19:51PM +0800, Yang Xu wrote: > Currently, vfs only passes mode parameter to filesystem, then use inode_init_owner() > to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call inode_init_owner > firstly, then posxi acl setup, but xfs uses the contrary order. It will > affect S_ISGID clear especially we filter S_IXGRP by umask or acl. > > Regardless of which filesystem is in use, failure to strip the SGID correctly > is considered a security failure that needs to be fixed. The current VFS > infrastructure requires the filesystem to do everything right and not step on > any landmines to strip the SGID bit, when in fact it can easily be done at the > VFS and the filesystems then don't even need to be aware that the SGID needs > to be (or has been stripped) by the operation the user asked to be done. > > Vfs has all the info it needs - it doesn't need the filesystems to do everything > correctly with the mode and ensuring that they order things like posix acl setup > functions correctly with inode_init_owner() to strip the SGID bit. > > Just strip the SGID bit at the VFS, and then the filesystem can't get it wrong. > > Also, the mode_strip_sgid() api should be used before IS_POSIXACL() because > this api may change mode. > > Only the following places use inode_init_owner > " > arch/powerpc/platforms/cell/spufs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR); > arch/powerpc/platforms/cell/spufs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR); > fs/9p/vfs_inode.c: inode_init_owner(&init_user_ns, inode, NULL, mode); > fs/bfs/dir.c: inode_init_owner(&init_user_ns, inode, dir, mode); > fs/btrfs/inode.c: inode_init_owner(mnt_userns, inode, dir, mode); > fs/btrfs/tests/btrfs-tests.c: inode_init_owner(&init_user_ns, inode, NULL, S_IFREG); > fs/ext2/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); > fs/ext4/ialloc.c: inode_init_owner(mnt_userns, inode, dir, mode); > fs/f2fs/namei.c: inode_init_owner(mnt_userns, inode, dir, mode); > fs/hfsplus/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); > fs/hugetlbfs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); > fs/jfs/jfs_inode.c: inode_init_owner(&init_user_ns, inode, parent, mode); > fs/minix/bitmap.c: inode_init_owner(&init_user_ns, inode, dir, mode); > fs/nilfs2/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); > fs/ntfs3/inode.c: inode_init_owner(mnt_userns, inode, dir, mode); > fs/ocfs2/dlmfs/dlmfs.c: inode_init_owner(&init_user_ns, inode, NULL, mode); > fs/ocfs2/dlmfs/dlmfs.c: inode_init_owner(&init_user_ns, inode, parent, mode); > fs/ocfs2/namei.c: inode_init_owner(&init_user_ns, inode, dir, mode); > fs/omfs/inode.c: inode_init_owner(&init_user_ns, inode, NULL, mode); > fs/overlayfs/dir.c: inode_init_owner(&init_user_ns, inode, dentry->d_parent->d_inode, mode); > fs/ramfs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); > fs/reiserfs/namei.c: inode_init_owner(&init_user_ns, inode, dir, mode); > fs/sysv/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); > fs/ubifs/dir.c: inode_init_owner(&init_user_ns, inode, dir, mode); > fs/udf/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); > fs/ufs/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); > fs/xfs/xfs_inode.c: inode_init_owner(mnt_userns, inode, dir, mode); > fs/zonefs/super.c: inode_init_owner(&init_user_ns, inode, parent, S_IFDIR | 0555); > kernel/bpf/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); > mm/shmem.c: inode_init_owner(&init_user_ns, inode, dir, mode); > " > > They are used in filesystem to init new inode function and these init inode > functions are used by following operations: > mkdir > symlink > mknod > create > tmpfile > rename > > We don't care about mkdir because we don't strip SGID bit for directory except > fs.xfs.irix_sgid_inherit. But we even call vfs_prepare_mode() in do_mkdirat() since > mode_strip_sgid() will skip directories anyway. This will enforce the same > ordering for all relevant operations and it will make the code more uniform and > easier to understand by using new helper vfs_prepare_mode(). > > symlink and rename only use valid mode that doesn't have SGID bit. > > We have added mode_strip_sgid() api for the remaining operations. > > In addition to the above six operations, four filesystems has a little difference > 1) btrfs has btrfs_create_subvol_root to create new inode but used non SGID bit > mode and can ignore > 2) ocfs2 reflink function should add mode_strip_sgid api manually because this ioctl > is unique and not added into vfs. It may use S_ISGID modd. > 3) spufs which doesn't really go hrough the regular VFS callpath because it has > separate system call spu_create, but it t only allows the creation of > directories and only allows bits in 0777 and can ignore > 4) bpf use vfs_mkobj in bpf_obj_do_pin with > "S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask()) mode and > use bpf_mkobj_ops in bpf_iter_link_pin_kernel with S_IFREG | S_IRUSR mode, > so bpf is also not affected > > This patch also changed grpid behaviour for ext4/xfs because the mode passed to > them may been changed by vfs_prepare_mode. > > Also as Christian Brauner said" > The patch itself is useful as it would move a security sensitive operation that is > currently burried in individual filesystems into the vfs layer. But it has a decent > regression potential since it might strip filesystems that have so far relied on > getting the S_ISGID bit with a mode argument. So this needs a lot of testing and > long exposure in -next for at least one full kernel cycle." > > Suggested-by: Dave Chinner <david@fromorbit.com> > Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> Looks good! Thank you for taking care of this! :) Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > --- > fs/inode.c | 2 -- > fs/namei.c | 22 +++++++++------------- > fs/ocfs2/namei.c | 1 + > include/linux/fs.h | 11 +++++++++++ > 4 files changed, 21 insertions(+), 15 deletions(-) > > diff --git a/fs/inode.c b/fs/inode.c > index e9a5f2ec2f89..dd357f4b556d 100644 > --- a/fs/inode.c > +++ b/fs/inode.c > @@ -2246,8 +2246,6 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode, > /* Directories are special, and always inherit S_ISGID */ > if (S_ISDIR(mode)) > mode |= S_ISGID; > - else > - mode = mode_strip_sgid(mnt_userns, dir, mode); > } else > inode_fsgid_set(inode, mnt_userns); > inode->i_mode = mode; > diff --git a/fs/namei.c b/fs/namei.c > index 73646e28fae0..5dbf00704ae8 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -3287,8 +3287,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file, > if (open_flag & O_CREAT) { > if (open_flag & O_EXCL) > open_flag &= ~O_TRUNC; > - if (!IS_POSIXACL(dir->d_inode)) > - mode &= ~current_umask(); > + mode = vfs_prepare_mode(mnt_userns, dir->d_inode, mode); > if (likely(got_write)) > create_error = may_o_create(mnt_userns, &nd->path, > dentry, mode); > @@ -3521,8 +3520,7 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns, > child = d_alloc(dentry, &slash_name); > if (unlikely(!child)) > goto out_err; > - if (!IS_POSIXACL(dir)) > - mode &= ~current_umask(); > + mode = vfs_prepare_mode(mnt_userns, dir, mode); > error = dir->i_op->tmpfile(mnt_userns, dir, child, mode); > if (error) > goto out_err; > @@ -3850,13 +3848,12 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode, > if (IS_ERR(dentry)) > goto out1; > > - if (!IS_POSIXACL(path.dentry->d_inode)) > - mode &= ~current_umask(); > + mnt_userns = mnt_user_ns(path.mnt); > + mode = vfs_prepare_mode(mnt_userns, path.dentry->d_inode, mode); > error = security_path_mknod(&path, dentry, mode, dev); > if (error) > goto out2; > > - mnt_userns = mnt_user_ns(path.mnt); > switch (mode & S_IFMT) { > case 0: case S_IFREG: > error = vfs_create(mnt_userns, path.dentry->d_inode, > @@ -3943,6 +3940,7 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode) > struct path path; > int error; > unsigned int lookup_flags = LOOKUP_DIRECTORY; > + struct user_namespace *mnt_userns; > > retry: > dentry = filename_create(dfd, name, &path, lookup_flags); > @@ -3950,15 +3948,13 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode) > if (IS_ERR(dentry)) > goto out_putname; > > - if (!IS_POSIXACL(path.dentry->d_inode)) > - mode &= ~current_umask(); > + mnt_userns = mnt_user_ns(path.mnt); > + mode = vfs_prepare_mode(mnt_userns, path.dentry->d_inode, mode); > error = security_path_mkdir(&path, dentry, mode); > - if (!error) { > - struct user_namespace *mnt_userns; > - mnt_userns = mnt_user_ns(path.mnt); > + if (!error) > error = vfs_mkdir(mnt_userns, path.dentry->d_inode, dentry, > mode); > - } > + > done_path_create(&path, dentry); > if (retry_estale(error, lookup_flags)) { > lookup_flags |= LOOKUP_REVAL; > diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c > index c75fd54b9185..961d1cf54388 100644 > --- a/fs/ocfs2/namei.c > +++ b/fs/ocfs2/namei.c > @@ -197,6 +197,7 @@ static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode) > * callers. */ > if (S_ISDIR(mode)) > set_nlink(inode, 2); > + mode = mode_strip_sgid(&init_user_ns, dir, mode); > inode_init_owner(&init_user_ns, inode, dir, mode); > status = dquot_initialize(inode); > if (status) > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 98b44a2732f5..914c8f28bb02 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -3459,6 +3459,17 @@ static inline bool dir_relax_shared(struct inode *inode) > return !IS_DEADDIR(inode); > } > > +static inline umode_t vfs_prepare_mode(struct user_namespace *mnt_userns, > + const struct inode *dir, umode_t mode) > +{ > + mode = mode_strip_sgid(mnt_userns, dir, mode); > + > + if (!IS_POSIXACL(dir)) > + mode &= ~current_umask(); > + > + return mode; > +} > + > extern bool path_noexec(const struct path *path); > extern void inode_nohighmem(struct inode *inode); > > -- > 2.27.0 > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v7 3/4] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem 2022-04-26 5:21 ` Darrick J. Wong @ 2022-04-26 8:34 ` Christian Brauner 0 siblings, 0 replies; 12+ messages in thread From: Christian Brauner @ 2022-04-26 8:34 UTC (permalink / raw) To: Darrick J. Wong Cc: Yang Xu, linux-fsdevel, ceph-devel, viro, david, willy, jlayton On Mon, Apr 25, 2022 at 10:21:57PM -0700, Darrick J. Wong wrote: > On Tue, Apr 26, 2022 at 12:19:51PM +0800, Yang Xu wrote: > > Currently, vfs only passes mode parameter to filesystem, then use inode_init_owner() > > to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call inode_init_owner > > firstly, then posxi acl setup, but xfs uses the contrary order. It will > > affect S_ISGID clear especially we filter S_IXGRP by umask or acl. > > > > Regardless of which filesystem is in use, failure to strip the SGID correctly > > is considered a security failure that needs to be fixed. The current VFS > > infrastructure requires the filesystem to do everything right and not step on > > any landmines to strip the SGID bit, when in fact it can easily be done at the > > VFS and the filesystems then don't even need to be aware that the SGID needs > > to be (or has been stripped) by the operation the user asked to be done. > > > > Vfs has all the info it needs - it doesn't need the filesystems to do everything > > correctly with the mode and ensuring that they order things like posix acl setup > > functions correctly with inode_init_owner() to strip the SGID bit. > > > > Just strip the SGID bit at the VFS, and then the filesystem can't get it wrong. > > > > Also, the mode_strip_sgid() api should be used before IS_POSIXACL() because > > this api may change mode. > > > > Only the following places use inode_init_owner > > " > > arch/powerpc/platforms/cell/spufs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR); > > arch/powerpc/platforms/cell/spufs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR); > > fs/9p/vfs_inode.c: inode_init_owner(&init_user_ns, inode, NULL, mode); > > fs/bfs/dir.c: inode_init_owner(&init_user_ns, inode, dir, mode); > > fs/btrfs/inode.c: inode_init_owner(mnt_userns, inode, dir, mode); > > fs/btrfs/tests/btrfs-tests.c: inode_init_owner(&init_user_ns, inode, NULL, S_IFREG); > > fs/ext2/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); > > fs/ext4/ialloc.c: inode_init_owner(mnt_userns, inode, dir, mode); > > fs/f2fs/namei.c: inode_init_owner(mnt_userns, inode, dir, mode); > > fs/hfsplus/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); > > fs/hugetlbfs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); > > fs/jfs/jfs_inode.c: inode_init_owner(&init_user_ns, inode, parent, mode); > > fs/minix/bitmap.c: inode_init_owner(&init_user_ns, inode, dir, mode); > > fs/nilfs2/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); > > fs/ntfs3/inode.c: inode_init_owner(mnt_userns, inode, dir, mode); > > fs/ocfs2/dlmfs/dlmfs.c: inode_init_owner(&init_user_ns, inode, NULL, mode); > > fs/ocfs2/dlmfs/dlmfs.c: inode_init_owner(&init_user_ns, inode, parent, mode); > > fs/ocfs2/namei.c: inode_init_owner(&init_user_ns, inode, dir, mode); > > fs/omfs/inode.c: inode_init_owner(&init_user_ns, inode, NULL, mode); > > fs/overlayfs/dir.c: inode_init_owner(&init_user_ns, inode, dentry->d_parent->d_inode, mode); > > fs/ramfs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); > > fs/reiserfs/namei.c: inode_init_owner(&init_user_ns, inode, dir, mode); > > fs/sysv/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); > > fs/ubifs/dir.c: inode_init_owner(&init_user_ns, inode, dir, mode); > > fs/udf/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); > > fs/ufs/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); > > fs/xfs/xfs_inode.c: inode_init_owner(mnt_userns, inode, dir, mode); > > fs/zonefs/super.c: inode_init_owner(&init_user_ns, inode, parent, S_IFDIR | 0555); > > kernel/bpf/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); > > mm/shmem.c: inode_init_owner(&init_user_ns, inode, dir, mode); > > " > > > > They are used in filesystem to init new inode function and these init inode > > functions are used by following operations: > > mkdir > > symlink > > mknod > > create > > tmpfile > > rename > > > > We don't care about mkdir because we don't strip SGID bit for directory except > > fs.xfs.irix_sgid_inherit. But we even call vfs_prepare_mode() in do_mkdirat() since > > mode_strip_sgid() will skip directories anyway. This will enforce the same > > ordering for all relevant operations and it will make the code more uniform and > > easier to understand by using new helper vfs_prepare_mode(). > > > > symlink and rename only use valid mode that doesn't have SGID bit. > > > > We have added mode_strip_sgid() api for the remaining operations. > > > > In addition to the above six operations, four filesystems has a little difference > > 1) btrfs has btrfs_create_subvol_root to create new inode but used non SGID bit > > mode and can ignore > > 2) ocfs2 reflink function should add mode_strip_sgid api manually because this ioctl > > is unique and not added into vfs. It may use S_ISGID modd. > > 3) spufs which doesn't really go hrough the regular VFS callpath because it has > > separate system call spu_create, but it t only allows the creation of > > directories and only allows bits in 0777 and can ignore > > 4) bpf use vfs_mkobj in bpf_obj_do_pin with > > "S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask()) mode and > > use bpf_mkobj_ops in bpf_iter_link_pin_kernel with S_IFREG | S_IRUSR mode, > > so bpf is also not affected > > > > This patch also changed grpid behaviour for ext4/xfs because the mode passed to > > them may been changed by vfs_prepare_mode. > > > > Also as Christian Brauner said" > > The patch itself is useful as it would move a security sensitive operation that is > > currently burried in individual filesystems into the vfs layer. But it has a decent > > regression potential since it might strip filesystems that have so far relied on > > getting the S_ISGID bit with a mode argument. So this needs a lot of testing and > > long exposure in -next for at least one full kernel cycle." > > > > Suggested-by: Dave Chinner <david@fromorbit.com> > > Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> > > Looks good! Thank you for taking care of this! :) > Reviewed-by: Darrick J. Wong <djwong@kernel.org> Agreed, thank you for taking care of this and sticking around through all the reviews. Since this is a very sensitive patch series I think we need to be annoyingly pedantic about the commit messages. This is really only necessary because of the nature of these changes so you'll forgive me for being really annoying about this. Here's what I'd change the commit messages to: It'd be great to get some more reviews and a thorough read of the commit message. We should aim to approriately express the advantages in relation to the associated regression risks: fs: move S_ISGID stripping into the vfs Creating files that have both the S_IXGRP and S_ISGID bit raised in directories that themselves have the S_ISGID bit set requires additional privileges to avoid security issues. When a filesystem creates a new inode it needs to take care that the caller is either in the group of the newly created inode or they have CAP_FSETID in their current user namespace and are privileged over the parent directory of the new inode. If any of these two conditions is true then the S_ISGID bit can be raised for an S_IXGRP file and if not it needs to be stripped. However, there are several key issues with the current state of things: * The S_ISGID stripping logic is entangled with umask stripping. If a filesystem doesn't support or enable POSIX ACLs then umask stripping is done directly in the vfs before calling into the filesystem. If the filesystem does support POSIX ACLs then unmask stripping may be done in the filesystem itself when calling posix_acl_create(). * Filesystems that don't rely on inode_init_owner() don't get S_ISGID stripping logic. While that may be intentional (e.g. network filesystems might just defer setgid stripping to a server) it is often just a security issue. * The first two points taken together mean that there's a non-standardized ordering between setgid stripping in inode_init_owner() and posix_acl_create() both on the vfs level and the filesystem level. The latter part is especially problematic since each filesystem is technically free to order inode_init_owner() and posix_acl_create() however it sees fit meaning that S_ISGID inheritance might or might not be applied. * We do still have bugs in this areas years after the initial round of setgid bugfixes. So the current state is quite messy and while we won't be able to make it completely clean as posix_acl_create() is still a filesystem specific call we can improve the S_SIGD stripping situation quite a bit by hoisting it out of inode_init_owner() and into the vfs creation operations. This means we alleviate the burden for filesystems to handle S_ISGID stripping correctly and can standardize the ordering between S_ISGID and umask stripping in the vfs. The S_ISGID bit is stripped before any umask is applied. This has the advantage that the ordering is unaffected by whether umask stripping is done by the vfs itself (if no POSIX ACLs are supported or enabled) or in the filesystem in posix_acl_create() (if POSIX ACLs are supported). To this end a new helper vfs_prepare_mode() is added which calls the previously added mode_strip_setgid() helper and strips the umask afterwards. All inode operations that create new filesystem objects have been updated to call vfs_prepare_mode() before passing the mode into the relevant inode operation of the filesystems. Care has been taken to ensure that the mode passed to the security hooks is the mode that is seen by the filesystem. Following is an overview of the filesystem specific and inode operations specific implications: arch/powerpc/platforms/cell/spufs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR); arch/powerpc/platforms/cell/spufs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR); fs/9p/vfs_inode.c: inode_init_owner(&init_user_ns, inode, NULL, mode); fs/bfs/dir.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/btrfs/inode.c: inode_init_owner(mnt_userns, inode, dir, mode); fs/btrfs/tests/btrfs-tests.c: inode_init_owner(&init_user_ns, inode, NULL, S_IFREG); fs/ext2/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/ext4/ialloc.c: inode_init_owner(mnt_userns, inode, dir, mode); fs/f2fs/namei.c: inode_init_owner(mnt_userns, inode, dir, mode); fs/hfsplus/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/hugetlbfs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/jfs/jfs_inode.c: inode_init_owner(&init_user_ns, inode, parent, mode); fs/minix/bitmap.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/nilfs2/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/ntfs3/inode.c: inode_init_owner(mnt_userns, inode, dir, mode); fs/ocfs2/dlmfs/dlmfs.c: inode_init_owner(&init_user_ns, inode, NULL, mode); fs/ocfs2/dlmfs/dlmfs.c: inode_init_owner(&init_user_ns, inode, parent, mode); fs/ocfs2/namei.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/omfs/inode.c: inode_init_owner(&init_user_ns, inode, NULL, mode); fs/overlayfs/dir.c: inode_init_owner(&init_user_ns, inode, dentry->d_parent->d_inode, mode); fs/ramfs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/reiserfs/namei.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/sysv/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/ubifs/dir.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/udf/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/ufs/ialloc.c: inode_init_owner(&init_user_ns, inode, dir, mode); fs/xfs/xfs_inode.c: inode_init_owner(mnt_userns, inode, dir, mode); fs/zonefs/super.c: inode_init_owner(&init_user_ns, inode, parent, S_IFDIR | 0555); kernel/bpf/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); mm/shmem.c: inode_init_owner(&init_user_ns, inode, dir, mode); All of the above filesystems end up calling inode_init_owner() when new filesystem objects are created through the following ->mkdir(), ->symlink(), ->mknod(), ->create(), ->tmpfile(), ->rename() inode operations. Since directories always inherit the S_ISGID bit with the exception of xfs when irix_sgid_inherit mode is turned on S_ISGID stripping doesn't apply. The ->symlink() inode operation trivially inherit the mode from the target and the ->rename() inode operation inherits the mode from the source inode. All other inode operations will have the S_ISGID bit stripped once in vfs_prepare_mode() before. In addition to this there are filesystems which allow the creation of filesystem objects through ioctl()s or - in the case of spufs - circumventing the vfs in other ways. If filesystem objects are created through ioctl()s the vfs doesn't know about it and can't apply regular permission checking including S_ISGID logic. Therfore, a filesystem relying on S_ISGID stripping in inode_init_owner() in their ioctl() callpath will be affected by moving this logic into the vfs. So we did our best to audit all filesystems in this regard: * btrfs allows the creation of filesystem objects through various ioctls(). Snapshot creation literally takes a snapshot and so the mode is fully preserved and S_ISGID stripping doesn't apply. Creating a new subvolum relies on inode_init_owner() in btrfs_new_inode() but only creates directories and doesn't raise S_ISGID. * ocfs2 has a peculiar implementation of reflinks. In contrast to e.g. xfs and btrfs FICLONE/FICLONERANGE ioctl() that is only concerned with the actual extents ocfs2 uses a separate ioctl() that also creates the target file. Iow, ocfs2 circumvents the vfs entirely here and did indeed rely on inode_init_owner() to strip the S_ISGID bit. This is the only place where a filesystem needs to call mode_strip_sgid() directly but this is self-inflicted pain tbh. * spufs doesn't go through the vfs at all and doesn't use ioctl()s either. Instead it has a dedicated system call spufs_create() which allows the creation of filesystem objects. But spufs only creates directories and doesn't allo S_SIGID bits, i.e. it specifically only allows 0777 bits. * bpf uses vfs_mkobj() but also doesn't allow S_ISGID bits to be created. While we did our best to audit everything there's a risk of regressions in here. However, for the sake of maintenance and given that we've seen a range of bugs years after S_ISGID inheritance issues were fixed (see [1]-[3]) the risk seems worth taking. In the worst case we will have to revert. Associated with this change is a new set of fstests to enforce the semantics for all new filesystems. ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v7 4/4] ceph: Remove S_ISGID stripping code in ceph_finish_async_create 2022-04-26 4:19 [PATCH v7 1/4] fs: move sgid stripping operation from inode_init_owner into mode_strip_sgid Yang Xu 2022-04-26 4:19 ` [PATCH v7 2/4] fs: Add missing umask strip in vfs_tmpfile Yang Xu 2022-04-26 4:19 ` [PATCH v7 3/4] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem Yang Xu @ 2022-04-26 4:19 ` Yang Xu 2022-04-26 7:14 ` Christian Brauner 2022-04-26 7:06 ` [PATCH v7 1/4] fs: move sgid stripping operation from inode_init_owner into mode_strip_sgid Christian Brauner 3 siblings, 1 reply; 12+ messages in thread From: Yang Xu @ 2022-04-26 4:19 UTC (permalink / raw) To: linux-fsdevel, ceph-devel Cc: viro, david, djwong, brauner, willy, jlayton, Yang Xu Previous patches moved sgid stripping exclusively into the vfs. So manual sgid stripping by the filesystem isn't needed anymore. Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Christian Brauner (Microsoft)<brauner@kernel.org> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> --- fs/ceph/file.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 6c9e837aa1d3..8e3b99853333 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -651,10 +651,6 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry, /* Directories always inherit the setgid bit. */ if (S_ISDIR(mode)) mode |= S_ISGID; - else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) && - !in_group_p(dir->i_gid) && - !capable_wrt_inode_uidgid(&init_user_ns, dir, CAP_FSETID)) - mode &= ~S_ISGID; } else { in.gid = cpu_to_le32(from_kgid(&init_user_ns, current_fsgid())); } -- 2.27.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v7 4/4] ceph: Remove S_ISGID stripping code in ceph_finish_async_create 2022-04-26 4:19 ` [PATCH v7 4/4] ceph: Remove S_ISGID stripping code in ceph_finish_async_create Yang Xu @ 2022-04-26 7:14 ` Christian Brauner 2022-04-26 8:43 ` xuyang2018.jy 0 siblings, 1 reply; 12+ messages in thread From: Christian Brauner @ 2022-04-26 7:14 UTC (permalink / raw) To: Yang Xu; +Cc: linux-fsdevel, ceph-devel, viro, david, djwong, willy, jlayton On Tue, Apr 26, 2022 at 12:19:52PM +0800, Yang Xu wrote: > Previous patches moved sgid stripping exclusively into the vfs. So > manual sgid stripping by the filesystem isn't needed anymore. > > Reviewed-by: Xiubo Li <xiubli@redhat.com> > Reviewed-by: Christian Brauner (Microsoft)<brauner@kernel.org> > Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> > --- Since this is a very sensitive patch series I think we need to be annoyingly pedantic about the commit messages. This is really only necessary because of the nature of these changes so you'll forgive me for being really annoying about this. Here's what I'd change the commit messages to: ceph: rely on vfs for setgid stripping Now that we finished moving setgid stripping for regular files in setgid directories into the vfs, individual filesystem don't need to manually strip the setgid bit anymore. Drop the now unneeded code from ceph. > fs/ceph/file.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > index 6c9e837aa1d3..8e3b99853333 100644 > --- a/fs/ceph/file.c > +++ b/fs/ceph/file.c > @@ -651,10 +651,6 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry, > /* Directories always inherit the setgid bit. */ > if (S_ISDIR(mode)) > mode |= S_ISGID; (Frankly, this ideally shouldn't be necessary as well, i.e. it'd be great if that part would've been done by the vfs already too but it's not as security sensitive as setgid stripping for regular files.) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v7 4/4] ceph: Remove S_ISGID stripping code in ceph_finish_async_create 2022-04-26 7:14 ` Christian Brauner @ 2022-04-26 8:43 ` xuyang2018.jy 0 siblings, 0 replies; 12+ messages in thread From: xuyang2018.jy @ 2022-04-26 8:43 UTC (permalink / raw) To: Christian Brauner Cc: linux-fsdevel, ceph-devel, viro, david, djwong, willy, jlayton on 2022/4/26 15:14, Christian Brauner wrote: > On Tue, Apr 26, 2022 at 12:19:52PM +0800, Yang Xu wrote: >> Previous patches moved sgid stripping exclusively into the vfs. So >> manual sgid stripping by the filesystem isn't needed anymore. >> >> Reviewed-by: Xiubo Li<xiubli@redhat.com> >> Reviewed-by: Christian Brauner (Microsoft)<brauner@kernel.org> >> Signed-off-by: Yang Xu<xuyang2018.jy@fujitsu.com> >> --- > > Since this is a very sensitive patch series I think we need to be > annoyingly pedantic about the commit messages. This is really only > necessary because of the nature of these changes so you'll forgive me > for being really annoying about this. Here's what I'd change the commit > messages to: > > ceph: rely on vfs for setgid stripping > > Now that we finished moving setgid stripping for regular files in setgid > directories into the vfs, individual filesystem don't need to manually > strip the setgid bit anymore. Drop the now unneeded code from ceph. This seems better, thanks. > > >> fs/ceph/file.c | 4 ---- >> 1 file changed, 4 deletions(-) >> >> diff --git a/fs/ceph/file.c b/fs/ceph/file.c >> index 6c9e837aa1d3..8e3b99853333 100644 >> --- a/fs/ceph/file.c >> +++ b/fs/ceph/file.c >> @@ -651,10 +651,6 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry, >> /* Directories always inherit the setgid bit. */ >> if (S_ISDIR(mode)) >> mode |= S_ISGID; > > (Frankly, this ideally shouldn't be necessary as well, i.e. it'd be > great if that part would've been done by the vfs already too but it's > not as security sensitive as setgid stripping for regular files.) Maybe we can just add mode_add_sgid api into vfs_prepare_mode in the future or only just add mode_add_sgid into do_mkdirat? static inline umode_t vfs_prepare_mode(struct user_namespace *mnt_userns, const struct inode *dir, umode_t mode) { mode = mode_strip_sgid(mnt_userns, dir, mode); if (!IS_POSIXACL(dir)) mode &= ~current_umask(); mode = mode_add_sgid(dir, mode) return mode; } fs/inode.c umodet mode_add_sgid(const struct inode *dir, umode_t mode) { if (dir && dir->i_mode & S_ISGID && S_ISDIR(mode)) mode |= S_ISGID; return mode; } Then we can remove "mode |= S_ISGID" code in inode_init_owner after we check all places called inode_init_owner. But now, let's finished S_ISGID stripping patchset firstly. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v7 1/4] fs: move sgid stripping operation from inode_init_owner into mode_strip_sgid 2022-04-26 4:19 [PATCH v7 1/4] fs: move sgid stripping operation from inode_init_owner into mode_strip_sgid Yang Xu ` (2 preceding siblings ...) 2022-04-26 4:19 ` [PATCH v7 4/4] ceph: Remove S_ISGID stripping code in ceph_finish_async_create Yang Xu @ 2022-04-26 7:06 ` Christian Brauner 2022-04-26 7:39 ` xuyang2018.jy 3 siblings, 1 reply; 12+ messages in thread From: Christian Brauner @ 2022-04-26 7:06 UTC (permalink / raw) To: Yang Xu; +Cc: linux-fsdevel, ceph-devel, viro, david, djwong, willy, jlayton On Tue, Apr 26, 2022 at 12:19:49PM +0800, Yang Xu wrote: > This has no functional change. Just create and export mode_strip_sgid > api for the subsequent patch. This function is used to strip S_ISGID mode > when init a new inode. > > Reviewed-by: Darrick J. Wong <djwong@kernel.org> > Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> > Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> > --- Since this is a very sensitive patch series I think we need to be annoyingly pedantic about the commit messages. This is really only necessary because of the nature of these changes so you'll forgive me for being really annoying about this. Here's what I'd change the commit message to: fs: add mode_strip_sgid() helper Add a dedicated helper to handle the setgid bit when creating a new file in a setgid directory. This is a preparatory patch for moving setgid stripping into the vfs. The patch contains no functional changes. Currently the setgid stripping logic is open-coded directly in inode_init_owner() and the individual filesystems are responsible for handling setgid inheritance. Since this has proven to be brittle as evidenced by old issues we uncovered over the last months (see [1] to [3] below) we will try to move this logic into the vfs. Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes" [1] Link: 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") [2] Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [3] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v7 1/4] fs: move sgid stripping operation from inode_init_owner into mode_strip_sgid 2022-04-26 7:06 ` [PATCH v7 1/4] fs: move sgid stripping operation from inode_init_owner into mode_strip_sgid Christian Brauner @ 2022-04-26 7:39 ` xuyang2018.jy 2022-04-26 8:39 ` Christian Brauner 0 siblings, 1 reply; 12+ messages in thread From: xuyang2018.jy @ 2022-04-26 7:39 UTC (permalink / raw) To: Christian Brauner Cc: linux-fsdevel, ceph-devel, viro, david, djwong, willy, jlayton on 2022/4/26 15:06, Christian Brauner wrote: > On Tue, Apr 26, 2022 at 12:19:49PM +0800, Yang Xu wrote: >> This has no functional change. Just create and export mode_strip_sgid >> api for the subsequent patch. This function is used to strip S_ISGID mode >> when init a new inode. >> >> Reviewed-by: Darrick J. Wong<djwong@kernel.org> >> Reviewed-by: Christian Brauner (Microsoft)<brauner@kernel.org> >> Signed-off-by: Yang Xu<xuyang2018.jy@fujitsu.com> >> --- > > Since this is a very sensitive patch series I think we need to be > annoyingly pedantic about the commit messages. This is really only > necessary because of the nature of these changes so you'll forgive me > for being really annoying about this. Here's what I'd change the commit > message to: > > fs: add mode_strip_sgid() helper > > Add a dedicated helper to handle the setgid bit when creating a new file > in a setgid directory. This is a preparatory patch for moving setgid > stripping into the vfs. The patch contains no functional changes. > > Currently the setgid stripping logic is open-coded directly in > inode_init_owner() and the individual filesystems are responsible for > handling setgid inheritance. Since this has proven to be brittle as > evidenced by old issues we uncovered over the last months (see [1] to > [3] below) we will try to move this logic into the vfs. > > Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes" [1] > Link: 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") [2] > Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [3] This seems better, thanks. ps: Sorry, forgive my poor ability for write this. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v7 1/4] fs: move sgid stripping operation from inode_init_owner into mode_strip_sgid 2022-04-26 7:39 ` xuyang2018.jy @ 2022-04-26 8:39 ` Christian Brauner 2022-04-26 9:22 ` xuyang2018.jy 0 siblings, 1 reply; 12+ messages in thread From: Christian Brauner @ 2022-04-26 8:39 UTC (permalink / raw) To: xuyang2018.jy Cc: linux-fsdevel, ceph-devel, viro, david, djwong, willy, jlayton On Tue, Apr 26, 2022 at 07:39:07AM +0000, xuyang2018.jy@fujitsu.com wrote: > on 2022/4/26 15:06, Christian Brauner wrote: > > On Tue, Apr 26, 2022 at 12:19:49PM +0800, Yang Xu wrote: > >> This has no functional change. Just create and export mode_strip_sgid > >> api for the subsequent patch. This function is used to strip S_ISGID mode > >> when init a new inode. > >> > >> Reviewed-by: Darrick J. Wong<djwong@kernel.org> > >> Reviewed-by: Christian Brauner (Microsoft)<brauner@kernel.org> > >> Signed-off-by: Yang Xu<xuyang2018.jy@fujitsu.com> > >> --- > > > > Since this is a very sensitive patch series I think we need to be > > annoyingly pedantic about the commit messages. This is really only > > necessary because of the nature of these changes so you'll forgive me > > for being really annoying about this. Here's what I'd change the commit > > message to: > > > > fs: add mode_strip_sgid() helper > > > > Add a dedicated helper to handle the setgid bit when creating a new file > > in a setgid directory. This is a preparatory patch for moving setgid > > stripping into the vfs. The patch contains no functional changes. > > > > Currently the setgid stripping logic is open-coded directly in > > inode_init_owner() and the individual filesystems are responsible for > > handling setgid inheritance. Since this has proven to be brittle as > > evidenced by old issues we uncovered over the last months (see [1] to > > [3] below) we will try to move this logic into the vfs. > > > > Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes" [1] > > Link: 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") [2] > > Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [3] > > This seems better, thanks. > > ps: Sorry, forgive my poor ability for write this. This really isn't any comment on your ability to write this! I tried to make this clear but I obviously failed. It is really just that this has an associated non-zero regression risk and we need to make sure to highlight this and be very clear about the motivation for this change. So it's equal parts pedantry and trying to keep our own heads off the guillotine. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v7 1/4] fs: move sgid stripping operation from inode_init_owner into mode_strip_sgid 2022-04-26 8:39 ` Christian Brauner @ 2022-04-26 9:22 ` xuyang2018.jy 0 siblings, 0 replies; 12+ messages in thread From: xuyang2018.jy @ 2022-04-26 9:22 UTC (permalink / raw) To: Christian Brauner Cc: linux-fsdevel, ceph-devel, viro, david, djwong, willy, jlayton on 2022/4/26 16:39, Christian Brauner wrote: > On Tue, Apr 26, 2022 at 07:39:07AM +0000, xuyang2018.jy@fujitsu.com wrote: >> on 2022/4/26 15:06, Christian Brauner wrote: >>> On Tue, Apr 26, 2022 at 12:19:49PM +0800, Yang Xu wrote: >>>> This has no functional change. Just create and export mode_strip_sgid >>>> api for the subsequent patch. This function is used to strip S_ISGID mode >>>> when init a new inode. >>>> >>>> Reviewed-by: Darrick J. Wong<djwong@kernel.org> >>>> Reviewed-by: Christian Brauner (Microsoft)<brauner@kernel.org> >>>> Signed-off-by: Yang Xu<xuyang2018.jy@fujitsu.com> >>>> --- >>> >>> Since this is a very sensitive patch series I think we need to be >>> annoyingly pedantic about the commit messages. This is really only >>> necessary because of the nature of these changes so you'll forgive me >>> for being really annoying about this. Here's what I'd change the commit >>> message to: >>> >>> fs: add mode_strip_sgid() helper >>> >>> Add a dedicated helper to handle the setgid bit when creating a new file >>> in a setgid directory. This is a preparatory patch for moving setgid >>> stripping into the vfs. The patch contains no functional changes. >>> >>> Currently the setgid stripping logic is open-coded directly in >>> inode_init_owner() and the individual filesystems are responsible for >>> handling setgid inheritance. Since this has proven to be brittle as >>> evidenced by old issues we uncovered over the last months (see [1] to >>> [3] below) we will try to move this logic into the vfs. >>> >>> Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes" [1] >>> Link: 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") [2] >>> Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [3] >> >> This seems better, thanks. >> >> ps: Sorry, forgive my poor ability for write this. > > This really isn't any comment on your ability to write this! I tried to > make this clear but I obviously failed. > > It is really just that this has an associated non-zero regression risk > and we need to make sure to highlight this and be very clear about the > motivation for this change. So it's equal parts pedantry and trying to > keep our own heads off the guillotine. Understand. So do you have other comments? I plan to send a v8(based on 5.18-rc4). ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-04-26 10:00 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-04-26 4:19 [PATCH v7 1/4] fs: move sgid stripping operation from inode_init_owner into mode_strip_sgid Yang Xu 2022-04-26 4:19 ` [PATCH v7 2/4] fs: Add missing umask strip in vfs_tmpfile Yang Xu 2022-04-26 4:19 ` [PATCH v7 3/4] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem Yang Xu 2022-04-26 5:21 ` Darrick J. Wong 2022-04-26 8:34 ` Christian Brauner 2022-04-26 4:19 ` [PATCH v7 4/4] ceph: Remove S_ISGID stripping code in ceph_finish_async_create Yang Xu 2022-04-26 7:14 ` Christian Brauner 2022-04-26 8:43 ` xuyang2018.jy 2022-04-26 7:06 ` [PATCH v7 1/4] fs: move sgid stripping operation from inode_init_owner into mode_strip_sgid Christian Brauner 2022-04-26 7:39 ` xuyang2018.jy 2022-04-26 8:39 ` Christian Brauner 2022-04-26 9:22 ` xuyang2018.jy
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.