On Nov 23, 2020, at 7:12 AM, Miklos Szeredi wrote: > > Dmitry found an issue with overlayfs's FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR > implementation: > > https://lore.kernel.org/linux-unionfs/CACT4Y+bUfavwMVv2SEMve5pabE_AwsDO0YsRBGZtYqX59a77vA@mail.gmail.com/ > > I think the only proper soltuion is to move these into inode operations, which > should be a good cleanup as well. > > This is a first cut, the FS_IOC_SETFLAGS conversion is not complete, and only > lightly tested on ext4 and xfs. > > There are minor changes in behavior, like the exact errno value in case of > multiple error conditions. > > 34 files changed, 668 insertions(+), 1170 deletions(-) Hi Miklos, this looks like a good reduction in code duplication (-500 LOC is nice). One issues I have with this patch is that it spreads the use of "fsxattr" asthe name for these inode flags further into the VFS. That was inherited from XFS because of the ioctl name, but is very confusing with "real" extended attributes, also using get/setxattr names but totally differently. It would be better to use function/variable names with "xflags" and "iflags" that are already used in several filesystems to separate this from xattrs. It wouldn't be terrible to also rename the ioctl to FS_IOC_FSSETXFLAGS and keep a #define for FS_IOC_FSSETXATTR for compatibility, but that is less critical if that is now only used in one place in the code. Some more comments inline... > --- a/fs/ioctl.c > +++ b/fs/ioctl.c > +/* > + * Generic function to check FS_IOC_FSSETXATTR/FS_IOC_SETFLAGS values and reject > + * any invalid configurations. > + * > + * Note: must be called with inode lock held. > + */ > +static int fssetxattr_prepare(struct inode *inode, > + const struct kfsxattr *old_fa, > + struct kfsxattr *fa) > +{ > + /* > + * The IMMUTABLE and APPEND_ONLY flags can only be changed by > + * the relevant capability. > + */ > + if ((fa->fsx_flags ^ old_fa->fsx_flags) & (FS_APPEND_FL | FS_IMMUTABLE_FL) && > + !capable(CAP_LINUX_IMMUTABLE)) > + return -EPERM; > + > + if (fa->flags_valid) > + return fscrypt_prepare_setflags(inode, old_fa->fsx_flags, fa->fsx_flags); This doesn't seem right? It means if iflags are set, the rest of the checks are skipped *even if* there are no problems with the fscrypt flags? That bypasses the DAX and PROJINHERIT checks later in this function, but it is also possible to set/clear those flags via FS_IOC_SETFLAGS, and is not very obvious for the code flow. I'd think this should be something more like: if (IS_ENCRYPTED(inode) && fa->flags_valid) { rc = fscrypt_prepare_setflags(...); if (rc) return rc; } and continue on to the rest of the checks, and maybe skip the xflags-specific checks (EXTSIZE, EXTSZINHERIT, COWEXTSIZE) if xflags_valid is not set, though they would just be no-ops in that case since the iflags interface will not set those flags. Alternately, move the DAX and PROJINHERIT checks above "flags_valid", but add a comment that the remaining checks are only for xflags-specific values. > + /* > + * Project Quota ID state is only allowed to change from within the init > + * namespace. Enforce that restriction only if we are trying to change > + * the quota ID state. Everything else is allowed in user namespaces. > + */ > + if (current_user_ns() != &init_user_ns) { > + if (old_fa->fsx_projid != fa->fsx_projid) > + return -EINVAL; > + if ((old_fa->fsx_xflags ^ fa->fsx_xflags) & > + FS_XFLAG_PROJINHERIT) > + return -EINVAL; > + } > + > + /* Check extent size hints. */ > + if ((fa->fsx_xflags & FS_XFLAG_EXTSIZE) && !S_ISREG(inode->i_mode)) > + return -EINVAL; > + > + if ((fa->fsx_xflags & FS_XFLAG_EXTSZINHERIT) && > + !S_ISDIR(inode->i_mode)) > + return -EINVAL; > + > + if ((fa->fsx_xflags & FS_XFLAG_COWEXTSIZE) && > + !S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode)) > + return -EINVAL; > + > + /* > + * It is only valid to set the DAX flag on regular files and > + * directories on filesystems. > + */ > + if ((fa->fsx_xflags & FS_XFLAG_DAX) && > + !(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode))) > + return -EINVAL; > + > + /* Extent size hints of zero turn off the flags. */ > + if (fa->fsx_extsize == 0) > + fa->fsx_xflags &= ~(FS_XFLAG_EXTSIZE | FS_XFLAG_EXTSZINHERIT); > + if (fa->fsx_cowextsize == 0) > + fa->fsx_xflags &= ~FS_XFLAG_COWEXTSIZE; > + > + return 0; > +} [snip] > +int vfs_fssetxattr(struct dentry *dentry, struct kfsxattr *fa) > +{ > + struct inode *inode = d_inode(dentry); > + struct kfsxattr old_fa; > + int err; > + > + if (d_is_special(dentry)) > + return -ENOTTY; > + > + if (!inode->i_op->fssetxattr) > + return -ENOIOCTLCMD; > + > + if (!inode_owner_or_capable(inode)) > + return -EPERM; > + > + inode_lock(inode); > + err = vfs_fsgetxattr(dentry, &old_fa); > + if (!err) { > + /* initialize missing bits from old_fa */ > + if (fa->flags_valid) { > + fa->fsx_xflags |= old_fa.fsx_xflags & ~FS_XFLAG_COMMON; > + fa->fsx_extsize = old_fa.fsx_extsize; > + fa->fsx_nextents = old_fa.fsx_nextents; > + fa->fsx_projid = old_fa.fsx_projid; > + fa->fsx_cowextsize = old_fa.fsx_cowextsize; > + } else { > + fa->fsx_flags |= old_fa.fsx_flags & ~FS_COMMON_FL; > + } This extra call to vfs_fsgetxattr() is adding pure overhead for the case of FS_IOC_GETFLAGS and is totally unnecessary. If iflags_valid is set, then none of these other fields should be accessed in the ->fssetxattr() method, and they can check for iflags_valid vs. xflags_valid themselves to see which ioctl is being called and only access fields which are valid. > + err = fssetxattr_prepare(inode, &old_fa, fa); > + if (!err) > + err = inode->i_op->fssetxattr(dentry, fa); > + } > + inode_unlock(inode); > + > + return err; > +} > +EXPORT_SYMBOL(vfs_fssetxattr); Cheers, Andreas