From: Christian Brauner <brauner@kernel.org>
To: Yang Xu <xuyang2018.jy@fujitsu.com>,
Dave Chinner <david@fromorbit.com>,
"Darrick J. Wong" <djwong@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
Jeff Layton <jlayton@kernel.org>,
Miklos Szeredi <miklos@szeredi.hu>
Cc: Amir Goldstein <amir73il@gmail.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
ceph-devel <ceph-devel@vger.kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH v8 3/4] fs: move S_ISGID stripping into the vfs
Date: Wed, 27 Apr 2022 11:22:01 +0200 [thread overview]
Message-ID: <20220427092201.wvsdjbnc7b4dttaw@wittgenstein> (raw)
In-Reply-To: <20220426145349.zxmahoq2app2lhip@wittgenstein>
On Tue, Apr 26, 2022 at 04:53:49PM +0200, Christian Brauner wrote:
> On Tue, Apr 26, 2022 at 01:52:11PM +0200, Miklos Szeredi wrote:
> > On Tue, 26 Apr 2022 at 13:21, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Tue, Apr 26, 2022 at 1:38 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > > > One thing that I just remembered and which I think I haven't mentioned
> > > > so far is that moving S_ISGID stripping from filesystem callpaths into
> > > > the vfs callpaths means that we're hoisting this logic out of vfs_*()
> > > > helpers implicitly.
> > > >
> > > > So filesystems that call vfs_*() helpers directly can't rely on S_ISGID
> > > > stripping being done in vfs_*() helpers anymore unless they pass the
> > > > mode on from a prior run through the vfs.
> > > >
> > > > This mostly affects overlayfs which calls vfs_*() functions directly. So
> > > > a typical overlayfs callstack would be (roughly - I'm omw to lunch):
> > > >
> > > > sys_mknod()
> > > > -> do_mknodat(mode) // calls vfs_prepare_mode()
> > > > -> .mknod = ovl_mknod(mode)
> > > > -> ovl_create(mode)
> > > > -> vfs_mknod(mode)
> > > >
> > > > I think we are safe as overlayfs passes on the mode on from its own run
> > > > through the vfs and then via vfs_*() to the underlying filesystem but it
> > > > is worth point that out.
> > > >
> > > > Ccing Amir just for confirmation.
> > >
> > > Looks fine to me, but CC Miklos ...
> >
> > Looks fine to me as well. Overlayfs should share the mode (including
> > the suid and sgid bits), owner, group and ACL's with the underlying
> > filesystem, so clearing sgid based on overlay parent directory should
> > result in the same mode as if it was done based on the parent
> > directory on the underlying layer.
>
> Ah yes, good point.
>
> >
> > AFAIU this logic is not affected by userns or mnt_userns, but
> > Christian would be best to confirm that.
>
> It does depend on it as S_ISGID stripping requires knowledge about
> whether the caller has CAP_FSETID and is capable over the parent
> directory or if they are in the group the file is owned by.
>
> I think ultimately it might just come down to moving vfs_prepare_mode()
> into vfs_*() helpers and not into the do_*at() helpers.
>
> That would be cleaner anyway as right now we have this weird disconnect
> between vfs_tmpfile() and vfs_{create,mknod,mkdir}(). IOW, vfs_tmpfile()
> doesn't even have an associated do_*() wrapper where we could call
> vfs_prepare_mode() from.
>
> So ultimately it might be nicer if we do it in vfs_*() helpers anyway.
>
> The less pretty thing about it will be that the security_path_*() hooks
> also want a mode.
>
> Right now these hooks receive the mode as it's passed in from userspace
> minus umask but before S_ISGID stripping happens.
>
> Whereas I think they should really see what the filesystem sees and
> currently it's a bug that they see something else.
>
> I need to think about this a bit.
So on top of that series (though it should just be folded in), does that
look reasonable?
From e993f81caae60fee4f77b40d46ad3863ea383493 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner@kernel.org>
Date: Wed, 27 Apr 2022 10:53:35 +0200
Subject: [PATCH] UNTESTED UNTESTED UNTESTED
As I realized yesterday we need to move mode preparation into the vfs_*()
instead of do_*() helpers as filesystems like overlayfs have the following
callstacks:
sys_mknod(ovl_path, mode)
-> do_mknodat(ovl_path, mode)
-> .mknod = ovl_mknod(ovl_path, mode)
-> vfs_mknod(xfs_path, mode)
-> .mknod = xfs_vn_mknod(xfs_path, mode)
and the requirement that this will yield the same mode as:
sys_mknod(xfs_path, mode)
-> do_mknodat(xfs_path, mode)
-> .mknod = xfs_vn_mknod(xfs_path, mode)
By moving setgid stripping into vfs_*() helpers we achieve:
- Moving setgid stripping out of the individual filesystem's responsibility.
- Ensure that callers of vfs_*() helpers continue to get correct setgid
stripping.
Another thing I realized while looking at this yesterday was the entanglement
with security hooks. Security hooks currently see a different mode than the
actual filesystem sees when it calls into inode_init_owner(). This patch
doesn't change that!
I originally thought that we might be able to make the security hooks see the
same mode that the filesystem will see. However, I have doubts. First, I don't
think that is achievable without more restructuring. Second, I don't think it's
required as the hooks have clearly been placed before any vfs_*() calls and
thereby have committed themselves to see the mode as passed in from userspace
(minus the umask). We will simply continue doing just exactly that
side-stepping the issue for now.
Sketched-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
fs/namei.c | 103 +++++++++++++++++++++++++++++++++++++++------
include/linux/fs.h | 11 -----
2 files changed, 90 insertions(+), 24 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 5dbf00704ae8..8b83db15ae5f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2998,6 +2998,71 @@ void unlock_rename(struct dentry *p1, struct dentry *p2)
}
EXPORT_SYMBOL(unlock_rename);
+/**
+ * mode_strip_umask - handle vfs umask stripping
+ * @dir: parent directory of the new inode
+ * @mode: mode of the new inode to be created in @dir
+ *
+ * Umask stripping depends on whether or not the filesystem supports POSIX
+ * ACLs. If the filesystem doesn't support it umask stripping is done directly
+ * in here. If the filesystem does support POSIX ACLs umask stripping is
+ * deferred until the filesystem calls posix_acl_create().
+ *
+ * Returns: mode
+ */
+static inline umode_t mode_strip_umask(const struct inode *dir, umode_t mode)
+{
+ if (!IS_POSIXACL(dir))
+ mode &= ~current_umask();
+ return mode;
+}
+
+/**
+ * vfs_prepare_mode - prepare the mode to be used for a new inode
+ * @mnt_userns: user namespace of the mount the inode was found from
+ * @dir: parent directory of the new inode
+ * @mode: mode of the new inode
+ * @mask_perms: allowed permission by the vfs
+ * @type: type of file to be created
+ *
+ * This helper consolidates and enforces vfs restrictions on the @mode of a new
+ * object to be created.
+ *
+ * Umask stripping depends on whether the filesystem supports POSIX ACLs (see
+ * the kernel documentation for mode_strip_umask()). Moving umask stripping
+ * after setgid stripping allows the same ordering for both non-POSIX ACL and
+ * POSIX ACL supporting filesystems.
+ *
+ * Note that it's currently valid for @type to be 0 if a directory is created.
+ * Filesystems raise that flag individually and we need to check whether each
+ * filesystem can deal with receiving S_IFDIR from the vfs before we enforce a
+ * non-zero type.
+ *
+ * Returns: mode to be passed to the filesystem
+ */
+static inline umode_t vfs_prepare_mode(struct user_namespace *mnt_userns,
+ const struct inode *dir, umode_t mode,
+ umode_t mask_perms, umode_t type)
+{
+ /*
+ * S_ISGID stripping depends on the mode of the new file so make sure
+ * that the caller gives us this information and splat if we miss it.
+ */
+ WARN_ON_ONCE((mode & S_IFMT) == 0);
+
+ mode = mode_strip_sgid(mnt_userns, dir, mode);
+ mode = mode_strip_umask(dir, mode);
+
+ /*
+ * Apply the vfs mandated allowed permission mask and set the type of
+ * file to be created before we call into the filesystem.
+ */
+ mode &= (mask_perms & ~S_IFMT);
+ mode |= (type & S_IFMT);
+
+ return mode;
+}
+
/**
* vfs_create - create new file
* @mnt_userns: user namespace of the mount the inode was found from
@@ -3023,8 +3088,9 @@ int vfs_create(struct user_namespace *mnt_userns, struct inode *dir,
if (!dir->i_op->create)
return -EACCES; /* shouldn't it be ENOSYS? */
- mode &= S_IALLUGO;
- mode |= S_IFREG;
+
+ mode = vfs_prepare_mode(mnt_userns, d_inode(path.dentry), mode,
+ S_IALLUGO, S_IFREG);
error = security_inode_create(dir, dentry, mode);
if (error)
return error;
@@ -3287,7 +3353,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
if (open_flag & O_CREAT) {
if (open_flag & O_EXCL)
open_flag &= ~O_TRUNC;
- mode = vfs_prepare_mode(mnt_userns, dir->d_inode, mode);
+ mode = vfs_prepare_mode(mnt_userns, dir->d_inode, mode, mode, mode);
if (likely(got_write))
create_error = may_o_create(mnt_userns, &nd->path,
dentry, mode);
@@ -3520,7 +3586,7 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns,
child = d_alloc(dentry, &slash_name);
if (unlikely(!child))
goto out_err;
- mode = vfs_prepare_mode(mnt_userns, dir, mode);
+ mode = vfs_prepare_mode(mnt_userns, dir, mode, mode, mode);
error = dir->i_op->tmpfile(mnt_userns, dir, child, mode);
if (error)
goto out_err;
@@ -3798,6 +3864,8 @@ int vfs_mknod(struct user_namespace *mnt_userns, struct inode *dir,
if (!dir->i_op->mknod)
return -EPERM;
+ mode = vfs_prepare_mode(mnt_userns, d_inode(path.dentry),
+ mode, mode, mode);
error = devcgroup_inode_mknod(mode, dev);
if (error)
return error;
@@ -3848,12 +3916,13 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
if (IS_ERR(dentry))
goto out1;
- mnt_userns = mnt_user_ns(path.mnt);
- mode = vfs_prepare_mode(mnt_userns, path.dentry->d_inode, mode);
- error = security_path_mknod(&path, dentry, mode, dev);
+ error = security_path_mknod(&path, dentry,
+ mode_strip_umask(d_inode(path.dentry), mode),
+ dev);
if (error)
goto out2;
+ mnt_userns = mnt_user_ns(path.mnt);
switch (mode & S_IFMT) {
case 0: case S_IFREG:
error = vfs_create(mnt_userns, path.dentry->d_inode,
@@ -3919,7 +3988,13 @@ int vfs_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
if (!dir->i_op->mkdir)
return -EPERM;
- mode &= (S_IRWXUGO|S_ISVTX);
+ /*
+ * Filesystems currently raise S_IFDIR individually. We should try and
+ * fix that going forward passing it in from the vfs as we do for all
+ * other files going forward.
+ */
+ mode = vfs_prepare_mode(mnt_userns, d_inode(path.dentry),
+ mode, S_IRWXUGO | S_ISVTX, 0);
error = security_inode_mkdir(dir, dentry, mode);
if (error)
return error;
@@ -3940,7 +4015,6 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
struct path path;
int error;
unsigned int lookup_flags = LOOKUP_DIRECTORY;
- struct user_namespace *mnt_userns;
retry:
dentry = filename_create(dfd, name, &path, lookup_flags);
@@ -3948,12 +4022,15 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
if (IS_ERR(dentry))
goto out_putname;
- mnt_userns = mnt_user_ns(path.mnt);
- mode = vfs_prepare_mode(mnt_userns, path.dentry->d_inode, mode);
- error = security_path_mkdir(&path, dentry, mode);
- if (!error)
+ error = security_path_mkdir(&path, dentry,
+ mode_strip_umask(d_inode(path.dentry), mode));
+ if (!error) {
+ struct user_namespace *mnt_userns;
+
+ mnt_userns = mnt_user_ns(path.mnt);
error = vfs_mkdir(mnt_userns, path.dentry->d_inode, dentry,
mode);
+ }
done_path_create(&path, dentry);
if (retry_estale(error, lookup_flags)) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 914c8f28bb02..98b44a2732f5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3459,17 +3459,6 @@ static inline bool dir_relax_shared(struct inode *inode)
return !IS_DEADDIR(inode);
}
-static inline umode_t vfs_prepare_mode(struct user_namespace *mnt_userns,
- const struct inode *dir, umode_t mode)
-{
- mode = mode_strip_sgid(mnt_userns, dir, mode);
-
- if (!IS_POSIXACL(dir))
- mode &= ~current_umask();
-
- return mode;
-}
-
extern bool path_noexec(const struct path *path);
extern void inode_nohighmem(struct inode *inode);
--
2.32.0
next prev parent reply other threads:[~2022-04-27 10:43 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-26 11:11 [PATCH v8 1/4] fs: add mode_strip_sgid() helper Yang Xu
2022-04-26 11:11 ` [PATCH v8 2/4] fs: Add missing umask strip in vfs_tmpfile Yang Xu
2022-04-26 11:11 ` [PATCH v8 3/4] fs: move S_ISGID stripping into the vfs Yang Xu
2022-04-26 10:38 ` Christian Brauner
2022-04-26 11:20 ` Amir Goldstein
2022-04-26 11:52 ` Miklos Szeredi
2022-04-26 14:53 ` Christian Brauner
2022-04-27 9:22 ` Christian Brauner [this message]
2022-04-28 4:45 ` Al Viro
2022-04-28 8:07 ` Christian Brauner
2022-04-26 15:52 ` Christian Brauner
2022-04-27 1:21 ` xuyang2018.jy
2022-04-26 11:11 ` [PATCH v8 4/4] ceph: rely on vfs for setgid stripping Yang Xu
2022-04-26 14:52 ` [PATCH v8 1/4] fs: add mode_strip_sgid() helper Jeff Layton
2022-04-27 1:34 ` xuyang2018.jy
2022-04-28 1:59 ` Al Viro
2022-04-28 2:15 ` Al Viro
2022-04-28 2:23 ` xuyang2018.jy
2022-04-28 2:49 ` Al Viro
2022-04-28 3:12 ` Al Viro
2022-04-28 3:46 ` Al Viro
2022-04-28 9:34 ` Christian Brauner
2022-05-19 1:03 ` xuyang2018.jy
2022-05-19 9:14 ` Christian Brauner
2022-04-28 8:06 ` Jann Horn
2022-04-28 8:44 ` Christian Brauner
2022-04-28 11:55 ` Christian Brauner
2022-04-28 8:25 ` Christian Brauner
2022-04-28 4:40 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220427092201.wvsdjbnc7b4dttaw@wittgenstein \
--to=brauner@kernel.org \
--cc=amir73il@gmail.com \
--cc=ceph-devel@vger.kernel.org \
--cc=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
--cc=xuyang2018.jy@fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).