All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Brauner <brauner@kernel.org>
To: Yang Xu <xuyang2018.jy@fujitsu.com>,
	Dave Chinner <david@fromorbit.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Jeff Layton <jlayton@kernel.org>,
	Miklos Szeredi <miklos@szeredi.hu>
Cc: Amir Goldstein <amir73il@gmail.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	ceph-devel <ceph-devel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH v8 3/4] fs: move S_ISGID stripping into the vfs
Date: Wed, 27 Apr 2022 11:22:01 +0200	[thread overview]
Message-ID: <20220427092201.wvsdjbnc7b4dttaw@wittgenstein> (raw)
In-Reply-To: <20220426145349.zxmahoq2app2lhip@wittgenstein>

On Tue, Apr 26, 2022 at 04:53:49PM +0200, Christian Brauner wrote:
> On Tue, Apr 26, 2022 at 01:52:11PM +0200, Miklos Szeredi wrote:
> > On Tue, 26 Apr 2022 at 13:21, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Tue, Apr 26, 2022 at 1:38 PM Christian Brauner <brauner@kernel.org> wrote:
> > 
> > > > One thing that I just remembered and which I think I haven't mentioned
> > > > so far is that moving S_ISGID stripping from filesystem callpaths into
> > > > the vfs callpaths means that we're hoisting this logic out of vfs_*()
> > > > helpers implicitly.
> > > >
> > > > So filesystems that call vfs_*() helpers directly can't rely on S_ISGID
> > > > stripping being done in vfs_*() helpers anymore unless they pass the
> > > > mode on from a prior run through the vfs.
> > > >
> > > > This mostly affects overlayfs which calls vfs_*() functions directly. So
> > > > a typical overlayfs callstack would be (roughly - I'm omw to lunch):
> > > >
> > > > sys_mknod()
> > > > -> do_mknodat(mode) // calls vfs_prepare_mode()
> > > >    -> .mknod = ovl_mknod(mode)
> > > >       -> ovl_create(mode)
> > > >          -> vfs_mknod(mode)
> > > >
> > > > I think we are safe as overlayfs passes on the mode on from its own run
> > > > through the vfs and then via vfs_*() to the underlying filesystem but it
> > > > is worth point that out.
> > > >
> > > > Ccing Amir just for confirmation.
> > >
> > > Looks fine to me, but CC Miklos ...
> > 
> > Looks fine to me as well.  Overlayfs should share the mode (including
> > the suid and sgid bits), owner, group and ACL's with the underlying
> > filesystem, so clearing sgid based on overlay parent directory should
> > result in the same mode as if it was done based on the parent
> > directory on the underlying layer.
> 
> Ah yes, good point.
> 
> > 
> > AFAIU this logic is not affected by userns or mnt_userns, but
> > Christian would be best to confirm that.
> 
> It does depend on it as S_ISGID stripping requires knowledge about
> whether the caller has CAP_FSETID and is capable over the parent
> directory or if they are in the group the file is owned by.
> 
> I think ultimately it might just come down to moving vfs_prepare_mode()
> into vfs_*() helpers and not into the do_*at() helpers.
> 
> That would be cleaner anyway as right now we have this weird disconnect
> between vfs_tmpfile() and vfs_{create,mknod,mkdir}(). IOW, vfs_tmpfile()
> doesn't even have an associated do_*() wrapper where we could call
> vfs_prepare_mode() from.
> 
> So ultimately it might be nicer if we do it in vfs_*() helpers anyway.
> 
> The less pretty thing about it will be that the security_path_*() hooks
> also want a mode.
> 
> Right now these hooks receive the mode as it's passed in from userspace
> minus umask but before S_ISGID stripping happens.
> 
> Whereas I think they should really see what the filesystem sees and
> currently it's a bug that they see something else.
> 
> I need to think about this a bit.

So on top of that series (though it should just be folded in), does that
look reasonable?

From e993f81caae60fee4f77b40d46ad3863ea383493 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner@kernel.org>
Date: Wed, 27 Apr 2022 10:53:35 +0200
Subject: [PATCH] UNTESTED UNTESTED UNTESTED

As I realized yesterday we need to move mode preparation into the vfs_*()
instead of do_*() helpers as filesystems like overlayfs have the following
callstacks:

sys_mknod(ovl_path, mode)
-> do_mknodat(ovl_path, mode)
   -> .mknod = ovl_mknod(ovl_path, mode)
      -> vfs_mknod(xfs_path, mode)
	 -> .mknod = xfs_vn_mknod(xfs_path, mode)

and the requirement that this will yield the same mode as:

sys_mknod(xfs_path, mode)
-> do_mknodat(xfs_path, mode)
   -> .mknod = xfs_vn_mknod(xfs_path, mode)

By moving setgid stripping into vfs_*() helpers we achieve:

- Moving setgid stripping out of the individual filesystem's responsibility.
- Ensure that callers of vfs_*() helpers continue to get correct setgid
  stripping.

Another thing I realized while looking at this yesterday was the entanglement
with security hooks. Security hooks currently see a different mode than the
actual filesystem sees when it calls into inode_init_owner(). This patch
doesn't change that!

I originally thought that we might be able to make the security hooks see the
same mode that the filesystem will see. However, I have doubts. First, I don't
think that is achievable without more restructuring. Second, I don't think it's
required as the hooks have clearly been placed before any vfs_*() calls and
thereby have committed themselves to see the mode as passed in from userspace
(minus the umask). We will simply continue doing just exactly that
side-stepping the issue for now.

Sketched-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/namei.c         | 103 +++++++++++++++++++++++++++++++++++++++------
 include/linux/fs.h |  11 -----
 2 files changed, 90 insertions(+), 24 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5dbf00704ae8..8b83db15ae5f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2998,6 +2998,71 @@ void unlock_rename(struct dentry *p1, struct dentry *p2)
 }
 EXPORT_SYMBOL(unlock_rename);
 
+/**
+ * mode_strip_umask - handle vfs umask stripping
+ * @dir:	parent directory of the new inode
+ * @mode:	mode of the new inode to be created in @dir
+ *
+ * Umask stripping depends on whether or not the filesystem supports POSIX
+ * ACLs. If the filesystem doesn't support it umask stripping is done directly
+ * in here. If the filesystem does support POSIX ACLs umask stripping is
+ * deferred until the filesystem calls posix_acl_create().
+ *
+ * Returns: mode
+ */
+static inline umode_t mode_strip_umask(const struct inode *dir, umode_t mode)
+{
+	if (!IS_POSIXACL(dir))
+		mode &= ~current_umask();
+	return mode;
+}
+
+/**
+ * vfs_prepare_mode - prepare the mode to be used for a new inode
+ * @mnt_userns:	user namespace of the mount the inode was found from
+ * @dir:	parent directory of the new inode
+ * @mode:	mode of the new inode
+ * @mask_perms:	allowed permission by the vfs
+ * @type:	type of file to be created
+ *
+ * This helper consolidates and enforces vfs restrictions on the @mode of a new
+ * object to be created.
+ *
+ * Umask stripping depends on whether the filesystem supports POSIX ACLs (see
+ * the kernel documentation for mode_strip_umask()). Moving umask stripping
+ * after setgid stripping allows the same ordering for both non-POSIX ACL and
+ * POSIX ACL supporting filesystems.
+ *
+ * Note that it's currently valid for @type to be 0 if a directory is created.
+ * Filesystems raise that flag individually and we need to check whether each
+ * filesystem can deal with receiving S_IFDIR from the vfs before we enforce a
+ * non-zero type.
+ *
+ * Returns: mode to be passed to the filesystem
+ */
+static inline umode_t vfs_prepare_mode(struct user_namespace *mnt_userns,
+				       const struct inode *dir, umode_t mode,
+				       umode_t mask_perms, umode_t type)
+{
+	/*
+	 * S_ISGID stripping depends on the mode of the new file so make sure
+	 * that the caller gives us this information and splat if we miss it.
+	 */
+	WARN_ON_ONCE((mode & S_IFMT) == 0);
+
+	mode = mode_strip_sgid(mnt_userns, dir, mode);
+	mode = mode_strip_umask(dir, mode);
+
+	/*
+	 * Apply the vfs mandated allowed permission mask and set the type of
+	 * file to be created before we call into the filesystem.
+	 */
+	mode &= (mask_perms & ~S_IFMT);
+	mode |= (type & S_IFMT);
+
+	return mode;
+}
+
 /**
  * vfs_create - create new file
  * @mnt_userns:	user namespace of the mount the inode was found from
@@ -3023,8 +3088,9 @@ int vfs_create(struct user_namespace *mnt_userns, struct inode *dir,
 
 	if (!dir->i_op->create)
 		return -EACCES;	/* shouldn't it be ENOSYS? */
-	mode &= S_IALLUGO;
-	mode |= S_IFREG;
+
+	mode = vfs_prepare_mode(mnt_userns, d_inode(path.dentry), mode,
+				S_IALLUGO, S_IFREG);
 	error = security_inode_create(dir, dentry, mode);
 	if (error)
 		return error;
@@ -3287,7 +3353,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	if (open_flag & O_CREAT) {
 		if (open_flag & O_EXCL)
 			open_flag &= ~O_TRUNC;
-		mode = vfs_prepare_mode(mnt_userns, dir->d_inode, mode);
+		mode = vfs_prepare_mode(mnt_userns, dir->d_inode, mode, mode, mode);
 		if (likely(got_write))
 			create_error = may_o_create(mnt_userns, &nd->path,
 						    dentry, mode);
@@ -3520,7 +3586,7 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns,
 	child = d_alloc(dentry, &slash_name);
 	if (unlikely(!child))
 		goto out_err;
-	mode = vfs_prepare_mode(mnt_userns, dir, mode);
+	mode = vfs_prepare_mode(mnt_userns, dir, mode, mode, mode);
 	error = dir->i_op->tmpfile(mnt_userns, dir, child, mode);
 	if (error)
 		goto out_err;
@@ -3798,6 +3864,8 @@ int vfs_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 	if (!dir->i_op->mknod)
 		return -EPERM;
 
+	mode = vfs_prepare_mode(mnt_userns, d_inode(path.dentry),
+				mode, mode, mode);
 	error = devcgroup_inode_mknod(mode, dev);
 	if (error)
 		return error;
@@ -3848,12 +3916,13 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
 	if (IS_ERR(dentry))
 		goto out1;
 
-	mnt_userns = mnt_user_ns(path.mnt);
-	mode = vfs_prepare_mode(mnt_userns, path.dentry->d_inode, mode);
-	error = security_path_mknod(&path, dentry, mode, dev);
+	error = security_path_mknod(&path, dentry,
+				    mode_strip_umask(d_inode(path.dentry), mode),
+				    dev);
 	if (error)
 		goto out2;
 
+	mnt_userns = mnt_user_ns(path.mnt);
 	switch (mode & S_IFMT) {
 		case 0: case S_IFREG:
 			error = vfs_create(mnt_userns, path.dentry->d_inode,
@@ -3919,7 +3988,13 @@ int vfs_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 	if (!dir->i_op->mkdir)
 		return -EPERM;
 
-	mode &= (S_IRWXUGO|S_ISVTX);
+	/*
+	 * Filesystems currently raise S_IFDIR individually. We should try and
+	 * fix that going forward passing it in from the vfs as we do for all
+	 * other files going forward.
+	 */
+	mode = vfs_prepare_mode(mnt_userns, d_inode(path.dentry),
+				mode, S_IRWXUGO | S_ISVTX, 0);
 	error = security_inode_mkdir(dir, dentry, mode);
 	if (error)
 		return error;
@@ -3940,7 +4015,6 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
 	struct path path;
 	int error;
 	unsigned int lookup_flags = LOOKUP_DIRECTORY;
-	struct user_namespace *mnt_userns;
 
 retry:
 	dentry = filename_create(dfd, name, &path, lookup_flags);
@@ -3948,12 +4022,15 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
 	if (IS_ERR(dentry))
 		goto out_putname;
 
-	mnt_userns = mnt_user_ns(path.mnt);
-	mode = vfs_prepare_mode(mnt_userns, path.dentry->d_inode, mode);
-	error = security_path_mkdir(&path, dentry, mode);
-	if (!error)
+	error = security_path_mkdir(&path, dentry,
+				    mode_strip_umask(d_inode(path.dentry), mode));
+	if (!error) {
+		struct user_namespace *mnt_userns;
+
+		mnt_userns = mnt_user_ns(path.mnt);
 		error = vfs_mkdir(mnt_userns, path.dentry->d_inode, dentry,
 				  mode);
+	}
 
 	done_path_create(&path, dentry);
 	if (retry_estale(error, lookup_flags)) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 914c8f28bb02..98b44a2732f5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3459,17 +3459,6 @@ static inline bool dir_relax_shared(struct inode *inode)
 	return !IS_DEADDIR(inode);
 }
 
-static inline umode_t vfs_prepare_mode(struct user_namespace *mnt_userns,
-				   const struct inode *dir, umode_t mode)
-{
-	mode = mode_strip_sgid(mnt_userns, dir, mode);
-
-	if (!IS_POSIXACL(dir))
-		mode &= ~current_umask();
-
-	return mode;
-}
-
 extern bool path_noexec(const struct path *path);
 extern void inode_nohighmem(struct inode *inode);
 
-- 
2.32.0


  reply	other threads:[~2022-04-27 10:43 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-26 11:11 [PATCH v8 1/4] fs: add mode_strip_sgid() helper Yang Xu
2022-04-26 11:11 ` [PATCH v8 2/4] fs: Add missing umask strip in vfs_tmpfile Yang Xu
2022-04-26 11:11 ` [PATCH v8 3/4] fs: move S_ISGID stripping into the vfs Yang Xu
2022-04-26 10:38   ` Christian Brauner
2022-04-26 11:20     ` Amir Goldstein
2022-04-26 11:52       ` Miklos Szeredi
2022-04-26 14:53         ` Christian Brauner
2022-04-27  9:22           ` Christian Brauner [this message]
2022-04-28  4:45             ` Al Viro
2022-04-28  8:07               ` Christian Brauner
2022-04-26 15:52   ` Christian Brauner
2022-04-27  1:21     ` xuyang2018.jy
2022-04-26 11:11 ` [PATCH v8 4/4] ceph: rely on vfs for setgid stripping Yang Xu
2022-04-26 14:52 ` [PATCH v8 1/4] fs: add mode_strip_sgid() helper Jeff Layton
2022-04-27  1:34   ` xuyang2018.jy
2022-04-28  1:59 ` Al Viro
2022-04-28  2:15   ` Al Viro
2022-04-28  2:23     ` xuyang2018.jy
2022-04-28  2:49       ` Al Viro
2022-04-28  3:12         ` Al Viro
2022-04-28  3:46           ` Al Viro
2022-04-28  9:34             ` Christian Brauner
2022-05-19  1:03               ` xuyang2018.jy
2022-05-19  9:14                 ` Christian Brauner
2022-04-28  8:06           ` Jann Horn
2022-04-28  8:44         ` Christian Brauner
2022-04-28 11:55           ` Christian Brauner
2022-04-28  8:25   ` Christian Brauner
2022-04-28  4:40 ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220427092201.wvsdjbnc7b4dttaw@wittgenstein \
    --to=brauner@kernel.org \
    --cc=amir73il@gmail.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=xuyang2018.jy@fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.