linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] vfs: don't unnecessarily clone write access for writable fds
@ 2020-06-11 16:05 Eric Biggers
  2020-06-29 16:50 ` Eric Biggers
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Biggers @ 2020-06-11 16:05 UTC (permalink / raw)
  To: linux-fsdevel, Alexander Viro; +Cc: Daeho Jeong, linux-kernel

From: Eric Biggers <ebiggers@google.com>

There's no need for mnt_want_write_file() to increment mnt_writers when
the file is already open for writing, provided that
mnt_drop_write_file() is changed to conditionally decrement it.

We seem to have ended up in the current situation because
mnt_want_write_file() used to be paired with mnt_drop_write(), due to
mnt_drop_write_file() not having been added yet.  So originally
mnt_want_write_file() had to always increment mnt_writers.

But later mnt_drop_write_file() was added, and all callers of
mnt_want_write_file() were paired with it.  This makes the compatibility
between mnt_want_write_file() and mnt_drop_write() no longer necessary.

Therefore, make __mnt_want_write_file() and __mnt_drop_write_file() skip
incrementing mnt_writers on files already open for writing.  This
removes the only caller of mnt_clone_write(), so remove that too.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---

v2: keep the check for emergency r/o remounts.

 fs/namespace.c        | 53 ++++++++++++++++---------------------------
 include/linux/mount.h |  1 -
 2 files changed, 20 insertions(+), 34 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index f30ed401cc6d7a..b2052b2f3d87f6 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -359,51 +359,37 @@ int mnt_want_write(struct vfsmount *m)
 }
 EXPORT_SYMBOL_GPL(mnt_want_write);
 
-/**
- * mnt_clone_write - get write access to a mount
- * @mnt: the mount on which to take a write
- *
- * This is effectively like mnt_want_write, except
- * it must only be used to take an extra write reference
- * on a mountpoint that we already know has a write reference
- * on it. This allows some optimisation.
- *
- * After finished, mnt_drop_write must be called as usual to
- * drop the reference.
- */
-int mnt_clone_write(struct vfsmount *mnt)
-{
-	/* superblock may be r/o */
-	if (__mnt_is_readonly(mnt))
-		return -EROFS;
-	preempt_disable();
-	mnt_inc_writers(real_mount(mnt));
-	preempt_enable();
-	return 0;
-}
-EXPORT_SYMBOL_GPL(mnt_clone_write);
-
 /**
  * __mnt_want_write_file - get write access to a file's mount
  * @file: the file who's mount on which to take a write
  *
- * This is like __mnt_want_write, but it takes a file and can
- * do some optimisations if the file is open for write already
+ * This is like __mnt_want_write, but if the file is already open for writing it
+ * skips incrementing mnt_writers (since the open file already has a reference)
+ * and instead only does the check for emergency r/o remounts.  This must be
+ * paired with __mnt_drop_write_file.
  */
 int __mnt_want_write_file(struct file *file)
 {
-	if (!(file->f_mode & FMODE_WRITER))
-		return __mnt_want_write(file->f_path.mnt);
-	else
-		return mnt_clone_write(file->f_path.mnt);
+	if (file->f_mode & FMODE_WRITER) {
+		/*
+		 * Superblock may have become readonly while there are still
+		 * writable fd's, e.g. due to a fs error with errors=remount-ro
+		 */
+		if (__mnt_is_readonly(file->f_path.mnt))
+			return -EROFS;
+		return 0;
+	}
+	return __mnt_want_write(file->f_path.mnt);
 }
 
 /**
  * mnt_want_write_file - get write access to a file's mount
  * @file: the file who's mount on which to take a write
  *
- * This is like mnt_want_write, but it takes a file and can
- * do some optimisations if the file is open for write already
+ * This is like mnt_want_write, but if the file is already open for writing it
+ * skips incrementing mnt_writers (since the open file already has a reference)
+ * and instead only does the freeze protection and the check for emergency r/o
+ * remounts.  This must be paired with mnt_drop_write_file.
  */
 int mnt_want_write_file(struct file *file)
 {
@@ -449,7 +435,8 @@ EXPORT_SYMBOL_GPL(mnt_drop_write);
 
 void __mnt_drop_write_file(struct file *file)
 {
-	__mnt_drop_write(file->f_path.mnt);
+	if (!(file->f_mode & FMODE_WRITER))
+		__mnt_drop_write(file->f_path.mnt);
 }
 
 void mnt_drop_write_file(struct file *file)
diff --git a/include/linux/mount.h b/include/linux/mount.h
index de657bd211fa64..29d216f927c28c 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -78,7 +78,6 @@ struct path;
 
 extern int mnt_want_write(struct vfsmount *mnt);
 extern int mnt_want_write_file(struct file *file);
-extern int mnt_clone_write(struct vfsmount *mnt);
 extern void mnt_drop_write(struct vfsmount *mnt);
 extern void mnt_drop_write_file(struct file *file);
 extern void mntput(struct vfsmount *mnt);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] vfs: don't unnecessarily clone write access for writable fds
  2020-06-11 16:05 [PATCH v2] vfs: don't unnecessarily clone write access for writable fds Eric Biggers
@ 2020-06-29 16:50 ` Eric Biggers
  2020-09-16  3:59   ` Eric Biggers
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Biggers @ 2020-06-29 16:50 UTC (permalink / raw)
  To: linux-fsdevel, Alexander Viro; +Cc: Daeho Jeong, linux-kernel

On Thu, Jun 11, 2020 at 09:05:34AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> There's no need for mnt_want_write_file() to increment mnt_writers when
> the file is already open for writing, provided that
> mnt_drop_write_file() is changed to conditionally decrement it.
> 
> We seem to have ended up in the current situation because
> mnt_want_write_file() used to be paired with mnt_drop_write(), due to
> mnt_drop_write_file() not having been added yet.  So originally
> mnt_want_write_file() had to always increment mnt_writers.
> 
> But later mnt_drop_write_file() was added, and all callers of
> mnt_want_write_file() were paired with it.  This makes the compatibility
> between mnt_want_write_file() and mnt_drop_write() no longer necessary.
> 
> Therefore, make __mnt_want_write_file() and __mnt_drop_write_file() skip
> incrementing mnt_writers on files already open for writing.  This
> removes the only caller of mnt_clone_write(), so remove that too.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Al, any thoughts on this patch?

- Eric

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] vfs: don't unnecessarily clone write access for writable fds
  2020-06-29 16:50 ` Eric Biggers
@ 2020-09-16  3:59   ` Eric Biggers
  2020-09-17  0:54     ` Al Viro
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Biggers @ 2020-09-16  3:59 UTC (permalink / raw)
  To: linux-fsdevel, Alexander Viro; +Cc: Daeho Jeong, linux-kernel

On Mon, Jun 29, 2020 at 09:50:14AM -0700, Eric Biggers wrote:
> On Thu, Jun 11, 2020 at 09:05:34AM -0700, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > There's no need for mnt_want_write_file() to increment mnt_writers when
> > the file is already open for writing, provided that
> > mnt_drop_write_file() is changed to conditionally decrement it.
> > 
> > We seem to have ended up in the current situation because
> > mnt_want_write_file() used to be paired with mnt_drop_write(), due to
> > mnt_drop_write_file() not having been added yet.  So originally
> > mnt_want_write_file() had to always increment mnt_writers.
> > 
> > But later mnt_drop_write_file() was added, and all callers of
> > mnt_want_write_file() were paired with it.  This makes the compatibility
> > between mnt_want_write_file() and mnt_drop_write() no longer necessary.
> > 
> > Therefore, make __mnt_want_write_file() and __mnt_drop_write_file() skip
> > incrementing mnt_writers on files already open for writing.  This
> > removes the only caller of mnt_clone_write(), so remove that too.
> > 
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> 
> Al, any thoughts on this patch?
> 

Ping?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] vfs: don't unnecessarily clone write access for writable fds
  2020-09-16  3:59   ` Eric Biggers
@ 2020-09-17  0:54     ` Al Viro
  2020-09-22 16:41       ` Eric Biggers
  0 siblings, 1 reply; 5+ messages in thread
From: Al Viro @ 2020-09-17  0:54 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-fsdevel, Daeho Jeong, linux-kernel

On Tue, Sep 15, 2020 at 08:59:14PM -0700, Eric Biggers wrote:
> On Mon, Jun 29, 2020 at 09:50:14AM -0700, Eric Biggers wrote:
> > On Thu, Jun 11, 2020 at 09:05:34AM -0700, Eric Biggers wrote:
> > > From: Eric Biggers <ebiggers@google.com>
> > > 
> > > There's no need for mnt_want_write_file() to increment mnt_writers when
> > > the file is already open for writing, provided that
> > > mnt_drop_write_file() is changed to conditionally decrement it.
> > > 
> > > We seem to have ended up in the current situation because
> > > mnt_want_write_file() used to be paired with mnt_drop_write(), due to
> > > mnt_drop_write_file() not having been added yet.  So originally
> > > mnt_want_write_file() had to always increment mnt_writers.
> > > 
> > > But later mnt_drop_write_file() was added, and all callers of
> > > mnt_want_write_file() were paired with it.  This makes the compatibility
> > > between mnt_want_write_file() and mnt_drop_write() no longer necessary.

Umm...  That really needs to be put into D/f/porting; this kind of rule changes
(from "it used to work both ways" to "things quietly break if you use the
old variant") should come with explicit statement in there.

I'm certainly fine with unexporting mnt_clone_write() and making the damn
thing static, but as for the rest I would put an explicit "don't pair
mnt_drop_write() with mnt_want_write_file()" and wait for a cycle.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] vfs: don't unnecessarily clone write access for writable fds
  2020-09-17  0:54     ` Al Viro
@ 2020-09-22 16:41       ` Eric Biggers
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Biggers @ 2020-09-22 16:41 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Daeho Jeong, linux-kernel

On Thu, Sep 17, 2020 at 01:54:41AM +0100, Al Viro wrote:
> On Tue, Sep 15, 2020 at 08:59:14PM -0700, Eric Biggers wrote:
> > On Mon, Jun 29, 2020 at 09:50:14AM -0700, Eric Biggers wrote:
> > > On Thu, Jun 11, 2020 at 09:05:34AM -0700, Eric Biggers wrote:
> > > > From: Eric Biggers <ebiggers@google.com>
> > > > 
> > > > There's no need for mnt_want_write_file() to increment mnt_writers when
> > > > the file is already open for writing, provided that
> > > > mnt_drop_write_file() is changed to conditionally decrement it.
> > > > 
> > > > We seem to have ended up in the current situation because
> > > > mnt_want_write_file() used to be paired with mnt_drop_write(), due to
> > > > mnt_drop_write_file() not having been added yet.  So originally
> > > > mnt_want_write_file() had to always increment mnt_writers.
> > > > 
> > > > But later mnt_drop_write_file() was added, and all callers of
> > > > mnt_want_write_file() were paired with it.  This makes the compatibility
> > > > between mnt_want_write_file() and mnt_drop_write() no longer necessary.
> 
> Umm...  That really needs to be put into D/f/porting; this kind of rule changes
> (from "it used to work both ways" to "things quietly break if you use the
> old variant") should come with explicit statement in there.
> 
> I'm certainly fine with unexporting mnt_clone_write() and making the damn
> thing static, but as for the rest I would put an explicit "don't pair
> mnt_drop_write() with mnt_want_write_file()" and wait for a cycle.

Is there any point in waiting a cycle between adding the note to
Documentation/filesystems/porting.rst and making the behavior change?  It seems
that all the other notes just get added at the same time the change is made.

- Eric

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-09-22 16:41 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-11 16:05 [PATCH v2] vfs: don't unnecessarily clone write access for writable fds Eric Biggers
2020-06-29 16:50 ` Eric Biggers
2020-09-16  3:59   ` Eric Biggers
2020-09-17  0:54     ` Al Viro
2020-09-22 16:41       ` Eric Biggers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).