* mnt_want_write_file() has problem?
@ 2009-08-02 21:36 OGAWA Hirofumi
  2009-08-03 18:31 ` Dave Hansen
  2009-08-04 19:15 ` Dave Hansen
  0 siblings, 2 replies; 7+ messages in thread
From: OGAWA Hirofumi @ 2009-08-02 21:36 UTC (permalink / raw)
  To: Al Viro, Nick Piggin; +Cc: linux-kernel

Hi,

While reading some code, I came to suspect that mnt_want_write_file()
makes a wrong assumption.  It seems to assume that, whenever
(file->f_mode & FMODE_WRITE) is set, ->mnt_writers was already
incremented when the file was opened.  But isn't that false if the file
is a special_file()?

Sorry, I haven't checked all of the callers yet.  For example, I'm
thinking of the following:

static inline int __get_file_write_access(struct inode *inode,
					  struct vfsmount *mnt)
{
[...]
	if (!special_file(inode->i_mode)) {
		/*
		 * Balanced in __fput()
		 */
		error = mnt_want_write(mnt);
		if (error)
			put_write_access(inode);
	}
	return error;
}

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>



Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
---

 fs/namespace.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff -puN fs/namespace.c~mnt_want_write-wrong-assume fs/namespace.c
--- linux-2.6/fs/namespace.c~mnt_want_write-wrong-assume	2009-08-03 04:33:35.000000000 +0900
+++ linux-2.6-hirofumi/fs/namespace.c	2009-08-03 04:31:34.000000000 +0900
@@ -316,7 +316,8 @@ EXPORT_SYMBOL_GPL(mnt_clone_write);
  */
 int mnt_want_write_file(struct file *file)
 {
-	if (!(file->f_mode & FMODE_WRITE))
+	struct inode *inode = file->f_dentry->d_inode;
+	if (!(file->f_mode & FMODE_WRITE) || special_file(inode->i_mode))
 		return mnt_want_write(file->f_path.mnt);
 	else
 		return mnt_clone_write(file->f_path.mnt);
_


* Re: mnt_want_write_file() has problem?
  2009-08-02 21:36 mnt_want_write_file() has problem? OGAWA Hirofumi
@ 2009-08-03 18:31 ` Dave Hansen
  2009-08-03 18:48   ` OGAWA Hirofumi
  2009-08-04 19:15 ` Dave Hansen
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Hansen @ 2009-08-03 18:31 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Al Viro, Nick Piggin, linux-kernel

On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote:
> While I'm reading some code, I suspected that mnt_want_write_file() may
> have wrong assumption.  I think mnt_want_write_file() is assuming it
> increments ->mnt_writers if (file->f_mode & FMODE_WRITE). But, if it's
> special_file(), it is false?
> 
> Sorry, I'm still not checking all of those though. E.g. I'm thinking the
> below.
> 
> static inline int __get_file_write_access(struct inode *inode,
> 					  struct vfsmount *mnt)
> {
> [...]
> 	if (!special_file(inode->i_mode)) {
> 		/*
> 		 * Balanced in __fput()
> 		 */
> 		error = mnt_want_write(mnt);
> 		if (error)
> 			put_write_access(inode);
> 	}
> 	return error;
> }

In practice I don't think this is an issue.  We were never supposed to
do mnt_want_write(mnt) for any 'struct file' that was a special_file(),
specifically because of what you mention.

Nick's use of mnt_want_write_file() was a 1:1 drop-in for
mnt_want_write().  So, if all is well in the world, there should not be
any call sites where mnt_want_write_file() gets called on a
special_file().

Future users of mnt_want_write_file() may not notice this fact, though.
This is probably worth at least a note in the documentation or perhaps a
WARN_ON().

-- Dave



* Re: mnt_want_write_file() has problem?
  2009-08-03 18:31 ` Dave Hansen
@ 2009-08-03 18:48   ` OGAWA Hirofumi
  2009-08-03 20:37     ` Dave Hansen
  0 siblings, 1 reply; 7+ messages in thread
From: OGAWA Hirofumi @ 2009-08-03 18:48 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Al Viro, Nick Piggin, linux-kernel

Dave Hansen <dave@linux.vnet.ibm.com> writes:

> On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote:
>> While I'm reading some code, I suspected that mnt_want_write_file() may
>> have wrong assumption.  I think mnt_want_write_file() is assuming it
>> increments ->mnt_writers if (file->f_mode & FMODE_WRITE). But, if it's
>> special_file(), it is false?
>> 
>> Sorry, I'm still not checking all of those though. E.g. I'm thinking the
>> below.
>> 
>> static inline int __get_file_write_access(struct inode *inode,
>> 					  struct vfsmount *mnt)
>> {
>> [...]
>> 	if (!special_file(inode->i_mode)) {
>> 		/*
>> 		 * Balanced in __fput()
>> 		 */
>> 		error = mnt_want_write(mnt);
>> 		if (error)
>> 			put_write_access(inode);
>> 	}
>> 	return error;
>> }
>
> In practice I don't think this is an issue.  We were never supposed to
> do mnt_want_write(mnt) for any 'struct file' that was a special_file(),
> specifically because of what you mention.
>
> Nick's use of mnt_want_write_file() was a 1:1 drop-in for
> mnt_want_write().  So, if all is well in the world, there should not be
> any call sites where mnt_want_write_file() gets called on a
> special_file().

void file_update_time(struct file *file)
sys_fchmod()
sys_fchown()
sys_fsetxattr()
sys_fremovexattr()

Um..., these seem to be the users of mnt_want_write_file().  Couldn't
the filp passed to any of them be a special file?

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: mnt_want_write_file() has problem?
  2009-08-03 18:48   ` OGAWA Hirofumi
@ 2009-08-03 20:37     ` Dave Hansen
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Hansen @ 2009-08-03 20:37 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Al Viro, Nick Piggin, linux-kernel

On Tue, 2009-08-04 at 03:48 +0900, OGAWA Hirofumi wrote:
> Dave Hansen <dave@linux.vnet.ibm.com> writes:
> 
> > On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote:
> >> While I'm reading some code, I suspected that mnt_want_write_file() may
> >> have wrong assumption.  I think mnt_want_write_file() is assuming it
> >> increments ->mnt_writers if (file->f_mode & FMODE_WRITE). But, if it's
> >> special_file(), it is false?
> >> 
> >> Sorry, I'm still not checking all of those though. E.g. I'm thinking the
> >> below.
> >> 
> >> static inline int __get_file_write_access(struct inode *inode,
> >> 					  struct vfsmount *mnt)
> >> {
> >> [...]
> >> 	if (!special_file(inode->i_mode)) {
> >> 		/*
> >> 		 * Balanced in __fput()
> >> 		 */
> >> 		error = mnt_want_write(mnt);
> >> 		if (error)
> >> 			put_write_access(inode);
> >> 	}
> >> 	return error;
> >> }
> >
> > In practice I don't think this is an issue.  We were never supposed to
> > do mnt_want_write(mnt) for any 'struct file' that was a special_file(),
> > specifically because of what you mention.
> >
> > Nick's use of mnt_want_write_file() was a 1:1 drop-in for
> > mnt_want_write().  So, if all is well in the world, there should not be
> > any call sites where mnt_want_write_file() gets called on a
> > special_file().
> 
> void file_update_time(struct file *file)
> sys_fchmod()
> sys_fchown()
> sys_fsetxattr()
> sys_fremovexattr()
> 
> Um..., the users of mnt_want_write_file() seems to be those. I think
> all of those filp can be special file?

OK, I see where you're going now.  I think the race goes like this:

Let's say we have a process with /dev/null opened with FMODE_WRITE.  It
is the only file open on the filesystem, and since /dev/null is a
special file, opening it did not raise mnt_writers, so the /dev mount
has an mnt_writers count of 0.  That process then calls fchmod() on its
fd for /dev/null.  The code notices that (file->f_mode & FMODE_WRITE) is
set and goes to mnt_clone_write().

At the same time, another process tries to 'mount -o remount,ro /dev'.
That process never sees mnt_clone_write()'s mnt_writers bump, so it
allows the remount,ro even though mnt_writers ends up elevated.

Here's a completely untested/uncompiled patch.  I'll see if I can find a
test case that triggers this bug with the BUG_ON() in this patch.

diff --git a/fs/namespace.c b/fs/namespace.c
index 277c28a..a4714c4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -294,9 +294,17 @@ EXPORT_SYMBOL_GPL(mnt_want_write);
  *
  * After finished, mnt_drop_write must be called as usual to
  * drop the reference.
+ *
+ * Be very careful using this.  You must *guarantee* that
+ * this vfsmount has at least one existing, persistent writer
+ * that can not possibly go away, before calling this.
  */
 int mnt_clone_write(struct vfsmount *mnt)
 {
+	/* This would kill the performance
+	 * optimization in this function
+	BUG_ON(count_mnt_writers(mnt) <= 0);
+	*/
 	/* superblock may be r/o */
 	if (__mnt_is_readonly(mnt))
 		return -EROFS;
@@ -312,14 +320,20 @@ EXPORT_SYMBOL_GPL(mnt_clone_write);
  * @file: the file who's mount on which to take a write
  *
  * This is like mnt_want_write, but it takes a file and can
- * do some optimisations if the file is open for write already
+ * do some optimisations if the file is open for write already.
+ * We do not do mnt_want_write() on read-only or special files,
+ * so we can not use mnt_clone_write() for them.
  */
 int mnt_want_write_file(struct file *file)
 {
-	if (!(file->f_mode & FMODE_WRITE))
-		return mnt_want_write(file->f_path.mnt);
-	else
-		return mnt_clone_write(file->f_path.mnt);
+	struct path *path = &file->f_path;
+	struct inode *inode = path->dentry->d_inode;
+
+	if ((file->f_mode & FMODE_WRITE) &&
+	    !special_file(inode->i_mode))
+		return mnt_clone_write(path->mnt);
+
+	return mnt_want_write(path->mnt);
 }
 EXPORT_SYMBOL_GPL(mnt_want_write_file);
 


-- Dave



* Re: mnt_want_write_file() has problem?
  2009-08-02 21:36 mnt_want_write_file() has problem? OGAWA Hirofumi
  2009-08-03 18:31 ` Dave Hansen
@ 2009-08-04 19:15 ` Dave Hansen
  2009-08-05  5:37   ` Nick Piggin
  2009-09-12 13:39   ` Al Viro
  1 sibling, 2 replies; 7+ messages in thread
From: Dave Hansen @ 2009-08-04 19:15 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Al Viro, Nick Piggin, linux-kernel, akpm

On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote:
> diff -puN fs/namespace.c~mnt_want_write-wrong-assume fs/namespace.c
> --- linux-2.6/fs/namespace.c~mnt_want_write-wrong-assume        2009-08-03 04:33:35.000000000 +0900
> +++ linux-2.6-hirofumi/fs/namespace.c   2009-08-03 04:31:34.000000000 +0900
> @@ -316,7 +316,8 @@ EXPORT_SYMBOL_GPL(mnt_clone_write);
>   */
>  int mnt_want_write_file(struct file *file)
>  {
> -       if (!(file->f_mode & FMODE_WRITE))
> +       struct inode *inode = file->f_dentry->d_inode;
> +       if (!(file->f_mode & FMODE_WRITE) || special_file(inode->i_mode))
>                 return mnt_want_write(file->f_path.mnt);
>         else
>                 return mnt_clone_write(file->f_path.mnt);

I'm fine with this.  I'd like a debugging check in mnt_clone_write()
since this bug is easy to detect, but such a check will also cost all of
the performance gains that Nick added.  So, we can't have it
unconditionally.

-- 

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>

-- Dave



* Re: mnt_want_write_file() has problem?
  2009-08-04 19:15 ` Dave Hansen
@ 2009-08-05  5:37   ` Nick Piggin
  2009-09-12 13:39   ` Al Viro
  1 sibling, 0 replies; 7+ messages in thread
From: Nick Piggin @ 2009-08-05  5:37 UTC (permalink / raw)
  To: Dave Hansen; +Cc: OGAWA Hirofumi, Al Viro, linux-kernel, akpm

On Tue, Aug 04, 2009 at 12:15:19PM -0700, Dave Hansen wrote:
> On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote:
> > diff -puN fs/namespace.c~mnt_want_write-wrong-assume fs/namespace.c
> > --- linux-2.6/fs/namespace.c~mnt_want_write-wrong-assume        2009-08-03 04:33:35.000000000 +0900
> > +++ linux-2.6-hirofumi/fs/namespace.c   2009-08-03 04:31:34.000000000 +0900
> > @@ -316,7 +316,8 @@ EXPORT_SYMBOL_GPL(mnt_clone_write);
> >   */
> >  int mnt_want_write_file(struct file *file)
> >  {
> > -       if (!(file->f_mode & FMODE_WRITE))
> > +       struct inode *inode = file->f_dentry->d_inode;
> > +       if (!(file->f_mode & FMODE_WRITE) || special_file(inode->i_mode))
> >                 return mnt_want_write(file->f_path.mnt);
> >         else
> >                 return mnt_clone_write(file->f_path.mnt);
> 
> I'm fine with this.  I'd like a debugging check in mnt_clone_write()
> since this bug is easy to detect, but such a check will also cost all of
> the performance gains that Nick added.  So, we can't have it
> unconditionally.

Yeah, good catch, thanks for this.



* Re: mnt_want_write_file() has problem?
  2009-08-04 19:15 ` Dave Hansen
  2009-08-05  5:37   ` Nick Piggin
@ 2009-09-12 13:39   ` Al Viro
  1 sibling, 0 replies; 7+ messages in thread
From: Al Viro @ 2009-09-12 13:39 UTC (permalink / raw)
  To: Dave Hansen; +Cc: OGAWA Hirofumi, Nick Piggin, linux-kernel, akpm

On Tue, Aug 04, 2009 at 12:15:19PM -0700, Dave Hansen wrote:
> On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote:
> > diff -puN fs/namespace.c~mnt_want_write-wrong-assume fs/namespace.c
> > --- linux-2.6/fs/namespace.c~mnt_want_write-wrong-assume        2009-08-03 04:33:35.000000000 +0900
> > +++ linux-2.6-hirofumi/fs/namespace.c   2009-08-03 04:31:34.000000000 +0900
> > @@ -316,7 +316,8 @@ EXPORT_SYMBOL_GPL(mnt_clone_write);
> >   */
> >  int mnt_want_write_file(struct file *file)
> >  {
> > -       if (!(file->f_mode & FMODE_WRITE))
> > +       struct inode *inode = file->f_dentry->d_inode;
> > +       if (!(file->f_mode & FMODE_WRITE) || special_file(inode->i_mode))
> >                 return mnt_want_write(file->f_path.mnt);
> >         else
> >                 return mnt_clone_write(file->f_path.mnt);
> 
> I'm fine with this.  I'd like a debugging check in mnt_clone_write()
> since this bug is easy to detect, but such a check will also cost all of
> the performance gains that Nick added.  So, we can't have it
> unconditionally.

[Very belated] ACK.


Thread overview: 7 messages
2009-08-02 21:36 mnt_want_write_file() has problem? OGAWA Hirofumi
2009-08-03 18:31 ` Dave Hansen
2009-08-03 18:48   ` OGAWA Hirofumi
2009-08-03 20:37     ` Dave Hansen
2009-08-04 19:15 ` Dave Hansen
2009-08-05  5:37   ` Nick Piggin
2009-09-12 13:39   ` Al Viro
