* mnt_want_write_file() has problem?
@ 2009-08-02 21:36 OGAWA Hirofumi
2009-08-03 18:31 ` Dave Hansen
2009-08-04 19:15 ` Dave Hansen
0 siblings, 2 replies; 7+ messages in thread
From: OGAWA Hirofumi @ 2009-08-02 21:36 UTC (permalink / raw)
To: Al Viro, Nick Piggin; +Cc: linux-kernel
Hi,
While I'm reading some code, I suspected that mnt_want_write_file() may
have wrong assumption. I think mnt_want_write_file() is assuming it
increments ->mnt_writers if (file->f_mode & FMODE_WRITE). But, if it's
special_file(), it is false?
Sorry, I'm still not checking all of those though. E.g. I'm thinking the
below.
static inline int __get_file_write_access(struct inode *inode,
struct vfsmount *mnt)
{
[...]
if (!special_file(inode->i_mode)) {
/*
* Balanced in __fput()
*/
error = mnt_want_write(mnt);
if (error)
put_write_access(inode);
}
return error;
}
Thanks.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
---
fs/namespace.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff -puN fs/namespace.c~mnt_want_write-wrong-assume fs/namespace.c
--- linux-2.6/fs/namespace.c~mnt_want_write-wrong-assume 2009-08-03 04:33:35.000000000 +0900
+++ linux-2.6-hirofumi/fs/namespace.c 2009-08-03 04:31:34.000000000 +0900
@@ -316,7 +316,8 @@ EXPORT_SYMBOL_GPL(mnt_clone_write);
*/
int mnt_want_write_file(struct file *file)
{
- if (!(file->f_mode & FMODE_WRITE))
+ struct inode *inode = file->f_dentry->d_inode;
+ if (!(file->f_mode & FMODE_WRITE) || special_file(inode->i_mode))
return mnt_want_write(file->f_path.mnt);
else
return mnt_clone_write(file->f_path.mnt);
_
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mnt_want_write_file() has problem?
2009-08-02 21:36 mnt_want_write_file() has problem? OGAWA Hirofumi
@ 2009-08-03 18:31 ` Dave Hansen
2009-08-03 18:48 ` OGAWA Hirofumi
2009-08-04 19:15 ` Dave Hansen
1 sibling, 1 reply; 7+ messages in thread
From: Dave Hansen @ 2009-08-03 18:31 UTC (permalink / raw)
To: OGAWA Hirofumi; +Cc: Al Viro, Nick Piggin, linux-kernel
On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote:
> While I'm reading some code, I suspected that mnt_want_write_file() may
> have wrong assumption. I think mnt_want_write_file() is assuming it
> increments ->mnt_writers if (file->f_mode & FMODE_WRITE). But, if it's
> special_file(), it is false?
>
> Sorry, I'm still not checking all of those though. E.g. I'm thinking the
> below.
>
> static inline int __get_file_write_access(struct inode *inode,
> struct vfsmount *mnt)
> {
> [...]
> if (!special_file(inode->i_mode)) {
> /*
> * Balanced in __fput()
> */
> error = mnt_want_write(mnt);
> if (error)
> put_write_access(inode);
> }
> return error;
> }
In practice I don't think this is an issue. We were never supposed to
do mnt_want_write(mnt) for any 'struct file' that was a special_file(),
specifically because of what you mention.
Nick's use of mnt_want_write_file() was a 1:1 drop-in for
mnt_want_write(). So, if all is well in the world, there should not be
any call sites where mnt_want_write_file() gets called on a
special_file().
Future users of mnt_want_file_write() may not notice this fact, though.
This is probably worth at least a note in the documentation or perhaps a
WARN_ON().
-- Dave
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mnt_want_write_file() has problem?
2009-08-03 18:31 ` Dave Hansen
@ 2009-08-03 18:48 ` OGAWA Hirofumi
2009-08-03 20:37 ` Dave Hansen
0 siblings, 1 reply; 7+ messages in thread
From: OGAWA Hirofumi @ 2009-08-03 18:48 UTC (permalink / raw)
To: Dave Hansen; +Cc: Al Viro, Nick Piggin, linux-kernel
Dave Hansen <dave@linux.vnet.ibm.com> writes:
> On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote:
>> While I'm reading some code, I suspected that mnt_want_write_file() may
>> have wrong assumption. I think mnt_want_write_file() is assuming it
>> increments ->mnt_writers if (file->f_mode & FMODE_WRITE). But, if it's
>> special_file(), it is false?
>>
>> Sorry, I'm still not checking all of those though. E.g. I'm thinking the
>> below.
>>
>> static inline int __get_file_write_access(struct inode *inode,
>> struct vfsmount *mnt)
>> {
>> [...]
>> if (!special_file(inode->i_mode)) {
>> /*
>> * Balanced in __fput()
>> */
>> error = mnt_want_write(mnt);
>> if (error)
>> put_write_access(inode);
>> }
>> return error;
>> }
>
> In practice I don't think this is an issue. We were never supposed to
> do mnt_want_write(mnt) for any 'struct file' that was a special_file(),
> specifically because of what you mention.
>
> Nick's use of mnt_want_write_file() was a 1:1 drop-in for
> mnt_want_write(). So, if all is well in the world, there should not be
> any call sites where mnt_want_write_file() gets called on a
> special_file().
void file_update_time(struct file *file)
sys_fchmod()
sys_fchown()
sys_fsetxattr()
sys_fremovexattr()
Um..., the users of mnt_want_write_file() seems to be those. I think
all of those filp can be special file?
Thanks.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mnt_want_write_file() has problem?
2009-08-03 18:48 ` OGAWA Hirofumi
@ 2009-08-03 20:37 ` Dave Hansen
0 siblings, 0 replies; 7+ messages in thread
From: Dave Hansen @ 2009-08-03 20:37 UTC (permalink / raw)
To: OGAWA Hirofumi; +Cc: Al Viro, Nick Piggin, linux-kernel
On Tue, 2009-08-04 at 03:48 +0900, OGAWA Hirofumi wrote:
> Dave Hansen <dave@linux.vnet.ibm.com> writes:
>
> > On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote:
> >> While I'm reading some code, I suspected that mnt_want_write_file() may
> >> have wrong assumption. I think mnt_want_write_file() is assuming it
> >> increments ->mnt_writers if (file->f_mode & FMODE_WRITE). But, if it's
> >> special_file(), it is false?
> >>
> >> Sorry, I'm still not checking all of those though. E.g. I'm thinking the
> >> below.
> >>
> >> static inline int __get_file_write_access(struct inode *inode,
> >> struct vfsmount *mnt)
> >> {
> >> [...]
> >> if (!special_file(inode->i_mode)) {
> >> /*
> >> * Balanced in __fput()
> >> */
> >> error = mnt_want_write(mnt);
> >> if (error)
> >> put_write_access(inode);
> >> }
> >> return error;
> >> }
> >
> > In practice I don't think this is an issue. We were never supposed to
> > do mnt_want_write(mnt) for any 'struct file' that was a special_file(),
> > specifically because of what you mention.
> >
> > Nick's use of mnt_want_write_file() was a 1:1 drop-in for
> > mnt_want_write(). So, if all is well in the world, there should not be
> > any call sites where mnt_want_write_file() gets called on a
> > special_file().
>
> void file_update_time(struct file *file)
> sys_fchmod()
> sys_fchown()
> sys_fsetxattr()
> sys_fremovexattr()
>
> Um..., the users of mnt_want_write_file() seems to be those. I think
> all of those filp can be special file?
OK, I see where you're going now. I think the race goes like this:
Let's say we have a process with /dev/null opened with FMODE_WRITE. It
is the only file open on the filesystem and so the /dev mount has a 0
mnt_writers count. That process goes to f_chmod() its fd to /dev/null.
The code checks and notices that (file->f_mode & FMODE_WRITE), and goes
to mnt_clone_write().
At the same time, another process tries to 'mount -o remount,ro /dev'.
That process never sees mnt_clone_write()'s mnt_writers bump and allows
the remount,ro, even though there's an elevated mnt_writers count.
Here's a completely untested/uncompiled patch. I'll see if I can find a
test case that triggers this bug with the BUG_ON() in this patch.
diff --git a/fs/namespace.c b/fs/namespace.c
index 277c28a..a4714c4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -294,9 +294,17 @@ EXPORT_SYMBOL_GPL(mnt_want_write);
*
* After finished, mnt_drop_write must be called as usual to
* drop the reference.
+ *
+ * Be very careful using this. You must *guarantee* that
+ * this vfsmount has at least one existing, persistent writer
+ * that can not possibly go away, before calling this.
*/
int mnt_clone_write(struct vfsmount *mnt)
{
+ /* This would kill the performance
+ * optimization in this function
+ BUG_ON(count_mnt_writers(mnt) > 0);
+ */
/* superblock may be r/o */
if (__mnt_is_readonly(mnt))
return -EROFS;
@@ -312,14 +320,20 @@ EXPORT_SYMBOL_GPL(mnt_clone_write);
* @file: the file who's mount on which to take a write
*
* This is like mnt_want_write, but it takes a file and can
- * do some optimisations if the file is open for write already
+ * do some optimisations if the file is open for write already.
+ * We do not do mnt_want_write() on read-only or special files,
+ * so we can not use mnt_clone_write() for them.
*/
int mnt_want_write_file(struct file *file)
{
- if (!(file->f_mode & FMODE_WRITE))
- return mnt_want_write(file->f_path.mnt);
- else
- return mnt_clone_write(file->f_path.mnt);
+ struct path *path = &file->f_path;
+ struct inode *inode = path->dentry->d_inode;
+
+ if ((file->f_mode & FMODE_WRITE) &&
+ !special_file(inode))
+ return mnt_clone_write(path->mnt);
+
+ return mnt_want_write(path->mnt);
}
EXPORT_SYMBOL_GPL(mnt_want_write_file);
-- Dave
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: mnt_want_write_file() has problem?
2009-08-02 21:36 mnt_want_write_file() has problem? OGAWA Hirofumi
2009-08-03 18:31 ` Dave Hansen
@ 2009-08-04 19:15 ` Dave Hansen
2009-08-05 5:37 ` Nick Piggin
2009-09-12 13:39 ` Al Viro
1 sibling, 2 replies; 7+ messages in thread
From: Dave Hansen @ 2009-08-04 19:15 UTC (permalink / raw)
To: OGAWA Hirofumi; +Cc: Al Viro, Nick Piggin, linux-kernel, akpm
On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote:
> diff -puN fs/namespace.c~mnt_want_write-wrong-assume fs/namespace.c
> ---
> linux-2.6/fs/namespace.c~mnt_want_write-wrong-assume 2009-08-03
> 04:33:35.000000000 +0900
> +++ linux-2.6-hirofumi/fs/namespace.c 2009-08-03 04:31:34.000000000
> +0900
> @@ -316,7 +316,8 @@ EXPORT_SYMBOL_GPL(mnt_clone_write);
> */
> int mnt_want_write_file(struct file *file)
> {
> - if (!(file->f_mode & FMODE_WRITE))
> + struct inode *inode = file->f_dentry->d_inode;
> + if (!(file->f_mode & FMODE_WRITE) || special_file(inode->i_mode))
> return mnt_want_write(file->f_path.mnt);
> else
> return mnt_clone_write(file->f_path.mnt);
I'm fine with this. I'd like a debugging check in mnt_clone_write()
since this bug is easy to detect, but such a check will also cost all of
the performance gains that Nick added. So, we can't have it
unconditionally.
--
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
-- Dave
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mnt_want_write_file() has problem?
2009-08-04 19:15 ` Dave Hansen
@ 2009-08-05 5:37 ` Nick Piggin
2009-09-12 13:39 ` Al Viro
1 sibling, 0 replies; 7+ messages in thread
From: Nick Piggin @ 2009-08-05 5:37 UTC (permalink / raw)
To: Dave Hansen; +Cc: OGAWA Hirofumi, Al Viro, linux-kernel, akpm
On Tue, Aug 04, 2009 at 12:15:19PM -0700, Dave Hansen wrote:
> On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote:
> > diff -puN fs/namespace.c~mnt_want_write-wrong-assume fs/namespace.c
> > ---
> > linux-2.6/fs/namespace.c~mnt_want_write-wrong-assume 2009-08-03
> > 04:33:35.000000000 +0900
> > +++ linux-2.6-hirofumi/fs/namespace.c 2009-08-03 04:31:34.000000000
> > +0900
> > @@ -316,7 +316,8 @@ EXPORT_SYMBOL_GPL(mnt_clone_write);
> > */
> > int mnt_want_write_file(struct file *file)
> > {
> > - if (!(file->f_mode & FMODE_WRITE))
> > + struct inode *inode = file->f_dentry->d_inode;
> > + if (!(file->f_mode & FMODE_WRITE) || special_file(inode->i_mode))
> > return mnt_want_write(file->f_path.mnt);
> > else
> > return mnt_clone_write(file->f_path.mnt);
>
> I'm fine with this. I'd like a debugging check in mnt_clone_write()
> since this bug is easy to detect, but such a check will also cost all of
> the performance gains that Nick added. So, we can't have it
> unconditionally.
Yeah, good catch, thanks for this.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mnt_want_write_file() has problem?
2009-08-04 19:15 ` Dave Hansen
2009-08-05 5:37 ` Nick Piggin
@ 2009-09-12 13:39 ` Al Viro
1 sibling, 0 replies; 7+ messages in thread
From: Al Viro @ 2009-09-12 13:39 UTC (permalink / raw)
To: Dave Hansen; +Cc: OGAWA Hirofumi, Nick Piggin, linux-kernel, akpm
On Tue, Aug 04, 2009 at 12:15:19PM -0700, Dave Hansen wrote:
> On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote:
> > diff -puN fs/namespace.c~mnt_want_write-wrong-assume fs/namespace.c
> > ---
> > linux-2.6/fs/namespace.c~mnt_want_write-wrong-assume 2009-08-03
> > 04:33:35.000000000 +0900
> > +++ linux-2.6-hirofumi/fs/namespace.c 2009-08-03 04:31:34.000000000
> > +0900
> > @@ -316,7 +316,8 @@ EXPORT_SYMBOL_GPL(mnt_clone_write);
> > */
> > int mnt_want_write_file(struct file *file)
> > {
> > - if (!(file->f_mode & FMODE_WRITE))
> > + struct inode *inode = file->f_dentry->d_inode;
> > + if (!(file->f_mode & FMODE_WRITE) || special_file(inode->i_mode))
> > return mnt_want_write(file->f_path.mnt);
> > else
> > return mnt_clone_write(file->f_path.mnt);
>
> I'm fine with this. I'd like a debugging check in mnt_clone_write()
> since this bug is easy to detect, but such a check will also cost all of
> the performance gains that Nick added. So, we can't have it
> unconditionally.
[Very belated] ACK.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-09-12 13:39 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-02 21:36 mnt_want_write_file() has problem? OGAWA Hirofumi
2009-08-03 18:31 ` Dave Hansen
2009-08-03 18:48 ` OGAWA Hirofumi
2009-08-03 20:37 ` Dave Hansen
2009-08-04 19:15 ` Dave Hansen
2009-08-05 5:37 ` Nick Piggin
2009-09-12 13:39 ` Al Viro
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.