* mnt_want_write_file() has problem? @ 2009-08-02 21:36 OGAWA Hirofumi 2009-08-03 18:31 ` Dave Hansen 2009-08-04 19:15 ` Dave Hansen 0 siblings, 2 replies; 7+ messages in thread From: OGAWA Hirofumi @ 2009-08-02 21:36 UTC (permalink / raw) To: Al Viro, Nick Piggin; +Cc: linux-kernel Hi, While I'm reading some code, I suspected that mnt_want_write_file() may have wrong assumption. I think mnt_want_write_file() is assuming it increments ->mnt_writers if (file->f_mode & FMODE_WRITE). But, if it's special_file(), it is false? Sorry, I'm still not checking all of those though. E.g. I'm thinking the below. static inline int __get_file_write_access(struct inode *inode, struct vfsmount *mnt) { [...] if (!special_file(inode->i_mode)) { /* * Balanced in __fput() */ error = mnt_want_write(mnt); if (error) put_write_access(inode); } return error; } Thanks. -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> --- fs/namespace.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff -puN fs/namespace.c~mnt_want_write-wrong-assume fs/namespace.c --- linux-2.6/fs/namespace.c~mnt_want_write-wrong-assume 2009-08-03 04:33:35.000000000 +0900 +++ linux-2.6-hirofumi/fs/namespace.c 2009-08-03 04:31:34.000000000 +0900 @@ -316,7 +316,8 @@ EXPORT_SYMBOL_GPL(mnt_clone_write); */ int mnt_want_write_file(struct file *file) { - if (!(file->f_mode & FMODE_WRITE)) + struct inode *inode = file->f_dentry->d_inode; + if (!(file->f_mode & FMODE_WRITE) || special_file(inode->i_mode)) return mnt_want_write(file->f_path.mnt); else return mnt_clone_write(file->f_path.mnt); _ ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mnt_want_write_file() has problem? 2009-08-02 21:36 mnt_want_write_file() has problem? OGAWA Hirofumi @ 2009-08-03 18:31 ` Dave Hansen 2009-08-03 18:48 ` OGAWA Hirofumi 2009-08-04 19:15 ` Dave Hansen 1 sibling, 1 reply; 7+ messages in thread From: Dave Hansen @ 2009-08-03 18:31 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Al Viro, Nick Piggin, linux-kernel On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote: > While I'm reading some code, I suspected that mnt_want_write_file() may > have wrong assumption. I think mnt_want_write_file() is assuming it > increments ->mnt_writers if (file->f_mode & FMODE_WRITE). But, if it's > special_file(), it is false? > > Sorry, I'm still not checking all of those though. E.g. I'm thinking the > below. > > static inline int __get_file_write_access(struct inode *inode, > struct vfsmount *mnt) > { > [...] > if (!special_file(inode->i_mode)) { > /* > * Balanced in __fput() > */ > error = mnt_want_write(mnt); > if (error) > put_write_access(inode); > } > return error; > } In practice I don't think this is an issue. We were never supposed to do mnt_want_write(mnt) for any 'struct file' that was a special_file(), specifically because of what you mention. Nick's use of mnt_want_write_file() was a 1:1 drop-in for mnt_want_write(). So, if all is well in the world, there should not be any call sites where mnt_want_write_file() gets called on a special_file(). Future users of mnt_want_file_write() may not notice this fact, though. This is probably worth at least a note in the documentation or perhaps a WARN_ON(). -- Dave ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mnt_want_write_file() has problem? 2009-08-03 18:31 ` Dave Hansen @ 2009-08-03 18:48 ` OGAWA Hirofumi 2009-08-03 20:37 ` Dave Hansen 0 siblings, 1 reply; 7+ messages in thread From: OGAWA Hirofumi @ 2009-08-03 18:48 UTC (permalink / raw) To: Dave Hansen; +Cc: Al Viro, Nick Piggin, linux-kernel Dave Hansen <dave@linux.vnet.ibm.com> writes: > On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote: >> While I'm reading some code, I suspected that mnt_want_write_file() may >> have wrong assumption. I think mnt_want_write_file() is assuming it >> increments ->mnt_writers if (file->f_mode & FMODE_WRITE). But, if it's >> special_file(), it is false? >> >> Sorry, I'm still not checking all of those though. E.g. I'm thinking the >> below. >> >> static inline int __get_file_write_access(struct inode *inode, >> struct vfsmount *mnt) >> { >> [...] >> if (!special_file(inode->i_mode)) { >> /* >> * Balanced in __fput() >> */ >> error = mnt_want_write(mnt); >> if (error) >> put_write_access(inode); >> } >> return error; >> } > > In practice I don't think this is an issue. We were never supposed to > do mnt_want_write(mnt) for any 'struct file' that was a special_file(), > specifically because of what you mention. > > Nick's use of mnt_want_write_file() was a 1:1 drop-in for > mnt_want_write(). So, if all is well in the world, there should not be > any call sites where mnt_want_write_file() gets called on a > special_file(). void file_update_time(struct file *file) sys_fchmod() sys_fchown() sys_fsetxattr() sys_fremovexattr() Um..., the users of mnt_want_write_file() seems to be those. I think all of those filp can be special file? Thanks. -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mnt_want_write_file() has problem? 2009-08-03 18:48 ` OGAWA Hirofumi @ 2009-08-03 20:37 ` Dave Hansen 0 siblings, 0 replies; 7+ messages in thread From: Dave Hansen @ 2009-08-03 20:37 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Al Viro, Nick Piggin, linux-kernel On Tue, 2009-08-04 at 03:48 +0900, OGAWA Hirofumi wrote: > Dave Hansen <dave@linux.vnet.ibm.com> writes: > > > On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote: > >> While I'm reading some code, I suspected that mnt_want_write_file() may > >> have wrong assumption. I think mnt_want_write_file() is assuming it > >> increments ->mnt_writers if (file->f_mode & FMODE_WRITE). But, if it's > >> special_file(), it is false? > >> > >> Sorry, I'm still not checking all of those though. E.g. I'm thinking the > >> below. > >> > >> static inline int __get_file_write_access(struct inode *inode, > >> struct vfsmount *mnt) > >> { > >> [...] > >> if (!special_file(inode->i_mode)) { > >> /* > >> * Balanced in __fput() > >> */ > >> error = mnt_want_write(mnt); > >> if (error) > >> put_write_access(inode); > >> } > >> return error; > >> } > > > > In practice I don't think this is an issue. We were never supposed to > > do mnt_want_write(mnt) for any 'struct file' that was a special_file(), > > specifically because of what you mention. > > > > Nick's use of mnt_want_write_file() was a 1:1 drop-in for > > mnt_want_write(). So, if all is well in the world, there should not be > > any call sites where mnt_want_write_file() gets called on a > > special_file(). > > void file_update_time(struct file *file) > sys_fchmod() > sys_fchown() > sys_fsetxattr() > sys_fremovexattr() > > Um..., the users of mnt_want_write_file() seems to be those. I think > all of those filp can be special file? OK, I see where you're going now. I think the race goes like this: Let's say we have a process with /dev/null opened with FMODE_WRITE. It is the only file open on the filesystem and so the /dev mount has a 0 mnt_writers count. That process goes to f_chmod() its fd to /dev/null. The code checks and notices that (file->f_mode & FMODE_WRITE), and goes to mnt_clone_write(). At the same time, another process tries to 'mount -o remount,ro /dev'. That process never sees mnt_clone_write()'s mnt_writers bump and allows the remount,ro, even though there's an elevated mnt_writers count. Here's a completely untested/uncompiled patch. I'll see if I can find a test case that triggers this bug with the BUG_ON() in this patch. diff --git a/fs/namespace.c b/fs/namespace.c index 277c28a..a4714c4 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -294,9 +294,17 @@ EXPORT_SYMBOL_GPL(mnt_want_write); * * After finished, mnt_drop_write must be called as usual to * drop the reference. + * + * Be very careful using this. You must *guarantee* that + * this vfsmount has at least one existing, persistent writer + * that can not possibly go away, before calling this. */ int mnt_clone_write(struct vfsmount *mnt) { + /* This would kill the performance + * optimization in this function + BUG_ON(count_mnt_writers(mnt) > 0); + */ /* superblock may be r/o */ if (__mnt_is_readonly(mnt)) return -EROFS; @@ -312,14 +320,20 @@ EXPORT_SYMBOL_GPL(mnt_clone_write); * @file: the file who's mount on which to take a write * * This is like mnt_want_write, but it takes a file and can - * do some optimisations if the file is open for write already + * do some optimisations if the file is open for write already. + * We do not do mnt_want_write() on read-only or special files, + * so we can not use mnt_clone_write() for them. */ int mnt_want_write_file(struct file *file) { - if (!(file->f_mode & FMODE_WRITE)) - return mnt_want_write(file->f_path.mnt); - else - return mnt_clone_write(file->f_path.mnt); + struct path *path = &file->f_path; + struct inode *inode = path->dentry->d_inode; + + if ((file->f_mode & FMODE_WRITE) && + !special_file(inode)) + return mnt_clone_write(path->mnt); + + return mnt_want_write(path->mnt); } EXPORT_SYMBOL_GPL(mnt_want_write_file); -- Dave ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: mnt_want_write_file() has problem? 2009-08-02 21:36 mnt_want_write_file() has problem? OGAWA Hirofumi 2009-08-03 18:31 ` Dave Hansen @ 2009-08-04 19:15 ` Dave Hansen 2009-08-05 5:37 ` Nick Piggin 2009-09-12 13:39 ` Al Viro 1 sibling, 2 replies; 7+ messages in thread From: Dave Hansen @ 2009-08-04 19:15 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Al Viro, Nick Piggin, linux-kernel, akpm On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote: > diff -puN fs/namespace.c~mnt_want_write-wrong-assume fs/namespace.c > --- > linux-2.6/fs/namespace.c~mnt_want_write-wrong-assume 2009-08-03 > 04:33:35.000000000 +0900 > +++ linux-2.6-hirofumi/fs/namespace.c 2009-08-03 04:31:34.000000000 > +0900 > @@ -316,7 +316,8 @@ EXPORT_SYMBOL_GPL(mnt_clone_write); > */ > int mnt_want_write_file(struct file *file) > { > - if (!(file->f_mode & FMODE_WRITE)) > + struct inode *inode = file->f_dentry->d_inode; > + if (!(file->f_mode & FMODE_WRITE) || special_file(inode->i_mode)) > return mnt_want_write(file->f_path.mnt); > else > return mnt_clone_write(file->f_path.mnt); I'm fine with this. I'd like a debugging check in mnt_clone_write() since this bug is easy to detect, but such a check will also cost all of the performance gains that Nick added. So, we can't have it unconditionally. -- Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> -- Dave ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mnt_want_write_file() has problem? 2009-08-04 19:15 ` Dave Hansen @ 2009-08-05 5:37 ` Nick Piggin 2009-09-12 13:39 ` Al Viro 1 sibling, 0 replies; 7+ messages in thread From: Nick Piggin @ 2009-08-05 5:37 UTC (permalink / raw) To: Dave Hansen; +Cc: OGAWA Hirofumi, Al Viro, linux-kernel, akpm On Tue, Aug 04, 2009 at 12:15:19PM -0700, Dave Hansen wrote: > On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote: > > diff -puN fs/namespace.c~mnt_want_write-wrong-assume fs/namespace.c > > --- > > linux-2.6/fs/namespace.c~mnt_want_write-wrong-assume 2009-08-03 > > 04:33:35.000000000 +0900 > > +++ linux-2.6-hirofumi/fs/namespace.c 2009-08-03 04:31:34.000000000 > > +0900 > > @@ -316,7 +316,8 @@ EXPORT_SYMBOL_GPL(mnt_clone_write); > > */ > > int mnt_want_write_file(struct file *file) > > { > > - if (!(file->f_mode & FMODE_WRITE)) > > + struct inode *inode = file->f_dentry->d_inode; > > + if (!(file->f_mode & FMODE_WRITE) || special_file(inode->i_mode)) > > return mnt_want_write(file->f_path.mnt); > > else > > return mnt_clone_write(file->f_path.mnt); > > I'm fine with this. I'd like a debugging check in mnt_clone_write() > since this bug is easy to detect, but such a check will also cost all of > the performance gains that Nick added. So, we can't have it > unconditionally. Yeah, good catch, thanks for this. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mnt_want_write_file() has problem? 2009-08-04 19:15 ` Dave Hansen 2009-08-05 5:37 ` Nick Piggin @ 2009-09-12 13:39 ` Al Viro 1 sibling, 0 replies; 7+ messages in thread From: Al Viro @ 2009-09-12 13:39 UTC (permalink / raw) To: Dave Hansen; +Cc: OGAWA Hirofumi, Nick Piggin, linux-kernel, akpm On Tue, Aug 04, 2009 at 12:15:19PM -0700, Dave Hansen wrote: > On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote: > > diff -puN fs/namespace.c~mnt_want_write-wrong-assume fs/namespace.c > > --- > > linux-2.6/fs/namespace.c~mnt_want_write-wrong-assume 2009-08-03 > > 04:33:35.000000000 +0900 > > +++ linux-2.6-hirofumi/fs/namespace.c 2009-08-03 04:31:34.000000000 > > +0900 > > @@ -316,7 +316,8 @@ EXPORT_SYMBOL_GPL(mnt_clone_write); > > */ > > int mnt_want_write_file(struct file *file) > > { > > - if (!(file->f_mode & FMODE_WRITE)) > > + struct inode *inode = file->f_dentry->d_inode; > > + if (!(file->f_mode & FMODE_WRITE) || special_file(inode->i_mode)) > > return mnt_want_write(file->f_path.mnt); > > else > > return mnt_clone_write(file->f_path.mnt); > > I'm fine with this. I'd like a debugging check in mnt_clone_write() > since this bug is easy to detect, but such a check will also cost all of > the performance gains that Nick added. So, we can't have it > unconditionally. [Very belated] ACK. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-09-12 13:39 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-08-02 21:36 mnt_want_write_file() has problem? OGAWA Hirofumi 2009-08-03 18:31 ` Dave Hansen 2009-08-03 18:48 ` OGAWA Hirofumi 2009-08-03 20:37 ` Dave Hansen 2009-08-04 19:15 ` Dave Hansen 2009-08-05 5:37 ` Nick Piggin 2009-09-12 13:39 ` Al Viro
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.