linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Q] [VFS] NULL f_dentry for opened files ; possible race condition
@ 2001-08-31 17:18 Jean-Marc Saffroy
  2001-08-31 20:56 ` [RFD] readonly/read-write semantics Alexander Viro
  0 siblings, 1 reply; 5+ messages in thread
From: Jean-Marc Saffroy @ 2001-08-31 17:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jean-Marc Saffroy

Hello,


In 2.4.9, I have encountered a strange condition while playing with file
structs chained on a superblock list (sb->s_files) : some of them can have
a NULL f_dentry pointer. The only case I found which can cause this is
when fput is called and f_count drops to zero. Is that the only case ?

While exploring the corresponding code for an explanation, I found what
looks like a possible race condition : do_remount_sb calls
fs_may_remount_ro, and only then uses lock_super to do the actual remount.

Isn't it possible for a program to open a file for writing just after
fs_may_remount_ro ? The whole thing seems to be protected by the BKL and
mount_sem, but I guess it won't stop an open.


Regards,

-- 
Jean-Marc Saffroy - Research Engineer - Silicomp Research Institute
mailto:saffroy@ri.silicomp.fr


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFD] readonly/read-write semantics
  2001-08-31 17:18 [Q] [VFS] NULL f_dentry for opened files ; possible race condition Jean-Marc Saffroy
@ 2001-08-31 20:56 ` Alexander Viro
  2001-09-01 13:08   ` Juan Quintela
  2001-09-04  1:16   ` Jean-Marc Saffroy
  0 siblings, 2 replies; 5+ messages in thread
From: Alexander Viro @ 2001-08-31 20:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jean-Marc Saffroy, linux-fsdevel, Linus Torvalds


Folks, stuff below is, IMNSHO, worth a discussion.  Please, give it
a thought.  Short summary: we need well-defined semantics for
read-only/read-write. Currently we don't have any.

On Fri, 31 Aug 2001, Jean-Marc Saffroy wrote:

(first the trivial part)

> Hello,
> 
> 
> In 2.4.9, I have encountered a strange condition while playing with file
> structs chained on a superblock list (sb->s_files) : some of them can have
> a NULL f_dentry pointer. The only case I found which can cause this is
> when fput is called and f_count drops to zero. Is that the only case ?

Yes, it is, and yes, it's legitimate - code that scans that list should
(and in-tree one does) deal with such case.
 
> While exploring the corresponding code for an explanation, I found what
> looks like a possible race condition : do_remount_sb calls
> fs_may_remount_ro, and only then uses lock_super to do the actual remount.
> 
> Isn't it possible for a program to open a file for writing just after
> fs_may_remount_ro ? The whole thing seems to be protected by the BKL and
> mount_sem, but I guess it won't stop an open.

... and here comes the serious stuff.

mount_sem, BKL and lock_super() have nothing to checks done in the open().

fs_may_remount_ro() is, indeed, racy and had been since very long.
However, the main problem is not in opening something after the
check - the check itself is not exact enough.

Think what happens if the object we hold for writing doesn't currently
have struct file.  At all.  E.g. it is a directory in the middle of
subdirectory creation.

Or, for that matter, combine that with "what happens if we do ...
after we've done the checks" - e.g. consider mkdir() called after
the check.  Or unlink() on opened file driving ->i_nlink to 0.

What we need is a "I want rw access to fs"/"I give up rw access"/"make
it ro" set of primitives.  Unfortunately, it's even more compilcated -
e.g. fs may stomp its foot and set MS_RDONLY in ->s_flags (e.g. upon
finding an error if it has such policy).  That DOESN'T look for files
opened for write (reasonable) and DOESN'T revoke write access to them.

So you end up with fs that is claimed to be r/o, but people still have
files opened for write.

We need clear semantics for readonly/read-write state of filesystems.
Until then all we have is "well, if you go single-user before remount
or otherwise prevent users from access to mountpoit - you should be OK".

Which, BTW, is not _too_ unreasonable, since otherwise you are gambling
on the fact that users won't make the sucker busy in the wrong moment.
More clean solution would be "revoke everyone's write access and remount
r/o", but that will take quite an effort.  Which might be worth doing.

Again, the main issue here is what do we want, not how to implement it.
Flame away.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-08-31 20:56 ` [RFD] readonly/read-write semantics Alexander Viro
@ 2001-09-01 13:08   ` Juan Quintela
  2001-09-04  1:16   ` Jean-Marc Saffroy
  1 sibling, 0 replies; 5+ messages in thread
From: Juan Quintela @ 2001-09-01 13:08 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, Jean-Marc Saffroy, linux-fsdevel, Linus Torvalds

>>>>> "alexander" == Alexander Viro <viro@math.psu.edu> writes:


Hi

viro> What we need is a "I want rw access to fs"/"I give up rw access"/"make
viro> it ro" set of primitives.  Unfortunately, it's even more compilcated -
viro> e.g. fs may stomp its foot and set MS_RDONLY in ->s_flags (e.g. upon
viro> finding an error if it has such policy).  That DOESN'T look for files
viro> opened for write (reasonable) and DOESN'T revoke write access to them.

I really will like that thing for supermount, supermount tries to do
that thing by hand, and it really fails because it is difficult,
supermount tries to have the underlying fs unmounted if nobody has
open files on it, and mounted rw only when somebody has a file opened
on it and if someone has a file opened for write of there is happening
any operation that needs write access.  As we don't have an easy way
to check if we are able to write in one filesystem (we can only use
the IS_RDONLY() macro), it happens that I have to mount the filesystem
rw for being able to call permission in that filesystem.  Notice that
permission don't need write access per se, but the IS_RDONLY() macro
needs to have the filesystem mounted rw to fail.  Yes, I can hack the
macro to do the things that I need, but that means that everybody that
needs that functionality will have to also hack it :(

viro> Again, the main issue here is what do we want, not how to implement it.
viro> Flame away.

I will want a method is the inode/super_block (don't care which of
them) for:
      - is_read_only_fs()?
          Notice that this method told as if we are able to have the
          fs rw, not necessarily that the fs is rw at the moment.
      - get_write_access()
      - put_write_access()

Notice that there exist the functions get_write_access() and
put_write_access() functions in the tree, and I will be really happy if
there where a way to hook fs specific information there, as it will
make a lot of the code in supermount really easy, and the same for
other fs that need similar semantics.

Later, Juan.

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-08-31 20:56 ` [RFD] readonly/read-write semantics Alexander Viro
  2001-09-01 13:08   ` Juan Quintela
@ 2001-09-04  1:16   ` Jean-Marc Saffroy
  2001-09-04  4:00     ` [PATCH] " Alexander Viro
  1 sibling, 1 reply; 5+ messages in thread
From: Jean-Marc Saffroy @ 2001-09-04  1:16 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel, linux-fsdevel, Linus Torvalds

On Fri, 31 Aug 2001, Alexander Viro wrote:

> > In 2.4.9, I have encountered a strange condition while playing with file
> > structs chained on a superblock list (sb->s_files) : some of them can have
> > a NULL f_dentry pointer. The only case I found which can cause this is
> > when fput is called and f_count drops to zero. Is that the only case ?
>
> Yes, it is, and yes, it's legitimate - code that scans that list should
> (and in-tree one does) deal with such case.

AFAICT fput (and also dentry_open, BTW) nullifies f_dentry without any
lock held, so code that scans the list (such as fs_may_remount_ro, I
haven't looked for other instances) can never assume that a file struct
found in the list has or even will keep what looks like a valid f_dentry.

> fs_may_remount_ro() is, indeed, racy and had been since very long.

Sure, let's consider code in fs_may_remount_ro :

	file_list_lock();
	/* loop over files in sb->s_files */
		if (!file->f_dentry)
			continue;
		/* now a concurrent fput may set f_dentry to NULL */
		inode = file->f_dentry->d_inode; /* oops */

Maybe the file struct should be removed from the list /before/ f_dentry is
assigned NULL ?

> However, the main problem is not in opening something after the
> check - the check itself is not exact enough.

I agree fs_may_remount_ro can report wrong results (ie. "you may remount
ro" while you really can't) because of how it is used, but as stated
above, I think it also has a small but real potential for directly
crashing the system, and should be fixed.


-- 
Jean-Marc Saffroy - Research Engineer - Silicomp Research Institute
mailto:saffroy@ri.silicomp.fr


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] Re: [RFD] readonly/read-write semantics
  2001-09-04  1:16   ` Jean-Marc Saffroy
@ 2001-09-04  4:00     ` Alexander Viro
  0 siblings, 0 replies; 5+ messages in thread
From: Alexander Viro @ 2001-09-04  4:00 UTC (permalink / raw)
  To: Jean-Marc Saffroy; +Cc: linux-kernel, linux-fsdevel, Linus Torvalds



On Tue, 4 Sep 2001, Jean-Marc Saffroy wrote:

> On Fri, 31 Aug 2001, Alexander Viro wrote:
> 
> > > In 2.4.9, I have encountered a strange condition while playing with file
> > > structs chained on a superblock list (sb->s_files) : some of them can have
> > > a NULL f_dentry pointer. The only case I found which can cause this is
> > > when fput is called and f_count drops to zero. Is that the only case ?
> >
> > Yes, it is, and yes, it's legitimate - code that scans that list should
> > (and in-tree one does) deal with such case.
> 
> AFAICT fput (and also dentry_open, BTW) nullifies f_dentry without any
> lock held, so code that scans the list (such as fs_may_remount_ro, I

/me looks at the code in question

Oh, shit.  Thanks for spotting.  Patch below should fix it.  BTW, there
was a similar problem in dquot.c.

Fix being: take the file off-list before we touch dentry/vfsmount in
any way.  BTW, file->f_vfsmnt is always non-NULL, ditto for inode->i_sb,
so I've removed bogus if from fput() and dentry_open().  Another thing
that was way overdue is removal of file_moveto() - I thought that we
had removed it back when loop.c fixes went into the tree (old loop.c
was the only user of that function).

See if the patch below looks sane, but keep in mind that it's completely
untested.

diff -urN S10-pre4/drivers/char/tty_io.c S10-pre4-files/drivers/char/tty_io.c
--- S10-pre4/drivers/char/tty_io.c	Thu Aug 16 20:05:46 2001
+++ S10-pre4-files/drivers/char/tty_io.c	Mon Sep  3 23:45:35 2001
@@ -442,8 +442,6 @@
 	file_list_lock();
 	for (l = tty->tty_files.next; l != &tty->tty_files; l = l->next) {
 		struct file * filp = list_entry(l, struct file, f_list);
-		if (!filp->f_dentry)
-			continue;
 		if (filp->f_dentry->d_inode->i_rdev == CONSOLE_DEV ||
 		    filp->f_dentry->d_inode->i_rdev == SYSCONS_DEV) {
 			cons_filp = filp;
diff -urN S10-pre4/fs/dquot.c S10-pre4-files/fs/dquot.c
--- S10-pre4/fs/dquot.c	Sun Sep  2 23:22:19 2001
+++ S10-pre4-files/fs/dquot.c	Mon Sep  3 23:48:10 2001
@@ -660,7 +660,6 @@
 static void add_dquot_ref(struct super_block *sb, short type)
 {
 	struct list_head *p;
-	struct inode *inode;
 
 	if (!sb->dq_op)
 		return;	/* nothing to do */
@@ -669,13 +668,15 @@
 	file_list_lock();
 	for (p = sb->s_files.next; p != &sb->s_files; p = p->next) {
 		struct file *filp = list_entry(p, struct file, f_list);
-		if (!filp->f_dentry)
-			continue;
-		inode = filp->f_dentry->d_inode;
+		struct inode *inode = filp->f_dentry->d_inode;
 		if (filp->f_mode & FMODE_WRITE && dqinit_needed(inode, type)) {
+			struct vfsmount *mnt = mntget(file->f_vfsmnt);
+			struct dentry *dentry = dget(file->f_dentry);
 			file_list_unlock();
 			sb->dq_op->initialize(inode, type);
 			inode->i_flags |= S_QUOTA;
+			dput(dentry);
+			mntput(mnt);
 			/* As we may have blocked we had better restart... */
 			goto restart;
 		}
diff -urN S10-pre4/fs/file_table.c S10-pre4-files/fs/file_table.c
--- S10-pre4/fs/file_table.c	Thu Apr 19 23:46:42 2001
+++ S10-pre4-files/fs/file_table.c	Mon Sep  3 23:50:59 2001
@@ -107,18 +107,17 @@
 		if (file->f_op && file->f_op->release)
 			file->f_op->release(inode, file);
 		fops_put(file->f_op);
-		file->f_dentry = NULL;
-		file->f_vfsmnt = NULL;
 		if (file->f_mode & FMODE_WRITE)
 			put_write_access(inode);
-		dput(dentry);
-		if (mnt)
-			mntput(mnt);
 		file_list_lock();
+		file->f_dentry = NULL;
+		file->f_vfsmnt = NULL;
 		list_del(&file->f_list);
 		list_add(&file->f_list, &free_list);
 		files_stat.nr_free_files++;
 		file_list_unlock();
+		dput(dentry);
+		mntput(mnt);
 	}
 }
 
@@ -158,14 +157,6 @@
 	file_list_unlock();
 }
 
-void file_moveto(struct file *new, struct file *old)
-{
-	file_list_lock();
-	list_del(&new->f_list);
-	list_add(&new->f_list, &old->f_list);
-	file_list_unlock();
-}
-
 int fs_may_remount_ro(struct super_block *sb)
 {
 	struct list_head *p;
@@ -174,12 +165,7 @@
 	file_list_lock();
 	for (p = sb->s_files.next; p != &sb->s_files; p = p->next) {
 		struct file *file = list_entry(p, struct file, f_list);
-		struct inode *inode;
-
-		if (!file->f_dentry)
-			continue;
-
-		inode = file->f_dentry->d_inode;
+		struct inode *inode = file->f_dentry->d_inode;
 
 		/* File with pending delete? */
 		if (inode->i_nlink == 0)
diff -urN S10-pre4/fs/open.c S10-pre4-files/fs/open.c
--- S10-pre4/fs/open.c	Thu Aug 16 20:05:50 2001
+++ S10-pre4-files/fs/open.c	Mon Sep  3 23:44:49 2001
@@ -634,6 +634,7 @@
 {
 	struct file * f;
 	struct inode *inode;
+	static LIST_HEAD(kill_list);
 	int error;
 
 	error = -ENFILE;
@@ -654,8 +655,7 @@
 	f->f_pos = 0;
 	f->f_reada = 0;
 	f->f_op = fops_get(inode->i_fop);
-	if (inode->i_sb)
-		file_move(f, &inode->i_sb->s_files);
+	file_move(f, &inode->i_sb->s_files);
 	if (f->f_op && f->f_op->open) {
 		error = f->f_op->open(inode,f);
 		if (error)
@@ -669,6 +669,7 @@
 	fops_put(f->f_op);
 	if (f->f_mode & FMODE_WRITE)
 		put_write_access(inode);
+	file_move(f, &kill_list); /* out of the way.. */
 	f->f_dentry = NULL;
 	f->f_vfsmnt = NULL;
 cleanup_file:
diff -urN S10-pre4/fs/proc/generic.c S10-pre4-files/fs/proc/generic.c
--- S10-pre4/fs/proc/generic.c	Tue Jul  3 21:09:13 2001
+++ S10-pre4-files/fs/proc/generic.c	Mon Sep  3 23:50:23 2001
@@ -397,19 +397,18 @@
 	file_list_lock();
 	for (p = sb->s_files.next; p != &sb->s_files; p = p->next) {
 		struct file * filp = list_entry(p, struct file, f_list);
-		struct dentry * dentry;
+		struct dentry * dentry = filp->f_dentry;
 		struct inode * inode;
+		struct file_operations *fops;
 
-		dentry = filp->f_dentry;
-		if (!dentry)
-			continue;
 		if (dentry->d_op != &proc_dentry_operations)
 			continue;
 		inode = dentry->d_inode;
 		if (inode->u.generic_ip != de)
 			continue;
-		fops_put(filp->f_op);
+		fops = filp->f_op;
 		filp->f_op = NULL;
+		fops_put(fops);
 	}
 	file_list_unlock();
 }
diff -urN S10-pre4/include/linux/fs.h S10-pre4-files/include/linux/fs.h
--- S10-pre4/include/linux/fs.h	Mon Sep  3 17:38:27 2001
+++ S10-pre4-files/include/linux/fs.h	Mon Sep  3 23:50:55 2001
@@ -1297,7 +1297,6 @@
 extern void remove_inode_hash(struct inode *);
 extern struct file * get_empty_filp(void);
 extern void file_move(struct file *f, struct list_head *list);
-extern void file_moveto(struct file *new, struct file *old);
 extern struct buffer_head * get_hash_table(kdev_t, int, int);
 extern struct buffer_head * getblk(kdev_t, int, int);
 extern void ll_rw_block(int, int, struct buffer_head * bh[]);
diff -urN S10-pre4/kernel/ksyms.c S10-pre4-files/kernel/ksyms.c
--- S10-pre4/kernel/ksyms.c	Sun Sep  2 23:22:20 2001
+++ S10-pre4-files/kernel/ksyms.c	Mon Sep  3 23:50:50 2001
@@ -307,7 +307,6 @@
 EXPORT_SYMBOL(refile_buffer);
 EXPORT_SYMBOL(max_sectors);
 EXPORT_SYMBOL(max_readahead);
-EXPORT_SYMBOL(file_moveto);
 
 /* tty routines */
 EXPORT_SYMBOL(tty_hangup);


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2001-09-04  4:00 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-08-31 17:18 [Q] [VFS] NULL f_dentry for opened files ; possible race condition Jean-Marc Saffroy
2001-08-31 20:56 ` [RFD] readonly/read-write semantics Alexander Viro
2001-09-01 13:08   ` Juan Quintela
2001-09-04  1:16   ` Jean-Marc Saffroy
2001-09-04  4:00     ` [PATCH] " Alexander Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).