All of lore.kernel.org
 help / color / mirror / Atom feed
* Q: NFSD readdir in linux-2.6.28
@ 2009-03-19 14:54 hooanon05
  2009-03-19 15:17 ` David Woodhouse
  0 siblings, 1 reply; 39+ messages in thread
From: hooanon05 @ 2009-03-19 14:54 UTC (permalink / raw)
  To: David Woodhouse, Al Viro; +Cc: linux-kernel, linux-fsdevel


Hello David and Al,
I have a question about NFSD readdir.

By the commit 14f7dd632011bb89c035722edd6ea0d90ca6b078
"[PATCH] Copy XFS readdir hack into nfsd code", nfsd_buffered_filldir()
was introduced and nfs3svc_encode_entry_plus() (the 'func' parameter) is
not called from vfs_readdir().

In 2.6.27, when nfs3svc_encode_entry_plus() calls lookup_one_len(), the
i_mutex lock was acquired by vfs_readdir() and it was not a problem.

After the commit (above), nfsd_readdir/nfsd_buffered_readdir/vfs_readdir
calls nfsd_buffered_filldir(), and nfs3svc_encode_entry_plus() is called
later.
In this sequence, lookup_one_len() is called without i_mutex held.

Isn't it a problem?


J. R. Okajima

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-03-19 14:54 Q: NFSD readdir in linux-2.6.28 hooanon05
@ 2009-03-19 15:17 ` David Woodhouse
  2009-03-19 15:34   ` hooanon05
                     ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: David Woodhouse @ 2009-03-19 15:17 UTC (permalink / raw)
  To: hooanon05; +Cc: Al Viro, linux-kernel, linux-fsdevel

On Thu, 2009-03-19 at 14:54 +0000, hooanon05@yahoo.co.jp wrote:
> 
> Hello David and Al,
> I have a question about NFSD readdir.
> 
> By the commit 14f7dd632011bb89c035722edd6ea0d90ca6b078
> "[PATCH] Copy XFS readdir hack into nfsd code", nfsd_buffered_filldir()
> was introduced and nfs3svc_encode_entry_plus() (the 'func' parameter) is
> not called from vfs_readdir().
> 
> In 2.6.27, when nfs3svc_encode_entry_plus() calls lookup_one_len(), the
> i_mutex lock was acquired by vfs_readdir() and it was not a problem.
> 
> After the commit (above), nfsd_readdir/nfsd_buffered_readdir/vfs_readdir
> calls nfsd_buffered_filldir(), and nfs3svc_encode_entry_plus() is called
> later.
> In this sequence, lookup_one_len() is called without i_mutex held.
> 
> Isn't it a problem?

Yes, well spotted. It didn't matter when the buffered readdir() was
purely internal to XFS, because it didn't matter there that we called
->lookup() without i_mutex set. But now we're exposing arbitrary file
systems to it, we need to make sure we follow the locking rules. 

I _think_ it's sufficient to make the affected callers of
lookup_one_len() lock the parent's i_mutex for themselves before calling
it. I'll take a closer look...

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-03-19 15:17 ` David Woodhouse
@ 2009-03-19 15:34   ` hooanon05
  2009-03-19 15:51     ` David Woodhouse
  2009-04-17  9:32     ` David Woodhouse
  2009-03-19 15:37   ` Q: NFSD readdir in linux-2.6.28 Matthew Wilcox
  2009-04-16 21:45   ` J. Bruce Fields
  2 siblings, 2 replies; 39+ messages in thread
From: hooanon05 @ 2009-03-19 15:34 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Al Viro, linux-kernel, linux-fsdevel


David Woodhouse:
> Yes, well spotted. It didn't matter when the buffered readdir() was
> purely internal to XFS, because it didn't matter there that we called
> ->lookup() without i_mutex set. But now we're exposing arbitrary file
> systems to it, we need to make sure we follow the locking rules. 
> 
> I _think_ it's sufficient to make the affected callers of
> lookup_one_len() lock the parent's i_mutex for themselves before calling
> it. I'll take a closer look...

If you remember why you discarded the FS_NO_LOOKUP_IN_READDIR flag
approach, please let me know. URL or something is enough.

Thanx
J. R. Okajima

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-03-19 15:17 ` David Woodhouse
  2009-03-19 15:34   ` hooanon05
@ 2009-03-19 15:37   ` Matthew Wilcox
  2009-03-19 15:45     ` hooanon05
  2009-03-19 16:01     ` David Woodhouse
  2009-04-16 21:45   ` J. Bruce Fields
  2 siblings, 2 replies; 39+ messages in thread
From: Matthew Wilcox @ 2009-03-19 15:37 UTC (permalink / raw)
  To: David Woodhouse; +Cc: hooanon05, Al Viro, linux-kernel, linux-fsdevel

On Thu, Mar 19, 2009 at 03:17:17PM +0000, David Woodhouse wrote:
> > In 2.6.27, when nfs3svc_encode_entry_plus() calls lookup_one_len(), the
> > i_mutex lock was acquired by vfs_readdir() and it was not a problem.
> > 
> > After the commit (above), nfsd_readdir/nfsd_buffered_readdir/vfs_readdir
> > calls nfsd_buffered_filldir(), and nfs3svc_encode_entry_plus() is called
> > later.
> > In this sequence, lookup_one_len() is called without i_mutex held.
> > 
> > Isn't it a problem?
> 
> Yes, well spotted. It didn't matter when the buffered readdir() was
> purely internal to XFS, because it didn't matter there that we called
> ->lookup() without i_mutex set. But now we're exposing arbitrary file
> systems to it, we need to make sure we follow the locking rules. 
> 
> I _think_ it's sufficient to make the affected callers of
> lookup_one_len() lock the parent's i_mutex for themselves before calling
> it. I'll take a closer look...

Should we also add this?

---

Ensure inode is locked in lookup_one_len()

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

diff --git a/fs/namei.c b/fs/namei.c
index bbc15c2..476b1d0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1244,6 +1244,7 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
 	int err;
 	struct qstr this;
 
+	BUG_ON(!mutex_is_locked(&base->d_inode->i_mutex));
 	err = __lookup_one_len(name, &this, base, len);
 	if (err)
 		return ERR_PTR(err);

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-03-19 15:37   ` Q: NFSD readdir in linux-2.6.28 Matthew Wilcox
@ 2009-03-19 15:45     ` hooanon05
  2009-03-19 16:01     ` David Woodhouse
  1 sibling, 0 replies; 39+ messages in thread
From: hooanon05 @ 2009-03-19 15:45 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: David Woodhouse, Al Viro, linux-kernel, linux-fsdevel


Matthew Wilcox:
> > I _think_ it's sufficient to make the affected callers of
> > lookup_one_len() lock the parent's i_mutex for themselves before calling
> > it. I'll take a closer look...
> 
> Should we also add this?
	:::
> @@ -1244,6 +1244,7 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
>  	int err;
>  	struct qstr this;
>  
> +	BUG_ON(!mutex_is_locked(&base->d_inode->i_mutex));
>  	err = __lookup_one_len(name, &this, base, len);

That was how I found this problem actually. :-)
I implemented very similar debug code in aufs which is compiled when
CONFIG_AUFS_DEBUG is enabled.

(in aufs header files)
#ifdef CONFIG_AUFS_DEBUG
#define AuDebugOn(a)		BUG_ON(a)
	:::
#endif

/* to debug easier, do not make them inlined functions */
#define MtxMustLock(mtx)	AuDebugOn(!mutex_is_locked(mtx))
#define IMustLock(i)		MtxMustLock(&(i)->i_mutex)

(in ->lookup of aufs)
static struct dentry *aufs_lookup(struct inode *dir, struct dentry *dentry,
				  struct nameidata *nd)
{
	:::
	IMustLock(dir);


J. R. Okajima

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-03-19 15:34   ` hooanon05
@ 2009-03-19 15:51     ` David Woodhouse
  2009-04-17  9:32     ` David Woodhouse
  1 sibling, 0 replies; 39+ messages in thread
From: David Woodhouse @ 2009-03-19 15:51 UTC (permalink / raw)
  To: hooanon05; +Cc: Al Viro, linux-kernel, linux-fsdevel

On Fri, 2009-03-20 at 00:34 +0900, hooanon05@yahoo.co.jp wrote:
> If you remember why you discarded the FS_NO_LOOKUP_IN_READDIR flag
> approach, please let me know. URL or something is enough.

I was just lamenting that decision. I think someone persuaded me that
the extra complexity wasn't worth it, and that we should just do it
unconditionally. 

One option would be to restore the FS_NO_LOOKUP_IN_READDIR flag, and
document that it means that your ->lookup is called _without_ i_mutex
held. That would actually be OK in all cases that need the flag, I
believe (since the whole point in those cases is that they have their
_own_ locking which is why we did it in the first place).

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-03-19 15:37   ` Q: NFSD readdir in linux-2.6.28 Matthew Wilcox
  2009-03-19 15:45     ` hooanon05
@ 2009-03-19 16:01     ` David Woodhouse
  1 sibling, 0 replies; 39+ messages in thread
From: David Woodhouse @ 2009-03-19 16:01 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: hooanon05, Al Viro, linux-kernel, linux-fsdevel

On Thu, 2009-03-19 at 09:37 -0600, Matthew Wilcox wrote:
> Should we also add this?

Maybe. It was actually perfectly OK when XFS was doing that to _itself_
though, and that BUG_ON() would have triggered. It's only now that we're
doing it unconditionally that it's really a problem.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-03-19 15:17 ` David Woodhouse
  2009-03-19 15:34   ` hooanon05
  2009-03-19 15:37   ` Q: NFSD readdir in linux-2.6.28 Matthew Wilcox
@ 2009-04-16 21:45   ` J. Bruce Fields
  2009-04-17  4:13     ` hooanon05
  2 siblings, 1 reply; 39+ messages in thread
From: J. Bruce Fields @ 2009-04-16 21:45 UTC (permalink / raw)
  To: David Woodhouse; +Cc: hooanon05, Al Viro, linux-kernel, linux-fsdevel

On Thu, Mar 19, 2009 at 03:17:17PM +0000, David Woodhouse wrote:
> On Thu, 2009-03-19 at 14:54 +0000, hooanon05@yahoo.co.jp wrote:
> > 
> > Hello David and Al,
> > I have a question about NFSD readdir.
> > 
> > By the commit 14f7dd632011bb89c035722edd6ea0d90ca6b078
> > "[PATCH] Copy XFS readdir hack into nfsd code", nfsd_buffered_filldir()
> > was introduced and nfs3svc_encode_entry_plus() (the 'func' parameter) is
> > not called from vfs_readdir().
> > 
> > In 2.6.27, when nfs3svc_encode_entry_plus() calls lookup_one_len(), the
> > i_mutex lock was acquired by vfs_readdir() and it was not a problem.
> > 
> > After the commit (above), nfsd_readdir/nfsd_buffered_readdir/vfs_readdir
> > calls nfsd_buffered_filldir(), and nfs3svc_encode_entry_plus() is called
> > later.
> > In this sequence, lookup_one_len() is called without i_mutex held.
> > 
> > Isn't it a problem?
> 
> Yes, well spotted. It didn't matter when the buffered readdir() was
> purely internal to XFS, because it didn't matter there that we called
> ->lookup() without i_mutex set. But now we're exposing arbitrary file
> systems to it, we need to make sure we follow the locking rules. 
> 
> I _think_ it's sufficient to make the affected callers of
> lookup_one_len() lock the parent's i_mutex for themselves before calling
> it. I'll take a closer look...

Yipes--is this problem still here?

--b.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-04-16 21:45   ` J. Bruce Fields
@ 2009-04-17  4:13     ` hooanon05
  0 siblings, 0 replies; 39+ messages in thread
From: hooanon05 @ 2009-04-17  4:13 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: David Woodhouse, Al Viro, linux-kernel, linux-fsdevel


"J. Bruce Fields":
> > > After the commit (above), nfsd_readdir/nfsd_buffered_readdir/vfs_readdir
> > > calls nfsd_buffered_filldir(), and nfs3svc_encode_entry_plus() is called
> > > later.
> > > In this sequence, lookup_one_len() is called without i_mutex held.
	:::
> Yipes--is this problem still here?

Yes.


J. R. Okajima

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-03-19 15:34   ` hooanon05
  2009-03-19 15:51     ` David Woodhouse
@ 2009-04-17  9:32     ` David Woodhouse
  2009-04-17 19:32       ` Al Viro
  1 sibling, 1 reply; 39+ messages in thread
From: David Woodhouse @ 2009-04-17  9:32 UTC (permalink / raw)
  To: hooanon05; +Cc: Al Viro, hch, linux-kernel, linux-fsdevel

On Fri, 2009-03-20 at 00:34 +0900, hooanon05@yahoo.co.jp wrote:
> David Woodhouse:
> > Yes, well spotted. It didn't matter when the buffered readdir() was
> > purely internal to XFS, because it didn't matter there that we called
> > ->lookup() without i_mutex set. But now we're exposing arbitrary file
> > systems to it, we need to make sure we follow the locking rules. 
> > 
> > I _think_ it's sufficient to make the affected callers of
> > lookup_one_len() lock the parent's i_mutex for themselves before calling
> > it. I'll take a closer look...
> 
> If you remember why you discarded the FS_NO_LOOKUP_IN_READDIR flag
> approach, please let me know. URL or something is enough.

I think someone talked me into doing it in the interest of simplicity,
so we confine the necessary hack into the NFS code alone and _always_
just use the "safe" version. I can't remember who it was, but I'm
guessing Al or Christoph -- both of whom are CC'd in case they want to
object:

Given the fun with i_mutex I think the best answer is to bring it back.
I'm about to test this patch which restores it (and sets it for btrfs,
jffs2 and xfs)...

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index deeeed0..06f539c 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -357,7 +357,10 @@ otherwise noted.
 	If you wish to overload the dentry methods then you should
 	initialise the "d_dop" field in the dentry; this is a pointer
 	to a struct "dentry_operations".
-	This method is called with the directory inode semaphore held
+	This method is called with the directory inode mutex held
+	unless the file system sets the FS_NO_LOOKUP_IN_READDIR flag
+	its file system flags, in which case lookup() calls from NFS
+	readdirplus() requests will be made without directory mutex.
 
   link: called by the link(2) system call. Only required if you want
 	to support hard links. You will probably need to call
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 9744af9..f0e9d18 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -619,7 +619,7 @@ static struct file_system_type btrfs_fs_type = {
 	.name		= "btrfs",
 	.get_sb		= btrfs_get_sb,
 	.kill_sb	= kill_anon_super,
-	.fs_flags	= FS_REQUIRES_DEV,
+	.fs_flags	= FS_REQUIRES_DEV | FS_NO_LOOKUP_IN_READDIR,
 };
 
 /*
diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c
index 4c4e18c..0e97a35 100644
--- a/fs/jffs2/super.c
+++ b/fs/jffs2/super.c
@@ -209,6 +209,7 @@ static struct file_system_type jffs2_fs_type = {
 	.name =		"jffs2",
 	.get_sb =	jffs2_get_sb,
 	.kill_sb =	jffs2_kill_sb,
+	.flags =	FS_NO_LOOKUP_IN_READDIR,
 };
 
 static int __init init_jffs2_fs(void)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index ab93fcf..230e30e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1841,6 +1841,30 @@ out:
 	return err;
 }
 
+
+static int nfsd_real_readdir(struct file *file, filldir_t func,
+			     struct readdir_cd *cdp, loff_t *offsetp)
+{
+	int host_err;
+
+	/*
+	 * Read the directory entries. This silly loop is necessary because
+	 * readdir() is not guaranteed to fill up the entire buffer, but
+	 * may choose to do less.
+	 */
+	do {
+		cdp->err = nfserr_eof; /* will be cleared on successful read */
+		host_err = vfs_readdir(file, func, cdp);
+	} while (host_err >=0 && cdp->err == nfs_ok);
+
+	*offsetp = vfs_llseek(file, 0, 1);
+
+	if (host_err)
+		return nfserrno(host_err);
+	else
+		return cdp->err;
+}
+
 /*
  * We do this buffering because we must not call back into the file
  * system's ->lookup() method from the filldir callback. That may well
@@ -1970,7 +1994,11 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp,
 		goto out_close;
 	}
 
-	err = nfsd_buffered_readdir(file, func, cdp, offsetp);
+	if ((file->f_path.dentry->d_inode->i_sb->s_type->fs_flags &
+	     FS_NO_LOOKUP_IN_READDIR))
+		err = nfsd_buffered_readdir(file, func, cdp, offsetp);
+	else
+		err = nfsd_real_readdir(file, func, cdp, offsetp);
 
 	if (err == nfserr_eof || err == nfserr_toosmall)
 		err = nfs_ok; /* can still be found in ->err */
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index bb68526..7bba46a 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1546,7 +1546,7 @@ static struct file_system_type xfs_fs_type = {
 	.name			= "xfs",
 	.get_sb			= xfs_fs_get_sb,
 	.kill_sb		= kill_block_super,
-	.fs_flags		= FS_REQUIRES_DEV,
+	.fs_flags		= FS_REQUIRES_DEV | FS_NO_LOOKUP_IN_READDIR,
 };
 
 STATIC int __init
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e766be0..caf2a94 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -176,6 +176,11 @@ struct inodes_stat_t {
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move()
 					 * during rename() internally.
 					 */
+#define FS_NO_LOOKUP_IN_READDIR	65536	/* FS has its own locking and will
+	   deadlock if you call its lookup() method from within a filldir
+	   function, as NFSd wants to. A file system setting this flag must
+	   be happy for its ->lookup() method to be called without i_mutex,
+	   which is what the NFSd workaround code will do. */
 
 /*
  * These are the fs-independent mount-flags: up to 32 flags are supported


-- 
dwmw2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-04-17  9:32     ` David Woodhouse
@ 2009-04-17 19:32       ` Al Viro
  2009-04-17 22:17         ` David Woodhouse
  0 siblings, 1 reply; 39+ messages in thread
From: Al Viro @ 2009-04-17 19:32 UTC (permalink / raw)
  To: David Woodhouse; +Cc: hooanon05, hch, linux-kernel, linux-fsdevel

On Fri, Apr 17, 2009 at 10:32:19AM +0100, David Woodhouse wrote:
> On Fri, 2009-03-20 at 00:34 +0900, hooanon05@yahoo.co.jp wrote:
> > David Woodhouse:
> > > Yes, well spotted. It didn't matter when the buffered readdir() was
> > > purely internal to XFS, because it didn't matter there that we called
> > > ->lookup() without i_mutex set. But now we're exposing arbitrary file
> > > systems to it, we need to make sure we follow the locking rules. 
> > > 
> > > I _think_ it's sufficient to make the affected callers of
> > > lookup_one_len() lock the parent's i_mutex for themselves before calling
> > > it. I'll take a closer look...
> > 
> > If you remember why you discarded the FS_NO_LOOKUP_IN_READDIR flag
> > approach, please let me know. URL or something is enough.
> 
> I think someone talked me into doing it in the interest of simplicity,
> so we confine the necessary hack into the NFS code alone and _always_
> just use the "safe" version. I can't remember who it was, but I'm
> guessing Al or Christoph -- both of whom are CC'd in case they want to
> object:

Ow...  Sorry, I've missed that one.  I really don't like flags-based
solution ;-/  It's not just filesystem method that want i_mutex there -
we have fs/namei.c code suddenly called in different locking conditions
now.

What were the details of xfs and jffs2 locking problems, just in case
if something can be done in that direction short-term?


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-04-17 19:32       ` Al Viro
@ 2009-04-17 22:17         ` David Woodhouse
  2009-04-17 22:43           ` Al Viro
  0 siblings, 1 reply; 39+ messages in thread
From: David Woodhouse @ 2009-04-17 22:17 UTC (permalink / raw)
  To: Al Viro; +Cc: hooanon05, hch, linux-kernel, linux-fsdevel

On Fri, 2009-04-17 at 20:32 +0100, Al Viro wrote:
> Ow...  Sorry, I've missed that one.  I really don't like flags-based
> solution ;-/  It's not just filesystem method that want i_mutex there -
> we have fs/namei.c code suddenly called in different locking conditions
> now.

Well, we have that already.

When nfsd readdirplus code ends up calling back into lookup_one_len(),
it does so without i_mutex held. XFS had been doing it that way for
years, so that detail escaped our notice when we shifted the XFS 'hack'
into nfsd code.

> What were the details of xfs and jffs2 locking problems, just in case
> if something can be done in that direction short-term?

JFFS2 has its own internal mutex protecting the directory. It needs an
internal mutex instead of i_mutex because of ordering issues with
garbage collection. We are _in_ jffs2_readdir(), with the lock held,
when we call back into nfsd's filldir function... which calls right back
into jffs2_lookup() and deadlocks when it tries to take the same lock
again.

The situation in btrfs and xfs is broadly similar. They have their own
locks, and we end up deadlocking. GFS2 has some lovely "is this process
the owner of the lock" magic to avoid a similar deadlock.

It sounds like the better answer is to just make sure i_mutex is held
when nfsd_buffered_readdir() calls back into the provided filldir
function (we could do it in the various filldir functions themselves,
_if_ they call lookup_one_len(), but I think I prefer it this way --
it's simpler). Patch below for comment.

(While I'm staring at it, it looks like nfsd_buffered_readdir() should
be returning a __be32 not an int, and its 'return -ENOMEM' should be
'return nfserrno(-ENOMEM)'. The first bug I inherited from the existing
nfsd_do_readdir() when I replaced it, but the second is all my own. I'll
send a patch to fix those shortly.)

diff --git a/fs/namei.c b/fs/namei.c
index b8433eb..78f253c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1248,6 +1248,8 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
 	int err;
 	struct qstr this;
 
+	WARN_ON_ONCE(!mutex_is_locked(&base->d_inode->i_mutex));
+
 	err = __lookup_one_len(name, &this, base, len);
 	if (err)
 		return ERR_PTR(err);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index ab93fcf..fb2965d 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1920,13 +1920,27 @@ static int nfsd_buffered_readdir(struct file *file, filldir_t func,
 			break;
 
 		de = (struct buffered_dirent *)buf.dirent;
+
 		while (size > 0) {
+			struct inode *dir_inode = file->f_path.dentry->d_inode;
+			int finished;
+
 			offset = de->offset;
 
-			if (func(cdp, de->name, de->namlen, de->offset,
-				 de->ino, de->d_type))
+			/*
+			 * Various filldir functions may end up calling back
+			 * into lookup_one_len() and the file system's 
+			 * ->lookup() method. These expect i_mutex to be held.
+			 */
+			host_err = mutex_lock_killable(&dir_inode->i_mutex);
+			if (host_err)
 				goto done;
 
+			finished = func(cdp, de->name, de->namlen, de->offset,
+					de->ino, de->d_type);
+
+			mutex_unlock(&dir_inode->i_mutex);
+
 			if (cdp->err != nfs_ok)
 				goto done;
 



-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-04-17 22:17         ` David Woodhouse
@ 2009-04-17 22:43           ` Al Viro
  2009-04-17 22:51             ` David Woodhouse
  2009-04-17 22:53             ` Al Viro
  0 siblings, 2 replies; 39+ messages in thread
From: Al Viro @ 2009-04-17 22:43 UTC (permalink / raw)
  To: David Woodhouse; +Cc: hooanon05, hch, linux-kernel, linux-fsdevel

On Fri, Apr 17, 2009 at 11:17:00PM +0100, David Woodhouse wrote:
> It sounds like the better answer is to just make sure i_mutex is held
> when nfsd_buffered_readdir() calls back into the provided filldir
> function (we could do it in the various filldir functions themselves,
> _if_ they call lookup_one_len(), but I think I prefer it this way --
> it's simpler). Patch below for comment.

Umm...  I can live with that, assuming that we don't have callbacks
that take i_mutex themselves.  AFAICS, everything we call there is
either obviously not touching i_mutex or is already called while we
hold i_mutex elsewhere, but I'd appreciate if somebody actually
tested that sucker for different versions of protocol...

> (While I'm staring at it, it looks like nfsd_buffered_readdir() should
> be returning a __be32 not an int, and its 'return -ENOMEM' should be
> 'return nfserrno(-ENOMEM)'. The first bug I inherited from the existing
> nfsd_do_readdir() when I replaced it, but the second is all my own. I'll
> send a patch to fix those shortly.)

Fold it into this one, please.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-04-17 22:43           ` Al Viro
@ 2009-04-17 22:51             ` David Woodhouse
  2009-04-17 22:53             ` Al Viro
  1 sibling, 0 replies; 39+ messages in thread
From: David Woodhouse @ 2009-04-17 22:51 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, hooanon05, hch, linux-kernel, linux-fsdevel

On Fri, 2009-04-17 at 23:43 +0100, Al Viro wrote:
> On Fri, Apr 17, 2009 at 11:17:00PM +0100, David Woodhouse wrote:
> > It sounds like the better answer is to just make sure i_mutex is held
> > when nfsd_buffered_readdir() calls back into the provided filldir
> > function (we could do it in the various filldir functions themselves,
> > _if_ they call lookup_one_len(), but I think I prefer it this way --
> > it's simpler). Patch below for comment.
> 
> Umm...  I can live with that, assuming that we don't have callbacks
> that take i_mutex themselves.  AFAICS, everything we call there is
> either obviously not touching i_mutex or is already called while we
> hold i_mutex elsewhere,

More than that... until commit 14f7dd63, we were holding i_mutex when we
called back into those callbacks from nfsd_readdir() itself. We're only
reverting to a fairly recent behaviour, in that respect.

> > (While I'm staring at it, it looks like nfsd_buffered_readdir() should
> > be returning a __be32 not an int, and its 'return -ENOMEM' should be
> > 'return nfserrno(-ENOMEM)'. The first bug I inherited from the existing
> > nfsd_do_readdir() when I replaced it, but the second is all my own. I'll
> > send a patch to fix those shortly.)
> 
> Fold it into this one, please.

OK.

Ah, and see also commit 05f4f678b (Bruce).

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-04-17 22:43           ` Al Viro
  2009-04-17 22:51             ` David Woodhouse
@ 2009-04-17 22:53             ` Al Viro
  2009-04-17 22:55               ` David Woodhouse
                                 ` (2 more replies)
  1 sibling, 3 replies; 39+ messages in thread
From: Al Viro @ 2009-04-17 22:53 UTC (permalink / raw)
  To: David Woodhouse; +Cc: hooanon05, hch, linux-kernel, linux-fsdevel

On Fri, Apr 17, 2009 at 11:43:50PM +0100, Al Viro wrote:
> On Fri, Apr 17, 2009 at 11:17:00PM +0100, David Woodhouse wrote:
> > It sounds like the better answer is to just make sure i_mutex is held
> > when nfsd_buffered_readdir() calls back into the provided filldir
> > function (we could do it in the various filldir functions themselves,
> > _if_ they call lookup_one_len(), but I think I prefer it this way --
> > it's simpler). Patch below for comment.
> 
> Umm...  I can live with that, assuming that we don't have callbacks
> that take i_mutex themselves.  AFAICS, everything we call there is
> either obviously not touching i_mutex or is already called while we
> hold i_mutex elsewhere, but I'd appreciate if somebody actually
> tested that sucker for different versions of protocol...

BTW, why mess with taking i_mutex inside the inner loop and not
immediately around it?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-04-17 22:53             ` Al Viro
@ 2009-04-17 22:55               ` David Woodhouse
  2009-04-17 23:23               ` David Woodhouse
  2009-04-18  0:15               ` [PATCH] Fix i_mutex handling in nfsd readdir David Woodhouse
  2 siblings, 0 replies; 39+ messages in thread
From: David Woodhouse @ 2009-04-17 22:55 UTC (permalink / raw)
  To: Al Viro; +Cc: hooanon05, hch, linux-kernel, linux-fsdevel

On Fri, 2009-04-17 at 23:53 +0100, Al Viro wrote:
> On Fri, Apr 17, 2009 at 11:43:50PM +0100, Al Viro wrote:
> > On Fri, Apr 17, 2009 at 11:17:00PM +0100, David Woodhouse wrote:
> > > It sounds like the better answer is to just make sure i_mutex is held
> > > when nfsd_buffered_readdir() calls back into the provided filldir
> > > function (we could do it in the various filldir functions themselves,
> > > _if_ they call lookup_one_len(), but I think I prefer it this way --
> > > it's simpler). Patch below for comment.
> > 
> > Umm...  I can live with that, assuming that we don't have callbacks
> > that take i_mutex themselves.  AFAICS, everything we call there is
> > either obviously not touching i_mutex or is already called while we
> > hold i_mutex elsewhere, but I'd appreciate if somebody actually
> > tested that sucker for different versions of protocol...
> 
> BTW, why mess with taking i_mutex inside the inner loop and not
> immediately around it?

Just to avoid making spaghetti of the bail-out cases, really. I did
consider it (it wouldn't be _that_ bad), but figured that it wasn't such
a bad thing if we don't hog the lock.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-04-17 22:53             ` Al Viro
  2009-04-17 22:55               ` David Woodhouse
@ 2009-04-17 23:23               ` David Woodhouse
  2009-04-17 23:37                 ` Al Viro
  2009-04-18  0:15               ` [PATCH] Fix i_mutex handling in nfsd readdir David Woodhouse
  2 siblings, 1 reply; 39+ messages in thread
From: David Woodhouse @ 2009-04-17 23:23 UTC (permalink / raw)
  To: Al Viro; +Cc: hooanon05, hch, linux-kernel, linux-fsdevel

On Fri, 2009-04-17 at 23:53 +0100, Al Viro wrote:
> On Fri, Apr 17, 2009 at 11:43:50PM +0100, Al Viro wrote:
> > On Fri, Apr 17, 2009 at 11:17:00PM +0100, David Woodhouse wrote:
> > > It sounds like the better answer is to just make sure i_mutex is held
> > > when nfsd_buffered_readdir() calls back into the provided filldir
> > > function (we could do it in the various filldir functions themselves,
> > > _if_ they call lookup_one_len(), but I think I prefer it this way --
> > > it's simpler). Patch below for comment.
> > 
> > Umm...  I can live with that, assuming that we don't have callbacks
> > that take i_mutex themselves.  AFAICS, everything we call there is
> > either obviously not touching i_mutex or is already called while we
> > hold i_mutex elsewhere, but I'd appreciate if somebody actually
> > tested that sucker for different versions of protocol...
> 
> BTW, why mess with taking i_mutex inside the inner loop and not
> immediately around it?

Or, to phrase my last response slightly differently... because I don't
like either of these two versions very much...



diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index ab93fcf..0523e9f 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1885,9 +1885,10 @@ static int nfsd_buffered_filldir(void *__buf, const char *name, int namlen,
 	return 0;
 }
 
-static int nfsd_buffered_readdir(struct file *file, filldir_t func,
-				 struct readdir_cd *cdp, loff_t *offsetp)
+static __be32 nfsd_buffered_readdir(struct file *file, filldir_t func,
+				    struct readdir_cd *cdp, loff_t *offsetp)
 {
+	struct inode *dir_inode = file->f_path.dentry->d_inode;
 	struct readdir_data buf;
 	struct buffered_dirent *de;
 	int host_err;
@@ -1896,7 +1897,7 @@ static int nfsd_buffered_readdir(struct file *file, filldir_t func,
 
 	buf.dirent = (void *)__get_free_page(GFP_KERNEL);
 	if (!buf.dirent)
-		return -ENOMEM;
+		return nfserrno(-ENOMEM);
 
 	offset = *offsetp;
 
@@ -1912,23 +1913,37 @@ static int nfsd_buffered_readdir(struct file *file, filldir_t func,
 			host_err = 0;
 
 		if (host_err < 0)
-			break;
+			goto done;
 
 		size = buf.used;
 
 		if (!size)
-			break;
+			goto done;
 
 		de = (struct buffered_dirent *)buf.dirent;
+
+		host_err = mutex_lock_killable(&dir_inode->i_mutex);
+		if (host_err)
+			goto done;
+
 		while (size > 0) {
+
+			int finished;
+
 			offset = de->offset;
 
+			/*
+			 * Various filldir functions may end up calling back
+			 * into lookup_one_len() and the file system's 
+			 * ->lookup() method. These expect i_mutex to be held.
+			 */
+
 			if (func(cdp, de->name, de->namlen, de->offset,
 				 de->ino, de->d_type))
-				goto done;
-
+				goto done_unlock;
+			    
 			if (cdp->err != nfs_ok)
-				goto done;
+				goto done_unlock;
 
 			reclen = ALIGN(sizeof(*de) + de->namlen,
 				       sizeof(u64));
@@ -1937,6 +1952,8 @@ static int nfsd_buffered_readdir(struct file *file, filldir_t func,
 		}
 		offset = vfs_llseek(file, 0, SEEK_CUR);
 	}
+ done_unlock:
+	mutex_unlock(&dir_inode->i_mutex);
 
  done:
 	free_page((unsigned long)(buf.dirent));



====================================================================

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index ab93fcf..915a47c 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1885,8 +1885,8 @@ static int nfsd_buffered_filldir(void *__buf, const char *name, int namlen,
 	return 0;
 }
 
-static int nfsd_buffered_readdir(struct file *file, filldir_t func,
-				 struct readdir_cd *cdp, loff_t *offsetp)
+static __be32 nfsd_buffered_readdir(struct file *file, filldir_t func,
+				    struct readdir_cd *cdp, loff_t *offsetp)
 {
 	struct readdir_data buf;
 	struct buffered_dirent *de;
@@ -1896,11 +1896,12 @@ static int nfsd_buffered_readdir(struct file *file, filldir_t func,
 
 	buf.dirent = (void *)__get_free_page(GFP_KERNEL);
 	if (!buf.dirent)
-		return -ENOMEM;
+		return nfserrno(-ENOMEM);
 
 	offset = *offsetp;
 
 	while (1) {
+		struct inode *dir_inode = file->f_path.dentry->d_inode;
 		unsigned int reclen;
 
 		cdp->err = nfserr_eof; /* will be cleared on successful read */
@@ -1920,24 +1921,42 @@ static int nfsd_buffered_readdir(struct file *file, filldir_t func,
 			break;
 
 		de = (struct buffered_dirent *)buf.dirent;
+
+		host_err = mutex_lock_killable(&dir_inode->i_mutex);
+		if (host_err)
+			break;
+
 		while (size > 0) {
+			int finished;
+
 			offset = de->offset;
 
+			/*
+			 * Various filldir functions may end up calling back
+			 * into lookup_one_len() and the file system's 
+			 * ->lookup() method. These expect i_mutex to be held.
+			 */
+
 			if (func(cdp, de->name, de->namlen, de->offset,
 				 de->ino, de->d_type))
-				goto done;
-
+				goto done_unlock;
+			    
 			if (cdp->err != nfs_ok)
-				goto done;
+				goto done_unlock;
 
 			reclen = ALIGN(sizeof(*de) + de->namlen,
 				       sizeof(u64));
 			size -= reclen;
 			de = (struct buffered_dirent *)((char *)de + reclen);
 		}
+		mutex_unlock(&dir_inode->i_mutex);
 		offset = vfs_llseek(file, 0, SEEK_CUR);
-	}
+		continue;
 
+	done_unlock:
+		mutex_unlock(&dir_inode->i_mutex);
+		break;
+	}
  done:
 	free_page((unsigned long)(buf.dirent));
 

-- 
dwmw2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-04-17 23:23               ` David Woodhouse
@ 2009-04-17 23:37                 ` Al Viro
  2009-04-17 23:39                   ` Al Viro
  0 siblings, 1 reply; 39+ messages in thread
From: Al Viro @ 2009-04-17 23:37 UTC (permalink / raw)
  To: David Woodhouse; +Cc: hooanon05, hch, linux-kernel, linux-fsdevel

On Sat, Apr 18, 2009 at 12:23:32AM +0100, David Woodhouse wrote:

> Or, to phrase my last response slightly differently... because I don't
> like either of these two versions very much...

Eh?  Just have
		host_err = mutex_lock_killable(....);
		if (host_err)
			break;
		de = ...
                while (size > 0) {
			offset = de->offset;
			if (func(cdp, de->name, de->namlen, de->offset,
				 de->ino, de->d_type))
				break;
			if (cdp->err != nfs_ok)
				break;
			...
			size -= reclen;
			de = ...
		}
		mutex_unlock(....);
		if (size > 0) /* we'd an error */
			break;
		offset = vfs_llseek(....);
	}
	free_page(....);
and to hell with all goto in there...

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Q: NFSD readdir in linux-2.6.28
  2009-04-17 23:37                 ` Al Viro
@ 2009-04-17 23:39                   ` Al Viro
  0 siblings, 0 replies; 39+ messages in thread
From: Al Viro @ 2009-04-17 23:39 UTC (permalink / raw)
  To: David Woodhouse; +Cc: hooanon05, hch, linux-kernel, linux-fsdevel

On Sat, Apr 18, 2009 at 12:37:55AM +0100, Al Viro wrote:
> 		if (size > 0) /* we'd an error */
				      bailed out early
that is.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] Fix i_mutex handling in nfsd readdir.
  2009-04-17 22:53             ` Al Viro
  2009-04-17 22:55               ` David Woodhouse
  2009-04-17 23:23               ` David Woodhouse
@ 2009-04-18  0:15               ` David Woodhouse
  2009-04-18  3:11                 ` hooanon05
  2009-04-19 12:27                 ` [PATCH v2] " David Woodhouse
  2 siblings, 2 replies; 39+ messages in thread
From: David Woodhouse @ 2009-04-18  0:15 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, hooanon05, hch, linux-kernel, linux-fsdevel

Commit 14f7dd63 ("Copy XFS readdir hack into nfsd code") introduced a
bug to generic code which had been extant for a long time in the XFS
version -- it started to call through into lookup_one_len() and hence
into the file systems' ->lookup() methods without i_mutex held on the
directory.

This patch fixes it by locking the directory's i_mutex again before
calling the filldir functions. The original deadlocks which commit
14f7dd63 was designed to avoid are still avoided, because they were due
to fs-internal locking, not i_mutex.

While we're at it, fix the return type of nfsd_buffered_readdir() which
should be a __be32 not an int -- it's an NFS errno, not a Linux errno.
And return nfserrno(-ENOMEM) when allocation fails, not just -ENOMEM.
Sparse would have caught both of those if it wasn't so busy bitching
about __cold__.

Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---

Commit 05f4f678b introduced a similar problem for NFSv4 which I'm not
going to attempt to deal with before getting some sleep...

diff --git a/fs/namei.c b/fs/namei.c
index b8433eb..78f253c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1248,6 +1248,8 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
 	int err;
 	struct qstr this;
 
+	WARN_ON_ONCE(!mutex_is_locked(&base->d_inode->i_mutex));
+
 	err = __lookup_one_len(name, &this, base, len);
 	if (err)
 		return ERR_PTR(err);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index ab93fcf..44a68e1 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1885,8 +1885,8 @@ static int nfsd_buffered_filldir(void *__buf, const char *name, int namlen,
 	return 0;
 }
 
-static int nfsd_buffered_readdir(struct file *file, filldir_t func,
-				 struct readdir_cd *cdp, loff_t *offsetp)
+static __be32 nfsd_buffered_readdir(struct file *file, filldir_t func,
+				    struct readdir_cd *cdp, loff_t *offsetp)
 {
 	struct readdir_data buf;
 	struct buffered_dirent *de;
@@ -1896,11 +1896,12 @@ static int nfsd_buffered_readdir(struct file *file, filldir_t func,
 
 	buf.dirent = (void *)__get_free_page(GFP_KERNEL);
 	if (!buf.dirent)
-		return -ENOMEM;
+		return nfserrno(-ENOMEM);
 
 	offset = *offsetp;
 
 	while (1) {
+		struct inode *dir_inode = file->f_path.dentry->d_inode;
 		unsigned int reclen;
 
 		cdp->err = nfserr_eof; /* will be cleared on successful read */
@@ -1919,22 +1920,35 @@ static int nfsd_buffered_readdir(struct file *file, filldir_t func,
 		if (!size)
 			break;
 
+		/*
+		 * Various filldir functions may end up calling back into
+		 * lookup_one_len() and the file system's ->lookup() method.
+		 * These expect i_mutex to be held, as it would within readdir.
+		 */
+		host_err = mutex_lock_killable(&dir_inode->i_mutex);
+		if (host_err)
+			break;
+
 		de = (struct buffered_dirent *)buf.dirent;
 		while (size > 0) {
 			offset = de->offset;
 
 			if (func(cdp, de->name, de->namlen, de->offset,
 				 de->ino, de->d_type))
-				goto done;
+				break;
 
 			if (cdp->err != nfs_ok)
-				goto done;
+				break;
 
 			reclen = ALIGN(sizeof(*de) + de->namlen,
 				       sizeof(u64));
 			size -= reclen;
 			de = (struct buffered_dirent *)((char *)de + reclen);
 		}
+		mutex_unlock(&dir_inode->i_mutex);
+		if (size > 0) /* We bailed out early */
+			break;
+
 		offset = vfs_llseek(file, 0, SEEK_CUR);
 	}
 


-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] Fix i_mutex handling in nfsd readdir.
  2009-04-18  0:15               ` [PATCH] Fix i_mutex handling in nfsd readdir David Woodhouse
@ 2009-04-18  3:11                 ` hooanon05
  2009-04-18 14:25                   ` Al Viro
  2009-04-19  7:18                   ` David Woodhouse
  2009-04-19 12:27                 ` [PATCH v2] " David Woodhouse
  1 sibling, 2 replies; 39+ messages in thread
From: hooanon05 @ 2009-04-18  3:11 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Al Viro, bfields, hch, linux-kernel, linux-fsdevel


David Woodhouse:
> This patch fixes it by locking the directory's i_mutex again before
> calling the filldir functions. The original deadlocks which commit
> 14f7dd63 was designed to avoid are still avoided, because they were due
> to fs-internal locking, not i_mutex.
	:::
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1248,6 +1248,8 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
>  	int err;
>  	struct qstr this;
>  
> +	WARN_ON_ONCE(!mutex_is_locked(&base->d_inode->i_mutex));
> +
>  	err = __lookup_one_len(name, &this, base, len);

I'd suggest this checking is done only when CONFIG_DEBUG_KERNEL (or
something) is enabled. Because unconditional checking costs high for the
well-reviewed lookup code.


J. R. Okajima

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] Fix i_mutex handling in nfsd readdir.
  2009-04-18  3:11                 ` hooanon05
@ 2009-04-18 14:25                   ` Al Viro
  2009-04-19  7:18                   ` David Woodhouse
  1 sibling, 0 replies; 39+ messages in thread
From: Al Viro @ 2009-04-18 14:25 UTC (permalink / raw)
  To: hooanon05; +Cc: David Woodhouse, bfields, hch, linux-kernel, linux-fsdevel

On Sat, Apr 18, 2009 at 12:11:54PM +0900, hooanon05@yahoo.co.jp wrote:
> 
> David Woodhouse:
> > This patch fixes it by locking the directory's i_mutex again before
> > calling the filldir functions. The original deadlocks which commit
> > 14f7dd63 was designed to avoid are still avoided, because they were due
> > to fs-internal locking, not i_mutex.
> 	:::
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -1248,6 +1248,8 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
> >  	int err;
> >  	struct qstr this;
> >  
> > +	WARN_ON_ONCE(!mutex_is_locked(&base->d_inode->i_mutex));
> > +
> >  	err = __lookup_one_len(name, &this, base, len);
> 
> I'd suggest this checking is done only when CONFIG_DEBUG_KERNEL (or
> something) is enabled. Because unconditional checking costs high for the
> well-reviewed lookup code.

Frankly, I'd rather have DEBUG_WARN... conditional on that, rather than
cluttering code with ifdefs...

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] Fix i_mutex handling in nfsd readdir.
  2009-04-18  3:11                 ` hooanon05
  2009-04-18 14:25                   ` Al Viro
@ 2009-04-19  7:18                   ` David Woodhouse
  1 sibling, 0 replies; 39+ messages in thread
From: David Woodhouse @ 2009-04-19  7:18 UTC (permalink / raw)
  To: hooanon05; +Cc: Al Viro, bfields, hch, linux-kernel, linux-fsdevel

On Sat, 2009-04-18 at 12:11 +0900, hooanon05@yahoo.co.jp wrote:
> David Woodhouse:
> > This patch fixes it by locking the directory's i_mutex again before
> > calling the filldir functions. The original deadlocks which commit
> > 14f7dd63 was designed to avoid are still avoided, because they were due
> > to fs-internal locking, not i_mutex.
> 	:::
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -1248,6 +1248,8 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
> >  	int err;
> >  	struct qstr this;
> >  
> > +	WARN_ON_ONCE(!mutex_is_locked(&base->d_inode->i_mutex));
> > +
> >  	err = __lookup_one_len(name, &this, base, len);
> 
> I'd suggest this checking is done only when CONFIG_DEBUG_KERNEL (or
> something) is enabled. Because unconditional checking costs high for the
> well-reviewed lookup code.

It's supposed to be locked. It's likely to have been locked quite
recently, so it'll be in the cache. I don't think the mutex_is_locked()
check is going to be that expensive, is it?

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-04-18  0:15               ` [PATCH] Fix i_mutex handling in nfsd readdir David Woodhouse
  2009-04-18  3:11                 ` hooanon05
@ 2009-04-19 12:27                 ` David Woodhouse
  2009-04-19 20:51                   ` J. Bruce Fields
  2009-04-22  4:41                   ` hooanon05
  1 sibling, 2 replies; 39+ messages in thread
From: David Woodhouse @ 2009-04-19 12:27 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, hooanon05, hch, linux-kernel, linux-fsdevel

Commit 14f7dd63 ("Copy XFS readdir hack into nfsd code") introduced a
bug to generic code which had been extant for a long time in the XFS
version -- it started to call through into lookup_one_len() and hence
into the file systems' ->lookup() methods without i_mutex held on the
directory.

This patch fixes it by locking the directory's i_mutex again before
calling the filldir functions. The original deadlocks which commit
14f7dd63 was designed to avoid are still avoided, because they were due
to fs-internal locking, not i_mutex.

Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
code") introduced a similar problem there, which this also addresses.

While we're at it, fix the return type of nfsd_buffered_readdir() which
should be a __be32 not an int -- it's an NFS errno, not a Linux errno.
And return nfserrno(-ENOMEM) when allocation fails, not just -ENOMEM.
Sparse would have caught both of those if it wasn't so busy bitching
about __cold__.

Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
code") introduced a similar problem with calling lookup_one_len()
without i_mutex, which this patch also addresses.

Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Umm-I-can-live-with-that-by: Al Viro <viro@zeniv.linux.org.uk>
---
Still haven't tested the NFSv4 bit -- Bruce?

This time I remembered to remove the no-longer-used 'done:' label from
nfsd_buffered_readdir() too.

diff --git a/fs/namei.c b/fs/namei.c
index b8433eb..78f253c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1248,6 +1248,8 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
 	int err;
 	struct qstr this;
 
+	WARN_ON_ONCE(!mutex_is_locked(&base->d_inode->i_mutex));
+
 	err = __lookup_one_len(name, &this, base, len);
 	if (err)
 		return ERR_PTR(err);
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 3444c00..210709c 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -229,21 +229,23 @@ nfsd4_list_rec_dir(struct dentry *dir, recdir_func *f)
 		goto out;
 	status = vfs_readdir(filp, nfsd4_build_namelist, &names);
 	fput(filp);
+	mutex_lock(&dir->d_inode->i_mutex);
 	while (!list_empty(&names)) {
 		entry = list_entry(names.next, struct name_list, list);
 
 		dentry = lookup_one_len(entry->name, dir, HEXDIR_LEN-1);
 		if (IS_ERR(dentry)) {
 			status = PTR_ERR(dentry);
-			goto out;
+			break;
 		}
 		status = f(dir, dentry);
 		dput(dentry);
 		if (status)
-			goto out;
+			break;
 		list_del(&entry->list);
 		kfree(entry);
 	}
+	mutex_unlock(&dir->d_inode->i_mutex);
 out:
 	while (!list_empty(&names)) {
 		entry = list_entry(names.next, struct name_list, list);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index ab93fcf..0874299 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1885,8 +1885,8 @@ static int nfsd_buffered_filldir(void *__buf, const char *name, int namlen,
 	return 0;
 }
 
-static int nfsd_buffered_readdir(struct file *file, filldir_t func,
-				 struct readdir_cd *cdp, loff_t *offsetp)
+static __be32 nfsd_buffered_readdir(struct file *file, filldir_t func,
+				    struct readdir_cd *cdp, loff_t *offsetp)
 {
 	struct readdir_data buf;
 	struct buffered_dirent *de;
@@ -1896,11 +1896,12 @@ static int nfsd_buffered_readdir(struct file *file, filldir_t func,
 
 	buf.dirent = (void *)__get_free_page(GFP_KERNEL);
 	if (!buf.dirent)
-		return -ENOMEM;
+		return nfserrno(-ENOMEM);
 
 	offset = *offsetp;
 
 	while (1) {
+		struct inode *dir_inode = file->f_path.dentry->d_inode;
 		unsigned int reclen;
 
 		cdp->err = nfserr_eof; /* will be cleared on successful read */
@@ -1919,26 +1920,38 @@ static int nfsd_buffered_readdir(struct file *file, filldir_t func,
 		if (!size)
 			break;
 
+		/*
+		 * Various filldir functions may end up calling back into
+		 * lookup_one_len() and the file system's ->lookup() method.
+		 * These expect i_mutex to be held, as it would within readdir.
+		 */
+		host_err = mutex_lock_killable(&dir_inode->i_mutex);
+		if (host_err)
+			break;
+
 		de = (struct buffered_dirent *)buf.dirent;
 		while (size > 0) {
 			offset = de->offset;
 
 			if (func(cdp, de->name, de->namlen, de->offset,
 				 de->ino, de->d_type))
-				goto done;
+				break;
 
 			if (cdp->err != nfs_ok)
-				goto done;
+				break;
 
 			reclen = ALIGN(sizeof(*de) + de->namlen,
 				       sizeof(u64));
 			size -= reclen;
 			de = (struct buffered_dirent *)((char *)de + reclen);
 		}
+		mutex_unlock(&dir_inode->i_mutex);
+		if (size > 0) /* We bailed out early */
+			break;
+
 		offset = vfs_llseek(file, 0, SEEK_CUR);
 	}
 
- done:
 	free_page((unsigned long)(buf.dirent));
 
 	if (host_err)

-- 
dwmw2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-04-19 12:27                 ` [PATCH v2] " David Woodhouse
@ 2009-04-19 20:51                   ` J. Bruce Fields
  2009-04-20 19:50                     ` J. Bruce Fields
  2009-04-22  4:41                   ` hooanon05
  1 sibling, 1 reply; 39+ messages in thread
From: J. Bruce Fields @ 2009-04-19 20:51 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Al Viro, hooanon05, hch, linux-kernel, linux-fsdevel

On Sun, Apr 19, 2009 at 01:27:49PM +0100, David Woodhouse wrote:
> Commit 14f7dd63 ("Copy XFS readdir hack into nfsd code") introduced a
> bug to generic code which had been extant for a long time in the XFS
> version -- it started to call through into lookup_one_len() and hence
> into the file systems' ->lookup() methods without i_mutex held on the
> directory.
> 
> This patch fixes it by locking the directory's i_mutex again before
> calling the filldir functions. The original deadlocks which commit
> 14f7dd63 was designed to avoid are still avoided, because they were due
> to fs-internal locking, not i_mutex.
> 
> Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
> code") introduced a similar problem there, which this also addresses.
> 
> While we're at it, fix the return type of nfsd_buffered_readdir() which
> should be a __be32 not an int -- it's an NFS errno, not a Linux errno.
> And return nfserrno(-ENOMEM) when allocation fails, not just -ENOMEM.
> Sparse would have caught both of those if it wasn't so busy bitching
> about __cold__.
> 
> Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
> code") introduced a similar problem with calling lookup_one_len()
> without i_mutex, which this patch also addresses.
> 
> Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp>
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> Umm-I-can-live-with-that-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> Still haven't tested the NFSv4 bit -- Bruce?

Thanks, there's an iterator in there that calls a passed-in function,
some examples of which were taking the i_mutex--so some fixing up is
needed.  I'll follow up with a patch once I've got one tested.

--b.

> 
> This time I remembered to remove the no-longer-used 'done:' label from
> nfsd_buffered_readdir() too.
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index b8433eb..78f253c 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1248,6 +1248,8 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
>  	int err;
>  	struct qstr this;
>  
> +	WARN_ON_ONCE(!mutex_is_locked(&base->d_inode->i_mutex));
> +
>  	err = __lookup_one_len(name, &this, base, len);
>  	if (err)
>  		return ERR_PTR(err);
> diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> index 3444c00..210709c 100644
> --- a/fs/nfsd/nfs4recover.c
> +++ b/fs/nfsd/nfs4recover.c
> @@ -229,21 +229,23 @@ nfsd4_list_rec_dir(struct dentry *dir, recdir_func *f)
>  		goto out;
>  	status = vfs_readdir(filp, nfsd4_build_namelist, &names);
>  	fput(filp);
> +	mutex_lock(&dir->d_inode->i_mutex);
>  	while (!list_empty(&names)) {
>  		entry = list_entry(names.next, struct name_list, list);
>  
>  		dentry = lookup_one_len(entry->name, dir, HEXDIR_LEN-1);
>  		if (IS_ERR(dentry)) {
>  			status = PTR_ERR(dentry);
> -			goto out;
> +			break;
>  		}
>  		status = f(dir, dentry);
>  		dput(dentry);
>  		if (status)
> -			goto out;
> +			break;
>  		list_del(&entry->list);
>  		kfree(entry);
>  	}
> +	mutex_unlock(&dir->d_inode->i_mutex);
>  out:
>  	while (!list_empty(&names)) {
>  		entry = list_entry(names.next, struct name_list, list);
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index ab93fcf..0874299 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1885,8 +1885,8 @@ static int nfsd_buffered_filldir(void *__buf, const char *name, int namlen,
>  	return 0;
>  }
>  
> -static int nfsd_buffered_readdir(struct file *file, filldir_t func,
> -				 struct readdir_cd *cdp, loff_t *offsetp)
> +static __be32 nfsd_buffered_readdir(struct file *file, filldir_t func,
> +				    struct readdir_cd *cdp, loff_t *offsetp)
>  {
>  	struct readdir_data buf;
>  	struct buffered_dirent *de;
> @@ -1896,11 +1896,12 @@ static int nfsd_buffered_readdir(struct file *file, filldir_t func,
>  
>  	buf.dirent = (void *)__get_free_page(GFP_KERNEL);
>  	if (!buf.dirent)
> -		return -ENOMEM;
> +		return nfserrno(-ENOMEM);
>  
>  	offset = *offsetp;
>  
>  	while (1) {
> +		struct inode *dir_inode = file->f_path.dentry->d_inode;
>  		unsigned int reclen;
>  
>  		cdp->err = nfserr_eof; /* will be cleared on successful read */
> @@ -1919,26 +1920,38 @@ static int nfsd_buffered_readdir(struct file *file, filldir_t func,
>  		if (!size)
>  			break;
>  
> +		/*
> +		 * Various filldir functions may end up calling back into
> +		 * lookup_one_len() and the file system's ->lookup() method.
> +		 * These expect i_mutex to be held, as it would within readdir.
> +		 */
> +		host_err = mutex_lock_killable(&dir_inode->i_mutex);
> +		if (host_err)
> +			break;
> +
>  		de = (struct buffered_dirent *)buf.dirent;
>  		while (size > 0) {
>  			offset = de->offset;
>  
>  			if (func(cdp, de->name, de->namlen, de->offset,
>  				 de->ino, de->d_type))
> -				goto done;
> +				break;
>  
>  			if (cdp->err != nfs_ok)
> -				goto done;
> +				break;
>  
>  			reclen = ALIGN(sizeof(*de) + de->namlen,
>  				       sizeof(u64));
>  			size -= reclen;
>  			de = (struct buffered_dirent *)((char *)de + reclen);
>  		}
> +		mutex_unlock(&dir_inode->i_mutex);
> +		if (size > 0) /* We bailed out early */
> +			break;
> +
>  		offset = vfs_llseek(file, 0, SEEK_CUR);
>  	}
>  
> - done:
>  	free_page((unsigned long)(buf.dirent));
>  
>  	if (host_err)
> 
> -- 
> dwmw2
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-04-19 20:51                   ` J. Bruce Fields
@ 2009-04-20 19:50                     ` J. Bruce Fields
  2009-04-21  0:29                       ` Al Viro
  2009-05-11 23:16                       ` J. Bruce Fields
  0 siblings, 2 replies; 39+ messages in thread
From: J. Bruce Fields @ 2009-04-20 19:50 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Al Viro, hooanon05, hch, linux-kernel, linux-fsdevel

On Sun, Apr 19, 2009 at 04:51:54PM -0400, bfields wrote:
> On Sun, Apr 19, 2009 at 01:27:49PM +0100, David Woodhouse wrote:
> > Commit 14f7dd63 ("Copy XFS readdir hack into nfsd code") introduced a
> > bug to generic code which had been extant for a long time in the XFS
> > version -- it started to call through into lookup_one_len() and hence
> > into the file systems' ->lookup() methods without i_mutex held on the
> > directory.
> > 
> > This patch fixes it by locking the directory's i_mutex again before
> > calling the filldir functions. The original deadlocks which commit
> > 14f7dd63 was designed to avoid are still avoided, because they were due
> > to fs-internal locking, not i_mutex.
> > 
> > Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
> > code") introduced a similar problem there, which this also addresses.
> > 
> > While we're at it, fix the return type of nfsd_buffered_readdir() which
> > should be a __be32 not an int -- it's an NFS errno, not a Linux errno.
> > And return nfserrno(-ENOMEM) when allocation fails, not just -ENOMEM.
> > Sparse would have caught both of those if it wasn't so busy bitching
> > about __cold__.
> > 
> > Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
> > code") introduced a similar problem with calling lookup_one_len()
> > without i_mutex, which this patch also addresses.
> > 
> > Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp>
> > Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> > Umm-I-can-live-with-that-by: Al Viro <viro@zeniv.linux.org.uk>
> > ---
> > Still haven't tested the NFSv4 bit -- Bruce?
> 
> Thanks, there's an iterator in there that calls a passed-in function,
> some examples of which were taking the i_mutex--so some fixing up is
> needed.  I'll follow up with a patch once I've got one tested.

Sorry for the delay.  Simpler might be just to drop and reacquire the
mutex each time through nfsd4_list_rec_dir()'s loop, but I'd just as
soon rework the called functions to expect the mutex be held (and get
rid of the unused, probably fragile, clear_clid_dir() in the process).

So the following could be folded in to your patch.

I tested the combined patch over 2.6.30-rc2.  I also tested 2.6.29 +
05f4f678 + the combined patch.  Both look  OK.  Feel free to add a
tested-by or acked-by for "J. Bruce Fields" <bfields@citi.umich.edu> as
appropriate.  Or happy to add a s-o-b and shepherd it along myself if
it's easier....

--b.

diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 210709c..5275097 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -257,36 +257,6 @@ out:
 }
 
 static int
-nfsd4_remove_clid_file(struct dentry *dir, struct dentry *dentry)
-{
-	int status;
-
-	if (!S_ISREG(dir->d_inode->i_mode)) {
-		printk("nfsd4: non-file found in client recovery directory\n");
-		return -EINVAL;
-	}
-	mutex_lock_nested(&dir->d_inode->i_mutex, I_MUTEX_PARENT);
-	status = vfs_unlink(dir->d_inode, dentry);
-	mutex_unlock(&dir->d_inode->i_mutex);
-	return status;
-}
-
-static int
-nfsd4_clear_clid_dir(struct dentry *dir, struct dentry *dentry)
-{
-	int status;
-
-	/* For now this directory should already be empty, but we empty it of
-	 * any regular files anyway, just in case the directory was created by
-	 * a kernel from the future.... */
-	nfsd4_list_rec_dir(dentry, nfsd4_remove_clid_file);
-	mutex_lock_nested(&dir->d_inode->i_mutex, I_MUTEX_PARENT);
-	status = vfs_rmdir(dir->d_inode, dentry);
-	mutex_unlock(&dir->d_inode->i_mutex);
-	return status;
-}
-
-static int
 nfsd4_unlink_clid_dir(char *name, int namlen)
 {
 	struct dentry *dentry;
@@ -296,18 +266,18 @@ nfsd4_unlink_clid_dir(char *name, int namlen)
 
 	mutex_lock(&rec_dir.dentry->d_inode->i_mutex);
 	dentry = lookup_one_len(name, rec_dir.dentry, namlen);
-	mutex_unlock(&rec_dir.dentry->d_inode->i_mutex);
 	if (IS_ERR(dentry)) {
 		status = PTR_ERR(dentry);
-		return status;
+		goto out_unlock;
 	}
 	status = -ENOENT;
 	if (!dentry->d_inode)
 		goto out;
-
-	status = nfsd4_clear_clid_dir(rec_dir.dentry, dentry);
+	status = vfs_rmdir(rec_dir.dentry->d_inode, dentry);
 out:
 	dput(dentry);
+out_unlock:
+	mutex_unlock(&rec_dir.dentry->d_inode->i_mutex);
 	return status;
 }
 
@@ -350,7 +320,7 @@ purge_old(struct dentry *parent, struct dentry *child)
 	if (nfs4_has_reclaimed_state(child->d_name.name, false))
 		return 0;
 
-	status = nfsd4_clear_clid_dir(parent, child);
+	status = vfs_rmdir(parent->d_inode, child);
 	if (status)
 		printk("failed to remove client recovery directory %s\n",
 				child->d_name.name);

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-04-20 19:50                     ` J. Bruce Fields
@ 2009-04-21  0:29                       ` Al Viro
  2009-04-21 21:15                         ` J. Bruce Fields
  2009-05-11 23:16                       ` J. Bruce Fields
  1 sibling, 1 reply; 39+ messages in thread
From: Al Viro @ 2009-04-21  0:29 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: David Woodhouse, hooanon05, hch, linux-kernel, linux-fsdevel

> Sorry for the delay.  Simpler might be just to drop and reacquire the
> mutex each time through nfsd4_list_rec_dir()'s loop, but I'd just as
> soon rework the called functions to expect the mutex be held (and get
> rid of the unused, probably fragile, clear_clid_dir() in the process).
> 
> So the following could be folded in to your patch.
> 
> I tested the combined patch over 2.6.30-rc2.  I also tested 2.6.29 +
> 05f4f678 + the combined patch.  Both look  OK.  Feel free to add a
> tested-by or acked-by for "J. Bruce Fields" <bfields@citi.umich.edu> as
> appropriate.  Or happy to add a s-o-b and shepherd it along myself if
> it's easier....

I can take both, but if you prefer to have that one go through nfs tree -
fine by me.

I'm going to push a queue into for-next in a couple of hours; running build
tests right now.  Patch from dwmw2 is in there...

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-04-21  0:29                       ` Al Viro
@ 2009-04-21 21:15                         ` J. Bruce Fields
  2009-04-21 21:54                           ` Al Viro
  0 siblings, 1 reply; 39+ messages in thread
From: J. Bruce Fields @ 2009-04-21 21:15 UTC (permalink / raw)
  To: Al Viro; +Cc: David Woodhouse, hooanon05, hch, linux-kernel, linux-fsdevel

On Tue, Apr 21, 2009 at 01:29:34AM +0100, Al Viro wrote:
> > Sorry for the delay.  Simpler might be just to drop and reacquire the
> > mutex each time through nfsd4_list_rec_dir()'s loop, but I'd just as
> > soon rework the called functions to expect the mutex be held (and get
> > rid of the unused, probably fragile, clear_clid_dir() in the process).
> > 
> > So the following could be folded in to your patch.
> > 
> > I tested the combined patch over 2.6.30-rc2.  I also tested 2.6.29 +
> > 05f4f678 + the combined patch.  Both look  OK.  Feel free to add a
> > tested-by or acked-by for "J. Bruce Fields" <bfields@citi.umich.edu> as
> > appropriate.  Or happy to add a s-o-b and shepherd it along myself if
> > it's easier....
> 
> I can take both, but if you prefer to have that one go through nfs tree -
> fine by me.
> 
> I'm going to push a queue into for-next in a couple of hours; running build
> tests right now.  Patch from dwmw2 is in there...

However it makes it there is fine by me; I'll wait for one of you to
take care of it.

--b.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-04-21 21:15                         ` J. Bruce Fields
@ 2009-04-21 21:54                           ` Al Viro
  0 siblings, 0 replies; 39+ messages in thread
From: Al Viro @ 2009-04-21 21:54 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: David Woodhouse, hooanon05, hch, linux-kernel, linux-fsdevel

On Tue, Apr 21, 2009 at 05:15:59PM -0400, J. Bruce Fields wrote:

> However it makes it there is fine by me; I'll wait for one of you to
> take care of it.

Folded variant is in the mainline.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-04-19 12:27                 ` [PATCH v2] " David Woodhouse
  2009-04-19 20:51                   ` J. Bruce Fields
@ 2009-04-22  4:41                   ` hooanon05
  2009-04-22 19:12                     ` J. Bruce Fields
  1 sibling, 1 reply; 39+ messages in thread
From: hooanon05 @ 2009-04-22  4:41 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Al Viro, bfields, hch, linux-kernel, linux-fsdevel


David Woodhouse:
> This patch fixes it by locking the directory's i_mutex again before
> calling the filldir functions. The original deadlocks which commit
	:::

An entry may be removed between the first mutex_unlock and the second
mutex_lock. In this case, lookup_one_len() in compose_entry_fh() will
return a negative dentry.
Currently the inode test (positive/negative) is done AFTER fh_compose().
Isn't it better to test it BEFORE fh_compose()?

compose_entry_fh()
{
	:::
		dchild = lookup_one_len(name, dparent, namlen);
	if (IS_ERR(dchild))
		return 1;
	if (d_mountpoint(dchild) ||
	    fh_compose(fhp, exp, dchild, &cd->fh) != 0 ||
	    !dchild->d_inode)
		rv = 1;
	:::
}


J. R. Okajima

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-04-22  4:41                   ` hooanon05
@ 2009-04-22 19:12                     ` J. Bruce Fields
  2009-04-23  6:40                       ` hooanon05
  0 siblings, 1 reply; 39+ messages in thread
From: J. Bruce Fields @ 2009-04-22 19:12 UTC (permalink / raw)
  To: hooanon05; +Cc: David Woodhouse, Al Viro, hch, linux-kernel, linux-fsdevel

On Wed, Apr 22, 2009 at 01:41:46PM +0900, hooanon05@yahoo.co.jp wrote:
> 
> David Woodhouse:
> > This patch fixes it by locking the directory's i_mutex again before
> > calling the filldir functions. The original deadlocks which commit
> 	:::
> 
> An entry may be removed between the first mutex_unlock and the second
> mutex_lock. In this case, lookup_one_len() in compose_entry_fh() will
> return a negative dentry.
> Currently the inode test (positive/negative) is done AFTER fh_compose().
> Isn't it better to test it BEFORE fh_compose()?
> 
> compose_entry_fh()
> {
> 	:::
> 		dchild = lookup_one_len(name, dparent, namlen);
> 	if (IS_ERR(dchild))
> 		return 1;
> 	if (d_mountpoint(dchild) ||
> 	    fh_compose(fhp, exp, dchild, &cd->fh) != 0 ||
> 	    !dchild->d_inode)
> 		rv = 1;
> 	:::
> }

Yes, I think you're right.

Arrrgh.

--b.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-04-22 19:12                     ` J. Bruce Fields
@ 2009-04-23  6:40                       ` hooanon05
  2009-04-23 20:27                         ` J. Bruce Fields
  0 siblings, 1 reply; 39+ messages in thread
From: hooanon05 @ 2009-04-23  6:40 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: David Woodhouse, Al Viro, hch, linux-kernel, linux-fsdevel


"J. Bruce Fields":
> > Isn't it better to test it BEFORE fh_compose()?
	:::
> Yes, I think you're right.

Then here you are.

J. R. Okajima

----------------------------------------------------------------------

commit c98c6c4a207d602bd9498ea5f1d2993a00e98445
Author: J. R. Okajima <hooanon05@yahoo.co.jp>
Date:   Thu Apr 23 15:38:43 2009 +0900

    NFSD: test d_inode before fh_compose()
    
    After 2f9092e1020246168b1309b35e085ecd7ff9ff72 "Fix i_mutex vs. readdir
    handling in nfsd" (and 14f7dd63 "Copy XFS readdir hack into nfsd code"),
    an entry may be removed between the first mutex_unlock and the second
    mutex_lock. In this case, lookup_one_len() in compose_entry_fh() will
    return a negative dentry.
    It is better to test inode (positive/negative) BEFORE fh_compose().
    
    Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>

diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 17d0dd9..1b5543b 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -851,8 +851,8 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
 	if (IS_ERR(dchild))
 		return 1;
 	if (d_mountpoint(dchild) ||
-	    fh_compose(fhp, exp, dchild, &cd->fh) != 0 ||
-	    !dchild->d_inode)
+	    !dchild->d_inode ||
+	    fh_compose(fhp, exp, dchild, &cd->fh) != 0)
 		rv = 1;
 	dput(dchild);
 	return rv;

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-04-23  6:40                       ` hooanon05
@ 2009-04-23 20:27                         ` J. Bruce Fields
  2009-05-05 23:35                           ` J. Bruce Fields
  0 siblings, 1 reply; 39+ messages in thread
From: J. Bruce Fields @ 2009-04-23 20:27 UTC (permalink / raw)
  To: hooanon05; +Cc: David Woodhouse, Al Viro, hch, linux-kernel, linux-fsdevel

On Thu, Apr 23, 2009 at 03:40:23PM +0900, hooanon05@yahoo.co.jp wrote:
> 
> "J. Bruce Fields":
> > > Isn't it better to test it BEFORE fh_compose()?
> 	:::
> > Yes, I think you're right.
> 
> Then here you are.

The nfsv4 readdir callback needs a similar fix.

Also, it looks to me like this results in us encoding an entry for this
deleted file in the readdir reply, but with an empty filehandle.  From a
quick glance at the rfc it's not clear to me whether this is really
legal.  I suspect it may cause odd behavior on clients.  At the least it
would seem cleaner to check for this condition early enough that we can
just skip the entry entirely.

--b.

> 
> J. R. Okajima
> 
> ----------------------------------------------------------------------
> 
> commit c98c6c4a207d602bd9498ea5f1d2993a00e98445
> Author: J. R. Okajima <hooanon05@yahoo.co.jp>
> Date:   Thu Apr 23 15:38:43 2009 +0900
> 
>     NFSD: test d_inode before fh_compose()
>     
>     After 2f9092e1020246168b1309b35e085ecd7ff9ff72 "Fix i_mutex vs. readdir
>     handling in nfsd" (and 14f7dd63 "Copy XFS readdir hack into nfsd code"),
>     an entry may be removed between the first mutex_unlock and the second
>     mutex_lock. In this case, lookup_one_len() in compose_entry_fh() will
>     return a negative dentry.
>     It is better to test inode (positive/negative) BEFORE fh_compose().
>     
>     Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
> 
> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> index 17d0dd9..1b5543b 100644
> --- a/fs/nfsd/nfs3xdr.c
> +++ b/fs/nfsd/nfs3xdr.c
> @@ -851,8 +851,8 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
>  	if (IS_ERR(dchild))
>  		return 1;
>  	if (d_mountpoint(dchild) ||
> -	    fh_compose(fhp, exp, dchild, &cd->fh) != 0 ||
> -	    !dchild->d_inode)
> +	    !dchild->d_inode ||
> +	    fh_compose(fhp, exp, dchild, &cd->fh) != 0)
>  		rv = 1;
>  	dput(dchild);
>  	return rv;

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-04-23 20:27                         ` J. Bruce Fields
@ 2009-05-05 23:35                           ` J. Bruce Fields
  2009-05-06  5:09                             ` hooanon05
  0 siblings, 1 reply; 39+ messages in thread
From: J. Bruce Fields @ 2009-05-05 23:35 UTC (permalink / raw)
  To: hooanon05; +Cc: David Woodhouse, Al Viro, hch, linux-kernel, linux-fsdevel

On Thu, Apr 23, 2009 at 04:27:05PM -0400, bfields wrote:
> On Thu, Apr 23, 2009 at 03:40:23PM +0900, hooanon05@yahoo.co.jp wrote:
> > 
> > "J. Bruce Fields":
> > > > Isn't it better to test it BEFORE fh_compose()?
> > 	:::
> > > Yes, I think you're right.
> > 
> > Then here you are.
> 
> The nfsv4 readdir callback needs a similar fix.
> 
> Also, it looks to me like this results in us encoding an entry for this
> deleted file in the readdir reply, but with an empty filehandle.  From a
> quick glance at the rfc it's not clear to me whether this is really
> legal.  I suspect it may cause odd behavior on clients.  At the least it
> would seem cleaner to check for this condition early enough that we can
> just skip the entry entirely.

Err, no, I was confused, the v3 spec does clearly state that the
filehandle field here is just an optional optimization.

But now that I look fh_compose() seems perfectly capable of dealing with
negative dentries, so I don't think your patch is necessary after all.

In the v4 case it looks like we actually have to return an error (if
only in the special rdattr_error attribute defined for this case), and
it seems to me that a client might legimately be unhappy with a server
returning a readdir entry with the rdattr_error set to -ENOENT.  So I
think we need something like the following for v4.--b.

commit 97a452f918403493309357146af3d8d8360be970
Author: J. Bruce Fields <bfields@citi.umich.edu>
Date:   Tue May 5 19:04:29 2009 -0400

    nfsd4: check for negative dentry before use in nfsv4 readdir
    
    After 2f9092e1020246168b1309b35e085ecd7ff9ff72 "Fix i_mutex vs.  readdir
    handling in nfsd" (and 14f7dd63 "Copy XFS readdir hack into nfsd code"),
    an entry may be removed between the first mutex_unlock and the second
    mutex_lock. In this case, lookup_one_len() will return a negative
    dentry.  Check for this case to avoid a NULL dereference.
    
    Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    Cc: stable@kernel.org

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index b820c31..fe91427 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2204,14 +2204,12 @@ static inline int attributes_need_mount(u32 *bmval)
 
 static __be32
 nfsd4_encode_dirent_fattr(struct nfsd4_readdir *cd,
-		const char *name, int namlen, __be32 *p, int *buflen)
+		struct dentry *dentry, __be32 *p, int *buflen)
 {
 	struct svc_export *exp = cd->rd_fhp->fh_export;
-	struct dentry *dentry;
 	__be32 nfserr;
 	int ignore_crossmnt = 0;
 
-	dentry = lookup_one_len(name, cd->rd_fhp->fh_dentry, namlen);
 	if (IS_ERR(dentry))
 		return nfserrno(PTR_ERR(dentry));
 
@@ -2274,6 +2272,7 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
 {
 	struct readdir_cd *ccd = ccdv;
 	struct nfsd4_readdir *cd = container_of(ccd, struct nfsd4_readdir, common);
+	struct dentry *dentry;
 	int buflen;
 	__be32 *p = cd->buffer;
 	__be32 nfserr = nfserr_toosmall;
@@ -2283,20 +2282,32 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
 		cd->common.err = nfs_ok;
 		return 0;
 	}
+	dentry = lookup_one_len(name, cd->rd_fhp->fh_dentry, namlen);
+	/* Whoops, it vanished after we dropped the i_sem: */
+	if (!IS_ERR(dentry) && !dentry->d_inode)
+		return 0;
+	/*
+	 * (Note that nfsd4_encode_dirent_fattr() checks for other
+	 * lookup errors below, allowing us to return them in the
+	 * RDATTR_ERROR attribute instead of failing the whole readdir.
+	 */
 
 	if (cd->offset)
 		xdr_encode_hyper(cd->offset, (u64) offset);
 
 	buflen = cd->buflen - 4 - XDR_QUADLEN(namlen);
-	if (buflen < 0)
+	if (buflen < 0) {
+		dput(dentry);
 		goto fail;
+	}
 
 	*p++ = xdr_one;                             /* mark entry present */
 	cd->offset = p;                             /* remember pointer */
 	p = xdr_encode_hyper(p, NFS_OFFSET_MAX);    /* offset of next entry */
 	p = xdr_encode_array(p, name, namlen);      /* name length & name */
 
-	nfserr = nfsd4_encode_dirent_fattr(cd, name, namlen, p, &buflen);
+	/* This consumes a reference to dentry: */
+	nfserr = nfsd4_encode_dirent_fattr(cd, dentry, p, &buflen);
 	switch (nfserr) {
 	case nfs_ok:
 		p += buflen;

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-05-05 23:35                           ` J. Bruce Fields
@ 2009-05-06  5:09                             ` hooanon05
  2009-05-06 20:20                               ` J. Bruce Fields
  0 siblings, 1 reply; 39+ messages in thread
From: hooanon05 @ 2009-05-06  5:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: David Woodhouse, Al Viro, hch, linux-kernel, linux-fsdevel


"J. Bruce Fields":
> On Thu, Apr 23, 2009 at 04:27:05PM -0400, bfields wrote:
> > On Thu, Apr 23, 2009 at 03:40:23PM +0900, hooanon05@yahoo.co.jp wrote:
> > > 
> > > "J. Bruce Fields":
> > > > > Isn't it better to test it BEFORE fh_compose()?
> > > 	:::
> > > > Yes, I think you're right.
	:::
> Err, no, I was confused, the v3 spec does clearly state that the
> filehandle field here is just an optional optimization.
> 
> But now that I look fh_compose() seems perfectly capable of dealing with
> negative dentries, so I don't think your patch is necessary after all.

I agree with you.
I just thought it is _better_ to test it BEFORE fh_compose(). I don't
think fh_compose() would crash.


If you move lookup_one_len() from nfsd4_encode_dirent_fattr() to
nfsd4_encode_dirent(), then I'd suggest you to move dput() too.
Applying your patch,
- when we get a negative dentry, nfsd4_encode_dirent() will return
  without dput(). Is it OK?
- when lookup_one_len() returns an error, nfsd4_encode_dirent() may
  crash later.


J. R. Okajima

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-05-06  5:09                             ` hooanon05
@ 2009-05-06 20:20                               ` J. Bruce Fields
  2009-05-07  4:38                                 ` hooanon05
  0 siblings, 1 reply; 39+ messages in thread
From: J. Bruce Fields @ 2009-05-06 20:20 UTC (permalink / raw)
  To: hooanon05; +Cc: David Woodhouse, Al Viro, hch, linux-kernel, linux-fsdevel

On Wed, May 06, 2009 at 02:09:05PM +0900, hooanon05@yahoo.co.jp wrote:
> 
> "J. Bruce Fields":
> > On Thu, Apr 23, 2009 at 04:27:05PM -0400, bfields wrote:
> > > On Thu, Apr 23, 2009 at 03:40:23PM +0900, hooanon05@yahoo.co.jp wrote:
> > > > 
> > > > "J. Bruce Fields":
> > > > > > Isn't it better to test it BEFORE fh_compose()?
> > > > 	:::
> > > > > Yes, I think you're right.
> 	:::
> > Err, no, I was confused, the v3 spec does clearly state that the
> > filehandle field here is just an optional optimization.
> > 
> > But now that I look fh_compose() seems perfectly capable of dealing with
> > negative dentries, so I don't think your patch is necessary after all.
> 
> I agree with you.
> I just thought it is _better_ to test it BEFORE fh_compose(). I don't
> think fh_compose() would crash.

OK, got it.  The nfsd code routinely calls fh_compose() before handling
negative dentries, so I think it can be left as is.

> If you move lookup_one_len() from nfsd4_encode_dirent_fattr() to
> nfsd4_encode_dirent(), then I'd suggest you to move dput() too.

Yes, that'd be cleaner, but there's an nfsd_cross_mount() in
nfsd_cross_mnt that overwrites its local dentry variable, and that makes
this a little ugly.

> Applying your patch,
> - when we get a negative dentry, nfsd4_encode_dirent() will return
>   without dput(). Is it OK?

Whoops, no, you're right.

> - when lookup_one_len() returns an error, nfsd4_encode_dirent() may
>   crash later.

Yes, it can crash in the "goto fail" case immediately below, because we
pass the error pointer to dput().  Oops.

I wonder if it'd be simpler just to keep the loookup_one_len() in
nfsd4_encode_dirent() and just be prepared to roll back the encoding
we've already done if we find out at that point the dentry has gone
negative.  Actually, that doesn't look hard.

Thanks for looking at this!

--b.

commit b2c0cea6b1cb210e962f07047df602875564069e
Author: J. Bruce Fields <bfields@citi.umich.edu>
Date:   Tue May 5 19:04:29 2009 -0400

    nfsd4: check for negative dentry before use in nfsv4 readdir
    
    After 2f9092e1020246168b1309b35e085ecd7ff9ff72 "Fix i_mutex vs.  readdir
    handling in nfsd" (and 14f7dd63 "Copy XFS readdir hack into nfsd code"),
    an entry may be removed between the first mutex_unlock and the second
    mutex_lock. In this case, lookup_one_len() will return a negative
    dentry.  Check for this case to avoid a NULL dereference.
    
    Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    Reviewed-by: J. R. Okajima <hooanon05@yahoo.co.jp>
    Cc: stable@kernel.org

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index b820c31..b73549d 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2214,6 +2214,15 @@ nfsd4_encode_dirent_fattr(struct nfsd4_readdir *cd,
 	dentry = lookup_one_len(name, cd->rd_fhp->fh_dentry, namlen);
 	if (IS_ERR(dentry))
 		return nfserrno(PTR_ERR(dentry));
+	if (!dentry->d_inode) {
+		/*
+		 * nfsd_buffered_readdir drops the i_mutex between
+		 * readdir and calling this callback, leaving a window
+		 * where this directory entry could have gone away.
+		 */
+		dput(dentry);
+		return nfserr_noent;
+	}
 
 	exp_get(exp);
 	/*
@@ -2276,6 +2285,7 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
 	struct nfsd4_readdir *cd = container_of(ccd, struct nfsd4_readdir, common);
 	int buflen;
 	__be32 *p = cd->buffer;
+	__be32 *cookiep;
 	__be32 nfserr = nfserr_toosmall;
 
 	/* In nfsv4, "." and ".." never make it onto the wire.. */
@@ -2292,7 +2302,7 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
 		goto fail;
 
 	*p++ = xdr_one;                             /* mark entry present */
-	cd->offset = p;                             /* remember pointer */
+	cookiep = p;
 	p = xdr_encode_hyper(p, NFS_OFFSET_MAX);    /* offset of next entry */
 	p = xdr_encode_array(p, name, namlen);      /* name length & name */
 
@@ -2306,6 +2316,8 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
 		goto fail;
 	case nfserr_dropit:
 		goto fail;
+	case nfserr_noent:
+		goto skip_entry;
 	default:
 		/*
 		 * If the client requested the RDATTR_ERROR attribute,
@@ -2324,6 +2336,8 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
 	}
 	cd->buflen -= (p - cd->buffer);
 	cd->buffer = p;
+	cd->offset = cookiep;
+skip_entry:
 	cd->common.err = nfs_ok;
 	return 0;
 fail:

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-05-06 20:20                               ` J. Bruce Fields
@ 2009-05-07  4:38                                 ` hooanon05
  2009-05-08 18:47                                   ` J. Bruce Fields
  0 siblings, 1 reply; 39+ messages in thread
From: hooanon05 @ 2009-05-07  4:38 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: David Woodhouse, Al Viro, hch, linux-kernel, linux-fsdevel


"J. Bruce Fields":
> I wonder if it'd be simpler just to keep the loookup_one_len() in
> nfsd4_encode_dirent() and just be prepared to roll back the encoding
> we've already done if we find out at that point the dentry has gone
> negative.  Actually, that doesn't look hard.
> 
> Thanks for looking at this!

The patch looks much better than before, and more than I expected. :-)


J. R. Okajima

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-05-07  4:38                                 ` hooanon05
@ 2009-05-08 18:47                                   ` J. Bruce Fields
  0 siblings, 0 replies; 39+ messages in thread
From: J. Bruce Fields @ 2009-05-08 18:47 UTC (permalink / raw)
  To: hooanon05; +Cc: David Woodhouse, Al Viro, hch, linux-kernel, linux-fsdevel

On Thu, May 07, 2009 at 01:38:27PM +0900, hooanon05@yahoo.co.jp wrote:
> 
> "J. Bruce Fields":
> > I wonder if it'd be simpler just to keep the loookup_one_len() in
> > nfsd4_encode_dirent() and just be prepared to roll back the encoding
> > we've already done if we find out at that point the dentry has gone
> > negative.  Actually, that doesn't look hard.
> > 
> > Thanks for looking at this!
> 
> The patch looks much better than before, and more than I expected. :-)

OK, thanks, I'll submit it to 2.6.30 and .29.

--b.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] Fix i_mutex handling in nfsd readdir
  2009-04-20 19:50                     ` J. Bruce Fields
  2009-04-21  0:29                       ` Al Viro
@ 2009-05-11 23:16                       ` J. Bruce Fields
  1 sibling, 0 replies; 39+ messages in thread
From: J. Bruce Fields @ 2009-05-11 23:16 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Al Viro, hooanon05, hch, linux-kernel, linux-fsdevel

On Mon, Apr 20, 2009 at 03:50:08PM -0400, bfields wrote:
> On Sun, Apr 19, 2009 at 04:51:54PM -0400, bfields wrote:
> > On Sun, Apr 19, 2009 at 01:27:49PM +0100, David Woodhouse wrote:
> > > Commit 14f7dd63 ("Copy XFS readdir hack into nfsd code") introduced a
> > > bug to generic code which had been extant for a long time in the XFS
> > > version -- it started to call through into lookup_one_len() and hence
> > > into the file systems' ->lookup() methods without i_mutex held on the
> > > directory.
> > > 
> > > This patch fixes it by locking the directory's i_mutex again before
> > > calling the filldir functions. The original deadlocks which commit
> > > 14f7dd63 was designed to avoid are still avoided, because they were due
> > > to fs-internal locking, not i_mutex.
> > > 
> > > Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
> > > code") introduced a similar problem there, which this also addresses.
> > > 
> > > While we're at it, fix the return type of nfsd_buffered_readdir() which
> > > should be a __be32 not an int -- it's an NFS errno, not a Linux errno.
> > > And return nfserrno(-ENOMEM) when allocation fails, not just -ENOMEM.
> > > Sparse would have caught both of those if it wasn't so busy bitching
> > > about __cold__.
> > > 
> > > Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
> > > code") introduced a similar problem with calling lookup_one_len()
> > > without i_mutex, which this patch also addresses.
> > > 
> > > Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp>
> > > Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> > > Umm-I-can-live-with-that-by: Al Viro <viro@zeniv.linux.org.uk>
> > > ---
> > > Still haven't tested the NFSv4 bit -- Bruce?
> > 
> > Thanks, there's an iterator in there that calls a passed-in function,
> > some examples of which were taking the i_mutex--so some fixing up is
> > needed.  I'll follow up with a patch once I've got one tested.
> 
> Sorry for the delay.  Simpler might be just to drop and reacquire the
> mutex each time through nfsd4_list_rec_dir()'s loop, but I'd just as
> soon rework the called functions to expect the mutex be held (and get
> rid of the unused, probably fragile, clear_clid_dir() in the process).
> 
> So the following could be folded in to your patch.
> 
> I tested the combined patch over 2.6.30-rc2.  I also tested 2.6.29 +
> 05f4f678 + the combined patch.  Both look  OK.  Feel free to add a
> tested-by or acked-by for "J. Bruce Fields" <bfields@citi.umich.edu> as
> appropriate.  Or happy to add a s-o-b and shepherd it along myself if
> it's easier....

Unfortunately, I wasn't watching my logs carefully enough, and missed a
lockdep warning.

(Stupid policy question: is this for stable, current, or next (.29, .30,
or .31?)  On the one hand, it's just a warning.  On the other hand,
people freak out when they see backtraces in their logs.  But I don't
know how common it is to have lockdep on.)

--b.

commit 8daed1e549b55827758b3af7b8132a73fc51526f
Author: J. Bruce Fields <bfields@citi.umich.edu>
Date:   Mon May 11 16:10:19 2009 -0400

    nfsd: silence lockdep warning
    
    Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 5275097..b534840 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -229,7 +229,7 @@ nfsd4_list_rec_dir(struct dentry *dir, recdir_func *f)
 		goto out;
 	status = vfs_readdir(filp, nfsd4_build_namelist, &names);
 	fput(filp);
-	mutex_lock(&dir->d_inode->i_mutex);
+	mutex_lock_nested(&dir->d_inode->i_mutex, I_MUTEX_PARENT);
 	while (!list_empty(&names)) {
 		entry = list_entry(names.next, struct name_list, list);
 
@@ -264,7 +264,7 @@ nfsd4_unlink_clid_dir(char *name, int namlen)
 
 	dprintk("NFSD: nfsd4_unlink_clid_dir. name %.*s\n", namlen, name);
 
-	mutex_lock(&rec_dir.dentry->d_inode->i_mutex);
+	mutex_lock_nested(&rec_dir.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
 	dentry = lookup_one_len(name, rec_dir.dentry, namlen);
 	if (IS_ERR(dentry)) {
 		status = PTR_ERR(dentry);

^ permalink raw reply related	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2009-05-11 23:16 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-19 14:54 Q: NFSD readdir in linux-2.6.28 hooanon05
2009-03-19 15:17 ` David Woodhouse
2009-03-19 15:34   ` hooanon05
2009-03-19 15:51     ` David Woodhouse
2009-04-17  9:32     ` David Woodhouse
2009-04-17 19:32       ` Al Viro
2009-04-17 22:17         ` David Woodhouse
2009-04-17 22:43           ` Al Viro
2009-04-17 22:51             ` David Woodhouse
2009-04-17 22:53             ` Al Viro
2009-04-17 22:55               ` David Woodhouse
2009-04-17 23:23               ` David Woodhouse
2009-04-17 23:37                 ` Al Viro
2009-04-17 23:39                   ` Al Viro
2009-04-18  0:15               ` [PATCH] Fix i_mutex handling in nfsd readdir David Woodhouse
2009-04-18  3:11                 ` hooanon05
2009-04-18 14:25                   ` Al Viro
2009-04-19  7:18                   ` David Woodhouse
2009-04-19 12:27                 ` [PATCH v2] " David Woodhouse
2009-04-19 20:51                   ` J. Bruce Fields
2009-04-20 19:50                     ` J. Bruce Fields
2009-04-21  0:29                       ` Al Viro
2009-04-21 21:15                         ` J. Bruce Fields
2009-04-21 21:54                           ` Al Viro
2009-05-11 23:16                       ` J. Bruce Fields
2009-04-22  4:41                   ` hooanon05
2009-04-22 19:12                     ` J. Bruce Fields
2009-04-23  6:40                       ` hooanon05
2009-04-23 20:27                         ` J. Bruce Fields
2009-05-05 23:35                           ` J. Bruce Fields
2009-05-06  5:09                             ` hooanon05
2009-05-06 20:20                               ` J. Bruce Fields
2009-05-07  4:38                                 ` hooanon05
2009-05-08 18:47                                   ` J. Bruce Fields
2009-03-19 15:37   ` Q: NFSD readdir in linux-2.6.28 Matthew Wilcox
2009-03-19 15:45     ` hooanon05
2009-03-19 16:01     ` David Woodhouse
2009-04-16 21:45   ` J. Bruce Fields
2009-04-17  4:13     ` hooanon05

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.