linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal
@ 2012-09-17 20:23 Miklos Szeredi
  2012-09-17 20:31 ` [PATCH 2/2] vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill() Miklos Szeredi
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Miklos Szeredi @ 2012-09-17 20:23 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, linux-kernel, hch, torvalds, Trond.Myklebust

From: Miklos Szeredi <mszeredi@suse.cz>

IBM reported a deadlock in select_parent().  This was found to be caused by
taking rename_lock when already locked when restarting the tree traversal.

There are two cases when the traversal needs to be restarted:

 1) concurrent d_move(); this can only happen when not already locked,
 since taking rename_lock protects against concurrent d_move().

 2) racing with final d_put() on child just at the moment of ascending
 to parent; rename_lock doesn't protect against this rare race, so it
 can happen when already locked.

Because of case 2. we need to be able to handle restarting the traversal
when rename_lock is already held.  This patch fixes all three callers of
try_to_ascend().

IBM reported that the deadlock is gone with this patch.  However, there's still
a soft lockup which is addressed by the next patch.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org
---
 fs/dcache.c |   19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2012-09-17 19:56:19.000000000 +0200
+++ linux-2.6/fs/dcache.c	2012-09-17 20:04:07.000000000 +0200
@@ -1134,8 +1134,10 @@ int have_submounts(struct dentry *parent
 	return 1;
 
 rename_retry:
-	locked = 1;
-	write_seqlock(&rename_lock);
+	if (!locked) {
+		locked = 1;
+		write_seqlock(&rename_lock);
+	}
 	goto again;
 }
 EXPORT_SYMBOL(have_submounts);
@@ -1236,8 +1238,10 @@ static int select_parent(struct dentry *
 rename_retry:
 	if (found)
 		return found;
-	locked = 1;
-	write_seqlock(&rename_lock);
+	if (!locked) {
+		locked = 1;
+		write_seqlock(&rename_lock);
+	}
 	goto again;
 }
 
@@ -3035,8 +3039,11 @@ void d_genocide(struct dentry *root)
 	return;
 
 rename_retry:
-	locked = 1;
-	write_seqlock(&rename_lock);
+	if (!locked) {
+		locked = 1;
+		write_seqlock(&rename_lock);
+
+	}
 	goto again;
 }
 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/2] vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill()
  2012-09-17 20:23 [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal Miklos Szeredi
@ 2012-09-17 20:31 ` Miklos Szeredi
  2012-09-17 20:39 ` [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal Al Viro
  2012-09-18 20:35 ` [PATCH] trivial select_parent documentation fix J. Bruce Fields
  2 siblings, 0 replies; 9+ messages in thread
From: Miklos Szeredi @ 2012-09-17 20:31 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, linux-kernel, hch, torvalds, Trond.Myklebust

From: Miklos Szeredi <mszeredi@suse.cz>

IBM reported a soft lockup after applying the fix for the rename_lock deadlock.
Commit c83ce989 ("VFS: Fix the nfs sillyrename regression in kernel 2.6.38") was
found to be the culprit.

The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the dentry was
killed.  This flag can be set on non-killed dentries too, which results in
infinite retries when trying to traverse the dentry tree.

This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is only set
in d_kill() and makes try_to_ascend() test only this flag.

IBM reported successful test results with this patch.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
---
 fs/dcache.c            |    4 ++--
 include/linux/dcache.h |    2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2012-09-17 20:04:07.000000000 +0200
+++ linux-2.6/fs/dcache.c	2012-09-17 20:04:21.000000000 +0200
@@ -389,7 +389,7 @@ static struct dentry *d_kill(struct dent
 	 * Inform try_to_ascend() that we are no longer attached to the
 	 * dentry tree
 	 */
-	dentry->d_flags |= DCACHE_DISCONNECTED;
+	dentry->d_flags |= DCACHE_DENTRY_KILLED;
 	if (parent)
 		spin_unlock(&parent->d_lock);
 	dentry_iput(dentry);
@@ -1048,7 +1048,7 @@ static struct dentry *try_to_ascend(stru
 	 * or deletion
 	 */
 	if (new != old->d_parent ||
-		 (old->d_flags & DCACHE_DISCONNECTED) ||
+		 (old->d_flags & DCACHE_DENTRY_KILLED) ||
 		 (!locked && read_seqretry(&rename_lock, seq))) {
 		spin_unlock(&new->d_lock);
 		new = NULL;
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h	2012-09-17 19:56:19.000000000 +0200
+++ linux-2.6/include/linux/dcache.h	2012-09-17 20:04:21.000000000 +0200
@@ -206,6 +206,8 @@ struct dentry_operations {
 #define DCACHE_MANAGED_DENTRY \
 	(DCACHE_MOUNTED|DCACHE_NEED_AUTOMOUNT|DCACHE_MANAGE_TRANSIT)
 
+#define DCACHE_DENTRY_KILLED	0x100000
+
 extern seqlock_t rename_lock;
 
 static inline int dname_external(struct dentry *dentry)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal
  2012-09-17 20:23 [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal Miklos Szeredi
  2012-09-17 20:31 ` [PATCH 2/2] vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill() Miklos Szeredi
@ 2012-09-17 20:39 ` Al Viro
  2012-09-17 22:09   ` Linus Torvalds
  2012-09-18 20:35 ` [PATCH] trivial select_parent documentation fix J. Bruce Fields
  2 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2012-09-17 20:39 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, hch, torvalds, Trond.Myklebust

On Mon, Sep 17, 2012 at 10:23:30PM +0200, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> IBM reported a deadlock in select_parent().  This was found to be caused by
> taking rename_lock when already locked when restarting the tree traversal.
> 
> There are two cases when the traversal needs to be restarted:
> 
>  1) concurrent d_move(); this can only happen when not already locked,
>  since taking rename_lock protects against concurrent d_move().
> 
>  2) racing with final d_put() on child just at the moment of ascending
>  to parent; rename_lock doesn't protect against this rare race, so it
>  can happen when already locked.
> 
> Because of case 2. we need to be able to handle restarting the traversal
> when rename_lock is already held.  This patch fixes all three callers of
> try_to_ascend().
> 
> IBM reported that the deadlock is gone with this patch.  However, there's still
> a soft lockup which is addressed by the next patch.

Egads...  The problem is real and analysis, AFAICS, is correct, but result
is extremely ugly ;-/  Let me try to come up with something saner; I'll
push that one to Linus if nothing better comes to mind, but I'd really
prefer to avoid adding to ugliness in fs/dcache.c - we already have too
much of that...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal
  2012-09-17 20:39 ` [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal Al Viro
@ 2012-09-17 22:09   ` Linus Torvalds
  2012-09-18 14:53     ` Miklos Szeredi
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2012-09-17 22:09 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, hch, Trond.Myklebust

On Mon, Sep 17, 2012 at 1:39 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Egads...  The problem is real and analysis, AFAICS, is correct, but result
> is extremely ugly ;-/

Agreed.

The problem (or at least one *part* of the problem) is that the "goto
rename_retry" case is done for two different entities entirely:

 - the "try_to_ascend()" failure path, which can happen even when
renamelock is held for writing.

 - the "if we weren't write-locked before, and the read-lock failed"
case (which obviously cannot happen if we already held things for
writing)

That said, I'm not sure why/how that try_to_ascend() could even fail
when we're holding things locked. I guess it's the DCACHE_DISCONNECTED
case that triggers.

So I'll ignore this series for now, hoping that Al will send a nicer
version. Al, Miklos, please make sure this issue doesn't get dropped
by mistake.

            Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal
  2012-09-17 22:09   ` Linus Torvalds
@ 2012-09-18 14:53     ` Miklos Szeredi
  2012-09-18 15:56       ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Miklos Szeredi @ 2012-09-18 14:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel, linux-kernel, hch, Trond.Myklebust

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Mon, Sep 17, 2012 at 1:39 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> Egads...  The problem is real and analysis, AFAICS, is correct, but result
>> is extremely ugly ;-/
>
> Agreed.
>
> The problem (or at least one *part* of the problem) is that the "goto
> rename_retry" case is done for two different entities entirely:
>
>  - the "try_to_ascend()" failure path, which can happen even when
> renamelock is held for writing.
>
>  - the "if we weren't write-locked before, and the read-lock failed"
> case (which obviously cannot happen if we already held things for
> writing)
>
> That said, I'm not sure why/how that try_to_ascend() could even fail
> when we're holding things locked. I guess it's the DCACHE_DISCONNECTED
> case that triggers.

Yes, with the test cases that IBM were using it is DCACHE_DISCONNECTED
case that triggers the double-lock.  Trond was misusing
DCACHE_DISCONNECTED and this made the failure in try_to_ascend() much
more likely (and bogus).  But there is a case, which is triggered rarely
if ever, when try_to_ascend() failure with rename_lock held is perfectly
valid.

try_to_ascend() does an ellaborate locking dance that aims to ensure
that child remains valid after having ascended to parent.  It nees a
valid child because it needs the next sibling to continue tree
traversal.  If the child becomes invalid (final dput()) then the whole
tree traversal needs to be restarted since we don't know where we left
off. So we basically do

 1. take the RCU lock
 2. unlock child
 3. lock parent
 4. check whether child has been freed, if yes restart traversal
 5. release RCU
 6. find next sibling (child->d_child.next)
 7. lock new child

It is step 4. that needs the RCU guarantees so we can check child's
dentry flag (DCACHE_DISCONNECTED flag presently).  After that d_lock on
parent will protect "child" from going away (d_kill or d_move) and we
can get hold of the next sibling.

Thanks,
Miklos


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal
  2012-09-18 14:53     ` Miklos Szeredi
@ 2012-09-18 15:56       ` Linus Torvalds
  2012-09-18 16:27         ` Miklos Szeredi
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2012-09-18 15:56 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-fsdevel, linux-kernel, hch, Trond.Myklebust

On Tue, Sep 18, 2012 at 7:53 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> Yes, with the test cases that IBM were using it is DCACHE_DISCONNECTED
> case that triggers the double-lock.  Trond was misusing
> DCACHE_DISCONNECTED and this made the failure in try_to_ascend() much
> more likely (and bogus).  But there is a case, which is triggered rarely
> if ever, when try_to_ascend() failure with rename_lock held is perfectly
> valid.

Ok. The whole DCACHE_DISCONNECTED logic there is clearly bogus and
results in endless loops if that case then ever triggers, but you fix
that in the second patch.

HOWEVER. Why introduce that new DCACHE_KILLED flag at all? Wouldn't it
be much better to just check whether the dentry is hashed instead of
introducing a new flag for this case? Couldn't we just check for
"d_unhashed()"?

Anyway, I don't hate the 2/2 thing, but these are clearly related, and
I'd like to understand why we had that odd DCACHE_DISCONNECTED test in
the first place when there *seems* to be more straightforward
approaches to it?

              Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal
  2012-09-18 15:56       ` Linus Torvalds
@ 2012-09-18 16:27         ` Miklos Szeredi
  2012-09-18 18:22           ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Miklos Szeredi @ 2012-09-18 16:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel, linux-kernel, hch, Trond.Myklebust

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, Sep 18, 2012 at 7:53 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> Yes, with the test cases that IBM were using it is DCACHE_DISCONNECTED
>> case that triggers the double-lock.  Trond was misusing
>> DCACHE_DISCONNECTED and this made the failure in try_to_ascend() much
>> more likely (and bogus).  But there is a case, which is triggered rarely
>> if ever, when try_to_ascend() failure with rename_lock held is perfectly
>> valid.
>
> Ok. The whole DCACHE_DISCONNECTED logic there is clearly bogus and
> results in endless loops if that case then ever triggers, but you fix
> that in the second patch.
>
> HOWEVER. Why introduce that new DCACHE_KILLED flag at all? Wouldn't it
> be much better to just check whether the dentry is hashed instead of
> introducing a new flag for this case? Couldn't we just check for
> "d_unhashed()"?

Not good, because an unhashed dentry can stay around and that would mean
that we'd need to restart the walk until the last ref to the dentry is
dropped (which can be an arbitrary long time).

Look at commit c83ce989.  Before that patch try_to_ascend() was using
old->d_parent != NULL (implicit in the "new != old->d_parent" test) to
test for the dentry being killed.  I think we need to keep that logic,
and using a dentry flag for that looks simple enough.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal
  2012-09-18 16:27         ` Miklos Szeredi
@ 2012-09-18 18:22           ` Linus Torvalds
  0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2012-09-18 18:22 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Al Viro, linux-fsdevel, linux-kernel, hch, Trond.Myklebust

On Tue, Sep 18, 2012 at 9:27 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>>
>> HOWEVER. Why introduce that new DCACHE_KILLED flag at all? Wouldn't it
>> be much better to just check whether the dentry is hashed instead of
>> introducing a new flag for this case? Couldn't we just check for
>> "d_unhashed()"?
>
> Not good, because an unhashed dentry can stay around and that would mean
> that we'd need to restart the walk until the last ref to the dentry is
> dropped (which can be an arbitrary long time).

Ok, fair enough.

I'll apply your second patch as obviously fixing one issue, and wait
for Al to hopefully come up with something nicer for the first one.

         Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] trivial select_parent documentation fix
  2012-09-17 20:23 [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal Miklos Szeredi
  2012-09-17 20:31 ` [PATCH 2/2] vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill() Miklos Szeredi
  2012-09-17 20:39 ` [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal Al Viro
@ 2012-09-18 20:35 ` J. Bruce Fields
  2 siblings, 0 replies; 9+ messages in thread
From: J. Bruce Fields @ 2012-09-18 20:35 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: viro, linux-fsdevel, linux-kernel, hch, torvalds, Trond.Myklebust

"Search list for X" sounds like you're trying to find X on a list.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

---

On Mon, Sep 17, 2012 at 10:23:30PM +0200, Miklos Szeredi wrote:
> IBM reported a deadlock in select_parent().

(And I went to look at select_parent() and got tripped up for a few
moments on this comment.  Arguably my reading comprehension is just
bad.)--b.

diff --git a/fs/dcache.c b/fs/dcache.c
index 8086636..a0b8d65 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1141,7 +1141,7 @@ rename_retry:
 EXPORT_SYMBOL(have_submounts);
 
 /*
- * Search the dentry child list for the specified parent,
+ * Search the dentry child list of the specified parent,
  * and move any unused dentries to the end of the unused
  * list for prune_dcache(). We descend to the next level
  * whenever the d_subdirs list is non-empty and continue

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-09-18 20:35 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-17 20:23 [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal Miklos Szeredi
2012-09-17 20:31 ` [PATCH 2/2] vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill() Miklos Szeredi
2012-09-17 20:39 ` [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal Al Viro
2012-09-17 22:09   ` Linus Torvalds
2012-09-18 14:53     ` Miklos Szeredi
2012-09-18 15:56       ` Linus Torvalds
2012-09-18 16:27         ` Miklos Szeredi
2012-09-18 18:22           ` Linus Torvalds
2012-09-18 20:35 ` [PATCH] trivial select_parent documentation fix J. Bruce Fields

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).