linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fs: Work around NFS wreckage
@ 2011-01-13 13:54 Thomas Gleixner
  2011-01-13 14:10 ` J. R. Okajima
  2011-01-13 14:12 ` Uwe Kleine-König
  0 siblings, 2 replies; 12+ messages in thread
From: Thomas Gleixner @ 2011-01-13 13:54 UTC (permalink / raw)
  To: Nick Piggin
  Cc: LKML, Linus Torvalds, linux-fsdevel, u.kleine-koenig, Ramirez Luna, Omar

The dcache scalability work broke NFS root filesystems.

"cd /" results in the following problem:

    link_path_walk("/",...);
	jumps to return_reval
	need_reval_dot() returns true for NFS
	d_revalidate()
		dentry->d_op->d_revalidate(dentry, nd);
		 	returns -ECHILD due to nd->flags & LOOKUP_RCU
		nameidata_dentry_drop_rcu()
			spin_lock(&parent->d_lock);
			spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);

			This deadlocks because dentry == parent

This problem exists for any filesystem which implements d_revalidate.

Uwe bisected is down to commit 34286d6(fs: rcu-walk aware d_revalidate
method), but reverting that patch causes different wreckage to show up.

Check for parent equal dentry and skip the nested lock to avoid the
deadlock. I'm sure this is the wrong fix, but at least it "works" :)

Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Reported-by: "Ramirez Luna, Omar" <omar.ramirez@ti.com>
Not-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 fs/namei.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c
+++ linux-2.6/fs/namei.c
@@ -487,6 +487,8 @@ static int nameidata_dentry_drop_rcu(str
 			goto err_root;
 	}
 	spin_lock(&parent->d_lock);
+	if (parent == dentry)
+		goto same;
 	spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
 	if (!__d_rcu_to_refcount(dentry, nd->seq))
 		goto err;
@@ -499,6 +501,8 @@ static int nameidata_dentry_drop_rcu(str
 	BUG_ON(!parent->d_count);
 	parent->d_count++;
 	spin_unlock(&dentry->d_lock);
+
+same:
 	spin_unlock(&parent->d_lock);
 	if (nd->root.mnt) {
 		path_get(&nd->root);

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fs: Work around NFS wreckage
  2011-01-13 13:54 [PATCH] fs: Work around NFS wreckage Thomas Gleixner
@ 2011-01-13 14:10 ` J. R. Okajima
  2011-01-13 15:17   ` Uwe Kleine-König
  2011-01-13 16:41   ` Thomas Gleixner
  2011-01-13 14:12 ` Uwe Kleine-König
  1 sibling, 2 replies; 12+ messages in thread
From: J. R. Okajima @ 2011-01-13 14:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Nick Piggin, LKML, Linus Torvalds, linux-fsdevel,
	u.kleine-koenig, Ramirez Luna, Omar


Thomas Gleixner:
> The dcache scalability work broke NFS root filesystems.
	:::
> Check for parent equal dentry and skip the nested lock to avoid the
> deadlock. I'm sure this is the wrong fix, but at least it "works" :)

With this patch, can you unmount it cleanly?
It skips incrementing d_count for both dentries.
Do we need at least one?


J. R. Okajima

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fs: Work around NFS wreckage
  2011-01-13 13:54 [PATCH] fs: Work around NFS wreckage Thomas Gleixner
  2011-01-13 14:10 ` J. R. Okajima
@ 2011-01-13 14:12 ` Uwe Kleine-König
  1 sibling, 0 replies; 12+ messages in thread
From: Uwe Kleine-König @ 2011-01-13 14:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Nick Piggin, LKML, Linus Torvalds, linux-fsdevel, Ramirez Luna, Omar

Hello,

On Thu, Jan 13, 2011 at 02:54:30PM +0100, Thomas Gleixner wrote:
> The dcache scalability work broke NFS root filesystems.
> 
> "cd /" results in the following problem:
> 
>     link_path_walk("/",...);
> 	jumps to return_reval
> 	need_reval_dot() returns true for NFS
> 	d_revalidate()
> 		dentry->d_op->d_revalidate(dentry, nd);
> 		 	returns -ECHILD due to nd->flags & LOOKUP_RCU
> 		nameidata_dentry_drop_rcu()
> 			spin_lock(&parent->d_lock);
> 			spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> 
> 			This deadlocks because dentry == parent
> 
> This problem exists for any filesystem which implements d_revalidate.
> 
> Uwe bisected is down to commit 34286d6(fs: rcu-walk aware d_revalidate
> method), but reverting that patch causes different wreckage to show up.
> 
> Check for parent equal dentry and skip the nested lock to avoid the
> deadlock. I'm sure this is the wrong fix, but at least it "works" :)
> 
> Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
> Reported-by: "Ramirez Luna, Omar" <omar.ramirez@ti.com>
> Not-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  fs/namei.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> Index: linux-2.6/fs/namei.c
> ===================================================================
> --- linux-2.6.orig/fs/namei.c
> +++ linux-2.6/fs/namei.c
> @@ -487,6 +487,8 @@ static int nameidata_dentry_drop_rcu(str
>  			goto err_root;
>  	}
>  	spin_lock(&parent->d_lock);
> +	if (parent == dentry)
> +		goto same;
>  	spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
>  	if (!__d_rcu_to_refcount(dentry, nd->seq))
>  		goto err;
> @@ -499,6 +501,8 @@ static int nameidata_dentry_drop_rcu(str
>  	BUG_ON(!parent->d_count);
>  	parent->d_count++;
>  	spin_unlock(&dentry->d_lock);
> +
> +same:
>  	spin_unlock(&parent->d_lock);
>  	if (nd->root.mnt) {
>  		path_get(&nd->root);
> 
Note there is a different patch available in the thread here:

	http://thread.gmane.org/gmane.linux.kernel/1087013/focus=1087048

The differences are that it tests for IS_ROOT(dentry) instead of parent ==
dentry (which looks more reasonable IMVHO) and that it increases
parent->d_count even if the test triggered.
(And it doesn't skip the BUG_ONs which hopefully doesn't make a
difference.)

Note I really have no glue about the code below fs/, but I wonder if
the toplevel directories of mounts need some treatment here, too.  (But
I expect that they don't.  So I ask just in case ...)

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fs: Work around NFS wreckage
  2011-01-13 14:10 ` J. R. Okajima
@ 2011-01-13 15:17   ` Uwe Kleine-König
  2011-01-14  1:30     ` J. R. Okajima
  2011-01-13 16:41   ` Thomas Gleixner
  1 sibling, 1 reply; 12+ messages in thread
From: Uwe Kleine-König @ 2011-01-13 15:17 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Thomas Gleixner, Nick Piggin, LKML, Linus Torvalds,
	linux-fsdevel, Ramirez Luna, Omar

Hello,

On Thu, Jan 13, 2011 at 11:10:45PM +0900, J. R. Okajima wrote:
> Thomas Gleixner:
> > The dcache scalability work broke NFS root filesystems.
> 	:::
> > Check for parent equal dentry and skip the nested lock to avoid the
> > deadlock. I'm sure this is the wrong fix, but at least it "works" :)
> 
> With this patch, can you unmount it cleanly?
> It skips incrementing d_count for both dentries.
> Do we need at least one?
If you tell me how to test that, I volunteer to try.
Something involving pivot_root?

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fs: Work around NFS wreckage
  2011-01-13 14:10 ` J. R. Okajima
  2011-01-13 15:17   ` Uwe Kleine-König
@ 2011-01-13 16:41   ` Thomas Gleixner
  2011-01-14  1:01     ` Nick Piggin
  1 sibling, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2011-01-13 16:41 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Nick Piggin, LKML, Linus Torvalds, linux-fsdevel,
	u.kleine-koenig, Ramirez Luna, Omar

On Thu, 13 Jan 2011, J. R. Okajima wrote:
> Thomas Gleixner:
> > The dcache scalability work broke NFS root filesystems.
> 	:::
> > Check for parent equal dentry and skip the nested lock to avoid the
> > deadlock. I'm sure this is the wrong fix, but at least it "works" :)
> 
> With this patch, can you unmount it cleanly?
> It skips incrementing d_count for both dentries.
> Do we need at least one?

Not sure, as we should not end up there at all I guess.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fs: Work around NFS wreckage
  2011-01-13 16:41   ` Thomas Gleixner
@ 2011-01-14  1:01     ` Nick Piggin
  2011-01-14  9:29       ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2011-01-14  1:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: J. R. Okajima, Nick Piggin, LKML, Linus Torvalds, linux-fsdevel,
	u.kleine-koenig, Ramirez Luna, Omar

On Fri, Jan 14, 2011 at 3:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 13 Jan 2011, J. R. Okajima wrote:
>> Thomas Gleixner:
>> > The dcache scalability work broke NFS root filesystems.
>>       :::
>> > Check for parent equal dentry and skip the nested lock to avoid the
>> > deadlock. I'm sure this is the wrong fix, but at least it "works" :)
>>
>> With this patch, can you unmount it cleanly?
>> It skips incrementing d_count for both dentries.
>> Do we need at least one?
>

Yeah thanks everyone for reporting this.

I have a fix for it in a slightly different way, and a couple of other reported
bugs, which are going through stress testing now.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fs: Work around NFS wreckage
  2011-01-13 15:17   ` Uwe Kleine-König
@ 2011-01-14  1:30     ` J. R. Okajima
  0 siblings, 0 replies; 12+ messages in thread
From: J. R. Okajima @ 2011-01-14  1:30 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Thomas Gleixner, Nick Piggin, LKML, Linus Torvalds,
	linux-fsdevel, Ramirez Luna, Omar


Uwe Kleine-Knig:
> > With this patch, can you unmount it cleanly?
> > It skips incrementing d_count for both dentries.
> > Do we need at least one?
> If you tell me how to test that, I volunteer to try.
> Something involving pivot_root?

For example,
# mount host:/dir /nfs
# chroot /nfs
# umount /nfs

It is ok if /nfs is empty.


J. R. Okajima

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fs: Work around NFS wreckage
  2011-01-14  1:01     ` Nick Piggin
@ 2011-01-14  9:29       ` Thomas Gleixner
  2011-01-14  9:47         ` Sedat Dilek
  2011-01-14 15:47         ` Linus Torvalds
  0 siblings, 2 replies; 12+ messages in thread
From: Thomas Gleixner @ 2011-01-14  9:29 UTC (permalink / raw)
  To: Nick Piggin
  Cc: J. R. Okajima, Nick Piggin, LKML, Linus Torvalds, linux-fsdevel,
	u.kleine-koenig, Ramirez Luna, Omar

[-- Attachment #1: Type: TEXT/PLAIN, Size: 840 bytes --]

On Fri, 14 Jan 2011, Nick Piggin wrote:

> On Fri, Jan 14, 2011 at 3:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Thu, 13 Jan 2011, J. R. Okajima wrote:
> >> Thomas Gleixner:
> >> > The dcache scalability work broke NFS root filesystems.
> >>       :::
> >> > Check for parent equal dentry and skip the nested lock to avoid the
> >> > deadlock. I'm sure this is the wrong fix, but at least it "works" :)
> >>
> >> With this patch, can you unmount it cleanly?
> >> It skips incrementing d_count for both dentries.
> >> Do we need at least one?
> >
> 
> Yeah thanks everyone for reporting this.
> 
> I have a fix for it in a slightly different way, and a couple of other reported
> bugs, which are going through stress testing now.

Can you point us to the patches, so we can run our own set of
tests in paralell?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fs: Work around NFS wreckage
  2011-01-14  9:29       ` Thomas Gleixner
@ 2011-01-14  9:47         ` Sedat Dilek
  2011-01-14 15:47         ` Linus Torvalds
  1 sibling, 0 replies; 12+ messages in thread
From: Sedat Dilek @ 2011-01-14  9:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Nick Piggin, J. R. Okajima, Nick Piggin, LKML, Linus Torvalds,
	linux-fsdevel, u.kleine-koenig, Ramirez Luna, Omar

On Fri, Jan 14, 2011 at 10:29 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 14 Jan 2011, Nick Piggin wrote:
>
>> On Fri, Jan 14, 2011 at 3:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > On Thu, 13 Jan 2011, J. R. Okajima wrote:
>> >> Thomas Gleixner:
>> >> > The dcache scalability work broke NFS root filesystems.
>> >>       :::
>> >> > Check for parent equal dentry and skip the nested lock to avoid the
>> >> > deadlock. I'm sure this is the wrong fix, but at least it "works" :)
>> >>
>> >> With this patch, can you unmount it cleanly?
>> >> It skips incrementing d_count for both dentries.
>> >> Do we need at least one?
>> >
>>
>> Yeah thanks everyone for reporting this.
>>
>> I have a fix for it in a slightly different way, and a couple of other reported
>> bugs, which are going through stress testing now.
>
> Can you point us to the patches, so we can run our own set of
> tests in paralell?
>
> Thanks,
>
>        tglx

I guess several mentionned patches are now in [1] and I am testing
them against linux-next (next-20110114).

- Sedat -

[1] http://git.kernel.org/?p=linux/kernel/git/npiggin/linux-npiggin.git;a=shortlog;h=refs/heads/vfs-scale-working

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fs: Work around NFS wreckage
  2011-01-14  9:29       ` Thomas Gleixner
  2011-01-14  9:47         ` Sedat Dilek
@ 2011-01-14 15:47         ` Linus Torvalds
  2011-01-14 16:40           ` Thomas Gleixner
  1 sibling, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2011-01-14 15:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Nick Piggin, J. R. Okajima, Nick Piggin, LKML, linux-fsdevel,
	u.kleine-koenig, Ramirez Luna, Omar

On Fri, Jan 14, 2011 at 1:29 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Can you point us to the patches, so we can run our own set of
> tests in paralell?

They should be in current -git as of yesterday evening.

In particular, I think this one is commit 90dbb77ba48d.

                      Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fs: Work around NFS wreckage
  2011-01-14 15:47         ` Linus Torvalds
@ 2011-01-14 16:40           ` Thomas Gleixner
  2011-01-14 21:44             ` Nick Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2011-01-14 16:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Piggin, J. R. Okajima, Nick Piggin, LKML, linux-fsdevel,
	u.kleine-koenig, Ramirez Luna, Omar

On Fri, 14 Jan 2011, Linus Torvalds wrote:

> On Fri, Jan 14, 2011 at 1:29 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Can you point us to the patches, so we can run our own set of
> > tests in paralell?
> 
> They should be in current -git as of yesterday evening.
> 
> In particular, I think this one is commit 90dbb77ba48d.

Yep, that works - though it needs the pending fixes from Nick latest
pull request as well

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fs: Work around NFS wreckage
  2011-01-14 16:40           ` Thomas Gleixner
@ 2011-01-14 21:44             ` Nick Piggin
  0 siblings, 0 replies; 12+ messages in thread
From: Nick Piggin @ 2011-01-14 21:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, J. R. Okajima, Nick Piggin, LKML, linux-fsdevel,
	u.kleine-koenig, Ramirez Luna, Omar

On Sat, Jan 15, 2011 at 3:40 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 14 Jan 2011, Linus Torvalds wrote:
>
>> On Fri, Jan 14, 2011 at 1:29 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> >
>> > Can you point us to the patches, so we can run our own set of
>> > tests in paralell?
>>
>> They should be in current -git as of yesterday evening.
>>
>> In particular, I think this one is commit 90dbb77ba48d.
>
> Yep, that works - though it needs the pending fixes from Nick latest
> pull request as well

Thanks for reporting and testing.

Nick

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-01-14 21:45 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-13 13:54 [PATCH] fs: Work around NFS wreckage Thomas Gleixner
2011-01-13 14:10 ` J. R. Okajima
2011-01-13 15:17   ` Uwe Kleine-König
2011-01-14  1:30     ` J. R. Okajima
2011-01-13 16:41   ` Thomas Gleixner
2011-01-14  1:01     ` Nick Piggin
2011-01-14  9:29       ` Thomas Gleixner
2011-01-14  9:47         ` Sedat Dilek
2011-01-14 15:47         ` Linus Torvalds
2011-01-14 16:40           ` Thomas Gleixner
2011-01-14 21:44             ` Nick Piggin
2011-01-13 14:12 ` Uwe Kleine-König

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).