* [PATCH] fs: Work around NFS wreckage
@ 2011-01-13 13:54 Thomas Gleixner
2011-01-13 14:10 ` J. R. Okajima
2011-01-13 14:12 ` Uwe Kleine-König
0 siblings, 2 replies; 12+ messages in thread
From: Thomas Gleixner @ 2011-01-13 13:54 UTC (permalink / raw)
To: Nick Piggin
Cc: LKML, Linus Torvalds, linux-fsdevel, u.kleine-koenig, Ramirez Luna, Omar
The dcache scalability work broke NFS root filesystems.
"cd /" results in the following problem:
link_path_walk("/",...);
jumps to return_reval
need_reval_dot() returns true for NFS
d_revalidate()
dentry->d_op->d_revalidate(dentry, nd);
returns -ECHILD due to nd->flags & LOOKUP_RCU
nameidata_dentry_drop_rcu()
spin_lock(&parent->d_lock);
spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
This deadlocks because dentry == parent
This problem exists for any filesystem which implements d_revalidate.
Uwe bisected is down to commit 34286d6(fs: rcu-walk aware d_revalidate
method), but reverting that patch causes different wreckage to show up.
Check for parent equal dentry and skip the nested lock to avoid the
deadlock. I'm sure this is the wrong fix, but at least it "works" :)
Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Reported-by: "Ramirez Luna, Omar" <omar.ramirez@ti.com>
Not-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
fs/namei.c | 4 ++++
1 file changed, 4 insertions(+)
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c
+++ linux-2.6/fs/namei.c
@@ -487,6 +487,8 @@ static int nameidata_dentry_drop_rcu(str
goto err_root;
}
spin_lock(&parent->d_lock);
+ if (parent == dentry)
+ goto same;
spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
if (!__d_rcu_to_refcount(dentry, nd->seq))
goto err;
@@ -499,6 +501,8 @@ static int nameidata_dentry_drop_rcu(str
BUG_ON(!parent->d_count);
parent->d_count++;
spin_unlock(&dentry->d_lock);
+
+same:
spin_unlock(&parent->d_lock);
if (nd->root.mnt) {
path_get(&nd->root);
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: Work around NFS wreckage
2011-01-13 13:54 [PATCH] fs: Work around NFS wreckage Thomas Gleixner
@ 2011-01-13 14:10 ` J. R. Okajima
2011-01-13 15:17 ` Uwe Kleine-König
2011-01-13 16:41 ` Thomas Gleixner
2011-01-13 14:12 ` Uwe Kleine-König
1 sibling, 2 replies; 12+ messages in thread
From: J. R. Okajima @ 2011-01-13 14:10 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Nick Piggin, LKML, Linus Torvalds, linux-fsdevel,
u.kleine-koenig, Ramirez Luna, Omar
Thomas Gleixner:
> The dcache scalability work broke NFS root filesystems.
:::
> Check for parent equal dentry and skip the nested lock to avoid the
> deadlock. I'm sure this is the wrong fix, but at least it "works" :)
With this patch, can you unmount it cleanly?
It skips incrementing d_count for both dentries.
Do we need at least one?
J. R. Okajima
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: Work around NFS wreckage
2011-01-13 13:54 [PATCH] fs: Work around NFS wreckage Thomas Gleixner
2011-01-13 14:10 ` J. R. Okajima
@ 2011-01-13 14:12 ` Uwe Kleine-König
1 sibling, 0 replies; 12+ messages in thread
From: Uwe Kleine-König @ 2011-01-13 14:12 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Nick Piggin, LKML, Linus Torvalds, linux-fsdevel, Ramirez Luna, Omar
Hello,
On Thu, Jan 13, 2011 at 02:54:30PM +0100, Thomas Gleixner wrote:
> The dcache scalability work broke NFS root filesystems.
>
> "cd /" results in the following problem:
>
> link_path_walk("/",...);
> jumps to return_reval
> need_reval_dot() returns true for NFS
> d_revalidate()
> dentry->d_op->d_revalidate(dentry, nd);
> returns -ECHILD due to nd->flags & LOOKUP_RCU
> nameidata_dentry_drop_rcu()
> spin_lock(&parent->d_lock);
> spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
>
> This deadlocks because dentry == parent
>
> This problem exists for any filesystem which implements d_revalidate.
>
> Uwe bisected is down to commit 34286d6(fs: rcu-walk aware d_revalidate
> method), but reverting that patch causes different wreckage to show up.
>
> Check for parent equal dentry and skip the nested lock to avoid the
> deadlock. I'm sure this is the wrong fix, but at least it "works" :)
>
> Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
> Reported-by: "Ramirez Luna, Omar" <omar.ramirez@ti.com>
> Not-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> fs/namei.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> Index: linux-2.6/fs/namei.c
> ===================================================================
> --- linux-2.6.orig/fs/namei.c
> +++ linux-2.6/fs/namei.c
> @@ -487,6 +487,8 @@ static int nameidata_dentry_drop_rcu(str
> goto err_root;
> }
> spin_lock(&parent->d_lock);
> + if (parent == dentry)
> + goto same;
> spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> if (!__d_rcu_to_refcount(dentry, nd->seq))
> goto err;
> @@ -499,6 +501,8 @@ static int nameidata_dentry_drop_rcu(str
> BUG_ON(!parent->d_count);
> parent->d_count++;
> spin_unlock(&dentry->d_lock);
> +
> +same:
> spin_unlock(&parent->d_lock);
> if (nd->root.mnt) {
> path_get(&nd->root);
>
Note there is a different patch available in the thread here:
http://thread.gmane.org/gmane.linux.kernel/1087013/focus=1087048
The differences are that it tests for IS_ROOT(dentry) instead of parent ==
dentry (which looks more reasonable IMVHO) and that it increases
parent->d_count even if the test triggered.
(And it doesn't skip the BUG_ONs which hopefully doesn't make a
difference.)
Note I really have no glue about the code below fs/, but I wonder if
the toplevel directories of mounts need some treatment here, too. (But
I expect that they don't. So I ask just in case ...)
Best regards
Uwe
--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: Work around NFS wreckage
2011-01-13 14:10 ` J. R. Okajima
@ 2011-01-13 15:17 ` Uwe Kleine-König
2011-01-14 1:30 ` J. R. Okajima
2011-01-13 16:41 ` Thomas Gleixner
1 sibling, 1 reply; 12+ messages in thread
From: Uwe Kleine-König @ 2011-01-13 15:17 UTC (permalink / raw)
To: J. R. Okajima
Cc: Thomas Gleixner, Nick Piggin, LKML, Linus Torvalds,
linux-fsdevel, Ramirez Luna, Omar
Hello,
On Thu, Jan 13, 2011 at 11:10:45PM +0900, J. R. Okajima wrote:
> Thomas Gleixner:
> > The dcache scalability work broke NFS root filesystems.
> :::
> > Check for parent equal dentry and skip the nested lock to avoid the
> > deadlock. I'm sure this is the wrong fix, but at least it "works" :)
>
> With this patch, can you unmount it cleanly?
> It skips incrementing d_count for both dentries.
> Do we need at least one?
If you tell me how to test that, I volunteer to try.
Something involving pivot_root?
Best regards
Uwe
--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: Work around NFS wreckage
2011-01-13 14:10 ` J. R. Okajima
2011-01-13 15:17 ` Uwe Kleine-König
@ 2011-01-13 16:41 ` Thomas Gleixner
2011-01-14 1:01 ` Nick Piggin
1 sibling, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2011-01-13 16:41 UTC (permalink / raw)
To: J. R. Okajima
Cc: Nick Piggin, LKML, Linus Torvalds, linux-fsdevel,
u.kleine-koenig, Ramirez Luna, Omar
On Thu, 13 Jan 2011, J. R. Okajima wrote:
> Thomas Gleixner:
> > The dcache scalability work broke NFS root filesystems.
> :::
> > Check for parent equal dentry and skip the nested lock to avoid the
> > deadlock. I'm sure this is the wrong fix, but at least it "works" :)
>
> With this patch, can you unmount it cleanly?
> It skips incrementing d_count for both dentries.
> Do we need at least one?
Not sure, as we should not end up there at all I guess.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: Work around NFS wreckage
2011-01-13 16:41 ` Thomas Gleixner
@ 2011-01-14 1:01 ` Nick Piggin
2011-01-14 9:29 ` Thomas Gleixner
0 siblings, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2011-01-14 1:01 UTC (permalink / raw)
To: Thomas Gleixner
Cc: J. R. Okajima, Nick Piggin, LKML, Linus Torvalds, linux-fsdevel,
u.kleine-koenig, Ramirez Luna, Omar
On Fri, Jan 14, 2011 at 3:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 13 Jan 2011, J. R. Okajima wrote:
>> Thomas Gleixner:
>> > The dcache scalability work broke NFS root filesystems.
>> :::
>> > Check for parent equal dentry and skip the nested lock to avoid the
>> > deadlock. I'm sure this is the wrong fix, but at least it "works" :)
>>
>> With this patch, can you unmount it cleanly?
>> It skips incrementing d_count for both dentries.
>> Do we need at least one?
>
Yeah thanks everyone for reporting this.
I have a fix for it in a slightly different way, and a couple of other reported
bugs, which are going through stress testing now.
Thanks,
Nick
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: Work around NFS wreckage
2011-01-13 15:17 ` Uwe Kleine-König
@ 2011-01-14 1:30 ` J. R. Okajima
0 siblings, 0 replies; 12+ messages in thread
From: J. R. Okajima @ 2011-01-14 1:30 UTC (permalink / raw)
To: Uwe Kleine-König
Cc: Thomas Gleixner, Nick Piggin, LKML, Linus Torvalds,
linux-fsdevel, Ramirez Luna, Omar
Uwe Kleine-Knig:
> > With this patch, can you unmount it cleanly?
> > It skips incrementing d_count for both dentries.
> > Do we need at least one?
> If you tell me how to test that, I volunteer to try.
> Something involving pivot_root?
For example,
# mount host:/dir /nfs
# chroot /nfs
# umount /nfs
It is ok if /nfs is empty.
J. R. Okajima
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: Work around NFS wreckage
2011-01-14 1:01 ` Nick Piggin
@ 2011-01-14 9:29 ` Thomas Gleixner
2011-01-14 9:47 ` Sedat Dilek
2011-01-14 15:47 ` Linus Torvalds
0 siblings, 2 replies; 12+ messages in thread
From: Thomas Gleixner @ 2011-01-14 9:29 UTC (permalink / raw)
To: Nick Piggin
Cc: J. R. Okajima, Nick Piggin, LKML, Linus Torvalds, linux-fsdevel,
u.kleine-koenig, Ramirez Luna, Omar
[-- Attachment #1: Type: TEXT/PLAIN, Size: 840 bytes --]
On Fri, 14 Jan 2011, Nick Piggin wrote:
> On Fri, Jan 14, 2011 at 3:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Thu, 13 Jan 2011, J. R. Okajima wrote:
> >> Thomas Gleixner:
> >> > The dcache scalability work broke NFS root filesystems.
> >> :::
> >> > Check for parent equal dentry and skip the nested lock to avoid the
> >> > deadlock. I'm sure this is the wrong fix, but at least it "works" :)
> >>
> >> With this patch, can you unmount it cleanly?
> >> It skips incrementing d_count for both dentries.
> >> Do we need at least one?
> >
>
> Yeah thanks everyone for reporting this.
>
> I have a fix for it in a slightly different way, and a couple of other reported
> bugs, which are going through stress testing now.
Can you point us to the patches, so we can run our own set of
tests in paralell?
Thanks,
tglx
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: Work around NFS wreckage
2011-01-14 9:29 ` Thomas Gleixner
@ 2011-01-14 9:47 ` Sedat Dilek
2011-01-14 15:47 ` Linus Torvalds
1 sibling, 0 replies; 12+ messages in thread
From: Sedat Dilek @ 2011-01-14 9:47 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Nick Piggin, J. R. Okajima, Nick Piggin, LKML, Linus Torvalds,
linux-fsdevel, u.kleine-koenig, Ramirez Luna, Omar
On Fri, Jan 14, 2011 at 10:29 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 14 Jan 2011, Nick Piggin wrote:
>
>> On Fri, Jan 14, 2011 at 3:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > On Thu, 13 Jan 2011, J. R. Okajima wrote:
>> >> Thomas Gleixner:
>> >> > The dcache scalability work broke NFS root filesystems.
>> >> :::
>> >> > Check for parent equal dentry and skip the nested lock to avoid the
>> >> > deadlock. I'm sure this is the wrong fix, but at least it "works" :)
>> >>
>> >> With this patch, can you unmount it cleanly?
>> >> It skips incrementing d_count for both dentries.
>> >> Do we need at least one?
>> >
>>
>> Yeah thanks everyone for reporting this.
>>
>> I have a fix for it in a slightly different way, and a couple of other reported
>> bugs, which are going through stress testing now.
>
> Can you point us to the patches, so we can run our own set of
> tests in paralell?
>
> Thanks,
>
> tglx
I guess several mentionned patches are now in [1] and I am testing
them against linux-next (next-20110114).
- Sedat -
[1] http://git.kernel.org/?p=linux/kernel/git/npiggin/linux-npiggin.git;a=shortlog;h=refs/heads/vfs-scale-working
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: Work around NFS wreckage
2011-01-14 9:29 ` Thomas Gleixner
2011-01-14 9:47 ` Sedat Dilek
@ 2011-01-14 15:47 ` Linus Torvalds
2011-01-14 16:40 ` Thomas Gleixner
1 sibling, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2011-01-14 15:47 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Nick Piggin, J. R. Okajima, Nick Piggin, LKML, linux-fsdevel,
u.kleine-koenig, Ramirez Luna, Omar
On Fri, Jan 14, 2011 at 1:29 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Can you point us to the patches, so we can run our own set of
> tests in paralell?
They should be in current -git as of yesterday evening.
In particular, I think this one is commit 90dbb77ba48d.
Linus
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: Work around NFS wreckage
2011-01-14 15:47 ` Linus Torvalds
@ 2011-01-14 16:40 ` Thomas Gleixner
2011-01-14 21:44 ` Nick Piggin
0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2011-01-14 16:40 UTC (permalink / raw)
To: Linus Torvalds
Cc: Nick Piggin, J. R. Okajima, Nick Piggin, LKML, linux-fsdevel,
u.kleine-koenig, Ramirez Luna, Omar
On Fri, 14 Jan 2011, Linus Torvalds wrote:
> On Fri, Jan 14, 2011 at 1:29 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Can you point us to the patches, so we can run our own set of
> > tests in paralell?
>
> They should be in current -git as of yesterday evening.
>
> In particular, I think this one is commit 90dbb77ba48d.
Yep, that works - though it needs the pending fixes from Nick latest
pull request as well
Thanks,
tglx
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: Work around NFS wreckage
2011-01-14 16:40 ` Thomas Gleixner
@ 2011-01-14 21:44 ` Nick Piggin
0 siblings, 0 replies; 12+ messages in thread
From: Nick Piggin @ 2011-01-14 21:44 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Linus Torvalds, J. R. Okajima, Nick Piggin, LKML, linux-fsdevel,
u.kleine-koenig, Ramirez Luna, Omar
On Sat, Jan 15, 2011 at 3:40 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 14 Jan 2011, Linus Torvalds wrote:
>
>> On Fri, Jan 14, 2011 at 1:29 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> >
>> > Can you point us to the patches, so we can run our own set of
>> > tests in paralell?
>>
>> They should be in current -git as of yesterday evening.
>>
>> In particular, I think this one is commit 90dbb77ba48d.
>
> Yep, that works - though it needs the pending fixes from Nick latest
> pull request as well
Thanks for reporting and testing.
Nick
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2011-01-14 21:45 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-13 13:54 [PATCH] fs: Work around NFS wreckage Thomas Gleixner
2011-01-13 14:10 ` J. R. Okajima
2011-01-13 15:17 ` Uwe Kleine-König
2011-01-14 1:30 ` J. R. Okajima
2011-01-13 16:41 ` Thomas Gleixner
2011-01-14 1:01 ` Nick Piggin
2011-01-14 9:29 ` Thomas Gleixner
2011-01-14 9:47 ` Sedat Dilek
2011-01-14 15:47 ` Linus Torvalds
2011-01-14 16:40 ` Thomas Gleixner
2011-01-14 21:44 ` Nick Piggin
2011-01-13 14:12 ` Uwe Kleine-König
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).