linux-kernel.vger.kernel.org archive mirror
* lockdep "splat" on v2.6.33.5-rt23
From: John Kacur @ 2010-06-24 15:40 UTC
  To: Nick Piggin, Peter Zijlstra, john stultz, Thomas Gleixner
  Cc: linux-kernel, linux-rt-users

I believe this is related to the dcache scale discussion thread. I've 
shown this to Peter privately, but thought it would be useful to share it 
with everyone so we all have the same info.
The kernel is from tip/rt/2.6.33 up to commit 
faf35813f204901f85dd0c6b3c5092e0064c6c2f
It has a lot of debug options enabled, but is not modified.
The "splat" is very easy to reproduce: it occurs every time I 
boot the kernel on my T500.

=============================================
[ INFO: possible recursive locking detected ]
2.6.33.5-rt23-tip-debug #3
---------------------------------------------
init/1 is trying to acquire lock:
 (&dentry->d_lock/1){+.+...}, at: [<ffffffff811981be>] shrink_dcache_parent+0x10f/0x2eb

but task is already holding lock:
 (&dentry->d_lock/1){+.+...}, at: [<ffffffff811981be>] shrink_dcache_parent+0x10f/0x2eb

other info that might help us debug this:
2 locks held by init/1:
 #0:  (&dentry->d_lock){+.+...}, at: [<ffffffff8119818e>] shrink_dcache_parent+0xdf/0x2eb
 #1:  (&dentry->d_lock/1){+.+...}, at: [<ffffffff811981be>] shrink_dcache_parent+0x10f/0x2eb

stack backtrace:
Pid: 1, comm: init Not tainted 2.6.33.5-rt23-tip-debug #3
Call Trace:
 [<ffffffff810a68e2>] __lock_acquire+0xcb9/0xd35
 [<ffffffff811981be>] ? shrink_dcache_parent+0x10f/0x2eb
 [<ffffffff810a701c>] lock_acquire+0xd4/0xf1
 [<ffffffff811981be>] ? shrink_dcache_parent+0x10f/0x2eb
 [<ffffffff81521006>] rt_spin_lock_nested+0x3d/0x44
 [<ffffffff811981be>] ? shrink_dcache_parent+0x10f/0x2eb
 [<ffffffff81193456>] ? dentry_lru_del_init+0x3e/0xa8
 [<ffffffff8119818e>] ? shrink_dcache_parent+0xdf/0x2eb
 [<ffffffff811981be>] shrink_dcache_parent+0x10f/0x2eb
 [<ffffffff811f59a4>] proc_flush_task+0xd4/0x26f
 [<ffffffff81065ea7>] release_task+0x47/0x650
 [<ffffffff81066e6a>] wait_consider_task+0x9ba/0x10b4
 [<ffffffff810676e1>] do_wait+0x17d/0x38b
 [<ffffffff810679e5>] sys_wait4+0xf6/0x119
 [<ffffffff810037cc>] ? sysret_check+0x27/0x62
 [<ffffffff810652a4>] ? child_wait_callback+0x0/0xa5
 [<ffffffff8100379b>] system_call_fastpath+0x16/0x1b

[jkacur@tycho tip-rt-2.6.33]$ addr2line -e vmlinux ffffffff811981be
/home/jkacur/rt.linux.git/fs/dcache.c:1033
[jkacur@tycho tip-rt-2.6.33]$ addr2line -e vmlinux ffffffff8119818e
/home/jkacur/rt.linux.git/fs/dcache.c:1023

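So both addresses are in the function static int select_parent(struct dentry * parent),
which is presumably inlined into shrink_dcache_parent (hence that name in the trace).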

line 1023 is
spin_lock(&this_parent->d_lock);

and lines 1032 and 1033 are
spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
dentry_lru_del_init(dentry);

If there is any other info I can provide, or any testing I can do, to 
help solve this, please let me know!

John Kacur


* Re: lockdep "splat" on v2.6.33.5-rt23
From: john stultz @ 2010-06-24 17:03 UTC
  To: John Kacur
  Cc: Nick Piggin, Peter Zijlstra, Thomas Gleixner, linux-kernel,
	linux-rt-users

On Thu, 2010-06-24 at 17:40 +0200, John Kacur wrote:
> I believe this is related to the dcache scale discussion thread. I've 
> shown this to Peter privately, but thought it would be useful to share it 
> with everyone so we all have the same info.
> The kernel is from tip/rt/2.6.33 up to commit 
> faf35813f204901f85dd0c6b3c5092e0064c6c2f
> It has a lot of debug options enabled, but is not modified.
> The "splat" is very easy to reproduce: it occurs every time I 
> boot the kernel on my T500.
> 
> =============================================
> [ INFO: possible recursive locking detected ]
> 2.6.33.5-rt23-tip-debug #3
> ---------------------------------------------
> init/1 is trying to acquire lock:
>  (&dentry->d_lock/1){+.+...}, at: [<ffffffff811981be>] shrink_dcache_parent+0x10f/0x2eb
> 
> but task is already holding lock:
>  (&dentry->d_lock/1){+.+...}, at: [<ffffffff811981be>] shrink_dcache_parent+0x10f/0x2eb

This looks like the issue Peter brought up earlier this week. I think
you were cc'ed (although it may have been your gmail account).

It seems my fix for the earlier dput/select_parent race is causing this.
Lockdep doesn't allow us to lock sub-chains, so any time select_parent
descends two directories down, this will trigger.
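
To make the trigger condition concrete, here is a rough user-space sketch of
the check behind "possible recursive locking detected". It is a toy model, not
kernel code: struct task_model, model_acquire() and model_descend() are
hypothetical names invented for illustration. It assumes lockdep keys each
held lock by (lock class, subclass), that a plain spin_lock() uses subclass 0,
and that DENTRY_D_LOCK_NESTED is subclass 1, as the "/1" in the report above
suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: every name below is hypothetical, none is a kernel API. */
#define MAX_HELD 8

struct held_lock {
	const char *class_name;	/* e.g. "&dentry->d_lock" */
	int subclass;		/* assumed: 0 = spin_lock(), 1 = DENTRY_D_LOCK_NESTED */
};

struct task_model {
	struct held_lock held[MAX_HELD];
	size_t depth;		/* number of locks currently held */
};

/*
 * Accept the acquire (true) unless a lock with the same class name and the
 * same subclass is already held, which is the case the model reports as
 * "possible recursive locking" (false).
 */
bool model_acquire(struct task_model *t, const char *class_name, int subclass)
{
	for (size_t i = 0; i < t->depth; i++) {
		if (strcmp(t->held[i].class_name, class_name) == 0 &&
		    t->held[i].subclass == subclass)
			return false;
	}
	assert(t->depth < MAX_HELD);
	t->held[t->depth].class_name = class_name;
	t->held[t->depth].subclass = subclass;
	t->depth++;
	return true;
}

/*
 * Take the top-level parent with subclass 0, then one child lock per level
 * with subclass 1, never releasing anything on the way down. Returns the
 * first level whose acquire is rejected, or 0 if all were accepted.
 */
int model_descend(int levels)
{
	struct task_model t = { .depth = 0 };

	if (!model_acquire(&t, "&dentry->d_lock", 0))
		return -1;	/* cannot happen: nothing is held yet */

	for (int level = 1; level <= levels; level++) {
		if (!model_acquire(&t, "&dentry->d_lock", 1))
			return level;
	}
	return 0;
}
```

In this model, model_descend(1) is accepted because the two held locks differ
in subclass, while model_descend(2) is rejected at level 2: the second child
lock reuses subclass 1 while the first is still held. That matches the "2
locks held" list in the report, but it says nothing about whether the real
locking order is actually unsafe.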

Offhand, I'm not sure what to do about it. We can't just revert, since
that will open the race up, and trying to serialize dput/select_parent
using something other than the parent/child dentry->d_locks will
probably be akin to the dcache_lock, and will hurt scalability.

I need to read through Nick's new patchset, see how it has changed, and
try to adapt any fixes to the -rt tree, but that's competing with some
other critical issues I'm working on at the moment.

thanks
-john


