linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* combinatorial explosion in lockdep
@ 2008-07-28 22:37 David Miller
  2008-07-28 22:50 ` Jeremy Fitzhardinge
  2008-07-28 23:13 ` David Miller
  0 siblings, 2 replies; 21+ messages in thread
From: David Miller @ 2008-07-28 22:37 UTC (permalink / raw)
  To: mingo; +Cc: peterz, linux-kernel


As Peter and some others have known I've been trying to track down a
full system wedge that occurs early in the userland boot on sparc64
Niagara systems when lockdep is enabled.  This has been happening
since I first added sparc64 lockdep support, the problem has always
been there.

Some fiddling around recently showed that booting with max_cpus=16
allowed the system to fully boot, albeit very slowly.

That was a clue.

My suspicion became that somehow the number of active runqueues
contributes to the problem.

Also, the problem seems to occur when udev is forked off to
load driver modules.

So I added a piece of debugging that would capture all cpu's program
counters and initial backtrace even if they had interrupts disabled,
when the softlockup watchdog triggers.  Then I lowered the softlockup
watchdog threshold to only a few seconds so that I could more quickly
trigger the dumps.

The triggering event seems to be kstopmachine as done by the first
module load.  This seems to create the situation where we do double RQ
locking on essentially all the run queues, in order to move the
kstopmachine threads around to their proprer cpus.  I think this is
what starts to make the lockdep dependency chains huge.

Then, all the cpus wedge trying to grab a particular RQ lock, with one
of the cpus cycling seemingly endlessly in find_usage_backwards().

At first I thought we were getting killed by recurion on sparc64.
After going 8 or more entries deep, every recursive call or return
from such a call will write or read 128 bytes from the stack for
the window spill/fill traps sparc uses to manage the register windows.
So I rewrote find_usage_{backwards/forwards} and check_usage() to
use iteration instead of recurion.

The problem persisted, and I was still seeing one of the cpus spending
tons of time in find_usage_backwards().

So I added debugging to count how many lock_class entries are visited
in a single top-level invocation of find_usage_{backwards,forwards}()
and then I printed out this information when the softlockup watchdog
triggered.

We find:

[ 1287.450897] BUG: max_forwards_checks[73290] max_backwards_checks[56968881]
[ 1287.456186] BUG: max_forwards_class --> (&rq->rq_lock_key#3){++..}
[ 1287.461866] BUG: max_backwards_class --> (&rt_b->rt_runtime_lock){+...}

Just under 57 million lock_class entries visited in a single top-level
find_usage_backwards() call, on rt_b->rt_runtime_lock.  No wonder the
machine grinds to a halt :-)

And on the forwards side, the winner is the suspected runqueue lock,
but it's not as bad as the backwards chain from the rt_runtime_lock.

I'm still digging on what exactly makes this happen, but I wanted to
get the information out as soon as I had something useful like this.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2008-08-04 12:22 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-07-28 22:37 combinatorial explosion in lockdep David Miller
2008-07-28 22:50 ` Jeremy Fitzhardinge
2008-07-28 22:56   ` David Miller
2008-07-28 23:13 ` David Miller
2008-07-28 23:51   ` Ingo Molnar
2008-07-29  0:44     ` David Miller
2008-07-30  4:45       ` David Miller
2008-07-30  6:56         ` David Miller
2008-07-30  7:21           ` Peter Zijlstra
2008-07-30  7:19         ` Peter Zijlstra
2008-07-30 11:15         ` Peter Zijlstra
2008-07-31 16:50           ` [stable] " Greg KH
2008-07-30 11:26         ` [PATCH] lockdep: change scheduler annotation Peter Zijlstra
2008-07-30 11:34           ` David Miller
2008-07-31 16:34           ` Ingo Molnar
2008-07-31 16:39         ` combinatorial explosion in lockdep Ingo Molnar
2008-08-01  9:22           ` Ingo Molnar
2008-08-01  9:32             ` David Miller
2008-08-01 11:57               ` Hugh Dickins
2008-08-03  8:14                 ` David Miller
2008-08-04 12:21                   ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).