linux-kernel.vger.kernel.org archive mirror
* combinatorial explosion in lockdep
@ 2008-07-28 22:37 David Miller
  2008-07-28 22:50 ` Jeremy Fitzhardinge
  2008-07-28 23:13 ` David Miller
  0 siblings, 2 replies; 21+ messages in thread
From: David Miller @ 2008-07-28 22:37 UTC (permalink / raw)
  To: mingo; +Cc: peterz, linux-kernel


As Peter and some others know, I've been trying to track down a
full system wedge that occurs early in the userland boot on sparc64
Niagara systems when lockdep is enabled.  This has been happening
since I first added sparc64 lockdep support; the problem has always
been there.

Some fiddling around recently showed that booting with max_cpus=16
allowed the system to fully boot, albeit very slowly.

That was a clue.

My suspicion was that the number of active runqueues somehow
contributes to the problem.

Also, the problem seems to occur when udev is forked off to
load driver modules.

So I added a piece of debugging that would capture all cpus' program
counters and initial backtraces even if they had interrupts disabled,
when the softlockup watchdog triggers.  Then I lowered the softlockup
watchdog threshold to only a few seconds so that I could more quickly
trigger the dumps.

The triggering event seems to be kstopmachine as done by the first
module load.  This seems to create the situation where we do double RQ
locking on essentially all the run queues, in order to move the
kstopmachine threads around to their proper cpus.  I think this is
what starts to make the lockdep dependency chains huge.
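
(As a rough illustration of that double-locking pattern, here is a hedged
user-space sketch, loosely modelled on the scheduler's double_rq_lock() but
not the actual code; the pthread mutexes, the NR_RQ value and the index-based
ordering are stand-ins.  With a separate lock class per runqueue, every
ordered pair exercised this way becomes a distinct dependency edge in
lockdep's graph.)

#include <pthread.h>
#include <stdio.h>

#define NR_RQ 4	/* stand-in for the number of cpus/runqueues */

static pthread_mutex_t rq_lock[NR_RQ] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/* Take two runqueue locks in a stable order to avoid ABBA deadlock. */
static void double_rq_lock(int cpu1, int cpu2)
{
	if (cpu1 == cpu2) {
		pthread_mutex_lock(&rq_lock[cpu1]);
	} else if (cpu1 < cpu2) {
		pthread_mutex_lock(&rq_lock[cpu1]);
		pthread_mutex_lock(&rq_lock[cpu2]);	/* per-cpu classes would record edge rq1 -> rq2 */
	} else {
		pthread_mutex_lock(&rq_lock[cpu2]);
		pthread_mutex_lock(&rq_lock[cpu1]);	/* per-cpu classes would record edge rq2 -> rq1 */
	}
}

static void double_rq_unlock(int cpu1, int cpu2)
{
	pthread_mutex_unlock(&rq_lock[cpu1]);
	if (cpu1 != cpu2)
		pthread_mutex_unlock(&rq_lock[cpu2]);
}

int main(void)
{
	int i, j, pairs = 0;

	/* Migrating threads between all cpu pairs exercises every ordering. */
	for (i = 0; i < NR_RQ; i++)
		for (j = i + 1; j < NR_RQ; j++) {
			double_rq_lock(i, j);
			double_rq_unlock(i, j);
			pairs++;
		}
	printf("%d distinct lock-order pairs for %d runqueues\n", pairs, NR_RQ);
	return 0;
}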

Then, all the cpus wedge trying to grab a particular RQ lock, with one
of the cpus cycling seemingly endlessly in find_usage_backwards().

At first I thought we were getting killed by recursion on sparc64.
After going 8 or more entries deep, every recursive call or return
from such a call will write or read 128 bytes from the stack for
the window spill/fill traps sparc uses to manage the register windows.
So I rewrote find_usage_{backwards,forwards}() and check_usage() to
use iteration instead of recursion.
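
As a minimal user-space sketch of that rewrite (illustrative only, not the
lockdep code; the node layout and the STACK_LIMIT bound are assumptions),
the same reachability query can be driven from an explicit worklist so that
a deep chain no longer costs a register-window spill/fill per level:

#include <stdbool.h>
#include <stdio.h>

#define MAX_DEPS	4	/* last slot stays NULL as a terminator */
#define STACK_LIMIT	64	/* assumed bound, in the spirit of RECURSION_LIMIT */

struct node {
	struct node *after[MAX_DEPS];	/* NULL-terminated forward deps */
	int usage_bit;
};

/* Recursive form, roughly how the find_usage_*() helpers worked. */
static bool usage_recursive(struct node *src, int bit)
{
	int i;

	if (src->usage_bit == bit)
		return true;
	for (i = 0; src->after[i]; i++)
		if (usage_recursive(src->after[i], bit))
			return true;
	return false;
}

/* Iterative form: pending nodes live on an explicit worklist instead. */
static bool usage_iterative(struct node *src, int bit)
{
	struct node *stack[STACK_LIMIT];
	int top = 0, i;

	stack[top++] = src;
	while (top) {
		struct node *cur = stack[--top];

		if (cur->usage_bit == bit)
			return true;
		for (i = 0; cur->after[i]; i++)
			if (top < STACK_LIMIT)
				stack[top++] = cur->after[i];
	}
	return false;
}

int main(void)
{
	struct node c = { .usage_bit = 1 };
	struct node b = { .after = { &c } };
	struct node a = { .after = { &b } };

	printf("recursive: %d  iterative: %d\n",
	       usage_recursive(&a, 1), usage_iterative(&a, 1));
	return 0;
}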

The problem persisted, and I was still seeing one of the cpus spending
tons of time in find_usage_backwards().

So I added debugging to count how many lock_class entries are visited
in a single top-level invocation of find_usage_{backwards,forwards}()
and then I printed out this information when the softlockup watchdog
triggered.

We find:

[ 1287.450897] BUG: max_forwards_checks[73290] max_backwards_checks[56968881]
[ 1287.456186] BUG: max_forwards_class --> (&rq->rq_lock_key#3){++..}
[ 1287.461866] BUG: max_backwards_class --> (&rt_b->rt_runtime_lock){+...}

Just under 57 million lock_class entries visited in a single top-level
find_usage_backwards() call, on rt_b->rt_runtime_lock.  No wonder the
machine grinds to a halt :-)

And on the forwards side, the winner is the suspected runqueue lock,
but it's not as bad as the backwards chain from the rt_runtime_lock.

I'm still digging on what exactly makes this happen, but I wanted to
get the information out as soon as I had something useful like this.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-07-28 22:37 combinatorial explosion in lockdep David Miller
@ 2008-07-28 22:50 ` Jeremy Fitzhardinge
  2008-07-28 22:56   ` David Miller
  2008-07-28 23:13 ` David Miller
  1 sibling, 1 reply; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-28 22:50 UTC (permalink / raw)
  To: David Miller; +Cc: mingo, peterz, linux-kernel, Rusty Russell

David Miller wrote:
> The triggering event seems to be kstopmachine as done by the first
> module load.  This seems to create the situation where we do double RQ
> locking on essentially all the run queues, in order to move the
> kstopmachine threads around to their proper cpus.  I think this is
> what starts to make the lockdep dependency chains huge.
>   

Does Rusty's stop_machine rewrite which puts the threads on the correct 
cpus to start with at least mitigate the symptoms?

    J

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-07-28 22:50 ` Jeremy Fitzhardinge
@ 2008-07-28 22:56   ` David Miller
  0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2008-07-28 22:56 UTC (permalink / raw)
  To: jeremy; +Cc: mingo, peterz, linux-kernel, rusty

From: Jeremy Fitzhardinge <jeremy@goop.org>
Date: Mon, 28 Jul 2008 15:50:04 -0700

> David Miller wrote:
> > The triggering event seems to be kstopmachine as done by the first
> > module load.  This seems to create the situation where we do double RQ
> > locking on essentially all the run queues, in order to move the
> > kstopmachine threads around to their proper cpus.  I think this is
> > what starts to make the lockdep dependency chains huge.
> >   
> 
> Does Rusty's stop_machine rewrite which puts the threads on the correct 
> cpus to start with at least mitigate the symptoms?

I could check that, but it won't keep this from happening
and being triggerable by a trivial userland program setting
scheduler affinities.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-07-28 22:37 combinatorial explosion in lockdep David Miller
  2008-07-28 22:50 ` Jeremy Fitzhardinge
@ 2008-07-28 23:13 ` David Miller
  2008-07-28 23:51   ` Ingo Molnar
  1 sibling, 1 reply; 21+ messages in thread
From: David Miller @ 2008-07-28 23:13 UTC (permalink / raw)
  To: mingo; +Cc: peterz, linux-kernel

From: David Miller <davem@davemloft.net>
Date: Mon, 28 Jul 2008 15:37:02 -0700 (PDT)

> I'm still digging on what exactly makes this happen, but I wanted to
> get the information out as soon as I had something useful like this.

As a simple hack, I tried mitigating these effects using a
generation-count based "visited" scheme to bypass traversing
lock_class chains we've already walked down.

It seems to work.

But I wonder if the real problem is that there is some unintended
loop in the chains, or something like that.

Anyways, just some more info...

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 2486eb4..1bfdc30 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -89,6 +89,7 @@ struct lock_class {
 
 	struct lockdep_subclass_key	*key;
 	unsigned int			subclass;
+	unsigned int			dep_gen_id;
 
 	/*
 	 * IRQ/softirq usage tracking bits:
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index d38a643..59218a0 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -959,6 +959,18 @@ static int noinline print_infinite_recursion_bug(void)
 	return 0;
 }
 
+static unsigned int dependency_gen_id;
+
+static bool dependency_visit(struct lock_class *source, unsigned int depth)
+{
+	if (!depth)
+		dependency_gen_id++;
+	if (source->dep_gen_id == dependency_gen_id)
+		return true;
+	source->dep_gen_id = dependency_gen_id;
+	return false;
+}
+
 /*
  * Prove that the dependency graph starting at <entry> can not
  * lead to <target>. Print an error and return 0 if it does.
@@ -968,6 +980,9 @@ check_noncircular(struct lock_class *source, unsigned int depth)
 {
 	struct lock_list *entry;
 
+	if (dependency_visit(source, depth))
+		return 1;
+
 	debug_atomic_inc(&nr_cyclic_check_recursions);
 	if (depth > max_recursion_depth)
 		max_recursion_depth = depth;
@@ -1011,6 +1026,9 @@ find_usage_forwards(struct lock_class *source, unsigned int depth)
 	struct lock_list *entry;
 	int ret;
 
+	if (dependency_visit(source, depth))
+		return 1;
+
 	if (depth > max_recursion_depth)
 		max_recursion_depth = depth;
 	if (depth >= RECURSION_LIMIT)
@@ -1050,6 +1068,9 @@ find_usage_backwards(struct lock_class *source, unsigned int depth)
 	struct lock_list *entry;
 	int ret;
 
+	if (dependency_visit(source, depth))
+		return 1;
+
 	if (!__raw_spin_is_locked(&lockdep_lock))
 		return DEBUG_LOCKS_WARN_ON(1);
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-07-28 23:13 ` David Miller
@ 2008-07-28 23:51   ` Ingo Molnar
  2008-07-29  0:44     ` David Miller
  0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2008-07-28 23:51 UTC (permalink / raw)
  To: David Miller; +Cc: peterz, linux-kernel


* David Miller <davem@davemloft.net> wrote:

> From: David Miller <davem@davemloft.net>
> Date: Mon, 28 Jul 2008 15:37:02 -0700 (PDT)
> 
> > I'm still digging on what exactly makes this happen, but I wanted to 
> > get the information out as soon as I had something useful like this.
> 
> As a simple hack, I tried mitigating these effects using a 
> generation-count based "visited" scheme to bypass traversing 
> lock_class chains we've already walked down.
> 
> It seems to work.

nice!

Any chance to get the "cat /proc/lockdep*" output, so that we could see 
and check the expected behavior of the full graph?

But with 128 cpus and the double locking taken in the scheduler, there 
are around 127*128/2 possible combinations of cpu rq lock ordering, that's 
thousands of chains, and that takes a quadratic number of steps to 
iterate via the current algorithm - which would be in the neighborhood 
of 127^4 / 4, or about 65 million steps for each new change to the 
graph. (but it's late and i'm not sure at all about the numbers so i 
could be off by a few orders of magnitude, in either direction)
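
A quick check of that arithmetic, as a hedged sketch (the quadratic-in-chains
cost is the rough model above, not a measured figure):

#include <stdio.h>

int main(void)
{
	unsigned long cpus = 128;
	unsigned long pairs = cpus * (cpus - 1) / 2;	/* rq lock orderings */

	printf("lock-order pairs:        %lu\n", pairs);		/* 8128 */
	printf("roughly quadratic steps: %lu\n", pairs * pairs);	/* ~66 million */
	return 0;
}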

Plus the irqsafe/softirqsafe layers multiply things too.

Could you please also send the non-stack-iterator changes you did as 
well - that's cool stuff too!

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-07-28 23:51   ` Ingo Molnar
@ 2008-07-29  0:44     ` David Miller
  2008-07-30  4:45       ` David Miller
  0 siblings, 1 reply; 21+ messages in thread
From: David Miller @ 2008-07-29  0:44 UTC (permalink / raw)
  To: mingo; +Cc: peterz, linux-kernel

From: Ingo Molnar <mingo@elte.hu>
Date: Tue, 29 Jul 2008 01:51:33 +0200

> Any chance to get the "cat /proc/lockdep*" output, so that we could see 
> and check the expected behavior of the full graph?

/proc/lockdep loops forever in count_forward_deps() :-)

I was able to capture a copy of lockdep_chains:

http://vger.kernel.org/~davem/lockdep_chains.bz2

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-07-29  0:44     ` David Miller
@ 2008-07-30  4:45       ` David Miller
  2008-07-30  6:56         ` David Miller
                           ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: David Miller @ 2008-07-30  4:45 UTC (permalink / raw)
  To: mingo; +Cc: peterz, linux-kernel

From: David Miller <davem@davemloft.net>
Date: Mon, 28 Jul 2008 17:44:15 -0700 (PDT)

> From: Ingo Molnar <mingo@elte.hu>
> Date: Tue, 29 Jul 2008 01:51:33 +0200
> 
> > Any chance to get the "cat /proc/lockdep*" output, so that we could see 
> > and check the expected behavior of the full graph?
> 
> /proc/lockdep loops forever in count_forward_deps() :-)
> 
> I was able to capture a copy of lockdep_chains:
> 
> http://vger.kernel.org/~davem/lockdep_chains.bz2

As a followup I dumped the full backwards search graph once the cost
got up to about (1 * 1024 * 1024) checks or so.

I didn't find any loops, but it is clear that the cost is huge because
of the runqueue lock double-locking, without the generation count
patch I posted the other day.

For example, if you start with the first runqueue lock you search one
entry:

	1

Next, if you start with the second runqueue lock you search two
entries:

	2, 1

Next, if you start with the third runqueue lock you search
4 entries:

	3, 2, 1, 1

And the next series is:

	4, 3, 2, 1, 1, 2, 1, 1

and so on and so forth.  So the cost of a full backwards graph
traversal for N cpus is on the order of "1 << (N - 1)".  So with just
32 cpus the cost is on the order of a few billions of checks.

And then you have to factor in all of those runqueue locks' backwards
graphs that don't involve other runqueue locks (on my machine each
such sub-graph is about 200 locks deep).
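
Here is a toy reproduction of that blow-up, as a hedged user-space sketch
rather than the lockdep code itself: node n's backwards list contains every
node below it, the way the runqueue locks end up related, and a
generation-count "visited" mark collapses the 1 << (N - 1) walk down to a
single pass over the graph.

#include <stdio.h>

#define NR_NODES 24

static unsigned int dep_gen_id[NR_NODES];	/* cf. the new lock_class field */
static unsigned int gen_id;
static unsigned long long checks;

/* Backwards walk with no memory of what has already been visited. */
static void walk_plain(int n)
{
	int i;

	checks++;
	for (i = 0; i < n; i++)
		walk_plain(i);
}

/* Same walk, but each node is expanded at most once per generation. */
static void walk_marked(int n)
{
	int i;

	checks++;
	if (dep_gen_id[n] == gen_id)
		return;
	dep_gen_id[n] = gen_id;
	for (i = 0; i < n; i++)
		walk_marked(i);
}

int main(void)
{
	int top = NR_NODES - 1;

	walk_plain(top);
	printf("plain walk:  %llu checks\n", checks);	/* 2^23 = 8388608 */

	checks = 0;
	gen_id++;			/* new top-level query, new generation */
	walk_marked(top);
	printf("marked walk: %llu checks\n", checks);	/* 277 */
	return 0;
}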

Here is an updated version of my patch to solve this problem.  The only
unnice bit is that I had to move the procfs dep counting code into
lockdep.c and run it under the lockdep_lock.  This is the only way to
safely employ the dependency generation ID marking algorithm the
short-circuiting uses.

If we can agree on this as a fix, it should definitely be backported
and submitted for -stable :-)

----------------------------------------

lockdep: Fix combinatorial explosion in lock subgraph traversal.

When we traverse the graph, either forwards or backwards, we
are interested in whether a certain property exists somewhere
in a node reachable in the graph.

Therefore it is never necessary to traverse through a node more
than once to get a correct answer to the given query.

Take advantage of this property using a global ID counter so that we
need not clear all the markers in all the lock_class entries before
doing a traversal.  A new ID is chosen when we start to traverse, and
we continue through a lock_class only if its ID hasn't been marked
with the new value yet.

This short-circuiting is essential especially for high CPU count
systems.  The scheduler has a runqueue per cpu, and needs to take
two runqueue locks at a time, which leads to long chains of
backwards and forwards subgraphs from these runqueue lock nodes.
Without the short-circuit implemented here, a graph traversal on
a runqueue lock can take up to (1 << (N - 1)) checks on a system
with N cpus.

For anything more than 16 cpus or so, lockdep will eventually bring
the machine to a complete standstill.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 2486eb4..1bfdc30 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -89,6 +89,7 @@ struct lock_class {
 
 	struct lockdep_subclass_key	*key;
 	unsigned int			subclass;
+	unsigned int			dep_gen_id;
 
 	/*
 	 * IRQ/softirq usage tracking bits:
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index d38a643..6999e64 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -372,6 +372,19 @@ unsigned int nr_process_chains;
 unsigned int max_lockdep_depth;
 unsigned int max_recursion_depth;
 
+static unsigned int lockdep_dependency_gen_id;
+
+static bool lockdep_dependency_visit(struct lock_class *source,
+				     unsigned int depth)
+{
+	if (!depth)
+		lockdep_dependency_gen_id++;
+	if (source->dep_gen_id == lockdep_dependency_gen_id)
+		return true;
+	source->dep_gen_id = lockdep_dependency_gen_id;
+	return false;
+}
+
 #ifdef CONFIG_DEBUG_LOCKDEP
 /*
  * We cannot printk in early bootup code. Not even early_printk()
@@ -558,6 +571,9 @@ static void print_lock_dependencies(struct lock_class *class, int depth)
 {
 	struct lock_list *entry;
 
+	if (lockdep_dependency_visit(class, depth))
+		return;
+
 	if (DEBUG_LOCKS_WARN_ON(depth >= 20))
 		return;
 
@@ -959,6 +975,67 @@ static int noinline print_infinite_recursion_bug(void)
 	return 0;
 }
 
+unsigned long __lockdep_count_forward_deps(struct lock_class *class,
+					   unsigned int depth)
+{
+	struct lock_list *entry;
+	unsigned long ret = 1;
+
+	if (lockdep_dependency_visit(class, depth))
+		return 0;
+
+	/*
+	 * Recurse this class's dependency list:
+	 */
+	list_for_each_entry(entry, &class->locks_after, entry)
+		ret += __lockdep_count_forward_deps(entry->class, depth + 1);
+
+	return ret;
+}
+
+unsigned long lockdep_count_forward_deps(struct lock_class *class)
+{
+	unsigned long ret, flags;
+
+	local_irq_save(flags);
+	__raw_spin_lock(&lockdep_lock);
+	ret = __lockdep_count_forward_deps(class, 0);
+	__raw_spin_unlock(&lockdep_lock);
+	local_irq_restore(flags);
+
+	return ret;
+}
+
+unsigned long __lockdep_count_backward_deps(struct lock_class *class,
+					    unsigned int depth)
+{
+	struct lock_list *entry;
+	unsigned long ret = 1;
+
+	if (lockdep_dependency_visit(class, depth))
+		return 0;
+	/*
+	 * Recurse this class's dependency list:
+	 */
+	list_for_each_entry(entry, &class->locks_before, entry)
+		ret += __lockdep_count_backward_deps(entry->class, depth + 1);
+
+	return ret;
+}
+
+unsigned long lockdep_count_backward_deps(struct lock_class *class)
+{
+	unsigned long ret, flags;
+
+	local_irq_save(flags);
+	__raw_spin_lock(&lockdep_lock);
+	ret = __lockdep_count_backward_deps(class, 0);
+	__raw_spin_unlock(&lockdep_lock);
+	local_irq_restore(flags);
+
+	return ret;
+}
+
 /*
  * Prove that the dependency graph starting at <entry> can not
  * lead to <target>. Print an error and return 0 if it does.
@@ -968,6 +1045,9 @@ check_noncircular(struct lock_class *source, unsigned int depth)
 {
 	struct lock_list *entry;
 
+	if (lockdep_dependency_visit(source, depth))
+		return 1;
+
 	debug_atomic_inc(&nr_cyclic_check_recursions);
 	if (depth > max_recursion_depth)
 		max_recursion_depth = depth;
@@ -1011,6 +1091,9 @@ find_usage_forwards(struct lock_class *source, unsigned int depth)
 	struct lock_list *entry;
 	int ret;
 
+	if (lockdep_dependency_visit(source, depth))
+		return 1;
+
 	if (depth > max_recursion_depth)
 		max_recursion_depth = depth;
 	if (depth >= RECURSION_LIMIT)
@@ -1050,6 +1133,9 @@ find_usage_backwards(struct lock_class *source, unsigned int depth)
 	struct lock_list *entry;
 	int ret;
 
+	if (lockdep_dependency_visit(source, depth))
+		return 1;
+
 	if (!__raw_spin_is_locked(&lockdep_lock))
 		return DEBUG_LOCKS_WARN_ON(1);
 
diff --git a/kernel/lockdep_internals.h b/kernel/lockdep_internals.h
index c3600a0..68d44ec 100644
--- a/kernel/lockdep_internals.h
+++ b/kernel/lockdep_internals.h
@@ -53,6 +53,9 @@ extern unsigned int nr_process_chains;
 extern unsigned int max_lockdep_depth;
 extern unsigned int max_recursion_depth;
 
+extern unsigned long lockdep_count_forward_deps(struct lock_class *);
+extern unsigned long lockdep_count_backward_deps(struct lock_class *);
+
 #ifdef CONFIG_DEBUG_LOCKDEP
 /*
  * Various lockdep statistics:
diff --git a/kernel/lockdep_proc.c b/kernel/lockdep_proc.c
index 9b0e940..6252ff7 100644
--- a/kernel/lockdep_proc.c
+++ b/kernel/lockdep_proc.c
@@ -63,34 +63,6 @@ static void l_stop(struct seq_file *m, void *v)
 {
 }
 
-static unsigned long count_forward_deps(struct lock_class *class)
-{
-	struct lock_list *entry;
-	unsigned long ret = 1;
-
-	/*
-	 * Recurse this class's dependency list:
-	 */
-	list_for_each_entry(entry, &class->locks_after, entry)
-		ret += count_forward_deps(entry->class);
-
-	return ret;
-}
-
-static unsigned long count_backward_deps(struct lock_class *class)
-{
-	struct lock_list *entry;
-	unsigned long ret = 1;
-
-	/*
-	 * Recurse this class's dependency list:
-	 */
-	list_for_each_entry(entry, &class->locks_before, entry)
-		ret += count_backward_deps(entry->class);
-
-	return ret;
-}
-
 static void print_name(struct seq_file *m, struct lock_class *class)
 {
 	char str[128];
@@ -124,10 +96,10 @@ static int l_show(struct seq_file *m, void *v)
 #ifdef CONFIG_DEBUG_LOCKDEP
 	seq_printf(m, " OPS:%8ld", class->ops);
 #endif
-	nr_forward_deps = count_forward_deps(class);
+	nr_forward_deps = lockdep_count_forward_deps(class);
 	seq_printf(m, " FD:%5ld", nr_forward_deps);
 
-	nr_backward_deps = count_backward_deps(class);
+	nr_backward_deps = lockdep_count_backward_deps(class);
 	seq_printf(m, " BD:%5ld", nr_backward_deps);
 
 	get_usage_chars(class, &c1, &c2, &c3, &c4);
@@ -350,7 +322,7 @@ static int lockdep_stats_show(struct seq_file *m, void *v)
 		if (class->usage_mask & LOCKF_ENABLED_HARDIRQS_READ)
 			nr_hardirq_read_unsafe++;
 
-		sum_forward_deps += count_forward_deps(class);
+		sum_forward_deps += lockdep_count_forward_deps(class);
 	}
 #ifdef CONFIG_DEBUG_LOCKDEP
 	DEBUG_LOCKS_WARN_ON(debug_atomic_read(&nr_unused_locks) != nr_unused);

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-07-30  4:45       ` David Miller
@ 2008-07-30  6:56         ` David Miller
  2008-07-30  7:21           ` Peter Zijlstra
  2008-07-30  7:19         ` Peter Zijlstra
                           ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: David Miller @ 2008-07-30  6:56 UTC (permalink / raw)
  To: mingo; +Cc: peterz, linux-kernel

From: David Miller <davem@davemloft.net>
Date: Tue, 29 Jul 2008 21:45:03 -0700 (PDT)

> If we can agree on this as a fix, it should definitely be backported
> and submitted for -stable :-)
> 
> ----------------------------------------
> 
> lockdep: Fix combinatorial explosion in lock subgraph traversal.

So the next problem I run into is that 8192 is not a large
enough MAX_LOCKDEP_ENTRIES for these larger cpu count boxes.
8192 isn't even big enough for my 64 cpu box.

It needs to be scaled by NR_CPUS, or something like that.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-07-30  4:45       ` David Miller
  2008-07-30  6:56         ` David Miller
@ 2008-07-30  7:19         ` Peter Zijlstra
  2008-07-30 11:15         ` Peter Zijlstra
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2008-07-30  7:19 UTC (permalink / raw)
  To: David Miller; +Cc: mingo, linux-kernel

On Tue, 2008-07-29 at 21:45 -0700, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Mon, 28 Jul 2008 17:44:15 -0700 (PDT)
> 
> > From: Ingo Molnar <mingo@elte.hu>
> > Date: Tue, 29 Jul 2008 01:51:33 +0200
> > 
> > > Any chance to get the "cat /proc/lockdep*" output, so that we could see 
> > > and check the expected behavior of the full graph?
> > 
> > /proc/lockdep loops forever in count_forward_deps() :-)
> > 
> > I was able to capture a copy of lockdep_chains:
> > 
> > http://vger.kernel.org/~davem/lockdep_chains.bz2
> 
> As a followup I dumped the full backwards search graph once the cost
> got up to about (1 * 1024 * 1024) checks or so.
> 
> I didn't find any loops, but it is clear that the cost is huge because
> of the runqueue lock double-locking, without the generation count
> patch I posted the other day.
> 
> For example, if you start with the first runqueue lock you search one
> entry:
> 
> 	1
> 
> Next, if you start with the second runqueue lock you search two
> entries:
> 
> 	2, 1
> 
> Next, if you start with the third runqueue lock you search
> 4 entries:
> 
> 	3, 2, 1, 1
> 
> And the next series is:
> 
> 	4, 3, 2, 1, 1, 2, 1, 1
> 
> and so on and so forth.  So the cost of a full backwards graph
> traversal for N cpus is on the order of "1 << (N - 1)".  So with just
> 32 cpus the cost is on the order of a few billions of checks.
> 
> And then you have to factor in all of those runqueue locks' backwards
> graphs that don't involve other runqueue locks (on my machine each
> such sub-graph is about 200 locks deep).
> 
> Here is an updated version of my patch to solve this problem.  The only
> unnice bit is that I had to move the procfs dep counting code into
> lockdep.c and run it under the lockdep_lock.  This is the only way to
> safely employ the dependency generation ID marking algorithm the
> short-circuiting uses.
> 
> If we can agree on this as a fix, it should definitely be backported
> and submitted for -stable :-)

Way cool stuff - will try and wrap my brains around it asap.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-07-30  6:56         ` David Miller
@ 2008-07-30  7:21           ` Peter Zijlstra
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2008-07-30  7:21 UTC (permalink / raw)
  To: David Miller; +Cc: mingo, linux-kernel

On Tue, 2008-07-29 at 23:56 -0700, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Tue, 29 Jul 2008 21:45:03 -0700 (PDT)
> 
> > If we can agree on this as a fix, it should definitely be backported
> > and submitted for -stable :-)
> > 
> > ----------------------------------------
> > 
> > lockdep: Fix combinatorial explosion in lock subgraph traversal.
> 
> So the next problem I run into is that 8192 is not a large
> enough MAX_LOCKDEP_ENTRIES for these larger cpu count boxes.
> 8192 isn't even big enough for my 64 cpu box.
> 
> It needs to be scaled by NR_CPUS, or something like that.

Or something like that indeed - if distro kernels are going to be built
using NR_CPUS=4096, then these tables might explode - then again,
lockdep is only enabled on -debug kernels anyway...

So yeah.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-07-30  4:45       ` David Miller
  2008-07-30  6:56         ` David Miller
  2008-07-30  7:19         ` Peter Zijlstra
@ 2008-07-30 11:15         ` Peter Zijlstra
  2008-07-31 16:50           ` [stable] " Greg KH
  2008-07-30 11:26         ` [PATCH] lockdep: change scheduler annotation Peter Zijlstra
  2008-07-31 16:39         ` combinatorial explosion in lockdep Ingo Molnar
  4 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2008-07-30 11:15 UTC (permalink / raw)
  To: David Miller; +Cc: mingo, linux-kernel, stable

On Tue, 2008-07-29 at 21:45 -0700, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Mon, 28 Jul 2008 17:44:15 -0700 (PDT)
> 
> > From: Ingo Molnar <mingo@elte.hu>
> > Date: Tue, 29 Jul 2008 01:51:33 +0200
> > 
> > > Any chance to get the "cat /proc/lockdep*" output, so that we could see 
> > > and check the expected behavior of the full graph?
> > 
> > /proc/lockdep loops forever in count_forward_deps() :-)
> > 
> > I was able to capture a copy of lockdep_chains:
> > 
> > http://vger.kernel.org/~davem/lockdep_chains.bz2
> 
> As a followup I dumped the full backwards search graph once the cost
> got up to about (1 * 1024 * 1024) checks or so.
> 
> I didn't find any loops, but it is clear that the cost is huge because
> of the runqueue lock double-locking, without the generation count
> patch I posted the other day.
> 
> For example, if you start with the first runqueue lock you search one
> entry:
> 
> 	1
> 
> Next, if you start with the second runqueue lock you search two
> entries:
> 
> 	2, 1
> 
> Next, if you start with the third runqueue lock you search
> 4 entries:
> 
> 	3, 2, 1, 1
> 
> And the next series is:
> 
> 	4, 3, 2, 1, 1, 2, 1, 1
> 
> and so on and so forth.  So the cost of a full backwards graph
> traversal for N cpus is on the order of "1 << (N - 1)".  So with just
> 32 cpus the cost is on the order of a few billions of checks.
> 
> > And then you have to factor in all of those runqueue locks' backwards
> graphs that don't involve other runqueue locks (on my machine each
> such sub-graph is about 200 locks deep).
> 
> Here is an updated version of my patch to solve this problem.  The only
> unnice bit is that I had to move the procfs dep counting code into
> lockdep.c and run it under the lockdep_lock.  This is the only way to
> safely employ the dependency generation ID marking algorithm the
> short-circuiting uses.
> 
> If we can agree on this as a fix, it should definitely be backported
> and submitted for -stable :-)

Agreed, adding the stable team to the CC.

> ----------------------------------------
> 
> lockdep: Fix combinatorial explosion in lock subgraph traversal.
> 
> When we traverse the graph, either forwards or backwards, we
> are interested in whether a certain property exists somewhere
> in a node reachable in the graph.
> 
> Therefore it is never necessary to traverse through a node more
> than once to get a correct answer to the given query.
> 
> Take advantage of this property using a global ID counter so that we
> need not clear all the markers in all the lock_class entries before
> doing a traversal.  A new ID is chosen when we start to traverse, and
> we continue through a lock_class only if its ID hasn't been marked
> with the new value yet.
> 
> This short-circuiting is essential especially for high CPU count
> systems.  The scheduler has a runqueue per cpu, and needs to take
> two runqueue locks at a time, which leads to long chains of
> backwards and forwards subgraphs from these runqueue lock nodes.
> Without the short-circuit implemented here, a graph traversal on
> a runqueue lock can take up to (1 << (N - 1)) checks on a system
> with N cpus.
> 
> For anything more than 16 cpus or so, lockdep will eventually bring
> the machine to a complete standstill.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index 2486eb4..1bfdc30 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -89,6 +89,7 @@ struct lock_class {
>  
>  	struct lockdep_subclass_key	*key;
>  	unsigned int			subclass;
> +	unsigned int			dep_gen_id;
>  
>  	/*
>  	 * IRQ/softirq usage tracking bits:
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> index d38a643..6999e64 100644
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -372,6 +372,19 @@ unsigned int nr_process_chains;
>  unsigned int max_lockdep_depth;
>  unsigned int max_recursion_depth;
>  
> +static unsigned int lockdep_dependency_gen_id;
> +
> +static bool lockdep_dependency_visit(struct lock_class *source,
> +				     unsigned int depth)
> +{
> +	if (!depth)
> +		lockdep_dependency_gen_id++;
> +	if (source->dep_gen_id == lockdep_dependency_gen_id)
> +		return true;
> +	source->dep_gen_id = lockdep_dependency_gen_id;
> +	return false;
> +}
> +
>  #ifdef CONFIG_DEBUG_LOCKDEP
>  /*
>   * We cannot printk in early bootup code. Not even early_printk()
> @@ -558,6 +571,9 @@ static void print_lock_dependencies(struct lock_class *class, int depth)
>  {
>  	struct lock_list *entry;
>  
> +	if (lockdep_dependency_visit(class, depth))
> +		return;
> +
>  	if (DEBUG_LOCKS_WARN_ON(depth >= 20))
>  		return;
>  
> @@ -959,6 +975,67 @@ static int noinline print_infinite_recursion_bug(void)
>  	return 0;
>  }
>  
> +unsigned long __lockdep_count_forward_deps(struct lock_class *class,
> +					   unsigned int depth)
> +{
> +	struct lock_list *entry;
> +	unsigned long ret = 1;
> +
> +	if (lockdep_dependency_visit(class, depth))
> +		return 0;
> +
> +	/*
> +	 * Recurse this class's dependency list:
> +	 */
> +	list_for_each_entry(entry, &class->locks_after, entry)
> +		ret += __lockdep_count_forward_deps(entry->class, depth + 1);
> +
> +	return ret;
> +}
> +
> +unsigned long lockdep_count_forward_deps(struct lock_class *class)
> +{
> +	unsigned long ret, flags;
> +
> +	local_irq_save(flags);
> +	__raw_spin_lock(&lockdep_lock);
> +	ret = __lockdep_count_forward_deps(class, 0);
> +	__raw_spin_unlock(&lockdep_lock);
> +	local_irq_restore(flags);
> +
> +	return ret;
> +}
> +
> +unsigned long __lockdep_count_backward_deps(struct lock_class *class,
> +					    unsigned int depth)
> +{
> +	struct lock_list *entry;
> +	unsigned long ret = 1;
> +
> +	if (lockdep_dependency_visit(class, depth))
> +		return 0;
> +	/*
> +	 * Recurse this class's dependency list:
> +	 */
> +	list_for_each_entry(entry, &class->locks_before, entry)
> +		ret += __lockdep_count_backward_deps(entry->class, depth + 1);
> +
> +	return ret;
> +}
> +
> +unsigned long lockdep_count_backward_deps(struct lock_class *class)
> +{
> +	unsigned long ret, flags;
> +
> +	local_irq_save(flags);
> +	__raw_spin_lock(&lockdep_lock);
> +	ret = __lockdep_count_backward_deps(class, 0);
> +	__raw_spin_unlock(&lockdep_lock);
> +	local_irq_restore(flags);
> +
> +	return ret;
> +}
> +
>  /*
>   * Prove that the dependency graph starting at <entry> can not
>   * lead to <target>. Print an error and return 0 if it does.
> @@ -968,6 +1045,9 @@ check_noncircular(struct lock_class *source, unsigned int depth)
>  {
>  	struct lock_list *entry;
>  
> +	if (lockdep_dependency_visit(source, depth))
> +		return 1;
> +
>  	debug_atomic_inc(&nr_cyclic_check_recursions);
>  	if (depth > max_recursion_depth)
>  		max_recursion_depth = depth;
> @@ -1011,6 +1091,9 @@ find_usage_forwards(struct lock_class *source, unsigned int depth)
>  	struct lock_list *entry;
>  	int ret;
>  
> +	if (lockdep_dependency_visit(source, depth))
> +		return 1;
> +
>  	if (depth > max_recursion_depth)
>  		max_recursion_depth = depth;
>  	if (depth >= RECURSION_LIMIT)
> @@ -1050,6 +1133,9 @@ find_usage_backwards(struct lock_class *source, unsigned int depth)
>  	struct lock_list *entry;
>  	int ret;
>  
> +	if (lockdep_dependency_visit(source, depth))
> +		return 1;
> +
>  	if (!__raw_spin_is_locked(&lockdep_lock))
>  		return DEBUG_LOCKS_WARN_ON(1);
>  
> diff --git a/kernel/lockdep_internals.h b/kernel/lockdep_internals.h
> index c3600a0..68d44ec 100644
> --- a/kernel/lockdep_internals.h
> +++ b/kernel/lockdep_internals.h
> @@ -53,6 +53,9 @@ extern unsigned int nr_process_chains;
>  extern unsigned int max_lockdep_depth;
>  extern unsigned int max_recursion_depth;
>  
> +extern unsigned long lockdep_count_forward_deps(struct lock_class *);
> +extern unsigned long lockdep_count_backward_deps(struct lock_class *);
> +
>  #ifdef CONFIG_DEBUG_LOCKDEP
>  /*
>   * Various lockdep statistics:
> diff --git a/kernel/lockdep_proc.c b/kernel/lockdep_proc.c
> index 9b0e940..6252ff7 100644
> --- a/kernel/lockdep_proc.c
> +++ b/kernel/lockdep_proc.c
> @@ -63,34 +63,6 @@ static void l_stop(struct seq_file *m, void *v)
>  {
>  }
>  
> -static unsigned long count_forward_deps(struct lock_class *class)
> -{
> -	struct lock_list *entry;
> -	unsigned long ret = 1;
> -
> -	/*
> -	 * Recurse this class's dependency list:
> -	 */
> -	list_for_each_entry(entry, &class->locks_after, entry)
> -		ret += count_forward_deps(entry->class);
> -
> -	return ret;
> -}
> -
> -static unsigned long count_backward_deps(struct lock_class *class)
> -{
> -	struct lock_list *entry;
> -	unsigned long ret = 1;
> -
> -	/*
> -	 * Recurse this class's dependency list:
> -	 */
> -	list_for_each_entry(entry, &class->locks_before, entry)
> -		ret += count_backward_deps(entry->class);
> -
> -	return ret;
> -}
> -
>  static void print_name(struct seq_file *m, struct lock_class *class)
>  {
>  	char str[128];
> @@ -124,10 +96,10 @@ static int l_show(struct seq_file *m, void *v)
>  #ifdef CONFIG_DEBUG_LOCKDEP
>  	seq_printf(m, " OPS:%8ld", class->ops);
>  #endif
> -	nr_forward_deps = count_forward_deps(class);
> +	nr_forward_deps = lockdep_count_forward_deps(class);
>  	seq_printf(m, " FD:%5ld", nr_forward_deps);
>  
> -	nr_backward_deps = count_backward_deps(class);
> +	nr_backward_deps = lockdep_count_backward_deps(class);
>  	seq_printf(m, " BD:%5ld", nr_backward_deps);
>  
>  	get_usage_chars(class, &c1, &c2, &c3, &c4);
> @@ -350,7 +322,7 @@ static int lockdep_stats_show(struct seq_file *m, void *v)
>  		if (class->usage_mask & LOCKF_ENABLED_HARDIRQS_READ)
>  			nr_hardirq_read_unsafe++;
>  
> -		sum_forward_deps += count_forward_deps(class);
> +		sum_forward_deps += lockdep_count_forward_deps(class);
>  	}
>  #ifdef CONFIG_DEBUG_LOCKDEP
>  	DEBUG_LOCKS_WARN_ON(debug_atomic_read(&nr_unused_locks) != nr_unused);


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] lockdep: change scheduler annotation
  2008-07-30  4:45       ` David Miller
                           ` (2 preceding siblings ...)
  2008-07-30 11:15         ` Peter Zijlstra
@ 2008-07-30 11:26         ` Peter Zijlstra
  2008-07-30 11:34           ` David Miller
  2008-07-31 16:34           ` Ingo Molnar
  2008-07-31 16:39         ` combinatorial explosion in lockdep Ingo Molnar
  4 siblings, 2 replies; 21+ messages in thread
From: Peter Zijlstra @ 2008-07-30 11:26 UTC (permalink / raw)
  To: David Miller; +Cc: mingo, linux-kernel

While thinking about David's patch it _finally_ dawned on me that there
is no reason we have a lock class per cpu..

Sorry for being dense :-/

The below changes the annotation from a lock class per cpu, to a single
nested lock, as the scheduler never holds more than 2 rq locks at a time
anyway.

If there was code requiring holding all rq locks this would not work and
the original annotation would be the only option, but that not being the
case, this is a much lighter one.

Compiles and boots on a 2-way x86_64.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -600,7 +600,6 @@ struct rq {
 	/* BKL stats */
 	unsigned int bkl_count;
 #endif
-	struct lock_class_key rq_lock_key;
 };
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
@@ -2759,10 +2758,10 @@ static void double_rq_lock(struct rq *rq
 	} else {
 		if (rq1 < rq2) {
 			spin_lock(&rq1->lock);
-			spin_lock(&rq2->lock);
+			spin_lock_nested(&rq2->lock, SINGLE_DEPTH_NESTING);
 		} else {
 			spin_lock(&rq2->lock);
-			spin_lock(&rq1->lock);
+			spin_lock_nested(&rq1->lock, SINGLE_DEPTH_NESTING);
 		}
 	}
 	update_rq_clock(rq1);
@@ -2805,10 +2804,10 @@ static int double_lock_balance(struct rq
 		if (busiest < this_rq) {
 			spin_unlock(&this_rq->lock);
 			spin_lock(&busiest->lock);
-			spin_lock(&this_rq->lock);
+			spin_lock_nested(&this_rq->lock, SINGLE_DEPTH_NESTING);
 			ret = 1;
 		} else
-			spin_lock(&busiest->lock);
+			spin_lock_nested(&busiest->lock, SINGLE_DEPTH_NESTING);
 	}
 	return ret;
 }
@@ -8000,7 +7999,6 @@ void __init sched_init(void)
 
 		rq = cpu_rq(i);
 		spin_lock_init(&rq->lock);
-		lockdep_set_class(&rq->lock, &rq->rq_lock_key);
 		rq->nr_running = 0;
 		init_cfs_rq(&rq->cfs, rq);
 		init_rt_rq(&rq->rt, rq);



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] lockdep: change scheduler annotation
  2008-07-30 11:26         ` [PATCH] lockdep: change scheduler annotation Peter Zijlstra
@ 2008-07-30 11:34           ` David Miller
  2008-07-31 16:34           ` Ingo Molnar
  1 sibling, 0 replies; 21+ messages in thread
From: David Miller @ 2008-07-30 11:34 UTC (permalink / raw)
  To: peterz; +Cc: mingo, linux-kernel

From: Peter Zijlstra <peterz@infradead.org>
Date: Wed, 30 Jul 2008 13:26:57 +0200

> While thinking about David's patch it _finally_ dawned on me that there
> is no reason we have a lock class per cpu..
> 
> Sorry for being dense :-/
> 
> The below changes the annotation from a lock class per cpu, to a single
> nested lock, as the scheduler never holds more than 2 rq locks at a time
> anyway.
> 
> If there was code requiring holding all rq locks this would not work and
> the original annotation would be the only option, but that not being the
> case, this is a much lighter one.
> 
> Compiles and boots on a 2-way x86_64.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

I had been wondering this entire debugging session why the
per-rq lock classes were even there; thanks for getting
rid of them :)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] lockdep: change scheduler annotation
  2008-07-30 11:26         ` [PATCH] lockdep: change scheduler annotation Peter Zijlstra
  2008-07-30 11:34           ` David Miller
@ 2008-07-31 16:34           ` Ingo Molnar
  1 sibling, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2008-07-31 16:34 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: David Miller, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> While thinking about David's patch it _finally_ dawned on me that 
> there is no reason we have a lock class per cpu..

/me arghs too! :)

> Sorry for being dense :-/
> 
> The below changes the annotation from a lock class per cpu, to a 
> single nested lock, as the scheduler never holds more than 2 rq locks 
> at a time anyway.
> 
> If there was code requiring holding all rq locks this would not work 
> and the original annotation would be the only option, but that not 
> being the case, this is a much lighter one.
> 
> Compiles and boots on a 2-way x86_64.

applied to tip/sched/urgent - thanks Peter!

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-07-30  4:45       ` David Miller
                           ` (3 preceding siblings ...)
  2008-07-30 11:26         ` [PATCH] lockdep: change scheduler annotation Peter Zijlstra
@ 2008-07-31 16:39         ` Ingo Molnar
  2008-08-01  9:22           ` Ingo Molnar
  4 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2008-07-31 16:39 UTC (permalink / raw)
  To: David Miller; +Cc: peterz, linux-kernel


* David Miller <davem@davemloft.net> wrote:

> lockdep: Fix combinatorial explosion in lock subgraph traversal.

applied to tip/core/locking - thanks David. I guess we need to test this 
a bit, the patch is far from simple :-)

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [stable] combinatorial explosion in lockdep
  2008-07-30 11:15         ` Peter Zijlstra
@ 2008-07-31 16:50           ` Greg KH
  0 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2008-07-31 16:50 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: David Miller, mingo, linux-kernel, stable

On Wed, Jul 30, 2008 at 01:15:19PM +0200, Peter Zijlstra wrote:
> On Tue, 2008-07-29 at 21:45 -0700, David Miller wrote:
> > From: David Miller <davem@davemloft.net>
> > Date: Mon, 28 Jul 2008 17:44:15 -0700 (PDT)
> > 
> > > From: Ingo Molnar <mingo@elte.hu>
> > > Date: Tue, 29 Jul 2008 01:51:33 +0200
> > > 
> > > > Any chance to get the "cat /proc/lockdep*" output, so that we could see 
> > > > and check the expected behavior of the full graph?
> > > 
> > > /proc/lockdep loops forever in count_forward_deps() :-)
> > > 
> > > I was able to capture a copy of lockdep_chains:
> > > 
> > > http://vger.kernel.org/~davem/lockdep_chains.bz2
> > 
> > As a followup I dumped the full backwards search graph once the cost
> > got up to about (1 * 1024 * 1024) checks or so.
> > 
> > I didn't find any loops, but it is clear that the cost is huge because
> > of the runqueue lock double-locking, without the generation count
> > patch I posted the other day.
> > 
> > For example, if you start with the first runqueue lock you search one
> > entry:
> > 
> > 	1
> > 
> > Next, if you start with the second runqueue lock you search two
> > entries:
> > 
> > 	2, 1
> > 
> > Next, if you start with the third runqueue lock you search
> > 4 entries:
> > 
> > 	3, 2, 1, 1
> > 
> > And the next series is:
> > 
> > 	4, 3, 2, 1, 1, 2, 1, 1
> > 
> > and so on and so forth.  So the cost of a full backwards graph
> > traversal for N cpus is on the order of "1 << (N - 1)".  So with just
> > 32 cpus the cost is on the order of a few billions of checks.
> > 
> > And then you have to factor in all of those runqueue locks' backwards
> > graphs that don't involve other runqueue locks (on my machine each
> > such sub-graph is about 200 locks deep).
> > 
> > Here is an updated version of my patch to solve this problem.  The only
> > unnice bit is that I had to move the procfs dep counting code into
> > lockdep.c and run it under the lockdep_lock.  This is the only way to
> > safely employ the dependency generation ID marking algorithm the
> > short-circuiting uses.
> > 
> > If we can agree on this as a fix, it should definitely be backported
> > and submitted for -stable :-)
> 
> Agreed adding the stable team to the CC

Great, please send the stable team the patch when it hits Linus's tree.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-07-31 16:39         ` combinatorial explosion in lockdep Ingo Molnar
@ 2008-08-01  9:22           ` Ingo Molnar
  2008-08-01  9:32             ` David Miller
  0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2008-08-01  9:22 UTC (permalink / raw)
  To: David Miller; +Cc: peterz, linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> 
> * David Miller <davem@davemloft.net> wrote:
> 
> > lockdep: Fix combinatorial explosion in lock subgraph traversal.
> 
> applied to tip/core/locking - thanks David. I guess we need to test 
> this a bit, the patch is far from simple :-)

small build fallout fix below.

	Ingo

---------------->
>From 8887b1c04671c66dcde3d4104913ba9587e1e40c Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Fri, 1 Aug 2008 11:23:50 +0200
Subject: [PATCH] lockdep: build fix

fix:

 kernel/built-in.o: In function `lockdep_stats_show':
 lockdep_proc.c:(.text+0x3cb2f): undefined reference to `lockdep_count_forward_deps'
 kernel/built-in.o: In function `l_show':
 lockdep_proc.c:(.text+0x3d02b): undefined reference to `lockdep_count_forward_deps'
 lockdep_proc.c:(.text+0x3d047): undefined reference to `lockdep_count_backward_deps'

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/lockdep_internals.h |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/kernel/lockdep_internals.h b/kernel/lockdep_internals.h
index 68d44ec..f5c6a14 100644
--- a/kernel/lockdep_internals.h
+++ b/kernel/lockdep_internals.h
@@ -53,8 +53,21 @@ extern unsigned int nr_process_chains;
 extern unsigned int max_lockdep_depth;
 extern unsigned int max_recursion_depth;
 
+#ifdef CONFIG_PROVE_LOCKING
 extern unsigned long lockdep_count_forward_deps(struct lock_class *);
 extern unsigned long lockdep_count_backward_deps(struct lock_class *);
+#else
+static inline unsigned long
+lockdep_count_forward_deps(struct lock_class *class)
+{
+	return 0;
+}
+static inline unsigned long
+lockdep_count_backward_deps(struct lock_class *class)
+{
+	return 0;
+}
+#endif
 
 #ifdef CONFIG_DEBUG_LOCKDEP
 /*

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-08-01  9:22           ` Ingo Molnar
@ 2008-08-01  9:32             ` David Miller
  2008-08-01 11:57               ` Hugh Dickins
  0 siblings, 1 reply; 21+ messages in thread
From: David Miller @ 2008-08-01  9:32 UTC (permalink / raw)
  To: mingo; +Cc: peterz, linux-kernel

From: Ingo Molnar <mingo@elte.hu>
Date: Fri, 1 Aug 2008 11:22:19 +0200

> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * David Miller <davem@davemloft.net> wrote:
> > 
> > > lockdep: Fix combinatorial explosion in lock subgraph traversal.
> > 
> > applied to tip/core/locking - thanks David. I guess we need to test 
> > this a bit, the patch is far from simple :-)
> 
> small build fallout fix below.

Thanks.

BTW, until something like Peter's attempt is working, we need to also
scale some of the lockdep limits by NR_CPUS.  The formula I came up
with that worked with my 32-cpu, 64-cpu and 128-cpu machines was:

#define __LOCKDEP_NR_CPU_SCALE	\
	((NR_CPUS <= 16) ? 0 : ilog2(NR_CPUS) - 4)

#define MAX_LOCKDEP_ENTRIES	(8192UL << __LOCKDEP_NR_CPU_SCALE)

#define MAX_LOCKDEP_CHAINS_BITS	(16 + __LOCKDEP_NR_CPU_SCALE)
#define MAX_LOCKDEP_CHAINS	(1UL << MAX_LOCKDEP_CHAINS_BITS)

This is going to explode for NR_CPUS=4096, but it is the only way
to get a working lockdep currently, due to the runqueue lock classes.
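
For reference, a quick user-space check of what that scaling works out to
(a hedged sketch; the ilog2() here is a local stand-in for the kernel
helper):

#include <stdio.h>

/* local stand-in for the kernel's ilog2() */
static unsigned int ilog2(unsigned long n)
{
	unsigned int r = 0;

	while (n >>= 1)
		r++;
	return r;
}

#define SCALE(ncpus)	((ncpus) <= 16 ? 0 : ilog2(ncpus) - 4)

int main(void)
{
	unsigned long ncpus[] = { 16, 64, 128, 4096 };
	int i;

	for (i = 0; i < 4; i++) {
		unsigned int s = SCALE(ncpus[i]);

		printf("NR_CPUS=%4lu  MAX_LOCKDEP_ENTRIES=%8lu  MAX_LOCKDEP_CHAINS=%9lu\n",
		       ncpus[i], 8192UL << s, 1UL << (16 + s));
	}
	return 0;
}

NR_CPUS=4096 works out to roughly 2 million lock_list entries and 16 million
chains, which is where the static tables start to hurt.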

Also, when these limits are reached, we get the same printk
wakeup deadlock problem I hit with Peter's patch.

I think a non-trivial number of people hit that printk deadlock bug,
but just didn't report it because the machine essentially hard hangs
silently.  At best you'd see the:

========================================

initial line from lockdep, but often even that doesn't make it to the
console.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-08-01  9:32             ` David Miller
@ 2008-08-01 11:57               ` Hugh Dickins
  2008-08-03  8:14                 ` David Miller
  0 siblings, 1 reply; 21+ messages in thread
From: Hugh Dickins @ 2008-08-01 11:57 UTC (permalink / raw)
  To: David Miller; +Cc: mingo, peterz, linux-kernel

On Fri, 1 Aug 2008, David Miller wrote:
> 
> I think a non-trivial number of people hit that printk deadlock bug,
> but just didn't report it because the machine essentially hard hangs
> silently.  At best you'd see the:
> 
> ========================================
> 
> initial line from lockdep, but often even that doesn't make it to the
> console.

Not just with lockdep.  A couple of months back CONFIG_DEBUG_SPINLOCK
was giving me spinlock debug messages under load on an x86_64 duo laptop,
but hung up before showing the intended backtrace.  I added the patch
below, but IIRC shifted over to other issues (and switched from cfq to
the anticipatory scheduler, which made a difference) before finding out if
this patch really helped.  If it makes sense to you, please add it in.

Hugh

[PATCH] set oops_in_progress for spinlock lockup

Set oops_in_progress while reporting spinlock lockup,
to avoid deadlock inside printk() or the backtraces.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 lib/spinlock_debug.c |    2 ++
 1 file changed, 2 insertions(+)

--- 2.6.27-rc1/lib/spinlock_debug.c	2008-01-24 22:58:37.000000000 +0000
+++ linux/lib/spinlock_debug.c	2008-08-01 12:41:52.000000000 +0100
@@ -113,6 +113,7 @@ static void __spin_lock_debug(spinlock_t
 		/* lockup suspected: */
 		if (print_once) {
 			print_once = 0;
+			oops_in_progress = 1;
 			printk(KERN_EMERG "BUG: spinlock lockup on CPU#%d, "
 					"%s/%d, %p\n",
 				raw_smp_processor_id(), current->comm,
@@ -121,6 +122,7 @@ static void __spin_lock_debug(spinlock_t
 #ifdef CONFIG_SMP
 			trigger_all_cpu_backtrace();
 #endif
+			oops_in_progress = 0;
 		}
 	}
 }

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-08-01 11:57               ` Hugh Dickins
@ 2008-08-03  8:14                 ` David Miller
  2008-08-04 12:21                   ` Hugh Dickins
  0 siblings, 1 reply; 21+ messages in thread
From: David Miller @ 2008-08-03  8:14 UTC (permalink / raw)
  To: hugh; +Cc: mingo, peterz, linux-kernel

From: Hugh Dickins <hugh@veritas.com>
Date: Fri, 1 Aug 2008 12:57:27 +0100 (BST)

> @@ -113,6 +113,7 @@ static void __spin_lock_debug(spinlock_t
>  		/* lockup suspected: */
>  		if (print_once) {
>  			print_once = 0;
> +			oops_in_progress = 1;
>  			printk(KERN_EMERG "BUG: spinlock lockup on CPU#%d, "
>  					"%s/%d, %p\n",
>  				raw_smp_processor_id(), current->comm,
> @@ -121,6 +122,7 @@ static void __spin_lock_debug(spinlock_t
>  #ifdef CONFIG_SMP
>  			trigger_all_cpu_backtrace();
>  #endif
> +			oops_in_progress = 0;
>  		}
>  	}
>  }

It's probably best not to clear oops_in_progress later when we trigger
an event like this, to ensure that we do actually get any follow-on
messages on the console.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: combinatorial explosion in lockdep
  2008-08-03  8:14                 ` David Miller
@ 2008-08-04 12:21                   ` Hugh Dickins
  0 siblings, 0 replies; 21+ messages in thread
From: Hugh Dickins @ 2008-08-04 12:21 UTC (permalink / raw)
  To: David Miller; +Cc: mingo, peterz, linux-kernel

On Sun, 3 Aug 2008, David Miller wrote:
> 
> It's probably best to not later clear oops_in_progress when we trigger
> an event like this, to ensure that we do actually get any followon
> messages on the console.

Ah, so it was intentional that your patch set oops_in_progress without
ever clearing it again.  Hmm.  I think I'd reword your "probably best"
to "arguably best".  If everything really locks up at this point, then
there is no point to clearing oops_in_progress after; but if the system
manages to resume (arguably) normal operation after, then leaving it
forever oops_in_progress worries me, and differs from current practice.

I think I'd rather clear it afterwards in any public patch; but edit
that out privately if it helps while debugging some particular problem.
I did try to reproduce my spinlock lockups yesterday, but without
success, so have no practical experience one way or the other.

But notice that I shouldn't be messing directly with oops_in_progress:
better to use bust_spinlocks() (oops_in_progress++/-- and wake klogd).

[PATCH] bust_spinlocks while reporting spinlock lockup

Use bust_spinlocks() while reporting spinlock lockup to avoid deadlock
inside printk() or the backtraces.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 lib/spinlock_debug.c |    2 ++
 1 file changed, 2 insertions(+)

--- 2.6.27-rc1/lib/spinlock_debug.c	2008-01-24 22:58:37.000000000 +0000
+++ linux/lib/spinlock_debug.c	2008-08-01 12:41:52.000000000 +0100
@@ -113,6 +113,7 @@ static void __spin_lock_debug(spinlock_t
 		/* lockup suspected: */
 		if (print_once) {
 			print_once = 0;
+			bust_spinlocks(1);
 			printk(KERN_EMERG "BUG: spinlock lockup on CPU#%d, "
 					"%s/%d, %p\n",
 				raw_smp_processor_id(), current->comm,
@@ -121,6 +122,7 @@ static void __spin_lock_debug(spinlock_t
 #ifdef CONFIG_SMP
 			trigger_all_cpu_backtrace();
 #endif
+			bust_spinlocks(0);
 		}
 	}
 }

^ permalink raw reply	[flat|nested] 21+ messages in thread

Thread overview: 21+ messages
2008-07-28 22:37 combinatorial explosion in lockdep David Miller
2008-07-28 22:50 ` Jeremy Fitzhardinge
2008-07-28 22:56   ` David Miller
2008-07-28 23:13 ` David Miller
2008-07-28 23:51   ` Ingo Molnar
2008-07-29  0:44     ` David Miller
2008-07-30  4:45       ` David Miller
2008-07-30  6:56         ` David Miller
2008-07-30  7:21           ` Peter Zijlstra
2008-07-30  7:19         ` Peter Zijlstra
2008-07-30 11:15         ` Peter Zijlstra
2008-07-31 16:50           ` [stable] " Greg KH
2008-07-30 11:26         ` [PATCH] lockdep: change scheduler annotation Peter Zijlstra
2008-07-30 11:34           ` David Miller
2008-07-31 16:34           ` Ingo Molnar
2008-07-31 16:39         ` combinatorial explosion in lockdep Ingo Molnar
2008-08-01  9:22           ` Ingo Molnar
2008-08-01  9:32             ` David Miller
2008-08-01 11:57               ` Hugh Dickins
2008-08-03  8:14                 ` David Miller
2008-08-04 12:21                   ` Hugh Dickins
