* combinatorial explosion in lockdep @ 2008-07-28 22:37 David Miller 2008-07-28 22:50 ` Jeremy Fitzhardinge 2008-07-28 23:13 ` David Miller 0 siblings, 2 replies; 21+ messages in thread From: David Miller @ 2008-07-28 22:37 UTC (permalink / raw) To: mingo; +Cc: peterz, linux-kernel As Peter and some others know, I've been trying to track down a full system wedge that occurs early in the userland boot on sparc64 Niagara systems when lockdep is enabled. This has been happening since I first added sparc64 lockdep support; the problem has always been there. Some fiddling around recently showed that booting with max_cpus=16 allowed the system to fully boot, albeit very slowly. That was a clue. My suspicion became that somehow the number of active runqueues contributes to the problem. Also, the problem seems to occur when udev is forked off to load driver modules. So I added a piece of debugging that would capture all cpus' program counters and initial backtraces, even if they had interrupts disabled, when the softlockup watchdog triggers. Then I lowered the softlockup watchdog threshold to only a few seconds so that I could more quickly trigger the dumps. The triggering event seems to be kstopmachine as done by the first module load. This seems to create the situation where we do double RQ locking on essentially all the run queues, in order to move the kstopmachine threads around to their proper cpus. I think this is what starts to make the lockdep dependency chains huge. Then, all the cpus wedge trying to grab a particular RQ lock, with one of the cpus cycling seemingly endlessly in find_usage_backwards(). At first I thought we were getting killed by recursion on sparc64. After going 8 or more entries deep, every recursive call or return from such a call will write or read 128 bytes from the stack for the window spill/fill traps sparc uses to manage the register windows. 
So I rewrote find_usage_{backwards/forwards} and check_usage() to use iteration instead of recursion. The problem persisted, and I was still seeing one of the cpus spending tons of time in find_usage_backwards(). So I added debugging to count how many lock_class entries are visited in a single top-level invocation of find_usage_{backwards,forwards}() and then I printed out this information when the softlockup watchdog triggered. We find: [ 1287.450897] BUG: max_forwards_checks[73290] max_backwards_checks[56968881] [ 1287.456186] BUG: max_forwards_class --> (&rq->rq_lock_key#3){++..} [ 1287.461866] BUG: max_backwards_class --> (&rt_b->rt_runtime_lock){+...} Just under 57 million lock_class entries visited in a single top-level find_usage_backwards() call, on rt_b->rt_runtime_lock. No wonder the machine grinds to a halt :-) And on the forwards side, the winner is the suspected runqueue lock, but it's not as bad as the backwards chain from the rt_runtime_lock. I'm still digging on what exactly makes this happen, but I wanted to get the information out as soon as I had something useful like this. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-07-28 22:37 combinatorial explosion in lockdep David Miller @ 2008-07-28 22:50 ` Jeremy Fitzhardinge 2008-07-28 22:56 ` David Miller 2008-07-28 23:13 ` David Miller 1 sibling, 1 reply; 21+ messages in thread From: Jeremy Fitzhardinge @ 2008-07-28 22:50 UTC (permalink / raw) To: David Miller; +Cc: mingo, peterz, linux-kernel, Rusty Russell David Miller wrote: > The triggering event seems to be kstopmachine as done by the first > module load. This seems to create the situation where we do double RQ > locking on essentially all the run queues, in order to move the > kstopmachine threads around to their proper cpus. I think this is > what starts to make the lockdep dependency chains huge. > Does Rusty's stop_machine rewrite, which puts the threads on the correct cpus to start with, at least mitigate the symptoms? J ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-07-28 22:50 ` Jeremy Fitzhardinge @ 2008-07-28 22:56 ` David Miller 0 siblings, 0 replies; 21+ messages in thread From: David Miller @ 2008-07-28 22:56 UTC (permalink / raw) To: jeremy; +Cc: mingo, peterz, linux-kernel, rusty From: Jeremy Fitzhardinge <jeremy@goop.org> Date: Mon, 28 Jul 2008 15:50:04 -0700 > David Miller wrote: > > The triggering event seems to be kstopmachine as done by the first > > module load. This seems to create the situation where we do double RQ > > locking on essentially all the run queues, in order to move the > > kstopmachine threads around to their proper cpus. I think this is > > what starts to make the lockdep dependency chains huge. > > > > Does Rusty's stop_machine rewrite which puts the threads on the correct > cpus to start with at least mitigate the symptoms? I could check that, but it won't keep this from happening and being triggerable by a trivial userland program setting scheduler affinities. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-07-28 22:37 combinatorial explosion in lockdep David Miller 2008-07-28 22:50 ` Jeremy Fitzhardinge @ 2008-07-28 23:13 ` David Miller 2008-07-28 23:51 ` Ingo Molnar 1 sibling, 1 reply; 21+ messages in thread From: David Miller @ 2008-07-28 23:13 UTC (permalink / raw) To: mingo; +Cc: peterz, linux-kernel From: David Miller <davem@davemloft.net> Date: Mon, 28 Jul 2008 15:37:02 -0700 (PDT) > I'm still digging on what exactly makes this happen, but I wanted to > get the information out as soon as I had something useful like this. As a simple hack, I tried mitigating these effects using a generation-count based "visited" scheme to bypass traversing lock_class chains we've already walked down. It seems to work. But I wonder if the real problem is that there is some unintended loop in the chains, or something like that. Anyways, just some more info... diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 2486eb4..1bfdc30 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -89,6 +89,7 @@ struct lock_class { struct lockdep_subclass_key *key; unsigned int subclass; + unsigned int dep_gen_id; /* * IRQ/softirq usage tracking bits: diff --git a/kernel/lockdep.c b/kernel/lockdep.c index d38a643..59218a0 100644 --- a/kernel/lockdep.c +++ b/kernel/lockdep.c @@ -959,6 +959,18 @@ static int noinline print_infinite_recursion_bug(void) return 0; } +static unsigned int dependency_gen_id; + +static bool dependency_visit(struct lock_class *source, unsigned int depth) +{ + if (!depth) + dependency_gen_id++; + if (source->dep_gen_id == dependency_gen_id) + return true; + source->dep_gen_id = dependency_gen_id; + return false; +} + /* * Prove that the dependency graph starting at <entry> can not * lead to <target>. Print an error and return 0 if it does. 
@@ -968,6 +980,9 @@ check_noncircular(struct lock_class *source, unsigned int depth) { struct lock_list *entry; + if (dependency_visit(source, depth)) + return 1; + debug_atomic_inc(&nr_cyclic_check_recursions); if (depth > max_recursion_depth) max_recursion_depth = depth; @@ -1011,6 +1026,9 @@ find_usage_forwards(struct lock_class *source, unsigned int depth) struct lock_list *entry; int ret; + if (dependency_visit(source, depth)) + return 1; + if (depth > max_recursion_depth) max_recursion_depth = depth; if (depth >= RECURSION_LIMIT) @@ -1050,6 +1068,9 @@ find_usage_backwards(struct lock_class *source, unsigned int depth) struct lock_list *entry; int ret; + if (dependency_visit(source, depth)) + return 1; + if (!__raw_spin_is_locked(&lockdep_lock)) return DEBUG_LOCKS_WARN_ON(1); ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-07-28 23:13 ` David Miller @ 2008-07-28 23:51 ` Ingo Molnar 2008-07-29 0:44 ` David Miller 0 siblings, 1 reply; 21+ messages in thread From: Ingo Molnar @ 2008-07-28 23:51 UTC (permalink / raw) To: David Miller; +Cc: peterz, linux-kernel * David Miller <davem@davemloft.net> wrote: > From: David Miller <davem@davemloft.net> > Date: Mon, 28 Jul 2008 15:37:02 -0700 (PDT) > > > I'm still digging on what exactly makes this happen, but I wanted to > > get the information out as soon as I had something useful like this. > > As a simple hack, I tried mitigating these effects using a > generation-count based "visited" scheme to bypass traversing > lock_class chains we've already walked down. > > It seems to work. nice! Any chance to get the "cat /proc/lockdep*" output, so that we could see and check the expected behavior of the full graph? But with 128 cpus and double lock taking in the scheduler, there are around 127*128/2 possible combinations of cpu rq lock ordering, that's thousands of chains, and that takes a quadratic number of steps to iterate via the current algorithm - which would be in the neighborhood of 127^4 / 4, or about 65 million steps for each new change to the graph. (but it's late and i'm not sure at all about the numbers so i could be off by a few orders of magnitude, in either direction) Plus the irqsafe/softirqsafe layers multiply things too. Could you please also send the non-stack-iterator changes you did as well - that's cool stuff too! Ingo ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-07-28 23:51 ` Ingo Molnar @ 2008-07-29 0:44 ` David Miller 2008-07-30 4:45 ` David Miller 0 siblings, 1 reply; 21+ messages in thread From: David Miller @ 2008-07-29 0:44 UTC (permalink / raw) To: mingo; +Cc: peterz, linux-kernel From: Ingo Molnar <mingo@elte.hu> Date: Tue, 29 Jul 2008 01:51:33 +0200 > Any chance to get the "cat /proc/lockdep*" output, so that we could see > and check the expected behavior of the full graph? /proc/lockdep loops forever in count_forward_deps() :-) I was able to capture a copy of lockdep_chains: http://vger.kernel.org/~davem/lockdep_chains.bz2 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-07-29 0:44 ` David Miller @ 2008-07-30 4:45 ` David Miller 2008-07-30 6:56 ` David Miller ` (4 more replies) 0 siblings, 5 replies; 21+ messages in thread From: David Miller @ 2008-07-30 4:45 UTC (permalink / raw) To: mingo; +Cc: peterz, linux-kernel From: David Miller <davem@davemloft.net> Date: Mon, 28 Jul 2008 17:44:15 -0700 (PDT) > From: Ingo Molnar <mingo@elte.hu> > Date: Tue, 29 Jul 2008 01:51:33 +0200 > > > Any chance to get the "cat /proc/lockdep*" output, so that we could see > > and check the expected behavior of the full graph? > > /proc/lockdep loops forever in count_forward_deps() :-) > > I was able to capture a copy of lockdep_chains: > > http://vger.kernel.org/~davem/lockdep_chains.bz2 As a followup I dumped the full backwards search graph once the cost got up to about (1 * 1024 * 1024) checks or so. I didn't find any loops, but it is clear that the cost is huge because of the runqueue lock double-locking, without the generation count patch I posted the other day. For example, if you start with the first runqueue lock you search one entry: 1 Next, if you start with the second runqueue lock you search two entries: 2, 1 Next, if you start with the third runqueue lock you search 4 entries: 3, 2, 1, 1 And the next series is: 4, 3, 2, 1, 1, 2, 1, 1 and so on and so forth. So the cost of a full backwards graph traversal for N cpus is on the order of "1 << (N - 1)". So with just 32 cpus the cost is on the order of a few billion checks. And then you have to factor in all of those runqueue locks' backwards graphs that don't involve other runqueue locks (on my machine each such sub-graph is about 200 locks deep). Here is an updated version of my patch to solve this problem. The only unnice bit is that I had to move the procfs dep counting code into lockdep.c and run it under the lockdep_lock. This is the only way to safely employ the dependency generation ID marking algorithm the short-circuiting uses. 
If we can agree on this as a fix, it should definitely be backported and submitted for -stable :-) ---------------------------------------- lockdep: Fix combinatorial explosion in lock subgraph traversal. When we traverse the graph, either forwards or backwards, we are interested in whether a certain property exists somewhere in a node reachable in the graph. Therefore it is never necessary to traverse through a node more than once to get a correct answer to the given query. Take advantage of this property using a global ID counter so that we need not clear all the markers in all the lock_class entries before doing a traversal. A new ID is chosen when we start to traverse, and we continue through a lock_class only if its ID hasn't been marked with the new value yet. This short-circuiting is essential especially for high CPU count systems. The scheduler has a runqueue per cpu, and needs to take two runqueue locks at a time, which leads to long chains of backwards and forwards subgraphs from these runqueue lock nodes. Without the short-circuit implemented here, a graph traversal on a runqueue lock can take up to (1 << (N - 1)) checks on a system with N cpus. For anything more than 16 cpus or so, lockdep will eventually bring the machine to a complete standstill. Signed-off-by: David S. 
Miller <davem@davemloft.net> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 2486eb4..1bfdc30 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -89,6 +89,7 @@ struct lock_class { struct lockdep_subclass_key *key; unsigned int subclass; + unsigned int dep_gen_id; /* * IRQ/softirq usage tracking bits: diff --git a/kernel/lockdep.c b/kernel/lockdep.c index d38a643..6999e64 100644 --- a/kernel/lockdep.c +++ b/kernel/lockdep.c @@ -372,6 +372,19 @@ unsigned int nr_process_chains; unsigned int max_lockdep_depth; unsigned int max_recursion_depth; +static unsigned int lockdep_dependency_gen_id; + +static bool lockdep_dependency_visit(struct lock_class *source, + unsigned int depth) +{ + if (!depth) + lockdep_dependency_gen_id++; + if (source->dep_gen_id == lockdep_dependency_gen_id) + return true; + source->dep_gen_id = lockdep_dependency_gen_id; + return false; +} + #ifdef CONFIG_DEBUG_LOCKDEP /* * We cannot printk in early bootup code. Not even early_printk() @@ -558,6 +571,9 @@ static void print_lock_dependencies(struct lock_class *class, int depth) { struct lock_list *entry; + if (lockdep_dependency_visit(class, depth)) + return; + if (DEBUG_LOCKS_WARN_ON(depth >= 20)) return; @@ -959,6 +975,67 @@ static int noinline print_infinite_recursion_bug(void) return 0; } +unsigned long __lockdep_count_forward_deps(struct lock_class *class, + unsigned int depth) +{ + struct lock_list *entry; + unsigned long ret = 1; + + if (lockdep_dependency_visit(class, depth)) + return 0; + + /* + * Recurse this class's dependency list: + */ + list_for_each_entry(entry, &class->locks_after, entry) + ret += __lockdep_count_forward_deps(entry->class, depth + 1); + + return ret; +} + +unsigned long lockdep_count_forward_deps(struct lock_class *class) +{ + unsigned long ret, flags; + + local_irq_save(flags); + __raw_spin_lock(&lockdep_lock); + ret = __lockdep_count_forward_deps(class, 0); + __raw_spin_unlock(&lockdep_lock); + local_irq_restore(flags); + + 
return ret; +} + +unsigned long __lockdep_count_backward_deps(struct lock_class *class, + unsigned int depth) +{ + struct lock_list *entry; + unsigned long ret = 1; + + if (lockdep_dependency_visit(class, depth)) + return 0; + /* + * Recurse this class's dependency list: + */ + list_for_each_entry(entry, &class->locks_before, entry) + ret += __lockdep_count_backward_deps(entry->class, depth + 1); + + return ret; +} + +unsigned long lockdep_count_backward_deps(struct lock_class *class) +{ + unsigned long ret, flags; + + local_irq_save(flags); + __raw_spin_lock(&lockdep_lock); + ret = __lockdep_count_backward_deps(class, 0); + __raw_spin_unlock(&lockdep_lock); + local_irq_restore(flags); + + return ret; +} + /* * Prove that the dependency graph starting at <entry> can not * lead to <target>. Print an error and return 0 if it does. @@ -968,6 +1045,9 @@ check_noncircular(struct lock_class *source, unsigned int depth) { struct lock_list *entry; + if (lockdep_dependency_visit(source, depth)) + return 1; + debug_atomic_inc(&nr_cyclic_check_recursions); if (depth > max_recursion_depth) max_recursion_depth = depth; @@ -1011,6 +1091,9 @@ find_usage_forwards(struct lock_class *source, unsigned int depth) struct lock_list *entry; int ret; + if (lockdep_dependency_visit(source, depth)) + return 1; + if (depth > max_recursion_depth) max_recursion_depth = depth; if (depth >= RECURSION_LIMIT) @@ -1050,6 +1133,9 @@ find_usage_backwards(struct lock_class *source, unsigned int depth) struct lock_list *entry; int ret; + if (lockdep_dependency_visit(source, depth)) + return 1; + if (!__raw_spin_is_locked(&lockdep_lock)) return DEBUG_LOCKS_WARN_ON(1); diff --git a/kernel/lockdep_internals.h b/kernel/lockdep_internals.h index c3600a0..68d44ec 100644 --- a/kernel/lockdep_internals.h +++ b/kernel/lockdep_internals.h @@ -53,6 +53,9 @@ extern unsigned int nr_process_chains; extern unsigned int max_lockdep_depth; extern unsigned int max_recursion_depth; +extern unsigned long 
lockdep_count_forward_deps(struct lock_class *); +extern unsigned long lockdep_count_backward_deps(struct lock_class *); + #ifdef CONFIG_DEBUG_LOCKDEP /* * Various lockdep statistics: diff --git a/kernel/lockdep_proc.c b/kernel/lockdep_proc.c index 9b0e940..6252ff7 100644 --- a/kernel/lockdep_proc.c +++ b/kernel/lockdep_proc.c @@ -63,34 +63,6 @@ static void l_stop(struct seq_file *m, void *v) { } -static unsigned long count_forward_deps(struct lock_class *class) -{ - struct lock_list *entry; - unsigned long ret = 1; - - /* - * Recurse this class's dependency list: - */ - list_for_each_entry(entry, &class->locks_after, entry) - ret += count_forward_deps(entry->class); - - return ret; -} - -static unsigned long count_backward_deps(struct lock_class *class) -{ - struct lock_list *entry; - unsigned long ret = 1; - - /* - * Recurse this class's dependency list: - */ - list_for_each_entry(entry, &class->locks_before, entry) - ret += count_backward_deps(entry->class); - - return ret; -} - static void print_name(struct seq_file *m, struct lock_class *class) { char str[128]; @@ -124,10 +96,10 @@ static int l_show(struct seq_file *m, void *v) #ifdef CONFIG_DEBUG_LOCKDEP seq_printf(m, " OPS:%8ld", class->ops); #endif - nr_forward_deps = count_forward_deps(class); + nr_forward_deps = lockdep_count_forward_deps(class); seq_printf(m, " FD:%5ld", nr_forward_deps); - nr_backward_deps = count_backward_deps(class); + nr_backward_deps = lockdep_count_backward_deps(class); seq_printf(m, " BD:%5ld", nr_backward_deps); get_usage_chars(class, &c1, &c2, &c3, &c4); @@ -350,7 +322,7 @@ static int lockdep_stats_show(struct seq_file *m, void *v) if (class->usage_mask & LOCKF_ENABLED_HARDIRQS_READ) nr_hardirq_read_unsafe++; - sum_forward_deps += count_forward_deps(class); + sum_forward_deps += lockdep_count_forward_deps(class); } #ifdef CONFIG_DEBUG_LOCKDEP DEBUG_LOCKS_WARN_ON(debug_atomic_read(&nr_unused_locks) != nr_unused); ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-07-30 4:45 ` David Miller @ 2008-07-30 6:56 ` David Miller 2008-07-30 7:21 ` Peter Zijlstra 2008-07-30 7:19 ` Peter Zijlstra ` (3 subsequent siblings) 4 siblings, 1 reply; 21+ messages in thread From: David Miller @ 2008-07-30 6:56 UTC (permalink / raw) To: mingo; +Cc: peterz, linux-kernel From: David Miller <davem@davemloft.net> Date: Tue, 29 Jul 2008 21:45:03 -0700 (PDT) > If we can agree on this as a fix, it should definitely be backported > and submitted for -stable :-) > > ---------------------------------------- > > lockdep: Fix combinatorial explosion in lock subgraph traversal. So the next problem I run into is that 8192 is not a large enough MAX_LOCKDEP_ENTRIES for these larger cpu count boxes. 8192 isn't even big enough for my 64 cpu box. It needs to be scaled by NR_CPUS, or something like that. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-07-30 6:56 ` David Miller @ 2008-07-30 7:21 ` Peter Zijlstra 0 siblings, 0 replies; 21+ messages in thread From: Peter Zijlstra @ 2008-07-30 7:21 UTC (permalink / raw) To: David Miller; +Cc: mingo, linux-kernel On Tue, 2008-07-29 at 23:56 -0700, David Miller wrote: > From: David Miller <davem@davemloft.net> > Date: Tue, 29 Jul 2008 21:45:03 -0700 (PDT) > > > If we can agree on this as a fix, it should definitely be backported > > and submitted for -stable :-) > > > > ---------------------------------------- > > > > lockdep: Fix combinatorial explosion in lock subgraph traversal. > > So the next problem I run into is that 8192 is not a large > enough MAX_LOCKDEP_ENTRIES for this larger cpu count boxes. > 8192 isn't even big enough for my 64 cpu box. > > It needs to be scaled by NR_CPUS, or something like that. Or something like that indeed - if distro kernels are going to be built using NR_CPUS=4096, then these tables might explode - then again, lockdep is only enabled on -debug kernels anyway... So yeah. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-07-30 4:45 ` David Miller 2008-07-30 6:56 ` David Miller @ 2008-07-30 7:19 ` Peter Zijlstra 2008-07-30 11:15 ` Peter Zijlstra ` (2 subsequent siblings) 4 siblings, 0 replies; 21+ messages in thread From: Peter Zijlstra @ 2008-07-30 7:19 UTC (permalink / raw) To: David Miller; +Cc: mingo, linux-kernel On Tue, 2008-07-29 at 21:45 -0700, David Miller wrote: > From: David Miller <davem@davemloft.net> > Date: Mon, 28 Jul 2008 17:44:15 -0700 (PDT) > > > From: Ingo Molnar <mingo@elte.hu> > > Date: Tue, 29 Jul 2008 01:51:33 +0200 > > > > > Any chance to get the "cat /proc/lockdep*" output, so that we could see > > > and check the expected behavior of the full graph? > > > > /proc/lockdep loops forever in count_forward_deps() :-) > > > > I was able to capture a copy of lockdep_chains: > > > > http://vger.kernel.org/~davem/lockdep_chains.bz2 > > As a followup I dumped the full backwards search graph once the cost > got up to about (1 * 1024 * 1024) checks or so. > > I didn't find any loops, but it is clear that the cost is huge because > of the runqueue lock double-locking, without the generation count > patch I posted the other day. > > For example, if you start with the first runqueue lock you search one > entry: > > 1 > > Next, if you start with the second runqueue lock you search two > entries: > > 2, 1 > > Next, if you start with the third runqueue lock you search > 4 entries: > > 3, 2, 1, 1 > > And the next series is: > > 4, 3, 2, 1, 1, 2, 1, 1 > > and so on and so forth. So the cost of a full backwards graph > traversal for N cpus is on the order of "1 << (N - 1)". So with just > 32 cpus the cost is on the order of a few billions of checks. > > And then you have to factor in all of those runqueue locks's backwards > graphs that don't involve other runqueue locks (on my machine each > such sub-graph is about 200 locks deep). > > Here is an updated version of my patch to solve this problem. 
The only > unnice bit is that I had to move the procfs dep counting code into > lockdep.c and run it under the lockdep_lock. This is the only way to > safely employ the dependency generation ID marking algorithm the > short-circuiting uses. > > If we can agree on this as a fix, it should definitely be backported > and submitted for -stable :-) Way cool stuff - will try and wrap my brains around it asap. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-07-30 4:45 ` David Miller 2008-07-30 6:56 ` David Miller 2008-07-30 7:19 ` Peter Zijlstra @ 2008-07-30 11:15 ` Peter Zijlstra 2008-07-31 16:50 ` [stable] " Greg KH 2008-07-30 11:26 ` [PATCH] lockdep: change scheduler annotation Peter Zijlstra 2008-07-31 16:39 ` combinatorial explosion in lockdep Ingo Molnar 4 siblings, 1 reply; 21+ messages in thread From: Peter Zijlstra @ 2008-07-30 11:15 UTC (permalink / raw) To: David Miller; +Cc: mingo, linux-kernel, stable On Tue, 2008-07-29 at 21:45 -0700, David Miller wrote: > From: David Miller <davem@davemloft.net> > Date: Mon, 28 Jul 2008 17:44:15 -0700 (PDT) > > > From: Ingo Molnar <mingo@elte.hu> > > Date: Tue, 29 Jul 2008 01:51:33 +0200 > > > > > Any chance to get the "cat /proc/lockdep*" output, so that we could see > > > and check the expected behavior of the full graph? > > > > /proc/lockdep loops forever in count_forward_deps() :-) > > > > I was able to capture a copy of lockdep_chains: > > > > http://vger.kernel.org/~davem/lockdep_chains.bz2 > > As a followup I dumped the full backwards search graph once the cost > got up to about (1 * 1024 * 1024) checks or so. > > I didn't find any loops, but it is clear that the cost is huge because > of the runqueue lock double-locking, without the generation count > patch I posted the other day. > > For example, if you start with the first runqueue lock you search one > entry: > > 1 > > Next, if you start with the second runqueue lock you search two > entries: > > 2, 1 > > Next, if you start with the third runqueue lock you search > 4 entries: > > 3, 2, 1, 1 > > And the next series is: > > 4, 3, 2, 1, 1, 2, 1, 1 > > and so on and so forth. So the cost of a full backwards graph > traversal for N cpus is on the order of "1 << (N - 1)". So with just > 32 cpus the cost is on the order of a few billions of checks. 
> > And then you have to factor in all of those runqueue locks's backwards > graphs that don't involve other runqueue locks (on my machine each > such sub-graph is about 200 locks deep). > > Here is an updated version of my patch to solve this problem. The only > unnice bit is that I had to move the procfs dep counting code into > lockdep.c and run it under the lockdep_lock. This is the only way to > safely employ the dependency generation ID marking algorithm the > short-circuiting uses. > > If we can agree on this as a fix, it should definitely be backported > and submitted for -stable :-) Agreed adding the stable team to the CC > ---------------------------------------- > > lockdep: Fix combinatorial explosion in lock subgraph traversal. > > When we traverse the graph, either forwards or backwards, we > are interested in whether a certain property exists somewhere > in a node reachable in the graph. > > Therefore it is never necessary to traverse through a node more > than once to get a correct answer to the given query. > > Take advantage of this property using a global ID counter so that we > need not clear all the markers in all the lock_class entries before > doing a traversal. A new ID is choosen when we start to traverse, and > we continue through a lock_class only if it's ID hasn't been marked > with the new value yet. > > This short-circuiting is essential especially for high CPU count > systems. The scheduler has a runqueue per cpu, and needs to take > two runqueue locks at a time, which leads to long chains of > backwards and forwards subgraphs from these runqueue lock nodes. > Without the short-circuit implemented here, a graph traversal on > a runqueue lock can take up to (1 << (N - 1)) checks on a system > with N cpus. > > For anything more than 16 cpus or so, lockdep will eventually bring > the machine to a complete standstill. > > Signed-off-by: David S. 
Miller <davem@davemloft.net> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h > index 2486eb4..1bfdc30 100644 > --- a/include/linux/lockdep.h > +++ b/include/linux/lockdep.h > @@ -89,6 +89,7 @@ struct lock_class { > > struct lockdep_subclass_key *key; > unsigned int subclass; > + unsigned int dep_gen_id; > > /* > * IRQ/softirq usage tracking bits: > diff --git a/kernel/lockdep.c b/kernel/lockdep.c > index d38a643..6999e64 100644 > --- a/kernel/lockdep.c > +++ b/kernel/lockdep.c > @@ -372,6 +372,19 @@ unsigned int nr_process_chains; > unsigned int max_lockdep_depth; > unsigned int max_recursion_depth; > > +static unsigned int lockdep_dependency_gen_id; > + > +static bool lockdep_dependency_visit(struct lock_class *source, > + unsigned int depth) > +{ > + if (!depth) > + lockdep_dependency_gen_id++; > + if (source->dep_gen_id == lockdep_dependency_gen_id) > + return true; > + source->dep_gen_id = lockdep_dependency_gen_id; > + return false; > +} > + > #ifdef CONFIG_DEBUG_LOCKDEP > /* > * We cannot printk in early bootup code. 
Not even early_printk() > @@ -558,6 +571,9 @@ static void print_lock_dependencies(struct lock_class *class, int depth) > { > struct lock_list *entry; > > + if (lockdep_dependency_visit(class, depth)) > + return; > + > if (DEBUG_LOCKS_WARN_ON(depth >= 20)) > return; > > @@ -959,6 +975,67 @@ static int noinline print_infinite_recursion_bug(void) > return 0; > } > > +unsigned long __lockdep_count_forward_deps(struct lock_class *class, > + unsigned int depth) > +{ > + struct lock_list *entry; > + unsigned long ret = 1; > + > + if (lockdep_dependency_visit(class, depth)) > + return 0; > + > + /* > + * Recurse this class's dependency list: > + */ > + list_for_each_entry(entry, &class->locks_after, entry) > + ret += __lockdep_count_forward_deps(entry->class, depth + 1); > + > + return ret; > +} > + > +unsigned long lockdep_count_forward_deps(struct lock_class *class) > +{ > + unsigned long ret, flags; > + > + local_irq_save(flags); > + __raw_spin_lock(&lockdep_lock); > + ret = __lockdep_count_forward_deps(class, 0); > + __raw_spin_unlock(&lockdep_lock); > + local_irq_restore(flags); > + > + return ret; > +} > + > +unsigned long __lockdep_count_backward_deps(struct lock_class *class, > + unsigned int depth) > +{ > + struct lock_list *entry; > + unsigned long ret = 1; > + > + if (lockdep_dependency_visit(class, depth)) > + return 0; > + /* > + * Recurse this class's dependency list: > + */ > + list_for_each_entry(entry, &class->locks_before, entry) > + ret += __lockdep_count_backward_deps(entry->class, depth + 1); > + > + return ret; > +} > + > +unsigned long lockdep_count_backward_deps(struct lock_class *class) > +{ > + unsigned long ret, flags; > + > + local_irq_save(flags); > + __raw_spin_lock(&lockdep_lock); > + ret = __lockdep_count_backward_deps(class, 0); > + __raw_spin_unlock(&lockdep_lock); > + local_irq_restore(flags); > + > + return ret; > +} > + > /* > * Prove that the dependency graph starting at <entry> can not > * lead to <target>. 
Print an error and return 0 if it does. > @@ -968,6 +1045,9 @@ check_noncircular(struct lock_class *source, unsigned int depth) > { > struct lock_list *entry; > > + if (lockdep_dependency_visit(source, depth)) > + return 1; > + > debug_atomic_inc(&nr_cyclic_check_recursions); > if (depth > max_recursion_depth) > max_recursion_depth = depth; > @@ -1011,6 +1091,9 @@ find_usage_forwards(struct lock_class *source, unsigned int depth) > struct lock_list *entry; > int ret; > > + if (lockdep_dependency_visit(source, depth)) > + return 1; > + > if (depth > max_recursion_depth) > max_recursion_depth = depth; > if (depth >= RECURSION_LIMIT) > @@ -1050,6 +1133,9 @@ find_usage_backwards(struct lock_class *source, unsigned int depth) > struct lock_list *entry; > int ret; > > + if (lockdep_dependency_visit(source, depth)) > + return 1; > + > if (!__raw_spin_is_locked(&lockdep_lock)) > return DEBUG_LOCKS_WARN_ON(1); > > diff --git a/kernel/lockdep_internals.h b/kernel/lockdep_internals.h > index c3600a0..68d44ec 100644 > --- a/kernel/lockdep_internals.h > +++ b/kernel/lockdep_internals.h > @@ -53,6 +53,9 @@ extern unsigned int nr_process_chains; > extern unsigned int max_lockdep_depth; > extern unsigned int max_recursion_depth; > > +extern unsigned long lockdep_count_forward_deps(struct lock_class *); > +extern unsigned long lockdep_count_backward_deps(struct lock_class *); > + > #ifdef CONFIG_DEBUG_LOCKDEP > /* > * Various lockdep statistics: > diff --git a/kernel/lockdep_proc.c b/kernel/lockdep_proc.c > index 9b0e940..6252ff7 100644 > --- a/kernel/lockdep_proc.c > +++ b/kernel/lockdep_proc.c > @@ -63,34 +63,6 @@ static void l_stop(struct seq_file *m, void *v) > { > } > > -static unsigned long count_forward_deps(struct lock_class *class) > -{ > - struct lock_list *entry; > - unsigned long ret = 1; > - > - /* > - * Recurse this class's dependency list: > - */ > - list_for_each_entry(entry, &class->locks_after, entry) > - ret += count_forward_deps(entry->class); > - > - return 
ret; > -} > - > -static unsigned long count_backward_deps(struct lock_class *class) > -{ > - struct lock_list *entry; > - unsigned long ret = 1; > - > - /* > - * Recurse this class's dependency list: > - */ > - list_for_each_entry(entry, &class->locks_before, entry) > - ret += count_backward_deps(entry->class); > - > - return ret; > -} > - > static void print_name(struct seq_file *m, struct lock_class *class) > { > char str[128]; > @@ -124,10 +96,10 @@ static int l_show(struct seq_file *m, void *v) > #ifdef CONFIG_DEBUG_LOCKDEP > seq_printf(m, " OPS:%8ld", class->ops); > #endif > - nr_forward_deps = count_forward_deps(class); > + nr_forward_deps = lockdep_count_forward_deps(class); > seq_printf(m, " FD:%5ld", nr_forward_deps); > > - nr_backward_deps = count_backward_deps(class); > + nr_backward_deps = lockdep_count_backward_deps(class); > seq_printf(m, " BD:%5ld", nr_backward_deps); > > get_usage_chars(class, &c1, &c2, &c3, &c4); > @@ -350,7 +322,7 @@ static int lockdep_stats_show(struct seq_file *m, void *v) > if (class->usage_mask & LOCKF_ENABLED_HARDIRQS_READ) > nr_hardirq_read_unsafe++; > > - sum_forward_deps += count_forward_deps(class); > + sum_forward_deps += lockdep_count_forward_deps(class); > } > #ifdef CONFIG_DEBUG_LOCKDEP > DEBUG_LOCKS_WARN_ON(debug_atomic_read(&nr_unused_locks) != nr_unused); ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [stable] combinatorial explosion in lockdep 2008-07-30 11:15 ` Peter Zijlstra @ 2008-07-31 16:50 ` Greg KH 0 siblings, 0 replies; 21+ messages in thread From: Greg KH @ 2008-07-31 16:50 UTC (permalink / raw) To: Peter Zijlstra; +Cc: David Miller, mingo, linux-kernel, stable On Wed, Jul 30, 2008 at 01:15:19PM +0200, Peter Zijlstra wrote: > On Tue, 2008-07-29 at 21:45 -0700, David Miller wrote: > > From: David Miller <davem@davemloft.net> > > Date: Mon, 28 Jul 2008 17:44:15 -0700 (PDT) > > > > > From: Ingo Molnar <mingo@elte.hu> > > > Date: Tue, 29 Jul 2008 01:51:33 +0200 > > > > > > > Any chance to get the "cat /proc/lockdep*" output, so that we could see > > > > and check the expected behavior of the full graph? > > > > > > /proc/lockdep loops forever in count_forward_deps() :-) > > > > > > I was able to capture a copy of lockdep_chains: > > > > > > http://vger.kernel.org/~davem/lockdep_chains.bz2 > > > > As a followup I dumped the full backwards search graph once the cost > > got up to about (1 * 1024 * 1024) checks or so. > > > > I didn't find any loops, but it is clear that the cost is huge because > > of the runqueue lock double-locking, without the generation count > > patch I posted the other day. > > > > For example, if you start with the first runqueue lock you search one > > entry: > > > > 1 > > > > Next, if you start with the second runqueue lock you search two > > entries: > > > > 2, 1 > > > > Next, if you start with the third runqueue lock you search > > 4 entries: > > > > 3, 2, 1, 1 > > > > And the next series is: > > > > 4, 3, 2, 1, 1, 2, 1, 1 > > > > and so on and so forth. So the cost of a full backwards graph > > traversal for N cpus is on the order of "1 << (N - 1)". So with just > > 32 cpus the cost is on the order of a few billions of checks. > > > > And then you have to factor in all of those runqueue locks's backwards > > graphs that don't involve other runqueue locks (on my machine each > > such sub-graph is about 200 locks deep). 
> > > > Here is an updated version of my patch to solve this problem. The only > > unnice bit is that I had to move the procfs dep counting code into > > lockdep.c and run it under the lockdep_lock. This is the only way to > > safely employ the dependency generation ID marking algorithm the > > short-circuiting uses. > > > > If we can agree on this as a fix, it should definitely be backported > > and submitted for -stable :-) > > Agreed adding the stable team to the CC Great, please send the stable team the patch when it hits Linus's tree. thanks, greg k-h ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH] lockdep: change scheduler annotation 2008-07-30 4:45 ` David Miller ` (2 preceding siblings ...) 2008-07-30 11:15 ` Peter Zijlstra @ 2008-07-30 11:26 ` Peter Zijlstra 2008-07-30 11:34 ` David Miller 2008-07-31 16:34 ` Ingo Molnar 2008-07-31 16:39 ` combinatorial explosion in lockdep Ingo Molnar 4 siblings, 2 replies; 21+ messages in thread From: Peter Zijlstra @ 2008-07-30 11:26 UTC (permalink / raw) To: David Miller; +Cc: mingo, linux-kernel While thinking about David's patch it _finally_ dawned on me that there is no reason we have a lock class per cpu.. Sorry for being dense :-/ The below changes the annotation from a lock class per cpu, to a single nested lock, as the scheduler never holds more that 2 rq locks at a time anyway. If there was code requiring holding all rq locks this would not work and the original annotation would be the only option, but that not being the case, this is a much lighter one. Compiles and boots on a 2-way x86_64. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- kernel/sched.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) Index: linux-2.6/kernel/sched.c =================================================================== --- linux-2.6.orig/kernel/sched.c +++ linux-2.6/kernel/sched.c @@ -600,7 +600,6 @@ struct rq { /* BKL stats */ unsigned int bkl_count; #endif - struct lock_class_key rq_lock_key; }; static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); @@ -2759,10 +2758,10 @@ static void double_rq_lock(struct rq *rq } else { if (rq1 < rq2) { spin_lock(&rq1->lock); - spin_lock(&rq2->lock); + spin_lock_nested(&rq2->lock, SINGLE_DEPTH_NESTING); } else { spin_lock(&rq2->lock); - spin_lock(&rq1->lock); + spin_lock_nested(&rq1->lock, SINGLE_DEPTH_NESTING); } } update_rq_clock(rq1); @@ -2805,10 +2804,10 @@ static int double_lock_balance(struct rq if (busiest < this_rq) { spin_unlock(&this_rq->lock); spin_lock(&busiest->lock); - spin_lock(&this_rq->lock); + spin_lock_nested(&this_rq->lock, 
SINGLE_DEPTH_NESTING); ret = 1; } else - spin_lock(&busiest->lock); + spin_lock_nested(&busiest->lock, SINGLE_DEPTH_NESTING); } return ret; } @@ -8000,7 +7999,6 @@ void __init sched_init(void) rq = cpu_rq(i); spin_lock_init(&rq->lock); - lockdep_set_class(&rq->lock, &rq->rq_lock_key); rq->nr_running = 0; init_cfs_rq(&rq->cfs, rq); init_rt_rq(&rq->rt, rq); ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] lockdep: change scheduler annotation 2008-07-30 11:26 ` [PATCH] lockdep: change scheduler annotation Peter Zijlstra @ 2008-07-30 11:34 ` David Miller 2008-07-31 16:34 ` Ingo Molnar 1 sibling, 0 replies; 21+ messages in thread From: David Miller @ 2008-07-30 11:34 UTC (permalink / raw) To: peterz; +Cc: mingo, linux-kernel From: Peter Zijlstra <peterz@infradead.org> Date: Wed, 30 Jul 2008 13:26:57 +0200 > While thinking about David's patch it _finally_ dawned on me that there > is no reason we have a lock class per cpu.. > > Sorry for being dense :-/ > > The below changes the annotation from a lock class per cpu, to a single > nested lock, as the scheduler never holds more that 2 rq locks at a time > anyway. > > If there was code requiring holding all rq locks this would not work and > the original annotation would be the only option, but that not being the > case, this is a much lighter one. > > Compiles and boots on a 2-way x86_64. > > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> I had been wondering this entire debugging session why the per-rq lock classes were even there, thanks for getting rid of them :) ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] lockdep: change scheduler annotation 2008-07-30 11:26 ` [PATCH] lockdep: change scheduler annotation Peter Zijlstra 2008-07-30 11:34 ` David Miller @ 2008-07-31 16:34 ` Ingo Molnar 1 sibling, 0 replies; 21+ messages in thread From: Ingo Molnar @ 2008-07-31 16:34 UTC (permalink / raw) To: Peter Zijlstra; +Cc: David Miller, linux-kernel * Peter Zijlstra <peterz@infradead.org> wrote: > While thinking about David's patch it _finally_ dawned on me that > there is no reason we have a lock class per cpu.. /me arghs too! :) > Sorry for being dense :-/ > > The below changes the annotation from a lock class per cpu, to a > single nested lock, as the scheduler never holds more that 2 rq locks > at a time anyway. > > If there was code requiring holding all rq locks this would not work > and the original annotation would be the only option, but that not > being the case, this is a much lighter one. > > Compiles and boots on a 2-way x86_64. applied to tip/sched/urgent - thanks Peter! Ingo ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-07-30 4:45 ` David Miller ` (3 preceding siblings ...) 2008-07-30 11:26 ` [PATCH] lockdep: change scheduler annotation Peter Zijlstra @ 2008-07-31 16:39 ` Ingo Molnar 2008-08-01 9:22 ` Ingo Molnar 4 siblings, 1 reply; 21+ messages in thread From: Ingo Molnar @ 2008-07-31 16:39 UTC (permalink / raw) To: David Miller; +Cc: peterz, linux-kernel * David Miller <davem@davemloft.net> wrote: > lockdep: Fix combinatorial explosion in lock subgraph traversal. applied to tip/core/locking - thanks David. I guess we need to test this a bit, the patch is far from simple :-) Ingo ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-07-31 16:39 ` combinatorial explosion in lockdep Ingo Molnar @ 2008-08-01 9:22 ` Ingo Molnar 2008-08-01 9:32 ` David Miller 0 siblings, 1 reply; 21+ messages in thread From: Ingo Molnar @ 2008-08-01 9:22 UTC (permalink / raw) To: David Miller; +Cc: peterz, linux-kernel * Ingo Molnar <mingo@elte.hu> wrote: > > * David Miller <davem@davemloft.net> wrote: > > > lockdep: Fix combinatorial explosion in lock subgraph traversal. > > applied to tip/core/locking - thanks David. I guess we need to test > this a bit, the patch is far from simple :-) small build fallout fix below. Ingo ----------------> >From 8887b1c04671c66dcde3d4104913ba9587e1e40c Mon Sep 17 00:00:00 2001 From: Ingo Molnar <mingo@elte.hu> Date: Fri, 1 Aug 2008 11:23:50 +0200 Subject: [PATCH] lockdep: build fix fix: kernel/built-in.o: In function `lockdep_stats_show': lockdep_proc.c:(.text+0x3cb2f): undefined reference to `lockdep_count_forward_deps' kernel/built-in.o: In function `l_show': lockdep_proc.c:(.text+0x3d02b): undefined reference to `lockdep_count_forward_deps' lockdep_proc.c:(.text+0x3d047): undefined reference to `lockdep_count_backward_deps' Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/lockdep_internals.h | 13 +++++++++++++ 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/kernel/lockdep_internals.h b/kernel/lockdep_internals.h index 68d44ec..f5c6a14 100644 --- a/kernel/lockdep_internals.h +++ b/kernel/lockdep_internals.h @@ -53,8 +53,21 @@ extern unsigned int nr_process_chains; extern unsigned int max_lockdep_depth; extern unsigned int max_recursion_depth; +#ifdef CONFIG_PROVE_LOCKING extern unsigned long lockdep_count_forward_deps(struct lock_class *); extern unsigned long lockdep_count_backward_deps(struct lock_class *); +#else +static inline unsigned long +lockdep_count_forward_deps(struct lock_class *class) +{ + return 0; +} +static inline unsigned long +lockdep_count_backward_deps(struct lock_class *class) +{ + return 0; 
+} +#endif #ifdef CONFIG_DEBUG_LOCKDEP /* ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-08-01 9:22 ` Ingo Molnar @ 2008-08-01 9:32 ` David Miller 2008-08-01 11:57 ` Hugh Dickins 0 siblings, 1 reply; 21+ messages in thread From: David Miller @ 2008-08-01 9:32 UTC (permalink / raw) To: mingo; +Cc: peterz, linux-kernel From: Ingo Molnar <mingo@elte.hu> Date: Fri, 1 Aug 2008 11:22:19 +0200 > > * Ingo Molnar <mingo@elte.hu> wrote: > > > > > * David Miller <davem@davemloft.net> wrote: > > > > > lockdep: Fix combinatorial explosion in lock subgraph traversal. > > > > applied to tip/core/locking - thanks David. I guess we need to test > > this a bit, the patch is far from simple :-) > > small build fallout fix below. Thanks. BTW, until something like Peter's attempt is working, we need to also scale some of the lockdep limits by NR_CPUS. The formula I came up with that worked with my 32-cpu, 64-cpu and 128-cpu machines was: #define __LOCKDEP_NR_CPU_SCALE \ ((NR_CPUS <= 16) ? 0 : ilog2(NR_CPUS) - 4) #define MAX_LOCKDEP_ENTRIES (8192UL << __LOCKDEP_NR_CPU_SCALE) #define MAX_LOCKDEP_CHAINS_BITS (16 + __LOCKDEP_NR_CPU_SCALE) #define MAX_LOCKDEP_CHAINS (1UL << MAX_LOCKDEP_CHAINS_BITS) But this is going to explode for NR_CPUS=4096, but it is the only way to get a working lockdep currently, due to the runqueue lock classes. Also, when these limits reached triggered, we get the same printk wakeup deadlock problem I hit with Peter's patch. I think a non-trivial number of people hit that printk deadlock bug, but just didn't report it because the machine essentially hard hangs silently. At best you'd see the: ======================================== initial line from lockdep, but often even that doesn't make it to the console. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-08-01 9:32 ` David Miller @ 2008-08-01 11:57 ` Hugh Dickins 2008-08-03 8:14 ` David Miller 0 siblings, 1 reply; 21+ messages in thread From: Hugh Dickins @ 2008-08-01 11:57 UTC (permalink / raw) To: David Miller; +Cc: mingo, peterz, linux-kernel On Fri, 1 Aug 2008, David Miller wrote: > > I think a non-trivial number of people hit that printk deadlock bug, > but just didn't report it because the machine essentially hard hangs > silently. At best you'd see the: > > ======================================== > > initial line from lockdep, but often even that doesn't make it to the > console. Not just with lockdep. A couple of months back CONFIG_DEBUG_SPINLOCK was giving me spinlock debug messages under load on x86_64 duo laptop, but hung up before showing the intended backtrace. I added the patch below, but IIRC shifted over to other issues (and switched from cfq to anticipatory scheduler, which made a difference) before finding out if this patch really helped. If it makes sense to you, please add it in. Hugh [PATCH] set oops_in_progress for spinlock lockup Set oops_in_progress while reporting spinlock lockup, to avoid deadlock inside printk() or the backtraces. Signed-off-by: Hugh Dickins <hugh@veritas.com> --- lib/spinlock_debug.c | 2 ++ 1 file changed, 2 insertions(+) --- 2.6.27-rc1/lib/spinlock_debug.c 2008-01-24 22:58:37.000000000 +0000 +++ linux/lib/spinlock_debug.c 2008-08-01 12:41:52.000000000 +0100 @@ -113,6 +113,7 @@ static void __spin_lock_debug(spinlock_t /* lockup suspected: */ if (print_once) { print_once = 0; + oops_in_progress = 1; printk(KERN_EMERG "BUG: spinlock lockup on CPU#%d, " "%s/%d, %p\n", raw_smp_processor_id(), current->comm, @@ -121,6 +122,7 @@ static void __spin_lock_debug(spinlock_t #ifdef CONFIG_SMP trigger_all_cpu_backtrace(); #endif + oops_in_progress = 0; } } } ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-08-01 11:57 ` Hugh Dickins @ 2008-08-03 8:14 ` David Miller 2008-08-04 12:21 ` Hugh Dickins 0 siblings, 1 reply; 21+ messages in thread From: David Miller @ 2008-08-03 8:14 UTC (permalink / raw) To: hugh; +Cc: mingo, peterz, linux-kernel From: Hugh Dickins <hugh@veritas.com> Date: Fri, 1 Aug 2008 12:57:27 +0100 (BST) > @@ -113,6 +113,7 @@ static void __spin_lock_debug(spinlock_t > /* lockup suspected: */ > if (print_once) { > print_once = 0; > + oops_in_progress = 1; > printk(KERN_EMERG "BUG: spinlock lockup on CPU#%d, " > "%s/%d, %p\n", > raw_smp_processor_id(), current->comm, > @@ -121,6 +122,7 @@ static void __spin_lock_debug(spinlock_t > #ifdef CONFIG_SMP > trigger_all_cpu_backtrace(); > #endif > + oops_in_progress = 0; > } > } > } It's probably best to not later clear oops_in_progress when we trigger an event like this, to ensure that we do actually get any followon messages on the console. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: combinatorial explosion in lockdep 2008-08-03 8:14 ` David Miller @ 2008-08-04 12:21 ` Hugh Dickins 0 siblings, 0 replies; 21+ messages in thread From: Hugh Dickins @ 2008-08-04 12:21 UTC (permalink / raw) To: David Miller; +Cc: mingo, peterz, linux-kernel On Sun, 3 Aug 2008, David Miller wrote: > > It's probably best to not later clear oops_in_progress when we trigger > an event like this, to ensure that we do actually get any followon > messages on the console. Ah, so it was intentional that your patch set oops_in_progress without ever clearing it again. Hmm. I think I'd reword your "probably best" to "arguably best". If everything really locks up at this point, then there is no point to clearing oops_in_progress after; but if the system manages to resume (arguably) normal operation after, then leaving it forever oops_in_progress worries me, and differs from current practice. I think I'd rather clear it afterwards in any public patch; but edit that out privately if it helps while debugging some particular problem. I did try to reproduce my spinlock lockups yesterday, but without success, so have no practical experience one way or the other. But notice that I shouldn't be messing directly with oops_in_progress: better to use bust_spinlocks() (oops_in_progress++/-- and wake klogd). [PATCH] bust_spinlocks while reporting spinlock lockup Use bust_spinlocks() while reporting spinlock lockup to avoid deadlock inside printk() or the backtraces.

Signed-off-by: Hugh Dickins <hugh@veritas.com> --- lib/spinlock_debug.c | 2 ++ 1 file changed, 2 insertions(+) --- 2.6.27-rc1/lib/spinlock_debug.c 2008-01-24 22:58:37.000000000 +0000 +++ linux/lib/spinlock_debug.c 2008-08-01 12:41:52.000000000 +0100 @@ -113,6 +113,7 @@ static void __spin_lock_debug(spinlock_t /* lockup suspected: */ if (print_once) { print_once = 0; + bust_spinlocks(1); printk(KERN_EMERG "BUG: spinlock lockup on CPU#%d, " "%s/%d, %p\n", raw_smp_processor_id(), current->comm, @@ -121,6 +122,7 @@ static void __spin_lock_debug(spinlock_t #ifdef CONFIG_SMP trigger_all_cpu_backtrace(); #endif + bust_spinlocks(0); } } } ^ permalink raw reply [flat|nested] 21+ messages in thread