* [PATCH, resend] eliminate spurious pointless WARN_ON()s
@ 2009-03-12 13:21 Jan Beulich
  2009-03-12 13:48 ` Andi Kleen
                   ` (5 more replies)
  0 siblings, 6 replies; 35+ messages in thread
From: Jan Beulich @ 2009-03-12 13:21 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andi Kleen, linux-kernel

Namely during early boot, the panic() or BUG() paths may end up in
smp_call_function_*() with just a single online CPU. In that situation
the warnings generated are not only meaningless, but also result in
relevant output being cluttered.

Therefore, defer the WARN_ON() checks until after the cases that are
unaffected by the problem being detected here have been handled.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>

---
 kernel/smp.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- tip/kernel/smp.c
+++ tip-smp-call-defer-warn/kernel/smp.c
@@ -282,9 +282,6 @@ int smp_call_function_single(int cpu, vo
 	 */
 	this_cpu = get_cpu();
 
-	/* Can deadlock when called with interrupts disabled */
-	WARN_ON(irqs_disabled());
-
 	if (cpu == this_cpu) {
 		local_irq_save(flags);
 		func(info);
@@ -292,6 +289,9 @@ int smp_call_function_single(int cpu, vo
 		if ((unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) {
 			struct call_single_data *data = &d;
 
+			/* Can deadlock when called with interrupts disabled */
+			WARN_ON(irqs_disabled());
+
 			if (!wait)
 				data = &__get_cpu_var(csd_data);
 
@@ -365,9 +365,6 @@ void smp_call_function_many(const struct
 	unsigned long flags;
 	int cpu, next_cpu, this_cpu = smp_processor_id();
 
-	/* Can deadlock when called with interrupts disabled */
-	WARN_ON(irqs_disabled());
-
 	/* So, what's a CPU they want? Ignoring this one. */
 	cpu = cpumask_first_and(mask, cpu_online_mask);
 	if (cpu == this_cpu)
@@ -387,6 +384,9 @@ void smp_call_function_many(const struct
 		return;
 	}
 
+	/* Can deadlock when called with interrupts disabled */
+	WARN_ON(irqs_disabled());
+
 	data = &__get_cpu_var(cfd_data);
 	csd_lock(&data->csd);
 




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH, resend] eliminate spurious pointless WARN_ON()s
  2009-03-12 13:21 [PATCH, resend] eliminate spurious pointless WARN_ON()s Jan Beulich
@ 2009-03-12 13:48 ` Andi Kleen
  2009-03-13  8:52   ` Peter Zijlstra
  2009-03-13  1:39 ` [tip:core/ipi] generic-ipi: " Jan Beulich
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 35+ messages in thread
From: Andi Kleen @ 2009-03-12 13:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ingo Molnar, Andi Kleen, linux-kernel

On Thu, Mar 12, 2009 at 01:21:50PM +0000, Jan Beulich wrote:
> Namely during early boot, the panic() or BUG() paths may end up in
> smp_call_function_*() with just a single online CPU. In that situation
> the warnings generated are not only meaningless, but also result in
> relevant output being cluttered.

I actually have patches that just fix panic/shutdown to never call
smp_call_function(), but use their own vector. smp_call_function() also
does all kinds of other things that are not appropriate in panic, like
allocating memory.

My main motivation was for machine checks which currently always
run into WARN_Ons when they panic with interrupts off, but as you
say there are other cases too like early boot.

Will post them later today or tomorrow.

-Andi


* [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s
  2009-03-12 13:21 [PATCH, resend] eliminate spurious pointless WARN_ON()s Jan Beulich
  2009-03-12 13:48 ` Andi Kleen
@ 2009-03-13  1:39 ` Jan Beulich
  2009-03-13  8:54   ` Peter Zijlstra
  2009-03-13 10:36 ` [tip:core/ipi] generic-ipi: eliminate WARN_ON()s during oops/panic Ingo Molnar
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2009-03-13  1:39 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, jbeulich, tglx, mingo

Commit-ID:  6a09dfa870ba0ed21b1124539968a36b42660661
Gitweb:     http://git.kernel.org/tip/6a09dfa870ba0ed21b1124539968a36b42660661
Author:     Jan Beulich <jbeulich@novell.com>
AuthorDate: Thu, 12 Mar 2009 13:21:50 +0000
Commit:     Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 13 Mar 2009 02:14:41 +0100

generic-ipi: eliminate spurious pointless WARN_ON()s

Namely during early boot, the panic() or BUG() paths may end up in
smp_call_function_*() with just a single online CPU. In that situation
the warnings generated are not only meaningless, but also result in
relevant output being cluttered.

Therefore, defer the WARN_ON() checks until after the cases that are
unaffected by the problem being detected here have been handled.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B91A7E.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/smp.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 7ad2262..37b90a9 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -284,9 +284,6 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 	 */
 	this_cpu = get_cpu();
 
-	/* Can deadlock when called with interrupts disabled */
-	WARN_ON(irqs_disabled());
-
 	if (cpu == this_cpu) {
 		local_irq_save(flags);
 		func(info);
@@ -295,6 +292,9 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 		if ((unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) {
 			struct call_single_data *data = &d;
 
+			/* Can deadlock when called with interrupts disabled */
+			WARN_ON(irqs_disabled());
+
 			if (!wait)
 				data = &__get_cpu_var(csd_data);
 
@@ -364,9 +364,6 @@ void smp_call_function_many(const struct cpumask *mask,
 	unsigned long flags;
 	int cpu, next_cpu, this_cpu = smp_processor_id();
 
-	/* Can deadlock when called with interrupts disabled */
-	WARN_ON(irqs_disabled());
-
 	/* So, what's a CPU they want? Ignoring this one. */
 	cpu = cpumask_first_and(mask, cpu_online_mask);
 	if (cpu == this_cpu)
@@ -387,6 +384,9 @@ void smp_call_function_many(const struct cpumask *mask,
 		return;
 	}
 
+	/* Can deadlock when called with interrupts disabled */
+	WARN_ON(irqs_disabled());
+
 	data = &__get_cpu_var(cfd_data);
 	csd_lock(&data->csd);
 


* Re: [PATCH, resend] eliminate spurious pointless WARN_ON()s
  2009-03-12 13:48 ` Andi Kleen
@ 2009-03-13  8:52   ` Peter Zijlstra
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2009-03-13  8:52 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jan Beulich, Ingo Molnar, linux-kernel

On Thu, 2009-03-12 at 14:48 +0100, Andi Kleen wrote:
> On Thu, Mar 12, 2009 at 01:21:50PM +0000, Jan Beulich wrote:
> > Namely during early boot, the panic() or BUG() paths may end up in
> > smp_call_function_*() with just a single online CPU. In that situation
> > the warnings generated are not only meaningless, but also result in
> > relevant output being cluttered.
> 
> I actually have patches that just fix panic/shutdown to never call
> smp_call_function(), but use their own vector. smp_call_function() also
> does all kinds of other things that are not appropriate in panic, like
> allocating memory.

Oh? The only allocation that it does is at cpu-hotplug, where it
allocates a cpu-mask thingy, and that's only a real allocation on
NR_CPUS>64.





* Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s
  2009-03-13  1:39 ` [tip:core/ipi] generic-ipi: " Jan Beulich
@ 2009-03-13  8:54   ` Peter Zijlstra
  2009-03-13  9:21     ` [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s Jan Beulich
  2009-03-13  9:31     ` [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s Ingo Molnar
  0 siblings, 2 replies; 35+ messages in thread
From: Peter Zijlstra @ 2009-03-13  8:54 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, jbeulich, tglx, mingo; +Cc: linux-tip-commits

On Fri, 2009-03-13 at 01:39 +0000, Jan Beulich wrote:
> Commit-ID:  6a09dfa870ba0ed21b1124539968a36b42660661
> Gitweb:     http://git.kernel.org/tip/6a09dfa870ba0ed21b1124539968a36b42660661
> Author:     Jan Beulich <jbeulich@novell.com>
> AuthorDate: Thu, 12 Mar 2009 13:21:50 +0000
> Commit:     Ingo Molnar <mingo@elte.hu>
> CommitDate: Fri, 13 Mar 2009 02:14:41 +0100
> 
> generic-ipi: eliminate spurious pointless WARN_ON()s
> 
> Namely during early boot, the panic() or BUG() paths may end up in
> smp_call_function_*() with just a single online CPU. In that situation
> the warnings generated are not only meaningless, but also result in
> relevant output being cluttered.
> 
> Therefore, defer the WARN_ON() checks until after the cases that are
> unaffected by the problem being detected here have been handled.
> 
> Signed-off-by: Jan Beulich <jbeulich@novell.com>
> LKML-Reference: <49B91A7E.76E4.0078.0@novell.com>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>

I don't really like this,.. it moves the WARN_ON to weird locations
without an explanatory comment.

Wouldn't leaving them in place but changing them to:

WARN_ON(irqs_disabled() && system_state == SYSTEM_RUNNING);

be clearer?

> ---
>  kernel/smp.c |   12 ++++++------
>  1 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 7ad2262..37b90a9 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -284,9 +284,6 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
>  	 */
>  	this_cpu = get_cpu();
>  
> -	/* Can deadlock when called with interrupts disabled */
> -	WARN_ON(irqs_disabled());
> -
>  	if (cpu == this_cpu) {
>  		local_irq_save(flags);
>  		func(info);
> @@ -295,6 +292,9 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
>  		if ((unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) {
>  			struct call_single_data *data = &d;
>  
> +			/* Can deadlock when called with interrupts disabled */
> +			WARN_ON(irqs_disabled());
> +
>  			if (!wait)
>  				data = &__get_cpu_var(csd_data);
>  
> @@ -364,9 +364,6 @@ void smp_call_function_many(const struct cpumask *mask,
>  	unsigned long flags;
>  	int cpu, next_cpu, this_cpu = smp_processor_id();
>  
> -	/* Can deadlock when called with interrupts disabled */
> -	WARN_ON(irqs_disabled());
> -
>  	/* So, what's a CPU they want? Ignoring this one. */
>  	cpu = cpumask_first_and(mask, cpu_online_mask);
>  	if (cpu == this_cpu)
> @@ -387,6 +384,9 @@ void smp_call_function_many(const struct cpumask *mask,
>  		return;
>  	}
>  
> +	/* Can deadlock when called with interrupts disabled */
> +	WARN_ON(irqs_disabled());
> +
>  	data = &__get_cpu_var(cfd_data);
>  	csd_lock(&data->csd);
>  



* Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s
  2009-03-13  8:54   ` Peter Zijlstra
@ 2009-03-13  9:21     ` Jan Beulich
  2009-03-13  9:43       ` Peter Zijlstra
  2009-03-13  9:31     ` [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s Ingo Molnar
  1 sibling, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2009-03-13  9:21 UTC (permalink / raw)
  To: mingo, Peter Zijlstra, tglx, mingo, hpa; +Cc: linux-kernel

>>> Peter Zijlstra <peterz@infradead.org> 13.03.09 09:54 >>>
>Wouldn't leaving them in place but changing them to:
>
>WARN_ON(irqs_disabled() && system_state == SYSTEM_RUNNING);
>
>be clearer?

I don't think that would be precise: system_state gets set to
SYSTEM_RUNNING much later than APs get brought up (i.e. there are
cases where the WARN_ON()s could validly trigger with SYSTEM_BOOTING),
and also doesn't cover states > SYSTEM_RUNNING (where, again, after
perhaps having brought down all APs the warnings could become pointless,
but the warnings could be meaningful as long as there are still some APs
online).

While from an abstract code reading perspective your suggestion might
seem reasonable, the changed placement really reflects what the warning
is trying to warn about - a potential deadlock which cannot occur under
the conditions filtered out by conditionals the warnings were moved
beyond.

Jan



* Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s
  2009-03-13  8:54   ` Peter Zijlstra
  2009-03-13  9:21     ` [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s Jan Beulich
@ 2009-03-13  9:31     ` Ingo Molnar
  1 sibling, 0 replies; 35+ messages in thread
From: Ingo Molnar @ 2009-03-13  9:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, hpa, linux-kernel, jbeulich, tglx, linux-tip-commits


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2009-03-13 at 01:39 +0000, Jan Beulich wrote:
> > Commit-ID:  6a09dfa870ba0ed21b1124539968a36b42660661
> > Gitweb:     http://git.kernel.org/tip/6a09dfa870ba0ed21b1124539968a36b42660661
> > Author:     Jan Beulich <jbeulich@novell.com>
> > AuthorDate: Thu, 12 Mar 2009 13:21:50 +0000
> > Commit:     Ingo Molnar <mingo@elte.hu>
> > CommitDate: Fri, 13 Mar 2009 02:14:41 +0100
> > 
> > generic-ipi: eliminate spurious pointless WARN_ON()s
> > 
> > Namely during early boot, the panic() or BUG() paths may end up in
> > smp_call_function_*() with just a single online CPU. In that situation
> > the warnings generated are not only meaningless, but also result in
> > relevant output being cluttered.
> > 
> > Therefore, defer the WARN_ON() checks until after the cases that are
> > unaffected by the problem being detected here have been handled.
> > 
> > Signed-off-by: Jan Beulich <jbeulich@novell.com>
> > LKML-Reference: <49B91A7E.76E4.0078.0@novell.com>
> > Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> I don't really like this,.. it moves the WARN_ON to weird 
> locations without an explanatory comment.

ok, i've reverted it for now.

	Ingo


* Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s
  2009-03-13  9:21     ` [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s Jan Beulich
@ 2009-03-13  9:43       ` Peter Zijlstra
  2009-03-13 10:38         ` Ingo Molnar
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2009-03-13  9:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: mingo, tglx, mingo, hpa, linux-kernel

On Fri, 2009-03-13 at 09:21 +0000, Jan Beulich wrote:
> >>> Peter Zijlstra <peterz@infradead.org> 13.03.09 09:54 >>>
> >Wouldn't leaving them in place but changing them to:
> >
> >WARN_ON(irqs_disabled() && system_state == SYSTEM_RUNNING);
> >
> >be clearer?
> 
> I don't think that would be precise: system_state gets set to
> SYSTEM_RUNNING much later than APs get brought up (i.e. there are
> cases where the WARN_ON()s could validly trigger with SYSTEM_BOOTING),
> and also doesn't cover states > SYSTEM_RUNNING (where, again, after
> perhaps having brought down all APs the warnings could become pointless,
> but the warnings could be meaningful as long as there are still some APs
> online).
> 
> While from an abstract code reading perspective your suggestion might
> seem reasonable, the changed placement really reflects what the warning
> is trying to warn about - a potential deadlock which cannot occur under
> the conditions filtered out by conditionals the warnings were moved
> beyond.

How about?

WARN_ON_ONCE(irqs_disabled() && !oops_in_progress)




* [tip:core/ipi] generic-ipi: eliminate WARN_ON()s during oops/panic
  2009-03-12 13:21 [PATCH, resend] eliminate spurious pointless WARN_ON()s Jan Beulich
  2009-03-12 13:48 ` Andi Kleen
  2009-03-13  1:39 ` [tip:core/ipi] generic-ipi: " Jan Beulich
@ 2009-03-13 10:36 ` Ingo Molnar
  2009-03-13 10:36 ` [tip:core/ipi] panic: decrease oops_in_progress only after having done the panic Ingo Molnar
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 35+ messages in thread
From: Ingo Molnar @ 2009-03-13 10:36 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, jbeulich, peterz, tglx, mingo

Commit-ID:  641cd4cfcdc71ce01535b31cc4d57d59a1fae1fc
Gitweb:     http://git.kernel.org/tip/641cd4cfcdc71ce01535b31cc4d57d59a1fae1fc
Author:     Ingo Molnar <mingo@elte.hu>
AuthorDate: Fri, 13 Mar 2009 10:47:34 +0100
Commit:     Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 13 Mar 2009 10:47:34 +0100

generic-ipi: eliminate WARN_ON()s during oops/panic

Do not output smp-call related warnings in the oops/panic codepath.

Reported-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <49B91A7E.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/smp.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 7ad2262..858baac 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -5,6 +5,7 @@
  */
 #include <linux/rcupdate.h>
 #include <linux/rculist.h>
+#include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/percpu.h>
 #include <linux/init.h>
@@ -285,7 +286,7 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 	this_cpu = get_cpu();
 
 	/* Can deadlock when called with interrupts disabled */
-	WARN_ON(irqs_disabled());
+	WARN_ON_ONCE(irqs_disabled() && !oops_in_progress);
 
 	if (cpu == this_cpu) {
 		local_irq_save(flags);
@@ -329,7 +330,7 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
 	csd_lock(data);
 
 	/* Can deadlock when called with interrupts disabled */
-	WARN_ON(wait && irqs_disabled());
+	WARN_ON_ONCE(wait && irqs_disabled() && !oops_in_progress);
 
 	generic_exec_single(cpu, data, wait);
 }
@@ -365,7 +366,7 @@ void smp_call_function_many(const struct cpumask *mask,
 	int cpu, next_cpu, this_cpu = smp_processor_id();
 
 	/* Can deadlock when called with interrupts disabled */
-	WARN_ON(irqs_disabled());
+	WARN_ON_ONCE(irqs_disabled() && !oops_in_progress);
 
 	/* So, what's a CPU they want? Ignoring this one. */
 	cpu = cpumask_first_and(mask, cpu_online_mask);


* [tip:core/ipi] panic: decrease oops_in_progress only after having done the panic
  2009-03-12 13:21 [PATCH, resend] eliminate spurious pointless WARN_ON()s Jan Beulich
                   ` (2 preceding siblings ...)
  2009-03-13 10:36 ` [tip:core/ipi] generic-ipi: eliminate WARN_ON()s during oops/panic Ingo Molnar
@ 2009-03-13 10:36 ` Ingo Molnar
  2009-03-13 10:36 ` [tip:core/ipi] panic, smp: provide smp_send_stop() wrapper on UP too Ingo Molnar
  2009-03-13 10:36 ` [tip:core/ipi] panic: clean up kernel/panic.c Ingo Molnar
  5 siblings, 0 replies; 35+ messages in thread
From: Ingo Molnar @ 2009-03-13 10:36 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, jbeulich, tglx, mingo

Commit-ID:  ffd71da4e3f323b7673b061e6f7e0d0c12dc2b49
Gitweb:     http://git.kernel.org/tip/ffd71da4e3f323b7673b061e6f7e0d0c12dc2b49
Author:     Ingo Molnar <mingo@elte.hu>
AuthorDate: Fri, 13 Mar 2009 10:54:24 +0100
Commit:     Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 13 Mar 2009 11:06:47 +0100

panic: decrease oops_in_progress only after having done the panic

Impact: eliminate secondary warnings during panic()

We can panic() in a number of difficult, atomic contexts, hence
we use bust_spinlocks(1) in panic() to increase oops_in_progress,
which prevents various debug checks we have in place.

But in practice this protection only covers the first few printk's
done by panic() - it does not cover the later attempt to stop all
other CPUs and kexec(). If a secondary warning triggers in one of
those facilities, it can make the panic message scroll off.

So do bust_spinlocks(0) only much later in panic(). (That code is
only reached if the panic policy is relaxed enough that it can
return after a warning message.)

Reported-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B91A7E.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/panic.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 32fe4ef..57fb005 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -77,7 +77,6 @@ NORET_TYPE void panic(const char * fmt, ...)
 #ifdef CONFIG_DEBUG_BUGVERBOSE
 	dump_stack();
 #endif
-	bust_spinlocks(0);
 
 	/*
 	 * If we have crashed and we have a crash kernel loaded let it handle
@@ -136,6 +135,7 @@ NORET_TYPE void panic(const char * fmt, ...)
 		mdelay(1);
 		i++;
 	}
+	bust_spinlocks(0);
 }
 
 EXPORT_SYMBOL(panic);


* [tip:core/ipi] panic, smp: provide smp_send_stop() wrapper on UP too
  2009-03-12 13:21 [PATCH, resend] eliminate spurious pointless WARN_ON()s Jan Beulich
                   ` (3 preceding siblings ...)
  2009-03-13 10:36 ` [tip:core/ipi] panic: decrease oops_in_progress only after having done the panic Ingo Molnar
@ 2009-03-13 10:36 ` Ingo Molnar
  2009-03-13 10:36 ` [tip:core/ipi] panic: clean up kernel/panic.c Ingo Molnar
  5 siblings, 0 replies; 35+ messages in thread
From: Ingo Molnar @ 2009-03-13 10:36 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx, mingo

Commit-ID:  d1dedb52acd98bd5e13e1ff4c4d045d58bbd16fe
Gitweb:     http://git.kernel.org/tip/d1dedb52acd98bd5e13e1ff4c4d045d58bbd16fe
Author:     Ingo Molnar <mingo@elte.hu>
AuthorDate: Fri, 13 Mar 2009 11:14:06 +0100
Commit:     Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 13 Mar 2009 11:24:31 +0100

panic, smp: provide smp_send_stop() wrapper on UP too

Impact: cleanup, no code changed

Remove an ugly #ifdef CONFIG_SMP from panic(), by providing
an smp_send_stop() wrapper on UP too.

LKML-Reference: <49B91A7E.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 include/linux/smp.h |    4 +++-
 kernel/panic.c      |    2 --
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 2d3bcb6..a69db82 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -38,7 +38,7 @@ int smp_call_function_single(int cpuid, void (*func) (void *info), void *info,
 /*
  * main cross-CPU interfaces, handles INIT, TLB flush, STOP, etc.
  * (defined in asm header):
- */ 
+ */
 
 /*
  * stops all CPUs but the current one:
@@ -122,6 +122,8 @@ extern unsigned int setup_max_cpus;
 
 #else /* !SMP */
 
+static inline void smp_send_stop(void) { }
+
 /*
  *	These macros fold the SMP functionality into a single CPU system
  */
diff --git a/kernel/panic.c b/kernel/panic.c
index 57fb005..ca75e81 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -85,14 +85,12 @@ NORET_TYPE void panic(const char * fmt, ...)
 	 */
 	crash_kexec(NULL);
 
-#ifdef CONFIG_SMP
 	/*
 	 * Note smp_send_stop is the usual smp shutdown function, which
 	 * unfortunately means it may not be hardened to work in a panic
 	 * situation.
 	 */
 	smp_send_stop();
-#endif
 
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 


* [tip:core/ipi] panic: clean up kernel/panic.c
  2009-03-12 13:21 [PATCH, resend] eliminate spurious pointless WARN_ON()s Jan Beulich
                   ` (4 preceding siblings ...)
  2009-03-13 10:36 ` [tip:core/ipi] panic, smp: provide smp_send_stop() wrapper on UP too Ingo Molnar
@ 2009-03-13 10:36 ` Ingo Molnar
  5 siblings, 0 replies; 35+ messages in thread
From: Ingo Molnar @ 2009-03-13 10:36 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx, mingo

Commit-ID:  c95dbf27e201587b2a6cf63f8501a0313d9b4801
Gitweb:     http://git.kernel.org/tip/c95dbf27e201587b2a6cf63f8501a0313d9b4801
Author:     Ingo Molnar <mingo@elte.hu>
AuthorDate: Fri, 13 Mar 2009 11:14:06 +0100
Commit:     Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 13 Mar 2009 11:25:53 +0100

panic: clean up kernel/panic.c

Impact: cleanup, no code changed

Clean up kernel/panic.c some more and make it more consistent.

LKML-Reference: <49B91A7E.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/panic.c |  111 ++++++++++++++++++++++++++++++--------------------------
 1 files changed, 59 insertions(+), 52 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index ca75e81..3fd8c5b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -8,19 +8,19 @@
  * This function is used through-out the kernel (including mm and fs)
  * to indicate a major problem.
  */
+#include <linux/debug_locks.h>
+#include <linux/interrupt.h>
+#include <linux/kallsyms.h>
+#include <linux/notifier.h>
 #include <linux/module.h>
-#include <linux/sched.h>
-#include <linux/delay.h>
+#include <linux/random.h>
 #include <linux/reboot.h>
-#include <linux/notifier.h>
-#include <linux/init.h>
+#include <linux/delay.h>
+#include <linux/kexec.h>
+#include <linux/sched.h>
 #include <linux/sysrq.h>
-#include <linux/interrupt.h>
+#include <linux/init.h>
 #include <linux/nmi.h>
-#include <linux/kexec.h>
-#include <linux/debug_locks.h>
-#include <linux/random.h>
-#include <linux/kallsyms.h>
 #include <linux/dmi.h>
 
 int panic_on_oops;
@@ -52,19 +52,15 @@ EXPORT_SYMBOL(panic_blink);
  *
  *	This function never returns.
  */
-
 NORET_TYPE void panic(const char * fmt, ...)
 {
-	long i;
 	static char buf[1024];
 	va_list args;
-#if defined(CONFIG_S390)
-	unsigned long caller = (unsigned long) __builtin_return_address(0);
-#endif
+	long i;
 
 	/*
-	 * It's possible to come here directly from a panic-assertion and not
-	 * have preempt disabled. Some functions called from here want
+	 * It's possible to come here directly from a panic-assertion and
+	 * not have preempt disabled. Some functions called from here want
 	 * preempt to be disabled. No point enabling it later though...
 	 */
 	preempt_disable();
@@ -99,19 +95,21 @@ NORET_TYPE void panic(const char * fmt, ...)
 
 	if (panic_timeout > 0) {
 		/*
-	 	 * Delay timeout seconds before rebooting the machine. 
-		 * We can't use the "normal" timers since we just panicked..
-	 	 */
-		printk(KERN_EMERG "Rebooting in %d seconds..",panic_timeout);
+		 * Delay timeout seconds before rebooting the machine.
+		 * We can't use the "normal" timers since we just panicked.
+		 */
+		printk(KERN_EMERG "Rebooting in %d seconds..", panic_timeout);
+
 		for (i = 0; i < panic_timeout*1000; ) {
 			touch_nmi_watchdog();
 			i += panic_blink(i);
 			mdelay(1);
 			i++;
 		}
-		/*	This will not be a clean reboot, with everything
-		 *	shutting down.  But if there is a chance of
-		 *	rebooting the system it will be rebooted.
+		/*
+		 * This will not be a clean reboot, with everything
+		 * shutting down.  But if there is a chance of
+		 * rebooting the system it will be rebooted.
 		 */
 		emergency_restart();
 	}
@@ -124,10 +122,15 @@ NORET_TYPE void panic(const char * fmt, ...)
 	}
 #endif
 #if defined(CONFIG_S390)
-	disabled_wait(caller);
+	{
+		unsigned long caller;
+
+		caller = (unsigned long)__builtin_return_address(0);
+		disabled_wait(caller);
+	}
 #endif
 	local_irq_enable();
-	for (i = 0;;) {
+	for (i = 0; ; ) {
 		touch_softlockup_watchdog();
 		i += panic_blink(i);
 		mdelay(1);
@@ -140,23 +143,23 @@ EXPORT_SYMBOL(panic);
 
 
 struct tnt {
-	u8 bit;
-	char true;
-	char false;
+	u8	bit;
+	char	true;
+	char	false;
 };
 
 static const struct tnt tnts[] = {
-	{ TAINT_PROPRIETARY_MODULE, 'P', 'G' },
-	{ TAINT_FORCED_MODULE, 'F', ' ' },
-	{ TAINT_UNSAFE_SMP, 'S', ' ' },
-	{ TAINT_FORCED_RMMOD, 'R', ' ' },
-	{ TAINT_MACHINE_CHECK, 'M', ' ' },
-	{ TAINT_BAD_PAGE, 'B', ' ' },
-	{ TAINT_USER, 'U', ' ' },
-	{ TAINT_DIE, 'D', ' ' },
-	{ TAINT_OVERRIDDEN_ACPI_TABLE, 'A', ' ' },
-	{ TAINT_WARN, 'W', ' ' },
-	{ TAINT_CRAP, 'C', ' ' },
+	{ TAINT_PROPRIETARY_MODULE,	'P', 'G' },
+	{ TAINT_FORCED_MODULE,		'F', ' ' },
+	{ TAINT_UNSAFE_SMP,		'S', ' ' },
+	{ TAINT_FORCED_RMMOD,		'R', ' ' },
+	{ TAINT_MACHINE_CHECK,		'M', ' ' },
+	{ TAINT_BAD_PAGE,		'B', ' ' },
+	{ TAINT_USER,			'U', ' ' },
+	{ TAINT_DIE,			'D', ' ' },
+	{ TAINT_OVERRIDDEN_ACPI_TABLE,	'A', ' ' },
+	{ TAINT_WARN,			'W', ' ' },
+	{ TAINT_CRAP,			'C', ' ' },
 };
 
 /**
@@ -193,7 +196,8 @@ const char *print_tainted(void)
 		*s = 0;
 	} else
 		snprintf(buf, sizeof(buf), "Not tainted");
-	return(buf);
+
+	return buf;
 }
 
 int test_taint(unsigned flag)
@@ -209,7 +213,8 @@ unsigned long get_taint(void)
 
 void add_taint(unsigned flag)
 {
-	debug_locks = 0; /* can't trust the integrity of the kernel anymore */
+	/* can't trust the integrity of the kernel anymore: */
+	debug_locks = 0;
 	set_bit(flag, &tainted_mask);
 }
 EXPORT_SYMBOL(add_taint);
@@ -264,8 +269,8 @@ static void do_oops_enter_exit(void)
 }
 
 /*
- * Return true if the calling CPU is allowed to print oops-related info.  This
- * is a bit racy..
+ * Return true if the calling CPU is allowed to print oops-related info.
+ * This is a bit racy..
  */
 int oops_may_print(void)
 {
@@ -274,20 +279,22 @@ int oops_may_print(void)
 
 /*
  * Called when the architecture enters its oops handler, before it prints
- * anything.  If this is the first CPU to oops, and it's oopsing the first time
- * then let it proceed.
+ * anything.  If this is the first CPU to oops, and it's oopsing the first
+ * time then let it proceed.
  *
- * This is all enabled by the pause_on_oops kernel boot option.  We do all this
- * to ensure that oopses don't scroll off the screen.  It has the side-effect
- * of preventing later-oopsing CPUs from mucking up the display, too.
+ * This is all enabled by the pause_on_oops kernel boot option.  We do all
+ * this to ensure that oopses don't scroll off the screen.  It has the
+ * side-effect of preventing later-oopsing CPUs from mucking up the display,
+ * too.
  *
- * It turns out that the CPU which is allowed to print ends up pausing for the
- * right duration, whereas all the other CPUs pause for twice as long: once in
- * oops_enter(), once in oops_exit().
+ * It turns out that the CPU which is allowed to print ends up pausing for
+ * the right duration, whereas all the other CPUs pause for twice as long:
+ * once in oops_enter(), once in oops_exit().
  */
 void oops_enter(void)
 {
-	debug_locks_off(); /* can't trust the integrity of the kernel anymore */
+	/* can't trust the integrity of the kernel anymore: */
+	debug_locks_off();
 	do_oops_enter_exit();
 }
 


* Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s
  2009-03-13  9:43       ` Peter Zijlstra
@ 2009-03-13 10:38         ` Ingo Molnar
  2009-03-19 22:14           ` Eric W. Biederman
  0 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2009-03-13 10:38 UTC (permalink / raw)
  To: Peter Zijlstra, Eric W. Biederman
  Cc: Jan Beulich, tglx, mingo, hpa, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2009-03-13 at 09:21 +0000, Jan Beulich wrote:
> > >>> Peter Zijlstra <peterz@infradead.org> 13.03.09 09:54 >>>
> > >Wouldn't leaving them in place but changing them to:
> > >
> > >WARN_ON(irqs_disabled() && system_state == SYSTEM_RUNNING);
> > >
> > >be clearer?
> > 
> > I don't think that would be precise: system_state gets set to
> > SYSTEM_RUNNING much later than APs get brought up (i.e. there are
> > cases where the WARN_ON()s could validly trigger with SYSTEM_BOOTING),
> > and also doesn't cover states > SYSTEM_RUNNING (where, again, after
> > perhaps having brought down all APs the warnings could become pointless,
> > but the warnings could be meaningful as long as there are still some APs
> > online).
> > 
> > While from an abstract code reading perspective your suggestion might
> > seem reasonable, the changed placement really reflects what the warning
> > is trying to warn about - a potential deadlock which cannot occur under
> > the conditions filtered out by conditionals the warnings were moved
> > beyond.
> 
> How about?
> 
> WARN_ON_ONCE(irqs_disabled() && !oops_in_progress)

ok, that's indeed better - i've done that - see the upcoming 
commit messages in this thread.

I also changed all of panic() to be covered by oops_in_progress 
- see the other commit. Eric: that should make kexec [a tiny 
bit] more robust too, agreed?

	Ingo


* Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s
  2009-03-13 10:38         ` Ingo Molnar
@ 2009-03-19 22:14           ` Eric W. Biederman
  2009-03-20  8:52             ` Ingo Molnar
  0 siblings, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2009-03-19 22:14 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Jan Beulich, tglx, mingo, hpa, linux-kernel

Ingo Molnar <mingo@elte.hu> writes:

> I also changed all of panic() to be covered by oops_in_progress 
> - see the other commit. Eric: that should make kexec [a tiny 
> bit] more robust too, agreed?

A tad.  We shouldn't have any printk's on that path.  But if we
do it will keep us from wedging and getting ourselves into trouble.

Eric


* Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s
  2009-03-19 22:14           ` Eric W. Biederman
@ 2009-03-20  8:52             ` Ingo Molnar
  2009-03-20  9:58               ` Eric W. Biederman
  0 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2009-03-20  8:52 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Peter Zijlstra, Jan Beulich, tglx, mingo, hpa, linux-kernel


* Eric W. Biederman <ebiederm@xmission.com> wrote:

> Ingo Molnar <mingo@elte.hu> writes:
> 
> > I also changed all of panic() to be covered by oops_in_progress 
> > - see the other commit. Eric: that should make kexec [a tiny 
> > bit] more robust too, agreed?
> 
> A tad.  We shouldn't have any printk's on that path.  But if we do 
> it will keep us from wedging and getting ourselves into trouble.

Ok. Baby steps are what got us from Linux 0.12 to where we are now 
so i'm content with making a tad of progress with each commit ;-)

	Ingo


* Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s
  2009-03-20  8:52             ` Ingo Molnar
@ 2009-03-20  9:58               ` Eric W. Biederman
  2009-03-20 18:24                 ` Ingo Molnar
  0 siblings, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2009-03-20  9:58 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Jan Beulich, tglx, mingo, hpa, linux-kernel

Ingo Molnar <mingo@elte.hu> writes:

> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>> Ingo Molnar <mingo@elte.hu> writes:
>> 
>> > I also changed all of panic() to be covered by oops_in_progress 
>> > - see the other commit. Eric: that should make kexec [a tiny 
>> > bit] more robust too, agreed?
>> 
>> A tad.  We shouldn't have any printk's on that path.  But if we do 
>> it will keep us from wedging and getting ourselves into trouble.
>
> Ok. Baby steps are what got us from Linux 0.12 to where we are now 
> so i'm content with making a tad of progress with each commit ;-)

Our biggest todo item for kexec on panic is to initialize the local
apics early enough in boot so we never need to run in compatibility mode.
Once we get that code working we can remove 50%+ of the kexec on panic
code.

I expect one of these days the apic code will be clean enough to handle
that.

Now back to my regularly scheduled hotunplug challenges starting with
getting lockdep to warn about the dead locks.

Eric


* Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s
  2009-03-20  9:58               ` Eric W. Biederman
@ 2009-03-20 18:24                 ` Ingo Molnar
  2009-03-20 18:52                   ` Peter Zijlstra
  0 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2009-03-20 18:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Peter Zijlstra, Jan Beulich, tglx, mingo, hpa, linux-kernel


* Eric W. Biederman <ebiederm@xmission.com> wrote:

> Now back to my regularly scheduled hotunplug challenges starting 
> with getting lockdep to warn about the dead locks.

that's a neat idea ...

	Ingo


* Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s
  2009-03-20 18:24                 ` Ingo Molnar
@ 2009-03-20 18:52                   ` Peter Zijlstra
  2009-03-20 19:34                     ` cpu hotplug and lockdep (was: Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s) Peter Zijlstra
  2009-03-20 23:40                     ` [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s Eric W. Biederman
  0 siblings, 2 replies; 35+ messages in thread
From: Peter Zijlstra @ 2009-03-20 18:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric W. Biederman, Jan Beulich, tglx, mingo, hpa, linux-kernel

On Fri, 2009-03-20 at 19:24 +0100, Ingo Molnar wrote:
> * Eric W. Biederman <ebiederm@xmission.com> wrote:
> 
> > Now back to my regularly scheduled hotunplug challenges starting 
> > with getting lockdep to warn about the dead locks.

http://programming.kicks-ass.net/kernel-patches/cpu-hotplug/

The recursive read code needs more work.. 



* cpu hotplug and lockdep (was: Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s)
  2009-03-20 18:52                   ` Peter Zijlstra
@ 2009-03-20 19:34                     ` Peter Zijlstra
  2009-03-21  7:39                       ` [PATCH 0/2] sysctl: lockdep support Eric W. Biederman
  2009-03-20 23:40                     ` [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s Eric W. Biederman
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2009-03-20 19:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric W. Biederman, Jan Beulich, tglx, mingo, hpa, linux-kernel,
	Gautham R Shenoy

On Fri, 2009-03-20 at 19:52 +0100, Peter Zijlstra wrote:
> On Fri, 2009-03-20 at 19:24 +0100, Ingo Molnar wrote:
> > * Eric W. Biederman <ebiederm@xmission.com> wrote:
> > 
> > > Now back to my regularly scheduled hotunplug challenges starting 
> > > with getting lockdep to warn about the dead locks.
> 
> http://programming.kicks-ass.net/kernel-patches/cpu-hotplug/
> 
> The recursive read code needs more work.. 

You'll find some network softirq inversion 'fixes' there, but those are
false: (soft)irq recursion isn't a problem for recursive read locks, but
of course only as long as writes (and thereby all nesting exclusive
locks) are irq-safe, so as to not create write-in-read deadlocks.

Just never gotten around to actually implementing that, ego also offered
to look at that at one point, but I guess he didn't get around to it
either ;-)



* Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s
  2009-03-20 18:52                   ` Peter Zijlstra
  2009-03-20 19:34                     ` cpu hotplug and lockdep (was: Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s) Peter Zijlstra
@ 2009-03-20 23:40                     ` Eric W. Biederman
  2009-03-21 10:20                       ` Peter Zijlstra
  1 sibling, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2009-03-20 23:40 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Jan Beulich, tglx, mingo, hpa, linux-kernel

Peter Zijlstra <peterz@infradead.org> writes:

> On Fri, 2009-03-20 at 19:24 +0100, Ingo Molnar wrote:
>> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>> 
>> > Now back to my regularly scheduled hotunplug challenges starting 
>> > with getting lockdep to warn about the dead locks.
>
> http://programming.kicks-ass.net/kernel-patches/cpu-hotplug/
>
> The recursive read code needs more work.. 

CPU hotunplug isn't quite what I am looking at.

Normal device hotunplug, and in particular the use counts in sysctl,
proc, and sysfs that act as reader locks that we eventually wait on
when it comes time to free those structures.

I think they fit the same model as reader/writer locks where the
readers can recurse but the writers can not.  Which seems to be
the model of rcu locks as well.

It doesn't look like read==2 works properly because it does not log any
locks grabbed underneath the reader lock.  Is that deliberate?  It certainly
does not seem to be correct.

The deadlock I am worried about is:
rtnl_lock()                                 inc_use_count();
wait use_count == 0.                        rtnl_lock();

Which I have seen several people hit in the last couple of weeks.

Eric


* [PATCH 0/2] sysctl: lockdep support.
  2009-03-20 19:34                     ` cpu hotplug and lockdep (was: Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s) Peter Zijlstra
@ 2009-03-21  7:39                       ` Eric W. Biederman
  2009-03-21  7:40                         ` [PATCH 1/2] sysctl: Don't take the use count of multiple heads at a time Eric W. Biederman
  0 siblings, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2009-03-21  7:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Jan Beulich, tglx, mingo, hpa, linux-kernel,
	Gautham R Shenoy, Peter Zijlstra, Alexey Dobriyan, netdev


The problem:  There is a class of deadlocks that we currently
have in the kernel that lockdep does not recognize. 

In particular with the network stack we can have:

rtnl_lock();                         use_table();
unregister_netdevice();              rtnl_lock();
unregister_sysctl_table();           ....
wait_for_completion();               rtnl_lock();
....                                 unuse_table()
rtnl_unlock();                       complete();



Where we never make it to the lines labeled ....

The following patches treat the sysctl use count as a reader/writer
lock for purposes of lockdep.

The code works but I'm not certain I have plugged into lockdep quite
correctly.

Eric



* [PATCH 1/2] sysctl: Don't take the use count of multiple heads at a time.
  2009-03-21  7:39                       ` [PATCH 0/2] sysctl: lockdep support Eric W. Biederman
@ 2009-03-21  7:40                         ` Eric W. Biederman
  2009-03-21  7:42                           ` [PATCH 2/2] sysctl: lockdep support for sysctl reference counting Eric W. Biederman
  0 siblings, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2009-03-21  7:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Jan Beulich, tglx, mingo, hpa, linux-kernel,
	Gautham R Shenoy, Peter Zijlstra, Alexey Dobriyan, netdev


The current code works fine and is not actually buggy, but it does
prevent using lockdep to check refcounting vs. lock-holding ordering
problems.  So ensure that we only ever hold a single sysctl_head at a
time, allowing lockdep to complain.

Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
---
 fs/proc/proc_sysctl.c |   24 ++++++++++--------------
 1 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 94fcfff..46eb34c 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -79,7 +79,6 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 {
 	struct ctl_table_header *head = grab_header(dir);
 	struct ctl_table *table = PROC_I(dir)->sysctl_entry;
-	struct ctl_table_header *h = NULL;
 	struct qstr *name = &dentry->d_name;
 	struct ctl_table *p;
 	struct inode *inode;
@@ -97,10 +96,11 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 
 	p = find_in_table(table, name);
 	if (!p) {
-		for (h = sysctl_head_next(NULL); h; h = sysctl_head_next(h)) {
-			if (h->attached_to != table)
+		sysctl_head_finish(head);
+		for (head = sysctl_head_next(NULL); head; head = sysctl_head_next(head)) {
+			if (head->attached_to != table)
 				continue;
-			p = find_in_table(h->attached_by, name);
+			p = find_in_table(head->attached_by, name);
 			if (p)
 				break;
 		}
@@ -110,9 +110,7 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 		goto out;
 
 	err = ERR_PTR(-ENOMEM);
-	inode = proc_sys_make_inode(dir->i_sb, h ? h : head, p);
-	if (h)
-		sysctl_head_finish(h);
+	inode = proc_sys_make_inode(dir->i_sb, head, p);
 
 	if (!inode)
 		goto out;
@@ -243,7 +241,6 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
 	struct inode *inode = dentry->d_inode;
 	struct ctl_table_header *head = grab_header(inode);
 	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
-	struct ctl_table_header *h = NULL;
 	unsigned long pos;
 	int ret = -EINVAL;
 
@@ -277,14 +274,13 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
 	if (ret)
 		goto out;
 
-	for (h = sysctl_head_next(NULL); h; h = sysctl_head_next(h)) {
-		if (h->attached_to != table)
+	sysctl_head_finish(head);
+	for (head = sysctl_head_next(NULL); head; head = sysctl_head_next(head)) {
+		if (head->attached_to != table)
 			continue;
-		ret = scan(h, h->attached_by, &pos, filp, dirent, filldir);
-		if (ret) {
-			sysctl_head_finish(h);
+		ret = scan(head, head->attached_by, &pos, filp, dirent, filldir);
+		if (ret)
 			break;
-		}
 	}
 	ret = 1;
 out:
-- 
1.6.1.2.350.g88cc



* [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
  2009-03-21  7:40                         ` [PATCH 1/2] sysctl: Don't take the use count of multiple heads at a time Eric W. Biederman
@ 2009-03-21  7:42                           ` Eric W. Biederman
  2009-03-30 22:26                             ` Andrew Morton
                                               ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Eric W. Biederman @ 2009-03-21  7:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Jan Beulich, tglx, mingo, hpa, linux-kernel,
	Gautham R Shenoy, Peter Zijlstra, Alexey Dobriyan, netdev


It is possible to get lock ordering deadlocks between taking locks
and waiting for the sysctl use count to drop to zero.  We have
recently observed one of these in the networking code.

So teach the sysctl code how to speak lockdep so the kernel
can warn about these kinds of rare issues proactively.

Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
---
 include/linux/sysctl.h |    4 ++
 kernel/sysctl.c        |  108 ++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 91 insertions(+), 21 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 39d471d..ec9b1dd 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -28,6 +28,7 @@
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/compiler.h>
+#include <linux/lockdep.h>
 
 struct file;
 struct completion;
@@ -1087,6 +1088,9 @@ struct ctl_table_header
 	struct ctl_table *attached_by;
 	struct ctl_table *attached_to;
 	struct ctl_table_header *parent;
+#ifdef CONFIG_PROVE_LOCKING
+	struct lockdep_map dep_map;
+#endif
 };
 
 /* struct ctl_path describes where in the hierarchy a table is added */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c5ef44f..ea8cc39 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1454,12 +1454,63 @@ static struct ctl_table dev_table[] = {
 
 static DEFINE_SPINLOCK(sysctl_lock);
 
+#ifndef CONFIG_PROVE_LOCKING
+
+# define lock_sysctl() spin_lock(&sysctl_lock)
+# define unlock_sysctl() spin_unlock(&sysctl_lock)
+
+static inline void table_acquire_use(struct ctl_table_header *hdr) { }
+static inline void table_release_use(struct ctl_table_header *hdr) { }
+static inline void table_acquire(struct ctl_table_header *hdr) { }
+static inline void table_contended(struct ctl_table_header *hdr) { }
+static inline void table_acquired(struct ctl_table_header *hdr) { }
+static inline void table_release(struct ctl_table_header *hdr) { }
+
+#else	/* CONFIG_PROVE_LOCKING */
+
+#  define lock_sysctl() __raw_spin_lock(&sysctl_lock.raw_lock)
+#  define unlock_sysctl() __raw_spin_unlock(&sysctl_lock.raw_lock)
+
+static inline void table_acquire_use(struct ctl_table_header *hdr)
+{
+	lock_acquire(&hdr->dep_map, 0, 0, 1, 2, NULL, _RET_IP_);
+	lock_acquired(&hdr->dep_map, _RET_IP_);
+}
+
+static inline void table_release_use(struct ctl_table_header *hdr)
+{
+	lock_release(&hdr->dep_map, 0, _RET_IP_);
+}
+
+static inline void table_acquire(struct ctl_table_header *hdr)
+{
+	lock_acquire(&hdr->dep_map, 0, 0, 0, 2, NULL, _RET_IP_);
+}
+
+static inline void table_contended(struct ctl_table_header *hdr)
+{
+	lock_contended(&hdr->dep_map, _RET_IP_);
+}
+
+static inline void table_acquired(struct ctl_table_header *hdr)
+{
+	lock_acquired(&hdr->dep_map, _RET_IP_);
+}
+
+static inline void table_release(struct ctl_table_header *hdr)
+{
+	lock_release(&hdr->dep_map, 0, _RET_IP_);
+}
+
+#endif /* CONFIG_PROVE_LOCKING */
+
 /* called under sysctl_lock */
 static int use_table(struct ctl_table_header *p)
 {
 	if (unlikely(p->unregistering))
 		return 0;
 	p->used++;
+	table_acquire_use(p);
 	return 1;
 }
 
@@ -1469,6 +1520,8 @@ static void unuse_table(struct ctl_table_header *p)
 	if (!--p->used)
 		if (unlikely(p->unregistering))
 			complete(p->unregistering);
+
+	table_release_use(p);
 }
 
 /* called under sysctl_lock, will reacquire if has to wait */
@@ -1478,47 +1531,54 @@ static void start_unregistering(struct ctl_table_header *p)
 	 * if p->used is 0, nobody will ever touch that entry again;
 	 * we'll eliminate all paths to it before dropping sysctl_lock
 	 */
+	table_acquire(p);
 	if (unlikely(p->used)) {
 		struct completion wait;
+		table_contended(p);
+
 		init_completion(&wait);
 		p->unregistering = &wait;
-		spin_unlock(&sysctl_lock);
+		unlock_sysctl();
 		wait_for_completion(&wait);
-		spin_lock(&sysctl_lock);
+		lock_sysctl();
 	} else {
 		/* anything non-NULL; we'll never dereference it */
 		p->unregistering = ERR_PTR(-EINVAL);
 	}
+	table_acquired(p);
+
 	/*
 	 * do not remove from the list until nobody holds it; walking the
 	 * list in do_sysctl() relies on that.
 	 */
 	list_del_init(&p->ctl_entry);
+
+	table_release(p);
 }
 
 void sysctl_head_get(struct ctl_table_header *head)
 {
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	head->count++;
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 }
 
 void sysctl_head_put(struct ctl_table_header *head)
 {
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	if (!--head->count)
 		kfree(head);
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 }
 
 struct ctl_table_header *sysctl_head_grab(struct ctl_table_header *head)
 {
 	if (!head)
 		BUG();
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	if (!use_table(head))
 		head = ERR_PTR(-ENOENT);
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 	return head;
 }
 
@@ -1526,9 +1586,9 @@ void sysctl_head_finish(struct ctl_table_header *head)
 {
 	if (!head)
 		return;
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	unuse_table(head);
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 }
 
 static struct ctl_table_set *
@@ -1555,7 +1615,7 @@ struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
 	struct ctl_table_header *head;
 	struct list_head *tmp;
 
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	if (prev) {
 		head = prev;
 		tmp = &prev->ctl_entry;
@@ -1568,7 +1628,7 @@ struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
 
 		if (!use_table(head))
 			goto next;
-		spin_unlock(&sysctl_lock);
+		unlock_sysctl();
 		return head;
 	next:
 		root = head->root;
@@ -1587,7 +1647,7 @@ struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
 		tmp = header_list->next;
 	}
 out:
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 	return NULL;
 }
 
@@ -1598,9 +1658,9 @@ struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev)
 
 void register_sysctl_root(struct ctl_table_root *root)
 {
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	list_add_tail(&root->root_list, &sysctl_table_root.root_list);
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 }
 
 #ifdef CONFIG_SYSCTL_SYSCALL
@@ -1951,7 +2011,13 @@ struct ctl_table_header *__register_sysctl_paths(
 		return NULL;
 	}
 #endif
-	spin_lock(&sysctl_lock);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	{
+		static struct lock_class_key __key;
+		lockdep_init_map(&header->dep_map, "sysctl_used", &__key, 0);
+	}
+#endif	
+	lock_sysctl();
 	header->set = lookup_header_set(root, namespaces);
 	header->attached_by = header->ctl_table;
 	header->attached_to = root_table;
@@ -1966,7 +2032,7 @@ struct ctl_table_header *__register_sysctl_paths(
 	}
 	header->parent->count++;
 	list_add_tail(&header->ctl_entry, &header->set->list);
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 
 	return header;
 }
@@ -2018,7 +2084,7 @@ void unregister_sysctl_table(struct ctl_table_header * header)
 	if (header == NULL)
 		return;
 
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	start_unregistering(header);
 	if (!--header->parent->count) {
 		WARN_ON(1);
@@ -2026,21 +2092,21 @@ void unregister_sysctl_table(struct ctl_table_header * header)
 	}
 	if (!--header->count)
 		kfree(header);
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 }
 
 int sysctl_is_seen(struct ctl_table_header *p)
 {
 	struct ctl_table_set *set = p->set;
 	int res;
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	if (p->unregistering)
 		res = 0;
 	else if (!set->is_seen)
 		res = 1;
 	else
 		res = set->is_seen(set);
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 	return res;
 }
 
-- 
1.6.1.2.350.g88cc



* Re: [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s
  2009-03-20 23:40                     ` [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s Eric W. Biederman
@ 2009-03-21 10:20                       ` Peter Zijlstra
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2009-03-21 10:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ingo Molnar, Jan Beulich, tglx, mingo, hpa, linux-kernel

On Fri, 2009-03-20 at 16:40 -0700, Eric W. Biederman wrote:
> It doesn't look like read==2 works properly because it does not log any
> locks grabbed underneath the reader lock.  Is that deliberate?  It certainly
> does not seem to be correct.

You're right, it doesn't work correctly. One of the patches in that heap
I pointed you at tries to fix it, but gets it wrong.


* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
  2009-03-21  7:42                           ` [PATCH 2/2] sysctl: lockdep support for sysctl reference counting Eric W. Biederman
@ 2009-03-30 22:26                             ` Andrew Morton
  2009-03-30 22:53                               ` Eric W. Biederman
  2009-03-31  8:10                               ` Peter Zijlstra
  2009-03-31  8:17                             ` Peter Zijlstra
  2009-04-10  9:18                             ` Andrew Morton
  2 siblings, 2 replies; 35+ messages in thread
From: Andrew Morton @ 2009-03-30 22:26 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: mingo, jbeulich, tglx, mingo, hpa, linux-kernel, ego, peterz,
	adobriyan, netdev


So I merged these two patches.  I designated them as to-be-merged via
Alexey, with a Cc:Peter (IOW: wakey wakey ;))

Regarding this:

> From: ebiederm@xmission.com (Eric W. Biederman)
> ...
> Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>

It doesn't make a lot of sense to have a different Author: address,
IMO.  So I rewrote the From: address to be ebiederm@aristanetworks.com,
OK?

But it would be better if you were to do this, so please do put an
explicit From: line at the top of your changelogs.



* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
  2009-03-30 22:26                             ` Andrew Morton
@ 2009-03-30 22:53                               ` Eric W. Biederman
  2009-03-30 23:18                                 ` Andrew Morton
  2009-03-31  8:10                               ` Peter Zijlstra
  1 sibling, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2009-03-30 22:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mingo, jbeulich, tglx, mingo, hpa, linux-kernel, ego, peterz,
	adobriyan, netdev

Andrew Morton <akpm@linux-foundation.org> writes:

> So I merged these two patches.  I designated them as to-be-merged via
> Alexey, with a Cc:Peter (IOW: wakey wakey ;))

It is more sysctl than proc but whichever.  As long as people aren't
blindsided at the last minute I'm fine.

> Regarding this:
>
>> From: ebiederm@xmission.com (Eric W. Biederman)
>> ...
>> Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
>
> It doesn't make a lot of sense to have a different Author: address,
> IMO.  So I rewrote the From: address to be ebiederm@aristanetworks.com,
> OK?

From my perspective it does.  It is much easier to use my personal
email setup for doing kernel work.  But I need to indicate I am
signing off as an employee of AristaNetworks.  Saying aristanetworks
in my signed-off-by seems a little more official.

Eric



* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
  2009-03-30 22:53                               ` Eric W. Biederman
@ 2009-03-30 23:18                                 ` Andrew Morton
  2009-03-30 23:50                                   ` Eric W. Biederman
  0 siblings, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2009-03-30 23:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: mingo, jbeulich, tglx, mingo, hpa, linux-kernel, ego, peterz,
	adobriyan, netdev

On Mon, 30 Mar 2009 15:53:04 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Andrew Morton <akpm@linux-foundation.org> writes:
> 
> > So I merged these two patches.  I designated them as to-be-merged via
> > Alexey, with a Cc:Peter (IOW: wakey wakey ;))
> 
> It is more sysctl than proc but whichever.  As long as people aren't
> blind sided at the last minute I'm fine.

Yep.  But keeping all fs/proc/ changes in Alexey's tree keeps things
neat and is part of my secret maintainer-impressment plot.

> > Regarding this:
> >
> >> From: ebiederm@xmission.com (Eric W. Biederman)
> >> ...
> >> Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
> >
> > It doesn't make a lot of sense to have a different Author: address,
> > IMO.  So I rewrote the From: address to be ebiederm@aristanetworks.com,
> > OK?
> 
> From my perspective it does.  It is much easier to use my personal
> email setup for doing kernel work.  But I need to indicate I am
> signing off as an employee of AristaNetworks.  Saying aristanetworks
> in my signed-off-by seems a little more official.

OK, fine, but to avoid confusion and to overtly announce this
preference, please do add the From: Line?

Plus that saves me from having to edit
"ebiederm@xmission.com (Eric W. Biederman)" to
"Eric W. Biederman <ebiederm@xmission.com>"

a zillion times ;)




* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
  2009-03-30 23:18                                 ` Andrew Morton
@ 2009-03-30 23:50                                   ` Eric W. Biederman
  0 siblings, 0 replies; 35+ messages in thread
From: Eric W. Biederman @ 2009-03-30 23:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mingo, jbeulich, tglx, mingo, hpa, linux-kernel, ego, peterz,
	adobriyan, netdev

Andrew Morton <akpm@linux-foundation.org> writes:

> On Mon, 30 Mar 2009 15:53:04 -0700
> ebiederm@xmission.com (Eric W. Biederman) wrote:
>
>> Andrew Morton <akpm@linux-foundation.org> writes:
>> 
>> > So I merged these two patches.  I designated them as to-be-merged via
>> > Alexey, with a Cc:Peter (IOW: wakey wakey ;))
>> 
>> It is more sysctl than proc but whichever.  As long as people aren't
>> blind sided at the last minute I'm fine.
>
> Yep.  But keeping all fs/proc/ changes in Alexey's tree keeps things
> neat and is part of my secret maintainer-impressment plot.

Sounds like a plan.  You work on that.  I will work on undermining
the plot by breaking /proc into its several separate filesystems.

> OK, fine, but to avoid confusion and to overtly announce this
> preference, please do add the From: Line?

Sounds like a plan.  In truth I have been deleting that line from my
patches anyway....  It just seemed silly to have it in there when
the real From matched.

Eric



* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
  2009-03-30 22:26                             ` Andrew Morton
  2009-03-30 22:53                               ` Eric W. Biederman
@ 2009-03-31  8:10                               ` Peter Zijlstra
  2009-03-31  8:47                                 ` Eric W. Biederman
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2009-03-31  8:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric W. Biederman, mingo, jbeulich, tglx, mingo, hpa,
	linux-kernel, ego, adobriyan, netdev

On Mon, 2009-03-30 at 15:26 -0700, Andrew Morton wrote:
> So I merged these two patches.  I designated them as to-be-merged via
> Alexey, with a Cc:Peter (IOW: wakey wakey ;))

Ah, I was under the impression they weren't quite finished yet,
apparently I was mistaken.

Eric, didn't you need recursive locks fixed and such?

Let me have a look at them.


* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
  2009-03-21  7:42                           ` [PATCH 2/2] sysctl: lockdep support for sysctl reference counting Eric W. Biederman
  2009-03-30 22:26                             ` Andrew Morton
@ 2009-03-31  8:17                             ` Peter Zijlstra
  2009-03-31 13:40                               ` Eric W. Biederman
  2009-04-10  9:18                             ` Andrew Morton
  2 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2009-03-31  8:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Ingo Molnar, Jan Beulich, tglx, mingo, hpa,
	linux-kernel, Gautham R Shenoy, Alexey Dobriyan, netdev

On Sat, 2009-03-21 at 00:42 -0700, Eric W. Biederman wrote:
> It is possible for get lock ordering deadlocks between locks
> and waiting for the sysctl used count to drop to zero.  We have
> recently observed one of these in the networking code.
> 
> So teach the sysctl code how to speak lockdep so the kernel
> can warn about these kinds of rare issues proactively.

It would be very good to extend this changelog with a more detailed
explanation of the deadlock in question.

Let me see if I got it right:

We're holding a lock, while waiting for the refcount to drop to 0.
Dropping that refcount is blocked on that lock.

Something like that?

> Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
> ---
>  include/linux/sysctl.h |    4 ++
>  kernel/sysctl.c        |  108 ++++++++++++++++++++++++++++++++++++++---------
>  2 files changed, 91 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index 39d471d..ec9b1dd 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -28,6 +28,7 @@
>  #include <linux/kernel.h>
>  #include <linux/types.h>
>  #include <linux/compiler.h>
> +#include <linux/lockdep.h>
>  
>  struct file;
>  struct completion;
> @@ -1087,6 +1088,9 @@ struct ctl_table_header
>  	struct ctl_table *attached_by;
>  	struct ctl_table *attached_to;
>  	struct ctl_table_header *parent;
> +#ifdef CONFIG_PROVE_LOCKING
> +	struct lockdep_map dep_map;
> +#endif
>  };
>  
>  /* struct ctl_path describes where in the hierarchy a table is added */
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index c5ef44f..ea8cc39 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1454,12 +1454,63 @@ static struct ctl_table dev_table[] = {
>  
>  static DEFINE_SPINLOCK(sysctl_lock);
>  
> +#ifndef CONFIG_PROVE_LOCKING
> +
> +# define lock_sysctl() spin_lock(&sysctl_lock)
> +# define unlock_sysctl() spin_unlock(&sysctl_lock)
> +
> +static inline void table_acquire_use(struct ctl_table_header *hdr) { }
> +static inline void table_release_use(struct ctl_table_header *hdr) { }
> +static inline void table_acquire(struct ctl_table_header *hdr) { }
> +static inline void table_contended(struct ctl_table_header *hdr) { }
> +static inline void table_acquired(struct ctl_table_header *hdr) { }
> +static inline void table_release(struct ctl_table_header *hdr) { }
> +
> +#else	/* CONFIG_PROVE_LOCKING */
> +
> +#  define lock_sysctl() __raw_spin_lock(&sysctl_lock.raw_lock)
> +#  define unlock_sysctl() __raw_spin_unlock(&sysctl_lock.raw_lock)

Uhmm, Please explain that -- without a proper explanation this is a NAK.

> +static inline void table_acquire_use(struct ctl_table_header *hdr)
> +{
> +	lock_acquire(&hdr->dep_map, 0, 0, 1, 2, NULL, _RET_IP_);
> +	lock_acquired(&hdr->dep_map, _RET_IP_);
> +}
> +
> +static inline void table_release_use(struct ctl_table_header *hdr)
> +{
> +	lock_release(&hdr->dep_map, 0, _RET_IP_);
> +}
> +
> +static inline void table_acquire(struct ctl_table_header *hdr)
> +{
> +	lock_acquire(&hdr->dep_map, 0, 0, 0, 2, NULL, _RET_IP_);
> +}
> +
> +static inline void table_contended(struct ctl_table_header *hdr)
> +{
> +	lock_contended(&hdr->dep_map, _RET_IP_);
> +}
> +
> +static inline void table_acquired(struct ctl_table_header *hdr)
> +{
> +	lock_acquired(&hdr->dep_map, _RET_IP_);
> +}
> +
> +static inline void table_release(struct ctl_table_header *hdr)
> +{
> +	lock_release(&hdr->dep_map, 0, _RET_IP_);
> +}
> +
> +#endif /* CONFIG_PROVE_LOCKING */
> +
>  /* called under sysctl_lock */
>  static int use_table(struct ctl_table_header *p)
>  {
>  	if (unlikely(p->unregistering))
>  		return 0;
>  	p->used++;
> +	table_acquire_use(p);
>  	return 1;
>  }
>  
> @@ -1469,6 +1520,8 @@ static void unuse_table(struct ctl_table_header *p)
>  	if (!--p->used)
>  		if (unlikely(p->unregistering))
>  			complete(p->unregistering);
> +
> +	table_release_use(p);
>  }
>  
>  /* called under sysctl_lock, will reacquire if has to wait */
> @@ -1478,47 +1531,54 @@ static void start_unregistering(struct ctl_table_header *p)
>  	 * if p->used is 0, nobody will ever touch that entry again;
>  	 * we'll eliminate all paths to it before dropping sysctl_lock
>  	 */
> +	table_acquire(p);
>  	if (unlikely(p->used)) {
>  		struct completion wait;
> +		table_contended(p);
> +
>  		init_completion(&wait);
>  		p->unregistering = &wait;
> -		spin_unlock(&sysctl_lock);
> +		unlock_sysctl();
>  		wait_for_completion(&wait);
> -		spin_lock(&sysctl_lock);
> +		lock_sysctl();
>  	} else {
>  		/* anything non-NULL; we'll never dereference it */
>  		p->unregistering = ERR_PTR(-EINVAL);
>  	}
> +	table_acquired(p);
> +
>  	/*
>  	 * do not remove from the list until nobody holds it; walking the
>  	 * list in do_sysctl() relies on that.
>  	 */
>  	list_del_init(&p->ctl_entry);
> +
> +	table_release(p);
>  }
>  

> @@ -1951,7 +2011,13 @@ struct ctl_table_header *__register_sysctl_paths(
>  		return NULL;
>  	}
>  #endif
> -	spin_lock(&sysctl_lock);
> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +	{
> +		static struct lock_class_key __key;
> +		lockdep_init_map(&header->dep_map, "sysctl_used", &__key, 0);
> +	}
> +#endif	

This means every sysctl thingy gets the same class, is that
intended/desired?



* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
  2009-03-31  8:10                               ` Peter Zijlstra
@ 2009-03-31  8:47                                 ` Eric W. Biederman
  0 siblings, 0 replies; 35+ messages in thread
From: Eric W. Biederman @ 2009-03-31  8:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, mingo, jbeulich, tglx, mingo, hpa, linux-kernel,
	ego, adobriyan, netdev

Peter Zijlstra <peterz@infradead.org> writes:

> On Mon, 2009-03-30 at 15:26 -0700, Andrew Morton wrote:
>> So I merged these two patches.  I designated them as to-be-merged via
>> Alexey, with a Cc:Peter (IOW: wakey wakey ;))
>
> Ah, I was under the impression they weren't quite finished yet,
> apparently I was mistaken.
>
> Eric, didn't you need recursive locks fixed and such?

In theory.  In practice I removed the one case that could legitimately
recurse.

> Let me have a look at them.


* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
  2009-03-31  8:17                             ` Peter Zijlstra
@ 2009-03-31 13:40                               ` Eric W. Biederman
  2009-03-31 15:35                                 ` Peter Zijlstra
  0 siblings, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2009-03-31 13:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Ingo Molnar, Jan Beulich, tglx, mingo, hpa,
	linux-kernel, Gautham R Shenoy, Alexey Dobriyan, netdev

Peter Zijlstra <peterz@infradead.org> writes:

> On Sat, 2009-03-21 at 00:42 -0700, Eric W. Biederman wrote:
>> It is possible for get lock ordering deadlocks between locks
>> and waiting for the sysctl used count to drop to zero.  We have
>> recently observed one of these in the networking code.
>> 
>> So teach the sysctl code how to speak lockdep so the kernel
>> can warn about these kinds of rare issues proactively.
>
> It would be very good to extend this changelog with a more detailed
> explanation of the deadlock in question.
>
> Let me see if I got it right:
>
> We're holding a lock, while waiting for the refcount to drop to 0.
> Dropping that refcount is blocked on that lock.
>
> Something like that?

Exactly.

I must have written an explanation so many times that it got
lost when I wrote that commit message.

In particular the problem can be seen with /proc/sys/net/ipv4/conf/*/forwarding.

The problem is that the handler for forwarding takes the rtnl_lock
with the reference count held.

Then we call unregister_sysctl_table under the rtnl_lock, which
waits for the reference count to go to zero.


>> Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
>> ---
>>  include/linux/sysctl.h |    4 ++
>>  kernel/sysctl.c        |  108 ++++++++++++++++++++++++++++++++++++++---------
>>  2 files changed, 91 insertions(+), 21 deletions(-)
>> 
>> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
>> index 39d471d..ec9b1dd 100644
>> --- a/include/linux/sysctl.h
>> +++ b/include/linux/sysctl.h
>> @@ -28,6 +28,7 @@
>>  #include <linux/kernel.h>
>>  #include <linux/types.h>
>>  #include <linux/compiler.h>
>> +#include <linux/lockdep.h>
>>  
>>  struct file;
>>  struct completion;
>> @@ -1087,6 +1088,9 @@ struct ctl_table_header
>>  	struct ctl_table *attached_by;
>>  	struct ctl_table *attached_to;
>>  	struct ctl_table_header *parent;
>> +#ifdef CONFIG_PROVE_LOCKING
>> +	struct lockdep_map dep_map;
>> +#endif
>>  };
>>  
>>  /* struct ctl_path describes where in the hierarchy a table is added */
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index c5ef44f..ea8cc39 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -1454,12 +1454,63 @@ static struct ctl_table dev_table[] = {
>>  
>>  static DEFINE_SPINLOCK(sysctl_lock);
>>  
>> +#ifndef CONFIG_PROVE_LOCKING
>> +
>> +# define lock_sysctl() spin_lock(&sysctl_lock)
>> +# define unlock_sysctl() spin_unlock(&sysctl_lock)
>> +
>> +static inline void table_acquire_use(struct ctl_table_header *hdr) { }
>> +static inline void table_release_use(struct ctl_table_header *hdr) { }
>> +static inline void table_acquire(struct ctl_table_header *hdr) { }
>> +static inline void table_contended(struct ctl_table_header *hdr) { }
>> +static inline void table_acquired(struct ctl_table_header *hdr) { }
>> +static inline void table_release(struct ctl_table_header *hdr) { }
>> +
>> +#else	/* CONFIG_PROVE_LOCKING */
>> +
>> +#  define lock_sysctl() __raw_spin_lock(&sysctl_lock.raw_lock)
>> +#  define unlock_sysctl() __raw_spin_unlock(&sysctl_lock.raw_lock)
>
> Uhmm, Please explain that -- without a proper explanation this is a NAK.

If the refcount is to be considered a lock, sysctl_lock must be
considered the internals of that lock; lockdep gets extremely
confused otherwise.  Since the spinlock is static to this file
I'm not especially worried about it.

>> +static inline void table_acquire_use(struct ctl_table_header *hdr)
>> +{
>> +	lock_acquire(&hdr->dep_map, 0, 0, 1, 2, NULL, _RET_IP_);
>> +	lock_acquired(&hdr->dep_map, _RET_IP_);
>> +}
>> +
>> +static inline void table_release_use(struct ctl_table_header *hdr)
>> +{
>> +	lock_release(&hdr->dep_map, 0, _RET_IP_);
>> +}
>> +
>> +static inline void table_acquire(struct ctl_table_header *hdr)
>> +{
>> +	lock_acquire(&hdr->dep_map, 0, 0, 0, 2, NULL, _RET_IP_);
>> +}
>> +
>> +static inline void table_contended(struct ctl_table_header *hdr)
>> +{
>> +	lock_contended(&hdr->dep_map, _RET_IP_);
>> +}
>> +
>> +static inline void table_acquired(struct ctl_table_header *hdr)
>> +{
>> +	lock_acquired(&hdr->dep_map, _RET_IP_);
>> +}
>> +
>> +static inline void table_release(struct ctl_table_header *hdr)
>> +{
>> +	lock_release(&hdr->dep_map, 0, _RET_IP_);
>> +}
>> +
>> +#endif /* CONFIG_PROVE_LOCKING */
>> +
>>  /* called under sysctl_lock */
>>  static int use_table(struct ctl_table_header *p)
>>  {
>>  	if (unlikely(p->unregistering))
>>  		return 0;
>>  	p->used++;
>> +	table_acquire_use(p);
>>  	return 1;
>>  }
>>  
>> @@ -1469,6 +1520,8 @@ static void unuse_table(struct ctl_table_header *p)
>>  	if (!--p->used)
>>  		if (unlikely(p->unregistering))
>>  			complete(p->unregistering);
>> +
>> +	table_release_use(p);
>>  }
>>  
>>  /* called under sysctl_lock, will reacquire if has to wait */
>> @@ -1478,47 +1531,54 @@ static void start_unregistering(struct ctl_table_header *p)
>>  	 * if p->used is 0, nobody will ever touch that entry again;
>>  	 * we'll eliminate all paths to it before dropping sysctl_lock
>>  	 */
>> +	table_acquire(p);
>>  	if (unlikely(p->used)) {
>>  		struct completion wait;
>> +		table_contended(p);
>> +
>>  		init_completion(&wait);
>>  		p->unregistering = &wait;
>> -		spin_unlock(&sysctl_lock);
>> +		unlock_sysctl();
>>  		wait_for_completion(&wait);
>> -		spin_lock(&sysctl_lock);
>> +		lock_sysctl();
>>  	} else {
>>  		/* anything non-NULL; we'll never dereference it */
>>  		p->unregistering = ERR_PTR(-EINVAL);
>>  	}
>> +	table_acquired(p);
>> +
>>  	/*
>>  	 * do not remove from the list until nobody holds it; walking the
>>  	 * list in do_sysctl() relies on that.
>>  	 */
>>  	list_del_init(&p->ctl_entry);
>> +
>> +	table_release(p);
>>  }
>>  
>
>> @@ -1951,7 +2011,13 @@ struct ctl_table_header *__register_sysctl_paths(
>>  		return NULL;
>>  	}
>>  #endif
>> -	spin_lock(&sysctl_lock);
>> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
>> +	{
>> +		static struct lock_class_key __key;
>> +		lockdep_init_map(&header->dep_map, "sysctl_used", &__key, 0);
>> +	}
>> +#endif	
>
> This means every sysctl thingy gets the same class, is that
> intended/desired?

There is only one place we initialize it, and as far as I know
really only one place we take it, which is the definition of a
lockdep class.

Eric



* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
  2009-03-31 13:40                               ` Eric W. Biederman
@ 2009-03-31 15:35                                 ` Peter Zijlstra
  2009-03-31 22:44                                   ` Eric W. Biederman
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2009-03-31 15:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Ingo Molnar, Jan Beulich, tglx, mingo, hpa,
	linux-kernel, Gautham R Shenoy, Alexey Dobriyan, netdev

On Tue, 2009-03-31 at 06:40 -0700, Eric W. Biederman wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Sat, 2009-03-21 at 00:42 -0700, Eric W. Biederman wrote:
> >> It is possible for get lock ordering deadlocks between locks
> >> and waiting for the sysctl used count to drop to zero.  We have
> >> recently observed one of these in the networking code.
> >> 
> >> So teach the sysctl code how to speak lockdep so the kernel
> >> can warn about these kinds of rare issues proactively.
> >
> > It would be very good to extend this changelog with a more detailed
> > explanation of the deadlock in question.
> >
> > Let me see if I got it right:
> >
> > We're holding a lock, while waiting for the refcount to drop to 0.
> > Dropping that refcount is blocked on that lock.
> >
> > Something like that?
> 
> Exactly.
> 
> I must have written an explanation so many times that it got
> lost when I wrote that commit message.
> 
> In particular the problem can be see with /proc/sys/net/ipv4/conf/*/forwarding.
> 
> The problem is that the handler for fowarding takes the rtnl_lock
> with the reference count held.
> 
> Then we call unregister_sysctl_table under the rtnl_lock.
> which waits for the reference count to go to zero.

> >> +
> >> +#  define lock_sysctl() __raw_spin_lock(&sysctl_lock.raw_lock)
> >> +#  define unlock_sysctl() __raw_spin_unlock(&sysctl_lock.raw_lock)
> >
> > Uhmm, Please explain that -- without a proper explanation this is a NAK.
> 
> If the refcount is to be considered a lock.  sysctl_lock must be considered
> the internals of that lock.  lockdep gets extremely confused otherwise.
> Since the spinlock is static to this file I'm not especially worried
> about it.

Usually lock-internal locks still get lockdep coverage.  Let's see if
we can find a way for that to be true even here.  I suspect the below
causes the issue:

> >>  /* called under sysctl_lock, will reacquire if has to wait */
> >> @@ -1478,47 +1531,54 @@ static void start_unregistering(struct ctl_table_header *p)
> >>  	 * if p->used is 0, nobody will ever touch that entry again;
> >>  	 * we'll eliminate all paths to it before dropping sysctl_lock
> >>  	 */
> >> +	table_acquire(p);
> >>  	if (unlikely(p->used)) {
> >>  		struct completion wait;
> >> +		table_contended(p);
> >> +
> >>  		init_completion(&wait);
> >>  		p->unregistering = &wait;
> >> -		spin_unlock(&sysctl_lock);
> >> +		unlock_sysctl();
> >>  		wait_for_completion(&wait);
> >> -		spin_lock(&sysctl_lock);
> >> +		lock_sysctl();
> >>  	} else {
> >>  		/* anything non-NULL; we'll never dereference it */
> >>  		p->unregistering = ERR_PTR(-EINVAL);
> >>  	}
> >> +	table_acquired(p);
> >> +
> >>  	/*
> >>  	 * do not remove from the list until nobody holds it; walking the
> >>  	 * list in do_sysctl() relies on that.
> >>  	 */
> >>  	list_del_init(&p->ctl_entry);
> >> +
> >> +	table_release(p);
> >>  }

There you acquire the table while holding the spinlock, generating
sysctl_lock -> table_lock; however, you then release the sysctl_lock
and re-acquire it, generating table_lock -> sysctl_lock.

Humm, can't we write that differently?


> >> @@ -1951,7 +2011,13 @@ struct ctl_table_header *__register_sysctl_paths(
> >>  		return NULL;
> >>  	}
> >>  #endif
> >> -	spin_lock(&sysctl_lock);
> >> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> >> +	{
> >> +		static struct lock_class_key __key;
> >> +		lockdep_init_map(&header->dep_map, "sysctl_used", &__key, 0);
> >> +	}
> >> +#endif	
> >
> > This means every sysctl thingy gets the same class, is that
> > intended/desired?
> 
> There is only one place we initialize it, and as far as I know really
> only one place we take it.  Which is the definition of a lockdep
> class as far as I know.

Indeed, just checking.


* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
  2009-03-31 15:35                                 ` Peter Zijlstra
@ 2009-03-31 22:44                                   ` Eric W. Biederman
  0 siblings, 0 replies; 35+ messages in thread
From: Eric W. Biederman @ 2009-03-31 22:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Ingo Molnar, Jan Beulich, tglx, mingo, hpa,
	linux-kernel, Gautham R Shenoy, Alexey Dobriyan, netdev

Peter Zijlstra <peterz@infradead.org> writes:

> On Tue, 2009-03-31 at 06:40 -0700, Eric W. Biederman wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>> 
>> > On Sat, 2009-03-21 at 00:42 -0700, Eric W. Biederman wrote:
>> >> It is possible for get lock ordering deadlocks between locks
>> >> and waiting for the sysctl used count to drop to zero.  We have
>> >> recently observed one of these in the networking code.
>> >> 
>> >> So teach the sysctl code how to speak lockdep so the kernel
>> >> can warn about these kinds of rare issues proactively.
>> >
>> > It would be very good to extend this changelog with a more detailed
>> > explanation of the deadlock in question.
>> >
>> > Let me see if I got it right:
>> >
>> > We're holding a lock, while waiting for the refcount to drop to 0.
>> > Dropping that refcount is blocked on that lock.
>> >
>> > Something like that?
>> 
>> Exactly.
>> 
>> I must have written an explanation so many times that it got
>> lost when I wrote that commit message.
>> 
>> In particular the problem can be see with /proc/sys/net/ipv4/conf/*/forwarding.
>> 
>> The problem is that the handler for fowarding takes the rtnl_lock
>> with the reference count held.
>> 
>> Then we call unregister_sysctl_table under the rtnl_lock.
>> which waits for the reference count to go to zero.
>
>> >> +
>> >> +#  define lock_sysctl() __raw_spin_lock(&sysctl_lock.raw_lock)
>> >> +#  define unlock_sysctl() __raw_spin_unlock(&sysctl_lock.raw_lock)
>> >
>> > Uhmm, Please explain that -- without a proper explanation this is a NAK.
>> 
>> If the refcount is to be considered a lock.  sysctl_lock must be considered
>> the internals of that lock.  lockdep gets extremely confused otherwise.
>> Since the spinlock is static to this file I'm not especially worried
>> about it.
>
> Usually lock internal locks still get lockdep coverage. Let see if we
> can find a way for this to be true even here. I suspect the below to
> cause the issue:
>
>> >>  /* called under sysctl_lock, will reacquire if has to wait */
>> >> @@ -1478,47 +1531,54 @@ static void start_unregistering(struct ctl_table_header *p)
>> >>  	 * if p->used is 0, nobody will ever touch that entry again;
>> >>  	 * we'll eliminate all paths to it before dropping sysctl_lock
>> >>  	 */
>> >> +	table_acquire(p);
>> >>  	if (unlikely(p->used)) {
>> >>  		struct completion wait;
>> >> +		table_contended(p);
>> >> +
>> >>  		init_completion(&wait);
>> >>  		p->unregistering = &wait;
>> >> -		spin_unlock(&sysctl_lock);
>> >> +		unlock_sysctl();
>> >>  		wait_for_completion(&wait);
>> >> -		spin_lock(&sysctl_lock);
>> >> +		lock_sysctl();
>> >>  	} else {
>> >>  		/* anything non-NULL; we'll never dereference it */
>> >>  		p->unregistering = ERR_PTR(-EINVAL);
>> >>  	}
>> >> +	table_acquired(p);
>> >> +
>> >>  	/*
>> >>  	 * do not remove from the list until nobody holds it; walking the
>> >>  	 * list in do_sysctl() relies on that.
>> >>  	 */
>> >>  	list_del_init(&p->ctl_entry);
>> >> +
>> >> +	table_release(p);
>> >>  }
>
> There you acquire the table while holding the spinlock, generating:
> sysctl_lock -> table_lock, however you then release the sysctl_lock and
> re-acquire it, generating table_lock -> sysctl_lock.
>
> Humm, can't we write that differently?

That is an artifact of sysctl_lock being used to implement
table_lock, as best as I can tell.  For the case you point out,
I could probably play with where I claim the lock is acquired
and make it work.

__sysctl_head_next on the read side is trickier.
We come in with table_lock held for read.
We grab sysctl_lock.
We release table_lock (aka the reference count is decremented)
We grab table_lock on the next table (aka the reference count is incremented)
We release sysctl_lock

If we generate lockdep annotations for that it would seem to transition
through the states:
table_lock
table_lock -> sysctl_lock
sysctl_lock
sysctl_lock -> table_lock
table_lock

Short of saying table_lock is an implementation detail, used to make
certain operations atomic, I do not see how to model this case.

Let me take a slightly simpler case and ask how that gets modeled.
Looking at rwsem: all of the annotations are outside of the
spin_lock.  So in some sense we are sloppy, and fib to lockdep
about when we acquire/release a lock.  In another sense we are
simply respecting the abstraction.

I guess I can take a look and see if I can model things in a slightly
more lossy fashion, so I don't need to do the __raw_spin_lock thing.


>> >> @@ -1951,7 +2011,13 @@ struct ctl_table_header *__register_sysctl_paths(
>> >>  		return NULL;
>> >>  	}
>> >>  #endif
>> >> -	spin_lock(&sysctl_lock);
>> >> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
>> >> +	{
>> >> +		static struct lock_class_key __key;
>> >> +		lockdep_init_map(&header->dep_map, "sysctl_used", &__key, 0);
>> >> +	}
>> >> +#endif	
>> >
>> > This means every sysctl thingy gets the same class, is that
>> > intended/desired?
>> 
>> There is only one place we initialize it, and as far as I know really
>> only one place we take it.  Which is the definition of a lockdep
>> class as far as I know.
>
> Indeed, just checking.

The only difference I can possibly see is read side versus write side.
Or in my case refcount side versus wait side.

Eric



* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
  2009-03-21  7:42                           ` [PATCH 2/2] sysctl: lockdep support for sysctl reference counting Eric W. Biederman
  2009-03-30 22:26                             ` Andrew Morton
  2009-03-31  8:17                             ` Peter Zijlstra
@ 2009-04-10  9:18                             ` Andrew Morton
  2 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2009-04-10  9:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ingo Molnar, Jan Beulich, tglx, mingo, hpa, linux-kernel,
	Gautham R Shenoy, Peter Zijlstra, Alexey Dobriyan, netdev

On Sat, 21 Mar 2009 00:42:19 -0700 ebiederm@xmission.com (Eric W. Biederman) wrote:

> It is possible for get lock ordering deadlocks between locks
> and waiting for the sysctl used count to drop to zero.  We have
> recently observed one of these in the networking code.
> 
> So teach the sysctl code how to speak lockdep so the kernel
> can warn about these kinds of rare issues proactively.

`make headers_check':

/usr/src/25/usr/include/linux/sysctl.h:31: included file 'linux/lockdep.h' is not exported
make[2]: *** [/usr/src/25/usr/include/linux/.check] Error 1

It looks like we'll need version 2 on these patches..


end of thread, other threads:[~2009-04-10  9:27 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-12 13:21 [PATCH, resend] eliminate spurious pointless WARN_ON()s Jan Beulich
2009-03-12 13:48 ` Andi Kleen
2009-03-13  8:52   ` Peter Zijlstra
2009-03-13  1:39 ` [tip:core/ipi] generic-ipi: " Jan Beulich
2009-03-13  8:54   ` Peter Zijlstra
2009-03-13  9:21     ` [tip:core/ipi] generic-ipi: eliminate spurious pointlessWARN_ON()s Jan Beulich
2009-03-13  9:43       ` Peter Zijlstra
2009-03-13 10:38         ` Ingo Molnar
2009-03-19 22:14           ` Eric W. Biederman
2009-03-20  8:52             ` Ingo Molnar
2009-03-20  9:58               ` Eric W. Biederman
2009-03-20 18:24                 ` Ingo Molnar
2009-03-20 18:52                   ` Peter Zijlstra
2009-03-20 19:34                     ` cpu hotplug and lockdep (was: Re: [tip:core/ipi] generic-ipi: eliminate spurious pointlessWARN_ON()s) Peter Zijlstra
2009-03-21  7:39                       ` [PATCH 0/2] sysctl: lockdep support Eric W. Biederman
2009-03-21  7:40                         ` [PATCH 1/2] sysctl: Don't take the use count of multiple heads at a time Eric W. Biederman
2009-03-21  7:42                           ` [PATCH 2/2] sysctl: lockdep support for sysctl reference counting Eric W. Biederman
2009-03-30 22:26                             ` Andrew Morton
2009-03-30 22:53                               ` Eric W. Biederman
2009-03-30 23:18                                 ` Andrew Morton
2009-03-30 23:50                                   ` Eric W. Biederman
2009-03-31  8:10                               ` Peter Zijlstra
2009-03-31  8:47                                 ` Eric W. Biederman
2009-03-31  8:17                             ` Peter Zijlstra
2009-03-31 13:40                               ` Eric W. Biederman
2009-03-31 15:35                                 ` Peter Zijlstra
2009-03-31 22:44                                   ` Eric W. Biederman
2009-04-10  9:18                             ` Andrew Morton
2009-03-20 23:40                     ` [tip:core/ipi] generic-ipi: eliminate spurious pointlessWARN_ON()s Eric W. Biederman
2009-03-21 10:20                       ` Peter Zijlstra
2009-03-13  9:31     ` [tip:core/ipi] generic-ipi: eliminate spurious pointless WARN_ON()s Ingo Molnar
2009-03-13 10:36 ` [tip:core/ipi] generic-ipi: eliminate WARN_ON()s during oops/panic Ingo Molnar
2009-03-13 10:36 ` [tip:core/ipi] panic: decrease oops_in_progress only after having done the panic Ingo Molnar
2009-03-13 10:36 ` [tip:core/ipi] panic, smp: provide smp_send_stop() wrapper on UP too Ingo Molnar
2009-03-13 10:36 ` [tip:core/ipi] panic: clean up kernel/panic.c Ingo Molnar
