* timer/hrtimer & cpu hotplug lockdep complaints
@ 2007-02-27 15:30 Heiko Carstens
  2007-03-02 12:58 ` [patch] timer/hrtimer: take per cpu locks in sane order Heiko Carstens
  0 siblings, 1 reply; 10+ messages in thread
From: Heiko Carstens @ 2007-02-27 15:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Thomas Gleixner, Martin Schwidefsky, john stultz

lockdep tells us that we have a possible circular locking dependency.
The output below is incomplete since our console driver deadlocked on
del_timer()...

    <4>Processor 4 spun down
    <4>
    <4>=======================================================
    <4>[ INFO: possible circular locking dependency detected ]
    <4>2.6.20-23.x.20070222-s390xdebug #1
    <4>-------------------------------------------------------
    <4>bash/1447 is trying to acquire lock:
    <4> (base_lock_keys + cpu#5){++..}, at: [<000000000013b686>] timer_cpu_notify+0xd2/0x3a4
    <4>
    <4>but task is already holding lock:
    <4> (base_lock_keys + cpu#7){++..}, at: [<000000000013b67c>] timer_cpu_notify+0xc8/0x3a4
    <4>
    <4>which lock already depends on the new lock.
    <4>
    <4>
    <4>the existing dependency chain (in reverse order) is:
    <4>
    <4>-> #1 (base_lock_keys + cpu#7){++..}:
    <4>       [<0000000000158380>] __lock_acquire+0xe40/0xf78
    <4>       [<000000000015854a>] lock_acquire+0x92/0xb8
    <4>       [<0000000000441eea>] _spin_lock+0x52/0x6c
    <4>       [<000000000013b686>] timer_cpu_notify+0xd2/0x3a4
    <4>       [<0000000000443c44>] notifier_call_chain+0x64/0x88
    <4>       [<000000000014200a>] raw_notifier_call_chain+0x26/0x38
    <4>       [<000000000015deee>] cpu_down+0x25e/0x350
    <4>       [<00000000002cb650>] store_online+0x64/0xb8

The problem in this case is the cpu hotplug code in kernel/timer.c:

	migrate_timers(...)
	...
	old_base = per_cpu(tvec_bases, cpu);
	new_base = get_cpu_var(tvec_bases);

	local_irq_disable();
	spin_lock(&new_base->lock);
	spin_lock(&old_base->lock);

takes the per-cpu locks once in A-B order and once in B-A order, depending
on which cpu the other one gets taken down from. This should be ok, since
migrate_timers() can only be active on one cpu at a time and therefore no
deadlock should occur...
The same problem is present in kernel/hrtimer.c, btw.
Any idea how to fix this? Just adding a lockdep_off()/lockdep_on() pair
between the two spin_lock()s would fix this, but I don't like it.

Btw.: this is the call trace that led to the deadlock when printing the
lockdep message. It looks like it is a bad idea if the console driver needs
to take a lock that lockdep wants to complain about. But I don't think
this needs to be fixed.

 0 _raw_spin_lock+326 [0x2a90ee]
 1 _spin_lock_irqsave+108 [0x442198]
 2 lock_timer_base+64 [0x13c758]
 3 del_timer+84 [0x13cfc4]
 4 raw3215_write+610 [0x340fb6]
 5 con3215_write+96 [0x3410a4]
 6 __call_console_drivers+194 [0x12d95a]
 7 _call_console_drivers+142 [0x12da06]
 8 release_console_sem+460 [0x12dfc4]
 9 vprintk+546 [0x12e922]
10 printk+82 [0x12da9e]
11 __print_symbol+220 [0x162f40]
12 print_stack_trace+136 [0x152664]
13 print_circular_bug_entry+104 [0x154f84]
14 print_circular_bug_header+260 [0x1560fc]
15 check_noncircular+354 [0x15752e]
16 __lock_acquire+2970 [0x1580da]
17 lock_acquire+146 [0x15854a]
18 _spin_lock+82 [0x441eea]
19 timer_cpu_notify+210 [0x13b686]
20 notifier_call_chain+100 [0x443c44]
21 raw_notifier_call_chain+38 [0x14200a]
22 cpu_down+606 [0x15deee]
23 store_online+100 [0x2cb650]
24 sysdev_store+72 [0x2c56d0]
25 sysfs_write_file+286 [0x20ea52]
26 vfs_write+210 [0x1ad74e]
27 sys_write+86 [0x1adf7a]
28 sysc_noemu+16 [0x111ad0]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch] timer/hrtimer: take per cpu locks in sane order
  2007-02-27 15:30 timer/hrtimer & cpu hotplug lockdep complaints Heiko Carstens
@ 2007-03-02 12:58 ` Heiko Carstens
  2007-03-02 13:04   ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Heiko Carstens @ 2007-03-02 12:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Thomas Gleixner, Martin Schwidefsky, john stultz,
	Roman Zippel, Christian Borntraeger, linux-s390, linux-kernel

Doing something like this on a two cpu system

# echo 0 > /sys/devices/system/cpu/cpu0/online 
# echo 1 > /sys/devices/system/cpu/cpu0/online 
# echo 0 > /sys/devices/system/cpu/cpu1/online 

will give me this:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.21-rc2-g562aa1d4-dirty #7
-------------------------------------------------------
bash/1282 is trying to acquire lock:
 (&cpu_base->lock_key){.+..}, at: [<000000000005f17e>] hrtimer_cpu_notify+0xc6/0x240

but task is already holding lock:
 (&cpu_base->lock_key#2){.+..}, at: [<000000000005f174>] hrtimer_cpu_notify+0xbc/0x240

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&cpu_base->lock_key#2){.+..}:
       [<000000000006585e>] __lock_acquire+0x8d6/0x1184
       [<000000000006661c>] lock_acquire+0x90/0xc0
       [<000000000036c70a>] _spin_lock+0x4e/0x68
       [<000000000005f17e>] hrtimer_cpu_notify+0xc6/0x240
       [<000000000036e544>] notifier_call_chain+0x54/0x78
       [<00000000000508ba>] raw_notifier_call_chain+0x26/0x34
       [<000000000006d4b2>] cpu_down+0x276/0x36c
       [<00000000001c5764>] store_online+0x60/0xc0
       [<00000000001bf874>] sysdev_store+0x40/0x58
       [<000000000010aa3a>] sysfs_write_file+0x12e/0x1d0
       [<00000000000b0584>] vfs_write+0xa8/0x15c
       [<00000000000b0716>] sys_write+0x56/0x88
       [<000000000002275c>] sysc_noemu+0x10/0x16
       [<0000020000118da8>] 0x20000118da8

-> #0 (&cpu_base->lock_key){.+..}:
       [<00000000000656b8>] __lock_acquire+0x730/0x1184
       [<000000000006661c>] lock_acquire+0x90/0xc0
       [<000000000036c70a>] _spin_lock+0x4e/0x68
       [<000000000005f17e>] hrtimer_cpu_notify+0xc6/0x240
       [<000000000036e544>] notifier_call_chain+0x54/0x78
       [<00000000000508ba>] raw_notifier_call_chain+0x26/0x34
       [<000000000006d4b2>] cpu_down+0x276/0x36c
       [<00000000001c5764>] store_online+0x60/0xc0
       [<00000000001bf874>] sysdev_store+0x40/0x58
       [<000000000010aa3a>] sysfs_write_file+0x12e/0x1d0
       [<00000000000b0584>] vfs_write+0xa8/0x15c
       [<00000000000b0716>] sys_write+0x56/0x88
       [<000000000002275c>] sysc_noemu+0x10/0x16
       [<0000020000118da8>] 0x20000118da8

other info that might help us debug this:

4 locks held by bash/1282:
 #0:  (cpu_add_remove_lock){--..}, at: [<000000000036a472>] mutex_lock+0x3e/0x4c
 #1:  (cache_chain_mutex){--..}, at: [<000000000036a472>] mutex_lock+0x3e/0x4c
 #2:  (workqueue_mutex){--..}, at: [<000000000036a472>] mutex_lock+0x3e/0x4c
 #3:  (&cpu_base->lock_key#2){.+..}, at: [<000000000005f174>] hrtimer_cpu_notify+0xbc/0x240

stack backtrace:
4141e8b900000000 4141e8b900000000 3ff2c76e1deacc92 0000000000000000 
       0000000000000000 0000000000420498 0000000000420498 000000000001595e 
       0000000000000000 0000000000000004 0000000000000000 00000000004fadf0 
       0000000000000000 0000336e0000000d 000000000ce0f970 000000000ce0f9e8 
       0000000000375c30 000000000001595e 000000000ce0f970 000000000ce0f9c0 
Call Trace:
([<000000000001588a>] show_trace+0x92/0xe8)
 [<0000000000015980>] show_stack+0xa0/0xd0
 [<00000000000159de>] dump_stack+0x2e/0x3c
 [<0000000000062cdc>] print_circular_bug_tail+0xa0/0xa4
 [<00000000000656b8>] __lock_acquire+0x730/0x1184
 [<000000000006661c>] lock_acquire+0x90/0xc0
 [<000000000036c70a>] _spin_lock+0x4e/0x68
 [<000000000005f17e>] hrtimer_cpu_notify+0xc6/0x240
 [<000000000036e544>] notifier_call_chain+0x54/0x78
 [<00000000000508ba>] raw_notifier_call_chain+0x26/0x34
 [<000000000006d4b2>] cpu_down+0x276/0x36c
 [<00000000001c5764>] store_online+0x60/0xc0
 [<00000000001bf874>] sysdev_store+0x40/0x58
 [<000000000010aa3a>] sysfs_write_file+0x12e/0x1d0
 [<00000000000b0584>] vfs_write+0xa8/0x15c
 [<00000000000b0716>] sys_write+0x56/0x88
 [<000000000002275c>] sysc_noemu+0x10/0x16
 [<0000020000118da8>] 0x20000118da8

4 locks held by bash/1282:
 #0:  (cpu_add_remove_lock){--..}, at: [<000000000036a472>] mutex_lock+0x3e/0x4c
 #1:  (cache_chain_mutex){--..}, at: [<000000000036a472>] mutex_lock+0x3e/0x4c
 #2:  (workqueue_mutex){--..}, at: [<000000000036a472>] mutex_lock+0x3e/0x4c
 #3:  (&cpu_base->lock_key#2){.+..}, at: [<000000000005f174>] hrtimer_cpu_notify+0xbc/0x240

This happens because we have the following code in kernel/hrtimer.c:

migrate_hrtimers(int cpu)
[...]
old_base = &per_cpu(hrtimer_bases, cpu);
new_base = &get_cpu_var(hrtimer_bases);
[...]
spin_lock(&new_base->lock);
spin_lock(&old_base->lock);

Which means the spinlocks are taken in an order which depends on which cpu
gets shut down from which other cpu. Therefore lockdep complains that there
might be an ABBA deadlock. Since migrate_hrtimers() gets called only during
cpu hotplug, it's safe to assume that it isn't executed concurrently on a
different cpu and therefore the locking should be ok.

The same problem exists in kernel/timer.c: migrate_timers(). As an added
bonus, on s390 we also get a locked-up system, since the lock that lockdep
wants to complain about via printk is also needed by our 3215 console
driver (which calls del_timer). So we end up in a deadlock.

As pointed out by Christian Borntraeger, one possible solution to avoid
the locking order complaints would be to make sure that the locks are
always taken in the same order, e.g. by taking the lock of the cpu with
the lower number first. AFAICS this should be safe and that is what this
patch does.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Christian Borntraeger <cborntra@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 kernel/hrtimer.c |   17 ++++++++++++-----
 kernel/timer.c   |   16 ++++++++++++----
 2 files changed, 24 insertions(+), 9 deletions(-)

Index: linux-2.6/kernel/hrtimer.c
===================================================================
--- linux-2.6.orig/kernel/hrtimer.c
+++ linux-2.6/kernel/hrtimer.c
@@ -1346,6 +1346,7 @@ static void migrate_hrtimer_list(struct 
 static void migrate_hrtimers(int cpu)
 {
 	struct hrtimer_cpu_base *old_base, *new_base;
+	spinlock_t *lock1, *lock2;
 	int i;
 
 	BUG_ON(cpu_online(cpu));
@@ -1355,16 +1356,22 @@ static void migrate_hrtimers(int cpu)
 	tick_cancel_sched_timer(cpu);
 
 	local_irq_disable();
-
-	spin_lock(&new_base->lock);
-	spin_lock(&old_base->lock);
+	/*
+	 * If we take a lock from a different cpu, make sure we have always
+	 * the same locking order. That is the lock that belongs to the cpu
+	 * with the lowest number is taken first.
+	 */
+	lock1 = smp_processor_id() < cpu ? &new_base->lock : &old_base->lock;
+	lock2 = smp_processor_id() < cpu ? &old_base->lock : &new_base->lock;
+	spin_lock(lock1);
+	spin_lock(lock2);
 
 	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
 		migrate_hrtimer_list(&old_base->clock_base[i],
 				     &new_base->clock_base[i]);
 	}
-	spin_unlock(&old_base->lock);
-	spin_unlock(&new_base->lock);
+	spin_unlock(lock2);
+	spin_unlock(lock1);
 
 	local_irq_enable();
 	put_cpu_var(hrtimer_bases);
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -1644,6 +1644,7 @@ static void __devinit migrate_timers(int
 {
 	tvec_base_t *old_base;
 	tvec_base_t *new_base;
+	spinlock_t *lock1, *lock2;
 	int i;
 
 	BUG_ON(cpu_online(cpu));
@@ -1651,8 +1652,15 @@ static void __devinit migrate_timers(int
 	new_base = get_cpu_var(tvec_bases);
 
 	local_irq_disable();
-	spin_lock(&new_base->lock);
-	spin_lock(&old_base->lock);
+	/*
+	 * If we take a lock from a different cpu, make sure we have always
+	 * the same locking order. That is the lock that belongs to the cpu
+	 * with the lowest number is taken first.
+	 */
+	lock1 = smp_processor_id() < cpu ? &new_base->lock : &old_base->lock;
+	lock2 = smp_processor_id() < cpu ? &old_base->lock : &new_base->lock;
+	spin_lock(lock1);
+	spin_lock(lock2);
 
 	BUG_ON(old_base->running_timer);
 
@@ -1665,8 +1673,8 @@ static void __devinit migrate_timers(int
 		migrate_timer_list(new_base, old_base->tv5.vec + i);
 	}
 
-	spin_unlock(&old_base->lock);
-	spin_unlock(&new_base->lock);
+	spin_unlock(lock2);
+	spin_unlock(lock1);
 	local_irq_enable();
 	put_cpu_var(tvec_bases);
 }


* Re: [patch] timer/hrtimer: take per cpu locks in sane order
  2007-03-02 12:58 ` [patch] timer/hrtimer: take per cpu locks in sane order Heiko Carstens
@ 2007-03-02 13:04   ` Ingo Molnar
  2007-03-02 14:23     ` Heiko Carstens
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2007-03-02 13:04 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Andrew Morton, Thomas Gleixner, Martin Schwidefsky, john stultz,
	Roman Zippel, Christian Borntraeger, linux-s390, linux-kernel


* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> -	spin_lock(&new_base->lock);
> -	spin_lock(&old_base->lock);
> +	/*
> +	 * If we take a lock from a different cpu, make sure we have always
> +	 * the same locking order. That is the lock that belongs to the cpu
> +	 * with the lowest number is taken first.
> +	 */
> +	lock1 = smp_processor_id() < cpu ? &new_base->lock : &old_base->lock;
> +	lock2 = smp_processor_id() < cpu ? &old_base->lock : &new_base->lock;
> +	spin_lock(lock1);
> +	spin_lock(lock2);

looks good to me. Wouldn't this be cleaner via a double_lock_timer() -
similar to how double_rq_lock() works in kernel/sched.c - instead of
open-coding it?

	Ingo


* Re: [patch] timer/hrtimer: take per cpu locks in sane order
  2007-03-02 13:04   ` Ingo Molnar
@ 2007-03-02 14:23     ` Heiko Carstens
  2007-03-02 16:48       ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Heiko Carstens @ 2007-03-02 14:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Thomas Gleixner, Martin Schwidefsky, john stultz,
	Roman Zippel, Christian Borntraeger, linux-s390, linux-kernel

On Fri, Mar 02, 2007 at 02:04:33PM +0100, Ingo Molnar wrote:
> 
> * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> 
> > -	spin_lock(&new_base->lock);
> > -	spin_lock(&old_base->lock);
> > +	/*
> > +	 * If we take a lock from a different cpu, make sure we have always
> > +	 * the same locking order. That is the lock that belongs to the cpu
> > +	 * with the lowest number is taken first.
> > +	 */
> > +	lock1 = smp_processor_id() < cpu ? &new_base->lock : &old_base->lock;
> > +	lock2 = smp_processor_id() < cpu ? &old_base->lock : &new_base->lock;
> > +	spin_lock(lock1);
> > +	spin_lock(lock2);
> 
> looks good to me. Wouldn't this be cleaner via a double_lock_timer() -
> similar to how double_rq_lock() works in kernel/sched.c - instead of
> open-coding it?

Something like the stuff below? It exploits the knowledge that the two
tvec_base_t's are in a per_cpu array; otherwise I would end up passing
a lot of redundant stuff. But I still think that isn't a good solution,
rather a hack...?
I'd go for the patch above.

---
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -1640,6 +1640,28 @@ static void migrate_timer_list(tvec_base
 	}
 }
 
+static void __devinit double_tvec_lock(tvec_base_t *base1, tvec_base_t *base2)
+{
+	if (base1 < base2) {
+		spin_lock(&base1->lock);
+		spin_lock(&base2->lock);
+	} else {
+		spin_lock(&base2->lock);
+		spin_lock(&base1->lock);
+	}
+}
+
+static void __devinit double_tvec_unlock(tvec_base_t *base1, tvec_base_t *base2)
+{
+	if (base1 < base2) {
+		spin_unlock(&base1->lock);
+		spin_unlock(&base2->lock);
+	} else {
+		spin_unlock(&base2->lock);
+		spin_unlock(&base1->lock);
+	}
+}
+
 static void __devinit migrate_timers(int cpu)
 {
 	tvec_base_t *old_base;
@@ -1651,8 +1673,7 @@ static void __devinit migrate_timers(int
 	new_base = get_cpu_var(tvec_bases);
 
 	local_irq_disable();
-	spin_lock(&new_base->lock);
-	spin_lock(&old_base->lock);
+	double_tvec_lock(new_base, old_base);
 
 	BUG_ON(old_base->running_timer);
 
@@ -1665,8 +1686,7 @@ static void __devinit migrate_timers(int
 		migrate_timer_list(new_base, old_base->tv5.vec + i);
 	}
 
-	spin_unlock(&old_base->lock);
-	spin_unlock(&new_base->lock);
+	double_tvec_unlock(new_base, old_base);
 	local_irq_enable();
 	put_cpu_var(tvec_bases);
 }


* Re: [patch] timer/hrtimer: take per cpu locks in sane order
  2007-03-02 14:23     ` Heiko Carstens
@ 2007-03-02 16:48       ` Andrew Morton
  2007-03-02 19:08         ` Heiko Carstens
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2007-03-02 16:48 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Ingo Molnar, Thomas Gleixner, Martin Schwidefsky, john stultz,
	Roman Zippel, Christian Borntraeger, linux-s390, linux-kernel

On Fri, 2 Mar 2007 15:23:08 +0100 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> On Fri, Mar 02, 2007 at 02:04:33PM +0100, Ingo Molnar wrote:
> > 
> > * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > 
> > > -	spin_lock(&new_base->lock);
> > > -	spin_lock(&old_base->lock);
> > > +	/*
> > > +	 * If we take a lock from a different cpu, make sure we have always
> > > +	 * the same locking order. That is the lock that belongs to the cpu
> > > +	 * with the lowest number is taken first.
> > > +	 */
> > > +	lock1 = smp_processor_id() < cpu ? &new_base->lock : &old_base->lock;
> > > +	lock2 = smp_processor_id() < cpu ? &old_base->lock : &new_base->lock;
> > > +	spin_lock(lock1);
> > > +	spin_lock(lock2);
> > 
> > looks good to me. Wouldn't this be cleaner via a double_lock_timer() -
> > similar to how double_rq_lock() works in kernel/sched.c - instead of
> > open-coding it?
> 
> Something like the stuff below? It exploits the knowledge that the two
> tvec_base_t's are in a per_cpu array; otherwise I would end up passing
> a lot of redundant stuff. But I still think that isn't a good solution,
> rather a hack...?
> I'd go for the patch above.

Yeah, it'd be nicer to pass in the CPU number(s) and use that to make the
ordering decision.  Perhaps (smp_processor_id() - cpu).

> ---
> Index: linux-2.6/kernel/timer.c
> ===================================================================
> --- linux-2.6.orig/kernel/timer.c
> +++ linux-2.6/kernel/timer.c
> @@ -1640,6 +1640,28 @@ static void migrate_timer_list(tvec_base
>  	}
>  }
>  
> +static void __devinit double_tvec_lock(tvec_base_t *base1, tvec_base_t *base2)
> +{
> +	if (base1 < base2) {
> +		spin_lock(&base1->lock);
> +		spin_lock(&base2->lock);
> +	} else {
> +		spin_lock(&base2->lock);
> +		spin_lock(&base1->lock);
> +	}
> +}
> +
> +static void __devinit double_tvec_unlock(tvec_base_t *base1, tvec_base_t *base2)
> +{
> +	if (base1 < base2) {
> +		spin_unlock(&base1->lock);
> +		spin_unlock(&base2->lock);
> +	} else {
> +		spin_unlock(&base2->lock);
> +		spin_unlock(&base1->lock);
> +	}
> +}

And to undo the locks in the reverse order from that in which they were
taken.




* [patch] timer/hrtimer: take per cpu locks in sane order
  2007-03-02 16:48       ` Andrew Morton
@ 2007-03-02 19:08         ` Heiko Carstens
  2007-03-02 19:45           ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Heiko Carstens @ 2007-03-02 19:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Thomas Gleixner, Martin Schwidefsky, john stultz,
	Roman Zippel, Christian Borntraeger, linux-s390, linux-kernel

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Doing something like this on a two cpu system

# echo 0 > /sys/devices/system/cpu/cpu0/online 
# echo 1 > /sys/devices/system/cpu/cpu0/online 
# echo 0 > /sys/devices/system/cpu/cpu1/online 

will give me this:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.21-rc2-g562aa1d4-dirty #7
-------------------------------------------------------
bash/1282 is trying to acquire lock:
 (&cpu_base->lock_key){.+..}, at: [<000000000005f17e>] hrtimer_cpu_notify+0xc6/0x240

but task is already holding lock:
 (&cpu_base->lock_key#2){.+..}, at: [<000000000005f174>] hrtimer_cpu_notify+0xbc/0x240

which lock already depends on the new lock.

This happens because we have the following code in kernel/hrtimer.c:

migrate_hrtimers(int cpu)
[...]
old_base = &per_cpu(hrtimer_bases, cpu);
new_base = &get_cpu_var(hrtimer_bases);
[...]
spin_lock(&new_base->lock);
spin_lock(&old_base->lock);

Which means the spinlocks are taken in an order which depends on which cpu
gets shut down from which other cpu. Therefore lockdep complains that there
might be an ABBA deadlock. Since migrate_hrtimers() gets called only during
cpu hotplug, it's safe to assume that it isn't executed concurrently on a
different cpu and therefore the locking should be ok.

The same problem exists in kernel/timer.c: migrate_timers().

As pointed out by Christian Borntraeger, one possible solution to avoid
the locking order complaints would be to make sure that the locks are
always taken in the same order, e.g. by taking the lock of the cpu with
the lower number first. AFAICS this should be safe and that is what this
patch does.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Christian Borntraeger <cborntra@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 kernel/hrtimer.c |   39 ++++++++++++++++++++++++++++++++++-----
 kernel/timer.c   |   38 ++++++++++++++++++++++++++++++++++----
 2 files changed, 68 insertions(+), 9 deletions(-)

Index: linux-2.6/kernel/hrtimer.c
===================================================================
--- linux-2.6.orig/kernel/hrtimer.c
+++ linux-2.6/kernel/hrtimer.c
@@ -1343,6 +1343,38 @@ static void migrate_hrtimer_list(struct 
 	}
 }
 
+/*
+ * double_hrtimer_lock/unlock are used to ensure that on cpu hotplug the
+ * per cpu timer locks are always taken in the same order.
+ */
+static void double_hrtimer_lock(struct hrtimer_cpu_base *base1,
+				struct hrtimer_cpu_base *base2, int ind)
+	__acquires(base1->lock)
+	__acquires(base2->lock)
+{
+	if (ind < 0) {
+		spin_lock(&base1->lock);
+		spin_lock(&base2->lock);
+	} else {
+		spin_lock(&base2->lock);
+		spin_lock(&base1->lock);
+	}
+}
+
+static void double_hrtimer_unlock(struct hrtimer_cpu_base *base1,
+				  struct hrtimer_cpu_base *base2, int ind)
+	__releases(base1->lock)
+	__releases(base2->lock)
+{
+	if (ind < 0) {
+		spin_unlock(&base2->lock);
+		spin_unlock(&base1->lock);
+	} else {
+		spin_unlock(&base1->lock);
+		spin_unlock(&base2->lock);
+	}
+}
+
 static void migrate_hrtimers(int cpu)
 {
 	struct hrtimer_cpu_base *old_base, *new_base;
@@ -1355,17 +1387,14 @@ static void migrate_hrtimers(int cpu)
 	tick_cancel_sched_timer(cpu);
 
 	local_irq_disable();
-
-	spin_lock(&new_base->lock);
-	spin_lock(&old_base->lock);
+	double_hrtimer_lock(new_base, old_base, smp_processor_id() - cpu);
 
 	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
 		migrate_hrtimer_list(&old_base->clock_base[i],
 				     &new_base->clock_base[i]);
 	}
-	spin_unlock(&old_base->lock);
-	spin_unlock(&new_base->lock);
 
+	double_hrtimer_unlock(new_base, old_base, smp_processor_id() - cpu);
 	local_irq_enable();
 	put_cpu_var(hrtimer_bases);
 }
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -1640,6 +1640,38 @@ static void migrate_timer_list(tvec_base
 	}
 }
 
+/*
+ * double_timer_lock/unlock are used to ensure that on cpu hotplug the
+ * per cpu timer locks are always taken in the same order.
+ */
+static void __devinit double_timer_lock(tvec_base_t *base1,
+					tvec_base_t *base2, int ind)
+	__acquires(base1->lock)
+	__acquires(base2->lock)
+{
+	if (ind < 0) {
+		spin_lock(&base1->lock);
+		spin_lock(&base2->lock);
+	} else {
+		spin_lock(&base2->lock);
+		spin_lock(&base1->lock);
+	}
+}
+
+static void __devinit double_timer_unlock(tvec_base_t *base1,
+					  tvec_base_t *base2, int ind)
+	__releases(base1->lock)
+	__releases(base2->lock)
+{
+	if (ind < 0) {
+		spin_unlock(&base2->lock);
+		spin_unlock(&base1->lock);
+	} else {
+		spin_unlock(&base1->lock);
+		spin_unlock(&base2->lock);
+	}
+}
+
 static void __devinit migrate_timers(int cpu)
 {
 	tvec_base_t *old_base;
@@ -1651,8 +1683,7 @@ static void __devinit migrate_timers(int
 	new_base = get_cpu_var(tvec_bases);
 
 	local_irq_disable();
-	spin_lock(&new_base->lock);
-	spin_lock(&old_base->lock);
+	double_timer_lock(new_base, old_base, smp_processor_id() - cpu);
 
 	BUG_ON(old_base->running_timer);
 
@@ -1665,8 +1696,7 @@ static void __devinit migrate_timers(int
 		migrate_timer_list(new_base, old_base->tv5.vec + i);
 	}
 
-	spin_unlock(&old_base->lock);
-	spin_unlock(&new_base->lock);
+	double_timer_unlock(new_base, old_base, smp_processor_id() - cpu);
 	local_irq_enable();
 	put_cpu_var(tvec_bases);
 }


* Re: [patch] timer/hrtimer: take per cpu locks in sane order
  2007-03-02 19:08         ` Heiko Carstens
@ 2007-03-02 19:45           ` Andrew Morton
  2007-03-02 21:39             ` Heiko Carstens
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2007-03-02 19:45 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Ingo Molnar, Thomas Gleixner, Martin Schwidefsky, john stultz,
	Roman Zippel, Christian Borntraeger, linux-s390, linux-kernel

On Fri, 2 Mar 2007 20:08:36 +0100
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> +/*
> + * double_hrtimer_lock/unlock are used to ensure that on cpu hotplug the
> + * per cpu timer locks are always taken in the same order.
> + */
> +static void double_hrtimer_lock(struct hrtimer_cpu_base *base1,
> +				struct hrtimer_cpu_base *base2, int ind)
> +	__acquires(base1->lock)
> +	__acquires(base2->lock)
> +{
>
> ...
>
> +/*
> + * double_timer_lock/unlock are used to ensure that on cpu hotplug the
> + * per cpu timer locks are always taken in the same order.
> + */
> +static void __devinit double_timer_lock(tvec_base_t *base1,
> +					tvec_base_t *base2, int ind)
> +	__acquires(base1->lock)
> +	__acquires(base2->lock)

hm.  Can we not just pass in the spinlock_t*'s and use a common function?

	void double_spin_lock(spinlock_t *l1, spinlock_t *l2, int ind);

that way it has nothing to do with timers and can potentially be used
elsewhere in the kernel, too.

(what does "ind" mean?)


* Re: [patch] timer/hrtimer: take per cpu locks in sane order
  2007-03-02 19:45           ` Andrew Morton
@ 2007-03-02 21:39             ` Heiko Carstens
  2007-03-02 22:47               ` Heiko Carstens
  0 siblings, 1 reply; 10+ messages in thread
From: Heiko Carstens @ 2007-03-02 21:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Thomas Gleixner, Martin Schwidefsky, john stultz,
	Roman Zippel, Christian Borntraeger, linux-s390, linux-kernel

> > +/*
> > + * double_timer_lock/unlock are used to ensure that on cpu hotplug the
> > + * per cpu timer locks are always taken in the same order.
> > + */
> > +static void __devinit double_timer_lock(tvec_base_t *base1,
> > +					tvec_base_t *base2, int ind)
> > +	__acquires(base1->lock)
> > +	__acquires(base2->lock)
> 
> hm.  Can we not just pass in the spinlock_t*'s and use a common function?
> 
> 	void double_spin_lock(spinlock_t *l1, spinlock_t *l2, int ind);
> 
> that way it has nothing to do with timers and can potentially be used
> elsewhere in the kernel, too.
> 
> (what does "ind" mean?)

Sure. Will put a static inline function into include/linux/spinlock.h.
"ind" is supposed to be the short form of "indicator".


* [patch] timer/hrtimer: take per cpu locks in sane order
  2007-03-02 21:39             ` Heiko Carstens
@ 2007-03-02 22:47               ` Heiko Carstens
  2007-03-04 23:24                 ` Matt Mackall
  0 siblings, 1 reply; 10+ messages in thread
From: Heiko Carstens @ 2007-03-02 22:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Thomas Gleixner, Martin Schwidefsky, john stultz,
	Roman Zippel, Christian Borntraeger, linux-s390, linux-kernel

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Doing something like this on a two cpu system

# echo 0 > /sys/devices/system/cpu/cpu0/online 
# echo 1 > /sys/devices/system/cpu/cpu0/online 
# echo 0 > /sys/devices/system/cpu/cpu1/online 

will give me this:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.21-rc2-g562aa1d4-dirty #7
-------------------------------------------------------
bash/1282 is trying to acquire lock:
 (&cpu_base->lock_key){.+..}, at: [<000000000005f17e>] hrtimer_cpu_notify+0xc6/0x240

but task is already holding lock:
 (&cpu_base->lock_key#2){.+..}, at: [<000000000005f174>] hrtimer_cpu_notify+0xbc/0x240

which lock already depends on the new lock.

This happens because we have the following code in kernel/hrtimer.c:

migrate_hrtimers(int cpu)
[...]
old_base = &per_cpu(hrtimer_bases, cpu);
new_base = &get_cpu_var(hrtimer_bases);
[...]
spin_lock(&new_base->lock);
spin_lock(&old_base->lock);

Which means the spinlocks are taken in an order which depends on which cpu
gets shut down from which other cpu. Therefore lockdep complains that there
might be an ABBA deadlock. Since migrate_hrtimers() gets called only during
cpu hotplug, it's safe to assume that it isn't executed concurrently on a
different cpu and therefore the locking should be ok.

The same problem exists in kernel/timer.c: migrate_timers().

As pointed out by Christian Borntraeger, one possible solution to avoid
the locking order complaints would be to make sure that the locks are
always taken in the same order, e.g. by taking the lock of the cpu with
the lower number first.

To achieve this we introduce two new spinlock functions, double_spin_lock
and double_spin_unlock, which lock or unlock two locks in a given order.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Christian Borntraeger <cborntra@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---

Next try. Hopefully the interface with "l1_first" and "l1_taken_first"
is not confusing?

 include/linux/spinlock.h |   37 +++++++++++++++++++++++++++++++++++++
 kernel/hrtimer.c         |    9 ++++-----
 kernel/timer.c           |    8 ++++----
 3 files changed, 45 insertions(+), 9 deletions(-)

Index: linux-2.6/kernel/hrtimer.c
===================================================================
--- linux-2.6.orig/kernel/hrtimer.c
+++ linux-2.6/kernel/hrtimer.c
@@ -1355,17 +1355,16 @@ static void migrate_hrtimers(int cpu)
 	tick_cancel_sched_timer(cpu);
 
 	local_irq_disable();
-
-	spin_lock(&new_base->lock);
-	spin_lock(&old_base->lock);
+	double_spin_lock(&new_base->lock, &old_base->lock,
+			 smp_processor_id() < cpu);
 
 	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
 		migrate_hrtimer_list(&old_base->clock_base[i],
 				     &new_base->clock_base[i]);
 	}
-	spin_unlock(&old_base->lock);
-	spin_unlock(&new_base->lock);
 
+	double_spin_unlock(&new_base->lock, &old_base->lock,
+			   smp_processor_id() < cpu);
 	local_irq_enable();
 	put_cpu_var(hrtimer_bases);
 }
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -1651,8 +1651,8 @@ static void __devinit migrate_timers(int
 	new_base = get_cpu_var(tvec_bases);
 
 	local_irq_disable();
-	spin_lock(&new_base->lock);
-	spin_lock(&old_base->lock);
+	double_spin_lock(&new_base->lock, &old_base->lock,
+			 smp_processor_id() < cpu);
 
 	BUG_ON(old_base->running_timer);
 
@@ -1665,8 +1665,8 @@ static void __devinit migrate_timers(int
 		migrate_timer_list(new_base, old_base->tv5.vec + i);
 	}
 
-	spin_unlock(&old_base->lock);
-	spin_unlock(&new_base->lock);
+	double_spin_unlock(&new_base->lock, &old_base->lock,
+			   smp_processor_id() < cpu);
 	local_irq_enable();
 	put_cpu_var(tvec_bases);
 }
Index: linux-2.6/include/linux/spinlock.h
===================================================================
--- linux-2.6.orig/include/linux/spinlock.h
+++ linux-2.6/include/linux/spinlock.h
@@ -283,6 +283,43 @@ do {						\
 })
 
 /*
+ * Locks two spinlocks l1 and l2.
+ * l1_first indicates if spinlock l1 should be taken first.
+ */
+static inline void double_spin_lock(spinlock_t *l1, spinlock_t *l2,
+				    bool l1_first)
+	__acquires(l1)
+	__acquires(l2)
+{
+	if (l1_first) {
+		spin_lock(l1);
+		spin_lock(l2);
+	} else {
+		spin_lock(l2);
+		spin_lock(l1);
+	}
+}
+
+/*
+ * Unlocks two spinlocks l1 and l2.
+ * l1_taken_first indicates if spinlock l1 was taken first and therefore
+ * should be released after spinlock l2.
+ */
+static inline void double_spin_unlock(spinlock_t *l1, spinlock_t *l2,
+				      bool l1_taken_first)
+	__releases(l1)
+	__releases(l2)
+{
+	if (l1_taken_first) {
+		spin_unlock(l2);
+		spin_unlock(l1);
+	} else {
+		spin_unlock(l1);
+		spin_unlock(l2);
+	}
+}
+
+/*
  * Pull the atomic_t declaration:
  * (asm-mips/atomic.h needs above definitions)
  */


* Re: [patch] timer/hrtimer: take per cpu locks in sane order
  2007-03-02 22:47               ` Heiko Carstens
@ 2007-03-04 23:24                 ` Matt Mackall
  0 siblings, 0 replies; 10+ messages in thread
From: Matt Mackall @ 2007-03-04 23:24 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, Martin Schwidefsky,
	john stultz, Roman Zippel, Christian Borntraeger, linux-s390,
	linux-kernel

On Fri, Mar 02, 2007 at 11:47:52PM +0100, Heiko Carstens wrote:
>  /*
> + * Locks two spinlocks l1 and l2.
> + * l1_first indicates if spinlock l1 should be taken first.
> + */
> +static inline void double_spin_lock(spinlock_t *l1, spinlock_t *l2,
> +				    bool l1_first)
> +	__acquires(l1)
> +	__acquires(l2)
> +{
> +	if (l1_first) {
> +		spin_lock(l1);
> +		spin_lock(l2);
> +	} else {
> +		spin_lock(l2);
> +		spin_lock(l1);
> +	}
> +}

Two observations:

- We probably don't want people using this for locks that aren't
  explicitly in the same level of the hierarchy. The name should
  somehow indicate that. Something like spin_lock_siblings()?

- And once we know that, we can internally impose a natural stable
  ordering on them based on their addresses, eliminating the third
  argument and the need to duplicate the ordering calculation.

-- 
Mathematics is the supreme nostalgia of our time.


end of thread, other threads:[~2007-03-04 23:37 UTC | newest]

Thread overview: 10+ messages
2007-02-27 15:30 timer/hrtimer & cpu hotplug lockdep complaints Heiko Carstens
2007-03-02 12:58 ` [patch] timer/hrtimer: take per cpu locks in sane order Heiko Carstens
2007-03-02 13:04   ` Ingo Molnar
2007-03-02 14:23     ` Heiko Carstens
2007-03-02 16:48       ` Andrew Morton
2007-03-02 19:08         ` Heiko Carstens
2007-03-02 19:45           ` Andrew Morton
2007-03-02 21:39             ` Heiko Carstens
2007-03-02 22:47               ` Heiko Carstens
2007-03-04 23:24                 ` Matt Mackall
