clockevent: don't remove broadcast device when cpu is dead
diff mbox series

Message ID 1262834564-13033-1-git-send-email-dfeng@redhat.com
State New, archived
Headers show
Series
  • clockevent: don't remove broadcast device when cpu is dead
Related show

Commit Message

tip-bot for Xiaotian Feng Jan. 7, 2010, 3:22 a.m. UTC
Marc reported BUG during shutdown, after debugging, kernel is trying
to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.

The root cause for this bug is that in clockevents_notify,
"cpumask_weight(dev->cpumask) == 1" is always true even if dev is a
broadcast device. We need to use tick_is_broadcast_device to check
if it is a broadcast device.

Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Magnus Damm <damm@igel.co.jp>
Cc: H Hartley Sweeten <hsweeten@visionengravers.com>
---
 kernel/time/clockevents.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Comments

tip-bot for Xiaotian Feng Jan. 12, 2010, 2:24 a.m. UTC | #1
Hi:

     Any comments on the patch? Marc confirmed this patch also fixed his 
hang at suspend/resume stage. Thanks.

     (Cc'ed stable@kernel.org)

Regards
Xiaotian

On 01/07/2010 11:22 AM, Xiaotian Feng wrote:
> Marc reported BUG during shutdown, after debugging, kernel is trying
> to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.
>
> The root cause for this bug is that in clockevents_notify,
> "cpumask_weight(dev->cpumask) == 1" is always true even if dev is a
> broadcast device. We need to use tick_is_broadcast_device to check
> if it is a broadcast device.
>
> Reported-by: Marc Dionne<marc.c.dionne@gmail.com>
> Tested-by: Marc Dionne<marc.c.dionne@gmail.com>
> Signed-off-by: Xiaotian Feng<dfeng@redhat.com>
> Cc: Thomas Gleixner<tglx@linutronix.de>
> Cc: Magnus Damm<damm@igel.co.jp>
> Cc: H Hartley Sweeten<hsweeten@visionengravers.com>
> ---
>   kernel/time/clockevents.c |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
> index 6f740d9..0223d83 100644
> --- a/kernel/time/clockevents.c
> +++ b/kernel/time/clockevents.c
> @@ -259,7 +259,7 @@ void clockevents_notify(unsigned long reason, void *arg)
>   		cpu = *((int *)arg);
>   		list_for_each_entry_safe(dev, tmp,&clockevent_devices, list) {
>   			if (cpumask_test_cpu(cpu, dev->cpumask)&&
> -			    cpumask_weight(dev->cpumask) == 1) {
> +			    !tick_is_broadcast_device(dev)) {
>   				BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
>   				list_del(&dev->list);
>   			}
>    

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Thomas Gleixner Jan. 12, 2010, 1:20 p.m. UTC | #2
On Thu, 7 Jan 2010, Xiaotian Feng wrote:

> Marc reported BUG during shutdown, after debugging, kernel is trying
> to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.
> 
> The root cause for this bug is that in clockevents_notify,
> "cpumask_weight(dev->cpumask) == 1" is always true even if dev is a

Why is cpumask_weight(dev->cpumask) == 1 always true when we shutdown
a non boot cpu ?

The broadcast device is not a per cpu device and the cpumask should
not only contain the CPU which is shut down !

The patch is papering over the real problem.

Marc, can you please apply the following debug patch and provide the
dmesg outputs from boot and shutdown ?

Thanks,

	tglx
---

Index: linux-2.6-tip/kernel/time/clockevents.c
===================================================================
--- linux-2.6-tip.orig/kernel/time/clockevents.c
+++ linux-2.6-tip/kernel/time/clockevents.c
@@ -186,7 +186,7 @@ void clockevents_register_device(struct 
 	BUG_ON(!dev->cpumask);
 
 	raw_spin_lock_irqsave(&clockevents_lock, flags);
-
+	printk(KERN_ERR "CE register %p %s\n", dev, dev->name);
 	list_add(&dev->list, &clockevent_devices);
 	clockevents_do_notify(CLOCK_EVT_NOTIFY_ADD, dev);
 	clockevents_notify_released();
@@ -220,6 +220,7 @@ void clockevents_exchange_device(struct 
 	 * released list and do a notify add later.
 	 */
 	if (old) {
+		printk(KERN_INFO "CE Release %p %s\n", old, old->name);
 		clockevents_set_mode(old, CLOCK_EVT_MODE_UNUSED);
 		list_del(&old->list);
 		list_add(&old->list, &clockevents_released);
@@ -260,6 +261,13 @@ void clockevents_notify(unsigned long re
 		list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
 			if (cpumask_test_cpu(cpu, dev->cpumask) &&
 			    cpumask_weight(dev->cpumask) == 1) {
+				if (dev->mode != CLOCK_EVT_MODE_UNUSED) {
+					printk(KERN_INFO
+					       "CE Remove %p %s bc: %d\n",
+					       dev, dev->name,
+					       tick_is_broadcast_device(dev));
+					continue;
+				}
 				BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
 				list_del(&dev->list);
 			}
Index: linux-2.6-tip/kernel/time/tick-broadcast.c
===================================================================
--- linux-2.6-tip.orig/kernel/time/tick-broadcast.c
+++ linux-2.6-tip/kernel/time/tick-broadcast.c
@@ -72,6 +72,8 @@ int tick_check_broadcast_device(struct c
 	     (dev->features & CLOCK_EVT_FEAT_C3STOP))
 		return 0;
 
+	printk(KERN_INFO "CE set broadcast %p %s\n", dev, dev->name);
+
 	clockevents_exchange_device(NULL, dev);
 	tick_broadcast_device.evtdev = dev;
 	if (!cpumask_empty(tick_get_broadcast_mask()))
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Marc Dionne Jan. 13, 2010, 1:28 a.m. UTC | #3
On Tue, Jan 12, 2010 at 8:20 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 7 Jan 2010, Xiaotian Feng wrote:
>
>> Marc reported BUG during shutdown, after debugging, kernel is trying
>> to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.
>>
>> The root cause for this bug is that in clockevents_notify,
>> "cpumask_weight(dev->cpumask) == 1" is always true even if dev is a
>
> Why is cpumask_weight(dev->cpumask) == 1 always true when we shutdown
> a non boot cpu ?
>
> The broadcast device is not a per cpu device and the cpumask should
> not only contain the CPU which is shut down !
>
> The patch is papering over the real problem.
>
> Marc, can you please apply the following debug patch and provide the
> dmesg outputs from boot and shutdown ?
>
> Thanks,
>
>        tglx

Full dmesg output from boot is attached.

On shutdown I wrote down the message just before the BUG by hand
(after modifying your patch a little to let it BUG):

CE Remove ffffffff81a44100 hpet bc: 1

Marc
tip-bot for Xiaotian Feng Jan. 13, 2010, 1:48 a.m. UTC | #4
On 01/12/2010 09:20 PM, Thomas Gleixner wrote:
> On Thu, 7 Jan 2010, Xiaotian Feng wrote:
>
>> Marc reported BUG during shutdown, after debugging, kernel is trying
>> to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.
>>
>> The root cause for this bug is that in clockevents_notify,
>> "cpumask_weight(dev->cpumask) == 1" is always true even if dev is a
>
> Why is cpumask_weight(dev->cpumask) == 1 always true when we shutdown
> a non boot cpu ?
>
> The broadcast device is not a per cpu device and the cpumask should
> not only contain the CPU which is shut down !

At least for hpet broadcast dev, it's dev->cpumask is only contain the 
CPU which it is initialized from.
And for broadcast device, kernel is using tick_broadcast_mask not 
dev->cpumask, right?

>
> The patch is papering over the real problem.
>
> Marc, can you please apply the following debug patch and provide the
> dmesg outputs from boot and shutdown ?
>
> Thanks,
>
> 	tglx
> ---
>
> Index: linux-2.6-tip/kernel/time/clockevents.c
> ===================================================================
> --- linux-2.6-tip.orig/kernel/time/clockevents.c
> +++ linux-2.6-tip/kernel/time/clockevents.c
> @@ -186,7 +186,7 @@ void clockevents_register_device(struct
>   	BUG_ON(!dev->cpumask);
>
>   	raw_spin_lock_irqsave(&clockevents_lock, flags);
> -
> +	printk(KERN_ERR "CE register %p %s\n", dev, dev->name);
>   	list_add(&dev->list,&clockevent_devices);
>   	clockevents_do_notify(CLOCK_EVT_NOTIFY_ADD, dev);
>   	clockevents_notify_released();
> @@ -220,6 +220,7 @@ void clockevents_exchange_device(struct
>   	 * released list and do a notify add later.
>   	 */
>   	if (old) {
> +		printk(KERN_INFO "CE Release %p %s\n", old, old->name);
>   		clockevents_set_mode(old, CLOCK_EVT_MODE_UNUSED);
>   		list_del(&old->list);
>   		list_add(&old->list,&clockevents_released);
> @@ -260,6 +261,13 @@ void clockevents_notify(unsigned long re
>   		list_for_each_entry_safe(dev, tmp,&clockevent_devices, list) {
>   			if (cpumask_test_cpu(cpu, dev->cpumask)&&
>   			cpumask_weight(dev->cpumask) == 1) {
> +				if (dev->mode != CLOCK_EVT_MODE_UNUSED) {
> +					printk(KERN_INFO
> +					       "CE Remove %p %s bc: %d\n",
> +					       dev, dev->name,
> +					       tick_is_broadcast_device(dev));
> +					continue;
> +				}
>   				BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
>   				list_del(&dev->list);
>   			}
> Index: linux-2.6-tip/kernel/time/tick-broadcast.c
> ===================================================================
> --- linux-2.6-tip.orig/kernel/time/tick-broadcast.c
> +++ linux-2.6-tip/kernel/time/tick-broadcast.c
> @@ -72,6 +72,8 @@ int tick_check_broadcast_device(struct c
>   	     (dev->features&  CLOCK_EVT_FEAT_C3STOP))
>   		return 0;
>
> +	printk(KERN_INFO "CE set broadcast %p %s\n", dev, dev->name);
> +
>   	clockevents_exchange_device(NULL, dev);
>   	tick_broadcast_device.evtdev = dev;
>   	if (!cpumask_empty(tick_get_broadcast_mask()))
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Thomas Gleixner Jan. 13, 2010, 10:08 p.m. UTC | #5
On Wed, 13 Jan 2010, Xiaotian Feng wrote:
> On 01/12/2010 09:20 PM, Thomas Gleixner wrote:
> > On Thu, 7 Jan 2010, Xiaotian Feng wrote:
> > 
> > > Marc reported BUG during shutdown, after debugging, kernel is trying
> > > to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.
> > > 
> > > The root cause for this bug is that in clockevents_notify,
> > > "cpumask_weight(dev->cpumask) == 1" is always true even if dev is a
> > 
> > Why is cpumask_weight(dev->cpumask) == 1 always true when we shutdown
> > a non boot cpu ?
> > 
> > The broadcast device is not a per cpu device and the cpumask should
> > not only contain the CPU which is shut down !
> 
> At least for hpet broadcast dev, it's dev->cpumask is only contain the CPU
> which it is initialized from.

Which is fundamentaly wrong and the root cause of the problem. I'll
have a look tomorrow morning when my brain is more awake than now.

> And for broadcast device, kernel is using tick_broadcast_mask not
> dev->cpumask, right?

No, tick_broadcast_mask is the bitmask which tells us which cpus get
the broadcast IPI.

Thanks,

	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
tip-bot for Xiaotian Feng Jan. 14, 2010, 11:43 a.m. UTC | #6
On 01/14/2010 06:08 AM, Thomas Gleixner wrote:
> On Wed, 13 Jan 2010, Xiaotian Feng wrote:
>> On 01/12/2010 09:20 PM, Thomas Gleixner wrote:
>>> On Thu, 7 Jan 2010, Xiaotian Feng wrote:
>>>
>>>> Marc reported BUG during shutdown, after debugging, kernel is trying
>>>> to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.
>>>>
>>>> The root cause for this bug is that in clockevents_notify,
>>>> "cpumask_weight(dev->cpumask) == 1" is always true even if dev is a
>>>
>>> Why is cpumask_weight(dev->cpumask) == 1 always true when we shutdown
>>> a non boot cpu ?
>>>
>>> The broadcast device is not a per cpu device and the cpumask should
>>> not only contain the CPU which is shut down !
>>
>> At least for hpet broadcast dev, it's dev->cpumask is only contain the CPU
>> which it is initialized from.
>
> Which is fundamentaly wrong and the root cause of the problem. I'll
> have a look tomorrow morning when my brain is more awake than now.

hpet_legacy_clockevent_register is trying to register new CE, but 
replace failed,
then in tick_check_new_device -> tick_check_broadcast_device, the legacy 
hpet CE
was registered as multicast device, but its dev->cpumask is cpumask of 
smp_processor_id().

on my system its dev->cpumask is cpumask of 0, but in Marc's, 
dev->cpumask is cpumask of 4.
So when kernel is trying to offline cpu 4, the broadcast hpet is removed.

>
>> And for broadcast device, kernel is using tick_broadcast_mask not
>> dev->cpumask, right?
>
> No, tick_broadcast_mask is the bitmask which tells us which cpus get
> the broadcast IPI.
>
> Thanks,
>
> 	tglx
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Thomas Gleixner Jan. 14, 2010, 11:52 a.m. UTC | #7
On Thu, 14 Jan 2010, Xiaotian Feng wrote:
> On 01/14/2010 06:08 AM, Thomas Gleixner wrote:
> > On Wed, 13 Jan 2010, Xiaotian Feng wrote:
> > > On 01/12/2010 09:20 PM, Thomas Gleixner wrote:
> > > > On Thu, 7 Jan 2010, Xiaotian Feng wrote:
> > > > 
> > > > > Marc reported BUG during shutdown, after debugging, kernel is trying
> > > > > to remove a broadcast device which mode is CLOCK_EVT_MODE_ONESHOT.
> > > > > 
> > > > > The root cause for this bug is that in clockevents_notify,
> > > > > "cpumask_weight(dev->cpumask) == 1" is always true even if dev is a
> > > > 
> > > > Why is cpumask_weight(dev->cpumask) == 1 always true when we shutdown
> > > > a non boot cpu ?
> > > > 
> > > > The broadcast device is not a per cpu device and the cpumask should
> > > > not only contain the CPU which is shut down !
> > > 
> > > At least for hpet broadcast dev, it's dev->cpumask is only contain the CPU
> > > which it is initialized from.
> > 
> > Which is fundamentaly wrong and the root cause of the problem. I'll
> > have a look tomorrow morning when my brain is more awake than now.
> 
> hpet_legacy_clockevent_register is trying to register new CE, but replace
> failed,
> then in tick_check_new_device -> tick_check_broadcast_device, the legacy hpet
> CE
> was registered as multicast device, but its dev->cpumask is cpumask of
> smp_processor_id().
> 
> on my system its dev->cpumask is cpumask of 0, but in Marc's, dev->cpumask is
> cpumask of 4.
> So when kernel is trying to offline cpu 4, the broadcast hpet is removed.

I know, but the point is that a broadcast device should not be bound
to the CPU which is registering it. Working on a fix.

Thanks,

	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Patch
diff mbox series

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 6f740d9..0223d83 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -259,7 +259,7 @@  void clockevents_notify(unsigned long reason, void *arg)
 		cpu = *((int *)arg);
 		list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
 			if (cpumask_test_cpu(cpu, dev->cpumask) &&
-			    cpumask_weight(dev->cpumask) == 1) {
+			    !tick_is_broadcast_device(dev)) {
 				BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
 				list_del(&dev->list);
 			}