linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] NOHZ, check to see if tick device is initialized in IRQ handling path
@ 2013-04-30 12:36 Prarit Bhargava
  2013-05-02 22:51 ` Tony Luck
  2013-05-02 22:56 ` Thomas Gleixner
  0 siblings, 2 replies; 17+ messages in thread
From: Prarit Bhargava @ 2013-04-30 12:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Prarit Bhargava, Thomas Gleixner, John Stultz

2nd try at this ... going with a more global cc.

I think the linux.git "system hang" isn't really a hang.  For some reason the
panic text wasn't displayed on the console.  I've seen this behaviour a few
times now ... maybe there's a bug in the panic output path?

It seems that the power interrupt is an error with the CPU exceeded the
OSes current requested frequency on the package.  If I disable on demand
cpu frequency, the problem goes away.

Anyhoo, here's a patch...

----8<----

When adding a CPU there is a small window in which interrupts are enabled and
the clock tick device has not been initialized.  If an interrupt occurs in
this window, irq_exit() will be called which calls tick_nohz_irq_exit() which
in turn calls __tick_nohz_idle_enter().

__tick_nohz_idle() enter assumes that the tick has been initialized.  In the
above case, however, it has not and this leads to what appears to be a system
hang on latest linux.git or a the following panic on RHEL6:

Pid: 0, comm: swapper Not tainted 2.6.32-358.el6.x86_64 #1
RIP: 0010:[<ffffffff810a89e5>]  [<ffffffff810a89e5>] tick_nohz_stop_sched_tick+0x2a5/0x3e0
RSP: 0018:ffff88089c503f38  EFLAGS: 00010046
RAX: ffffffff81c07520 RBX: ffff88089c5116a0 RCX: 000002f04bb18cd8
RDX: 0000000000000000 RSI: 000000000000a1b5 RDI: 000002f04bb0eb23
RBP: ffff88089c503f88 R08: ffff88089c50e060 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000017
R13: 000002f04bb17dd5 R14: 0000000000000000 R15: 0000000000000092
FS:  0000000000000000(0000) GS:ffff88089c500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000078 CR3: 0000000001a85000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8810745c0000, task ffff8808740f2080)
Stack:
 00000000000116a0 0000000000000087 ffff88089c503f78 0000000000000046
<d> ffff88089c503f98 0000000000000000 0000000000000000 0000000000000000
<d> 0000000000000000 0000000000000000 ffff88089c503f98 ffffffff81076d86
Call Trace:
 <IRQ>
 [<ffffffff81076d86>] irq_exit+0x76/0x90
 [<ffffffff81028dd6>] smp_thermal_interrupt+0x26/0x40
 [<ffffffff8100bcf3>] thermal_interrupt+0x13/0x20
 <EOI>
 [<ffffffff81506997>] ? start_secondary+0x127/0x2ef
 [<ffffffff81506990>] ? start_secondary+0x120/0x2ef

The code currently assumes that the tick device is initialized when
irq_enter() and irq_exit() are called.  This is not correct and a check must
be performed prior to entering the tick code through these code paths to
ensure that the tick device is initialized and running.

I've only seen this occur on a few systems.  I've tested with and without the
patch and as far as I can tell this patch resolves the problem on
linux.git top of tree.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
---
 kernel/time/tick-sched.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a19a399..5027187 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -567,6 +567,12 @@ EXPORT_SYMBOL_GPL(tick_nohz_idle_enter);
 void tick_nohz_irq_exit(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+	struct clock_event_device *dev =
+				     __get_cpu_var(tick_cpu_device).evtdev;
+
+	/* Has the tick been initialized yet? */
+	if (unlikely(!dev || dev->mode == CLOCK_EVT_MODE_UNUSED))
+		return;
 
 	if (!ts->inidle)
 		return;
@@ -809,6 +815,12 @@ static inline void tick_check_nohz(int cpu) { }
  */
 void tick_check_idle(int cpu)
 {
+	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
+
+	/* Has the tick been initialized yet? */
+	if (unlikely(!dev || dev->mode == CLOCK_EVT_MODE_UNUSED))
+		return;
+
 	tick_check_oneshot_broadcast(cpu);
 	tick_check_nohz(cpu);
 }
-- 
1.7.9.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] NOHZ, check to see if tick device is initialized in IRQ handling path
  2013-04-30 12:36 [PATCH] NOHZ, check to see if tick device is initialized in IRQ handling path Prarit Bhargava
@ 2013-05-02 22:51 ` Tony Luck
  2013-05-02 22:56 ` Thomas Gleixner
  1 sibling, 0 replies; 17+ messages in thread
From: Tony Luck @ 2013-05-02 22:51 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: Linux Kernel Mailing List, Thomas Gleixner, John Stultz

>  void tick_nohz_irq_exit(void)
>  {
>         struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
> +       struct clock_event_device *dev =
> +                                    __get_cpu_var(tick_cpu_device).evtdev;
> +
> +       /* Has the tick been initialized yet? */
> +       if (unlikely(!dev || dev->mode == CLOCK_EVT_MODE_UNUSED))
> +               return;

Could we have something in the "struct tick_sched" to tell us whether
it has been set up? Rather than this somewhat convoluted digging
around in the clock_event_device innards?

> +       if (unlikely(!dev || dev->mode == CLOCK_EVT_MODE_UNUSED))
> +               return;

Ditto here.

-Tony

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] NOHZ, check to see if tick device is initialized in IRQ handling path
  2013-04-30 12:36 [PATCH] NOHZ, check to see if tick device is initialized in IRQ handling path Prarit Bhargava
  2013-05-02 22:51 ` Tony Luck
@ 2013-05-02 22:56 ` Thomas Gleixner
  2013-05-03  8:10   ` Thomas Gleixner
  1 sibling, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2013-05-02 22:56 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: LKML, John Stultz, Yinghai Lu, Dave Jones

On Tue, 30 Apr 2013, Prarit Bhargava wrote:

> 2nd try at this ... going with a more global cc.
> 
> I think the linux.git "system hang" isn't really a hang.  For some reason the
> panic text wasn't displayed on the console.  I've seen this behaviour a few
> times now ... maybe there's a bug in the panic output path?
> 
> It seems that the power interrupt is an error with the CPU exceeded the
> OSes current requested frequency on the package.  If I disable on demand
> cpu frequency, the problem goes away.

Huch?
 
> Anyhoo, here's a patch...
> 
> ----8<----
> 
> When adding a CPU there is a small window in which interrupts are enabled and
> the clock tick device has not been initialized.  If an interrupt occurs in

What's that small window and why does it exist?

> this window, irq_exit() will be called which calls tick_nohz_irq_exit() which
> in turn calls __tick_nohz_idle_enter().
> 
> __tick_nohz_idle() enter assumes that the tick has been initialized.  In the
> above case, however, it has not and this leads to what appears to be a system
> hang on latest linux.git or a the following panic on RHEL6:
> 
> Pid: 0, comm: swapper Not tainted 2.6.32-358.el6.x86_64 #1
> RIP: 0010:[<ffffffff810a89e5>]  [<ffffffff810a89e5>] tick_nohz_stop_sched_tick+0x2a5/0x3e0
> RSP: 0018:ffff88089c503f38  EFLAGS: 00010046
> RAX: ffffffff81c07520 RBX: ffff88089c5116a0 RCX: 000002f04bb18cd8
> RDX: 0000000000000000 RSI: 000000000000a1b5 RDI: 000002f04bb0eb23
> RBP: ffff88089c503f88 R08: ffff88089c50e060 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000017
> R13: 000002f04bb17dd5 R14: 0000000000000000 R15: 0000000000000092
> FS:  0000000000000000(0000) GS:ffff88089c500000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000078 CR3: 0000000001a85000 CR4: 00000000001406e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffff8810745c0000, task ffff8808740f2080)
> Stack:
>  00000000000116a0 0000000000000087 ffff88089c503f78 0000000000000046
> <d> ffff88089c503f98 0000000000000000 0000000000000000 0000000000000000
> <d> 0000000000000000 0000000000000000 ffff88089c503f98 ffffffff81076d86
> Call Trace:
>  <IRQ>
>  [<ffffffff81076d86>] irq_exit+0x76/0x90
>  [<ffffffff81028dd6>] smp_thermal_interrupt+0x26/0x40
>  [<ffffffff8100bcf3>] thermal_interrupt+0x13/0x20
>  <EOI>
>  [<ffffffff81506997>] ? start_secondary+0x127/0x2ef
>  [<ffffffff81506990>] ? start_secondary+0x120/0x2ef
> 
> The code currently assumes that the tick device is initialized when
> irq_enter() and irq_exit() are called.  This is not correct and a check must
> be performed prior to entering the tick code through these code paths to
> ensure that the tick device is initialized and running.
> 
> I've only seen this occur on a few systems.  I've tested with and without the
> patch and as far as I can tell this patch resolves the problem on
> linux.git top of tree.
> 
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: John Stultz <john.stultz@linaro.org>
> ---
>  kernel/time/tick-sched.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index a19a399..5027187 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -567,6 +567,12 @@ EXPORT_SYMBOL_GPL(tick_nohz_idle_enter);
>  void tick_nohz_irq_exit(void)
>  {
>  	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
> +	struct clock_event_device *dev =
> +				     __get_cpu_var(tick_cpu_device).evtdev;
> +
> +	/* Has the tick been initialized yet? */
> +	if (unlikely(!dev || dev->mode == CLOCK_EVT_MODE_UNUSED))
> +		return;
>  
>  	if (!ts->inidle)
>  		return;
> @@ -809,6 +815,12 @@ static inline void tick_check_nohz(int cpu) { }
>   */
>  void tick_check_idle(int cpu)
>  {
> +	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
> +
> +	/* Has the tick been initialized yet? */
> +	if (unlikely(!dev || dev->mode == CLOCK_EVT_MODE_UNUSED))
> +		return;
> +
>  	tick_check_oneshot_broadcast(cpu);
>  	tick_check_nohz(cpu);
>  }
> -- 
> 1.7.9.3
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] NOHZ, check to see if tick device is initialized in IRQ handling path
  2013-05-02 22:56 ` Thomas Gleixner
@ 2013-05-03  8:10   ` Thomas Gleixner
  2013-05-03 12:34     ` Prarit Bhargava
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2013-05-03  8:10 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: LKML, John Stultz, Yinghai Lu, Dave Jones

On Fri, 3 May 2013, Thomas Gleixner wrote:

> On Tue, 30 Apr 2013, Prarit Bhargava wrote:
> 
> > 2nd try at this ... going with a more global cc.
> > 
> > I think the linux.git "system hang" isn't really a hang.  For some reason the
> > panic text wasn't displayed on the console.  I've seen this behaviour a few
> > times now ... maybe there's a bug in the panic output path?
> > 
> > It seems that the power interrupt is an error with the CPU exceeded the
> > OSes current requested frequency on the package.  If I disable on demand
> > cpu frequency, the problem goes away.
> 
> Huch?
>  
> > Anyhoo, here's a patch...
> > 
> > ----8<----
> > 
> > When adding a CPU there is a small window in which interrupts are enabled and
> > the clock tick device has not been initialized.  If an interrupt occurs in
> 
> What's that small window and why does it exist?
> 
> > this window, irq_exit() will be called which calls tick_nohz_irq_exit() which
> > in turn calls __tick_nohz_idle_enter().
> > 
> > __tick_nohz_idle() enter assumes that the tick has been initialized.  In the
> > above case, however, it has not and this leads to what appears to be a system
> > hang on latest linux.git or a the following panic on RHEL6:
> > 
> > Pid: 0, comm: swapper Not tainted 2.6.32-358.el6.x86_64 #1
> > RIP: 0010:[<ffffffff810a89e5>]  [<ffffffff810a89e5>] tick_nohz_stop_sched_tick+0x2a5/0x3e0
> > RSP: 0018:ffff88089c503f38  EFLAGS: 00010046
> > RAX: ffffffff81c07520 RBX: ffff88089c5116a0 RCX: 000002f04bb18cd8
> > RDX: 0000000000000000 RSI: 000000000000a1b5 RDI: 000002f04bb0eb23
> > RBP: ffff88089c503f88 R08: ffff88089c50e060 R09: 0000000000000000
> > R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000017
> > R13: 000002f04bb17dd5 R14: 0000000000000000 R15: 0000000000000092
> > FS:  0000000000000000(0000) GS:ffff88089c500000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > CR2: 0000000000000078 CR3: 0000000001a85000 CR4: 00000000001406e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process swapper (pid: 0, threadinfo ffff8810745c0000, task ffff8808740f2080)
> > Stack:
> >  00000000000116a0 0000000000000087 ffff88089c503f78 0000000000000046
> > <d> ffff88089c503f98 0000000000000000 0000000000000000 0000000000000000
> > <d> 0000000000000000 0000000000000000 ffff88089c503f98 ffffffff81076d86
> > Call Trace:
> >  <IRQ>
> >  [<ffffffff81076d86>] irq_exit+0x76/0x90
> >  [<ffffffff81028dd6>] smp_thermal_interrupt+0x26/0x40
> >  [<ffffffff8100bcf3>] thermal_interrupt+0x13/0x20
> >  <EOI>
> >  [<ffffffff81506997>] ? start_secondary+0x127/0x2ef
> >  [<ffffffff81506990>] ? start_secondary+0x120/0x2ef
> > 
> > The code currently assumes that the tick device is initialized when
> > irq_enter() and irq_exit() are called.  This is not correct and a check must
> > be performed prior to entering the tick code through these code paths to
> > ensure that the tick device is initialized and running.
> > 
> > I've only seen this occur on a few systems.  I've tested with and without the
> > patch and as far as I can tell this patch resolves the problem on
> > linux.git top of tree.
> > 
> > Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: John Stultz <john.stultz@linaro.org>
> > ---
> >  kernel/time/tick-sched.c |   12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index a19a399..5027187 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -567,6 +567,12 @@ EXPORT_SYMBOL_GPL(tick_nohz_idle_enter);
> >  void tick_nohz_irq_exit(void)
> >  {
> >  	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
> > +	struct clock_event_device *dev =
> > +				     __get_cpu_var(tick_cpu_device).evtdev;
> > +
> > +	/* Has the tick been initialized yet? */
> > +	if (unlikely(!dev || dev->mode == CLOCK_EVT_MODE_UNUSED))
> > +		return;
> >  
> >  	if (!ts->inidle)
> >  		return;

This does not make any sense at all.

If ts->inidle is set, then the cpu has entered the idle loop. The
local apic timer is registered _BEFORE_ the idle loop is reached. So
how would dev end up being NULL?

So even if an interrupt hits right before we register the local APIC
timer, ts->inidle cannot be set.

The same is true for tick_check_idle(). The cpu cannot be in the
broadcast mask _BEFORE_ it went deep idle (from the idle loop) and
neither ts->idle_active nor ts->tick_stopped can be set.

So there is something else wrong and you're just papering over the the
underlying issue.

Can you please instrument the order of events (apic registration etc),
so we can see what goes wrong.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] NOHZ, check to see if tick device is initialized in IRQ handling path
  2013-05-03  8:10   ` Thomas Gleixner
@ 2013-05-03 12:34     ` Prarit Bhargava
  2013-05-03 13:02       ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Prarit Bhargava @ 2013-05-03 12:34 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz, Yinghai Lu, Dave Jones



On 05/03/2013 04:10 AM, Thomas Gleixner wrote:
> On Fri, 3 May 2013, Thomas Gleixner wrote:
> 
>> On Tue, 30 Apr 2013, Prarit Bhargava wrote:
>>
>>> 2nd try at this ... going with a more global cc.
>>>
>>> I think the linux.git "system hang" isn't really a hang.  For some reason the
>>> panic text wasn't displayed on the console.  I've seen this behaviour a few
>>> times now ... maybe there's a bug in the panic output path?
>>>
>>> It seems that the power interrupt is an error with the CPU exceeded the
>>> OSes current requested frequency on the package.  If I disable on demand
>>> cpu frequency, the problem goes away.
>>
>> Huch?
>>  
>>> Anyhoo, here's a patch...
>>>
>>> ----8<----
>>>
>>> When adding a CPU there is a small window in which interrupts are enabled and
>>> the clock tick device has not been initialized.  If an interrupt occurs in
>>
>> What's that small window and why does it exist?
>>
>>> this window, irq_exit() will be called which calls tick_nohz_irq_exit() which
>>> in turn calls __tick_nohz_idle_enter().
>>>
>>> __tick_nohz_idle() enter assumes that the tick has been initialized.  In the
>>> above case, however, it has not and this leads to what appears to be a system
>>> hang on latest linux.git or a the following panic on RHEL6:
>>>
>>> Pid: 0, comm: swapper Not tainted 2.6.32-358.el6.x86_64 #1
>>> RIP: 0010:[<ffffffff810a89e5>]  [<ffffffff810a89e5>] tick_nohz_stop_sched_tick+0x2a5/0x3e0
>>> RSP: 0018:ffff88089c503f38  EFLAGS: 00010046
>>> RAX: ffffffff81c07520 RBX: ffff88089c5116a0 RCX: 000002f04bb18cd8
>>> RDX: 0000000000000000 RSI: 000000000000a1b5 RDI: 000002f04bb0eb23
>>> RBP: ffff88089c503f88 R08: ffff88089c50e060 R09: 0000000000000000
>>> R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000017
>>> R13: 000002f04bb17dd5 R14: 0000000000000000 R15: 0000000000000092
>>> FS:  0000000000000000(0000) GS:ffff88089c500000(0000) knlGS:0000000000000000
>>> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>>> CR2: 0000000000000078 CR3: 0000000001a85000 CR4: 00000000001406e0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> Process swapper (pid: 0, threadinfo ffff8810745c0000, task ffff8808740f2080)
>>> Stack:
>>>  00000000000116a0 0000000000000087 ffff88089c503f78 0000000000000046
>>> <d> ffff88089c503f98 0000000000000000 0000000000000000 0000000000000000
>>> <d> 0000000000000000 0000000000000000 ffff88089c503f98 ffffffff81076d86
>>> Call Trace:
>>>  <IRQ>
>>>  [<ffffffff81076d86>] irq_exit+0x76/0x90
>>>  [<ffffffff81028dd6>] smp_thermal_interrupt+0x26/0x40
>>>  [<ffffffff8100bcf3>] thermal_interrupt+0x13/0x20
>>>  <EOI>
>>>  [<ffffffff81506997>] ? start_secondary+0x127/0x2ef
>>>  [<ffffffff81506990>] ? start_secondary+0x120/0x2ef
>>>
>>> The code currently assumes that the tick device is initialized when
>>> irq_enter() and irq_exit() are called.  This is not correct and a check must
>>> be performed prior to entering the tick code through these code paths to
>>> ensure that the tick device is initialized and running.
>>>
>>> I've only seen this occur on a few systems.  I've tested with and without the
>>> patch and as far as I can tell this patch resolves the problem on
>>> linux.git top of tree.
>>>
>>> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: John Stultz <john.stultz@linaro.org>
>>> ---
>>>  kernel/time/tick-sched.c |   12 ++++++++++++
>>>  1 file changed, 12 insertions(+)
>>>
>>> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
>>> index a19a399..5027187 100644
>>> --- a/kernel/time/tick-sched.c
>>> +++ b/kernel/time/tick-sched.c
>>> @@ -567,6 +567,12 @@ EXPORT_SYMBOL_GPL(tick_nohz_idle_enter);
>>>  void tick_nohz_irq_exit(void)
>>>  {
>>>  	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
>>> +	struct clock_event_device *dev =
>>> +				     __get_cpu_var(tick_cpu_device).evtdev;
>>> +
>>> +	/* Has the tick been initialized yet? */
>>> +	if (unlikely(!dev || dev->mode == CLOCK_EVT_MODE_UNUSED))
>>> +		return;
>>>  
>>>  	if (!ts->inidle)
>>>  		return;
> 
> This does not make any sense at all.
> 
> If ts->inidle is set, then the cpu has entered the idle loop. The
> local apic timer is registered _BEFORE_ the idle loop is reached. So
> how would dev end up being NULL?

I see ... AFAICT the cpu hasn't even hit the idle loop in this case and we take
a thermal interrupt (see explanation below).

> 
> So even if an interrupt hits right before we register the local APIC
> timer, ts->inidle cannot be set.
> 
> The same is true for tick_check_idle(). The cpu cannot be in the
> broadcast mask _BEFORE_ it went deep idle (from the idle loop) and
> neither ts->idle_active nor ts->tick_stopped can be set.
> 
> So there is something else wrong and you're just papering over the the
> underlying issue.
> 
> Can you please instrument the order of events (apic registration etc),
> so we can see what goes wrong.

Sorry, I should have done that in the original submit.  The sequence of events
is this ...

Down a cpu and then bring it back up.

On the cpu_up sequence, start_secondary() is called.  The cpu is set up is
marked online, etc..

In start_secondary() at arch/x86/kernel/smpboot.c: line 279 we enable interrupts
on the cpu, and then at arch/x86/kernel/smpboot.c: line 284 we setup the tick
device and initialize evtdev.

In between these two, if the CPU takes an interrupt (in my case it appears to be
a non-fatal power warning interrupt, see
arch/x86/kernel/cpu/mcheck/therm_throt.c) the IRQ handling code is called.

Since we're handling an IRQ, on the exit path, irq_exit() will be called, which
does the following at kernel/softirq.c,

#ifdef CONFIG_NO_HZ
        /* Make sure that timer wheel updates are propagated */
        if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
                tick_nohz_irq_exit();
#endif

tick_nohz_irq_exit() calls __tick_nohz_idle_enter(), which calls
tick_nohz_stop_sched_tick() and then uses the evtdev ... which again, AFAICT
from the stack trace is NULL.

This "feels" like a race between taking an interrupt and setting up the tick
device.

I think I understand your point about ts->inidle -- I wonder if this is some odd
situation with stale data when we bring the cpu down and then back up?  I'll go
take instrument the code and see what happens.

Obviously I'm willing to hear other suggestions on debugging -- tglx, if you
have any please let me know ...

P.




> 
> Thanks,
> 
> 	tglx
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] NOHZ, check to see if tick device is initialized in IRQ handling path
  2013-05-03 12:34     ` Prarit Bhargava
@ 2013-05-03 13:02       ` Thomas Gleixner
  2013-05-03 13:43         ` Prarit Bhargava
                           ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Thomas Gleixner @ 2013-05-03 13:02 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: LKML, John Stultz, Yinghai Lu, Dave Jones

On Fri, 3 May 2013, Prarit Bhargava wrote:
> Down a cpu and then bring it back up.

Ahhhh. So the issue is, that we do not clear the per cpu ts->inidle
and friends when we bring the cpu down.

The patch below should address that.

Thanks,

	tglx

 
Index: linux-2.6/kernel/time/tick-common.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-common.c
+++ linux-2.6/kernel/time/tick-common.c
@@ -387,6 +387,7 @@ static int tick_notify(struct notifier_b
 		tick_shutdown_broadcast_oneshot(dev);
 		tick_shutdown_broadcast(dev);
 		tick_shutdown(dev);
+		tick_shutdown_nohz(dev);
 		break;
 
 	case CLOCK_EVT_NOTIFY_SUSPEND:
Index: linux-2.6/kernel/time/tick-internal.h
===================================================================
--- linux-2.6.orig/kernel/time/tick-internal.h
+++ linux-2.6/kernel/time/tick-internal.h
@@ -144,3 +144,9 @@ static inline int tick_device_is_functio
 #endif
 
 extern void do_timer(unsigned long ticks);
+
+#ifdef CONFIG_NO_HZ
+extern void tick_shutdown_nohz(unsigned int *cpup);
+#else
+static inline void tick_shutdown_nohz(unsigned int *cpup) { }
+#endif
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -797,6 +797,13 @@ static inline void tick_check_nohz(int c
 	}
 }
 
+void tick_shutdown_nohz(unsigned int *cpup)
+{
+	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+
+	memset(ts, 0, sizeof(*ts));
+}
+
 #else
 
 static inline void tick_nohz_switch_to_nohz(void) { }

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] NOHZ, check to see if tick device is initialized in IRQ handling path
  2013-05-03 13:02       ` Thomas Gleixner
@ 2013-05-03 13:43         ` Prarit Bhargava
  2013-05-05  6:20         ` [tip:timers/urgent] tick: Cleanup NOHZ per cpu data on cpu down tip-bot for Thomas Gleixner
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Prarit Bhargava @ 2013-05-03 13:43 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz, Yinghai Lu, Dave Jones



On 05/03/2013 09:02 AM, Thomas Gleixner wrote:
> On Fri, 3 May 2013, Prarit Bhargava wrote:
>> Down a cpu and then bring it back up.
> 
> Ahhhh. So the issue is, that we do not clear the per cpu ts->inidle
> and friends when we bring the cpu down.
> 

:)  I'll give this a shot and will update with testing results.

Thanks tglx!

P.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [tip:timers/urgent] tick: Cleanup NOHZ per cpu data on cpu down
  2013-05-03 13:02       ` Thomas Gleixner
  2013-05-03 13:43         ` Prarit Bhargava
@ 2013-05-05  6:20         ` tip-bot for Thomas Gleixner
  2013-05-05 19:54           ` Prarit Bhargava
  2013-05-05 12:48         ` tip-bot for Thomas Gleixner
                           ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: tip-bot for Thomas Gleixner @ 2013-05-05  6:20 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx, prarit

Commit-ID:  ae7868e241c015aadc8632d9fe633a102a5918f6
Gitweb:     http://git.kernel.org/tip/ae7868e241c015aadc8632d9fe633a102a5918f6
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 3 May 2013 15:02:50 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 5 May 2013 08:15:11 +0200

tick: Cleanup NOHZ per cpu data on cpu down

Prarit reported a crash on CPU offline/online. The reason is that on
CPU down the NOHZ related per cpu data of the dead cpu is not cleaned
up. If at cpu online an interrupt happens before the per cpu tick
device is registered the irq_enter() check potentially sees stale data
and dereferences a NULL pointer.

Cleanup the data after the cpu is dead.

Reported-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-common.c   | 1 +
 kernel/time/tick-internal.h | 6 ++++++
 kernel/time/tick-sched.c    | 7 +++++++
 3 files changed, 14 insertions(+)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 6176a3e..29b765d 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -387,6 +387,7 @@ static int tick_notify(struct notifier_block *nb, unsigned long reason,
 		tick_shutdown_broadcast_oneshot(dev);
 		tick_shutdown_broadcast(dev);
 		tick_shutdown(dev);
+		tick_shutdown_nohz(dev);
 		break;
 
 	case CLOCK_EVT_NOTIFY_SUSPEND:
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index f0299ea..9644b29 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -144,3 +144,9 @@ static inline int tick_device_is_functional(struct clock_event_device *dev)
 #endif
 
 extern void do_timer(unsigned long ticks);
+
+#ifdef CONFIG_NO_HZ
+extern void tick_shutdown_nohz(unsigned int *cpup);
+#else
+static inline void tick_shutdown_nohz(unsigned int *cpup) { }
+#endif
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 225f8bf..e985ccd 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -797,6 +797,13 @@ static inline void tick_check_nohz(int cpu)
 	}
 }
 
+void tick_shutdown_nohz(unsigned int *cpup)
+{
+	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+
+	memset(ts, 0, sizeof(*ts));
+}
+
 #else
 
 static inline void tick_nohz_switch_to_nohz(void) { }

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [tip:timers/urgent] tick: Cleanup NOHZ per cpu data on cpu down
  2013-05-03 13:02       ` Thomas Gleixner
  2013-05-03 13:43         ` Prarit Bhargava
  2013-05-05  6:20         ` [tip:timers/urgent] tick: Cleanup NOHZ per cpu data on cpu down tip-bot for Thomas Gleixner
@ 2013-05-05 12:48         ` tip-bot for Thomas Gleixner
  2013-05-05 14:14         ` tip-bot for Thomas Gleixner
  2013-05-12 10:27         ` tip-bot for Thomas Gleixner
  4 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Thomas Gleixner @ 2013-05-05 12:48 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, bitbucket, tglx, prarit

Commit-ID:  d3d233f5e55c688ea3f17a3efeb4c044f749ff55
Gitweb:     http://git.kernel.org/tip/d3d233f5e55c688ea3f17a3efeb4c044f749ff55
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 3 May 2013 15:02:50 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 5 May 2013 14:42:43 +0200

tick: Cleanup NOHZ per cpu data on cpu down

Prarit reported a crash on CPU offline/online. The reason is that on
CPU down the NOHZ related per cpu data of the dead cpu is not cleaned
up. If at cpu online an interrupt happens before the per cpu tick
device is registered the irq_enter() check potentially sees stale data
and dereferences a NULL pointer.

Cleanup the data after the cpu is dead.

[ hrtimer fix from Mike Galbraith ]

Reported-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Cc: Mike Galbraith <bitbucket@online.de>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/hrtimer.c            | 2 +-
 kernel/time/tick-common.c   | 1 +
 kernel/time/tick-internal.h | 6 ++++++
 kernel/time/tick-sched.c    | 7 +++++++
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 609d8ff..90f8642 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1759,8 +1759,8 @@ static int __cpuinit hrtimer_cpu_notify(struct notifier_block *self,
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 	{
-		clockevents_notify(CLOCK_EVT_NOTIFY_CPU_DEAD, &scpu);
 		migrate_hrtimers(scpu);
+		clockevents_notify(CLOCK_EVT_NOTIFY_CPU_DEAD, &scpu);
 		break;
 	}
 #endif
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 6176a3e..29b765d 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -387,6 +387,7 @@ static int tick_notify(struct notifier_block *nb, unsigned long reason,
 		tick_shutdown_broadcast_oneshot(dev);
 		tick_shutdown_broadcast(dev);
 		tick_shutdown(dev);
+		tick_shutdown_nohz(dev);
 		break;
 
 	case CLOCK_EVT_NOTIFY_SUSPEND:
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index f0299ea..9644b29 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -144,3 +144,9 @@ static inline int tick_device_is_functional(struct clock_event_device *dev)
 #endif
 
 extern void do_timer(unsigned long ticks);
+
+#ifdef CONFIG_NO_HZ
+extern void tick_shutdown_nohz(unsigned int *cpup);
+#else
+static inline void tick_shutdown_nohz(unsigned int *cpup) { }
+#endif
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 225f8bf..e985ccd 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -797,6 +797,13 @@ static inline void tick_check_nohz(int cpu)
 	}
 }
 
+void tick_shutdown_nohz(unsigned int *cpup)
+{
+	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+
+	memset(ts, 0, sizeof(*ts));
+}
+
 #else
 
 static inline void tick_nohz_switch_to_nohz(void) { }

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [tip:timers/urgent] tick: Cleanup NOHZ per cpu data on cpu down
  2013-05-03 13:02       ` Thomas Gleixner
                           ` (2 preceding siblings ...)
  2013-05-05 12:48         ` tip-bot for Thomas Gleixner
@ 2013-05-05 14:14         ` tip-bot for Thomas Gleixner
  2013-05-12 10:27         ` tip-bot for Thomas Gleixner
  4 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Thomas Gleixner @ 2013-05-05 14:14 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, bitbucket, tglx, prarit

Commit-ID:  3fa479b9a7762e12bdad2db39f983ba1b5f51ba2
Gitweb:     http://git.kernel.org/tip/3fa479b9a7762e12bdad2db39f983ba1b5f51ba2
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 3 May 2013 15:02:50 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 5 May 2013 16:06:27 +0200

tick: Cleanup NOHZ per cpu data on cpu down

Prarit reported a crash on CPU offline/online. The reason is that on
CPU down the NOHZ related per cpu data of the dead cpu is not cleaned
up. If at cpu online an interrupt happens before the per cpu tick
device is registered the irq_enter() check potentially sees stale data
and dereferences a NULL pointer.

Cleanup the data after the cpu is dead.

[ hrtimer fix from Mike Galbraith ]

Reported-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Cc: Mike Galbraith <bitbucket@online.de>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/hrtimer.c            | 2 +-
 kernel/time/tick-common.c   | 1 +
 kernel/time/tick-internal.h | 6 ++++++
 kernel/time/tick-sched.c    | 7 +++++++
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 609d8ff..90f8642 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1759,8 +1759,8 @@ static int __cpuinit hrtimer_cpu_notify(struct notifier_block *self,
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 	{
-		clockevents_notify(CLOCK_EVT_NOTIFY_CPU_DEAD, &scpu);
 		migrate_hrtimers(scpu);
+		clockevents_notify(CLOCK_EVT_NOTIFY_CPU_DEAD, &scpu);
 		break;
 	}
 #endif
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 6176a3e..29b765d 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -387,6 +387,7 @@ static int tick_notify(struct notifier_block *nb, unsigned long reason,
 		tick_shutdown_broadcast_oneshot(dev);
 		tick_shutdown_broadcast(dev);
 		tick_shutdown(dev);
+		tick_shutdown_nohz(dev);
 		break;
 
 	case CLOCK_EVT_NOTIFY_SUSPEND:
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index f0299ea..9644b29 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -144,3 +144,9 @@ static inline int tick_device_is_functional(struct clock_event_device *dev)
 #endif
 
 extern void do_timer(unsigned long ticks);
+
+#ifdef CONFIG_NO_HZ
+extern void tick_shutdown_nohz(unsigned int *cpup);
+#else
+static inline void tick_shutdown_nohz(unsigned int *cpup) { }
+#endif
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 225f8bf..b79f562 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -797,6 +797,13 @@ static inline void tick_check_nohz(int cpu)
 	}
 }
 
+void tick_shutdown_nohz(unsigned int *cpup)
+{
+	struct tick_sched *ts = &per_cpu(tick_cpu_sched, *cpup);
+
+	memset(ts, 0, sizeof(*ts));
+}
+
 #else
 
 static inline void tick_nohz_switch_to_nohz(void) { }

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [tip:timers/urgent] tick: Cleanup NOHZ per cpu data on cpu down
  2013-05-05  6:20         ` [tip:timers/urgent] tick: Cleanup NOHZ per cpu data on cpu down tip-bot for Thomas Gleixner
@ 2013-05-05 19:54           ` Prarit Bhargava
  2013-05-06  8:48             ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Prarit Bhargava @ 2013-05-05 19:54 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, tglx, prarit
  Cc: tip-bot for Thomas Gleixner, linux-tip-commits



On 05/05/2013 02:20 AM, tip-bot for Thomas Gleixner wrote:
> Commit-ID:  ae7868e241c015aadc8632d9fe633a102a5918f6
> Gitweb:     http://git.kernel.org/tip/ae7868e241c015aadc8632d9fe633a102a5918f6
> Author:     Thomas Gleixner <tglx@linutronix.de>
> AuthorDate: Fri, 3 May 2013 15:02:50 +0200
> Committer:  Thomas Gleixner <tglx@linutronix.de>
> CommitDate: Sun, 5 May 2013 08:15:11 +0200
> 
> tick: Cleanup NOHZ per cpu data on cpu down
> 
> Prarit reported a crash on CPU offline/online. The reason is that on
> CPU down the NOHZ related per cpu data of the dead cpu is not cleaned
> up. If at cpu online an interrupt happens before the per cpu tick
> device is registered the irq_enter() check potentially sees stale data
> and dereferences a NULL pointer.
> 
> Cleanup the data after the cpu is dead.
> 
> Reported-by: Prarit Bhargava <prarit@redhat.com>
> Cc: stable@vger.kernel.org
> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionos
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/time/tick-common.c   | 1 +
>  kernel/time/tick-internal.h | 6 ++++++
>  kernel/time/tick-sched.c    | 7 +++++++
>  3 files changed, 14 insertions(+)

Whoa -- I thought I said I'll test this first.  It doesn't work :( so that means
something else is wrong ... I'm in the middle of debug ATM.

P.

> 
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index 6176a3e..29b765d 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -387,6 +387,7 @@ static int tick_notify(struct notifier_block *nb, unsigned long reason,
>  		tick_shutdown_broadcast_oneshot(dev);
>  		tick_shutdown_broadcast(dev);
>  		tick_shutdown(dev);
> +		tick_shutdown_nohz(dev);
>  		break;
>  
>  	case CLOCK_EVT_NOTIFY_SUSPEND:
> diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
> index f0299ea..9644b29 100644
> --- a/kernel/time/tick-internal.h
> +++ b/kernel/time/tick-internal.h
> @@ -144,3 +144,9 @@ static inline int tick_device_is_functional(struct clock_event_device *dev)
>  #endif
>  
>  extern void do_timer(unsigned long ticks);
> +
> +#ifdef CONFIG_NO_HZ
> +extern void tick_shutdown_nohz(unsigned int *cpup);
> +#else
> +static inline void tick_shutdown_nohz(unsigned int *cpup) { }
> +#endif
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 225f8bf..e985ccd 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -797,6 +797,13 @@ static inline void tick_check_nohz(int cpu)
>  	}
>  }
>  
> +void tick_shutdown_nohz(unsigned int *cpup)
> +{
> +	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> +
> +	memset(ts, 0, sizeof(*ts));
> +}
> +
>  #else
>  
>  static inline void tick_nohz_switch_to_nohz(void) { }

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [tip:timers/urgent] tick: Cleanup NOHZ per cpu data on cpu down
  2013-05-05 19:54           ` Prarit Bhargava
@ 2013-05-06  8:48             ` Thomas Gleixner
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2013-05-06  8:48 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: mingo, hpa, linux-kernel, tip-bot for Thomas Gleixner, linux-tip-commits

On Sun, 5 May 2013, Prarit Bhargava wrote:
> Whoa -- I thought I said I'll test this first.  It doesn't work :( so that means
> something else is wrong ... I'm in the middle of debug ATM.

I pushed it out accidentaly. New fixed and working version is uploaded already

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [tip:timers/urgent] tick: Cleanup NOHZ per cpu data on cpu down
  2013-05-03 13:02       ` Thomas Gleixner
                           ` (3 preceding siblings ...)
  2013-05-05 14:14         ` tip-bot for Thomas Gleixner
@ 2013-05-12 10:27         ` tip-bot for Thomas Gleixner
  2013-05-13 14:51           ` Prarit Bhargava
  4 siblings, 1 reply; 17+ messages in thread
From: tip-bot for Thomas Gleixner @ 2013-05-12 10:27 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, bitbucket, tglx, prarit

Commit-ID:  4b0c0f294f60abcdd20994a8341a95c8ac5eeb96
Gitweb:     http://git.kernel.org/tip/4b0c0f294f60abcdd20994a8341a95c8ac5eeb96
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 3 May 2013 15:02:50 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 12 May 2013 12:20:09 +0200

tick: Cleanup NOHZ per cpu data on cpu down

Prarit reported a crash on CPU offline/online. The reason is that on
CPU down the NOHZ related per cpu data of the dead cpu is not cleaned
up. If at cpu online an interrupt happens before the per cpu tick
device is registered the irq_enter() check potentially sees stale data
and dereferences a NULL pointer.

Cleanup the data after the cpu is dead.

Reported-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Cc: Mike Galbraith <bitbucket@online.de>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 225f8bf..0eed1db 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -904,7 +904,7 @@ void tick_cancel_sched_timer(int cpu)
 		hrtimer_cancel(&ts->sched_timer);
 # endif
 
-	ts->nohz_mode = NOHZ_MODE_INACTIVE;
+	memset(ts, 0, sizeof(*ts));
 }
 #endif
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [tip:timers/urgent] tick: Cleanup NOHZ per cpu data on cpu down
  2013-05-12 10:27         ` tip-bot for Thomas Gleixner
@ 2013-05-13 14:51           ` Prarit Bhargava
  2013-05-13 19:10             ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Prarit Bhargava @ 2013-05-13 14:51 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, bitbucket, tglx, prarit
  Cc: tip-bot for Thomas Gleixner, linux-tip-commits



On 05/12/2013 06:27 AM, tip-bot for Thomas Gleixner wrote:
> Commit-ID:  4b0c0f294f60abcdd20994a8341a95c8ac5eeb96
> Gitweb:     http://git.kernel.org/tip/4b0c0f294f60abcdd20994a8341a95c8ac5eeb96
> Author:     Thomas Gleixner <tglx@linutronix.de>
> AuthorDate: Fri, 3 May 2013 15:02:50 +0200
> Committer:  Thomas Gleixner <tglx@linutronix.de>
> CommitDate: Sun, 12 May 2013 12:20:09 +0200
> 
> tick: Cleanup NOHZ per cpu data on cpu down
> 
> Prarit reported a crash on CPU offline/online. The reason is that on
> CPU down the NOHZ related per cpu data of the dead cpu is not cleaned
> up. If at cpu online an interrupt happens before the per cpu tick
> device is registered the irq_enter() check potentially sees stale data
> and dereferences a NULL pointer.
> 
> Cleanup the data after the cpu is dead.

Thomas, while this does fix up the NULL pointer issue, I think you've introduced
a new bug in the schedule timer code.

While doing up and downs on the same CPU, I now occasionally see long delays in
the up and down...

[   65.150073] smpboot: Booting Node 1 Processor 19 APIC 0x28
[   66.715339] smpboot: CPU 19 is now offline
[   67.752751] smpboot: Booting Node 1 Processor 19 APIC 0x28
[   68.758711] smpboot: CPU 19 is now offline

Everything is normal ...

[   69.711612] smpboot: Booting Node 1 Processor 19 APIC 0x28
[   70.731521] smpboot: CPU 19 is now offline

Long delay in bringing CPU "down"

[   81.744565] smpboot: Booting Node 1 Processor 19 APIC 0x28
[   82.848591] smpboot: CPU 19 is now offline

Long delay in bringing CPU "up"

[   89.826533] smpboot: Booting Node 1 Processor 19 APIC 0x28
[   84.905358] smpboot: CPU 19 is now offline
[   87.565274] smpboot: Booting Node 1 Processor 19 APIC 0x28

Also, if the system is in this state I cannot reboot -- the system appears to
hang while bringing down CPUs...

Oddly, if I do

+       memset(ts, 0, sizeof(*ts));
+       ts->tick_stopped = 1;

instead of your memset, everything works.  I'm looking at the tick-sched.c code
to see why setting tick_stopped = 1 seems to fix the problem.

P.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [tip:timers/urgent] tick: Cleanup NOHZ per cpu data on cpu down
  2013-05-13 14:51           ` Prarit Bhargava
@ 2013-05-13 19:10             ` Thomas Gleixner
  2013-05-14 13:48               ` Prarit Bhargava
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2013-05-13 19:10 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: mingo, hpa, linux-kernel, bitbucket, tip-bot for Thomas Gleixner,
	linux-tip-commits

On Mon, 13 May 2013, Prarit Bhargava wrote:
> Thomas, while this does fix up the NULL pointer issue, I think you've introduced
> a new bug in the schedule timer code.

I don't think that I introduced a new bug. I'm quite sure that change
unearthed another issue which was papered over by the stale data.

That memset is putting the data structure into the same state as we
have on boot. From tick-sched perspective cpu onlining is not
different between boot and an offline/online cycle

> While doing up and downs on the same CPU, I now occasionally see long delays in
> the up and down...

> [   81.744565] smpboot: Booting Node 1 Processor 19 APIC 0x28
> [   82.848591] smpboot: CPU 19 is now offline
> 
> Long delay in bringing CPU "up"
> 
> [   89.826533] smpboot: Booting Node 1 Processor 19 APIC 0x28
> [   84.905358] smpboot: CPU 19 is now offline
> [   87.565274] smpboot: Booting Node 1 Processor 19 APIC 0x28

Errm, the timestamps are random. -ENOTUSEFUL
 
> Also, if the system is in this state I cannot reboot -- the system appears to
> hang while bringing down CPUs...
> 
> Oddly, if I do
> 
> +       memset(ts, 0, sizeof(*ts));
> +       ts->tick_stopped = 1;
> 
> instead of your memset, everything works.  I'm looking at the tick-sched.c code
> to see why setting tick_stopped = 1 seems to fix the problem.

That doesn't make any sense. So instead of changing random values in
ts, could you please fire up the tracer and gather evidence, so we can
see what the system does when these long delays happen. You can start
and stop the tracer from your script and terminate if one of the
operations takes too long.

Thanks,

	tglx







^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [tip:timers/urgent] tick: Cleanup NOHZ per cpu data on cpu down
  2013-05-13 19:10             ` Thomas Gleixner
@ 2013-05-14 13:48               ` Prarit Bhargava
  0 siblings, 0 replies; 17+ messages in thread
From: Prarit Bhargava @ 2013-05-14 13:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: mingo, hpa, linux-kernel, bitbucket, tip-bot for Thomas Gleixner,
	linux-tip-commits



On 05/13/2013 03:10 PM, Thomas Gleixner wrote:
> On Mon, 13 May 2013, Prarit Bhargava wrote:
>> Thomas, while this does fix up the NULL pointer issue, I think you've introduced
>> a new bug in the schedule timer code.
> 
> I don't think that I introduced a new bug. I'm quite sure that change
> unearthed another issue which was papered over by the stale data.
> 
> That memset is putting the data structure into the same state as we
> have on boot. From tick-sched perspective cpu onlining is not
> different between boot and an offline/online cycle
> 
>> While doing up and downs on the same CPU, I now occasionally see long delays in
>> the up and down...
> 
>> [   81.744565] smpboot: Booting Node 1 Processor 19 APIC 0x28
>> [   82.848591] smpboot: CPU 19 is now offline
>>
>> Long delay in bringing CPU "up"
>>
>> [   89.826533] smpboot: Booting Node 1 Processor 19 APIC 0x28
>> [   84.905358] smpboot: CPU 19 is now offline
>> [   87.565274] smpboot: Booting Node 1 Processor 19 APIC 0x28
> 
> Errm, the timestamps are random. -ENOTUSEFUL
>  

I'm always saying my computer is full of lies ;)

Here's the bottom line.  The patch included in this thread plus the patch you
pointed me to here

http://marc.info/?l=linux-kernel&m=136847403809031&w=2

seem to resolve the cpu up/down + thermal interrupt issues that I've been seeing.

So thank you :)

Tested-by: Prarit Bhargava <prarit@redhat.com>

P.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] NOHZ, check to see if tick device is initialized in IRQ handling path
@ 2013-04-16  1:20 Prarit Bhargava
  0 siblings, 0 replies; 17+ messages in thread
From: Prarit Bhargava @ 2013-04-16  1:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: Prarit Bhargava, Thomas Gleixner

I think the linux.git "system hang" isn't really a hang.  For some reason the
panic text wasn't displayed on the console.  I've seen this behaviour a few
times now ... maybe there's a bug in the panic output path?

I also haven't determined why I'm seeing a thermal interrupt during a CPU
hotplug test.  It could be a problem with the hardware I'm testing on, but
the bottom line is that it is possible to take an interrupt before the
tick's evtdev is set.

I can also see via some simple printk debug that it certainly looks like
the thermal interrupt code is enabled prior to the tick:

echo 1 > /sys/.../cpu/cpu20/online

[  347.402647] thermal_throttle_cpu_callback: cpu 20 add dev  <<< my debug
[  347.408992] Booting Node 0 Processor 20 APIC 0x1
[  347.429219] cpu_notify: called with CPU_STARTING <<< my debug
[  347.431261] cpu_notify: called with CPU_ONLINE <<< my debug
[  347.429219] tick_check_new_device: new cpu 20 evtdev start  <<< my debug
[  347.462542] microcode: CPU20 sig=0x306e2, pf=0x1, revision=0x209
[  347.469276] platform microcode: firmware: requesting intel-ucode/06-3e-02

Here's a patch...

----8<----

When adding a CPU there is a small window in which interrupts are enabled and
the clock tick device has not been initialized.  If an interrupt occurs in
this window, irq_exit() will be called which calls tick_nohz_irq_exit() which
in turn calls __tick_nohz_idle_enter().

__tick_nohz_idle() enter assumes that the tick has been initialized.  In the
above case, however, it has not and this leads to what appears to be a system
hang on latest linux.git or a the following panic on RHEL6:

Pid: 0, comm: swapper Not tainted 2.6.32-358.el6.x86_64 #1
RIP: 0010:[<ffffffff810a89e5>]  [<ffffffff810a89e5>] tick_nohz_stop_sched_tick+0x2a5/0x3e0
RSP: 0018:ffff88089c503f38  EFLAGS: 00010046
RAX: ffffffff81c07520 RBX: ffff88089c5116a0 RCX: 000002f04bb18cd8
RDX: 0000000000000000 RSI: 000000000000a1b5 RDI: 000002f04bb0eb23
RBP: ffff88089c503f88 R08: ffff88089c50e060 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000017
R13: 000002f04bb17dd5 R14: 0000000000000000 R15: 0000000000000092
FS:  0000000000000000(0000) GS:ffff88089c500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000078 CR3: 0000000001a85000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8810745c0000, task ffff8808740f2080)
Stack:
 00000000000116a0 0000000000000087 ffff88089c503f78 0000000000000046
<d> ffff88089c503f98 0000000000000000 0000000000000000 0000000000000000
<d> 0000000000000000 0000000000000000 ffff88089c503f98 ffffffff81076d86
Call Trace:
 <IRQ>
 [<ffffffff81076d86>] irq_exit+0x76/0x90
 [<ffffffff81028dd6>] smp_thermal_interrupt+0x26/0x40
 [<ffffffff8100bcf3>] thermal_interrupt+0x13/0x20
 <EOI>
 [<ffffffff81506997>] ? start_secondary+0x127/0x2ef
 [<ffffffff81506990>] ? start_secondary+0x120/0x2ef

The code currently assumes that the tick device is initialized when
irq_enter() and irq_exit() are called.  This is not correct and a check must
be performed prior to entering the tick code through these code paths to
ensure that the tick device is initialized and running.

I've only seen this occur on one system.  I've tested with and without the
patch and as far as I can tell this patch resolves the problem on
linux.git top of tree.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-sched.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a19a399..5027187 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -567,6 +567,12 @@ EXPORT_SYMBOL_GPL(tick_nohz_idle_enter);
 void tick_nohz_irq_exit(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+	struct clock_event_device *dev =
+				     __get_cpu_var(tick_cpu_device).evtdev;
+
+	/* Has the tick been initialized yet? */
+	if (unlikely(!dev || dev->mode == CLOCK_EVT_MODE_UNUSED))
+		return;
 
 	if (!ts->inidle)
 		return;
@@ -809,6 +815,12 @@ static inline void tick_check_nohz(int cpu) { }
  */
 void tick_check_idle(int cpu)
 {
+	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
+
+	/* Has the tick been initialized yet? */
+	if (unlikely(!dev || dev->mode == CLOCK_EVT_MODE_UNUSED))
+		return;
+
 	tick_check_oneshot_broadcast(cpu);
 	tick_check_nohz(cpu);
 }
-- 
1.7.9.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-05-14 13:48 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-30 12:36 [PATCH] NOHZ, check to see if tick device is initialized in IRQ handling path Prarit Bhargava
2013-05-02 22:51 ` Tony Luck
2013-05-02 22:56 ` Thomas Gleixner
2013-05-03  8:10   ` Thomas Gleixner
2013-05-03 12:34     ` Prarit Bhargava
2013-05-03 13:02       ` Thomas Gleixner
2013-05-03 13:43         ` Prarit Bhargava
2013-05-05  6:20         ` [tip:timers/urgent] tick: Cleanup NOHZ per cpu data on cpu down tip-bot for Thomas Gleixner
2013-05-05 19:54           ` Prarit Bhargava
2013-05-06  8:48             ` Thomas Gleixner
2013-05-05 12:48         ` tip-bot for Thomas Gleixner
2013-05-05 14:14         ` tip-bot for Thomas Gleixner
2013-05-12 10:27         ` tip-bot for Thomas Gleixner
2013-05-13 14:51           ` Prarit Bhargava
2013-05-13 19:10             ` Thomas Gleixner
2013-05-14 13:48               ` Prarit Bhargava
  -- strict thread matches above, loose matches on Subject: below --
2013-04-16  1:20 [PATCH] NOHZ, check to see if tick device is initialized in IRQ handling path Prarit Bhargava

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).