All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH][3.10] nohz: Fix lockup on restart from wrong error code
@ 2013-05-21 17:46 Steven Rostedt
  2013-05-21 18:12 ` Frederic Weisbecker
  2013-05-21 20:14 ` Frederic Weisbecker
  0 siblings, 2 replies; 7+ messages in thread
From: Steven Rostedt @ 2013-05-21 17:46 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Paul E. McKenney, Andrew Morton, Paul Gortmaker,
	Tejun Heo

commit a382bf934449 "nohz: Assign timekeeping duty to a CPU outside the
full dynticks range" added a cpu notifier callback that would prevent
the time keeping CPU from going offline if the have_nohz_full_mask was
set.

This also prevents the CPU from going offline on system reboot.

Worse yet, the return code was -EINVAL, but the notifier does not
recognize error codes, and it must be wrapped by a notifier_from_errno()
function. This means that even though the CPU would fail to go down, the
notifier would think it succeeded, and the cpu down process would
continue.

This caused two different problems. One, the migration thread after
moving tasks from the CPU would park itself and then a task, namely the
reboot task, could migrate onto that CPU. Then the reboot task spins
waiting for the cpu to go idle. But because the reboot task happens to
be spinning on the cpu its waiting for, the system hangs.

The other error that happened was that the sched_domain re-setup would
get confused, and in get_group() the cpu = cpumask_first() would process
a mask that had nothing set, and return cpu > nr_cpu_ids. Later it would
reference the per_cpu sg with this CPU and get a bogus pointer and
crash.

This fix simply fixes the issue with the return code of the cpu
notifier. This prevents all non-boot CPUs from going down, but that only
gives us the following warnings and does not crash or lockup the system.

[   73.655698] _cpu_down: attempt to take down CPU 2 failed
[   73.661874] Error taking CPU2 down: -22
[   73.665727] Non-boot CPUs are not disabled
[   73.669853] Restarting system.

And because of this, we get this warning too. But at least the system
reboots.

[   73.432740] ------------[ cut here ]------------
[   73.433003] WARNING: at /home/rostedt/work/git/linux-trace.git/kernel/workqueue.c:4584 workqueue_cpu_up_callback+0x24b/0x48c()
[   73.433003] Modules linked in: ebtables ipt_MASQUERADE sunrpc bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ipv6 uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec kvm_intel kvm snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc shpchp i2c_i801 microcode pata_acpi firewire_ohci firewire_core crc_itu_t ata_generic i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: ip6_tables]
[   73.433003] CPU: 0 PID: 2765 Comm: reboot Not tainted 3.10.0-rc2-test+ #124
[   73.433003] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
[   73.433003]  ffffffff817d0b08 ffff88006a95bc28 ffffffff814ca368 ffff88006a95bc68
[   73.433003]  ffffffff81035267 0000000000000002 0000000000000000 ffff88007d512e00
[   73.433003]  0000000000000002 ffff88007a809cc0 ffff88007d513260 ffff88006a95bc78
[   73.433003] Call Trace:
[   73.433003]  [<ffffffff814ca368>] dump_stack+0x19/0x1b
[   73.433003]  [<ffffffff81035267>] warn_slowpath_common+0x67/0x80
[   73.433003]  [<ffffffff8103529a>] warn_slowpath_null+0x1a/0x1c
[   73.433003]  [<ffffffff814bee83>] workqueue_cpu_up_callback+0x24b/0x48c
[   73.433003]  [<ffffffff810679fd>] ? cpumask_weight+0x13/0x14
[   73.433003]  [<ffffffff814d22dd>] notifier_call_chain+0x37/0x63
[   73.433003]  [<ffffffff8105c19a>] __raw_notifier_call_chain+0xe/0x10
[   73.433003]  [<ffffffff810383d8>] __cpu_notify+0x20/0x32
[   73.433003]  [<ffffffff814b3122>] _cpu_down+0x90/0x229
[   73.433003]  [<ffffffff81038687>] disable_nonboot_cpus+0x5a/0xfb
[   73.433003]  [<ffffffff81049d87>] kernel_restart+0x18/0x5a
[   73.433003]  [<ffffffff81049f52>] SYSC_reboot+0x177/0x1d9
[   73.433003]  [<ffffffff810ca70a>] ? trace_preempt_on+0x1b/0x2f
[   73.433003]  [<ffffffff81085eac>] ? trace_hardirqs_on+0xd/0xf
[   73.433003]  [<ffffffff810e571e>] ? user_exit+0x69/0x70
[   73.433003]  [<ffffffff810e571e>] ? user_exit+0x69/0x70
[   73.433003]  [<ffffffff81085e68>] ? trace_hardirqs_on_caller+0x160/0x197
[   73.433003]  [<ffffffff81085eac>] ? trace_hardirqs_on+0xd/0xf
[   73.433003]  [<ffffffff8100c7b7>] ? syscall_trace_enter+0xdb/0x1b3
[   73.433003]  [<ffffffff81049fc2>] SyS_reboot+0xe/0x10
[   73.433003]  [<ffffffff814d5814>] tracesys+0xdd/0xe2
[   73.433003] ---[ end trace 1a5fc10dcbddf506 ]---

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index f420813..7e8e208 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -306,7 +306,7 @@ static int __cpuinit tick_nohz_cpu_down_callback(struct notifier_block *nfb,
 		 * we can't safely shutdown that CPU.
 		 */
 		if (have_nohz_full_mask && tick_do_timer_cpu == cpu)
-			return -EINVAL;
+			return notifier_from_errno(-EINVAL);
 		break;
 	}
 	return NOTIFY_OK;



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH][3.10] nohz: Fix lockup on restart from wrong error code
  2013-05-21 17:46 [PATCH][3.10] nohz: Fix lockup on restart from wrong error code Steven Rostedt
@ 2013-05-21 18:12 ` Frederic Weisbecker
  2013-05-21 18:24   ` Steven Rostedt
  2013-05-21 20:14 ` Frederic Weisbecker
  1 sibling, 1 reply; 7+ messages in thread
From: Frederic Weisbecker @ 2013-05-21 18:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Paul E. McKenney, Andrew Morton, Paul Gortmaker, Tejun Heo

2013/5/21 Steven Rostedt <rostedt@goodmis.org>:
> commit a382bf934449 "nohz: Assign timekeeping duty to a CPU outside the
> full dynticks range" added a cpu notifier callback that would prevent
> the time keeping CPU from going offline if the have_nohz_full_mask was
> set.
>
> This also prevents the CPU from going offline on system reboot.
>
> Worse yet, the return code was -EINVAL, but the notifier does not
> recognize error codes, and it must be wrapped by a notifier_from_errno()
> function. This means that even though the CPU would fail to go down, the
> notifier would think it succeeded, and the cpu down process would
> continue.
>
> This caused two different problems. One, the migration thread after
> moving tasks from the CPU would park itself and then a task, namely the
> reboot task, could migrate onto that CPU. Then the reboot task spins
> waiting for the cpu to go idle. But because the reboot task happens to
> be spinning on the cpu its waiting for, the system hangs.
>
> The other error that happened was that the sched_domain re-setup would
> get confused, and in get_group() the cpu = cpumask_first() would process
> a mask that had nothing set, and return cpu > nr_cpu_ids. Later it would
> reference the per_cpu sg with this CPU and get a bogus pointer and
> crash.
>
> This fix simply fixes the issue with the return code of the cpu
> notifier. This prevents all non-boot CPUs from going down, but that only
> gives us the following warnings and does not crash or lockup the system.
>
> [   73.655698] _cpu_down: attempt to take down CPU 2 failed
> [   73.661874] Error taking CPU2 down: -22
> [   73.665727] Non-boot CPUs are not disabled
> [   73.669853] Restarting system.
>
> And because of this, we get this warning too. But at least the system
> reboots.
>
> [   73.432740] ------------[ cut here ]------------
> [   73.433003] WARNING: at /home/rostedt/work/git/linux-trace.git/kernel/workqueue.c:4584 workqueue_cpu_up_callback+0x24b/0x48c()
> [   73.433003] Modules linked in: ebtables ipt_MASQUERADE sunrpc bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ipv6 uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec kvm_intel kvm snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc shpchp i2c_i801 microcode pata_acpi firewire_ohci firewire_core crc_itu_t ata_generic i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: ip6_tables]
> [   73.433003] CPU: 0 PID: 2765 Comm: reboot Not tainted 3.10.0-rc2-test+ #124
> [   73.433003] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
> [   73.433003]  ffffffff817d0b08 ffff88006a95bc28 ffffffff814ca368 ffff88006a95bc68
> [   73.433003]  ffffffff81035267 0000000000000002 0000000000000000 ffff88007d512e00
> [   73.433003]  0000000000000002 ffff88007a809cc0 ffff88007d513260 ffff88006a95bc78
> [   73.433003] Call Trace:
> [   73.433003]  [<ffffffff814ca368>] dump_stack+0x19/0x1b
> [   73.433003]  [<ffffffff81035267>] warn_slowpath_common+0x67/0x80
> [   73.433003]  [<ffffffff8103529a>] warn_slowpath_null+0x1a/0x1c
> [   73.433003]  [<ffffffff814bee83>] workqueue_cpu_up_callback+0x24b/0x48c
> [   73.433003]  [<ffffffff810679fd>] ? cpumask_weight+0x13/0x14
> [   73.433003]  [<ffffffff814d22dd>] notifier_call_chain+0x37/0x63
> [   73.433003]  [<ffffffff8105c19a>] __raw_notifier_call_chain+0xe/0x10
> [   73.433003]  [<ffffffff810383d8>] __cpu_notify+0x20/0x32
> [   73.433003]  [<ffffffff814b3122>] _cpu_down+0x90/0x229
> [   73.433003]  [<ffffffff81038687>] disable_nonboot_cpus+0x5a/0xfb
> [   73.433003]  [<ffffffff81049d87>] kernel_restart+0x18/0x5a
> [   73.433003]  [<ffffffff81049f52>] SYSC_reboot+0x177/0x1d9
> [   73.433003]  [<ffffffff810ca70a>] ? trace_preempt_on+0x1b/0x2f
> [   73.433003]  [<ffffffff81085eac>] ? trace_hardirqs_on+0xd/0xf
> [   73.433003]  [<ffffffff810e571e>] ? user_exit+0x69/0x70
> [   73.433003]  [<ffffffff810e571e>] ? user_exit+0x69/0x70
> [   73.433003]  [<ffffffff81085e68>] ? trace_hardirqs_on_caller+0x160/0x197
> [   73.433003]  [<ffffffff81085eac>] ? trace_hardirqs_on+0xd/0xf
> [   73.433003]  [<ffffffff8100c7b7>] ? syscall_trace_enter+0xdb/0x1b3
> [   73.433003]  [<ffffffff81049fc2>] SyS_reboot+0xe/0x10
> [   73.433003]  [<ffffffff814d5814>] tracesys+0xdd/0xe2
> [   73.433003] ---[ end trace 1a5fc10dcbddf506 ]---
>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

There has been this patch that makes it return -EPERM instead:
https://lkml.org/lkml/2013/5/20/386

Not sure which is best. Both sort of make sense to me.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][3.10] nohz: Fix lockup on restart from wrong error code
  2013-05-21 18:12 ` Frederic Weisbecker
@ 2013-05-21 18:24   ` Steven Rostedt
  2013-05-21 20:05     ` Frederic Weisbecker
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2013-05-21 18:24 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Paul E. McKenney, Andrew Morton, Paul Gortmaker, Tejun Heo

On Tue, 2013-05-21 at 20:12 +0200, Frederic Weisbecker wrote:

> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> 
> There has been this patch that makes it return -EPERM instead:
> https://lkml.org/lkml/2013/5/20/386
> 
> Not sure which is best. Both sort of make sense to me.

That's just as good, as it also fixes the issue.

You can take Li's and add my Acked-by to it. And perhaps even add some
of my change log as well. I believe that will still give the same
problem with reboot, which needs to be fixed.

-- Steve



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][3.10] nohz: Fix lockup on restart from wrong error code
  2013-05-21 18:24   ` Steven Rostedt
@ 2013-05-21 20:05     ` Frederic Weisbecker
  0 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2013-05-21 20:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Paul E. McKenney, Andrew Morton, Paul Gortmaker, Tejun Heo

2013/5/21 Steven Rostedt <rostedt@goodmis.org>:
> On Tue, 2013-05-21 at 20:12 +0200, Frederic Weisbecker wrote:
>
>> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>>
>> There has been this patch that makes it return -EPERM instead:
>> https://lkml.org/lkml/2013/5/20/386
>>
>> Not sure which is best. Both sort of make sense to me.
>
> That's just as good, as it also fixes the issue.
>
> You can take Li's and add my Acked-by to it. And perhaps even add some
> of my change log as well.

Ok, thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][3.10] nohz: Fix lockup on restart from wrong error code
  2013-05-21 17:46 [PATCH][3.10] nohz: Fix lockup on restart from wrong error code Steven Rostedt
  2013-05-21 18:12 ` Frederic Weisbecker
@ 2013-05-21 20:14 ` Frederic Weisbecker
  2013-05-22  0:09   ` Steven Rostedt
  1 sibling, 1 reply; 7+ messages in thread
From: Frederic Weisbecker @ 2013-05-21 20:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Paul E. McKenney, Andrew Morton, Paul Gortmaker, Tejun Heo

2013/5/21 Steven Rostedt <rostedt@goodmis.org>:
> commit a382bf934449 "nohz: Assign timekeeping duty to a CPU outside the
> full dynticks range" added a cpu notifier callback that would prevent
> the time keeping CPU from going offline if the have_nohz_full_mask was
> set.
>
> This also prevents the CPU from going offline on system reboot.
>
> Worse yet, the return code was -EINVAL, but the notifier does not
> recognize error codes, and it must be wrapped by a notifier_from_errno()
> function. This means that even though the CPU would fail to go down, the
> notifier would think it succeeded, and the cpu down process would
> continue.
>
> This caused two different problems. One, the migration thread after
> moving tasks from the CPU would park itself and then a task, namely the
> reboot task, could migrate onto that CPU. Then the reboot task spins
> waiting for the cpu to go idle. But because the reboot task happens to
> be spinning on the cpu its waiting for, the system hangs.

Can that happen if that CPU is the boot CPU? Note this is the only
possible timekeeper with the upstream code.

>
> The other error that happened was that the sched_domain re-setup would
> get confused, and in get_group() the cpu = cpumask_first() would process
> a mask that had nothing set, and return cpu > nr_cpu_ids. Later it would
> reference the per_cpu sg with this CPU and get a bogus pointer and
> crash.

Ouch, when are we doing this domain re-setup? I remember we
repartition the domains after cpu down/up but I don't understand how
that can interfere with this issue.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][3.10] nohz: Fix lockup on restart from wrong error code
  2013-05-21 20:14 ` Frederic Weisbecker
@ 2013-05-22  0:09   ` Steven Rostedt
  2013-05-22 16:09     ` Frederic Weisbecker
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2013-05-22  0:09 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Paul E. McKenney, Andrew Morton, Paul Gortmaker, Tejun Heo

On Tue, 2013-05-21 at 22:14 +0200, Frederic Weisbecker wrote:
> 2013/5/21 Steven Rostedt <rostedt@goodmis.org>:
> > commit a382bf934449 "nohz: Assign timekeeping duty to a CPU outside the
> > full dynticks range" added a cpu notifier callback that would prevent
> > the time keeping CPU from going offline if the have_nohz_full_mask was
> > set.
> >
> > This also prevents the CPU from going offline on system reboot.
> >
> > Worse yet, the return code was -EINVAL, but the notifier does not
> > recognize error codes, and it must be wrapped by a notifier_from_errno()
> > function. This means that even though the CPU would fail to go down, the
> > notifier would think it succeeded, and the cpu down process would
> > continue.
> >
> > This caused two different problems. One, the migration thread after
> > moving tasks from the CPU would park itself and then a task, namely the
> > reboot task, could migrate onto that CPU. Then the reboot task spins
> > waiting for the cpu to go idle. But because the reboot task happens to
> > be spinning on the cpu its waiting for, the system hangs.
> 
> Can that happen if that CPU is the boot CPU? Note this is the only
> possible timekeeper with the upstream code.

Yep it can happen in upstream (that's all I'm using). In
tick_broadcast_setup_oneshot(), it sets the tick_do_timer_cpu to the
current CPU, which can be something other than the boot CPU. Now that
CPU wont be able to be hot plugged.

> 
> >
> > The other error that happened was that the sched_domain re-setup would
> > get confused, and in get_group() the cpu = cpumask_first() would process
> > a mask that had nothing set, and return cpu > nr_cpu_ids. Later it would
> > reference the per_cpu sg with this CPU and get a bogus pointer and
> > crash.
> 
> Ouch, when are we doing this domain re-setup? I remember we
> repartition the domains after cpu down/up but I don't understand how
> that can interfere with this issue.

I haven't looked hard enough yet, but this problem only appeared when
this bug triggered. By telling the system a CPU is offline, but still
having tasks schedule to it, causes all sorts of weird side effects. I
haven't figured out in detail how this affected the sched domains, but I
don't get the sched domain corruption after fixing this bug.

-- Steve



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][3.10] nohz: Fix lockup on restart from wrong error code
  2013-05-22  0:09   ` Steven Rostedt
@ 2013-05-22 16:09     ` Frederic Weisbecker
  0 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2013-05-22 16:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Paul E. McKenney, Andrew Morton, Paul Gortmaker, Tejun Heo

2013/5/22, Steven Rostedt <rostedt@goodmis.org>:
> On Tue, 2013-05-21 at 22:14 +0200, Frederic Weisbecker wrote:
>> 2013/5/21 Steven Rostedt <rostedt@goodmis.org>:
>> > commit a382bf934449 "nohz: Assign timekeeping duty to a CPU outside the
>> > full dynticks range" added a cpu notifier callback that would prevent
>> > the time keeping CPU from going offline if the have_nohz_full_mask was
>> > set.
>> >
>> > This also prevents the CPU from going offline on system reboot.
>> >
>> > Worse yet, the return code was -EINVAL, but the notifier does not
>> > recognize error codes, and it must be wrapped by a
>> > notifier_from_errno()
>> > function. This means that even though the CPU would fail to go down,
>> > the
>> > notifier would think it succeeded, and the cpu down process would
>> > continue.
>> >
>> > This caused two different problems. One, the migration thread after
>> > moving tasks from the CPU would park itself and then a task, namely the
>> > reboot task, could migrate onto that CPU. Then the reboot task spins
>> > waiting for the cpu to go idle. But because the reboot task happens to
>> > be spinning on the cpu its waiting for, the system hangs.
>>
>> Can that happen if that CPU is the boot CPU? Note this is the only
>> possible timekeeper with the upstream code.
>
> Yep it can happen in upstream (that's all I'm using). In
> tick_broadcast_setup_oneshot(), it sets the tick_do_timer_cpu to the
> current CPU, which can be something other than the boot CPU. Now that
> CPU wont be able to be hot plugged.

Ah indeed it can happen on broadcast timer initialization. A secondary
CPU then steal the duty from the boot CPU. Hmm this reminds me of this
patch:  https://patchwork.kernel.org/patch/2302951/
I thought it deserved some attention due its code simplification but
it could also solve the issue.

>>
>> >
>> > The other error that happened was that the sched_domain re-setup would
>> > get confused, and in get_group() the cpu = cpumask_first() would
>> > process
>> > a mask that had nothing set, and return cpu > nr_cpu_ids. Later it
>> > would
>> > reference the per_cpu sg with this CPU and get a bogus pointer and
>> > crash.
>>
>> Ouch, when are we doing this domain re-setup? I remember we
>> repartition the domains after cpu down/up but I don't understand how
>> that can interfere with this issue.
>
> I haven't looked hard enough yet, but this problem only appeared when
> this bug triggered. By telling the system a CPU is offline, but still
> having tasks schedule to it, causes all sorts of weird side effects. I
> haven't figured out in detail how this affected the sched domains, but I
> don't get the sched domain corruption after fixing this bug.

Weird. but the CPU refuses to offline so how could it see itself
online? Anyway if that happen again I'll have a look when I'm fully
back next week.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-05-22 16:09 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-21 17:46 [PATCH][3.10] nohz: Fix lockup on restart from wrong error code Steven Rostedt
2013-05-21 18:12 ` Frederic Weisbecker
2013-05-21 18:24   ` Steven Rostedt
2013-05-21 20:05     ` Frederic Weisbecker
2013-05-21 20:14 ` Frederic Weisbecker
2013-05-22  0:09   ` Steven Rostedt
2013-05-22 16:09     ` Frederic Weisbecker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.