* [PATCH 0/2] x86, reboot: cleanup NMI and REBOOT_IRQ @ 2012-02-10 21:02 Don Zickus 2012-02-10 21:02 ` [PATCH 1/2] x86, reboot: revert stop_other_cpus to using IRQ with NMI fallback Don Zickus 2012-02-10 21:02 ` [PATCH 2/2] x86, reschedule: check to see if system is shutting down Don Zickus 0 siblings, 2 replies; 14+ messages in thread From: Don Zickus @ 2012-02-10 21:02 UTC (permalink / raw) To: x86 Cc: LKML, Peter Zijlstra, tony.luck, seiji.aguchi, ak, mjg, levinsasha928, Don Zickus After dealing with pstore conversations about spin locks, I had an idea to simplify the native_smp_stop_other_cpus() path by using both REBOOT_IRQ and NMI instead of using either or. I also cleaned up a WARN_ON splat from rescheduling. Tested 10 panics on my core2 quad using 'echo c > /proc/sysrq-trigger' and panic=10 on the commandline. The machine panic'd and rebooted succesfully all 10 times. Though only the first time did I see the WARN_ON splat, the other 9 times I couldn't duplicate it. Don Zickus (2): x86, reboot: revert stop_other_cpus to using IRQ with NMI fallback x86, reschedule: check to see if system is shutting down arch/x86/kernel/smp.c | 104 +++++++++++++++++++++++-------------------------- 1 files changed, 49 insertions(+), 55 deletions(-) -- 1.7.7.6 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/2] x86, reboot: revert stop_other_cpus to using IRQ with NMI fallback 2012-02-10 21:02 [PATCH 0/2] x86, reboot: cleanup NMI and REBOOT_IRQ Don Zickus @ 2012-02-10 21:02 ` Don Zickus 2012-02-10 21:02 ` [PATCH 2/2] x86, reschedule: check to see if system is shutting down Don Zickus 1 sibling, 0 replies; 14+ messages in thread From: Don Zickus @ 2012-02-10 21:02 UTC (permalink / raw) To: x86 Cc: LKML, Peter Zijlstra, tony.luck, seiji.aguchi, ak, mjg, levinsasha928, Don Zickus For v3.3, I added code to use the NMI to stop other cpus in the panic case. The idea was to make sure all cpus on the system were definitely halted to help serialize the panic path to execute the rest of the code on a single cpu. The main problem it was trying to solve was how to stop a cpu that was spinning with its irqs disabled. A IPI irq would be stuck and couldn't get in there, but an NMI could. Things were great until we had another conversation about some pstore changes. Because some of the backend pstore still uses spinlocks to protect the device access, things could get ugly if a panic happened and we were stuck spinning on a lock. Now with the NMI shutting down cpus, we could assume no other cpus were running and just bust the spin lock and proceed. The counter argument was, well if you do that the backend could be in a screwed up state and you might not be able to save anything as a result. If we could have just given the cpu a little more time to finish things, we could have grabbed the spin lock cleanly and everything would have been fine. Well, how do give a cpu a 'little more time' in the panic case? For the most part you can't without spinning on the lock and even in that case, how long do you spin for? So instead of making it ugly in the pstore code, I had the idea that most spin locks are held with irqs disabled, naturally blocking other irqs until they are done. We just need an irq to sit there and grab the cpu once the lock is released and irqs are re-enabled again. I decided to modify stop_other_cpus to go back to the REBOOT_IRQ code and use that IPI irq as the blocking irq. This code has been working for a long time and will give up after one second. To fix the original problem of what happens after one second if a cpu hasn't accepted the IPI, I just added the NMI hammer to clobber the cpu. The end result of this more complicated-looking-diff-than-it-really-is, is a patch that mostly reverts 3603a25 x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus and instead just sticks another if-case after the REBOOT_IRQ and checks to see if online_cpus are still > 1, and if so, clobber the rest of the cpus with an NMI. For the most part, the NMI piece will never get hit, thus behaving just like pre-v3.3 code. However, in those rare conditions, we have a fallback plan. I still wrap the NMI check with the knob 'nonmi_ipi' in case someone still has issues with NMIs in the panic path. I also reset the original names of the functions. Signed-off-by: Don Zickus <dzickus@redhat.com> --- arch/x86/kernel/smp.c | 100 ++++++++++++++++++++++-------------------------- 1 files changed, 46 insertions(+), 54 deletions(-) diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index 66c74f4..48d2b7d 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -109,6 +109,9 @@ * about nothing of note with C stepping upwards. */ +static atomic_t stopping_cpu = ATOMIC_INIT(-1); +static bool smp_no_nmi_ipi = false; + /* * this function sends a 'reschedule' IPI to another CPU. * it goes straight through and wastes no time serializing @@ -149,8 +152,6 @@ void native_send_call_func_ipi(const struct cpumask *mask) free_cpumask_var(allbutself); } -static atomic_t stopping_cpu = ATOMIC_INIT(-1); - static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs) { /* We are registered on stopping cpu too, avoid spurious NMI */ @@ -162,7 +163,19 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs) return NMI_HANDLED; } -static void native_nmi_stop_other_cpus(int wait) +/* + * this function calls the 'stop' function on all other CPUs in the system. + */ + +asmlinkage void smp_reboot_interrupt(void) +{ + ack_APIC_irq(); + irq_enter(); + stop_this_cpu(NULL); + irq_exit(); +} + +static void native_stop_other_cpus(int wait) { unsigned long flags; unsigned long timeout; @@ -174,20 +187,25 @@ static void native_nmi_stop_other_cpus(int wait) * Use an own vector here because smp_call_function * does lots of things not suitable in a panic situation. */ + + /* + * We start by using the REBOOT_VECTOR irq. + * The irq is treated as a sync point to allow critical + * regions of code on other cpus to release their spin locks + * and re-enable irqs. Jumping straight to an NMI might + * accidentally cause deadlocks with further shutdown/panic + * code. By syncing, we give the cpus up to one second to + * finish their work before we force them off with the NMI. + */ if (num_online_cpus() > 1) { /* did someone beat us here? */ if (atomic_cmpxchg(&stopping_cpu, -1, safe_smp_processor_id()) != -1) return; - if (register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback, - NMI_FLAG_FIRST, "smp_stop")) - /* Note: we ignore failures here */ - return; - - /* sync above data before sending NMI */ + /* sync above data before sending IRQ */ wmb(); - apic->send_IPI_allbutself(NMI_VECTOR); + apic->send_IPI_allbutself(REBOOT_VECTOR); /* * Don't wait longer than a second if the caller @@ -197,63 +215,37 @@ static void native_nmi_stop_other_cpus(int wait) while (num_online_cpus() > 1 && (wait || timeout--)) udelay(1); } + + /* if the REBOOT_VECTOR didn't work, try with the NMI */ + if ((num_online_cpus() > 1) && (!smp_no_nmi_ipi)) { + if (register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback, + NMI_FLAG_FIRST, "smp_stop")) + /* Note: we ignore failures here */ + /* Hope the REBOOT_IRQ is good enough */ + goto finish; - local_irq_save(flags); - disable_local_APIC(); - local_irq_restore(flags); -} - -/* - * this function calls the 'stop' function on all other CPUs in the system. - */ - -asmlinkage void smp_reboot_interrupt(void) -{ - ack_APIC_irq(); - irq_enter(); - stop_this_cpu(NULL); - irq_exit(); -} - -static void native_irq_stop_other_cpus(int wait) -{ - unsigned long flags; - unsigned long timeout; + /* sync above data before sending IRQ */ + wmb(); - if (reboot_force) - return; + pr_emerg("Shutting down cpus with NMI\n"); - /* - * Use an own vector here because smp_call_function - * does lots of things not suitable in a panic situation. - * On most systems we could also use an NMI here, - * but there are a few systems around where NMI - * is problematic so stay with an non NMI for now - * (this implies we cannot stop CPUs spinning with irq off - * currently) - */ - if (num_online_cpus() > 1) { - apic->send_IPI_allbutself(REBOOT_VECTOR); + apic->send_IPI_allbutself(NMI_VECTOR); /* - * Don't wait longer than a second if the caller + * Don't wait longer than a 10 ms if the caller * didn't ask us to wait. */ - timeout = USEC_PER_SEC; + timeout = USEC_PER_MSEC * 10; while (num_online_cpus() > 1 && (wait || timeout--)) udelay(1); } +finish: local_irq_save(flags); disable_local_APIC(); local_irq_restore(flags); } -static void native_smp_disable_nmi_ipi(void) -{ - smp_ops.stop_other_cpus = native_irq_stop_other_cpus; -} - /* * Reschedule call back. */ @@ -287,8 +279,8 @@ void smp_call_function_single_interrupt(struct pt_regs *regs) static int __init nonmi_ipi_setup(char *str) { - native_smp_disable_nmi_ipi(); - return 1; + smp_no_nmi_ipi = true; + return 1; } __setup("nonmi_ipi", nonmi_ipi_setup); @@ -298,7 +290,7 @@ struct smp_ops smp_ops = { .smp_prepare_cpus = native_smp_prepare_cpus, .smp_cpus_done = native_smp_cpus_done, - .stop_other_cpus = native_nmi_stop_other_cpus, + .stop_other_cpus = native_stop_other_cpus, .smp_send_reschedule = native_smp_send_reschedule, .cpu_up = native_cpu_up, -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/2] x86, reschedule: check to see if system is shutting down 2012-02-10 21:02 [PATCH 0/2] x86, reboot: cleanup NMI and REBOOT_IRQ Don Zickus 2012-02-10 21:02 ` [PATCH 1/2] x86, reboot: revert stop_other_cpus to using IRQ with NMI fallback Don Zickus @ 2012-02-10 21:02 ` Don Zickus 2012-02-10 22:42 ` Luck, Tony 1 sibling, 1 reply; 14+ messages in thread From: Don Zickus @ 2012-02-10 21:02 UTC (permalink / raw) To: x86 Cc: LKML, Peter Zijlstra, tony.luck, seiji.aguchi, ak, mjg, levinsasha928, Don Zickus Due to reschedule changes in v3.3, a WARN_ON has popped up because a cpu suddenly became offline without letting anyone know. This results in the schedule trying to move something to another cpu only to find out it isn't there and getting confused. The splat looks something like: [ 32.448626] ------------[ cut here ]------------ [ 32.449160] WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43() [ 32.449621] Pid: 1, comm: init_stage2 Not tainted 3.2.0+ #14 [ 32.449621] Call Trace: [ 32.449621] <IRQ> [<ffffffff81041a44>] ? native_smp_send_reschedule+0x25/0x43 [ 32.449621] [<ffffffff810735b2>] warn_slowpath_common+0x7b/0x93 [ 32.449621] [<ffffffff810962cc>] ? tick_nohz_handler+0xc9/0xc9 [ 32.449621] [<ffffffff81073675>] warn_slowpath_null+0x15/0x18 [ 32.449621] [<ffffffff81041a44>] native_smp_send_reschedule+0x25/0x43 [ 32.449621] [<ffffffff81067a00>] smp_send_reschedule+0xa/0xc [ 32.449621] [<ffffffff8106f25e>] scheduler_tick+0x21a/0x242 [ 32.449621] [<ffffffff8107da10>] update_process_times+0x62/0x73 [ 32.449621] [<ffffffff81096336>] tick_sched_timer+0x6a/0x8a [ 32.449621] [<ffffffff8108c5eb>] __run_hrtimer.clone.26+0x55/0xcb [ 32.449621] [<ffffffff8108cd77>] hrtimer_interrupt+0xcb/0x19b [ 32.449621] [<ffffffff810428a8>] smp_apic_timer_interrupt+0x72/0x85 [ 32.449621] [<ffffffff8165a8de>] apic_timer_interrupt+0x6e/0x80 [ 32.449621] <EOI> [<ffffffff8165928e>] ? _raw_spin_unlock_irqrestore+0x3a/0x3e [ 32.449621] [<ffffffff81042f4e>] ? arch_local_irq_restore+0x6/0xd [ 32.449621] [<ffffffff810430c4>] default_send_IPI_mask_allbutself_phys+0x78/0x88 [ 32.449621] [<ffffffff8106c3c4>] ? __migrate_task+0xf1/0xf1 [ 32.449621] [<ffffffff81045445>] physflat_send_IPI_allbutself+0x12/0x14 [ 32.449621] [<ffffffff81041aaf>] native_stop_other_cpus+0x4d/0xa8 [ 32.449621] [<ffffffff810411c6>] native_machine_shutdown+0x56/0x6d [ 32.449621] [<ffffffff81048499>] kvm_shutdown+0x1a/0x1c [ 32.449621] [<ffffffff810411f9>] machine_shutdown+0xa/0xc [ 32.449621] [<ffffffff81041265>] native_machine_restart+0x20/0x32 [ 32.449621] [<ffffffff81041297>] machine_restart+0xa/0xc [ 32.449621] [<ffffffff81081d53>] kernel_restart+0x49/0x4d I solved this by re-using the atomic global variable that is set during native_smp_stop_other_cpus(). This is the piece of code that causes the problem and it sets stopping_cpu to reflect the system is going down. If the variable is set do not yell with the WARN_ON, just return. Signed-off-by: Don Zickus <dzickus@redhat.com> --- arch/x86/kernel/smp.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index 48d2b7d..026c8c2 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -120,7 +120,9 @@ static bool smp_no_nmi_ipi = false; static void native_smp_send_reschedule(int cpu) { if (unlikely(cpu_is_offline(cpu))) { - WARN_ON(1); + if (atomic_read(&stopping_cpu) == -1) + /* system is not shutting down.. yell */ + WARN_ON(1); return; } apic->send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR); -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* RE: [PATCH 2/2] x86, reschedule: check to see if system is shutting down 2012-02-10 21:02 ` [PATCH 2/2] x86, reschedule: check to see if system is shutting down Don Zickus @ 2012-02-10 22:42 ` Luck, Tony 2012-02-10 22:53 ` Don Zickus 0 siblings, 1 reply; 14+ messages in thread From: Luck, Tony @ 2012-02-10 22:42 UTC (permalink / raw) To: Don Zickus, x86 Cc: LKML, Peter Zijlstra, seiji.aguchi, ak, mjg, levinsasha928 - WARN_ON(1); + if (atomic_read(&stopping_cpu) == -1) + /* system is not shutting down.. yell */ + WARN_ON(1); Should that be written as: /* system is not shutting down.. yell */ WARN_ON(atomic_read(&stopping_cpu) == -1); the if (something) WARN_ON(1); looks weird. -Tony ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86, reschedule: check to see if system is shutting down 2012-02-10 22:42 ` Luck, Tony @ 2012-02-10 22:53 ` Don Zickus 0 siblings, 0 replies; 14+ messages in thread From: Don Zickus @ 2012-02-10 22:53 UTC (permalink / raw) To: Luck, Tony Cc: x86, LKML, Peter Zijlstra, seiji.aguchi, ak, mjg, levinsasha928 On Fri, Feb 10, 2012 at 10:42:24PM +0000, Luck, Tony wrote: > - WARN_ON(1); > + if (atomic_read(&stopping_cpu) == -1) > + /* system is not shutting down.. yell */ > + WARN_ON(1); > > Should that be written as: > > /* system is not shutting down.. yell */ > WARN_ON(atomic_read(&stopping_cpu) == -1); > > the > if (something) > WARN_ON(1); > > looks weird. Indeed. In my haste, I wrote some pretty sloppy code. Thanks for catching that. Cheers, Don ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 0/2 v2] x86, reboot: cleanup NMI and REBOOT_IRQ @ 2012-02-13 20:27 Don Zickus 2012-02-13 20:27 ` [PATCH 2/2] x86, reschedule: check to see if system is shutting down Don Zickus 0 siblings, 1 reply; 14+ messages in thread From: Don Zickus @ 2012-02-13 20:27 UTC (permalink / raw) To: x86 Cc: LKML, Peter Zijlstra, tony.luck, seiji.aguchi, ak, mjg, levinsasha928, Don Zickus After dealing with pstore conversations about spin locks, I had an idea to simplify the native_smp_stop_other_cpus() path by using both REBOOT_IRQ and NMI instead of using either or. I also cleaned up a WARN_ON splat from rescheduling. Tested 10 panics on my core2 quad using 'echo c > /proc/sysrq-trigger' and panic=10 on the commandline. The machine panic'd and rebooted succesfully all 10 times. Though only the first time did I see the WARN_ON splat, the other 9 times I couldn't duplicate it. Don Zickus (2): x86, reboot: revert stop_other_cpus to using IRQ with NMI fallback x86, reschedule: check to see if system is shutting down arch/x86/kernel/smp.c | 103 +++++++++++++++++++++++-------------------------- 1 files changed, 48 insertions(+), 55 deletions(-) -- 1.7.7.6 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/2] x86, reschedule: check to see if system is shutting down 2012-02-13 20:27 [PATCH 0/2 v2] x86, reboot: cleanup NMI and REBOOT_IRQ Don Zickus @ 2012-02-13 20:27 ` Don Zickus 2012-02-13 21:22 ` Seiji Aguchi 2012-02-15 11:26 ` Peter Zijlstra 0 siblings, 2 replies; 14+ messages in thread From: Don Zickus @ 2012-02-13 20:27 UTC (permalink / raw) To: x86 Cc: LKML, Peter Zijlstra, tony.luck, seiji.aguchi, ak, mjg, levinsasha928, Don Zickus Due to reschedule changes in v3.3, a WARN_ON has popped up because a cpu suddenly became offline without letting anyone know. This results in the schedule trying to move something to another cpu only to find out it isn't there and getting confused. The splat looks something like: [ 32.448626] ------------[ cut here ]------------ [ 32.449160] WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43() [ 32.449621] Pid: 1, comm: init_stage2 Not tainted 3.2.0+ #14 [ 32.449621] Call Trace: [ 32.449621] <IRQ> [<ffffffff81041a44>] ? native_smp_send_reschedule+0x25/0x43 [ 32.449621] [<ffffffff810735b2>] warn_slowpath_common+0x7b/0x93 [ 32.449621] [<ffffffff810962cc>] ? tick_nohz_handler+0xc9/0xc9 [ 32.449621] [<ffffffff81073675>] warn_slowpath_null+0x15/0x18 [ 32.449621] [<ffffffff81041a44>] native_smp_send_reschedule+0x25/0x43 [ 32.449621] [<ffffffff81067a00>] smp_send_reschedule+0xa/0xc [ 32.449621] [<ffffffff8106f25e>] scheduler_tick+0x21a/0x242 [ 32.449621] [<ffffffff8107da10>] update_process_times+0x62/0x73 [ 32.449621] [<ffffffff81096336>] tick_sched_timer+0x6a/0x8a [ 32.449621] [<ffffffff8108c5eb>] __run_hrtimer.clone.26+0x55/0xcb [ 32.449621] [<ffffffff8108cd77>] hrtimer_interrupt+0xcb/0x19b [ 32.449621] [<ffffffff810428a8>] smp_apic_timer_interrupt+0x72/0x85 [ 32.449621] [<ffffffff8165a8de>] apic_timer_interrupt+0x6e/0x80 [ 32.449621] <EOI> [<ffffffff8165928e>] ? _raw_spin_unlock_irqrestore+0x3a/0x3e [ 32.449621] [<ffffffff81042f4e>] ? arch_local_irq_restore+0x6/0xd [ 32.449621] [<ffffffff810430c4>] default_send_IPI_mask_allbutself_phys+0x78/0x88 [ 32.449621] [<ffffffff8106c3c4>] ? __migrate_task+0xf1/0xf1 [ 32.449621] [<ffffffff81045445>] physflat_send_IPI_allbutself+0x12/0x14 [ 32.449621] [<ffffffff81041aaf>] native_stop_other_cpus+0x4d/0xa8 [ 32.449621] [<ffffffff810411c6>] native_machine_shutdown+0x56/0x6d [ 32.449621] [<ffffffff81048499>] kvm_shutdown+0x1a/0x1c [ 32.449621] [<ffffffff810411f9>] machine_shutdown+0xa/0xc [ 32.449621] [<ffffffff81041265>] native_machine_restart+0x20/0x32 [ 32.449621] [<ffffffff81041297>] machine_restart+0xa/0xc [ 32.449621] [<ffffffff81081d53>] kernel_restart+0x49/0x4d I solved this by re-using the atomic global variable that is set during native_smp_stop_other_cpus(). This is the piece of code that causes the problem and it sets stopping_cpu to reflect the system is going down. If the variable is set do not yell with the WARN_ON, just return. v2: use if-condition for WARN-ON instead of if {WARN_ON(1)} Signed-off-by: Don Zickus <dzickus@redhat.com> --- arch/x86/kernel/smp.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index 48d2b7d..54d570e 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -120,7 +120,8 @@ static bool smp_no_nmi_ipi = false; static void native_smp_send_reschedule(int cpu) { if (unlikely(cpu_is_offline(cpu))) { - WARN_ON(1); + /* system is not shutting down.. yell */ + WARN_ON(atomic_read(&stopping_cpu) == -1) return; } apic->send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR); -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* RE: [PATCH 2/2] x86, reschedule: check to see if system is shutting down 2012-02-13 20:27 ` [PATCH 2/2] x86, reschedule: check to see if system is shutting down Don Zickus @ 2012-02-13 21:22 ` Seiji Aguchi 2012-02-13 22:43 ` Don Zickus 2012-02-15 11:26 ` Peter Zijlstra 1 sibling, 1 reply; 14+ messages in thread From: Seiji Aguchi @ 2012-02-13 21:22 UTC (permalink / raw) To: Don Zickus, x86; +Cc: LKML, Peter Zijlstra, tony.luck, ak, mjg, levinsasha928 >+ WARN_ON(atomic_read(&stopping_cpu) == -1) WARN_ON(atomic_read(&stopping_cpu) == -1); ? ";" is missing... Seiji -----Original Message----- From: Don Zickus [mailto:dzickus@redhat.com] Sent: Monday, February 13, 2012 3:28 PM To: x86@kernel.org Cc: LKML; Peter Zijlstra; tony.luck@intel.com; Seiji Aguchi; ak@linux.intel.com; mjg@redhat.com; levinsasha928@gmail.com; Don Zickus Subject: [PATCH 2/2] x86, reschedule: check to see if system is shutting down Due to reschedule changes in v3.3, a WARN_ON has popped up because a cpu suddenly became offline without letting anyone know. This results in the schedule trying to move something to another cpu only to find out it isn't there and getting confused. The splat looks something like: [ 32.448626] ------------[ cut here ]------------ [ 32.449160] WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43() [ 32.449621] Pid: 1, comm: init_stage2 Not tainted 3.2.0+ #14 [ 32.449621] Call Trace: [ 32.449621] <IRQ> [<ffffffff81041a44>] ? native_smp_send_reschedule+0x25/0x43 [ 32.449621] [<ffffffff810735b2>] warn_slowpath_common+0x7b/0x93 [ 32.449621] [<ffffffff810962cc>] ? tick_nohz_handler+0xc9/0xc9 [ 32.449621] [<ffffffff81073675>] warn_slowpath_null+0x15/0x18 [ 32.449621] [<ffffffff81041a44>] native_smp_send_reschedule+0x25/0x43 [ 32.449621] [<ffffffff81067a00>] smp_send_reschedule+0xa/0xc [ 32.449621] [<ffffffff8106f25e>] scheduler_tick+0x21a/0x242 [ 32.449621] [<ffffffff8107da10>] update_process_times+0x62/0x73 [ 32.449621] [<ffffffff81096336>] tick_sched_timer+0x6a/0x8a [ 32.449621] [<ffffffff8108c5eb>] __run_hrtimer.clone.26+0x55/0xcb [ 32.449621] [<ffffffff8108cd77>] hrtimer_interrupt+0xcb/0x19b [ 32.449621] [<ffffffff810428a8>] smp_apic_timer_interrupt+0x72/0x85 [ 32.449621] [<ffffffff8165a8de>] apic_timer_interrupt+0x6e/0x80 [ 32.449621] <EOI> [<ffffffff8165928e>] ? _raw_spin_unlock_irqrestore+0x3a/0x3e [ 32.449621] [<ffffffff81042f4e>] ? arch_local_irq_restore+0x6/0xd [ 32.449621] [<ffffffff810430c4>] default_send_IPI_mask_allbutself_phys+0x78/0x88 [ 32.449621] [<ffffffff8106c3c4>] ? __migrate_task+0xf1/0xf1 [ 32.449621] [<ffffffff81045445>] physflat_send_IPI_allbutself+0x12/0x14 [ 32.449621] [<ffffffff81041aaf>] native_stop_other_cpus+0x4d/0xa8 [ 32.449621] [<ffffffff810411c6>] native_machine_shutdown+0x56/0x6d [ 32.449621] [<ffffffff81048499>] kvm_shutdown+0x1a/0x1c [ 32.449621] [<ffffffff810411f9>] machine_shutdown+0xa/0xc [ 32.449621] [<ffffffff81041265>] native_machine_restart+0x20/0x32 [ 32.449621] [<ffffffff81041297>] machine_restart+0xa/0xc [ 32.449621] [<ffffffff81081d53>] kernel_restart+0x49/0x4d I solved this by re-using the atomic global variable that is set during native_smp_stop_other_cpus(). This is the piece of code that causes the problem and it sets stopping_cpu to reflect the system is going down. If the variable is set do not yell with the WARN_ON, just return. v2: use if-condition for WARN-ON instead of if {WARN_ON(1)} Signed-off-by: Don Zickus <dzickus@redhat.com> --- arch/x86/kernel/smp.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index 48d2b7d..54d570e 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -120,7 +120,8 @@ static bool smp_no_nmi_ipi = false; static void native_smp_send_reschedule(int cpu) { if (unlikely(cpu_is_offline(cpu))) { - WARN_ON(1); + /* system is not shutting down.. yell */ + WARN_ON(atomic_read(&stopping_cpu) == -1) return; } apic->send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR); -- 1.7.7.6 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86, reschedule: check to see if system is shutting down 2012-02-13 21:22 ` Seiji Aguchi @ 2012-02-13 22:43 ` Don Zickus 0 siblings, 0 replies; 14+ messages in thread From: Don Zickus @ 2012-02-13 22:43 UTC (permalink / raw) To: Seiji Aguchi; +Cc: x86, LKML, Peter Zijlstra, tony.luck, ak, mjg, levinsasha928 On Mon, Feb 13, 2012 at 04:22:21PM -0500, Seiji Aguchi wrote: > > >+ WARN_ON(atomic_read(&stopping_cpu) == -1) > > WARN_ON(atomic_read(&stopping_cpu) == -1); ? Ugh, I really need to focus on some stuff at work before I send more crap to the list. Sorry about that. Cheers, Don ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86, reschedule: check to see if system is shutting down 2012-02-13 20:27 ` [PATCH 2/2] x86, reschedule: check to see if system is shutting down Don Zickus 2012-02-13 21:22 ` Seiji Aguchi @ 2012-02-15 11:26 ` Peter Zijlstra 2012-02-15 14:54 ` Don Zickus 1 sibling, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2012-02-15 11:26 UTC (permalink / raw) To: Don Zickus; +Cc: x86, LKML, tony.luck, seiji.aguchi, ak, mjg, levinsasha928 On Mon, 2012-02-13 at 15:27 -0500, Don Zickus wrote: > Due to reschedule changes in v3.3, a WARN_ON has popped up > because a cpu suddenly became offline without letting anyone > know. > > This results in the schedule trying to move something to another > cpu only to find out it isn't there and getting confused. The splat > looks something like: <snip> > I solved this by re-using the atomic global variable that is set during > native_smp_stop_other_cpus(). This is the piece of code that causes the > problem and it sets stopping_cpu to reflect the system is going down. > > If the variable is set do not yell with the WARN_ON, just return. > > v2: use if-condition for WARN-ON instead of if {WARN_ON(1)} > > Signed-off-by: Don Zickus <dzickus@redhat.com> > --- > arch/x86/kernel/smp.c | 3 ++- > 1 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c > index 48d2b7d..54d570e 100644 > --- a/arch/x86/kernel/smp.c > +++ b/arch/x86/kernel/smp.c > @@ -120,7 +120,8 @@ static bool smp_no_nmi_ipi = false; > static void native_smp_send_reschedule(int cpu) > { > if (unlikely(cpu_is_offline(cpu))) { > - WARN_ON(1); > + /* system is not shutting down.. yell */ > + WARN_ON(atomic_read(&stopping_cpu) == -1) > return; > } > apic->send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR); Right, so this fixes this one particular case, I imagine there's tons of places that could go splat due to this (but don't quite yet for some reason). We can't go around annotating everything, nor would we want to simply shut up all warnings for fear of missing an actual error. Why can't the normal shut-down path use a less crazy approach to going down? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86, reschedule: check to see if system is shutting down 2012-02-15 11:26 ` Peter Zijlstra @ 2012-02-15 14:54 ` Don Zickus 2012-02-15 14:57 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Don Zickus @ 2012-02-15 14:54 UTC (permalink / raw) To: Peter Zijlstra; +Cc: x86, LKML, tony.luck, seiji.aguchi, ak, mjg, levinsasha928 On Wed, Feb 15, 2012 at 12:26:37PM +0100, Peter Zijlstra wrote: > > Right, so this fixes this one particular case, I imagine there's tons of > places that could go splat due to this (but don't quite yet for some > reason). > > We can't go around annotating everything, nor would we want to simply > shut up all warnings for fear of missing an actual error. > > Why can't the normal shut-down path use a less crazy approach to going > down? Well maybe it can, it's been like that way for over three years now. I'm surprised no one ran into issues before now. The only thing I can think that would work is stop_machine(). Pass in a halt function and a cpumask of everyone but smp_processor_id(). That would solve the problem, no? Cheers, Don ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86, reschedule: check to see if system is shutting down 2012-02-15 14:54 ` Don Zickus @ 2012-02-15 14:57 ` Peter Zijlstra 2012-02-15 15:57 ` Don Zickus 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2012-02-15 14:57 UTC (permalink / raw) To: Don Zickus; +Cc: x86, LKML, tony.luck, seiji.aguchi, ak, mjg, levinsasha928 On Wed, 2012-02-15 at 09:54 -0500, Don Zickus wrote: > On Wed, Feb 15, 2012 at 12:26:37PM +0100, Peter Zijlstra wrote: > > > > Right, so this fixes this one particular case, I imagine there's tons of > > places that could go splat due to this (but don't quite yet for some > > reason). > > > > We can't go around annotating everything, nor would we want to simply > > shut up all warnings for fear of missing an actual error. > > > > Why can't the normal shut-down path use a less crazy approach to going > > down? > > Well maybe it can, it's been like that way for over three years now. I'm > surprised no one ran into issues before now. > > The only thing I can think that would work is stop_machine(). Pass in a > halt function and a cpumask of everyone but smp_processor_id(). That > would solve the problem, no? nope.. same problem, you're not telling anybody you're shooting CPUs down -- this telling is usually done through cpu hotplug notifiers that fix up state. The only way is to unplug all cpus except the one. Problem with that is that we cannot (as of yet) unplug the boot cpu. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86, reschedule: check to see if system is shutting down 2012-02-15 14:57 ` Peter Zijlstra @ 2012-02-15 15:57 ` Don Zickus 2012-02-15 17:59 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Don Zickus @ 2012-02-15 15:57 UTC (permalink / raw) To: Peter Zijlstra; +Cc: x86, LKML, tony.luck, seiji.aguchi, ak, mjg, levinsasha928 On Wed, Feb 15, 2012 at 03:57:15PM +0100, Peter Zijlstra wrote: > On Wed, 2012-02-15 at 09:54 -0500, Don Zickus wrote: > > On Wed, Feb 15, 2012 at 12:26:37PM +0100, Peter Zijlstra wrote: > > > > > > Right, so this fixes this one particular case, I imagine there's tons of > > > places that could go splat due to this (but don't quite yet for some > > > reason). > > > > > > We can't go around annotating everything, nor would we want to simply > > > shut up all warnings for fear of missing an actual error. > > > > > > Why can't the normal shut-down path use a less crazy approach to going > > > down? > > > > Well maybe it can, it's been like that way for over three years now. I'm > > surprised no one ran into issues before now. > > > > The only thing I can think that would work is stop_machine(). Pass in a > > halt function and a cpumask of everyone but smp_processor_id(). That > > would solve the problem, no? > > nope.. same problem, you're not telling anybody you're shooting CPUs > down -- this telling is usually done through cpu hotplug notifiers that > fix up state. Why? If you have successfully sync'd up the cpus, stopped them and then run a 'stop_cpu' function, you stop all those WARN_ONs I would think. And how much do we care that we fix up the state on a shutdown? We have already shutdown all the processes and unmounted the disks? Most of the drivers have probably been shutdown cleanly. What is left that we have to be polite? > > The only way is to unplug all cpus except the one. Problem with that is > that we cannot (as of yet) unplug the boot cpu. Yeah, well we can migrate to the boot cpu. I think powerpc does that for kdump. Cheers, Don ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86, reschedule: check to see if system is shutting down 2012-02-15 15:57 ` Don Zickus @ 2012-02-15 17:59 ` Peter Zijlstra 2012-02-16 3:14 ` Don Zickus 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2012-02-15 17:59 UTC (permalink / raw) To: Don Zickus; +Cc: x86, LKML, tony.luck, seiji.aguchi, ak, mjg, levinsasha928 On Wed, 2012-02-15 at 10:57 -0500, Don Zickus wrote: > > nope.. same problem, you're not telling anybody you're shooting CPUs > > down -- this telling is usually done through cpu hotplug notifiers that > > fix up state. > > Why? If you have successfully sync'd up the cpus, stopped them and then > run a 'stop_cpu' function, you stop all those WARN_ONs I would think. So the one you're running in here is native_smp_send_reschedule(), its send by the scheduler to kick a remote cpu. Doing kstopmachine() flipping online bits and calling hlt with irqs disabled doesn't inform the scheduler those CPUs aren't actually around anymore. > And > how much do we care that we fix up the state on a shutdown? We have > already shutdown all the processes and unmounted the disks? Most of the > drivers have probably been shutdown cleanly. What is left that we have to > be polite? Dunno, probably not much, but things like RCU might just want to try and wait for remote CPUs to complete their grace period, which'll be a while with them being quite dead. There might also still be work left in workqueues, timers left on timer wheels etc.. timers won't tick since their apic is dead along with the cpu, so that shouldn't be a big problem unless someone is waiting for one to complete. Similar with workqueues, I guess since tasks won't get scheduled. But the scheduler is still up and thinking its got all these CPUs, poking them and getting WARNed about it. > > The only way is to unplug all cpus except the one. Problem with that is > > that we cannot (as of yet) unplug the boot cpu. > > Yeah, well we can migrate to the boot cpu. I think powerpc does that for > kdump. Right, there's that. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86, reschedule: check to see if system is shutting down 2012-02-15 17:59 ` Peter Zijlstra @ 2012-02-16 3:14 ` Don Zickus 0 siblings, 0 replies; 14+ messages in thread From: Don Zickus @ 2012-02-16 3:14 UTC (permalink / raw) To: Peter Zijlstra; +Cc: x86, LKML, tony.luck, seiji.aguchi, ak, mjg, levinsasha928 On Wed, Feb 15, 2012 at 06:59:47PM +0100, Peter Zijlstra wrote: > > > The only way is to unplug all cpus except the one. Problem with that is > > > that we cannot (as of yet) unplug the boot cpu. > > > > Yeah, well we can migrate to the boot cpu. I think powerpc does that for > > kdump. > > Right, there's that. Well we are in luck, migrating to cpu0 is already done in the shutdown path. arch/x86/kernel/reboot.c::native_machine_shutdown:662 set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id)); Now most of the time it seems to be cpu0, but I guess corner cases can put it elsewhere. I replaced 'stop_other_cpus' with 'for_each_online_cpu(cpu) { cpu_down(cpu) }. And things seemed to work. Output below, patch below that: ===================== Please stand by while rebooting the system... [ 101.470240] md: stopping all md devices. [ 101.474346] kvm: exiting hardware virtualization [ 101.480071] sd 3:0:0:0: [sdb] Synchronizing SCSI cache [ 101.535977] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 101.643479] Restarting system. [ 101.646624] machine restart [ 101.649501] DONDON: on cpu2 [ 101.652502] DONDON: on new cpu0 [ 101.658397] Broke affinity for irq 42 [ 101.662168] Broke affinity for irq 47 [ 101.669281] bnx2fc: CPU 1 offline: Remove Rx thread [ 101.674381] CPU 1 offline: Remove Rx thread [ 101.680595] Broke affinity for irq 22 [ 101.686510] bnx2fc: CPU 2 offline: Remove Rx thread [ 101.691569] CPU 2 offline: Remove Rx thread [ 101.697260] Broke affinity for irq 16 [ 101.702057] lockdep: fixing up alternatives. [ 101.706476] SMP alternatives: switching to UP code [ 101.720162] bnx2fc: CPU 3 offline: Remove Rx thread [ 101.725206] CPU 3 offline: Remove Rx thread ==================== Does that work for you? Cheers, Don diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index d840e69..c3569ac 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -644,6 +644,7 @@ void native_machine_shutdown(void) /* The boot cpu is always logical cpu 0 */ int reboot_cpu_id = 0; + int cpu; #ifdef CONFIG_X86_32 /* See if there has been given a command line override */ @@ -652,17 +653,24 @@ void native_machine_shutdown(void) reboot_cpu_id = reboot_cpu; #endif + printk("DONDON: on cpu%d\n", smp_processor_id()); /* Make certain the cpu I'm about to reboot on is online */ if (!cpu_online(reboot_cpu_id)) reboot_cpu_id = smp_processor_id(); /* Make certain I only run on the appropriate processor */ set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id)); + printk("DONDON: on new cpu%d\n", smp_processor_id()); /* O.K Now that I'm on the appropriate processor, * stop all of the others. */ - stop_other_cpus(); + //stop_other_cpus(); + for_each_online_cpu(cpu) { + if (cpu == 0) + continue; + cpu_down(cpu); + } #endif lapic_shutdown(); ^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2012-02-16 3:14 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-02-10 21:02 [PATCH 0/2] x86, reboot: cleanup NMI and REBOOT_IRQ Don Zickus 2012-02-10 21:02 ` [PATCH 1/2] x86, reboot: revert stop_other_cpus to using IRQ with NMI fallback Don Zickus 2012-02-10 21:02 ` [PATCH 2/2] x86, reschedule: check to see if system is shutting down Don Zickus 2012-02-10 22:42 ` Luck, Tony 2012-02-10 22:53 ` Don Zickus 2012-02-13 20:27 [PATCH 0/2 v2] x86, reboot: cleanup NMI and REBOOT_IRQ Don Zickus 2012-02-13 20:27 ` [PATCH 2/2] x86, reschedule: check to see if system is shutting down Don Zickus 2012-02-13 21:22 ` Seiji Aguchi 2012-02-13 22:43 ` Don Zickus 2012-02-15 11:26 ` Peter Zijlstra 2012-02-15 14:54 ` Don Zickus 2012-02-15 14:57 ` Peter Zijlstra 2012-02-15 15:57 ` Don Zickus 2012-02-15 17:59 ` Peter Zijlstra 2012-02-16 3:14 ` Don Zickus
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).