* [stable request 3.4] x86/reboot: Fix a warning message triggered by stop_other_cpus()
@ 2015-05-21 18:40 Vinson Lee
  2015-06-15  3:24 ` Zefan Li
  2015-08-01 19:52 ` Ben Hutchings
  0 siblings, 2 replies; 3+ messages in thread
From: Vinson Lee @ 2015-05-21 18:40 UTC (permalink / raw)
  To: stable, lizefan; +Cc: Feng Tang, Don Zickus, Peter Zijlstra, Ingo Molnar

Please backport upstream 3.5 commit
55c844a4dd16a4d1fdc0cf2a283ec631a02ec448 "x86/reboot: Fix a warning
message triggered by stop_other_cpus()" to stable tree 3.4.

commit 55c844a4dd16a4d1fdc0cf2a283ec631a02ec448
Author: Feng Tang <feng.tang@intel.com>
Date:   Wed May 30 23:15:41 2012 +0800

    x86/reboot: Fix a warning message triggered by stop_other_cpus()

    When rebooting our 24 CPU Westmere servers with 3.4-rc6, we
    always see this warning msg:

    Restarting system.
    machine restart
    ------------[ cut here ]------------
    WARNING: at arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x74/0xa7()
    Hardware name: X8DTN
    Modules linked in: igb [last unloaded: scsi_wait_scan]
    Pid: 1, comm: systemd-shutdow Not tainted 3.4.0-rc6+ #22
    Call Trace:
     <IRQ>  [<ffffffff8102a41f>] warn_slowpath_common+0x7e/0x96
     [<ffffffff8102a44c>] warn_slowpath_null+0x15/0x17
     [<ffffffff81018cf7>] native_smp_send_reschedule+0x74/0xa7
     [<ffffffff810561c1>] trigger_load_balance+0x279/0x2a6
     [<ffffffff81050112>] scheduler_tick+0xe0/0xe9
     [<ffffffff81036768>] update_process_times+0x60/0x70
     [<ffffffff81062f2f>] tick_sched_timer+0x68/0x92
     [<ffffffff81046e33>] __run_hrtimer+0xb3/0x13c
     [<ffffffff81062ec7>] ? tick_nohz_handler+0xd0/0xd0
     [<ffffffff810474f2>] hrtimer_interrupt+0xdb/0x198
     [<ffffffff81019a35>] smp_apic_timer_interrupt+0x81/0x94
     [<ffffffff81655187>] apic_timer_interrupt+0x67/0x70
     <EOI>  [<ffffffff8101a3c4>] ? default_send_IPI_mask_allbutself_phys+0xb4/0xc4
     [<ffffffff8101c680>] physflat_send_IPI_allbutself+0x12/0x14
     [<ffffffff81018db4>] native_nmi_stop_other_cpus+0x8a/0xd6
     [<ffffffff810188ba>] native_machine_shutdown+0x50/0x67
     [<ffffffff81018926>] machine_shutdown+0xa/0xc
     [<ffffffff8101897e>] native_machine_restart+0x20/0x32
     [<ffffffff810189b0>] machine_restart+0xa/0xc
     [<ffffffff8103b196>] kernel_restart+0x47/0x4c
     [<ffffffff8103b2e6>] sys_reboot+0x13e/0x17c
     [<ffffffff8164e436>] ? _raw_spin_unlock_bh+0x10/0x12
     [<ffffffff810fcac9>] ? bdi_queue_work+0xcf/0xd8
     [<ffffffff810fe82f>] ? __bdi_start_writeback+0xae/0xb7
     [<ffffffff810e0d64>] ? iterate_supers+0xa3/0xb7
     [<ffffffff816547a2>] system_call_fastpath+0x16/0x1b
    ---[ end trace 320af5cb1cb60c5b ]---

    The root cause seems to be that
    default_send_IPI_mask_allbutself_phys() takes quite some time (I
    measured it at several ms) to finish sending NMIs to all the
    other 23 CPUs. On a HZ=250 or HZ=1000 system, that is long
    enough for a timer interrupt to fire, which in turn makes the
    load balancer kick an already stopped CPU and triggers this
    warning in native_smp_send_reschedule().

    So disabling the local irq before stop_other_cpus() fixes this
    problem (tested OK over 25 reboots), and it is fine because
    nobody should care about the timer interrupt at this stage of
    the reboot.

    The latest 3.4 kernel slightly changes this behavior by sending
    REBOOT_VECTOR first and only sending NMI_VECTOR if the
    REBOOT_VECTOR fails, but this patch is still needed to prevent
    the problem.

    Signed-off-by: Feng Tang <feng.tang@intel.com>
    Acked-by: Don Zickus <dzickus@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Link: http://lkml.kernel.org/r/20120530231541.4c13433a@feng-i7
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
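
For readers who want the race spelled out, here is a rough userspace
model of the ordering problem described in the commit message. It is
only a sketch and not kernel code: model_shutdown(), model_timer_tick(),
tick_period_us() and the cpu_online[] array are hypothetical names made
up for illustration, the "send time" is an assumed input, and a CPU is
treated as stopped the moment its IPI is sent, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 24  /* same CPU count as the machine in the report */

/* Hypothetical model state; none of this is the kernel's real data. */
static bool cpu_online[NR_CPUS];
static bool local_irqs_enabled;
static int warnings;

/* Length of one timer tick in microseconds for a given HZ. */
static long tick_period_us(long hz)
{
    assert(hz > 0);
    return 1000000L / hz;
}

/* Model of a timer tick whose load balancing kicks CPU `target`. */
static void model_timer_tick(int target)
{
    if (!local_irqs_enabled)
        return;         /* the tick is not delivered on this CPU */
    if (!cpu_online[target])
        warnings++;     /* stands in for the WARN in the trace above */
}

/*
 * Model of CPU 0 stopping the other CPUs one by one. The whole send
 * takes about send_us microseconds and a tick is due every tick_us.
 * Returns how many times a stopped CPU was kicked.
 */
static int model_shutdown(bool disable_irqs_first, long send_us, long tick_us)
{
    long per_cpu_us = send_us / (NR_CPUS - 1);
    long elapsed = 0;
    long next_tick = tick_us;

    warnings = 0;
    for (int i = 0; i < NR_CPUS; i++)
        cpu_online[i] = true;

    /* The fix in the model: mask local interrupts before stopping. */
    local_irqs_enabled = !disable_irqs_first;

    for (int cpu = 1; cpu < NR_CPUS; cpu++) {
        cpu_online[cpu] = false;        /* this CPU is now stopped */
        elapsed += per_cpu_us;
        while (elapsed >= next_tick) {  /* a tick came due meanwhile */
            model_timer_tick(cpu);
            next_tick += tick_us;
        }
    }
    return warnings;
}
```

With an assumed send time of 5000 us, the tick period is 1000 us at
HZ=1000 and 4000 us at HZ=250, so at least one tick lands inside the
send window in both cases and model_shutdown(false, ...) reports
kicks to stopped CPUs. model_shutdown(true, ...) reports none, which
is the effect the patch is after.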

Cheers,
Vinson

* Re: [stable request 3.4] x86/reboot: Fix a warning message triggered by stop_other_cpus()
  2015-05-21 18:40 [stable request 3.4] x86/reboot: Fix a warning message triggered by stop_other_cpus() Vinson Lee
@ 2015-06-15  3:24 ` Zefan Li
  2015-08-01 19:52 ` Ben Hutchings
  1 sibling, 0 replies; 3+ messages in thread
From: Zefan Li @ 2015-06-15  3:24 UTC (permalink / raw)
  To: Vinson Lee; +Cc: stable, Feng Tang, Don Zickus, Peter Zijlstra, Ingo Molnar

On 2015/5/22 2:40, Vinson Lee wrote:
> Please backport upstream 3.5 commit
> 55c844a4dd16a4d1fdc0cf2a283ec631a02ec448 "x86/reboot: Fix a warning
> message triggered by stop_other_cpus()" to stable tree 3.4.
> 

Queued up for 3.4. Thanks!

> commit 55c844a4dd16a4d1fdc0cf2a283ec631a02ec448
> Author: Feng Tang <feng.tang@intel.com>
> Date:   Wed May 30 23:15:41 2012 +0800
> 
>     x86/reboot: Fix a warning message triggered by stop_other_cpus()
> 
>     When rebooting our 24 CPU Westmere servers with 3.4-rc6, we
>     always see this warning msg:
> 


* Re: [stable request 3.4] x86/reboot: Fix a warning message triggered by stop_other_cpus()
  2015-05-21 18:40 [stable request 3.4] x86/reboot: Fix a warning message triggered by stop_other_cpus() Vinson Lee
  2015-06-15  3:24 ` Zefan Li
@ 2015-08-01 19:52 ` Ben Hutchings
  1 sibling, 0 replies; 3+ messages in thread
From: Ben Hutchings @ 2015-08-01 19:52 UTC (permalink / raw)
  To: Vinson Lee, stable, lizefan
  Cc: Feng Tang, Don Zickus, Peter Zijlstra, Ingo Molnar

On Thu, 2015-05-21 at 11:40 -0700, Vinson Lee wrote:
> Please backport upstream 3.5 commit
> 55c844a4dd16a4d1fdc0cf2a283ec631a02ec448 "x86/reboot: Fix a warning
> message triggered by stop_other_cpus()" to stable tree 3.4.
[...]

I've queued this up for 3.2 as well, since it seems to be applicable.

Ben.

-- 
Ben Hutchings
One of the nice things about standards is that there are so many of them.

