linux-kernel.vger.kernel.org archive mirror
From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Ming Lei <tom.leiming@gmail.com>,
	Djalal Harouni <tixxdz@opendz.org>,
	Borislav Petkov <borislav.petkov@amd.com>,
	Tony Luck <tony.luck@intel.com>,
	Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
	Ingo Molnar <mingo@elte.hu>, Andi Kleen <ak@linux.intel.com>,
	linux-kernel@vger.kernel.org, Greg Kroah-Hartman <gregkh@suse.de>,
	Kay Sievers <kay.sievers@vrfy.org>,
	gouders@et.bocholt.fh-gelsenkirchen.de,
	Marcos Souza <marcos.mage@gmail.com>,
	Linux PM mailing list <linux-pm@vger.kernel.org>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	prasad@linux.vnet.ibm.com, justinmattock@gmail.com,
	Jeff Chua <jeff.chua.linux@gmail.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Mel Gorman <mgorman@suse.de>,
	Gilad Ben-Yossef <gilad@benyossef.com>
Subject: Re: x86/mce: machine check warning during poweroff
Date: Tue, 17 Jan 2012 15:22:42 +0530
Message-ID: <4F1544EA.5060907@linux.vnet.ibm.com>
In-Reply-To: <1326766892.16150.21.camel@sbsiddha-desk.sc.intel.com>

On 01/17/2012 07:51 AM, Suresh Siddha wrote:

> On Sat, 2012-01-14 at 08:11 +0530, Srivatsa S. Bhat wrote:
>> Of course, the warnings at drivers/base/core.c: device_release()
>> as well as the IPI to offline cpu warnings still appear but are rather
>> unrelated and harmless to the issue being discussed.
> 
> As far the IPI offline cpu warnings are concerned, appended patch should
> fix it. Can you please give it a try? Peterz, can you please review and
> queue it after Srivatsa confirms that it works? Thanks.


Hi Suresh,

Thanks for the patch, but unfortunately it doesn't fix the problem!
Exactly the same stack traces are seen during a CPU Hotplug stress test.
(I didn't even have to stress it - it is so fragile that just a script
to offline all cpus except the boot cpu was good enough to reproduce the
problem easily.)

[  562.269083] ------------[ cut here ]------------
[  562.273079] WARNING: at arch/x86/kernel/smp.c:120 native_smp_send_reschedule+0x59/0x60()
[  562.273079] Hardware name: IBM System x -[7870C4Q]-
[  562.273079] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod iTCO_wdt i7core_edac i2c_i801 ioatdma cdc_ether i2c_core tpm_tis bnx2 shpchp usbnet pcspkr mii iTCO_vendor_support edac_core serio_raw dca sg rtc_cmos tpm tpm_bios pci_hotplug button uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[  562.273079] Pid: 6, comm: migration/0 Not tainted 3.2.0-sureshipi-0.0.0.28.36b5ec9-default #2
[  562.273079] Call Trace:
[  562.273079]  <IRQ>  [<ffffffff810213d9>] ? native_smp_send_reschedule+0x59/0x60
[  562.273079]  [<ffffffff8103cf4a>] warn_slowpath_common+0x7a/0xb0
[  562.273079]  [<ffffffff8103cf95>] warn_slowpath_null+0x15/0x20
[  562.273079]  [<ffffffff810213d9>] native_smp_send_reschedule+0x59/0x60
[  562.273079]  [<ffffffff81082d65>] trigger_load_balance+0x185/0x500
[  562.273079]  [<ffffffff81082d9b>] ? trigger_load_balance+0x1bb/0x500
[  562.273079]  [<ffffffff81073db7>] scheduler_tick+0x107/0x170
[  562.273079]  [<ffffffff8104e6f7>] update_process_times+0x67/0x80
[  562.273079]  [<ffffffff8109c64f>] tick_sched_timer+0x5f/0xc0
[  562.273079]  [<ffffffff8109c5f0>] ? tick_nohz_handler+0x100/0x100
[  562.273079]  [<ffffffff8106a85e>] __run_hrtimer+0x12e/0x330
[  562.273079]  [<ffffffff8106aca7>] hrtimer_interrupt+0xc7/0x1f0
[  562.273079]  [<ffffffff81022ff4>] smp_apic_timer_interrupt+0x64/0xa0
[  562.273079]  [<ffffffff814a2a33>] apic_timer_interrupt+0x73/0x80
[  562.273079]  <EOI>  [<ffffffff810c563a>] ? stop_machine_cpu_stop+0xda/0x130
[  562.273079]  [<ffffffff810c5560>] ? stop_one_cpu_nowait+0x50/0x50
[  562.273079]  [<ffffffff810c5279>] cpu_stopper_thread+0xd9/0x1b0
[  562.273079]  [<ffffffff81498ddf>] ? _raw_spin_unlock_irqrestore+0x3f/0x80
[  562.273079]  [<ffffffff810c51a0>] ? res_counter_init+0x50/0x50
[  562.273079]  [<ffffffff810a2add>] ? trace_hardirqs_on_caller+0x12d/0x1b0
[  562.273079]  [<ffffffff810a2b6d>] ? trace_hardirqs_on+0xd/0x10
[  562.273079]  [<ffffffff810c51a0>] ? res_counter_init+0x50/0x50
[  562.273079]  [<ffffffff8106553e>] kthread+0x9e/0xb0
[  562.273079]  [<ffffffff814a3334>] kernel_thread_helper+0x4/0x10
[  562.273079]  [<ffffffff81499174>] ? retint_restore_args+0x13/0x13
[  562.273079]  [<ffffffff810654a0>] ? __init_kthread_worker+0x70/0x70
[  562.273079]  [<ffffffff814a3330>] ? gs_change+0x13/0x13
[  562.273079] ---[ end trace 4efec5b2532b902d ]---
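
The reproduction described above (offlining every CPU except the boot CPU) can be sketched roughly as follows. This is only an illustrative userspace sketch, not the script that was actually used: the helper names are made up for this example, CPU 0 is assumed to be the boot CPU, and the sysfs root (normally /sys/devices/system/cpu) is passed as a parameter so the helpers can be pointed at a scratch directory.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: build "<sysfs_root>/cpu<N>/online" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
static int cpu_online_path(const char *sysfs_root, int cpu,
                           char *buf, size_t len)
{
    assert(cpu >= 0);
    int n = snprintf(buf, len, "%s/cpu%d/online", sysfs_root, cpu);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Hypothetical helper: write "0" or "1" to the CPU's online control file.
 * Needs root when used on the real sysfs. Returns 0 on success, -1 on error. */
static int set_cpu_online(const char *sysfs_root, int cpu, int online)
{
    char path[256];

    if (cpu_online_path(sysfs_root, cpu, path, sizeof(path)) != 0)
        return -1;

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int rc = (fputs(online ? "1" : "0", f) >= 0) ? 0 : -1;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

/* Offline CPUs 1..ncpus-1 and leave CPU 0 (assumed boot CPU) untouched.
 * Returns how many control files were written successfully. */
static int offline_all_but_boot(const char *sysfs_root, int ncpus)
{
    int done = 0;

    for (int cpu = 1; cpu < ncpus; cpu++)
        if (set_cpu_online(sysfs_root, cpu, 0) == 0)
            done++;
    return done;
}
```

Calling offline_all_but_boot("/sys/devices/system/cpu", ncpus) as root, with ncpus set to the number of CPUs on the machine, would perform one offline pass.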


I have a few questions regarding the synchronization with CPU Hotplug.
What guarantees that the code which selects and IPIs the new ilb is totally
race-free with respect to CPU hotplug and we will never IPI an offline CPU?

(In 3.2-rc7 I hadn't hit the IPI to offline cpu issue (the above stack trace)
as far as I remember.)

While trying to figure out what changed in the 3.3 merge window, I added a
WARN_ON in the 3.2-rc7 kernel as shown below:

static void nohz_balancer_kick(int cpu)
{
        ....

        if (!cpu_rq(ilb_cpu)->nohz_balance_kick) {
                cpu_rq(ilb_cpu)->nohz_balance_kick = 1;

                smp_mb();
                /*
                 * Use smp_send_reschedule() instead of resched_cpu().
                 * This way we generate a sched IPI on the target cpu which
                 * is idle. And the softirq performing nohz idle load balance
                 * will be run before returning from the IPI.
                 */
==========>      if (!cpu_active(ilb_cpu))
==========>             WARN_ON(1);
                smp_send_reschedule(ilb_cpu);
        }
        return;
}

As expected, I hit this warning during my CPU hotplug stress tests. I am sure
this happens on the latest kernel too (3.3 merge window), since that part of
the code has apparently not changed in this respect.

So, while selecting the new ilb, why are we not careful enough to ensure we
don't select a cpu that is going offline? Is this by design (to avoid some
overhead) or is this a bug? (As demonstrated above, this issue is in 3.2-rc7
as well.)
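
To make the question concrete, here is a small userspace model of what "being careful" at selection time could mean: restrict the candidates to CPUs that are both idle and still active. This is not the kernel's ilb selection code; the function names and the 64-bit masks are assumptions made only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64   /* assumption: one bit per CPU in a 64-bit mask */

/* Hypothetical helper: the mask bit that represents one CPU. */
static uint64_t cpu_bit(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return UINT64_C(1) << cpu;
}

/* Return the lowest-numbered CPU that is both idle and active,
 * or -1 if no such CPU exists. */
static int pick_ilb_cpu(uint64_t idle_mask, uint64_t active_mask)
{
    uint64_t candidates = idle_mask & active_mask;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (candidates & cpu_bit(cpu))
            return cpu;
    return -1;
}
```

With CPUs 1 and 2 idle but only CPUs 0 and 2 active, this model skips CPU 1 (idle, but on its way offline) and picks CPU 2. Filtering at selection time does not by itself close the window between selecting the CPU and sending the IPI.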

And the only reason I can think of for not hitting the "IPI to offline CPU"
issue in the 3.2-rc7 kernel is that the race window (with CPU offline) was
probably too small, and _not_ that we explicitly synchronized with CPU
Hotplug.
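
The kind of window meant here is the usual check-then-act gap. The following deterministic userspace model illustrates it under stated assumptions: the struct, the hook, and the counter are invented for this sketch and do not correspond to kernel data structures. The hook stands in for a CPU offline operation that happens to run between the check and the send.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of one CPU as seen by the code that sends the IPI. */
struct model_cpu {
    bool active;               /* stands in for what cpu_active() tests */
    int  ipis_while_inactive;  /* counts IPIs sent after the CPU went away */
};

/* Hook that runs inside the window between the check and the send. */
typedef void (*window_hook)(struct model_cpu *);

/* Check that the CPU is active, let the hook run, then "send" the IPI.
 * Returns true if an IPI was sent. Without synchronization against
 * hotplug, the check alone cannot prevent an IPI to an inactive CPU. */
static bool kick_if_active(struct model_cpu *cpu, window_hook in_window)
{
    assert(cpu != NULL);

    if (!cpu->active)
        return false;                 /* the check */

    if (in_window)
        in_window(cpu);               /* hotplug may run here */

    if (!cpu->active)
        cpu->ipis_while_inactive++;   /* the IPI goes out regardless */
    return true;
}

/* Models a CPU offline operation landing inside the window. */
static void take_offline(struct model_cpu *cpu)
{
    cpu->active = false;
}
```

When the hook is NULL the counter stays at zero; when take_offline runs inside the window, the IPI is still sent and the counter becomes one. A small window makes this rare, not impossible.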

Probably I am missing something obvious... I would be grateful if you could
kindly help me understand how this works.

Regards,
Srivatsa S. Bhat

