From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Ming Lei <tom.leiming@gmail.com>,
	Djalal Harouni <tixxdz@opendz.org>,
	Borislav Petkov <borislav.petkov@amd.com>,
	Tony Luck <tony.luck@intel.com>,
	Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
	Ingo Molnar <mingo@elte.hu>, Andi Kleen <ak@linux.intel.com>,
	linux-kernel@vger.kernel.org, Greg Kroah-Hartman <gregkh@suse.de>,
	Kay Sievers <kay.sievers@vrfy.org>,
	gouders@et.bocholt.fh-gelsenkirchen.de,
	Marcos Souza <marcos.mage@gmail.com>,
	Linux PM mailing list <linux-pm@vger.kernel.org>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	prasad@linux.vnet.ibm.com, justinmattock@gmail.com,
	Jeff Chua <jeff.chua.linux@gmail.com>
Subject: Re: x86/mce: machine check warning during poweroff
Date: Fri, 13 Jan 2012 15:02:01 -0800
Message-ID: <CA+55aFzGZ_eSTChemYczKr3-0zQ3J3MJ3TfGtxh9wkhSKrrfCA@mail.gmail.com>
In-Reply-To: <4F10929E.8070007@linux.vnet.ibm.com>

On Fri, Jan 13, 2012 at 12:22 PM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
>
> Fundamentally, this warning is triggered during CPU Offline, which is done
> during poweroff, suspend, hibernate etc. IOW, even a simple
> # echo 0 > /sys/devices/system/cpu/cpuX/online will trigger it.

There is definitely something wrong with CPU hotplug and MCE.

I seem to be able to trigger not only warnings, but some oopses, by doing:

 - enable list debugging, slab debugging, and kobject debugging in the
kernel (I've got some other things enabled too, but I think those are
the main ones)

 - do

     echo 0 > /sys/devices/system/cpu/cpuX/online

   this gets a few warnings

 - then do

     echo 1 > /sys/devices/system/cpu/cpuX/online

where bringing it up again will crash the machine entirely.
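
The steps above can be sketched as a small shell function. This is only an
illustration and not part of the original report: the function name
cycle_cpu and the SYSFS_CPU_ROOT override are hypothetical, added so the
logic can be dry-run against a scratch directory instead of the real sysfs.
The debugging options mentioned above presumably correspond to
CONFIG_DEBUG_LIST, CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KOBJECT, but the mail
does not name them, so treat that mapping as an assumption.

```shell
#!/bin/sh
# Sketch only: offline a CPU via sysfs, then bring it back online.
# SYSFS_CPU_ROOT is a hypothetical override (not from the original mail);
# by default the real sysfs CPU directory is used.

cycle_cpu() {
    cpu="$1"
    root="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"
    ctl="$root/cpu$cpu/online"

    # The control file must exist and be writable (normally root only).
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (need root; cpu0 often has no 'online' file)" >&2
        return 1
    fi

    echo 0 > "$ctl" || return 1   # offline: the report saw warnings here
    echo 1 > "$ctl" || return 1   # online again: the report saw the crash here
}

# Usage (as root, on a machine you can afford to crash):
#   cycle_cpu 1
```

On an affected kernel the second write is the one expected to oops, so run
this only where a hard crash is acceptable, ideally with a serial or
netconsole attached so the trace is captured instead of scrolling away.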

The oops scrolled off the screen and didn't get caught anywhere, but
the call trace seems to be (warning: hand-entered, so some of this may
be bogus):

Oops in:
  kobject_get+0x10/0x40

Code:
  55 48 89 f8 48 89 e5 48 83 ec 10 48 85 ff 74 0b <8b> 57 38 85 d2 74 06 f0 ff

Call trace:
  get_device
  klist_device_get
  klist_add_tail
  bus_add_device
  device_add
  device_register
  mce_device_create
  notifier_call_chain
  __raw_notifier_call_chain
  __cpu_notify
  _cpu_up
  store_online
  dev_attr_change
  sysfs_write_file

so it's definitely something bad in MCE device handling, and probably
something to do with reusing a 'struct device' after freeing it, or
after not having completely cleaned it up.

I didn't look to see if I could spot the problem, but I think this is
entirely reproducible, so hopefully somebody who knows the MCE code can
trivially see this and fix it.

                   Linus
