linux-kernel.vger.kernel.org archive mirror
From: Borislav Petkov <bp@alien8.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "Raj, Ashok" <ashok.raj@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Subject: Re: [Patch V2] x86, mce: Ensure offline CPU's don't participate in mce rendezvous process.
Date: Tue, 8 Dec 2015 19:56:44 +0100
Message-ID: <20151208185644.GE27180@pd.tnic>
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F39F7CF67@ORSMSX114.amr.corp.intel.com>

On Tue, Dec 08, 2015 at 03:59:58PM +0000, Luck, Tony wrote:
> > No, the system did panic in both times. The "strange" observation is
> > that the MCE gets reported only on the cores on node 0. Or at least only
> > the printks from mce_panic() on the cores on node0 reach the serial
> > console.
> 
> You only see messages and logs from node0, because the cpus there are
> the only ones that see any errors logged in their banks.
> 
> The cpus on node 1, 2, 3 scan all banks and find nothing, so say nothing.

Right, sure, of course. Doh!

Confirmation. All 64 CPUs enter do_machine_check(), but only the CPUs on
node 0 report the error:

[  183.840517] mce: do_machine_check: CPU: 30
[  183.840531] mce: do_machine_check: CPU: 27
[  183.840536] mce: do_machine_check: CPU: 29
[  183.840541] mce: do_machine_check: CPU: 56
[  183.840546] mce: do_machine_check: CPU: 28
[  183.840548] mce: do_machine_check: CPU: 60
[  183.840550] mce: do_machine_check: CPU: 24
[  183.840557] mce: do_machine_check: CPU: 12
[  183.840561] mce: do_machine_check: CPU: 45
[  183.840565] mce: do_machine_check: CPU: 59
[  183.840569] mce: do_machine_check: CPU: 57
[  183.840572] mce: do_machine_check: CPU: 61
[  183.840584] mce: do_machine_check: CPU: 0
[  183.840587] mce: do_machine_check: CPU: 32
[  183.840593] mce: do_machine_check: CPU: 63
[  183.840596] mce: do_machine_check: CPU: 31
[  183.840602] mce: do_machine_check: CPU: 42
[  183.840606] mce: do_machine_check: CPU: 11
[  183.840611] mce: do_machine_check: CPU: 41
[  183.840613] mce: do_machine_check: CPU: 9
[  183.840617] mce: do_machine_check: CPU: 62
[  183.840619] mce: do_machine_check: CPU: 25
[  183.840624] mce: do_machine_check: CPU: 58
[  183.840627] mce: do_machine_check: CPU: 26
[  183.840633] mce: do_machine_check: CPU: 5
[  183.840638] mce: do_machine_check: CPU: 1
[  183.840642] mce: do_machine_check: CPU: 37
[  183.840648] mce: do_machine_check: CPU: 15
[  183.840650] mce: do_machine_check: CPU: 47
[  183.840653] mce: do_machine_check: CPU: 44
[  183.840657] mce: do_machine_check: CPU: 14
[  183.840659] mce: do_machine_check: CPU: 46
[  183.840666] mce: do_machine_check: CPU: 52
[  183.840670] mce: do_machine_check: CPU: 50
[  183.840675] mce: do_machine_check: CPU: 48
[  183.840677] mce: do_machine_check: CPU: 16
[  183.840682] mce: do_machine_check: CPU: 54
[  183.840686] mce: do_machine_check: CPU: 18
[  183.840692] mce: do_machine_check: CPU: 40
[  183.840695] mce: do_machine_check: CPU: 8
[  183.840701] mce: do_machine_check: CPU: 2
[  183.840705] mce: do_machine_check: CPU: 20
[  183.840710] mce: do_machine_check: CPU: 13
[  183.840712] mce: do_machine_check: CPU: 43
[  183.840716] mce: do_machine_check: CPU: 10
[  183.840722] mce: do_machine_check: CPU: 3
[  183.840724] mce: do_machine_check: CPU: 35
[  183.840727] mce: do_machine_check: CPU: 33
[  183.840730] mce: do_machine_check: CPU: 34
[  183.840734] mce: do_machine_check: CPU: 6
[  183.840738] mce: do_machine_check: CPU: 38
[  183.840743] mce: do_machine_check: CPU: 53
[  183.840745] mce: do_machine_check: CPU: 21
[  183.840750] mce: do_machine_check: CPU: 23
[  183.840752] mce: do_machine_check: CPU: 55
[  183.840755] mce: do_machine_check: CPU: 22
[  183.840759] mce: do_machine_check: CPU: 49
[  183.840761] mce: do_machine_check: CPU: 17
[  183.840767] mce: do_machine_check: CPU: 19
[  183.840770] mce: do_machine_check: CPU: 51
[  183.840776] mce: do_machine_check: CPU: 39
[  183.840778] mce: do_machine_check: CPU: 7
[  183.840784] mce: do_machine_check: CPU: 36
[  183.840786] mce: do_machine_check: CPU: 4
[  184.485104] Disabling lock debugging due to kernel taint
[  184.498006] mce: [Hardware Error]: CPU 32: Machine Check Exception: 5 Bank 5: be00000000010090
[  184.498023] mce: [Hardware Error]: Machine check events logged
[  184.531428] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[  184.551126] mce: [Hardware Error]: TSC c760ad064ccce ADDR bb68ec00 MISC 421c8c86 
[  184.568358] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449600598 SOCKET 0 APIC 1 microcode 710
[  184.588862] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
...

mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 1: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 2: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 32: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 33: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 35: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 36: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 37: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 38: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 39: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 4: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 5: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 6: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 7: Machine Check Exception: 5 Bank 5: be00000000010090
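
For reference, the status value that repeats in every line above can be
picked apart with a few lines of Python. This is only a sketch: the bit
positions are the architectural IA32_MCi_STATUS flag bits as I understand
them from the Intel SDM, and the names MCI_STATUS_FLAGS and decode_status
are made up for illustration, not taken from the kernel.

```python
# Sketch: decode the upper flag bits of an IA32_MCi_STATUS value.
# Assumption: bit positions follow the architectural layout in the Intel SDM.
MCI_STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled for this bank
    59: "MISCV",  # MCi_MISC holds valid data
    58: "ADDRV",  # MCi_ADDR holds valid data
    57: "PCC",    # processor context corrupt
}


def decode_status(status):
    """Return the names of the flag bits set in status, highest bit first."""
    return [name
            for bit, name in sorted(MCI_STATUS_FLAGS.items(), reverse=True)
            if (status >> bit) & 1]


def mca_error_code(status):
    """Return the low 16 bits, which carry the MCA error code."""
    return status & 0xFFFF


if __name__ == "__main__":
    status = 0xbe00000000010090  # value from the log above
    print(decode_status(status))
    print(hex(mca_error_code(status)))
```

With VAL, ADDRV and MISCV set, the ADDR and MISC values in the first
record are meaningful, which fits the EDAC driver picking the event up
as a memory error.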

CPUs:

[    1.103200] x86: Booting SMP configuration:
[    1.112441] .... node  #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    1.227835] .... node  #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.451861] .... node  #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.674819] .... node  #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    1.899011] .... node  #0, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.026616] .... node  #1, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    2.152645] .... node  #2, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    2.276782] .... node  #3, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    2.402263] x86: Booted up 4 nodes, 64 CPUs
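
The cross-check between the two excerpts can also be done mechanically.
The following is a rough sketch, not a tool anyone in this thread used:
it builds a CPU to node map from the "Booting SMP configuration" lines
and groups the CPUs that printed an MCE record by node. The function
names are hypothetical, and it assumes the boot CPU (CPU 0, which the
boot log does not list) sits on node 0.

```python
import re
from collections import defaultdict

# Matches e.g. ".... node  #1, CPUs:     #8   #9  #10"
NODE_RE = re.compile(r"\.\.\.\. node\s+#(\d+), CPUs:\s+(.*)$")
# Matches e.g. "CPU 32: Machine Check Exception: 5 Bank 5: ..."
MCE_RE = re.compile(r"CPU (\d+): Machine Check Exception")


def cpu_to_node(boot_lines, boot_cpu=0, boot_node=0):
    """Map CPU number to node number from the SMP boot lines.

    Assumption: the boot CPU is not listed and belongs to boot_node.
    """
    mapping = {boot_cpu: boot_node}
    for line in boot_lines:
        m = NODE_RE.search(line)
        if m:
            node = int(m.group(1))
            for cpu in re.findall(r"#(\d+)", m.group(2)):
                mapping[int(cpu)] = node
    return mapping


def reporting_cpus_by_node(mce_lines, mapping):
    """Group the CPUs that logged an MCE record by their node."""
    by_node = defaultdict(set)
    for line in mce_lines:
        m = MCE_RE.search(line)
        if m:
            cpu = int(m.group(1))
            by_node[mapping.get(cpu, "unknown")].add(cpu)
    return {node: sorted(cpus) for node, cpus in by_node.items()}


# A few lines copied from the excerpts above.
BOOT_LINES = [
    "[    1.112441] .... node  #0, CPUs:          #1   #2   #3   #4   #5   #6   #7",
    "[    1.227835] .... node  #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15",
    "[    1.899011] .... node  #0, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39",
]
MCE_LINES = [
    "mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 5: be00000000010090",
    "mce: [Hardware Error]: CPU 32: Machine Check Exception: 5 Bank 5: be00000000010090",
    "mce: [Hardware Error]: CPU 7: Machine Check Exception: 5 Bank 5: be00000000010090",
]

if __name__ == "__main__":
    print(reporting_cpus_by_node(MCE_LINES, cpu_to_node(BOOT_LINES)))
```

Fed the full excerpts, every reporting CPU (0-7 and 32-39) should land
on node 0, which is what the two listings show when compared by hand.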

Ok, all clear.

Thanks!

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

Thread overview: 13+ messages
2015-12-05  0:29 [Patch V2] x86, mce: Ensure offline CPU's don't participate in mce rendezvous process Ashok Raj
2015-12-07 20:00 ` Borislav Petkov
2015-12-07 20:04   ` Luck, Tony
2015-12-07 20:19     ` Borislav Petkov
2015-12-07 22:07       ` Luck, Tony
2015-12-07 22:34         ` Borislav Petkov
2015-12-07 23:26           ` Luck, Tony
2015-12-07 23:46           ` Raj, Ashok
2015-12-07 23:25             ` Borislav Petkov
2015-12-08  1:41               ` Raj, Ashok
2015-12-08  9:18                 ` Borislav Petkov
2015-12-08 15:59                   ` Luck, Tony
2015-12-08 18:56                     ` Borislav Petkov [this message]
