From: Borislav Petkov <bp@alien8.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "Raj, Ashok" <ashok.raj@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Subject: Re: [Patch V2] x86, mce: Ensure offline CPU's don't participate in mce rendezvous process.
Date: Tue, 8 Dec 2015 19:56:44 +0100
Message-ID: <20151208185644.GE27180@pd.tnic>
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F39F7CF67@ORSMSX114.amr.corp.intel.com>
On Tue, Dec 08, 2015 at 03:59:58PM +0000, Luck, Tony wrote:
> > No, the system did panic both times. The "strange" observation is
> > that the MCE gets reported only on the cores on node 0. Or at least,
> > only the printks from mce_panic() on the cores on node 0 reach the
> > serial console.
>
> You only see messages and logs from node 0, because the CPUs there are
> the only ones that see any errors logged in their banks.
>
> The CPUs on nodes 1, 2 and 3 scan all their banks, find nothing, and so say nothing.
Right, sure, of course. Doh!
Confirmation:
[ 183.840517] mce: do_machine_check: CPU: 30
[ 183.840531] mce: do_machine_check: CPU: 27
[ 183.840536] mce: do_machine_check: CPU: 29
[ 183.840541] mce: do_machine_check: CPU: 56
[ 183.840546] mce: do_machine_check: CPU: 28
[ 183.840548] mce: do_machine_check: CPU: 60
[ 183.840550] mce: do_machine_check: CPU: 24
[ 183.840557] mce: do_machine_check: CPU: 12
[ 183.840561] mce: do_machine_check: CPU: 45
[ 183.840565] mce: do_machine_check: CPU: 59
[ 183.840569] mce: do_machine_check: CPU: 57
[ 183.840572] mce: do_machine_check: CPU: 61
[ 183.840584] mce: do_machine_check: CPU: 0
[ 183.840587] mce: do_machine_check: CPU: 32
[ 183.840593] mce: do_machine_check: CPU: 63
[ 183.840596] mce: do_machine_check: CPU: 31
[ 183.840602] mce: do_machine_check: CPU: 42
[ 183.840606] mce: do_machine_check: CPU: 11
[ 183.840611] mce: do_machine_check: CPU: 41
[ 183.840613] mce: do_machine_check: CPU: 9
[ 183.840617] mce: do_machine_check: CPU: 62
[ 183.840619] mce: do_machine_check: CPU: 25
[ 183.840624] mce: do_machine_check: CPU: 58
[ 183.840627] mce: do_machine_check: CPU: 26
[ 183.840633] mce: do_machine_check: CPU: 5
[ 183.840638] mce: do_machine_check: CPU: 1
[ 183.840642] mce: do_machine_check: CPU: 37
[ 183.840648] mce: do_machine_check: CPU: 15
[ 183.840650] mce: do_machine_check: CPU: 47
[ 183.840653] mce: do_machine_check: CPU: 44
[ 183.840657] mce: do_machine_check: CPU: 14
[ 183.840659] mce: do_machine_check: CPU: 46
[ 183.840666] mce: do_machine_check: CPU: 52
[ 183.840670] mce: do_machine_check: CPU: 50
[ 183.840675] mce: do_machine_check: CPU: 48
[ 183.840677] mce: do_machine_check: CPU: 16
[ 183.840682] mce: do_machine_check: CPU: 54
[ 183.840686] mce: do_machine_check: CPU: 18
[ 183.840692] mce: do_machine_check: CPU: 40
[ 183.840695] mce: do_machine_check: CPU: 8
[ 183.840701] mce: do_machine_check: CPU: 2
[ 183.840705] mce: do_machine_check: CPU: 20
[ 183.840710] mce: do_machine_check: CPU: 13
[ 183.840712] mce: do_machine_check: CPU: 43
[ 183.840716] mce: do_machine_check: CPU: 10
[ 183.840722] mce: do_machine_check: CPU: 3
[ 183.840724] mce: do_machine_check: CPU: 35
[ 183.840727] mce: do_machine_check: CPU: 33
[ 183.840730] mce: do_machine_check: CPU: 34
[ 183.840734] mce: do_machine_check: CPU: 6
[ 183.840738] mce: do_machine_check: CPU: 38
[ 183.840743] mce: do_machine_check: CPU: 53
[ 183.840745] mce: do_machine_check: CPU: 21
[ 183.840750] mce: do_machine_check: CPU: 23
[ 183.840752] mce: do_machine_check: CPU: 55
[ 183.840755] mce: do_machine_check: CPU: 22
[ 183.840759] mce: do_machine_check: CPU: 49
[ 183.840761] mce: do_machine_check: CPU: 17
[ 183.840767] mce: do_machine_check: CPU: 19
[ 183.840770] mce: do_machine_check: CPU: 51
[ 183.840776] mce: do_machine_check: CPU: 39
[ 183.840778] mce: do_machine_check: CPU: 7
[ 183.840784] mce: do_machine_check: CPU: 36
[ 183.840786] mce: do_machine_check: CPU: 4
[ 184.485104] Disabling lock debugging due to kernel taint
[ 184.498006] mce: [Hardware Error]: CPU 32: Machine Check Exception: 5 Bank 5: be00000000010090
[ 184.498023] mce: [Hardware Error]: Machine check events logged
[ 184.531428] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 184.551126] mce: [Hardware Error]: TSC c760ad064ccce ADDR bb68ec00 MISC 421c8c86
[ 184.568358] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449600598 SOCKET 0 APIC 1 microcode 710
[ 184.588862] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
...
mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 1: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 2: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 32: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 33: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 35: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 36: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 37: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 38: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 39: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 4: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 5: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 6: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 7: Machine Check Exception: 5 Bank 5: be00000000010090
CPU-to-node mapping from the boot log:
[ 1.103200] x86: Booting SMP configuration:
[ 1.112441] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
[ 1.227835] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15
[ 1.451861] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23
[ 1.674819] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31
[ 1.899011] .... node #0, CPUs: #32 #33 #34 #35 #36 #37 #38 #39
[ 2.026616] .... node #1, CPUs: #40 #41 #42 #43 #44 #45 #46 #47
[ 2.152645] .... node #2, CPUs: #48 #49 #50 #51 #52 #53 #54 #55
[ 2.276782] .... node #3, CPUs: #56 #57 #58 #59 #60 #61 #62 #63
[ 2.402263] x86: Booted up 4 nodes, 64 CPUs
Ok, all clear.
Thanks!
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
Thread overview: 13+ messages
2015-12-05 0:29 [Patch V2] x86, mce: Ensure offline CPU's don't participate in mce rendezvous process Ashok Raj
2015-12-07 20:00 ` Borislav Petkov
2015-12-07 20:04 ` Luck, Tony
2015-12-07 20:19 ` Borislav Petkov
2015-12-07 22:07 ` Luck, Tony
2015-12-07 22:34 ` Borislav Petkov
2015-12-07 23:26 ` Luck, Tony
2015-12-07 23:46 ` Raj, Ashok
2015-12-07 23:25 ` Borislav Petkov
2015-12-08 1:41 ` Raj, Ashok
2015-12-08 9:18 ` Borislav Petkov
2015-12-08 15:59 ` Luck, Tony
2015-12-08 18:56 ` Borislav Petkov [this message]