From: Paul Menzel <pmenzel@molgen.mpg.de>
To: Ashok Raj <ashok.raj@intel.com>, Borislav Petkov <bp@alien8.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Thorsten Leemhuis <linux@leemhuis.info>,
Len Brown <len.brown@intel.com>, Tony Luck <tony.luck@intel.com>
Subject: Re: Dell XPS13: MCE (Hardware Error) reported
Date: Mon, 9 Jan 2017 12:53:33 +0100
Message-ID: <662102c9-94da-3193-08c4-9fe75411cadb@molgen.mpg.de>
In-Reply-To: <20170105011236.GA80100@otc-brkl-03>

Dear Ashok, dear Borislav,

On 01/05/17 02:12, Raj, Ashok wrote:
>>> CPUID Vendor Intel Family 6 Model 142
> This is Kabylake Mobile
>
>>> Hardware event. This is not a software error.
>>> MCE 1
>>> CPU 0 BANK 7
>>> MISC 7880018086 ADDR fef1ce40
>>> TIME 1483543069 Wed Jan 4 16:17:49 2017
>>> MCG status:
>>> MCi status:
>>> Error overflow
>>> Uncorrected error
>>> MCi_MISC register valid
>>> MCi_ADDR register valid
>>> Processor context corrupt
>>> MCA: corrected filtering (some unreported errors in same region)
>>> Generic CACHE Level-2 Generic Error
>>> STATUS ee0000000040110a MCGSTATUS 0
>
> Decoding the bits further from MCi_STATUS above:
> Val=1, OVER=1, UC=1, but EN=0 indicates this isn't a MCE, hence should have
> been signaled by a CMCI.
>
> PCC=1, but should be ignored when EN=0.
> MCACOD: 110a MSCOD: 0040
>
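
As a cross-check of the decoding quoted above, here is a minimal Python sketch that splits the STATUS value from the log into its fields. The bit positions follow the architectural IA32_MCi_STATUS layout and the compound cache-hierarchy code format (000F 0001 RRRR TTLL) as I understand them from the Intel SDM. The function names are made up for illustration and are not taken from mcelog or the kernel.

```python
# Bit positions of the architectural IA32_MCi_STATUS flags (Intel SDM).
FLAGS = {
    "VAL": 63,    # register contents are valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting was enabled for this error
    "MISCV": 59,  # MCi_MISC register valid
    "ADDRV": 58,  # MCi_ADDR register valid
    "PCC": 57,    # processor context corrupt
}


def decode_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flags and error codes."""
    fields = {name: (status >> bit) & 1 for name, bit in FLAGS.items()}
    fields["MCACOD"] = status & 0xFFFF          # bits 15:0
    fields["MSCOD"] = (status >> 16) & 0xFFFF   # bits 31:16
    return fields


def decode_cache_mcacod(mcacod: int) -> dict:
    """Decode a compound cache-hierarchy code: 000F 0001 RRRR TTLL."""
    return {
        # Bits 15:13 must be 000 and bits 11:8 must be 0001; bit 12 is F.
        "is_cache": (mcacod & 0xEF00) == 0x0100,
        "filtered": (mcacod >> 12) & 1,  # F: corrected-error filtering
        "request": (mcacod >> 4) & 0xF,  # RRRR, 0 = generic error
        "type": (mcacod >> 2) & 0x3,     # TT, 2 = generic
        "level": mcacod & 0x3,           # LL, 2 = level 2
    }


# STATUS value copied from the mcelog output in this thread.
status = decode_status(0xEE0000000040110A)
cache = decode_cache_mcacod(status["MCACOD"])
print(status)
print(cache)
```

For ee0000000040110a this gives VAL=1, OVER=1, UC=1, EN=0, PCC=1, MCACOD 110a and MSCOD 0040, and the compound code reads as a filtered, generic Level-2 cache error, which agrees with the mcelog output.
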
> If the system is stable enough after the report, can you send the output of
> /proc/interrupts to confirm that.

To be clear, apart from the reported message, the system is stable for me.
Here is the output of `/proc/interrupts`:
```
$ more /proc/interrupts
         CPU0    CPU1    CPU2    CPU3
   0:      27       0       0       0  IR-IO-APIC  2-edge  timer
   1:       3       2     125       5  IR-IO-APIC  1-edge  i8042
   8:       0       1       0       0  IR-IO-APIC  8-edge  rtc0
   9:     108      31     397       5  IR-IO-APIC  9-fasteoi  acpi
  12:      66      18      92      35  IR-IO-APIC  12-edge  i8042
  14:       0       0       0       0  IR-IO-APIC  14-fasteoi  INT344B:00
  16:       0       0       0       0  IR-IO-APIC  16-fasteoi  idma64.0, i801_smbus, i2c_designware.0
  17:     419      42     280     415  IR-IO-APIC  17-fasteoi  idma64.1, i2c_designware.1
  51:       2       0       0       1  IR-IO-APIC  51-fasteoi  DLL075B:01
 120:       0       0       0       0  DMAR-MSI    0-edge  dmar0
 121:       0       0       0       0  DMAR-MSI    1-edge  dmar1
 274:      17       2       0       4  IR-PCI-MSI  30932992-edge  rtsx_pci
 275:      89      26      57      45  IR-PCI-MSI  327680-edge  xhci_hcd
 276:    1886       0    2361       0  IR-PCI-MSI  31457280-edge  nvme0q0, nvme0q1
 277:       0    3010    2570       0  IR-PCI-MSI  31457281-edge  nvme0q2
 278:       0       0    2023    3480  IR-PCI-MSI  31457282-edge  nvme0q3
 279:       0    3319       0    5863  IR-PCI-MSI  31457283-edge  nvme0q4
 280:      45       0       0       0  IR-PCI-MSI  360448-edge  mei_me
 281:     201      52    3008      85  IR-PCI-MSI  32768-edge  i915
 282:     151      29     997   24821  IR-PCI-MSI  30408704-edge  ath10k_pci
 283:     331     938     677     188  IR-PCI-MSI  514048-edge  snd_hda_intel:card0
 NMI:       1       0       0       0  Non-maskable interrupts
 LOC:   15198   21708   16850   31954  Local timer interrupts
 SPU:       0       0       0       0  Spurious interrupts
 PMI:       1       0       0       0  Performance monitoring interrupts
 IWI:       3       0       0       0  IRQ work interrupts
 RTR:       0       0       0       0  APIC ICR read retries
 RES:    1329    1974    1532    1959  Rescheduling interrupts
 CAL:    2254    3827    1969    3963  Function call interrupts
 TLB:     396    2349     342    2193  TLB shootdowns
 TRM:       0       0       0       0  Thermal event interrupts
 THR:       0       0       0       0  Threshold APIC interrupts
 DFR:       0       0       0       0  Deferred Error APIC interrupts
 MCE:       0       0       0       0  Machine check exceptions
 MCP:       9       9       9       9  Machine check polls
 ERR:      17
 MIS:       0
 PIN:       0       0       0       0  Posted-interrupt notification event
 PIW:       0       0       0       0  Posted-interrupt wakeup event
```
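
To make the relevant rows easier to compare, here is a small Python sketch that pulls the machine-check related counters out of `/proc/interrupts` text. The helper name `per_cpu_counts` is hypothetical, and the sample input only repeats the three rows from the output above. My assumption is that, on Intel, a CMCI would be counted in the THR row, so please correct me if that is wrong.

```python
def per_cpu_counts(text, labels=("THR", "MCE", "MCP")):
    """Return {label: [count per CPU]} for the selected /proc/interrupts rows."""
    counts = {}
    ncpu = 0
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "CPU0":
            # The header row names one column per CPU.
            ncpu = len(tokens)
            continue
        label = tokens[0].rstrip(":")
        if label in labels:
            # Only the first ncpu tokens after the label are counters;
            # the rest of the line is the human-readable description.
            counts[label] = [int(t) for t in tokens[1:1 + ncpu]]
    return counts


# Rows copied from the /proc/interrupts output in this message.
SAMPLE = """\
         CPU0    CPU1    CPU2    CPU3
 THR:       0       0       0       0  Threshold APIC interrupts
 MCE:       0       0       0       0  Machine check exceptions
 MCP:       9       9       9       9  Machine check polls
"""

counts = per_cpu_counts(SAMPLE)
for label, values in counts.items():
    print(label, sum(values))
```

On a live system the same function could be fed the result of `open("/proc/interrupts").read()`. With the values above, only the MCP (poll) row is non-zero.
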
> Although it's reported as an L2 error, some memory errors can also manifest
> themselves as a cache error in certain cases. In this case it looks like
> some speculative fetch from bad memory might be the cause.
>
>>> MCGCAP c08 APICID 0 SOCKETID 0
>
> MCG_CAP: c08
> Support CMCI(bit 10) - Corrected Machine Check Interrupt (CMCI_P) and
> Threshold based error reporting (bit 11) (TES_P).
>
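
For reference, the MCG_CAP value can be decoded the same way. This is only a sketch: the bit positions are the IA32_MCG_CAP fields as I read them in the Intel SDM, and `decode_mcg_cap` is a name I made up, not an existing tool.

```python
def decode_mcg_cap(cap: int) -> dict:
    """Extract a few IA32_MCG_CAP fields from the raw register value."""
    return {
        "bank_count": cap & 0xFF,       # bits 7:0, number of MC banks
        "MCG_CTL_P": (cap >> 8) & 1,    # IA32_MCG_CTL present
        "MCG_EXT_P": (cap >> 9) & 1,    # extended state registers present
        "MCG_CMCI_P": (cap >> 10) & 1,  # CMCI supported
        "MCG_TES_P": (cap >> 11) & 1,   # threshold-based error status
        "MCG_SER_P": (cap >> 24) & 1,   # software error recovery
    }


# MCGCAP value copied from the mcelog output in this thread.
cap = decode_mcg_cap(0xC08)
print(cap)
```

For c08 this reports 8 banks with MCG_CMCI_P and MCG_TES_P set and the other listed bits clear, matching the decoding quoted above.
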
>
> Do you have another machine which doesn't report these errors? if so try
> swapping memory between them to see if the error disappears.

No, I don’t. Everybody I have talked to who owns a Dell XPS13 (9360) seems
to see these errors as well.

> I don't have the model specific error handy.. will check that in the meantime
> to get some decoding as well.
>
> If you haven't already running some memory tests would also help.
I need some time for that.
> If you replaced the motherboard, did that involve both CPU and memory,
> or just the motherboard swap?

Sorry, I don’t know, as I am not the person who posted on StackExchange [1].

Kind regards,
Paul

[1] https://unix.stackexchange.com/questions/324237/understanding-machine-check-exceptions-mce/330283