From: Paul Menzel <pmenzel@molgen.mpg.de>
To: Ashok Raj <ashok.raj@intel.com>, Borislav Petkov <bp@alien8.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Thorsten Leemhuis <linux@leemhuis.info>,
	Len Brown <len.brown@intel.com>, Tony Luck <tony.luck@intel.com>
Subject: Re: Dell XPS13: MCE (Hardware Error) reported
Date: Mon, 9 Jan 2017 12:53:33 +0100
Message-ID: <662102c9-94da-3193-08c4-9fe75411cadb@molgen.mpg.de>
In-Reply-To: <20170105011236.GA80100@otc-brkl-03>

Dear Ashok, dear Borislav,


On 01/05/17 02:12, Raj, Ashok wrote:

>>> CPUID Vendor Intel Family 6 Model 142
> This is Kabylake Mobile
>
>>> Hardware event. This is not a software error.
>>> MCE 1
>>> CPU 0 BANK 7
>>> MISC 7880018086 ADDR fef1ce40
>>> TIME 1483543069 Wed Jan  4 16:17:49 2017
>>> MCG status:
>>> MCi status:
>>> Error overflow
>>> Uncorrected error
>>> MCi_MISC register valid
>>> MCi_ADDR register valid
>>> Processor context corrupt
>>> MCA: corrected filtering (some unreported errors in same region)
>>> Generic CACHE Level-2 Generic Error
>>> STATUS ee0000000040110a MCGSTATUS 0
>
> Decoding the bits further from MCi_STATUS above:
> Val=1, OVER=1, UC=1, but EN=0 indicates this isn't an MCE, hence it should
> have been signaled by a CMCI.
>
> PCC=1, but should be ignored when EN=0.
> MCACOD: 110a MSCOD: 0040
>
> If the system is stable enough after the report, can you send the output of
> /proc/interrupts to confirm that.

To be clear, other than the message, the system is stable for me.
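
For reference, the MCi_STATUS decoding Ashok describes can be reproduced with a few shifts and masks. This is a minimal sketch, not mcelog's or the kernel's code, assuming the architectural IA32_MCi_STATUS bit positions from the Intel SDM (VAL=63, OVER=62, UC=61, EN=60, MISCV=59, ADDRV=58, PCC=57, MSCOD in bits 31:16, MCACOD in bits 15:0); `decode_mci_status` is a hypothetical helper name.

```python
# Minimal sketch, not mcelog's or the kernel's code: decode the architectural
# fields of an IA32_MCi_STATUS value. Bit positions assumed from the Intel SDM.
STATUS_FLAGS = {
    "VAL": 63,    # register contents valid
    "OVER": 62,   # error overflow
    "UC": 61,     # uncorrected error
    "EN": 60,     # error enabled in MCi_CTL
    "MISCV": 59,  # MCi_MISC valid
    "ADDRV": 58,  # MCi_ADDR valid
    "PCC": 57,    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Hypothetical helper: split a 64-bit MCi_STATUS into flags and codes."""
    fields = {name: (status >> bit) & 1 for name, bit in STATUS_FLAGS.items()}
    fields["MSCOD"] = (status >> 16) & 0xFFFF  # bits 31:16, model-specific code
    fields["MCACOD"] = status & 0xFFFF         # bits 15:0, architectural code
    return fields


if __name__ == "__main__":
    f = decode_mci_status(0xEE0000000040110A)  # STATUS from the mcelog report
    print(" ".join(f"{name}={f[name]}" for name in STATUS_FLAGS))
    print(f"MCACOD: {f['MCACOD']:04x} MSCOD: {f['MSCOD']:04x}")
```

With the STATUS value from the report this prints VAL=1 OVER=1 UC=1 EN=0 MISCV=1 ADDRV=1 PCC=1 and MCACOD 110a / MSCOD 0040, which agrees with the decoding quoted above.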

Here is `/proc/interrupts`.

```
$ more /proc/interrupts
             CPU0       CPU1       CPU2       CPU3
    0:         27          0          0          0  IR-IO-APIC    2-edge      timer
    1:          3          2        125          5  IR-IO-APIC    1-edge      i8042
    8:          0          1          0          0  IR-IO-APIC    8-edge      rtc0
    9:        108         31        397          5  IR-IO-APIC    9-fasteoi   acpi
   12:         66         18         92         35  IR-IO-APIC   12-edge      i8042
   14:          0          0          0          0  IR-IO-APIC   14-fasteoi   INT344B:00
   16:          0          0          0          0  IR-IO-APIC   16-fasteoi   idma64.0, i801_smbus, i2c_designware.0
   17:        419         42        280        415  IR-IO-APIC   17-fasteoi   idma64.1, i2c_designware.1
   51:          2          0          0          1  IR-IO-APIC   51-fasteoi   DLL075B:01
  120:          0          0          0          0  DMAR-MSI    0-edge      dmar0
  121:          0          0          0          0  DMAR-MSI    1-edge      dmar1
  274:         17          2          0          4  IR-PCI-MSI 30932992-edge      rtsx_pci
  275:         89         26         57         45  IR-PCI-MSI 327680-edge      xhci_hcd
  276:       1886          0       2361          0  IR-PCI-MSI 31457280-edge      nvme0q0, nvme0q1
  277:          0       3010       2570          0  IR-PCI-MSI 31457281-edge      nvme0q2
  278:          0          0       2023       3480  IR-PCI-MSI 31457282-edge      nvme0q3
  279:          0       3319          0       5863  IR-PCI-MSI 31457283-edge      nvme0q4
  280:         45          0          0          0  IR-PCI-MSI 360448-edge      mei_me
  281:        201         52       3008         85  IR-PCI-MSI 32768-edge      i915
  282:        151         29        997      24821  IR-PCI-MSI 30408704-edge      ath10k_pci
  283:        331        938        677        188  IR-PCI-MSI 514048-edge      snd_hda_intel:card0
  NMI:          1          0          0          0   Non-maskable interrupts
  LOC:      15198      21708      16850      31954   Local timer interrupts
  SPU:          0          0          0          0   Spurious interrupts
  PMI:          1          0          0          0   Performance monitoring interrupts
  IWI:          3          0          0          0   IRQ work interrupts
  RTR:          0          0          0          0   APIC ICR read retries
  RES:       1329       1974       1532       1959   Rescheduling interrupts
  CAL:       2254       3827       1969       3963   Function call interrupts
  TLB:        396       2349        342       2193   TLB shootdowns
  TRM:          0          0          0          0   Thermal event interrupts
  THR:          0          0          0          0   Threshold APIC interrupts
  DFR:          0          0          0          0   Deferred Error APIC interrupts
  MCE:          0          0          0          0   Machine check exceptions
  MCP:          9          9          9          9   Machine check polls
  ERR:         17
  MIS:          0
  PIN:          0          0          0          0   Posted-interrupt notification event
  PIW:          0          0          0          0   Posted-interrupt wakeup event
```
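
In the output above, the MCE row (machine check exceptions) is 0 on every CPU, while MCP (machine check polls) is 9 per CPU. To pull such rows out without eyeballing the table, a minimal sketch like the following could be used; it assumes the usual /proc/interrupts layout (a header line of CPU columns, then `NAME: count ... description` rows), and `parse_interrupts` is a hypothetical helper.

```python
# Minimal sketch: extract per-CPU counts from /proc/interrupts text.
# Assumes the usual layout: CPU header line, then "NAME: count ... description".
from pathlib import Path


def parse_interrupts(text: str) -> dict:
    """Hypothetical helper: return {row_name: [per-CPU counts]}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())          # header line: CPU0 CPU1 ...
    rows = {}
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        counts = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():        # chip name or description started
                break
            counts.append(int(field))
        rows[name.strip()] = counts
    return rows


if __name__ == "__main__":
    path = Path("/proc/interrupts")
    if path.exists():
        rows = parse_interrupts(path.read_text())
        for key in ("MCE", "MCP", "THR", "TRM"):
            if key in rows:
                print(key, rows[key], "total", sum(rows[key]))
```

Counts are cumulative since boot, so comparing snapshots taken before and after a reported error is more informative than a single reading.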

> Although it's reported as an L2 error, some memory errors can also manifest
> themselves as cache errors in certain cases. In this case it looks like
> a speculative fetch from bad memory might be the cause.
>
>>> MCGCAP c08 APICID 0 SOCKETID 0
>
> MCG_CAP: c08
> Supports CMCI (bit 10, Corrected Machine Check Interrupt, CMCI_P) and
> threshold-based error reporting (bit 11, TES_P).
>
>
> Do you have another machine which doesn't report these errors? If so, try
> swapping memory between them to see if the error disappears.

No, I don’t. And everybody I have talked to who owns a Dell XPS13 (9360)
seems to get these errors.
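
The MCG_CAP value and the MCACOD from the report can be decoded the same way. A minimal sketch, assuming the Intel SDM layouts for IA32_MCG_CAP (bank count in bits 7:0, MCG_CTL_P bit 8, MCG_EXT_P bit 9, MCG_CMCI_P bit 10, MCG_TES_P bit 11) and for simple cache-hierarchy error codes (`000F 0001 RRRR TTLL`, where F is the corrected-filtering bit); the helper names and the label strings, which only loosely imitate mcelog's wording, are assumptions. For c08 it reports 8 banks (so BANK 7 is the last one) with CMCI and threshold-based reporting supported, and 110a comes out as a generic Level-2 cache error with the filtering bit set, consistent with the mcelog lines quoted above.

```python
# Minimal sketch: decode IA32_MCG_CAP and a simple cache-hierarchy MCACOD.
# Layouts assumed from the Intel SDM; label strings are illustrative only.
TT = {0b00: "Instruction", 0b01: "Data", 0b10: "Generic"}  # 0b11 reserved
LL = {0b00: "Level-0", 0b01: "Level-1", 0b10: "Level-2", 0b11: "Generic"}
RRRR = {
    0b0000: "Generic", 0b0001: "Read", 0b0010: "Write",
    0b0011: "Data-Read", 0b0100: "Data-Write", 0b0101: "Instruction-Fetch",
    0b0110: "Prefetch", 0b0111: "Eviction", 0b1000: "Snoop",
}


def decode_mcg_cap(cap: int) -> dict:
    """Hypothetical helper: pull the capability bits out of IA32_MCG_CAP."""
    return {
        "banks": cap & 0xFF,             # bits 7:0, number of reporting banks
        "MCG_CTL_P": (cap >> 8) & 1,     # MCG_CTL register present
        "MCG_EXT_P": (cap >> 9) & 1,     # extended state registers present
        "MCG_CMCI_P": (cap >> 10) & 1,   # CMCI supported
        "MCG_TES_P": (cap >> 11) & 1,    # threshold-based error status present
    }


def decode_cache_mcacod(mcacod: int) -> str:
    """Hypothetical helper: describe a cache-hierarchy code 000F 0001 RRRR TTLL."""
    filtered = (mcacod >> 12) & 1        # F bit: corrected-error filtering
    code = mcacod & 0xEFFF               # clear F before matching the pattern
    if (code & 0xFF00) != 0x0100:
        raise ValueError(f"{mcacod:#06x} is not a cache hierarchy error code")
    tt = TT.get((code >> 2) & 0b11, "Reserved")
    ll = LL[code & 0b11]
    rrrr = RRRR.get((code >> 4) & 0b1111, "Reserved")
    text = f"{tt} CACHE {ll} {rrrr} Error"
    return text + (" (corrected filtering)" if filtered else "")


if __name__ == "__main__":
    print(decode_mcg_cap(0xC08))
    print(decode_cache_mcacod(0x110A))
```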

> I don't have the model-specific error details handy; I will check them in
> the meantime to get some decoding as well.
>
> If you haven't already, running some memory tests would also help.

I need some time for that.

> If you replaced the motherboard, did that involve both CPU and memory,
> or just a motherboard swap?

Sorry, I don’t know, as I am not the person who posted on StackExchange [1].


Kind regards,

Paul


[1] https://unix.stackexchange.com/questions/324237/understanding-machine-check-exceptions-mce/330283
