From: Corey Minyard <minyard@acm.org>
To: Borislav Petkov <bp@alien8.de>, Corey Minyard <cminyard@mvista.com>
Cc: "Luck, Tony" <tony.luck@intel.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	"linux-rt-users@vger.kernel.org" <linux-rt-users@vger.kernel.org>
Subject: Re: [PATCH][RT] x86: Fix an RT MCE crash
Date: Thu, 30 Jun 2016 17:47:29 -0500
Message-ID: <5775A181.2050404@acm.org>
In-Reply-To: <20160630203457.GF3932@pd.tnic>

On 06/30/2016 03:34 PM, Borislav Petkov wrote:
> On Thu, Jun 30, 2016 at 02:44:42PM -0500, Corey Minyard wrote:
>> I don't think they are.  I think there is something about this
>> particular board.  We aren't having any issues with other systems.
> Right, so the fact that it raises the thresholding interrupt could
> mean that it generates a bunch of correctable ECC errors and it hits a
> threshold which is signalled by that interrupt.
>
> And if that is true, then you should be seeing some errors in mcelog or
> sb_edac reporting some.
>
> You could, just in case, try latest upstream and enable
> CONFIG_EDAC_SBRIDGE and check dmesg for some ECCs.
>
> Or, of course, something else entirely might be funny with that box,
> causing that interrupt to fire.

You are right, I enabled that on the tip of master and I get the
following spewing out for a while:

EDAC MC0: 27843 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 
(channel:1 slot:0 page:0x102c offset:0x180 grain:32 syndrome:0x0 -  
OVERFLOW area:DRAM err_code:0001:0091 socket:0 ha:0 channel_mask:2 rank:0)

So there's apparently something broken in the hardware.
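
For anyone who wants to tally these reports rather than read them by eye, the
record above can be split into fields with a few lines of Python. This is only
a rough sketch: parse_edac_ce() and SAMPLE are hypothetical names made up for
illustration, not part of the kernel, sb_edac, or mcelog, and the pattern is
inferred from the single line quoted above, so other EDAC drivers or kernel
versions may format their messages differently.

```python
import re

# Hypothetical helper for illustration only; the pattern is derived from the
# one sb_edac line quoted in this message and is not an official format spec.
EDAC_CE = re.compile(
    r"EDAC (?P<mc>MC\d+): (?P<count>\d+) CE (?P<msg>.+?) on (?P<label>\S+) "
    r"\((?P<details>.*)\)$"
)

# The record as it appears in the mail, including the mail client's wrapping.
SAMPLE = """EDAC MC0: 27843 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
(channel:1 slot:0 page:0x102c offset:0x180 grain:32 syndrome:0x0 -
OVERFLOW area:DRAM err_code:0001:0091 socket:0 ha:0 channel_mask:2 rank:0)"""


def parse_edac_ce(text):
    """Return a dict describing one corrected-error record, or None."""
    # Undo line wrapping by collapsing all whitespace runs to single spaces.
    line = " ".join(text.split())
    match = EDAC_CE.match(line)
    if match is None:
        return None

    fields = {}
    flags = []
    for token in match.group("details").split():
        # Split on the first colon only, so "err_code:0001:0091" stays whole.
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value
        elif token != "-":
            # Bare words such as OVERFLOW carry no value; keep them as flags.
            flags.append(token)

    return {
        "mc": match.group("mc"),
        "count": int(match.group("count")),
        "message": match.group("msg"),
        "label": match.group("label"),
        "fields": fields,
        "flags": flags,
    }


if __name__ == "__main__":
    print(parse_edac_ce(SAMPLE))
```

The key:value pairs are kept as strings on purpose, since values such as the
page address and err_code are not plain decimal numbers.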

>> But as you say, the kernel should be ready for this.
> Right, and we've removed that mce_notify_irq() call in
> intel_threshold_interrupt() with
>
>    f29a7aff4bd6 ("x86/mce: Avoid potential deadlock due to printk() in MCE context")
>
> but that's more of a side-effect of that patch.
>
> And if you want to backport it, you'd need the mce_gen_pool_add() and
> remaining machinery for the genpool.

That sounds like a bit much.

Steven, what would you like to do here?

Thanks,

-corey

> Presumably, booting with "mce=no_cmci" should fix this but then you
> won't have the CMCI thresholding, i.e., the interrupt which gets raised
> when a certain amount of correctable errors has been generated.
>
> Hmm, a funny box that.
>

