linux-kernel.vger.kernel.org archive mirror
From: Borislav Petkov <bp@alien8.de>
To: ruiv.wang@gmail.com
Cc: linux-kernel@vger.kernel.org, tony.luck@intel.com,
	gong.chen@linux.intel.com, rui.y.wang@intel.com
Subject: Re: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
Date: Wed, 19 Nov 2014 11:29:54 +0100
Message-ID: <20141119102954.GA5617@pd.tnic>
In-Reply-To: <1416388961-24159-1-git-send-email-ruiv.wang@gmail.com>

On Wed, Nov 19, 2014 at 05:22:41PM +0800, ruiv.wang@gmail.com wrote:
> From: Rui Wang <rui.y.wang@intel.com>
> 
> There are cases when a machine check panics without giving any information
> about the error:
> 
> [  177.806166] Kernel panic - not syncing: Machine check from unknown source
> 
> No information besides that it is a machine check. This happens in two cases:
> 1) The CPU logs the error with the MCi_STATUS.EN bit set to zero, and Linux
>    ignores EN=0 entries (as it should).

Well, I guess we shouldn't anymore. Apparently the hw forgets to set the
EN bit when raising an MCE, so mce-severity should stop trusting that bit
too: either delete the piece below or grade it as a higher severity based
on, I dunno, b0rked hardware family/model/stepping or whatever bit we set...

        MCESEV(
                NO, "Not enabled",
                BITCLR(MCI_STATUS_EN)
                ),

> 2) In normal processing the MCE handler ignores banks that do not contain fatal
>    or unrecoverable errors (these would later be found and logged by the CMCI
>    handler). If we panic, these will never be logged, but could be important
>    to diagnose the problem.

Well, we do this:

                /*
                 * Non uncorrected or non signaled errors are handled by
                 * machine_check_poll. Leave them alone, unless this panics.
                 */
                if (!(m.status & (cfg->ser ? MCI_STATUS_S : MCI_STATUS_UC)) &&
                        !no_way_out)
                        continue;

so no_way_out gets indirectly controlled by mce-severity too (it comes out
of mce_no_way_out(), which grades the banks with mce_severity()). So I
guess mce-severity would need adjusting instead of adding more stuff to
the #MC handler.

Btw, the panic message comes from

        /*
         * No machine check event found. Must be some external
         * source or one CPU is hung. Panic.
         */
        if (global_worst <= MCE_KEEP_SEVERITY && mca_cfg.tolerant < 3)
                mce_panic("Machine check from unknown source", NULL, NULL);

so fixing mce_severity is what should happen here instead, IMO.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


Thread overview: 16+ messages
2014-11-19  9:22 [PATCH v3] x86/mce: Try printing all machine check banks known before panic ruiv.wang
2014-11-19 10:29 ` Borislav Petkov [this message]
2014-11-19 23:34   ` Luck, Tony
2014-11-20 10:15     ` Borislav Petkov
2014-11-21  1:20       ` rui wang
2014-11-21 16:41         ` Borislav Petkov
2014-11-21 17:20           ` Luck, Tony
2014-11-21 18:13             ` Borislav Petkov
2014-11-21 21:31               ` Luck, Tony
2014-11-21 21:35                 ` Borislav Petkov
2014-11-21 21:59                   ` Luck, Tony
2014-11-23 20:55                     ` Borislav Petkov
2014-11-22  2:16               ` rui wang
2014-11-22  9:44                 ` Borislav Petkov
2014-11-22 15:32                   ` rui wang
2014-11-22 16:31                     ` Borislav Petkov
