linux-edac.vger.kernel.org archive mirror
From: Borislav Petkov <bp@alien8.de>
To: minyard@acm.org
Cc: "Luck, Tony" <tony.luck@intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	linux-edac@vger.kernel.org, Corey Minyard <cminyard@mvista.com>,
	hidehiro.kawai.ez@hitachi.com, linfeilong@huawei.com,
	liuzhiqiang26@huawei.com
Subject: Re: [PATCH v2] x86: Fix MCE error handing when kdump is enabled
Date: Wed, 30 Sep 2020 19:56:33 +0200
Message-ID: <20200930175633.GM6810@zn.tnic>
In-Reply-To: <20200929211644.31632-1-minyard@acm.org>

On Tue, Sep 29, 2020 at 04:16:44PM -0500, minyard@acm.org wrote:
> From: Corey Minyard <cminyard@mvista.com>
> 
> If kdump is enabled, the handling of shooting down CPUs does not use the
> RESET_VECTOR irq before trying to use NMIs to shoot down the CPUs.

So I've read that commit message a bunch of times already and am
none the wiser about what the situation is, who's doing what and
what this thing is fixing.

It must be something about kdumping a kernel and an MCE happening at the
same time and we did something about this a while ago, see:

 5bc329503e81 ("x86/mce: Handle broadcasted MCE gracefully with kexec")

and that is simply letting CPUs which are not doing the kexec-ing
continue from the broadcasted MCE holding pattern so that kexec
finishes.

So please explain exactly what this problem is, who's doing what, when
does the MCE happen etc?

I've found this:

https://lkml.kernel.org/r/1600339070-570840-1-git-send-email-wubo40@huawei.com

and that sounds like the problem and I'm going to read that one in
detail if that is the issue we're talking about. But from skimming over
it, it sounds like the commit I mentioned above should take care of it.

Although I have no clue what this means:

"1) MCE appears on all CPUs, Currently all CPUs are in the NMI interrupt 
   context."

I think he means, all CPUs are in the #MC handler.

Also, looking at that mail, what kernel is Wu Bo using?

[ 4767.947960] BUG: unable to handle kernel paging request at ffff893e40000000
[ 4767.947962] PGD 13c001067 P4D 13c001067 PUD 0
[ 4767.947965] Oops: 0000 [#1] SMP PTI
[ 4767.947967] CPU: 0 PID: 0 Comm: swapper/0

There's no kernel version on this line above. Taint line is gone too. Why?
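
To illustrate what is missing: on an untrimmed oops, the "CPU:" line normally continues with the taint state and the kernel release after the Comm: field. A minimal Python sketch of that check follows. The regular expression, the helper name describe_cpu_line and the "full" sample line are assumptions made for illustration; they are not taken from Wu Bo's report, and the pattern assumes a task name without spaces.

```python
import re

# Assumed layout of the "CPU:" line at the top of an oops:
#   CPU: <n> PID: <n> Comm: <name> <taint info> <kernel release> ...
# Simplification: the task name (Comm) is assumed to contain no spaces.
CPU_LINE = re.compile(
    r"CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) Comm: (?P<comm>\S+)(?P<rest>.*)"
)


def describe_cpu_line(line):
    """Return what follows Comm: on an oops 'CPU:' line, or None if no match."""
    m = CPU_LINE.search(line)
    if m is None:
        return None
    rest = m.group("rest").strip()
    return {
        "cpu": int(m.group("cpu")),
        "comm": m.group("comm"),
        # Matches both "Not tainted" and "Tainted: ..." spellings.
        "has_taint_info": "tainted" in rest.lower(),
        # Taint state and kernel release would appear here if present.
        "trailing": rest,
    }


# The line as quoted in the report: nothing follows the task name.
reported = describe_cpu_line("[ 4767.947967] CPU: 0 PID: 0 Comm: swapper/0")

# Hypothetical untrimmed line, for comparison only.
full = describe_cpu_line(
    "[    1.000000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0 #1"
)
```

For the quoted line, the "trailing" field comes back empty and "has_taint_info" is false, which is the gap pointed out above.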

Judging by the "unable to handle kernel paging request" text, that must
be from before

  f28b11a2abd9 ("x86/fault: Reword initial BUG message for unhandled page faults")

which is 5.1. The commit above is in 5.1 but Wu Bo had better try the latest
*upstream* kernel first. The stress being on *upstream*.

Also, that kernel is running in a guest - I don't take MCEs in guests
very seriously.

So before we waste time, let's explain why we're doing all that exercise
first.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Thread overview: 10+ messages
2020-09-29 21:16 [PATCH v2] x86: Fix MCE error handing when kdump is enabled minyard
2020-09-30 17:56 ` Borislav Petkov [this message]
2020-09-30 18:49   ` Corey Minyard
2020-10-01 11:33     ` Borislav Petkov
2020-10-01 13:44       ` Corey Minyard
2020-10-01 16:16         ` Borislav Petkov
2020-10-01 16:29           ` Luck, Tony
2020-10-01 16:58             ` Borislav Petkov
2020-10-01 17:12             ` Corey Minyard
2020-10-10  1:36 ` Zhiqiang Liu