From: Borislav Petkov <bp@alien8.de>
To: Corey Minyard <minyard@acm.org>
Cc: "Luck, Tony" <tony.luck@intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	linux-edac@vger.kernel.org, Corey Minyard <cminyard@mvista.com>,
	hidehiro.kawai.ez@hitachi.com, linfeilong@huawei.com,
	liuzhiqiang26@huawei.com
Subject: Re: [PATCH v2] x86: Fix MCE error handing when kdump is enabled
Date: Thu, 1 Oct 2020 18:16:45 +0200
Message-ID: <20201001161645.GD17683@zn.tnic>
In-Reply-To: <20201001134449.GB3674@minyard.net>

On Thu, Oct 01, 2020 at 08:44:49AM -0500, Corey Minyard wrote:
> I don't understand the last sentence.  You don't want to do IRQ
> servicing when you are going to kdump.  That's going to change the state
> of the kernel and you may lose information, and it may interfere with
> the kdump process.

I misspoke: what I meant was what mce_check_crashing_cpu() does, namely
free the CPU from the #MC handler so that it can do whatever it is
supposed to do under kdump.

> That's why (well, one of many reasons why) kdump goes straight to NMI
> shootdown.

Right.

> Also, it's still unclear to me how kdump would get the register
> information for the CPUs that enter wait_for_panic().

Yes, you said that already.

> I was thinking about this some yesterday.  It seems to me that enabling
> IRQS in an MCE handler is just a bad idea, but it's really a bad idea
> for kdump.

I don't think this code ever thought about kdump.

> I think you could just remove the irq enable in wait_for_panic() and
> call run_crash_ipi_callback() from the loop there without messing
> with irqs.  In the non-kdump case, it waits a second for the
> RESET_VECTOR to happen in native_stop_other_cpus() then it uses an NMI
> shootdown.  So it will delay for a second in the normal panic case.
> The kdump case uses nmi_shootdown_cpus(), which doesn't do the
> RESET_VECTOR stop.

Well, I don't think the MCE code should know anything about kdump. What
it should do in the kdump case, i.e., when crashing_cpu != -1, is
simply call mce_check_crashing_cpu() in wait_for_panic(). In that case,
the only thing it should do is get out of the #MC handler so that it can
get the shootdown NMI.

For all other cases, it should do what wait_for_panic() has been doing
so far.
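
To make the split concrete, here is a rough userspace model of the
decision, not kernel code. Only crashing_cpu, mce_check_crashing_cpu()
and wait_for_panic() are names from this thread; the *_model functions,
the enum and the "current CPU is not the crashing one" check are
assumptions made for illustration, and the real helper presumably does
more than compare CPU numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of this is the kernel's implementation. */

enum mc_action {
	MC_LEAVE_HANDLER,	/* return from #MC so the shootdown NMI can be taken */
	MC_LEGACY_WAIT,		/* keep doing what wait_for_panic() does today */
};

/* Models the kernel variable: -1 means no crash/kdump is in progress. */
static int crashing_cpu = -1;

/*
 * Hypothetical stand-in for mce_check_crashing_cpu(). It only models the
 * "a crash is in progress on another CPU" test; the cpu comparison is an
 * assumption, not something stated in this thread.
 */
static bool mce_check_crashing_cpu_model(int cpu)
{
	return crashing_cpu != -1 && crashing_cpu != cpu;
}

/* Hypothetical model of the proposed branch at the top of wait_for_panic(). */
static enum mc_action wait_for_panic_model(int cpu)
{
	assert(cpu >= 0);

	/* kdump case: just get out of the #MC handler. */
	if (mce_check_crashing_cpu_model(cpu))
		return MC_LEAVE_HANDLER;

	/* All other cases: unchanged behaviour. */
	return MC_LEGACY_WAIT;
}
```

The point of the sketch is only that the MCE path asks one question (is
a crash already in progress?) and otherwise stays unaware of kdump.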

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Thread overview: 10+ messages
2020-09-29 21:16 [PATCH v2] x86: Fix MCE error handing when kdump is enabled minyard
2020-09-30 17:56 ` Borislav Petkov
2020-09-30 18:49   ` Corey Minyard
2020-10-01 11:33     ` Borislav Petkov
2020-10-01 13:44       ` Corey Minyard
2020-10-01 16:16         ` Borislav Petkov [this message]
2020-10-01 16:29           ` Luck, Tony
2020-10-01 16:58             ` Borislav Petkov
2020-10-01 17:12             ` Corey Minyard
2020-10-10  1:36 ` Zhiqiang Liu
