From: Borislav Petkov <bp@alien8.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Ingo Molnar <mingo@kernel.org>,
	Prarit Bhargava <prarit@redhat.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Junichi Nomura <j-nomura@ce.jp.nec.com>,
	Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Subject: Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump
Date: Thu, 9 Apr 2015 21:05:51 +0200
Message-ID: <20150409190550.GJ25434@pd.tnic>
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F32A5D502@ORSMSX114.amr.corp.intel.com>

On Thu, Apr 09, 2015 at 06:22:02PM +0000, Luck, Tony wrote:
> > Why? Those CPUs are offlined and num_online_cpus() in mce_start() should
> > account for that, no?
> >
> > And if those are offlined, they're very very unlikely to trigger an MCE
> > as they're idle and not executing code.
> 
> Let's step back a few feet and look at the big picture.  There are three main classes of machine check
> that we might see while trying to run kdump - and remember that all machine checks are currently
> broadcast, so all cpus whether online or offline will see them.
> 
> 1) Fatal
> We have to crash - lose the dump.  Having a new machine check handler will make things a bit easier
> to see what happened because we won't have any synchronization failed messages from the offline
> cpus.

But this should not be a problem if the kdump path keeps cpu_online_mask
up to date. I'm looking at kdump_nmi_callback() or crash_nmi_callback() or
so. Those should clear cpu_online_mask and then mce_start() will work
fine on the crashing CPU.

IMHO, of course.
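
To make the argument above concrete, here is a minimal user-space sketch,
not kernel code. It assumes, as described above, that the rendezvous in
mce_start() waits until the number of CPUs that have checked in matches
the number of online CPUs. The names cpumask_model_t, model_num_online()
and model_rendezvous_ok() are hypothetical, and the real timeout and
atomic counters are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, bit N set means CPU N is in the mask. */
typedef uint64_t cpumask_model_t;

/* Count the set bits, standing in for num_online_cpus(). */
static int model_num_online(cpumask_model_t mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Model of the rendezvous check: it completes only if the number of CPUs
 * that checked in equals the number of CPUs marked online. In this model
 * a mismatch stands for the synchronization timeout.
 */
static bool model_rendezvous_ok(cpumask_model_t online,
				cpumask_model_t responding)
{
	int expected = model_num_online(online);
	int callin = model_num_online(responding);

	return callin == expected;
}
```

With a stale mask of four CPUs (0xF) and only the crashing CPU checking in
(0x1), model_rendezvous_ok() returns false. Once the mask is reduced to the
crashing CPU alone (0x1), it returns true.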

> 2) Execution path recoverable (SRAR in SDM parlance).
> Also going to be fatal (kdump is all running in ring0, and we can't recover from errors in ring 0). Cleaner
> messages as above. Potentially in the future we might be able to make the kdump machine check handler
> actually recover by just skipping a page - if the location of the error was in the old kernel image.
> 
> 3) Non-execution path recoverable (SRAO in SDM)
> We ought to be able to keep kdump running if this happens - the "AO" stands for "action optional",
> so we are going to choose to not take an action. Wherever the error was, it won't affect correctness
> of execution of the current context.

Those could simply be made to go to dmesg during kdump, i.e. decouple
any MCE consumers. And we do that now anyway, i.e. on a box without mcelog
or some other RAS daemon running.
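
The three classes from the quoted mail, and what the kdump kernel would do
with each, can be summarized in a small sketch. This is only an
illustration of the policy discussed in this thread, not the kernel's
severity grading code; the enum and function names are hypothetical.

```c
/* Hypothetical names for the three classes discussed in this thread. */
enum mce_class_model {
	MCE_MODEL_FATAL,	/* 1) fatal */
	MCE_MODEL_SRAR,		/* 2) execution path recoverable */
	MCE_MODEL_SRAO,		/* 3) non-execution path recoverable */
};

enum kdump_action_model {
	KDUMP_MODEL_PANIC,		/* crash, the dump is lost */
	KDUMP_MODEL_LOG_AND_CONTINUE,	/* report to dmesg, keep dumping */
};

/*
 * Policy as described in the thread: kdump runs entirely in ring 0, so
 * SRAR is treated as fatal too. Only SRAO ("action optional") lets the
 * dump continue, with the event merely logged.
 */
static enum kdump_action_model kdump_mce_action(enum mce_class_model c)
{
	switch (c) {
	case MCE_MODEL_SRAO:
		return KDUMP_MODEL_LOG_AND_CONTINUE;
	case MCE_MODEL_FATAL:
	case MCE_MODEL_SRAR:
	default:
		return KDUMP_MODEL_PANIC;
	}
}
```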

So we could reuse the normal handler - we just need to do some tweaking
first... AFAICT, of course. I believe in that endeavor, the devil will
be in the detail.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.