From: Xunlei Pang <xpang@redhat.com>
To: Borislav Petkov <bp@alien8.de>, "Luck, Tony" <tony.luck@intel.com>
Cc: xlpang@redhat.com, x86@kernel.org, linux-kernel@vger.kernel.org,
	kexec@lists.infradead.org, Ingo Molnar <mingo@redhat.com>,
	Dave Young <dyoung@redhat.com>,
	Prarit Bhargava <prarit@redhat.com>,
	Junichi Nomura <j-nomura@ce.jp.nec.com>,
	Kiyoshi Ueda <k-ueda@ct.jp.nec.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Subject: Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
Date: Tue, 24 Jan 2017 09:46:48 +0800	[thread overview]
Message-ID: <5886B208.90804@redhat.com> (raw)
In-Reply-To: <20170123175130.l7c7mnmu74ln5v6h@pd.tnic>

On 01/24/2017 at 01:51 AM, Borislav Petkov wrote:
> Hey Tony,
>
> a "welcome back" is in order? :-)
>
> On Mon, Jan 23, 2017 at 09:40:09AM -0800, Luck, Tony wrote:
>> If the system had experienced some memory corruption, but
>> recovered ... then there would be some pages sitting around
>> that the old kernel had marked as POISON and stopped using.
>> The kexec'd kernel doesn't know about these, so may touch that
>> memory while taking a crash dump ...
> Hmm, pass a list of poisoned pages to the kdump kernel so as not to
> touch. Looks like there's already functionality for that:
>
> "makedumpfile can exclude the following types of pages while copying
> VMCORE to DUMPFILE, and a user can choose which type of pages will be
> excluded.
>
> - Pages filled with zero
> - Cache pages
> - User process data pages
> - Free pages"
>
>  (there is a makedumpfile manpage somewhere)
>
> And apparently crash knows about poisoned pages and handles them:
>
> static int __init crash_save_vmcoreinfo_init(void)
> {
> 	...
> #ifdef CONFIG_MEMORY_FAILURE
>         VMCOREINFO_NUMBER(PG_hwpoison);
> #endif
>
> so if that works, the kexeced kernel should know about that list.

From the log in my previous reply, the MCE occurred before makedumpfile started
dumping, so I wonder whether the poisoned pages fall within the crash reserved
memory, or whether this is some other type of event?

Besides, some kdump kernels may not use makedumpfile at all; for example, a simple
"cp" is also allowed to process "/proc/vmcore".

>
>> and then you have a broadcast machine check (on older[1] Intel CPUs
>> that don't support local machine check).
> Right.
>
>> This is hard to work around. You really need all the CPUs to have set
>> CR4.MCE=1 (if any didn't, then they will force a reset when they see
>> the machine check). Also you need to make sure that they jump to the
>> copy of do_machine_check() in the new kernel, not the old kernel.
> Doesn't matter, right? The new copy is as clueless as the old one about
> those MCEs.
>

It's the code in mce_start(): it waits for all the online CPUs, including the
CPUs that the kdump kernel boots on, to synchronize.

So for the new MCE handler of the kdump kernel this is fine, as its count of
online CPUs is correct. For the old MCE handler of the 1st kernel it is not,
because some CPUs which are regarded as online from the 1st kernel's view are
actually running the 2nd kernel now; they can't respond to the old MCE handler,
so the old MCE handler's rendezvous times out.

Regards,
Xunlei

