From: Xunlei Pang <xpang@redhat.com>
To: Borislav Petkov <bp@alien8.de>, "Luck, Tony" <tony.luck@intel.com>
Cc: xlpang@redhat.com, x86@kernel.org, linux-kernel@vger.kernel.org,
	kexec@lists.infradead.org, Ingo Molnar <mingo@redhat.com>,
	Dave Young <dyoung@redhat.com>,
	Prarit Bhargava <prarit@redhat.com>,
	Junichi Nomura <j-nomura@ce.jp.nec.com>,
	Kiyoshi Ueda <k-ueda@ct.jp.nec.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Subject: Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
Date: Tue, 24 Jan 2017 09:46:48 +0800	[thread overview]
Message-ID: <5886B208.90804@redhat.com> (raw)
In-Reply-To: <20170123175130.l7c7mnmu74ln5v6h@pd.tnic>

On 01/24/2017 at 01:51 AM, Borislav Petkov wrote:
> Hey Tony,
>
> a "welcome back" is in order? :-)
>
> On Mon, Jan 23, 2017 at 09:40:09AM -0800, Luck, Tony wrote:
>> If the system had experienced some memory corruption, but
>> recovered ... then there would be some pages sitting around
>> that the old kernel had marked as POISON and stopped using.
>> The kexec'd kernel doesn't know about these, so may touch that
>> memory while taking a crash dump ...
> Hmm, pass a list of poisoned pages to the kdump kernel so as not to
> touch. Looks like there's already functionality for that:
>
> "makedumpfile can exclude the following types of pages while copying
> VMCORE to DUMPFILE, and a user can choose which type of pages will be
> excluded.
>
> - Pages filled with zero
> - Cache pages
> - User process data pages
> - Free pages"
>
>  (there is a makedumpfile manpage somewhere)
>
> And apparently crash knows about poisoned pages and handles them:
>
> static int __init crash_save_vmcoreinfo_init(void)
> {
> 	...
> #ifdef CONFIG_MEMORY_FAILURE
>         VMCOREINFO_NUMBER(PG_hwpoison);
> #endif
>
> so if that works, the kexeced kernel should know about that list.

From the log in my previous reply, the MCE occurred before makedumpfile started
dumping, so I wonder whether the poisoned pages fall within the crash reserved
memory, or whether this is some other type of event?

Besides, some kdump kernels may not use makedumpfile at all; for example, a simple
"cp" is also allowed to process "/proc/vmcore".

>
>> and then you have a broadcast machine check (on older[1] Intel CPUs
>> that don't support local machine check).
> Right.
>
>> This is hard to work around. You really need all the CPUs to have set
>> CR4.MCE=1 (if any didn't, then they will force a reset when they see
>> the machine check). Also you need to make sure that they jump to the
>> copy of do_machine_check() in the new kernel, not the old kernel.
> Doesn't matter, right? The new copy is as clueless as the old one about
> those MCEs.
>

It's the code in mce_start(): it waits for all the online CPUs, including the
CPUs that the kdump kernel boots on, to synchronize.

So for the new MCE handler of the kdump kernel this is fine, as its count of
online CPUs is correct. For the old MCE handler of the 1st kernel it is not,
because some CPUs which are regarded as online from the 1st kernel's view are
actually running the 2nd kernel now; they can't respond to the old MCE handler,
so the old MCE handler's rendezvous times out.

Regards,
Xunlei

