[PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

* [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
@ 2017-01-23  8:01 ` Xunlei Pang
  0 siblings, 0 replies; 48+ messages in thread
From: Xunlei Pang @ 2017-01-23  8:01 UTC (permalink / raw)
  To: x86, linux-kernel, kexec
  Cc: Tony Luck, Borislav Petkov, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Xunlei Pang,
	Naoya Horiguchi

We hit an issue with kdump: after the kdump kernel boots up,
a broadcast MCE arrives in the first kernel. The other CPUs
still remaining in the first kernel enter the first kernel's
old MCE handler, then time out and panic because of MCE
synchronization, which finally resets the kdump CPUs.

This patch makes CPUs stay quiet once a panic has happened:
both before the crash CPU shoots them down and after the kdump
kernel boots, they do nothing on a broadcast MCE except clear
MCG_STATUS. This helps kdump get as far as it can with dumping
the vmcore.

Previous efforts:
https://patchwork.kernel.org/patch/6167631/
https://lists.gt.net/linux/kernel/2146557

Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 00ef432..0c2bf77 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1157,6 +1157,23 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 
 	mce_gather_info(&m, regs);
 
+	/*
+	 * Check if this MCE is signaled to only this logical processor,
+	 * on Intel only.
+	 */
+	if (m.cpuvendor == X86_VENDOR_INTEL)
+		lmce = m.mcgstatus & MCG_STATUS_LMCES;
+
+	/*
+	 * Special treatment for Intel broadcasted machine check:
+	 * To avoid panic due to MCE synchronization in case of kdump,
+	 * after system panic, clear global status and bail out.
+	 */
+	if (!lmce && atomic_read(&panic_cpu) != PANIC_CPU_INVALID) {
+		wrmsrl(MSR_IA32_MCG_STATUS, 0);
+		goto out;
+	}
+
 	final = this_cpu_ptr(&mces_seen);
 	*final = m;
 
@@ -1174,13 +1191,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		kill_it = 1;
 
 	/*
-	 * Check if this MCE is signaled to only this logical processor,
-	 * on Intel only.
-	 */
-	if (m.cpuvendor == X86_VENDOR_INTEL)
-		lmce = m.mcgstatus & MCG_STATUS_LMCES;
-
-	/*
 	 * Go through all banks in exclusion of the other CPUs. This way we
 	 * don't report duplicated events on shared banks because the first one
 	 * to see it will clear it. If this is a Local MCE, then no need to
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread
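
The early bail-out added by the patch can be modelled in userspace roughly as
follows. This is only a sketch of the condition, not kernel code: the helper
name mce_should_stay_quiet() and its parameters are made up for illustration,
while the bit position of MCG_STATUS_LMCES and the value of PANIC_CPU_INVALID
are taken to match the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MCG_STATUS bit 3: the MCE was signaled to this logical CPU only (LMCE). */
#define MCG_STATUS_LMCES  (1ULL << 3)

/* Stand-in for the kernel's PANIC_CPU_INVALID: no CPU has panicked yet. */
#define PANIC_CPU_INVALID (-1)

/*
 * Hypothetical userspace model of the check the patch adds to
 * do_machine_check(): return true when the handler would only clear
 * MCG_STATUS and leave, instead of entering the cross-CPU rendezvous.
 */
static bool mce_should_stay_quiet(bool is_intel, uint64_t mcgstatus,
                                  int panic_cpu)
{
	bool lmce = false;

	assert(panic_cpu >= PANIC_CPU_INVALID);

	/* As in the patch, the LMCE bit is only consulted on Intel. */
	if (is_intel)
		lmce = (mcgstatus & MCG_STATUS_LMCES) != 0;

	/* Not a local MCE, and some CPU has already started to panic. */
	return !lmce && panic_cpu != PANIC_CPU_INVALID;
}
```

Note that in this model, as in the patch, a non-Intel CPU never sets lmce, so
after a panic it always takes the quiet path.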

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-01-23  8:01 ` Xunlei Pang
@ 2017-01-23 12:51   ` Borislav Petkov
  -1 siblings, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2017-01-23 12:51 UTC (permalink / raw)
  To: Xunlei Pang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On Mon, Jan 23, 2017 at 04:01:51PM +0800, Xunlei Pang wrote:
> We met an issue for kdump: after kdump kernel boots up,
> and there comes a broadcasted mce in first kernel, the

How does that even happen?

Lemme try to understand this correctly: the first kernel gets an
MCE, kdump starts and boots a *whole* kernel and *then* you get the
broadcasted MCE? I have a real hard time believing that.

What happened to the approach of clearing CR4.MCE before loading the
kdump kernel, in native_machine_shutdown() or wherever the kdump
kernel gets loaded...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-01-23 12:51   ` Borislav Petkov
@ 2017-01-23 13:35     ` Xunlei Pang
  -1 siblings, 0 replies; 48+ messages in thread
From: Xunlei Pang @ 2017-01-23 13:35 UTC (permalink / raw)
  To: Borislav Petkov, Xunlei Pang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On 01/23/2017 at 08:51 PM, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 04:01:51PM +0800, Xunlei Pang wrote:
>> We met an issue for kdump: after kdump kernel boots up,
>> and there comes a broadcasted mce in first kernel, the
> How does that even happen?
>
> Lemme try to understand this correctly: the first kernel gets an
> MCE, kdump starts and boots a *whole* kernel and *then* you get the
> broadcasted MCE? I have real hard time believing that.
>
> What happened to the approach of clearing CR4.MCE before loading the
> kdump kernel, in native_machine_shutdown() or wherever does the kdump
> gets loaded...
>

One possible timing sequence would be:
1) The 1st kernel, running on multiple CPUs, panics.
2) The crash dump code starts.
3) The crash dump code stops all CPUs except the crashing one.
4) The 2nd kernel boots up on the crash CPU with "nr_cpus=1".
5) A broadcast MCE arrives on one of the other CPUs (not the crashing CPU).
6) The other CPUs enter the old MCE handler of the 1st kernel, while the crash CPU enters the new MCE handler of the 2nd kernel.
7) The old MCE handler of the 1st kernel times out and panics due to MCE synchronization under the default setting.

Regards,
Xunlei
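
The timeout in the sequence above can be illustrated with a toy model of a
call-in rendezvous. It is a simplified sketch, not the kernel's mce_start():
simulate_rendezvous() and its tick-based timeout are invented for
illustration, and it assumes the first kernel's handler still waits for the
crash CPU, which by then is running the second kernel and never checks in.

```c
#include <assert.h>

enum rendezvous_result { RENDEZVOUS_OK, RENDEZVOUS_TIMEOUT };

/*
 * Toy model: the handler waits until expected_cpus CPUs have checked in.
 * Only responding_cpus of them ever do, one per tick. If the count is
 * not reached within timeout_ticks, the rendezvous times out, which in
 * the thread's scenario ends in the "Timeout synchronizing machine
 * check over CPUs" panic.
 */
static enum rendezvous_result
simulate_rendezvous(int expected_cpus, int responding_cpus, int timeout_ticks)
{
	int callin = 0;

	assert(expected_cpus > 0);

	for (int tick = 0; tick < timeout_ticks; tick++) {
		if (callin < responding_cpus)
			callin++;		/* one more CPU checks in */
		if (callin == expected_cpus)
			return RENDEZVOUS_OK;	/* everybody arrived */
	}
	return RENDEZVOUS_TIMEOUT;
}
```

With 16 expected CPUs and only 15 responders (the crash CPU has left for the
kdump kernel), the model times out no matter how large the tick budget is.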

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-01-23 13:35     ` Xunlei Pang
@ 2017-01-23 14:50       ` Borislav Petkov
  -1 siblings, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2017-01-23 14:50 UTC (permalink / raw)
  To: xlpang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On Mon, Jan 23, 2017 at 09:35:53PM +0800, Xunlei Pang wrote:
> One possible timing sequence would be:
> 1st kernel running on multiple cpus panicked
> then the crash dump code starts
> the crash dump code stops the others cpus except the crashing one
> 2nd kernel boots up on the crash cpu with "nr_cpus=1"
> some broadcasted mce comes on some cpu amongst the other cpus(not the crashing cpu)

Where does this broadcasted MCE come from?

The crash dump code triggered it? Or it happened before the panic()?

Are you talking about an *actual* sequence which you're experiencing on
real hw or is this something hypothetical?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-01-23 14:50       ` Borislav Petkov
@ 2017-01-23 17:40         ` Luck, Tony
  -1 siblings, 0 replies; 48+ messages in thread
From: Luck, Tony @ 2017-01-23 17:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: xlpang, x86, linux-kernel, kexec, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On Mon, Jan 23, 2017 at 03:50:56PM +0100, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 09:35:53PM +0800, Xunlei Pang wrote:
> > One possible timing sequence would be:
> > 1st kernel running on multiple cpus panicked
> > then the crash dump code starts
> > the crash dump code stops the others cpus except the crashing one
> > 2nd kernel boots up on the crash cpu with "nr_cpus=1"
> > some broadcasted mce comes on some cpu amongst the other cpus(not the crashing cpu)
> 
> Where does this broadcasted MCE come from?
> 
> The crash dump code triggered it? Or it happened before the panic()?
> 
> Are you talking about an *actual* sequence which you're experiencing on
> real hw or is this something hypothetical?

If the system had experienced some memory corruption, but
recovered ... then there would be some pages sitting around
that the old kernel had marked as POISON and stopped using.
The kexec'd kernel doesn't know about these, so may touch that
memory while taking a crash dump ... and then you have a
broadcast machine check (on older[1] Intel CPUs that don't support
local machine check).

This is hard to work around.  You really need all the CPUs to
have set CR4.MCE=1 (if any didn't, then they will force a reset
when they see the machine check). Also you need to make sure that
they jump to the copy of do_machine_check() in the new kernel, not
the old kernel.

A while ago I played with the nr_cpus=N code to have it bring
all the CPUs far enough online to get the machine check initialization
done, then any extras above "N" just go back offline again.
But I never got this to work reliably.

-Tony

[1] older == all released ones, at the moment.
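
To make the CR4.MCE requirement concrete, here is a small userspace sketch
that checks the MCE enable bit (bit 6 of CR4) for a set of CPUs. The helper
names and the idea of collecting per-CPU CR4 values in an array are
assumptions made for illustration. The value 0x407e0 is the CR4 printed in
the dmesg attached later in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CR4 bit 6: machine-check enable. */
#define X86_CR4_MCE (1ULL << 6)

/* True if machine-check exceptions are enabled in this CR4 value. */
static bool cr4_mce_enabled(uint64_t cr4)
{
	return (cr4 & X86_CR4_MCE) != 0;
}

/*
 * Hypothetical helper: given one CR4 value per CPU, return the index of
 * the first CPU that has CR4.MCE clear (and would therefore force a
 * reset when a machine check is broadcast), or -1 if all have it set.
 */
static int first_cpu_forcing_reset(const uint64_t *cr4_values, size_t ncpus)
{
	assert(cr4_values != NULL || ncpus == 0);

	for (size_t i = 0; i < ncpus; i++) {
		if (!cr4_mce_enabled(cr4_values[i]))
			return (int)i;
	}
	return -1;
}
```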

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-01-23 17:40         ` Luck, Tony
@ 2017-01-23 17:51           ` Borislav Petkov
  -1 siblings, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2017-01-23 17:51 UTC (permalink / raw)
  To: Luck, Tony
  Cc: xlpang, x86, linux-kernel, kexec, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

Hey Tony,

a "welcome back" is in order? :-)

On Mon, Jan 23, 2017 at 09:40:09AM -0800, Luck, Tony wrote:
> If the system had experienced some memory corruption, but
> recovered ... then there would be some pages sitting around
> that the old kernel had marked as POISON and stopped using.
> The kexec'd kernel doesn't know about these, so may touch that
> memory while taking a crash dump ...

Hmm, pass a list of poisoned pages to the kdump kernel so that it
does not touch them. Looks like there's already functionality for that:

"makedumpfile can exclude the following types of pages while copying
VMCORE to DUMPFILE, and a user can choose which type of pages will be
excluded.

- Pages filled with zero
- Cache pages
- User process data pages
- Free pages"

 (there is a makedumpfile manpage somewhere)

And apparently crash knows about poisoned pages and handles them:

static int __init crash_save_vmcoreinfo_init(void)
{
	...
#ifdef CONFIG_MEMORY_FAILURE
        VMCOREINFO_NUMBER(PG_hwpoison);
#endif

so if that works, the kexeced kernel should know about that list.

> and then you have a broadcast machine check (on older[1] Intel CPUs
> that don't support local machine check).

Right.

> This is hard to work around. You really need all the CPUs to have set
> CR4.MCE=1 (if any didn't, then they will force a reset when they see
> the machine check). Also you need to make sure that they jump to the
> copy of do_machine_check() in the new kernel, not the old kernel.

Doesn't matter, right? The new copy is as clueless as the old one about
those MCEs.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
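
As a rough illustration of how a dump tool could use the exported
PG_hwpoison number, the sketch below reads the "NUMBER(PG_hwpoison)=" entry
from vmcoreinfo text and tests that bit in a page's flags word. Both helper
names are hypothetical, the bit value used in any example input is made up,
and real tools may well handle this differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find "NUMBER(PG_hwpoison)=<n>" in vmcoreinfo text
 * and return <n>, or -1 if the entry is missing (for example when the
 * first kernel was built without CONFIG_MEMORY_FAILURE).
 */
static long parse_pg_hwpoison(const char *vmcoreinfo)
{
	static const char key[] = "NUMBER(PG_hwpoison)=";
	const char *p;
	long bit;

	assert(vmcoreinfo != NULL);

	p = strstr(vmcoreinfo, key);
	if (!p)
		return -1;
	if (sscanf(p + sizeof(key) - 1, "%ld", &bit) != 1)
		return -1;
	return bit;
}

/* True if the given flags word has the hwpoison bit set. */
static bool page_is_hwpoisoned(uint64_t page_flags, long hwpoison_bit)
{
	if (hwpoison_bit < 0 || hwpoison_bit >= 64)
		return false;	/* bit number unknown: nothing to test */
	return ((page_flags >> hwpoison_bit) & 1) != 0;
}
```

A dump loop built on this would skip any page for which
page_is_hwpoisoned() returns true rather than read its contents.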

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-01-23 17:51           ` Borislav Petkov
@ 2017-01-23 18:01             ` Luck, Tony
  -1 siblings, 0 replies; 48+ messages in thread
From: Luck, Tony @ 2017-01-23 18:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: xlpang, x86, linux-kernel, kexec, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On Mon, Jan 23, 2017 at 06:51:30PM +0100, Borislav Petkov wrote:
> Hey Tony,
> 
> a "welcome back" is in order? :-)

Yes - first day back today. Lots of catching up to do.

> And apparently crash knows about poisoned pages and handles them:
> 
> static int __init crash_save_vmcoreinfo_init(void)
> {
> 	...
> #ifdef CONFIG_MEMORY_FAILURE
>         VMCOREINFO_NUMBER(PG_hwpoison);
> #endif
> 
> so if that works, the kexeced kernel should know about that list.

Oh good ... it is smarter than I thought.

> Doesn't matter, right? The new copy is as clueless as the old one about
> those MCEs.

If things are well enough initialized that we don't reset, and
get to do_machine_check(), then this code from Ashok:

        /* If this CPU is offline, just bail out. */
        if (cpu_is_offline(smp_processor_id())) {
                u64 mcgstatus;

                mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
                if (mcgstatus & MCG_STATUS_RIPV) {
                        mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
                        return;
                }
        }

will ignore the machine check on the other cpus ... assuming
that "cpu_is_offline(smp_processor_id())" does the right thing
in the kexec case where this is an "old" cpu that isn't online
in the new kernel.

-Tony

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-01-23 18:01             ` Luck, Tony
@ 2017-01-23 18:14               ` Borislav Petkov
  -1 siblings, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2017-01-23 18:14 UTC (permalink / raw)
  To: Luck, Tony
  Cc: xlpang, x86, linux-kernel, kexec, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On Mon, Jan 23, 2017 at 10:01:53AM -0800, Luck, Tony wrote:
> will ignore the machine check on the other cpus ... assuming
> that "cpu_is_offline(smp_processor_id())" does the right thing
> in the kexec case where this is an "old" cpu that isn't online
> in the new kernel.

Nice. And kdump did do the dumping on one CPU, AFAIR. So we should be
good there.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-01-23 14:50       ` Borislav Petkov
@ 2017-01-24  1:27         ` Xunlei Pang
  -1 siblings, 0 replies; 48+ messages in thread
From: Xunlei Pang @ 2017-01-24  1:27 UTC (permalink / raw)
  To: Borislav Petkov, xlpang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

[-- Attachment #1: Type: text/plain, Size: 1707 bytes --]

On 01/23/2017 at 10:50 PM, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 09:35:53PM +0800, Xunlei Pang wrote:
>> One possible timing sequence would be:
>> 1st kernel running on multiple cpus panicked
>> then the crash dump code starts
>> the crash dump code stops the others cpus except the crashing one
>> 2nd kernel boots up on the crash cpu with "nr_cpus=1"
>> some broadcasted mce comes on some cpu amongst the other cpus(not the crashing cpu)
> Where does this broadcasted MCE come from?
>
> The crash dump code triggered it? Or it happened before the panic()?
>
> Are you talking about an *actual* sequence which you're experiencing on
> real hw or is this something hypothetical?
>

It occurred on real hardware when testing crash dump.

1) SysRq-c was injected for the test in 1st kernel
   [ 49.897279] SysRq : Trigger a crash
2) The 2nd kernel started for kdump
   [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.el7.x86_64 root=UUID=976a15c8-8cbe-44ad-bb91-23f9b18e8789 ro console=ttyS1,115200 nmi_watchdog=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug disable_cpu_apicid=0 elfcorehdr=869772K
3) An MCE came to the 1st kernel, timeout panic occurred, and rebooted the machine
    [    6.095706] Dazed and confused, but trying to continue  // message of the 1st kernel
    [   81.655507] Kernel panic - not syncing: Timeout synchronizing machine check over CPUs
    [   82.729324] Shutting down cpus with NMI
    [   82.774539] drm_kms_helper: panic occurred, switching back to text console
    [   82.782257] Rebooting in 10 seconds..

Please see the attached for the full log.

Regards,
Xunlei


[-- Attachment #2: dmesg.txt --]
[-- Type: text/plain, Size: 27414 bytes --]

[   49.897279] SysRq : Trigger a crash 
[   49.901218] BUG: unable to handle kernel NULL pointer dereference at           (null) 
[   49.909988] IP: [<ffffffff81397486>] sysrq_handle_crash+0x16/0x20 
[   49.916805] PGD 868add067 PUD 867139067 PMD 0  
[   49.921805] Oops: 0002 [#1] SMP  
[   49.925432] Modules linked in: ipmi_devintf intel_powerclamp coretemp intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt sb_edac iTCO_vendor_support ntb mei_me pcspkr edac_core ioatdma lpc_ich i2c_i801 ipmi_si mei mfd_core shpchp dca ipmi_msghandler acpi_pad acpi_power_meter xfs sd_mod sr_mod crc_t10dif cdrom crct10dif_common usb_storage mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ata_generic ttm bnx2x pata_acpi mdio drm ata_piix ptp libata i2c_core pps_core libcrc32c 
[   49.984994] CPU: 9 PID: 9463 Comm: do-test.sh Not tainted 3.10.0-229.el7.x86_64 #1 
[   49.993456] Hardware name: NEC Express5800/B120d-h [N8400-126Y]/G7LDV, BIOS 4.6.2013 10/24/2012 
[   50.003164] task: ffff880433700000 ti: ffff8808653b8000 task.ti: ffff8808653b8000 
[   50.011514] RIP: 0010:[<ffffffff81397486>]  [<ffffffff81397486>] sysrq_handle_crash+0x16/0x20 
[   50.021045] RSP: 0018:ffff8808653bbe80  EFLAGS: 00010046 
[   50.026976] RAX: 000000000000000f RBX: ffffffff819c18a0 RCX: 0000000000000000 
[   50.034939] RDX: 0000000000000000 RSI: ffff88087fc2d488 RDI: 0000000000000063 
[   50.042908] RBP: ffff8808653bbe80 R08: 0000000000000092 R09: 0000000000000608 
[   50.050870] R10: 0000000000000607 R11: 0000000000000003 R12: 0000000000000063 
[   50.058837] R13: 0000000000000246 R14: 0000000000000007 R15: 0000000000000000 
[   50.066799] FS:  00007f0faaf54740(0000) GS:ffff88087fc20000(0000) knlGS:0000000000000000 
[   50.075828] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[   50.082244] CR2: 0000000000000000 CR3: 0000000866d07000 CR4: 00000000000407e0 
[   50.090212] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[   50.098173] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
[   50.106133] Stack: 
[   50.108388]  ffff8808653bbeb8 ffffffff81397c32 0000000000000002 00007f0faaf58000 
[   50.116671]  ffff8808653bbf48 0000000000000002 0000000000000000 ffff8808653bbed0 
[   50.124963]  ffffffff8139810f ffff8804674a6540 ffff8808653bbef0 ffffffff8122de0d 
[   50.133257] Call Trace: 
[   50.135993]  [<ffffffff81397c32>] __handle_sysrq+0xa2/0x170 
[   50.142219]  [<ffffffff8139810f>] write_sysrq_trigger+0x2f/0x40 
[   50.148841]  [<ffffffff8122de0d>] proc_reg_write+0x3] Code: eb 9b 45 01 f4 45 39 65 34 75 e5 4c 89 ef e8 e2 f7 ff ff eb db 66 66 66 66 90 55 c7 05 50 d7 59 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 31 c0 c7 05 ce
[   50.194758] RIP  [<ffffffff81397486>] sysrq_handle_crash+0x16/0x20 
[   50.201669]  RSP <ffff8808653bbe80> 
[   50.205558] CR2: 0000000000000000 
[    0.000000] Initializing cgroup subsys cpuset 
[    0.000000] Initializing cgroup subsys cpu 
[    0.000000] Initializing cgroup subsys cpuacct 
[    0.000000] Linux version 3.10.0-229.el7.x86_64 (mockbuild@x86-035.build.eng.bos.redhat.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC) ) #1 SMP Thu Jan 29 18:37:38 EST 2015 
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.el7.x86_64 root=UUID=976a15c8-8cbe-44ad-bb91-23f9b18e8789 ro console=ttyS1,115200 nmi_watchdog=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug disable_cpu_apicid=0 elfcorehdr=869772K 
[    0.000000] e820: BIOS-provided physical RAM map: 
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] reserved 
[    0.000000] BIOS-e820: [mem 0x0000000000001000-0x0000000000099fff] usable 
[    0.000000] BIOS-e820: [mem 0x000000000009a000-0x000000000009ffff] reserved 
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved 
[    0.000000] BIOS-e820: [mem 0x000000002b000000-0x0000000035162fff] usable 
[    0.000000] BIOS-e820: [mem 0x000000007cf6d000-0x000000007d08cfff] ACPI NVS 
[    0.000000] BIOS-e820: [mem 0x000000007d08d000-0x000000007d109fff] ACPI data 
[    0.000000] BIOS-e820: [mem 0x000000007d10a000-0x000000007de92fff] reserved 
[    0.000000] BIOS-e820: [mem 0x000000007de93000-0x000000007de9bfff] ACPI NVS 
[    0.000000] BIOS-e820: [mem 0x000000007de9c000-0x000000007df3efff] reserved 
[    0.000000] BIOS-e820: [mem 0x000000007df3f000-0x000000007e393fff] ACPI NVS 
[    0.000000] BIOS-e820: [mem 0x000000007e394000-0x000000007ed9bfff] reserved 
[    0.000000] BIOS-e820: [mem 0x000000007ed9c000-0x000000007f7a7fff] ACPI NVS 
[    0.000000] BIOS-e820: [mem 0x000000007f7a8000-0x000000007f7fffff] reserved 
[    0.000000] BIOS-e820: [mem 0x0000000080000000-0x000000008fffffff] reserved 
[    0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed3ffff] reserved 
[    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved 
[    0.000000] NX (Execute Disable) protection: active 
[    0.000000] SMBIOS 2.7 present. 
[    0.000000] No AGP bridge found 
[    0.000000] e820: last_pfn = 0x35163 max_arch_pfn = 0x400000000 
[    0.000000] x86 PAT enabled: cpu 0, old 0x7010600070106, new 0x7010600070106 
[    0.000000] x2apic enabled by BIOS, switching to x2apic ops 
[    0.000000] found SMP MP-table at [mem 0x000fce70-0x000fce7f] mapped at [ffff8800000fce70] 
[    0.000000] Using GB pages for direct mapping 
[    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff] 
[    0.000000] init_memory_mapping: [mem 0x34e00000-0x34ffffff] 
[    0.000000] init_memory_mapping: [mem 0x34000000-0x34dfffff] 
[    0.000000] init_memory_mapping: [mem 0x2b000000-0x33ffffff] 
[    0.000000] init_memory_mapping: [mem 0x35000000-0x35162fff] 
[    0.000000] RAMDISK: [mem 0x31f7a000-0x32ffffff] 
[    0.000000] ACPI: RSDP 00000000000f0450 00024 (v02 NEC   ) 
[    0.000000] ACPI: XSDT 000000007d08d080 00084 (v01 NEC    SVWSPD21 01072009 AMI  00010013) 
[    0.000000] ACPI: FACP 000000007d098570 000F4 (v04 NEC    SVWSPD21 01072009 AMI  00010013) 
[    0.000000] ACPI: DSDT 000000007d08d198 0B3D3 (v02 NEC    SVWSPD21 00000015 INTL 20091112) 
[    0.000000] ACPI: FACS 000000007f7a5f80 00040 
[    0.000000] ACPI: APIC 000000007d098668 00224 (v03 NEC    SVWSPD21 01072009 AMI  00010013) 
[    0.000000] ACPI: MCFG 000000007d098890 0003C (v01 NEC    SVWSPD21 01072009 MSFT 00000097) 
[    0.000000] ACPI: SRAT 000000007d0988d0 004B0 (v01 NEC    SVWSPD21 00000001 AMI. 00000000) 
[    0.000000] ACPI: SLIT 000000007d098d80 00030 (v01 NEC    SVWSPD21 00000000 AMI. 00000000) 
[    0.000000] ACPI: HPET 000000007d098db0 00038 (v01 NEC    SVWSPD21 01072009 AMI. 00000005) 
[    0.000000] ACPI: PRAD 000000007d098de8 000BE (v02 PRADID  PRADTID 00000001 MSFT 03000001) 
[    0.000000] ACPI: SSDT 000000007d098ea8 70104 (v02 NEC    SVWSPD21 00004000 INTL 20090903) 
[    0.000000] ACPI: SLIC 000000007d108fb0 00176 (v01 NEC    SVWSPD21 00000000 CAS  00000001) 
[    0.000000] ACPI: SPCR 000000007d109128 00050 (v01 NEC    SVWSPD21 01072009 AMI. 00000005) 
[    0.000000] ACPI: DMAR 000000007d109178 00138 (v01 NEC    SVWSPD21 00000001 INTL 00000001) 
[    0.000000] ACPI: BERT 000000007d1092b0 00030 (v01 NEC    SVWSPD21 00000000      00000000) 
[    0.000000] Setting APIC routing to cluster x2apic. 
[    0.000000] NUMA turned off 
[    0.000000] Faking a node at [mem 0x0000000000000000-0x0000000035162fff] 
[    0.000000] Initmem setup node 0 [mem 0x00000000-0x35162fff] 
[    0.000000]   NODE_DATA [mem 0x3513c000-0x35162fff] 
[    0.000000] Zone ranges: 
[    0.000000]   DMA      [mem 0x00001000-0x00ffffff] 
[    0.000000]   DMA32    [mem 0x01000000-0xffffffff] 
[    0.000000]   Normal   empty 
[    0.000000] Movable zone start for each node 
[    0.000000] Early memory node ranges 
[    0.000000]   node   0: [mem 0x00001000-0x00099fff] 
[    0.000000]   node   0: [mem 0x2b000000-0x35162fff] 
[    0.000000] ACPI: PM-Timer IO Port: 0x408 
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) 
[    0.000000] ACPI: Disabling requested cpu. Processor 0/0x0 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 1/0x2 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 2/0x4 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 3/0x6 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 4/0x8 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 5/0xa ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0c] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 6/0xc ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0e] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 7/0xe ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x10] lapic_id[0x20] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 8/0x20 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x12] lapic_id[0x22] enabled) 
[    0.000000] ACPI: LAPIC (acpi_id[0x14] lapic_id[0x24] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 10/0x24 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x16] lapic_id[0x26] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 11/0x26 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x18] lapic_id[0x28] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 12/0x28 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x1a] lapic_id[0x2a] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 13/0x2a ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x1c] lapic_id[0x2c] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 14/0x2c ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x1e] lapic_id[0x2e] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 15/0x2e ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 16/0x1 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 17/0x3 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 18/0x5 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 19/0x7 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 20/0x9 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 21/0xb ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0d] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 22/0xd ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0f] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 23/0xf ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x11] lapic_id[0x21] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 24/0x21 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x13] lapic_id[0x23] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 25/0x23 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x15] lapic_id[0x25] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 26/0x25 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x17] lapic_id[0x27] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 27/0x27 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x19] lapic_id[0x29] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 28/0x29 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x1b] lapic_id[0x2b] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 29/0x2b ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x1d] lapic_id[0x2d] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 30/0x2d ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x1f] lapic_id[0x2f] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 31/0x2f ignored. 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0a] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0c] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0e] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x10] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x12] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x14] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x16] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x18] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1a] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1c] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1e] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x09] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0b] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0d] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0f] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x11] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x13] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x15] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x17] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x19] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1b] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1d] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1f] high edge lint[0x1]) 
[    0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0]) 
[    0.000000] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23 
[    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec01000] gsi_base[24]) 
[    0.000000] IOAPIC[1]: apic_id 2, version 32, address 0xfec01000, GSI 24-47 
[    0.000000] ACPI: IOAPIC (id[0x03] address[0xfec40000] gsi_base[48]) 
[    0.000000] IOAPIC[2]: apic_id 3, version 32, address 0xfec40000, GSI 48-71 
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) 
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) 
[    0.000000] Using ACPI (MADT) for SMP configuration information 
[    0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000 
[    0.000000] smpboot: 32 Processors exceeds NR_CPUS limit of 1 
[    0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs 
[    0.000000] PM: Registered nosave memory: [mem 0x0009a000-0x0009ffff] 
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000dffff] 
[    0.000000] PM: Registered nosave memory: [mem 0x000e0000-0x000fffff] 
[    0.000000] PM: Registered nosave memory: [mem 0x00100000-0x2affffff] 
[    0.000000] e820: [mem 0x90000000-0xfed1bfff] available for PCI devices 
[    0.000000] Booting paravirtualized kernel on bare hardware 
[    0.000000] setup_percpu: NR_CPUS:5120 nr_cpumask_bits:1 nr_cpu_ids:1 nr_node_ids:1 
[    0.000000] PERCPU: Embedded 28 pages/cpu @ffff880034e00000 s82752 r8192 d23744 u2097152 
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 40798 
[    0.000000] Policy zone: DMA32 
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.el7.x86_64 root=UUID=976a15c8-8cbe-44ad-bb91-23f9b18e8789 ro console=ttyS1,115200 nmi_watchdog=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug disable_cpu_apicid=0 elfcorehdr=869772K 
[    0.000000] Misrouted IRQ fixup and polling support enabled 
[    0.000000] This may significantly impact system performance 
[    0.000000] Disabling memory control group subsystem 
[    0.000000] PID hash table entries: 1024 (order: 1, 8192 bytes) 
[    0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340 
[    0.000000] Checking aperture... 
[    0.000000] No AGP bridge found 
[    0.000000] Memory: 126944k/869772k available (6241k kernel code, 703900k absent, 38928k reserved, 4181k data, 1604k init) 
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 
[    0.000000] Hierarchical RCU implementation. 
[    0.000000] 	RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=1. 
[    0.000000] 	Experimental no-CBs for all CPUs 
[    0.000000] 	Experimental no-CBs CPUs: 0. 
[    0.000000] NR_IRQS:327936 nr_irqs:256 16 
[    0.000000] Spurious LAPIC timer interrupt on cpu 0 
[    0.000000] Console: colour dummy device 80x25 
[    0.000000] console [ttyS1] enabled 
[    0.001000] tsc: Fast TSC calibration using PIT 
[    0.002000] tsc: Detected 2000.018 MHz processor 
[    0.000002] Calibrating delay loop (skipped), value calculated using timer frequency.. 4000.03 BogoMIPS (lpj=2000018) 
[    0.011888] pid_max: default: 32768 minimum: 301 
[    0.017077] Security Framework initialized 
[    0.021662] SELinux:  Initializing. 
[    0.025640] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes) 
[    0.033499] Inode-cache hash table entries: 16384 (order: 5, 131072 bytes) 
[    0.041197] Mount-cache hash table entries: 4096 
[    0.046576] Initializing cgroup subsys memory 
[    0.051455] Initializing cgroup subsys devices 
[    0.056422] Initializing cgroup subsys freezer 
[    0.061390] Initializing cgroup subsys net_cls 
[    0.066364] Initializing cgroup subsys blkio 
[    0.071145] Initializing cgroup subsys perf_event 
[    0.076404] Initializing cgroup subsys hugetlb 
[    0.081408] CPU: Physical Processor ID: 1 
[    0.085890] CPU: Processor Core ID: 1 
[    0.089991] ENERGY_PERF_BIAS: Set to 'normal', was 'performance' 
[    0.089991] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8) 
[    0.104561] Last level iTLB entries: 4KB 512, 2MB 0, 4MB 0 
[    0.104561] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32 
[    0.104561] tlb_flushall_shift: 6 
[    0.126986] Freeing SMP alternatives: 24k freed 
[    0.133595] ACPI: Core revision 20130517 
[    0.154363] ACPI: All ACPI Tables successfully acquired 
[    0.160257] ftrace: allocating 23909 entries in 94 pages 
[    0.179567] dmar: Host address width 46 
[    0.183858] dmar: DRHD base: 0x000000fbffe000 flags: 0x0 
[    0.189795] dmar: IOMMU 0: reg_base_addr fbffe000 ver 1:0 cap d2078c106f0462 ecap f020fe 
[    0.198842] dmar: DRHD base: 0x000000dfffc000 flags: 0x1 
[    0.204782] dmar: IOMMU 1: reg_base_addr dfffc000 ver 1:0 cap d2078c106f0462 ecap f020fe 
[    0.213823] dmar: RMRR base: 0x0000007dea3000 end: 0x0000007dedafff 
[    0.220834] dmar: ATSR flags: 0x0 
[    0.838008] pci 0000:00:02.2: PCI bridge to [bus 01]
[    0.843617] pci 0000:00:03.0: PCI bridge to [bus 20-3c] 
[    0.849517] pci 0000:00:11.0: PCI bridge to [bus 50-5f] 
[    0.855410] pci 0000:00:1c.0: PCI bridge to [bus 60-61] 
[    0.863257] pci 0000:00:1c.7: PCI bridge to [bus 62-67] 
[    0.869171] pci 0000:00:1e.0: PCI bridge to [bus 6a] (subtractive decode) 
[    0.876828] acpi PNP0A08:00: Disabling ASPM (FADT indicates it is unsupported) 
[    0.885139] ACPI: PCI Root Bridge [UNC0] (domain 0000 [bus 7f]) 
[    0.891750] acpi PNP0A03:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] 
[    0.900880] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM 
[    0.908299] PCI host bridge to bus 0000:7f 
[    0.912871] pci_bus 0000:7f: root bus resource [bus 7f] 
[    0.920945] ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 80-fe]) 
[    0.927846] acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] 
[    0.937106] acpi PNP0A08:01: _OSC: platform does not support [AER] 
[    0.944123] acpi PNP0A08:01: _OSC: OS now controls [PCIeHotplug PME PCIeCapability] 
[    0.952860] PCI host bridge to bus 0000:80 
[    0.957433] pci_bus 0000:80: root bus resource [bus 80-fe] 
[    0.963555] pci_bus 0000:80: root bus resource [io  0xa000-0xffff] 
[    0.970451] pci_bus 0000:80: root bus resource [mem 0xe0000000-0xfbffffff] 
[    0.979463] pci 0000:80:00.0: PCI bridge to [bus 81-8f] 
[    0.985306] acpi PNP0A08:01: Disabling ASPM (FADT indicates it is unsupported) 
[    0.993424] ACPI: PCI Root Bridge [UNC1] (domain 0000 [bus ff]) 
[    1.000033] acpi PNP0A03:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] 
[    1.009155] acpi PNP0A03:01: _OSC failed (AE_NOT_FOUND); disabling ASPM 
[    1.016580] PCI host bridge to bus 0000:ff 
[    1.021153] pci_bus 0000:ff: root bus resource [bus ff] 
[    1.029013] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 *5 6 7 12 14 15), disabled. 
[    1.037527] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 *7 12 14 15), disabled. 
[    1.046043] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 *4 5 6 12 14 15), disabled. 
[    1.054354] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 *5 6 12 14 15), disabled. 
[    1.062659] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 6 7 10 12 14 15) *0, disabled. 
[    1.071462] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 6 7 10 12 14 15) *0, disabled. 
[    1.080266] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 6 7 *10 12 14 15), disabled. 
[    1.088873] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 6 7 *10 12 14 15), disabled. 
[    1.098586] acpi LNXCPU:12: BIOS reported wrong ACPI id 0 for the processor 
[    1.107217] ACPI: Enabled 3 GPEs in block 00 to 3F 
[    1.112688] vgaarb: device added: PCI:0000:62:00.0,decodes=io+mem,owns=io+mem,locks=none 
[    1.121727] vgaarb: loaded 
[    1.124745] vgaarb: bridge control possible 0000:62:00.0 
[    1.130752] SCSI subsystem initialized 
[    1.134961] ACPI: bus type USB registered 
[    1.139448] usbcore: registered new interface driver usbfs 
[    1.145585] usbcore: registered new interface driver hub 
[    1.151536] usbcore: registered new device driver usb 
[    1.157261] PCI: Using ACPI for IRQ routing 
[    1.166361] NetLabel: Initializing 
[    1.170164] NetLabel:  domain hash size = 128 
[    1.175024] NetLabel:  protocols = UNLABELED CIPSOv4 
[    1.180585] NetLabel:  unlabeled traffic allowed by default 
[    1.186864] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0, 0, 0, 0, 0 
[    1.193861] hpet0: 8 comparators, 64-bit 14.318180 MHz counter 
[    1.202396] Switching to clocksource hpet 
[    1.211432] pnp: PnP ACPI init 
[    1.214855] ACPI: bus type PNP registered 
[    1.219416] system 00:00: [mem 0xfc000000-0xfcffffff] has been reserved 
[    1.226803] system 00:00: [mem 0xfd000000-0xfdffffff] has been reserved 
[    1.234190] system 00:00: [mem 0xfe000000-0xfeafffff] has been reserved 
[    1.241580] system 00:00: [mem 0xfeb00000-0xfebfffff] has been reserved 
[    1.248970] system 00:00: [mem 0xfed00400-0xfed3ffff] could not be reserved 
[    1.256743] system 00:00: [mew full-speed USB device number 3 using ehci-pci 
[     4.642333] b53269] bnx2x 0000:02:00.2: part number 394D4342-31383735-30315430-473030 
[    4.754097] usb 2-1.1: New USB device found, idVendor=046b, idProduct=ff10 
[    4.761776] usb 2-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 
[    4.769944] usb 2-1.1: Product: Virtual Keyboard and Mouse 
[    4.776078] usb 2-1.1: Manufacturer: American Megatrends Inc. 
[    4.784918] bnx2x 0000:02:00.3: msix capability found 
[    4.794890] input: American Megatrends Inc. Virtual Keyboard and Mouse as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.1/2-1.1:1.0/input/input2 
[    4.810287] bnx2x 0000:02:00.3: part number 394D4342-31383735-30315430-473030 
[    4.832949] hid-generic 0003:046B:FF10.0001: input,hidraw0: USB HID v1.10 Keyboard [American Megatrends Inc. Virtual Keyboard and Mouse] on usb-0000:00:1d.0-1.1/input0 
[    4.868726] input: American Megatrends Inc. Virtual Keyboard and Mouse as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.1/2-1.1:1.1/input/input3 
[    4.897947] hid-generic 0003:046B:FF10.0002: input,hidraw1: USB HID v1.10 Mouse [American Megatrends Inc. Virtual Keyboard and Mouse] on usb-0000:00:1d.0-1.1/input1 
[    4.915125] bnx2x 0000:02:00.4: msix capability found 
[    4.925275] bnx2x 0000:02:00.4: part number 394D4342-31383735-30315430-473030 
[    5.020362] usb 2-1.3: new high-speed USB device number 4 using ehci-pci 
[    5.028219] bnx2x 0000:02:00.5: msix capability found 
[    5.038287] bnx2x 0000:02:00.5: part number 394D4342-31383735-30315430-473030 
[     5.135374] b0.6: msix capability found 
[     5.146276] b[    5.191222] usb 2-1.3: New USB device found, idVendor=046b, idProduct=ff92 
[    5.198902] usb 2-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=4 
[    5.207062] usb 2-1.3: Product: Composite Device 
[    5.212220] usb 2-1.3: Manufacturer: American Megatrends Inc. 
[    5.218638] usb 2-1.3: SerialNumber: AAAABBBBCCCC4 
[    5.279368] bnx2x 0000:02:00.7: msix capability found 
[    5.292279] bnx2x 0000:02:00.7: part number 394D4342-31383735-30315430-473030 
[    5.430381] bnx2x 0000:01:00.0: msix capability found 
[    5.441297] bnx2x 0000:01:00.0: part number 394D4342-31383735-30315430-473030 
[    5.504653] bnx2x 0000:01:00.1: msix capability found 
[    5.515280] bnx2x 0000:01:00.1: part number 394D4342-31383735-30315430-473030 
[     6.082783] Upower saving mode enabled? 
[    6.095706] Dazed and confused, but trying to continue 
[   81.655507] Kernel panic - not syncing: Timeout synchronizing machine check over CPUs 
[   82.729324] Shutting down cpus with NMI 
[   82.774539] drm_kms_helper: panic occurred, switching back to text console 
[   82.782257] Rebooting in 10 seconds.. 
[   92.787401] ACPI MEMORY or I/O RESET_REG. 
[   95.233944] ACPI MEMORY or I/O RESET_REG.



* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
@ 2017-01-24  1:27         ` Xunlei Pang
From: Xunlei Pang @ 2017-01-24  1:27 UTC (permalink / raw)
  To: Borislav Petkov, xlpang
  Cc: Prarit Bhargava, Kiyoshi Ueda, Tony Luck, x86, kexec,
	linux-kernel, Ingo Molnar, Junichi Nomura, Naoya Horiguchi,
	Dave Young

[-- Attachment #1: Type: text/plain, Size: 1707 bytes --]

On 01/23/2017 at 10:50 PM, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 09:35:53PM +0800, Xunlei Pang wrote:
>> One possible timing sequence would be:
>> 1st kernel running on multiple cpus panicked
>> then the crash dump code starts
>> the crash dump code stops the others cpus except the crashing one
>> 2nd kernel boots up on the crash cpu with "nr_cpus=1"
>> some broadcasted mce comes on some cpu amongst the other cpus(not the crashing cpu)
> Where does this broadcasted MCE come from?
>
> The crash dump code triggered it? Or it happened before the panic()?
>
> Are you talking about an *actual* sequence which you're experiencing on
> real hw or is this something hypothetical?
>

It occurred on real hardware when testing crash dump.

1) SysRq-c was injected for the test in the 1st kernel
   [ 49.897279] SysRq : Trigger a crash
2) The 2nd kernel started for kdump
   [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.el7.x86_64 root=UUID=976a15c8-8cbe-44ad-bb91-23f9b18e8789 ro console=ttyS1,115200 nmi_watchdog=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug disable_cpu_apicid=0 elfcorehdr=869772K
3) An MCE arrived in the 1st kernel, the timeout panic occurred, and the machine rebooted
    [    6.095706] Dazed and confused, but trying to continue  // message of the 1st kernel
    [   81.655507] Kernel panic - not syncing: Timeout synchronizing machine check over CPUs
    [   82.729324] Shutting down cpus with NMI
    [   82.774539] drm_kms_helper: panic occurred, switching back to text console
    [   82.782257] Rebooting in 10 seconds..
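
To make the intended behaviour concrete, here is a rough user-space model of
the decision the patch is after. It is only a sketch, not the kernel code:
struct mce_model, its "panicked" flag and mce_model_handle() are names made up
for illustration, and clearing MCG_STATUS is modeled by zeroing a field rather
than writing the MSR. The bit positions are the ones the kernel uses for
MCG_STATUS_MCIP and MCG_STATUS_LMCES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IA32_MCG_STATUS bits (same positions as the kernel's MCG_STATUS_* macros). */
#define MCG_STATUS_MCIP  (1ULL << 2)  /* machine check in progress */
#define MCG_STATUS_LMCES (1ULL << 3)  /* MCE was signaled to this CPU only */

enum mce_action {
	MCE_ACT_HANDLE,      /* normal path, including cross-CPU synchronization */
	MCE_ACT_CLEAR_ONLY,  /* just clear MCG_STATUS and return */
};

/* Hypothetical per-CPU view; the field names are illustrative only. */
struct mce_model {
	bool is_intel;       /* stands in for m.cpuvendor == X86_VENDOR_INTEL */
	bool panicked;       /* assumed flag: the 1st kernel has already panicked */
	uint64_t mcgstatus;  /* modeled copy of IA32_MCG_STATUS */
};

static enum mce_action mce_model_handle(struct mce_model *m)
{
	bool lmce = false;

	assert(m != NULL);

	/* Local MCEs are only reported on Intel, so only check the bit there. */
	if (m->is_intel)
		lmce = (m->mcgstatus & MCG_STATUS_LMCES) != 0;

	/*
	 * Broadcast MCE after panic: stay quiet. Skipping the synchronization
	 * avoids the "Timeout synchronizing machine check over CPUs" panic
	 * that resets the machine while kdump is running.
	 */
	if (m->is_intel && !lmce && m->panicked) {
		m->mcgstatus = 0;
		return MCE_ACT_CLEAR_ONLY;
	}

	return MCE_ACT_HANDLE;
}
```

In this model a CPU left behind in the 1st kernel (panicked set, LMCES clear)
takes the MCE_ACT_CLEAR_ONLY branch, which corresponds to step 3) above no
longer ending in the timeout panic. A local MCE, or any MCE before the panic,
still goes through the normal handler.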

Please see the attachment for the full log.

Regards,
Xunlei


[-- Attachment #2: dmesg.txt --]
[-- Type: text/plain, Size: 27414 bytes --]

[   49.897279] SysRq : Trigger a crash 
[   49.901218] BUG: unable to handle kernel NULL pointer dereference at           (null) 
[   49.909988] IP: [<ffffffff81397486>] sysrq_handle_crash+0x16/0x20 
[   49.916805] PGD 868add067 PUD 867139067 PMD 0  
[   49.921805] Oops: 0002 [#1] SMP  
[   49.925432] Modules linked in: ipmi_devintf intel_powerclamp coretemp intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt sb_edac iTCO_vendor_support ntb mei_me pcspkr edac_core ioatdma lpc_ich i2c_i801 ipmi_si mei mfd_core shpchp dca ipmi_msghandler acpi_pad acpi_power_meter xfs sd_mod sr_mod crc_t10dif cdrom crct10dif_common usb_storage mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ata_generic ttm bnx2x pata_acpi mdio drm ata_piix ptp libata i2c_core pps_core libcrc32c 
[   49.984994] CPU: 9 PID: 9463 Comm: do-test.sh Not tainted 3.10.0-229.el7.x86_64 #1 
[   49.993456] Hardware name: NEC Express5800/B120d-h [N8400-126Y]/G7LDV, BIOS 4.6.2013 10/24/2012 
[   50.003164] task: ffff880433700000 ti: ffff8808653b8000 task.ti: ffff8808653b8000 
[   50.011514] RIP: 0010:[<ffffffff81397486>]  [<ffffffff81397486>] sysrq_handle_crash+0x16/0x20 
[   50.021045] RSP: 0018:ffff8808653bbe80  EFLAGS: 00010046 
[   50.026976] RAX: 000000000000000f RBX: ffffffff819c18a0 RCX: 0000000000000000 
[   50.034939] RDX: 0000000000000000 RSI: ffff88087fc2d488 RDI: 0000000000000063 
[   50.042908] RBP: ffff8808653bbe80 R08: 0000000000000092 R09: 0000000000000608 
[   50.050870] R10: 0000000000000607 R11: 0000000000000003 R12: 0000000000000063 
[   50.058837] R13: 0000000000000246 R14: 0000000000000007 R15: 0000000000000000 
[   50.066799] FS:  00007f0faaf54740(0000) GS:ffff88087fc20000(0000) knlGS:0000000000000000 
[   50.075828] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[   50.082244] CR2: 0000000000000000 CR3: 0000000866d07000 CR4: 00000000000407e0 
[   50.090212] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[   50.098173] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
[   50.106133] Stack: 
[   50.108388]  ffff8808653bbeb8 ffffffff81397c32 0000000000000002 00007f0faaf58000 
[   50.116671]  ffff8808653bbf48 0000000000000002 0000000000000000 ffff8808653bbed0 
[   50.124963]  ffffffff8139810f ffff8804674a6540 ffff8808653bbef0 ffffffff8122de0d 
[   50.133257] Call Trace: 
[   50.135993]  [<ffffffff81397c32>] __handle_sysrq+0xa2/0x170 
[   50.142219]  [<ffffffff8139810f>] write_sysrq_trigger+0x2f/0x40 
[   50.148841]  [<ffffffff8122de0d>] proc_reg_write+0x3] Code: eb 9b 45 01 f4 45 39 65 34 75 e5 4c 89 ef e8 e2 f7 ff ff eb db 66 66 66 66 90 55 c7 05 50 d7 59 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 31 c0 c7 05 ce
[   50.194758] RIP  [<ffffffff81397486>] sysrq_handle_crash+0x16/0x20 
[   50.201669]  RSP <ffff8808653bbe80> 
[   50.205558] CR2: 0000000000000000 
[    0.000000] Initializing cgroup subsys cpuset 
[    0.000000] Initializing cgroup subsys cpu 
[    0.000000] Initializing cgroup subsys cpuacct 
[    0.000000] Linux version 3.10.0-229.el7.x86_64 (mockbuild@x86-035.build.eng.bos.redhat.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC) ) #1 SMP Thu Jan 29 18:37:38 EST 2015 
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.el7.x86_64 root=UUID=976a15c8-8cbe-44ad-bb91-23f9b18e8789 ro console=ttyS1,115200 nmi_watchdog=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug disable_cpu_apicid=0 elfcorehdr=869772K 
[    0.000000] e820: BIOS-provided physical RAM map: 
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] reserved 
[    0.000000] BIOS-e820: [mem 0x0000000000001000-0x0000000000099fff] usable 
[    0.000000] BIOS-e820: [mem 0x000000000009a000-0x000000000009ffff] reserved 
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved 
[    0.000000] BIOS-e820: [mem 0x000000002b000000-0x0000000035162fff] usable 
[    0.000000] BIOS-e820: [mem 0x000000007cf6d000-0x000000007d08cfff] ACPI NVS 
[    0.000000] BIOS-e820: [mem 0x000000007d08d000-0x000000007d109fff] ACPI data 
[    0.000000] BIOS-e820: [mem 0x000000007d10a000-0x000000007de92fff] reserved 
[    0.000000] BIOS-e820: [mem 0x000000007de93000-0x000000007de9bfff] ACPI NVS 
[    0.000000] BIOS-e820: [mem 0x000000007de9c000-0x000000007df3efff] reserved 
[    0.000000] BIOS-e820: [mem 0x000000007df3f000-0x000000007e393fff] ACPI NVS 
[    0.000000] BIOS-e820: [mem 0x000000007e394000-0x000000007ed9bfff] reserved 
[    0.000000] BIOS-e820: [mem 0x000000007ed9c000-0x000000007f7a7fff] ACPI NVS 
[    0.000000] BIOS-e820: [mem 0x000000007f7a8000-0x000000007f7fffff] reserved 
[    0.000000] BIOS-e820: [mem 0x0000000080000000-0x000000008fffffff] reserved 
[    0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed3ffff] reserved 
[    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved 
[    0.000000] NX (Execute Disable) protection: active 
[    0.000000] SMBIOS 2.7 present. 
[    0.000000] No AGP bridge found 
[    0.000000] e820: last_pfn = 0x35163 max_arch_pfn = 0x400000000 
[    0.000000] x86 PAT enabled: cpu 0, old 0x7010600070106, new 0x7010600070106 
[    0.000000] x2apic enabled by BIOS, switching to x2apic ops 
[    0.000000] found SMP MP-table at [mem 0x000fce70-0x000fce7f] mapped at [ffff8800000fce70] 
[    0.000000] Using GB pages for direct mapping 
[    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff] 
[    0.000000] init_memory_mapping: [mem 0x34e00000-0x34ffffff] 
[    0.000000] init_memory_mapping: [mem 0x34000000-0x34dfffff] 
[    0.000000] init_memory_mapping: [mem 0x2b000000-0x33ffffff] 
[    0.000000] init_memory_mapping: [mem 0x35000000-0x35162fff] 
[    0.000000] RAMDISK: [mem 0x31f7a000-0x32ffffff] 
[    0.000000] ACPI: RSDP 00000000000f0450 00024 (v02 NEC   ) 
[    0.000000] ACPI: XSDT 000000007d08d080 00084 (v01 NEC    SVWSPD21 01072009 AMI  00010013) 
[    0.000000] ACPI: FACP 000000007d098570 000F4 (v04 NEC    SVWSPD21 01072009 AMI  00010013) 
[    0.000000] ACPI: DSDT 000000007d08d198 0B3D3 (v02 NEC    SVWSPD21 00000015 INTL 20091112) 
[    0.000000] ACPI: FACS 000000007f7a5f80 00040 
[    0.000000] ACPI: APIC 000000007d098668 00224 (v03 NEC    SVWSPD21 01072009 AMI  00010013) 
[    0.000000] ACPI: MCFG 000000007d098890 0003C (v01 NEC    SVWSPD21 01072009 MSFT 00000097) 
[    0.000000] ACPI: SRAT 000000007d0988d0 004B0 (v01 NEC    SVWSPD21 00000001 AMI. 00000000) 
[    0.000000] ACPI: SLIT 000000007d098d80 00030 (v01 NEC    SVWSPD21 00000000 AMI. 00000000) 
[    0.000000] ACPI: HPET 000000007d098db0 00038 (v01 NEC    SVWSPD21 01072009 AMI. 00000005) 
[    0.000000] ACPI: PRAD 000000007d098de8 000BE (v02 PRADID  PRADTID 00000001 MSFT 03000001) 
[    0.000000] ACPI: SSDT 000000007d098ea8 70104 (v02 NEC    SVWSPD21 00004000 INTL 20090903) 
[    0.000000] ACPI: SLIC 000000007d108fb0 00176 (v01 NEC    SVWSPD21 00000000 CAS  00000001) 
[    0.000000] ACPI: SPCR 000000007d109128 00050 (v01 NEC    SVWSPD21 01072009 AMI. 00000005) 
[    0.000000] ACPI: DMAR 000000007d109178 00138 (v01 NEC    SVWSPD21 00000001 INTL 00000001) 
[    0.000000] ACPI: BERT 000000007d1092b0 00030 (v01 NEC    SVWSPD21 00000000      00000000) 
[    0.000000] Setting APIC routing to cluster x2apic. 
[    0.000000] NUMA turned off 
[    0.000000] Faking a node at [mem 0x0000000000000000-0x0000000035162fff] 
[    0.000000] Initmem setup node 0 [mem 0x00000000-0x35162fff] 
[    0.000000]   NODE_DATA [mem 0x3513c000-0x35162fff] 
[    0.000000] Zone ranges: 
[    0.000000]   DMA      [mem 0x00001000-0x00ffffff] 
[    0.000000]   DMA32    [mem 0x01000000-0xffffffff] 
[    0.000000]   Normal   empty 
[    0.000000] Movable zone start for each node 
[    0.000000] Early memory node ranges 
[    0.000000]   node   0: [mem 0x00001000-0x00099fff] 
[    0.000000]   node   0: [mem 0x2b000000-0x35162fff] 
[    0.000000] ACPI: PM-Timer IO Port: 0x408 
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) 
[    0.000000] ACPI: Disabling requested cpu. Processor 0/0x0 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 1/0x2 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 2/0x4 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 3/0x6 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 4/0x8 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 5/0xa ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0c] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 6/0xc ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0e] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 7/0xe ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x10] lapic_id[0x20] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu.  Processor 8/0x20 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x12] lapic_id[0x22] enabled) 
[    0.000000] ACPI: LAPIC (acpi_id[0x14] lapic_id[0x24] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 10/0x24 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x16] lapic_id[0x26] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 11/0x26 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x18] lapic_id[0x28] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 12/0x28 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x1a] lapic_id[0x2a] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 13/0x2a ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x1c] lapic_id[0x2c] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 14/0x2c ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x1e] lapic_id[0x2e] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 15/0x2e ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 16/0x1 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 17/0x3 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 18/0x5 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 19/0x7 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 20/0x9 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 21/0xb ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0d] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 22/0xd ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0f] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 23/0xf ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x11] lapic_id[0x21] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 24/0x21 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x13] lapic_id[0x23] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 25/0x23 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x15] lapic_id[0x25] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 26/0x25 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x17] lapic_id[0x27] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 27/0x27 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x19] lapic_id[0x29] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 28/0x29 ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x1b] lapic_id[0x2b] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 29/0x2b ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x1d] lapic_id[0x2d] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 30/0x2d ignored. 
[    0.000000] ACPI: LAPIC (acpi_id[0x1f] lapic_id[0x2f] enabled) 
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 31/0x2f ignored. 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0a] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0c] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0e] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x10] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x12] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x14] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x16] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x18] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1a] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1c] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1e] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x09] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0b] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0d] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0f] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x11] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x13] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x15] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x17] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x19] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1b] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1d] high edge lint[0x1]) 
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x1f] high edge lint[0x1]) 
[    0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0]) 
[    0.000000] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23 
[    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec01000] gsi_base[24]) 
[    0.000000] IOAPIC[1]: apic_id 2, version 32, address 0xfec01000, GSI 24-47 
[    0.000000] ACPI: IOAPIC (id[0x03] address[0xfec40000] gsi_base[48]) 
[    0.000000] IOAPIC[2]: apic_id 3, version 32, address 0xfec40000, GSI 48-71 
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) 
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) 
[    0.000000] Using ACPI (MADT) for SMP configuration information 
[    0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000 
[    0.000000] smpboot: 32 Processors exceeds NR_CPUS limit of 1 
[    0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs 
[    0.000000] PM: Registered nosave memory: [mem 0x0009a000-0x0009ffff] 
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000dffff] 
[    0.000000] PM: Registered nosave memory: [mem 0x000e0000-0x000fffff] 
[    0.000000] PM: Registered nosave memory: [mem 0x00100000-0x2affffff] 
[    0.000000] e820: [mem 0x90000000-0xfed1bfff] available for PCI devices 
[    0.000000] Booting paravirtualized kernel on bare hardware 
[    0.000000] setup_percpu: NR_CPUS:5120 nr_cpumask_bits:1 nr_cpu_ids:1 nr_node_ids:1 
[    0.000000] PERCPU: Embedded 28 pages/cpu @ffff880034e00000 s82752 r8192 d23744 u2097152 
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 40798 
[    0.000000] Policy zone: DMA32 
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.el7.x86_64 root=UUID=976a15c8-8cbe-44ad-bb91-23f9b18e8789 ro console=ttyS1,115200 nmi_watchdog=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug disable_cpu_apicid=0 elfcorehdr=869772K 
[    0.000000] Misrouted IRQ fixup and polling support enabled 
[    0.000000] This may significantly impact system performance 
[    0.000000] Disabling memory control group subsystem 
[    0.000000] PID hash table entries: 1024 (order: 1, 8192 bytes) 
[    0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340 
[    0.000000] Checking aperture... 
[    0.000000] No AGP bridge found 
[    0.000000] Memory: 126944k/869772k available (6241k kernel code, 703900k absent, 38928k reserved, 4181k data, 1604k init) 
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 
[    0.000000] Hierarchical RCU implementation. 
[    0.000000] 	RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=1. 
[    0.000000] 	Experimental no-CBs for all CPUs 
[    0.000000] 	Experimental no-CBs CPUs: 0. 
[    0.000000] NR_IRQS:327936 nr_irqs:256 16 
[    0.000000] Spurious LAPIC timer interrupt on cpu 0 
[    0.000000] Console: colour dummy device 80x25 
[    0.000000] console [ttyS1] enabled 
[    0.001000] tsc: Fast TSC calibration using PIT 
[    0.002000] tsc: Detected 2000.018 MHz processor 
[    0.000002] Calibrating delay loop (skipped), value calculated using timer frequency.. 4000.03 BogoMIPS (lpj=2000018) 
[    0.011888] pid_max: default: 32768 minimum: 301 
[    0.017077] Security Framework initialized 
[    0.021662] SELinux:  Initializing. 
[    0.025640] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes) 
[    0.033499] Inode-cache hash table entries: 16384 (order: 5, 131072 bytes) 
[    0.041197] Mount-cache hash table entries: 4096 
[    0.046576] Initializing cgroup subsys memory 
[    0.051455] Initializing cgroup subsys devices 
[    0.056422] Initializing cgroup subsys freezer 
[    0.061390] Initializing cgroup subsys net_cls 
[    0.066364] Initializing cgroup subsys blkio 
[    0.071145] Initializing cgroup subsys perf_event 
[    0.076404] Initializing cgroup subsys hugetlb 
[    0.081408] CPU: Physical Processor ID: 1 
[    0.085890] CPU: Processor Core ID: 1 
[    0.089991] ENERGY_PERF_BIAS: Set to 'normal', was 'performance' 
[    0.089991] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8) 
[    0.104561] Last level iTLB entries: 4KB 512, 2MB 0, 4MB 0 
[    0.104561] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32 
[    0.104561] tlb_flushall_shift: 6 
[    0.126986] Freeing SMP alternatives: 24k freed 
[    0.133595] ACPI: Core revision 20130517 
[    0.154363] ACPI: All ACPI Tables successfully acquired 
[    0.160257] ftrace: allocating 23909 entries in 94 pages 
[    0.179567] dmar: Host address width 46 
[    0.183858] dmar: DRHD base: 0x000000fbffe000 flags: 0x0 
[    0.189795] dmar: IOMMU 0: reg_base_addr fbffe000 ver 1:0 cap d2078c106f0462 ecap f020fe 
[    0.198842] dmar: DRHD base: 0x000000dfffc000 flags: 0x1 
[    0.204782] dmar: IOMMU 1: reg_base_addr dfffc000 ver 1:0 cap d2078c106f0462 ecap f020fe 
[    0.213823] dmar: RMRR base: 0x0000007dea3000 end: 0x0000007dedafff 
[    0.220834] dmar: ATSR flags: 0x0 
[    0.838008] pci 0000:00:02.2: PCI bridge to [bus 01]
[    0.843617] pci 0000:00:03.0: PCI bridge to [bus 20-3c] 
[    0.849517] pci 0000:00:11.0: PCI bridge to [bus 50-5f] 
[    0.855410] pci 0000:00:1c.0: PCI bridge to [bus 60-61] 
[    0.863257] pci 0000:00:1c.7: PCI bridge to [bus 62-67] 
[    0.869171] pci 0000:00:1e.0: PCI bridge to [bus 6a] (subtractive decode) 
[    0.876828] acpi PNP0A08:00: Disabling ASPM (FADT indicates it is unsupported) 
[    0.885139] ACPI: PCI Root Bridge [UNC0] (domain 0000 [bus 7f]) 
[    0.891750] acpi PNP0A03:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] 
[    0.900880] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM 
[    0.908299] PCI host bridge to bus 0000:7f 
[    0.912871] pci_bus 0000:7f: root bus resource [bus 7f] 
[    0.920945] ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 80-fe]) 
[    0.927846] acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] 
[    0.937106] acpi PNP0A08:01: _OSC: platform does not support [AER] 
[    0.944123] acpi PNP0A08:01: _OSC: OS now controls [PCIeHotplug PME PCIeCapability] 
[    0.952860] PCI host bridge to bus 0000:80 
[    0.957433] pci_bus 0000:80: root bus resource [bus 80-fe] 
[    0.963555] pci_bus 0000:80: root bus resource [io  0xa000-0xffff] 
[    0.970451] pci_bus 0000:80: root bus resource [mem 0xe0000000-0xfbffffff] 
[    0.979463] pci 0000:80:00.0: PCI bridge to [bus 81-8f] 
[    0.985306] acpi PNP0A08:01: Disabling ASPM (FADT indicates it is unsupported) 
[    0.993424] ACPI: PCI Root Bridge [UNC1] (domain 0000 [bus ff]) 
[    1.000033] acpi PNP0A03:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] 
[    1.009155] acpi PNP0A03:01: _OSC failed (AE_NOT_FOUND); disabling ASPM 
[    1.016580] PCI host bridge to bus 0000:ff 
[    1.021153] pci_bus 0000:ff: root bus resource [bus ff] 
[    1.029013] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 *5 6 7 12 14 15), disabled. 
[    1.037527] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 *7 12 14 15), disabled. 
[    1.046043] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 *4 5 6 12 14 15), disabled. 
[    1.054354] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 *5 6 12 14 15), disabled. 
[    1.062659] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 6 7 10 12 14 15) *0, disabled. 
[    1.071462] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 6 7 10 12 14 15) *0, disabled. 
[    1.080266] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 6 7 *10 12 14 15), disabled. 
[    1.088873] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 6 7 *10 12 14 15), disabled. 
[    1.098586] acpi LNXCPU:12: BIOS reported wrong ACPI id 0 for the processor 
[    1.107217] ACPI: Enabled 3 GPEs in block 00 to 3F 
[    1.112688] vgaarb: device added: PCI:0000:62:00.0,decodes=io+mem,owns=io+mem,locks=none 
[    1.121727] vgaarb: loaded 
[    1.124745] vgaarb: bridge control possible 0000:62:00.0 
[    1.130752] SCSI subsystem initialized 
[    1.134961] ACPI: bus type USB registered 
[    1.139448] usbcore: registered new interface driver usbfs 
[    1.145585] usbcore: registered new interface driver hub 
[    1.151536] usbcore: registered new device driver usb 
[    1.157261] PCI: Using ACPI for IRQ routing 
[    1.166361] NetLabel: Initializing 
[    1.170164] NetLabel:  domain hash size = 128 
[    1.175024] NetLabel:  protocols = UNLABELED CIPSOv4 
[    1.180585] NetLabel:  unlabeled traffic allowed by default 
[    1.186864] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0, 0, 0, 0, 0 
[    1.193861] hpet0: 8 comparators, 64-bit 14.318180 MHz counter 
[    1.202396] Switching to clocksource hpet 
[    1.211432] pnp: PnP ACPI init 
[    1.214855] ACPI: bus type PNP registered 
[    1.219416] system 00:00: [mem 0xfc000000-0xfcffffff] has been reserved 
[    1.226803] system 00:00: [mem 0xfd000000-0xfdffffff] has been reserved 
[    1.234190] system 00:00: [mem 0xfe000000-0xfeafffff] has been reserved 
[    1.241580] system 00:00: [mem 0xfeb00000-0xfebfffff] has been reserved 
[    1.248970] system 00:00: [mem 0xfed00400-0xfed3ffff] could not be reserved 
[    1.256743] system 00:00: [mew full-speed USB device number 3 using ehci-pci 
[     4.642333] b53269] bnx2x 0000:02:00.2: part number 394D4342-31383735-30315430-473030 
[    4.754097] usb 2-1.1: New USB device found, idVendor=046b, idProduct=ff10 
[    4.761776] usb 2-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 
[    4.769944] usb 2-1.1: Product: Virtual Keyboard and Mouse 
[    4.776078] usb 2-1.1: Manufacturer: American Megatrends Inc. 
[    4.784918] bnx2x 0000:02:00.3: msix capability found 
[    4.794890] input: American Megatrends Inc. Virtual Keyboard and Mouse as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.1/2-1.1:1.0/input/input2 
[    4.810287] bnx2x 0000:02:00.3: part number 394D4342-31383735-30315430-473030 
[    4.832949] hid-generic 0003:046B:FF10.0001: input,hidraw0: USB HID v1.10 Keyboard [American Megatrends Inc. Virtual Keyboard and Mouse] on usb-0000:00:1d.0-1.1/input0 
[    4.868726] input: American Megatrends Inc. Virtual Keyboard and Mouse as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.1/2-1.1:1.1/input/input3 
[    4.897947] hid-generic 0003:046B:FF10.0002: input,hidraw1: USB HID v1.10 Mouse [American Megatrends Inc. Virtual Keyboard and Mouse] on usb-0000:00:1d.0-1.1/input1 
[    4.915125] bnx2x 0000:02:00.4: msix capability found 
[    4.925275] bnx2x 0000:02:00.4: part number 394D4342-31383735-30315430-473030 
[    5.020362] usb 2-1.3: new high-speed USB device number 4 using ehci-pci 
[    5.028219] bnx2x 0000:02:00.5: msix capability found 
[    5.038287] bnx2x 0000:02:00.5: part number 394D4342-31383735-30315430-473030 
[     5.135374] b0.6: msix capability found 
[    5.146276] b
[    5.191222] usb 2-1.3: New USB device found, idVendor=046b, idProduct=ff92
[    5.198902] usb 2-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=4 
[    5.207062] usb 2-1.3: Product: Composite Device 
[    5.212220] usb 2-1.3: Manufacturer: American Megatrends Inc. 
[    5.218638] usb 2-1.3: SerialNumber: AAAABBBBCCCC4 
[    5.279368] bnx2x 0000:02:00.7: msix capability found 
[    5.292279] bnx2x 0000:02:00.7: part number 394D4342-31383735-30315430-473030 
[    5.430381] bnx2x 0000:01:00.0: msix capability found 
[    5.441297] bnx2x 0000:01:00.0: part number 394D4342-31383735-30315430-473030 
[    5.504653] bnx2x 0000:01:00.1: msix capability found 
[    5.515280] bnx2x 0000:01:00.1: part number 394D4342-31383735-30315430-473030 
[     6.082783] Upower saving mode enabled? 
[    6.095706] Dazed and confused, but trying to continue 
[   81.655507] Kernel panic - not syncing: Timeout synchronizing machine check over CPUs 
[   82.729324] Shutting down cpus with NMI 
[   82.774539] drm_kms_helper: panic occurred, switching back to text console 
[   82.782257] Rebooting in 10 seconds.. 
[   92.787401] ACPI MEMORY or I/O RESET_REG. 
[   95.233944] ACPI MEMORY or I/O RESET_REG.


[-- Attachment #3: Type: text/plain, Size: 143 bytes --]

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-01-23 17:51           ` Borislav Petkov
@ 2017-01-24  1:46             ` Xunlei Pang
  -1 siblings, 0 replies; 48+ messages in thread
From: Xunlei Pang @ 2017-01-24  1:46 UTC (permalink / raw)
  To: Borislav Petkov, Luck, Tony
  Cc: xlpang, x86, linux-kernel, kexec, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On 01/24/2017 at 01:51 AM, Borislav Petkov wrote:
> Hey Tony,
>
> a "welcome back" is in order? :-)
>
> On Mon, Jan 23, 2017 at 09:40:09AM -0800, Luck, Tony wrote:
>> If the system had experienced some memory corruption, but
>> recovered ... then there would be some pages sitting around
>> that the old kernel had marked as POISON and stopped using.
>> The kexec'd kernel doesn't know about these, so may touch that
>> memory while taking a crash dump ...
> Hmm, pass a list of poisoned pages to the kdump kernel so as not to
> touch. Looks like there's already functionality for that:
>
> "makedumpfile can exclude the following types of pages while copying
> VMCORE to DUMPFILE, and a user can choose which type of pages will be
> excluded.
>
> - Pages filled with zero
> - Cache pages
> - User process data pages
> - Free pages"
>
>  (there is a makedumpfile manpage somewhere)
>
> And apparently crash knows about poisoned pages and handles them:
>
> static int __init crash_save_vmcoreinfo_init(void)
> {
> 	...
> #ifdef CONFIG_MEMORY_FAILURE
>         VMCOREINFO_NUMBER(PG_hwpoison);
> #endif
>
> so if that works, the kexeced kernel should know about that list.
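
For illustration, a dump tool that has read the PG_hwpoison bit number from
vmcoreinfo could skip poisoned pages with a check like the one below. This is
only a minimal sketch under stated assumptions, not makedumpfile's actual code:
page_is_hwpoison() and count_dumpable() are hypothetical helpers, and the bit
number passed in stands for whatever VMCOREINFO_NUMBER(PG_hwpoison) exported
for the crashed kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * True if bit pg_hwpoison_bit is set in a page's flags word.
 * pg_hwpoison_bit is an assumption: it stands for the value the crashed
 * kernel exported as NUMBER(PG_hwpoison), and must be smaller than the
 * width of unsigned long.
 */
static bool page_is_hwpoison(unsigned long flags, unsigned int pg_hwpoison_bit)
{
	return (flags >> pg_hwpoison_bit) & 1UL;
}

/* Count the pages a dump loop would copy, skipping poisoned ones. */
static size_t count_dumpable(const unsigned long *page_flags, size_t npages,
			     unsigned int pg_hwpoison_bit)
{
	size_t i, n = 0;

	for (i = 0; i < npages; i++)
		if (!page_is_hwpoison(page_flags[i], pg_hwpoison_bit))
			n++;
	return n;
}
```

A filter of this kind only helps when the tool reading /proc/vmcore actually
applies it; a tool that copies every page would still touch the poisoned ones.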

According to the log in my previous reply, the MCE occurred before makedumpfile
started dumping, so I wonder whether the poisoned pages belong to the crash
reserved memory, or whether it was some other type of event?

Besides, some kdump setups may not use makedumpfile at all; for example, a plain
"cp" of "/proc/vmcore" is also allowed.

>
>> and then you have a broadcast machine check (on older[1] Intel CPUs
>> that don't support local machine check).
> Right.
>
>> This is hard to work around. You really need all the CPUs to have set
>> CR4.MCE=1 (if any didn't, then they will force a reset when they see
>> the machine check). Also you need to make sure that they jump to the
>> copy of do_machine_check() in the new kernel, not the old kernel.
> Doesn't matter, right? The new copy is as clueless as the old one about
> those MCEs.
>

The problem is the code in mce_start(): it waits for all the online CPUs,
including the CPUs that the kdump kernel booted on, to synchronize.

For the new MCE handler in the kdump kernel this is fine, because its number of
online CPUs is correct. For the old MCE handler in the 1st kernel it is not:
some CPUs that the 1st kernel still regards as online are now running the 2nd
kernel, so they cannot respond to the old handler, and the old handler times out.
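
The synchronization failure can be sketched with a toy model. This is not the
kernel's mce_start(); rendezvous() and its parameters are hypothetical, and the
spin budget merely stands in for the handler's timeout. Each responding CPU
checks in once, and the wait succeeds only if the check-in count reaches the
number of CPUs the kernel believes are online.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the rendezvous: every CPU that enters the handler bumps a
 * call-in counter, and the waiter spins until the counter reaches the
 * number of CPUs the kernel believes are online, or until the spin budget
 * runs out. Returns true if everyone checked in, false on timeout.
 */
static bool rendezvous(int believed_online, int actually_responding,
		       int spin_budget)
{
	int callin = 0;
	int spins;

	assert(believed_online > 0 && actually_responding >= 0);

	for (spins = 0; spins < spin_budget; spins++) {
		/* One more responding CPU checks in per iteration. */
		if (callin < actually_responding)
			callin++;
		if (callin == believed_online)
			return true;
	}
	return false;	/* the real handler would panic at this point */
}
```

With the numbers from the posted log (32 processors, kdump kernel booted with
nr_cpus=1), rendezvous(32, 31, ...) never completes in the 1st kernel's handler,
while rendezvous(1, 1, ...) in the kdump kernel completes at once.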

Regards,
Xunlei

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-01-24  1:46             ` Xunlei Pang
@ 2017-01-24  1:51               ` Xunlei Pang
  -1 siblings, 0 replies; 48+ messages in thread
From: Xunlei Pang @ 2017-01-24  1:51 UTC (permalink / raw)
  To: Borislav Petkov, Luck, Tony
  Cc: xlpang, x86, linux-kernel, kexec, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On 01/24/2017 at 09:46 AM, Xunlei Pang wrote:
> On 01/24/2017 at 01:51 AM, Borislav Petkov wrote:
>> Hey Tony,
>>
>> a "welcome back" is in order? :-)
>>
>> On Mon, Jan 23, 2017 at 09:40:09AM -0800, Luck, Tony wrote:
>>> If the system had experienced some memory corruption, but
>>> recovered ... then there would be some pages sitting around
>>> that the old kernel had marked as POISON and stopped using.
>>> The kexec'd kernel doesn't know about these, so may touch that
>>> memory while taking a crash dump ...
>> Hmm, pass a list of poisoned pages to the kdump kernel so as not to
>> touch. Looks like there's already functionality for that:
>>
>> "makedumpfile can exclude the following types of pages while copying
>> VMCORE to DUMPFILE, and a user can choose which type of pages will be
>> excluded.
>>
>> - Pages filled with zero
>> - Cache pages
>> - User process data pages
>> - Free pages"
>>
>>  (there is a makedumpfile manpage somewhere)
>>
>> And apparently crash knows about poisoned pages and handles them:
>>
>> static int __init crash_save_vmcoreinfo_init(void)
>> {
>> 	...
>> #ifdef CONFIG_MEMORY_FAILURE
>>         VMCOREINFO_NUMBER(PG_hwpoison);
>> #endif
>>
>> so if that works, the kexeced kernel should know about that list.
> According to the log in my previous reply, the MCE occurred before makedumpfile
> started dumping, so I wonder whether the poisoned pages belong to the crash
> reserved memory, or whether it was some other type of event?

Another possibility is that it comes from system reserved or PCIe memory that
is shared between the 1st and 2nd kernels.

>
> Besides, some kdump setups may not use makedumpfile at all; for example, a plain
> "cp" of "/proc/vmcore" is also allowed.
>
>>> and then you have a broadcast machine check (on older[1] Intel CPUs
>>> that don't support local machine check).
>> Right.
>>
>>> This is hard to work around. You really need all the CPUs to have set
>>> CR4.MCE=1 (if any didn't, then they will force a reset when they see
>>> the machine check). Also you need to make sure that they jump to the
>>> copy of do_machine_check() in the new kernel, not the old kernel.
>> Doesn't matter, right? The new copy is as clueless as the old one about
>> those MCEs.
>>
> It's the code in mce_start(), it waits for all the online cpus including the cpus
> that kdump boots on to synchronize.
>
> So for new mce handler of kdump kernel, it is fine as the number of online cpus
> is correct; as for old mce handler of 1st kernel, it's not true because some cpus
> which are regarded online from 1st kernel's view are running the 2nd kernel now,
> they can't respond to the old mce handler which will timeout the old mce handler.
>
> Regards,
> Xunlei
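
The timeout described above can be modelled in user space. The following
is a minimal sketch, not the kernel's mce_start(): rendezvous_ok() and its
parameters are hypothetical names, and the polling loop stands in for the
real timed spin. It shows only why the wait cannot complete when the
handler expects more CPUs than can still answer it.

```c
#include <stdbool.h>

/*
 * User-space model of the MCE rendezvous, not kernel code.
 *
 * expected:    CPUs the old handler believes are online.
 * responders:  CPUs that can actually enter this handler.
 * spin_budget: polling rounds to wait before giving up.
 *
 * Returns true if every expected CPU arrived, false on timeout.
 */
bool rendezvous_ok(int expected, int responders, int spin_budget)
{
	int arrived = 0;

	while (spin_budget-- > 0) {
		/* One more CPU checks in per round, while any are left. */
		if (arrived < responders)
			arrived++;
		if (arrived == expected)
			return true;
	}

	return false;	/* the real handler panics at this point */
}
```

With four CPUs counted as online but one of them now running the kdump
kernel, rendezvous_ok(4, 3, budget) returns false for any budget, which
corresponds to the timeout seen by the old handler.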

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-01-23 18:14               ` Borislav Petkov
@ 2017-01-24  2:33                 ` Xunlei Pang
  -1 siblings, 0 replies; 48+ messages in thread
From: Xunlei Pang @ 2017-01-24  2:33 UTC (permalink / raw)
  To: Borislav Petkov, Luck, Tony
  Cc: Prarit Bhargava, Kiyoshi Ueda, xlpang, x86, kexec, linux-kernel,
	Ingo Molnar, Junichi Nomura, Naoya Horiguchi, Dave Young

On 01/24/2017 at 02:14 AM, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 10:01:53AM -0800, Luck, Tony wrote:
>> will ignore the machine check on the other cpus ... assuming
>> that "cpu_is_offline(smp_processor_id())" does the right thing
>> in the kexec case where this is an "old" cpu that isn't online
>> in the new kernel.
> Nice. And kdump did do the dumping on one CPU, AFAIR. So we should be
> good there.
>

"nr_cpus=N" consumes more memory as N grows; with a very large N it is
almost impossible for kdump to boot, given the limited crash memory
reserved.

On some large machines nr_cpus=1 might not be enough and we have to use
nr_cpus=4 or more, which also helps with parallel vmcore dumping :-)

Regards,
Xunlei

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-01-24  1:27         ` Xunlei Pang
@ 2017-01-24 12:22           ` Borislav Petkov
  -1 siblings, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2017-01-24 12:22 UTC (permalink / raw)
  To: xlpang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On Tue, Jan 24, 2017 at 09:27:45AM +0800, Xunlei Pang wrote:
> It occurred on real hardware when testing crash dump.
> 
> 1) SysRq-c was injected for the test in 1st kernel
>    [ 49.897279] SysRq : Trigger a crash
> 2) The 2nd kernel started for kdump
>    [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.el7.x86_64 root=UUID=976a15c8-8cbe-44ad-bb91-23f9b18e8789

Yeah, no, I'm not debugging the RH Frankenstein kernel.

Please retrigger this with latest tip/master first.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-01-24 12:22           ` Borislav Petkov
@ 2017-01-26  6:30             ` Xunlei Pang
  -1 siblings, 0 replies; 48+ messages in thread
From: Xunlei Pang @ 2017-01-26  6:30 UTC (permalink / raw)
  To: Borislav Petkov, xlpang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On 01/24/2017 at 08:22 PM, Borislav Petkov wrote:
> On Tue, Jan 24, 2017 at 09:27:45AM +0800, Xunlei Pang wrote:
>> It occurred on real hardware when testing crash dump.
>>
>> 1) SysRq-c was injected for the test in 1st kernel
>>    [ 49.897279] SysRq : Trigger a crash
>> 2) The 2nd kernel started for kdump
>>    [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.el7.x86_64 root=UUID=976a15c8-8cbe-44ad-bb91-23f9b18e8789
> Yeah, no, I'm not debugging the RH Frankenstein kernel.
>
> Please retrigger this with latest tip/master first.
>

The hardware machine check is hard to reproduce, but the mce code in RHEL7 is
almost the same as that in tip/master, and we are able to inject a software mce
to reproduce it.

The problem is also clear from analysis of the code.

Regards,
Xunlei

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-01-26  6:30             ` Xunlei Pang
@ 2017-01-26  6:44               ` Borislav Petkov
  -1 siblings, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2017-01-26  6:44 UTC (permalink / raw)
  To: xlpang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On Thu, Jan 26, 2017 at 02:30:02PM +0800, Xunlei Pang wrote:
> The hardware machine check is hard to reproduce, but the mce code of
> RHEL7 is quite the same as that of tip/master, anyway we are able to
> inject software mce to reproduce it.

Please give me your exact steps so that I can try to reproduce it here
too.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-01-26  6:44               ` Borislav Petkov
@ 2017-02-16  5:36                 ` Xunlei Pang
  -1 siblings, 0 replies; 48+ messages in thread
From: Xunlei Pang @ 2017-02-16  5:36 UTC (permalink / raw)
  To: Borislav Petkov, xlpang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On 01/26/2017 at 02:44 PM, Borislav Petkov wrote:
> On Thu, Jan 26, 2017 at 02:30:02PM +0800, Xunlei Pang wrote:
>> The hardware machine check is hard to reproduce, but the mce code of
>> RHEL7 is quite the same as that of tip/master, anyway we are able to
>> inject software mce to reproduce it.
> Please give me your exact steps so that I can try to reproduce it here
> too.
>

Hi Borislav,

I tried to use qemu to inject an SRAO ("mce -b 0 0 0xb100000000000000 0x5 0x0 0x0").
It works well in the 1st kernel, but not for the 1st kernel after kdump boots (it seems
the cpus remaining in the 1st kernel don't respond to the simulated broadcast mce).

But in theory, we know that cpus belonging to the kdump kernel can't respond to the
old mce handler, so a single SRAO injection in the 1st kernel should behave similarly.
For example, I used "... -smp 2 -cpu Haswell" to launch a guest with broadcast
mce supported and injected an SRAO into cpu0 only, through the qemu monitor command
"mce 0 0 0xb100000000000000 0x5 0x0 0x0". cpu0 then times out, panics and reboots
the machine as follows (running on linux-4.9):
  Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
  Kernel Offset: disabled
  Rebooting in 30 seconds..

Regards,
Xunlei
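
For reference, the status and mcgstatus values passed to the QEMU monitor
command above can be decoded with a few bit tests. This is a sketch under
assumptions: the bit positions follow the IA32_MCi_STATUS and
IA32_MCG_STATUS layouts, has_srao_flags() is a hypothetical helper, and it
checks flag bits only, whereas the kernel's severity grading also looks at
the MCA error code.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCi_STATUS flag bits. */
#define MCI_STATUS_VAL	(1ULL << 63)	/* register contents valid */
#define MCI_STATUS_UC	(1ULL << 61)	/* uncorrected error */
#define MCI_STATUS_EN	(1ULL << 60)	/* error reporting enabled */
#define MCI_STATUS_PCC	(1ULL << 57)	/* processor context corrupt */
#define MCI_STATUS_S	(1ULL << 56)	/* signaled via machine check */
#define MCI_STATUS_AR	(1ULL << 55)	/* action required */

/* IA32_MCG_STATUS flag bits. */
#define MCG_STATUS_RIPV	(1ULL << 0)	/* restart IP valid */
#define MCG_STATUS_MCIP	(1ULL << 2)	/* machine check in progress */

/*
 * Hypothetical helper: true if the flag bits match "software recoverable,
 * action optional" (UC and S set, PCC and AR clear). The error code in
 * the low bits is deliberately ignored.
 */
bool has_srao_flags(uint64_t status)
{
	uint64_t must_set = MCI_STATUS_VAL | MCI_STATUS_UC |
			    MCI_STATUS_EN | MCI_STATUS_S;
	uint64_t must_clear = MCI_STATUS_PCC | MCI_STATUS_AR;

	return (status & must_set) == must_set && !(status & must_clear);
}
```

The injected status 0xb100000000000000 passes this test, and the injected
mcgstatus 0x5 is MCG_STATUS_MCIP | MCG_STATUS_RIPV.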

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-02-16  5:36                 ` Xunlei Pang
@ 2017-02-16 10:18                   ` Borislav Petkov
  -1 siblings, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2017-02-16 10:18 UTC (permalink / raw)
  To: xlpang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On Thu, Feb 16, 2017 at 01:36:37PM +0800, Xunlei Pang wrote:
> I tried to use qemu to inject SRAO("mce -b 0 0 0xb100000000000000 0x5 0x0 0x0"),
> it works well in 1st kernel, but it doesn't work for 1st kernel after kdump boots(seems
> the cpus remain in 1st kernel don't respond to the simulated broadcasting mce).
> 
> But in theory, we know cpus belong to kdump kernel can't respond to the
> old mce handler, so a single SRAO injection in 1st kernel should be similar.
> For example, I used "... -smp 2 -cpu Haswell" to launch a simulation with broadcast
> mce supported, and inject SRAO to cpu0 only through qemu monitor
> "mce 0 0 0xb100000000000000 0x5 0x0 0x0", cpu0 will timeout/panic and reboot
> the machine as follows(running on linux-4.9):
>   Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler

Sounds to me like you're trying hard to prove some point of yours which
doesn't make much sense to me. And when you say "in theory", that makes
it even less believable. So I remember asking you for exact steps. That
above doesn't read like steps but like some babbling and I've actually
tried to make sense of it for a couple of minutes but failed.

So lemme spell it out for ya. I'd like for you to give me this:

1. Build kernel with this config
2. Boot it in kvm with this settings
3. Do this in the guest
4. Do that in the guest
5. ...
6. ...

And all should be exact commands so that I can do them here on my machine.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-02-16 10:18                   ` Borislav Petkov
@ 2017-02-16 11:52                     ` Xunlei Pang
  -1 siblings, 0 replies; 48+ messages in thread
From: Xunlei Pang @ 2017-02-16 11:52 UTC (permalink / raw)
  To: Borislav Petkov, xlpang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On 02/16/2017 at 06:18 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 01:36:37PM +0800, Xunlei Pang wrote:
>> I tried to use qemu to inject SRAO("mce -b 0 0 0xb100000000000000 0x5 0x0 0x0"),
>> it works well in 1st kernel, but it doesn't work for 1st kernel after kdump boots(seems
>> the cpus remain in 1st kernel don't respond to the simulated broadcasting mce).
>>
>> But in theory, we know cpus belong to kdump kernel can't respond to the
>> old mce handler, so a single SRAO injection in 1st kernel should be similar.
>> For example, I used "... -smp 2 -cpu Haswell" to launch a simulation with broadcast
>> mce supported, and inject SRAO to cpu0 only through qemu monitor
>> "mce 0 0 0xb100000000000000 0x5 0x0 0x0", cpu0 will timeout/panic and reboot
>> the machine as follows(running on linux-4.9):
>>   Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
> Sounds to me like you're trying hard to prove some point of yours which
> doesn't make much sense to me. And when you say "in theory", that makes
> it even less believable. So I remember asking you for exact steps. That
> above doesn't read like steps but like some babbling and I've actually
> tried to make sense of it for a couple of minutes but failed.
>
> So lemme spell it out for ya. I'd like for you to give me this:
>
> 1. Build kernel with this config
> 2. Boot it in kvm with this settings
> 3. Do this in the guest
> 4. Do that in the guest
> 5. ...
> 6. ...
>
>
> And all should be exact commands so that I can do them here on my machine.
>

Sorry, missed your point.

The steps should be as follows:
1. Prepare a multi-core Intel machine with broadcast mce support.
    Enable kdump (crashkernel=256M) and configure the kdump kernel to boot with "nr_cpus=1".
2. Activate kdump, and crash the first kernel on some cpu, say cpu1
    (taskset -c 1 echo c > /proc/sysrq-trigger); kdump will then boot on cpu1.
3. After kdump boots up (let it enter the shell), trigger an SRAO on cpu1
   (QEMU monitor cmd: mce -b 1 0 0xb100000000000000 0x5 0x0 0x0),
    then mce will be broadcast to the other cpus which are still running
    in the first kernel(i.e. looping in crash_nmi_callback).
    If you have hardware that can inject an mce, that would be great, as QEMU does not work correctly for me.
4. Something like the following is then expected to happen:

[    1.468556] tsc: Refined TSC clocksource calibration: 2933.437 MHz
         Starting Kdump Vmcore Save Service...
kdump: saving to /sysroot//var/crash/127.0.0.1-2015-09-01-05:07:03/
kdump: saving vmcore-dmesg.txt
[   39.000010] mce: [Hardware Error]: CPU 0: Machine Check Exception: 0 Bank 2: bd0000000000017a
[   39.000010] mce: [Hardware Error]: TSC 0 ADDR 61600000 MISC 8c 
[   39.000010] mce: [Hardware Error]: PROCESSOR 0:106a3 TIME 1441083980 SOCKET 0 APIC 0 microcode 1
[   39.000010] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[   39.000010] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
[   39.000010] Shutting down cpus with NMI
[    1.758463] Uhhuh. NMI received for unknown reason 20 on CPU 0.
[    1.758463] Do you have a strange power saving mode enabled?
[    1.758463] Dazed and confused, but trying to continue
[   39.000010] Rebooting in 30 seconds..

Regards,
Xunlei

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-02-16 11:52                     ` Xunlei Pang
@ 2017-02-16 12:22                       ` Borislav Petkov
  -1 siblings, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2017-02-16 12:22 UTC (permalink / raw)
  To: xlpang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi,
	Peter Zijlstra, Thomas Gleixner

On Thu, Feb 16, 2017 at 07:52:09PM +0800, Xunlei Pang wrote:
>     then mce will be broadcast to the other cpus which are still running
>     in the first kernel(i.e. looping in crash_nmi_callback).

Simple: the crash code should really mark CPUs as not being online:

void do_machine_check(struct pt_regs *regs, long error_code)

	...

        /* If this CPU is offline, just bail out. */
        if (cpu_is_offline(smp_processor_id())) {
                u64 mcgstatus;

                mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
                if (mcgstatus & MCG_STATUS_RIPV) {
                        mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
                        return;
                }
        }

because looping in crash_nmi_callback() does not really denote them as
CPUs being online.

And just so that you don't disturb the machine too much during crashing,
you could simply clear them from the online masks, i.e., perhaps call
remove_cpu_from_maps() with the proper locking around it instead of
doing a full cpu_down().

The machine will be killed anyway after kdump is done writing out
memory.
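
The effect of the check quoted above can be tried outside the kernel with
a small model. This is only a sketch: struct sim_cpu and
mce_bail_if_offline() are hypothetical stand-ins for the online mask and
the MCG_STATUS MSR accessors, and nothing here touches real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_RIPV	(1ULL << 0)	/* restart IP valid */

/* Hypothetical stand-in for one CPU's online bit and MCG_STATUS MSR. */
struct sim_cpu {
	bool online;
	uint64_t mcg_status;
};

/*
 * Returns true if the CPU bailed out early (offline and RIPV set),
 * false if it would continue into the normal handler path.
 */
bool mce_bail_if_offline(struct sim_cpu *cpu)
{
	if (!cpu->online && (cpu->mcg_status & MCG_STATUS_RIPV)) {
		/* Models mce_wrmsrl(MSR_IA32_MCG_STATUS, 0). */
		cpu->mcg_status = 0;
		return true;
	}

	return false;
}
```

A CPU that was cleared from the online mask and sees mcg_status 0x5 returns
early with the register cleared; a CPU still marked online falls through
and would join the rendezvous.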

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-02-16 12:22                       ` Borislav Petkov
@ 2017-02-17  1:53                         ` Xunlei Pang
  -1 siblings, 0 replies; 48+ messages in thread
From: Xunlei Pang @ 2017-02-17  1:53 UTC (permalink / raw)
  To: Borislav Petkov, xlpang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi,
	Peter Zijlstra, Thomas Gleixner

On 02/16/2017 at 08:22 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 07:52:09PM +0800, Xunlei Pang wrote:
>>     then mce will be broadcast to the other cpus which are still running
>>     in the first kernel(i.e. looping in crash_nmi_callback).
> Simple: the crash code should really mark CPUs as not being online:
>
> void do_machine_check(struct pt_regs *regs, long error_code)
>
> 	...
>
>         /* If this CPU is offline, just bail out. */
>         if (cpu_is_offline(smp_processor_id())) {
>                 u64 mcgstatus;
>
>                 mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
>                 if (mcgstatus & MCG_STATUS_RIPV) {
>                         mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
>                         return;
>                 }
>         }
>
> because looping in crash_nmi_callback() does not really denote them as
> CPUs being online.
>
> And just so that you don't disturb the machine too much during crashing,
> you could simply clear them from the online masks, i.e., perhaps call
> remove_cpu_from_maps() with the proper locking around it instead of
> doing a full cpu_down().

That changes the value of cpu_online_mask etc., which will confuse vmcore analysis.
Moreover, regarding the code (see the inline comment):

        if (cpu_is_offline(smp_processor_id())) {
                u64 mcgstatus;

                mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
                /*
                 * This condition may not be true: an mce triggered on the
                 * kdump cpu does not need to set this bit on the other cpus
                 * that remain in the 1st kernel.
                 */
                if (mcgstatus & MCG_STATUS_RIPV) {
                        mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
                        return;
                }
        }


Regards,
Xunlei

>
> The machine will be killed anyway after kdump is done writing out
> memory.
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-02-17  1:53                         ` Xunlei Pang
@ 2017-02-17  9:07                           ` Borislav Petkov
  -1 siblings, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2017-02-17  9:07 UTC (permalink / raw)
  To: xlpang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi,
	Peter Zijlstra, Thomas Gleixner

On Fri, Feb 17, 2017 at 09:53:21AM +0800, Xunlei Pang wrote:
> It changes the value of cpu_online_mask/etc which will cause confusion to vmcore analysis.

Then export the crashing_cpu variable, initialize it to something
invalid in the first kernel, -1 for example, and test it in the #MC
handler like this:

	int cpu;

	...

	cpu = smp_processor_id();

	if (cpu_is_offline(cpu) ||
	    (crashing_cpu != -1 && crashing_cpu != cpu)) {
		u64 mcgstatus;

		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
		if (mcgstatus & MCG_STATUS_RIPV) {
			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
			return;
		}
	}
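
The condition in that snippet can be checked in isolation. The sketch below
is a user-space model under stated assumptions: crashing_cpu is a local
variable here rather than the kernel's exported one, and
mce_cpu_should_stay_quiet() is a hypothetical helper name, not an existing
kernel function.

```c
#include <stdbool.h>

/* -1 means "no crash in progress", as suggested above. */
static int crashing_cpu = -1;

/*
 * Hypothetical helper: true when this CPU should take the quiet path on
 * a broadcast #MC, i.e. it is offline, or a crash is in progress and
 * this CPU is not the one running the crash/kdump path.
 */
static bool mce_cpu_should_stay_quiet(int cpu, bool cpu_offline)
{
	return cpu_offline ||
	       (crashing_cpu != -1 && crashing_cpu != cpu);
}
```

With crashing_cpu left at -1 only offline CPUs take the quiet path. Once
it is set to, say, 2, every CPU except CPU 2 does.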

> Moreover, for the code(see comment inlined)
> 
>         if (cpu_is_offline(smp_processor_id())) {
>                 u64 mcgstatus;
> 
>                 mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
>                 if (mcgstatus & MCG_STATUS_RIPV) { // This condition may be not true, the mce triggered on kdump cpu 
>                                                                      // doesn't need to have this bit set for the other cpus remain in 1st kernel. 

Is this on kvm or on real hardware? Because for kvm I don't care. And
don't say "theoretically".

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-02-17  9:07                           ` Borislav Petkov
@ 2017-02-17 16:21                             ` Xunlei Pang
  -1 siblings, 0 replies; 48+ messages in thread
From: Xunlei Pang @ 2017-02-17 16:21 UTC (permalink / raw)
  To: Borislav Petkov, xlpang
  Cc: Prarit Bhargava, Kiyoshi Ueda, Tony Luck, Peter Zijlstra, x86,
	kexec, linux-kernel, Ingo Molnar, Junichi Nomura,
	Naoya Horiguchi, Dave Young, Thomas Gleixner

On 02/17/2017 at 05:07 PM, Borislav Petkov wrote:
> On Fri, Feb 17, 2017 at 09:53:21AM +0800, Xunlei Pang wrote:
>> It changes the value of cpu_online_mask/etc which will cause confusion to vmcore analysis.
> Then export the crashing_cpu variable, initialize it to something
> invalid in the first kernel, -1 for example, and test it in the #MC
> handler like this:
>
> 	int cpu;
>
> 	...
>
> 	cpu = smp_processor_id();
>
> 	if (cpu_is_offline(cpu) ||
> 	    (crashing_cpu != -1 && crashing_cpu != cpu)) {
> 		u64 mcgstatus;
>
> 		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
> 		if (mcgstatus & MCG_STATUS_RIPV) {
> 			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> 			return;
> 		}
> 	}

Yes, it is doable, I will do some tests later.

>> Moreover, for the code(see comment inlined)
>>
>>         if (cpu_is_offline(smp_processor_id())) {
>>                 u64 mcgstatus;
>>
>>                 mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
>>                 if (mcgstatus & MCG_STATUS_RIPV) { // This condition may be not true, the mce triggered on kdump cpu 
>>                                                                      // doesn't need to have this bit set for the other cpus remain in 1st kernel. 
> Is this on kvm or on a real hardware? Because for kvm I don't care. And
> don't say "theoretically".
>

That is from my own understanding; I could not find an explicit description in the Intel SDM on this point.
If a broadcast SRAO arrives on real hardware, will MSR_IA32_MCG_STATUS on each cpu have the MCG_STATUS_RIPV bit set?

Regards,
Xunlei

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-02-17 16:21                             ` Xunlei Pang
@ 2017-02-21 18:20                               ` Luck, Tony
  -1 siblings, 0 replies; 48+ messages in thread
From: Luck, Tony @ 2017-02-21 18:20 UTC (permalink / raw)
  To: xlpang, Borislav Petkov
  Cc: Prarit Bhargava, Kiyoshi Ueda, Peter Zijlstra, x86, kexec,
	linux-kernel, Ingo Molnar, Junichi Nomura, Naoya Horiguchi,
	Dave Young, Thomas Gleixner

> It's from my understanding, I didn't get the explicit description from the intel SDM on this point.
> If a broadcast SRAO comes on real hardware, will MSR_IA32_MCG_STATUS of each cpu have MCG_STATUS_RIPV bit set?

MCG_STATUS is a per-thread MSR and will contain the status appropriate for that thread when #MC is delivered.
So the RIPV bit will be set if, and only if, the thread saved a valid return address for this exception. The net result
is that it is almost always set for "innocent bystander" CPUs that were dragged into the exception handler because
of a broadcast #MC. We make the test because if it isn't set, then the do_machine_check() had better not return
because we have no idea where it will return to - since there is not a valid return IP.
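
The rule described above can be written down as a small decoder. This is an
illustrative user-space sketch only: enum bystander_action and
classify_bystander() are hypothetical names, and the one fact taken from
the kernel headers is that MCG_STATUS_RIPV is bit 0 of MCG_STATUS.

```c
#include <stdint.h>

#define MCG_STATUS_RIPV (1ULL << 0)	/* restart IP valid */

/* Hypothetical classification of what a bystander CPU may do. */
enum bystander_action {
	BYSTANDER_MAY_RETURN,		/* valid return IP was saved */
	BYSTANDER_MUST_NOT_RETURN	/* no valid return IP to go back to */
};

/* Decide from this thread's own MCG_STATUS value whether returning is safe. */
static enum bystander_action classify_bystander(uint64_t mcgstatus)
{
	return (mcgstatus & MCG_STATUS_RIPV) ? BYSTANDER_MAY_RETURN
					     : BYSTANDER_MUST_NOT_RETURN;
}
```

Because MCG_STATUS is per-thread, the value passed in has to be the one
read on the CPU making the decision, not the one seen by the CPU that
reported the error.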

-Tony

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
  2017-02-21 18:20                               ` Luck, Tony
@ 2017-02-22  5:50                                 ` Xunlei Pang
  -1 siblings, 0 replies; 48+ messages in thread
From: Xunlei Pang @ 2017-02-22  5:50 UTC (permalink / raw)
  To: Luck, Tony, xlpang, Borislav Petkov
  Cc: Prarit Bhargava, Kiyoshi Ueda, Peter Zijlstra, x86, kexec,
	linux-kernel, Ingo Molnar, Junichi Nomura, Naoya Horiguchi,
	Dave Young, Thomas Gleixner

On 02/22/2017 at 02:20 AM, Luck, Tony wrote:
>> It's from my understanding, I didn't get the explicit description from the intel SDM on this point.
>> If a broadcast SRAO comes on real hardware, will MSR_IA32_MCG_STATUS of each cpu have MCG_STATUS_RIPV bit set?
> MCG_STATUS is a per-thread MSR and will contain the status appropriate for that thread when #MC is delivered.
> So the RIPV bit will be set if, and only if, the thread saved a valid return address for this exception. The net result
> is that it is almost always set for "innocent bystander" CPUs that were dragged into the exception handler because
> of a broadcast #MC. We make the test because if it isn't set, then the do_machine_check() had better not return
> because we have no idea where it will return to - since there is not a valid return IP.
>

Got it, thanks for the details.

Regards,
Xunlei

^ permalink raw reply	[flat|nested] 48+ messages in thread
