[PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
@ 2017-02-20  6:10 ` Xunlei Pang
  0 siblings, 0 replies; 18+ messages in thread
From: Xunlei Pang @ 2017-02-20  6:10 UTC (permalink / raw)
  To: x86, linux-kernel, kexec
  Cc: Tony Luck, Borislav Petkov, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Xunlei Pang,
	Naoya Horiguchi

We met an issue for kdump: after kdump kernel boots up,
and there comes a broadcasted mce in first kernel, the
other cpus remaining in first kernel will enter the old
mce handler of first kernel, then timeout and panic due
to MCE synchronization, finally reset the kdump cpus.

This patch lets cpus stay quiet after nmi_shootdown_cpus(),
so before crash cpu shots them down or after kdump boots,
they should not do anything except clearing MCG_STATUS
in case of broadcasted mce. This is useful for kdump
to let the vmcore dumping perform as hard as it can.

Previous efforts:
https://patchwork.kernel.org/patch/6167631/
https://lists.gt.net/linux/kernel/2146557

Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
---
v1->v2:
Using crashing_cpu according to Borislav's suggestion.

 arch/x86/include/asm/reboot.h    |  1 +
 arch/x86/kernel/cpu/mcheck/mce.c |  6 ++++--
 arch/x86/kernel/reboot.c         | 16 +++++++++++++++-
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 2cb1cc2..ec8657b6 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -26,5 +26,6 @@ struct machine_ops {
 typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
 void nmi_shootdown_cpus(nmi_shootdown_cb callback);
 void run_crash_ipi_callback(struct pt_regs *regs);
+bool cpus_shotdown(void);
 
 #endif /* _ASM_X86_REBOOT_H */
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8e9725c..3b56710 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -49,6 +49,7 @@
 #include <asm/tlbflush.h>
 #include <asm/mce.h>
 #include <asm/msr.h>
+#include <asm/reboot.h>
 
 #include "mce-internal.h"
 
@@ -1128,8 +1129,9 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	 */
 	int lmce = 1;
 
-	/* If this CPU is offline, just bail out. */
-	if (cpu_is_offline(smp_processor_id())) {
+	/* If nmi shootdown happened or this CPU is offline, just bail out. */
+	if (cpus_shotdown() ||
+	    cpu_is_offline(smp_processor_id())) {
 		u64 mcgstatus;
 
 		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index e244c19..b301c8d 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -752,7 +752,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
 #if defined(CONFIG_SMP)
 
 /* This keeps a track of which one is crashing cpu. */
-static int crashing_cpu;
+static int crashing_cpu = -1;
 static nmi_shootdown_cb shootdown_callback;
 
 static atomic_t waiting_for_crash_ipi;
@@ -852,6 +852,14 @@ void nmi_panic_self_stop(struct pt_regs *regs)
 	}
 }
 
+bool cpus_shotdown(void)
+{
+	if (crashing_cpu != -1)
+		return true;
+
+	return false;
+}
+
 #else /* !CONFIG_SMP */
 void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 {
@@ -861,4 +869,10 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 void run_crash_ipi_callback(struct pt_regs *regs)
 {
 }
+
+bool cpus_shotdown(void)
+{
+	return false;
+}
+
 #endif
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
@ 2017-02-20  6:10 ` Xunlei Pang
  0 siblings, 0 replies; 18+ messages in thread
From: Xunlei Pang @ 2017-02-20  6:10 UTC (permalink / raw)
  To: x86, linux-kernel, kexec
  Cc: Prarit Bhargava, Kiyoshi Ueda, Tony Luck, Xunlei Pang,
	Ingo Molnar, Borislav Petkov, Junichi Nomura, Naoya Horiguchi,
	Dave Young

We met an issue for kdump: after kdump kernel boots up,
and there comes a broadcasted mce in first kernel, the
other cpus remaining in first kernel will enter the old
mce handler of first kernel, then timeout and panic due
to MCE synchronization, finally reset the kdump cpus.

This patch lets cpus stay quiet after nmi_shootdown_cpus(),
so before crash cpu shots them down or after kdump boots,
they should not do anything except clearing MCG_STATUS
in case of broadcasted mce. This is useful for kdump
to let the vmcore dumping perform as hard as it can.

Previous efforts:
https://patchwork.kernel.org/patch/6167631/
https://lists.gt.net/linux/kernel/2146557

Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
---
v1->v2:
Using crashing_cpu according to Borislav's suggestion.

 arch/x86/include/asm/reboot.h    |  1 +
 arch/x86/kernel/cpu/mcheck/mce.c |  6 ++++--
 arch/x86/kernel/reboot.c         | 16 +++++++++++++++-
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 2cb1cc2..ec8657b6 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -26,5 +26,6 @@ struct machine_ops {
 typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
 void nmi_shootdown_cpus(nmi_shootdown_cb callback);
 void run_crash_ipi_callback(struct pt_regs *regs);
+bool cpus_shotdown(void);
 
 #endif /* _ASM_X86_REBOOT_H */
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8e9725c..3b56710 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -49,6 +49,7 @@
 #include <asm/tlbflush.h>
 #include <asm/mce.h>
 #include <asm/msr.h>
+#include <asm/reboot.h>
 
 #include "mce-internal.h"
 
@@ -1128,8 +1129,9 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	 */
 	int lmce = 1;
 
-	/* If this CPU is offline, just bail out. */
-	if (cpu_is_offline(smp_processor_id())) {
+	/* If nmi shootdown happened or this CPU is offline, just bail out. */
+	if (cpus_shotdown() ||
+	    cpu_is_offline(smp_processor_id())) {
 		u64 mcgstatus;
 
 		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index e244c19..b301c8d 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -752,7 +752,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
 #if defined(CONFIG_SMP)
 
 /* This keeps a track of which one is crashing cpu. */
-static int crashing_cpu;
+static int crashing_cpu = -1;
 static nmi_shootdown_cb shootdown_callback;
 
 static atomic_t waiting_for_crash_ipi;
@@ -852,6 +852,14 @@ void nmi_panic_self_stop(struct pt_regs *regs)
 	}
 }
 
+bool cpus_shotdown(void)
+{
+	if (crashing_cpu != -1)
+		return true;
+
+	return false;
+}
+
 #else /* !CONFIG_SMP */
 void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 {
@@ -861,4 +869,10 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 void run_crash_ipi_callback(struct pt_regs *regs)
 {
 }
+
+bool cpus_shotdown(void)
+{
+	return false;
+}
+
 #endif
-- 
1.8.3.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
  2017-02-20  6:10 ` Xunlei Pang
@ 2017-02-20 11:09   ` Borislav Petkov
  -1 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2017-02-20 11:09 UTC (permalink / raw)
  To: Xunlei Pang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On Mon, Feb 20, 2017 at 02:10:37PM +0800, Xunlei Pang wrote:
> @@ -1128,8 +1129,9 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  	 */
>  	int lmce = 1;
>  
> -	/* If this CPU is offline, just bail out. */
> -	if (cpu_is_offline(smp_processor_id())) {
> +	/* If nmi shootdown happened or this CPU is offline, just bail out. */
> +	if (cpus_shotdown() ||

I don't like "cpus_shotdown" - it doesn't hint at all that this is
special-handling crash/kdump.

And more importantly, I want it to be obvious that we do let the
crashing CPU into the MCE handler.

Why?

If we didn't, you will not handle *any* MCE, even a fatal one, during
dumping memory so if that dump is corrupted from the MCE, you won't
know. And I don't want to be the one staring at the corrupted dump and
wondering why I'm seeing what I'm seeing.

IOW, if we get a fatal MCE during dumping then we should go and die.
This is much better than silently corrupting the dump and not even
saying anything about it.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
@ 2017-02-20 11:09   ` Borislav Petkov
  0 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2017-02-20 11:09 UTC (permalink / raw)
  To: Xunlei Pang
  Cc: Prarit Bhargava, Kiyoshi Ueda, Tony Luck, x86, kexec,
	linux-kernel, Ingo Molnar, Junichi Nomura, Naoya Horiguchi,
	Dave Young

On Mon, Feb 20, 2017 at 02:10:37PM +0800, Xunlei Pang wrote:
> @@ -1128,8 +1129,9 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  	 */
>  	int lmce = 1;
>  
> -	/* If this CPU is offline, just bail out. */
> -	if (cpu_is_offline(smp_processor_id())) {
> +	/* If nmi shootdown happened or this CPU is offline, just bail out. */
> +	if (cpus_shotdown() ||

I don't like "cpus_shotdown" - it doesn't hint at all that this is
special-handling crash/kdump.

And more importantly, I want it to be obvious that we do let the
crashing CPU into the MCE handler.

Why?

If we didn't, you will not handle *any* MCE, even a fatal one, during
dumping memory so if that dump is corrupted from the MCE, you won't
know. And I don't want to be the one staring at the corrupted dump and
wondering why I'm seeing what I'm seeing.

IOW, if we get a fatal MCE during dumping then we should go and die.
This is much better than silently corrupting the dump and not even
saying anything about it.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
  2017-02-20 11:09   ` Borislav Petkov
@ 2017-02-20 13:29     ` Xunlei Pang
  -1 siblings, 0 replies; 18+ messages in thread
From: Xunlei Pang @ 2017-02-20 13:29 UTC (permalink / raw)
  To: Borislav Petkov, Xunlei Pang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On 02/20/2017 at 07:09 PM, Borislav Petkov wrote:
> On Mon, Feb 20, 2017 at 02:10:37PM +0800, Xunlei Pang wrote:
>> @@ -1128,8 +1129,9 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>>  	 */
>>  	int lmce = 1;
>>  
>> -	/* If this CPU is offline, just bail out. */
>> -	if (cpu_is_offline(smp_processor_id())) {
>> +	/* If nmi shootdown happened or this CPU is offline, just bail out. */
>> +	if (cpus_shotdown() ||
> I don't like "cpus_shotdown" - it doesn't hint at all that this is
> special-handling crash/kdump.
>
> And more importantly, I want it to be obvious that we do let the
> crashing CPU into the MCE handler.

Ok, I will export crashing_cpu and use it directly in mce handler.

>
> Why?
>
> If we didn't, you will not handle *any* MCE, even a fatal one, during
> dumping memory so if that dump is corrupted from the MCE, you won't
> know. And I don't want to be the one staring at the corrupted dump and
> wondering why I'm seeing what I'm seeing.
>
> IOW, if we get a fatal MCE during dumping then we should go and die.
> This is much better than silently corrupting the dump and not even
> saying anything about it.
>

My thought is that it doesn't matter after kdump boots as new mce handler
will be installed. If we get a fatal MCE during kdumping, the new handler will
handle the cpus running kdump kernel correctly.

There is a small window between crash and kdump kernel boot, so if a SRAO comes
within this window it will also cause the mce synchronization problem on the crashing
cpu if we don't bail out the crashing cpu.

Regards,
Xunlei

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
@ 2017-02-20 13:29     ` Xunlei Pang
  0 siblings, 0 replies; 18+ messages in thread
From: Xunlei Pang @ 2017-02-20 13:29 UTC (permalink / raw)
  To: Borislav Petkov, Xunlei Pang
  Cc: Prarit Bhargava, Kiyoshi Ueda, Tony Luck, x86, kexec,
	linux-kernel, Ingo Molnar, Junichi Nomura, Naoya Horiguchi,
	Dave Young

On 02/20/2017 at 07:09 PM, Borislav Petkov wrote:
> On Mon, Feb 20, 2017 at 02:10:37PM +0800, Xunlei Pang wrote:
>> @@ -1128,8 +1129,9 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>>  	 */
>>  	int lmce = 1;
>>  
>> -	/* If this CPU is offline, just bail out. */
>> -	if (cpu_is_offline(smp_processor_id())) {
>> +	/* If nmi shootdown happened or this CPU is offline, just bail out. */
>> +	if (cpus_shotdown() ||
> I don't like "cpus_shotdown" - it doesn't hint at all that this is
> special-handling crash/kdump.
>
> And more importantly, I want it to be obvious that we do let the
> crashing CPU into the MCE handler.

Ok, I will export crashing_cpu and use it directly in mce handler.

>
> Why?
>
> If we didn't, you will not handle *any* MCE, even a fatal one, during
> dumping memory so if that dump is corrupted from the MCE, you won't
> know. And I don't want to be the one staring at the corrupted dump and
> wondering why I'm seeing what I'm seeing.
>
> IOW, if we get a fatal MCE during dumping then we should go and die.
> This is much better than silently corrupting the dump and not even
> saying anything about it.
>

My thought is that it doesn't matter after kdump boots as new mce handler
will be installed. If we get a fatal MCE during kdumping, the new handler will
handle the cpus running kdump kernel correctly.

There is a small window between crash and kdump kernel boot, so if a SRAO comes
within this window it will also cause the mce synchronization problem on the crashing
cpu if we don't bail out the crashing cpu.

Regards,
Xunlei

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
  2017-02-20 13:29     ` Xunlei Pang
@ 2017-02-20 20:26       ` Borislav Petkov
  -1 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2017-02-20 20:26 UTC (permalink / raw)
  To: xlpang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On Mon, Feb 20, 2017 at 09:29:24PM +0800, Xunlei Pang wrote:
> There is a small window between crash and kdump kernel boot, so
> if a SRAO comes within this window it will also cause the mce
> synchronization problem on the crashing cpu if we don't bail out the
> crashing cpu.

You mean, in the window between, kdump kernel starts writing out memory
and the second, kexec-ed kernel?

If so, please add that information to the place in do_machine_check()
where we check crashing_cpu so that we know why we're doing this
temporary ignore of #MC.

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
@ 2017-02-20 20:26       ` Borislav Petkov
  0 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2017-02-20 20:26 UTC (permalink / raw)
  To: xlpang
  Cc: Prarit Bhargava, Kiyoshi Ueda, Tony Luck, x86, kexec,
	linux-kernel, Ingo Molnar, Junichi Nomura, Naoya Horiguchi,
	Dave Young

On Mon, Feb 20, 2017 at 09:29:24PM +0800, Xunlei Pang wrote:
> There is a small window between crash and kdump kernel boot, so
> if a SRAO comes within this window it will also cause the mce
> synchronization problem on the crashing cpu if we don't bail out the
> crashing cpu.

You mean, in the window between, kdump kernel starts writing out memory
and the second, kexec-ed kernel?

If so, please add that information to the place in do_machine_check()
where we check crashing_cpu so that we know why we're doing this
temporary ignore of #MC.

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
  2017-02-20 13:29     ` Xunlei Pang
@ 2017-02-21  1:26       ` Xunlei Pang
  -1 siblings, 0 replies; 18+ messages in thread
From: Xunlei Pang @ 2017-02-21  1:26 UTC (permalink / raw)
  To: Borislav Petkov, Xunlei Pang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On 02/20/2017 at 09:29 PM, Xunlei Pang wrote:
> On 02/20/2017 at 07:09 PM, Borislav Petkov wrote:
>> On Mon, Feb 20, 2017 at 02:10:37PM +0800, Xunlei Pang wrote:
>>> @@ -1128,8 +1129,9 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>>>  	 */
>>>  	int lmce = 1;
>>>  
>>> -	/* If this CPU is offline, just bail out. */
>>> -	if (cpu_is_offline(smp_processor_id())) {
>>> +	/* If nmi shootdown happened or this CPU is offline, just bail out. */
>>> +	if (cpus_shotdown() ||
>> I don't like "cpus_shotdown" - it doesn't hint at all that this is
>> special-handling crash/kdump.
>>
>> And more importantly, I want it to be obvious that we do let the
>> crashing CPU into the MCE handler.
> Ok, I will export crashing_cpu and use it directly in mce handler.

Forget to mention, one reason I introduced cpus_shotdown() is that "crashing_cpu"
is defined with CONFIG_SMP=y, so we have to export it unconditionally if we don't want
to add the conditional code(i.e. with #ifdef CONFIG_SMP quoted) in mce.c.

Regards,
Xunlei

>
>> Why?
>>
>> If we didn't, you will not handle *any* MCE, even a fatal one, during
>> dumping memory so if that dump is corrupted from the MCE, you won't
>> know. And I don't want to be the one staring at the corrupted dump and
>> wondering why I'm seeing what I'm seeing.
>>
>> IOW, if we get a fatal MCE during dumping then we should go and die.
>> This is much better than silently corrupting the dump and not even
>> saying anything about it.
>>
> My thought is that it doesn't matter after kdump boots as new mce handler
> will be installed. If we get a fatal MCE during kdumping, the new handler will
> handle the cpus running kdump kernel correctly.
>
> There is a small window between crash and kdump kernel boot, so if a SRAO comes
> within this window it will also cause the mce synchronization problem on the crashing
> cpu if we don't bail out the crashing cpu.
>
> Regards,
> Xunlei

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
@ 2017-02-21  1:26       ` Xunlei Pang
  0 siblings, 0 replies; 18+ messages in thread
From: Xunlei Pang @ 2017-02-21  1:26 UTC (permalink / raw)
  To: Borislav Petkov, Xunlei Pang
  Cc: Prarit Bhargava, Kiyoshi Ueda, Tony Luck, x86, kexec,
	linux-kernel, Ingo Molnar, Junichi Nomura, Naoya Horiguchi,
	Dave Young

On 02/20/2017 at 09:29 PM, Xunlei Pang wrote:
> On 02/20/2017 at 07:09 PM, Borislav Petkov wrote:
>> On Mon, Feb 20, 2017 at 02:10:37PM +0800, Xunlei Pang wrote:
>>> @@ -1128,8 +1129,9 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>>>  	 */
>>>  	int lmce = 1;
>>>  
>>> -	/* If this CPU is offline, just bail out. */
>>> -	if (cpu_is_offline(smp_processor_id())) {
>>> +	/* If nmi shootdown happened or this CPU is offline, just bail out. */
>>> +	if (cpus_shotdown() ||
>> I don't like "cpus_shotdown" - it doesn't hint at all that this is
>> special-handling crash/kdump.
>>
>> And more importantly, I want it to be obvious that we do let the
>> crashing CPU into the MCE handler.
> Ok, I will export crashing_cpu and use it directly in mce handler.

Forget to mention, one reason I introduced cpus_shotdown() is that "crashing_cpu"
is defined with CONFIG_SMP=y, so we have to export it unconditionally if we don't want
to add the conditional code(i.e. with #ifdef CONFIG_SMP quoted) in mce.c.

Regards,
Xunlei

>
>> Why?
>>
>> If we didn't, you will not handle *any* MCE, even a fatal one, during
>> dumping memory so if that dump is corrupted from the MCE, you won't
>> know. And I don't want to be the one staring at the corrupted dump and
>> wondering why I'm seeing what I'm seeing.
>>
>> IOW, if we get a fatal MCE during dumping then we should go and die.
>> This is much better than silently corrupting the dump and not even
>> saying anything about it.
>>
> My thought is that it doesn't matter after kdump boots as new mce handler
> will be installed. If we get a fatal MCE during kdumping, the new handler will
> handle the cpus running kdump kernel correctly.
>
> There is a small window between crash and kdump kernel boot, so if a SRAO comes
> within this window it will also cause the mce synchronization problem on the crashing
> cpu if we don't bail out the crashing cpu.
>
> Regards,
> Xunlei


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
  2017-02-20 20:26       ` Borislav Petkov
@ 2017-02-21  1:28         ` Xunlei Pang
  -1 siblings, 0 replies; 18+ messages in thread
From: Xunlei Pang @ 2017-02-21  1:28 UTC (permalink / raw)
  To: Borislav Petkov, xlpang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On 02/21/2017 at 04:26 AM, Borislav Petkov wrote:
> On Mon, Feb 20, 2017 at 09:29:24PM +0800, Xunlei Pang wrote:
>> There is a small window between crash and kdump kernel boot, so
>> if a SRAO comes within this window it will also cause the mce
>> synchronization problem on the crashing cpu if we don't bail out the
>> crashing cpu.
> You mean, in the window between, kdump kernel starts writing out memory
> and the second, kexec-ed kernel?

Not kdump kernel starts dumping, just during nmi_shootdown_cpus(), if some
MCE comes after crashing_cpu was set and we don't skip crashing_cpu, then
the crashing cpu will enter mce handler and trigger the synchronization issue.

>
> If so, please add that information to the place in do_machine_check()
> where we check crashing_cpu so that we know why we're doing this
> temporary ignore of #MC.

Ok, will add, thanks for the feedback.

Regards,
Xunlei

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
@ 2017-02-21  1:28         ` Xunlei Pang
  0 siblings, 0 replies; 18+ messages in thread
From: Xunlei Pang @ 2017-02-21  1:28 UTC (permalink / raw)
  To: Borislav Petkov, xlpang
  Cc: Prarit Bhargava, Kiyoshi Ueda, Tony Luck, x86, kexec,
	linux-kernel, Ingo Molnar, Junichi Nomura, Naoya Horiguchi,
	Dave Young

On 02/21/2017 at 04:26 AM, Borislav Petkov wrote:
> On Mon, Feb 20, 2017 at 09:29:24PM +0800, Xunlei Pang wrote:
>> There is a small window between crash and kdump kernel boot, so
>> if a SRAO comes within this window it will also cause the mce
>> synchronization problem on the crashing cpu if we don't bail out the
>> crashing cpu.
> You mean, in the window between, kdump kernel starts writing out memory
> and the second, kexec-ed kernel?

Not kdump kernel starts dumping, just during nmi_shootdown_cpus(), if some
MCE comes after crashing_cpu was set and we don't skip crashing_cpu, then
the crashing cpu will enter mce handler and trigger the synchronization issue.

>
> If so, please add that information to the place in do_machine_check()
> where we check crashing_cpu so that we know why we're doing this
> temporary ignore of #MC.

Ok, will add, thanks for the feedback.

Regards,
Xunlei

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
  2017-02-21  1:28         ` Xunlei Pang
@ 2017-02-21  8:41           ` Borislav Petkov
  -1 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2017-02-21  8:41 UTC (permalink / raw)
  To: xlpang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On Tue, Feb 21, 2017 at 09:28:20AM +0800, Xunlei Pang wrote:
> Not kdump kernel starts dumping, just during nmi_shootdown_cpus(), if some
> MCE comes after crashing_cpu was set and we don't skip crashing_cpu, then
> the crashing cpu will enter mce handler and trigger the synchronization issue.

Ok, I went and looked at __crash_kexec() - so we do nmi_shootdown_cpus()
as part of machine_crash_shutdown(). The next thing we do is kexec the
new kernel in machine_kexec(kexec_crash_image). The picture is clearer
now.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
@ 2017-02-21  8:41           ` Borislav Petkov
  0 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2017-02-21  8:41 UTC (permalink / raw)
  To: xlpang
  Cc: Prarit Bhargava, Kiyoshi Ueda, Tony Luck, x86, kexec,
	linux-kernel, Ingo Molnar, Junichi Nomura, Naoya Horiguchi,
	Dave Young

On Tue, Feb 21, 2017 at 09:28:20AM +0800, Xunlei Pang wrote:
> Not kdump kernel starts dumping, just during nmi_shootdown_cpus(), if some
> MCE comes after crashing_cpu was set and we don't skip crashing_cpu, then
> the crashing cpu will enter mce handler and trigger the synchronization issue.

Ok, I went and looked at __crash_kexec() - so we do nmi_shootdown_cpus()
as part of machine_crash_shutdown(). The next thing we do is kexec the
new kernel in machine_kexec(kexec_crash_image). The picture is clearer
now.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
  2017-02-20 11:09   ` Borislav Petkov
@ 2017-02-21 12:37     ` Xunlei Pang
  -1 siblings, 0 replies; 18+ messages in thread
From: Xunlei Pang @ 2017-02-21 12:37 UTC (permalink / raw)
  To: Borislav Petkov, Xunlei Pang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On 02/20/2017 at 07:09 PM, Borislav Petkov wrote:
> On Mon, Feb 20, 2017 at 02:10:37PM +0800, Xunlei Pang wrote:
>> @@ -1128,8 +1129,9 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>>  	 */
>>  	int lmce = 1;
>>  
>> -	/* If this CPU is offline, just bail out. */
>> -	if (cpu_is_offline(smp_processor_id())) {
>> +	/* If nmi shootdown happened or this CPU is offline, just bail out. */
>> +	if (cpus_shotdown() ||
> I don't like "cpus_shotdown" - it doesn't hint at all that this is
> special-handling crash/kdump.
>
> And more importantly, I want it to be obvious that we do let the
> crashing CPU into the MCE handler.

Hi Boris,

I made some improvements, what do you think the following one?
If you think it is fine, I can send out v3. Thanks for your time!

---
 arch/x86/include/asm/reboot.h    |  1 +
 arch/x86/kernel/cpu/mcheck/mce.c | 11 +++++++++--
 arch/x86/kernel/reboot.c         |  5 +++--
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 2cb1cc2..fc62ba8 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -15,6 +15,7 @@ struct machine_ops {
 };
 
 extern struct machine_ops machine_ops;
+extern int crashing_cpu;
 
 void native_machine_crash_shutdown(struct pt_regs *regs);
 void native_machine_shutdown(void);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8e9725c..7f53145 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -49,6 +49,7 @@
 #include <asm/tlbflush.h>
 #include <asm/mce.h>
 #include <asm/msr.h>
+#include <asm/reboot.h>
 
 #include "mce-internal.h"
 
@@ -1128,8 +1129,14 @@ void do_machine_check(struct pt_regs *regs, long error_code)
      */
     int lmce = 1;
 
-    /* If this CPU is offline, just bail out. */
-    if (cpu_is_offline(smp_processor_id())) {
+    /*
+     * Cases to bail out to avoid rendezvous process timeout:
+     * 1)If crashing_cpu was set, e.g. entering kdump,
+     *   we need to skip cpus remaining in 1st kernel.
+     * 2)If this CPU is offline.
+     */
+    if (crashing_cpu != -1 ||
+        cpu_is_offline(smp_processor_id())) {
         u64 mcgstatus;
 
         mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index e244c19..92ecf4b 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -749,10 +749,11 @@ void machine_crash_shutdown(struct pt_regs *regs)
 #endif
 
 
+/* This keeps a track of which one is crashing cpu. */
+int crashing_cpu = -1;
+
 #if defined(CONFIG_SMP)
 
-/* This keeps a track of which one is crashing cpu. */
-static int crashing_cpu;
 static nmi_shootdown_cb shootdown_callback;
 
 static atomic_t waiting_for_crash_ipi;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
@ 2017-02-21 12:37     ` Xunlei Pang
  0 siblings, 0 replies; 18+ messages in thread
From: Xunlei Pang @ 2017-02-21 12:37 UTC (permalink / raw)
  To: Borislav Petkov, Xunlei Pang
  Cc: Prarit Bhargava, Kiyoshi Ueda, Tony Luck, x86, kexec,
	linux-kernel, Ingo Molnar, Junichi Nomura, Naoya Horiguchi,
	Dave Young

On 02/20/2017 at 07:09 PM, Borislav Petkov wrote:
> On Mon, Feb 20, 2017 at 02:10:37PM +0800, Xunlei Pang wrote:
>> @@ -1128,8 +1129,9 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>>  	 */
>>  	int lmce = 1;
>>  
>> -	/* If this CPU is offline, just bail out. */
>> -	if (cpu_is_offline(smp_processor_id())) {
>> +	/* If nmi shootdown happened or this CPU is offline, just bail out. */
>> +	if (cpus_shotdown() ||
> I don't like "cpus_shotdown" - it doesn't hint at all that this is
> special-handling crash/kdump.
>
> And more importantly, I want it to be obvious that we do let the
> crashing CPU into the MCE handler.

Hi Boris,

I made some improvements, what do you think the following one?
If you think it is fine, I can send out v3. Thanks for your time!

---
 arch/x86/include/asm/reboot.h    |  1 +
 arch/x86/kernel/cpu/mcheck/mce.c | 11 +++++++++--
 arch/x86/kernel/reboot.c         |  5 +++--
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 2cb1cc2..fc62ba8 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -15,6 +15,7 @@ struct machine_ops {
 };
 
 extern struct machine_ops machine_ops;
+extern int crashing_cpu;
 
 void native_machine_crash_shutdown(struct pt_regs *regs);
 void native_machine_shutdown(void);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8e9725c..7f53145 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -49,6 +49,7 @@
 #include <asm/tlbflush.h>
 #include <asm/mce.h>
 #include <asm/msr.h>
+#include <asm/reboot.h>
 
 #include "mce-internal.h"
 
@@ -1128,8 +1129,14 @@ void do_machine_check(struct pt_regs *regs, long error_code)
      */
     int lmce = 1;
 
-    /* If this CPU is offline, just bail out. */
-    if (cpu_is_offline(smp_processor_id())) {
+    /*
+     * Cases to bail out to avoid rendezvous process timeout:
+     * 1)If crashing_cpu was set, e.g. entering kdump,
+     *   we need to skip cpus remaining in 1st kernel.
+     * 2)If this CPU is offline.
+     */
+    if (crashing_cpu != -1 ||
+        cpu_is_offline(smp_processor_id())) {
         u64 mcgstatus;
 
         mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index e244c19..92ecf4b 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -749,10 +749,11 @@ void machine_crash_shutdown(struct pt_regs *regs)
 #endif
 
 
+/* This keeps a track of which one is crashing cpu. */
+int crashing_cpu = -1;
+
 #if defined(CONFIG_SMP)
 
-/* This keeps a track of which one is crashing cpu. */
-static int crashing_cpu;
 static nmi_shootdown_cb shootdown_callback;
 
 static atomic_t waiting_for_crash_ipi;
-- 
1.8.3.1

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
  2017-02-21 12:37     ` Xunlei Pang
@ 2017-02-21 17:27       ` Borislav Petkov
  -1 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2017-02-21 17:27 UTC (permalink / raw)
  To: xlpang
  Cc: x86, linux-kernel, kexec, Tony Luck, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Naoya Horiguchi

On Tue, Feb 21, 2017 at 08:37:21PM +0800, Xunlei Pang wrote:
> -    /* If this CPU is offline, just bail out. */
> -    if (cpu_is_offline(smp_processor_id())) {
> +    /*
> +     * Cases to bail out to avoid rendezvous process timeout:
> +     * 1)If crashing_cpu was set, e.g. entering kdump,
> +     *   we need to skip cpus remaining in 1st kernel.
> +     * 2)If this CPU is offline.
> +     */
> +    if (crashing_cpu != -1 ||
> +        cpu_is_offline(smp_processor_id())) {

You're still not letting the crashing_cpu enter the #MC handler. You
need to handle the MCE no matter how short the window is.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
@ 2017-02-21 17:27       ` Borislav Petkov
  0 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2017-02-21 17:27 UTC (permalink / raw)
  To: xlpang
  Cc: Prarit Bhargava, Kiyoshi Ueda, Tony Luck, x86, kexec,
	linux-kernel, Ingo Molnar, Junichi Nomura, Naoya Horiguchi,
	Dave Young

On Tue, Feb 21, 2017 at 08:37:21PM +0800, Xunlei Pang wrote:
> -    /* If this CPU is offline, just bail out. */
> -    if (cpu_is_offline(smp_processor_id())) {
> +    /*
> +     * Cases to bail out to avoid rendezvous process timeout:
> +     * 1)If crashing_cpu was set, e.g. entering kdump,
> +     *   we need to skip cpus remaining in 1st kernel.
> +     * 2)If this CPU is offline.
> +     */
> +    if (crashing_cpu != -1 ||
> +        cpu_is_offline(smp_processor_id())) {

You're still not letting the crashing_cpu enter the #MC handler. You
need to handle the MCE no matter how short the window is.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-02-21 17:27 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-20  6:10 [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made Xunlei Pang
2017-02-20  6:10 ` Xunlei Pang
2017-02-20 11:09 ` Borislav Petkov
2017-02-20 11:09   ` Borislav Petkov
2017-02-20 13:29   ` Xunlei Pang
2017-02-20 13:29     ` Xunlei Pang
2017-02-20 20:26     ` Borislav Petkov
2017-02-20 20:26       ` Borislav Petkov
2017-02-21  1:28       ` Xunlei Pang
2017-02-21  1:28         ` Xunlei Pang
2017-02-21  8:41         ` Borislav Petkov
2017-02-21  8:41           ` Borislav Petkov
2017-02-21  1:26     ` Xunlei Pang
2017-02-21  1:26       ` Xunlei Pang
2017-02-21 12:37   ` Xunlei Pang
2017-02-21 12:37     ` Xunlei Pang
2017-02-21 17:27     ` Borislav Petkov
2017-02-21 17:27       ` Borislav Petkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.