* [PATCH v3] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
@ 2017-02-22  4:11 ` Xunlei Pang
  0 siblings, 0 replies; 6+ messages in thread
From: Xunlei Pang @ 2017-02-22  4:11 UTC (permalink / raw)
  To: x86, linux-kernel, kexec
  Cc: Tony Luck, Borislav Petkov, Ingo Molnar, Dave Young,
	Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda, Xunlei Pang,
	Naoya Horiguchi

We hit an issue with kdump: after the kdump kernel boots, if a
broadcast MCE arrives, the other CPUs still remaining in the
first kernel enter the first kernel's old MCE handler, time out
in MCE synchronization and panic, which finally resets the
kdump CPUs.

This patch keeps CPUs quiet after nmi_shootdown_cpus(): once
kdump boots, CPUs remaining in the first kernel should do
nothing except clear MCG_STATUS. This gives kdump the best
chance of completing the vmcore dump.

Previous efforts:
https://patchwork.kernel.org/patch/6167631/
https://lists.gt.net/linux/kernel/2146557

Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
---
v1->v2:
Using crashing_cpu according to Borislav's suggestion.

v2->v3:
- Used crashing_cpu in mce.c explicitly; the crashing CPU itself is not skipped.
- Added some comments.

 arch/x86/include/asm/reboot.h    |  1 +
 arch/x86/kernel/cpu/mcheck/mce.c | 12 ++++++++++--
 arch/x86/kernel/reboot.c         |  5 +++--
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 2cb1cc2..fc62ba8 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -15,6 +15,7 @@ struct machine_ops {
 };
 
 extern struct machine_ops machine_ops;
+extern int crashing_cpu;
 
 void native_machine_crash_shutdown(struct pt_regs *regs);
 void native_machine_shutdown(void);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8e9725c..1493222 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -49,6 +49,7 @@
 #include <asm/tlbflush.h>
 #include <asm/mce.h>
 #include <asm/msr.h>
+#include <asm/reboot.h>
 
 #include "mce-internal.h"
 
@@ -1127,9 +1128,16 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	 * on Intel.
 	 */
 	int lmce = 1;
+	int cpu = smp_processor_id();
 
-	/* If this CPU is offline, just bail out. */
-	if (cpu_is_offline(smp_processor_id())) {
+	/*
+	 * Cases to bail out to avoid rendezvous process timeout:
+	 * 1)If this CPU is offline.
+	 * 2)If crashing_cpu was set, e.g. entering kdump,
+	 *   we need to skip cpus remaining in 1st kernel.
+	 */
+	if (cpu_is_offline(cpu) ||
+	    (crashing_cpu != -1 && crashing_cpu != cpu)) {
 		u64 mcgstatus;
 
 		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index e244c19..92ecf4b 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -749,10 +749,11 @@ void machine_crash_shutdown(struct pt_regs *regs)
 #endif
 
 
+/* This keeps a track of which one is crashing cpu. */
+int crashing_cpu = -1;
+
 #if defined(CONFIG_SMP)
 
-/* This keeps a track of which one is crashing cpu. */
-static int crashing_cpu;
 static nmi_shootdown_cb shootdown_callback;
 
 static atomic_t waiting_for_crash_ipi;
-- 
1.8.3.1
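
The bail-out condition added to do_machine_check() in the mce.c hunk
above can be exercised outside the kernel. The following is a minimal
userspace sketch, not kernel code: mce_should_bail_out() is a name
invented here, and its cpu_offline parameter stands in for
cpu_is_offline(cpu). Only the boolean expression and the -1
initializer are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. -1 means no crash shutdown is in progress,
 * matching the initializer the patch gives crashing_cpu in reboot.c.
 */
static int crashing_cpu = -1;

/*
 * Hypothetical helper (name invented for this sketch): returns true
 * when do_machine_check() would take the early bail-out path.
 * cpu_offline stands in for cpu_is_offline(cpu).
 */
static bool mce_should_bail_out(int cpu, bool cpu_offline)
{
	assert(cpu >= 0);	/* CPU ids are non-negative */

	return cpu_offline ||
	       (crashing_cpu != -1 && crashing_cpu != cpu);
}
```

With crashing_cpu left at -1, an online CPU is never told to bail out.
After crashing_cpu is set to, say, 2, the helper returns true for CPU 3
and false for CPU 2, so only the crashing CPU keeps processing machine
checks.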

* Re: [PATCH v3] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
  2017-02-22  4:11 ` Xunlei Pang
@ 2017-02-22 18:50   ` Luck, Tony
  -1 siblings, 0 replies; 6+ messages in thread
From: Luck, Tony @ 2017-02-22 18:50 UTC (permalink / raw)
  To: Xunlei Pang
  Cc: x86, linux-kernel, kexec, Borislav Petkov, Ingo Molnar,
	Dave Young, Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda,
	Naoya Horiguchi

On Wed, Feb 22, 2017 at 12:11:14PM +0800, Xunlei Pang wrote:
> +	/*
> +	 * Cases to bail out to avoid rendezvous process timeout:
> +	 * 1)If this CPU is offline.
> +	 * 2)If crashing_cpu was set, e.g. entering kdump,
> +	 *   we need to skip cpus remaining in 1st kernel.
> +	 */
> +	if (cpu_is_offline(cpu) ||
> +	    (crashing_cpu != -1 && crashing_cpu != cpu)) {
>  		u64 mcgstatus;
>  
>  		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);


I think we should document the remaining race conditions. I don't
think there is any good way to eliminate them, and they are already
pretty small windows.

I think the sequence of events looks like:

     1	Panic occurs
     2	nmi_shootdown_cpus() sets crashing_cpu
     3	send NMI to everyone else
     4	wait up to a second for other CPUs to take NMI
     5	go to kexec code
     6	start new kernel
     7	new kernel establishes #MC handler

If one of the other cpus triggers a machine check while
getting to, or in, the NMI handler ... then that cpu will
skip processing (if RIPV is set).

Between '2' and '5' if crashing_cpu gets a machine check it
will execute in the old kernel handler, and do the right thing.

There's a fuzzy area between '6' and '7' where a machine check
might not end up in the right code.

From '7' onwards the kexec kernel will handle any machine
checks caused by kdump.

-Tony
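
The windows described in the message above can be summarized as a small
lookup. This is only a sketch under stated assumptions: the enum and
function names are invented here, step 5 is assumed to still run the
old kernel's handler, the other CPUs are assumed to stay in the first
kernel throughout, and the "if RIPV is set" condition is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Step numbers follow the list above: 1 = panic ... 7 = new #MC handler. */
enum mce_outcome {
	MCE_OLD_HANDLER_FULL,	/* old kernel handler, normal processing */
	MCE_OLD_HANDLER_SKIP,	/* old kernel handler, early bail-out */
	MCE_UNDEFINED,		/* the "fuzzy area" between steps 6 and 7 */
	MCE_NEW_KERNEL,		/* handled by the kexec (kdump) kernel */
};

/* Hypothetical helper: where a machine check taken at 'step' ends up. */
static enum mce_outcome mce_outcome_at(int step, bool on_crashing_cpu)
{
	assert(step >= 1 && step <= 7);

	if (!on_crashing_cpu)
		/* crashing_cpu is set from step 2 on, so other CPUs skip. */
		return step >= 2 ? MCE_OLD_HANDLER_SKIP : MCE_OLD_HANDLER_FULL;

	if (step >= 7)
		return MCE_NEW_KERNEL;
	if (step == 6)
		return MCE_UNDEFINED;	/* assumption: window opens at step 6 */
	return MCE_OLD_HANDLER_FULL;
}
```

For example, mce_outcome_at(3, false) yields MCE_OLD_HANDLER_SKIP, while
mce_outcome_at(3, true) yields MCE_OLD_HANDLER_FULL.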

* Re: [PATCH v3] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
  2017-02-22 18:50   ` Luck, Tony
@ 2017-02-23  6:04     ` Xunlei Pang
  -1 siblings, 0 replies; 6+ messages in thread
From: Xunlei Pang @ 2017-02-23  6:04 UTC (permalink / raw)
  To: Luck, Tony, Xunlei Pang
  Cc: x86, linux-kernel, kexec, Borislav Petkov, Ingo Molnar,
	Dave Young, Prarit Bhargava, Junichi Nomura, Kiyoshi Ueda,
	Naoya Horiguchi

On 02/23/2017 at 02:50 AM, Luck, Tony wrote:
> On Wed, Feb 22, 2017 at 12:11:14PM +0800, Xunlei Pang wrote:
>> +	/*
>> +	 * Cases to bail out to avoid rendezvous process timeout:
>> +	 * 1)If this CPU is offline.
>> +	 * 2)If crashing_cpu was set, e.g. entering kdump,
>> +	 *   we need to skip cpus remaining in 1st kernel.
>> +	 */
>> +	if (cpu_is_offline(cpu) ||
>> +	    (crashing_cpu != -1 && crashing_cpu != cpu)) {
>>  		u64 mcgstatus;
>>  
>>  		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
>
> I think we should document the remaining race conditions. I don't
> think there is any good way to eliminate them, and they are already
> pretty small windows.
>
> I think the sequence of events looks like:
>
>      1	Panic occurs
>      2	nmi_shootdown_cpus() sets crashing_cpu
>      3	send NMI to everyone else
>      4	wait up to a second for other CPUs to take NMI
>      5	go to kexec code
>      6	start new kernel
>      7	new kernel establishes #MC handler
>
> If one of the other cpus triggers a machine check while
> getting to, or in, the NMI handler ... then that cpu will
> skip processing (if RIPV is set).
>
> Between '2' and '5' if crashing_cpu gets a machine check it
> will execute in the old kernel handler, and do the right thing.
>
> There's a fuzzy area between '6' and '7' where a machine check
> might not end up in the right code.
>
> From '7' onwards the kexec kernel will handle any machine
> checks caused by kdump.
>

Agreed, will update the comment.

Regards,
Xunlei
