linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] x86/smpboot: Unbreak CPU0 hotplug
@ 2017-06-26 16:36 Vitaly Kuznetsov
  2017-06-30  8:25 ` Vitaly Kuznetsov
  0 siblings, 1 reply; 2+ messages in thread
From: Vitaly Kuznetsov @ 2017-06-26 16:36 UTC (permalink / raw)
  To: x86; +Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel

A hang on CPU0 onlining after a preceding offlining is observed. Trace
shows that CPU0 is stuck in check_tsc_sync_target() waiting for source
CPU to run check_tsc_sync_source() but this never happens. Source CPU,
in its turn, is stuck on synchronize_sched() which is called from
native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler().

Fix the issue by moving unregister_nmi_handler() from do_boot_cpu() to
native_cpu_up() after cpu onlining is done.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
RFC part: I tested the fix on KVM and Xen guests and it works just fine
but CPU0 hotplug is not something I'm really familiar with, I may be
missing some important details. I also skipped code archeology to figure
out when things got broken.
---
 arch/x86/kernel/smpboot.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f04479a..00349b6 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -971,7 +971,8 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle)
  * Returns zero if CPU booted OK, else error code from
  * ->wakeup_secondary_cpu.
  */
-static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
+static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
+		       int *cpu0_nmi_registered)
 {
 	volatile u32 *trampoline_status =
 		(volatile u32 *) __va(real_mode_header->trampoline_status);
@@ -979,7 +980,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	unsigned long start_ip = real_mode_header->trampoline_start;
 
 	unsigned long boot_error = 0;
-	int cpu0_nmi_registered = 0;
 	unsigned long timeout;
 
 	idle->thread.sp = (unsigned long)task_pt_regs(idle);
@@ -1035,7 +1035,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 		boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
 	else
 		boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
-						     &cpu0_nmi_registered);
+						     cpu0_nmi_registered);
 
 	if (!boot_error) {
 		/*
@@ -1080,12 +1080,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 		 */
 		smpboot_restore_warm_reset_vector();
 	}
-	/*
-	 * Clean up the nmi handler. Do this after the callin and callout sync
-	 * to avoid impact of possible long unregister time.
-	 */
-	if (cpu0_nmi_registered)
-		unregister_nmi_handler(NMI_LOCAL, "wake_cpu0");
 
 	return boot_error;
 }
@@ -1093,8 +1087,9 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
 	int apicid = apic->cpu_present_to_apicid(cpu);
+	int cpu0_nmi_registered = 0;
 	unsigned long flags;
-	int err;
+	int err, ret = 0;
 
 	WARN_ON(irqs_disabled());
 
@@ -1131,10 +1126,11 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 
 	common_cpu_up(cpu, tidle);
 
-	err = do_boot_cpu(apicid, cpu, tidle);
+	err = do_boot_cpu(apicid, cpu, tidle, &cpu0_nmi_registered);
 	if (err) {
 		pr_err("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
-		return -EIO;
+		ret = -EIO;
+		goto unreg_nmi;
 	}
 
 	/*
@@ -1150,7 +1146,15 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 		touch_nmi_watchdog();
 	}
 
-	return 0;
+unreg_nmi:
+	/*
+	 * Clean up the nmi handler. Do this after the callin and callout sync
+	 * to avoid impact of possible long unregister time.
+	 */
+	if (cpu0_nmi_registered)
+		unregister_nmi_handler(NMI_LOCAL, "wake_cpu0");
+
+	return ret;
 }
 
 /**
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH RFC] x86/smpboot: Unbreak CPU0 hotplug
  2017-06-26 16:36 [PATCH RFC] x86/smpboot: Unbreak CPU0 hotplug Vitaly Kuznetsov
@ 2017-06-30  8:25 ` Vitaly Kuznetsov
  0 siblings, 0 replies; 2+ messages in thread
From: Vitaly Kuznetsov @ 2017-06-30  8:25 UTC (permalink / raw)
  To: x86; +Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> A hang on CPU0 onlining after a preceding offlining is observed. Trace
> shows that CPU0 is stuck in check_tsc_sync_target() waiting for source
> CPU to run check_tsc_sync_source() but this never happens. Source CPU,
> in its turn, is stuck on synchronize_sched() which is called from
> native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler().
>
> Fix the issue by moving unregister_nmi_handler() from do_boot_cpu() to
> native_cpu_up() after cpu onlining is done.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> RFC part: I tested the fix on KVM and Xen guests and it works just fine
> but CPU0 hotplug is not something I'm really familiar with, I may be
> missing some important details. I also skipped code archeology to figure
> out when things got broken.

Any comments? This can be considered as non-RFC if not :-)

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2017-06-30  8:25 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-26 16:36 [PATCH RFC] x86/smpboot: Unbreak CPU0 hotplug Vitaly Kuznetsov
2017-06-30  8:25 ` Vitaly Kuznetsov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).