linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug
@ 2017-08-03 10:58 Vitaly Kuznetsov
  2017-08-10 12:01 ` Vitaly Kuznetsov
  2017-08-10 16:39 ` [tip:x86/urgent] " tip-bot for Vitaly Kuznetsov
  0 siblings, 2 replies; 6+ messages in thread
From: Vitaly Kuznetsov @ 2017-08-03 10:58 UTC (permalink / raw)
  To: x86; +Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin

A hang on CPU0 onlining after a preceding offlining is observed. Trace
shows that CPU0 is stuck in check_tsc_sync_target() waiting for source
CPU to run check_tsc_sync_source() but this never happens. Source CPU,
in its turn, is stuck on synchronize_sched() which is called from
native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler().

Fix the issue by moving unregister_nmi_handler() from do_boot_cpu() to
native_cpu_up() after cpu onlining is done.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
It's been awile since my v1 submission, no comments so far. Resending.
---
 arch/x86/kernel/smpboot.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index b474c8de7fba..54b9e89d4d6b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -971,7 +971,8 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle)
  * Returns zero if CPU booted OK, else error code from
  * ->wakeup_secondary_cpu.
  */
-static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
+static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
+		       int *cpu0_nmi_registered)
 {
 	volatile u32 *trampoline_status =
 		(volatile u32 *) __va(real_mode_header->trampoline_status);
@@ -979,7 +980,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	unsigned long start_ip = real_mode_header->trampoline_start;
 
 	unsigned long boot_error = 0;
-	int cpu0_nmi_registered = 0;
 	unsigned long timeout;
 
 	idle->thread.sp = (unsigned long)task_pt_regs(idle);
@@ -1035,7 +1035,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 		boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
 	else
 		boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
-						     &cpu0_nmi_registered);
+						     cpu0_nmi_registered);
 
 	if (!boot_error) {
 		/*
@@ -1080,12 +1080,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 		 */
 		smpboot_restore_warm_reset_vector();
 	}
-	/*
-	 * Clean up the nmi handler. Do this after the callin and callout sync
-	 * to avoid impact of possible long unregister time.
-	 */
-	if (cpu0_nmi_registered)
-		unregister_nmi_handler(NMI_LOCAL, "wake_cpu0");
 
 	return boot_error;
 }
@@ -1093,8 +1087,9 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
 	int apicid = apic->cpu_present_to_apicid(cpu);
+	int cpu0_nmi_registered = 0;
 	unsigned long flags;
-	int err;
+	int err, ret = 0;
 
 	WARN_ON(irqs_disabled());
 
@@ -1131,10 +1126,11 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 
 	common_cpu_up(cpu, tidle);
 
-	err = do_boot_cpu(apicid, cpu, tidle);
+	err = do_boot_cpu(apicid, cpu, tidle, &cpu0_nmi_registered);
 	if (err) {
 		pr_err("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
-		return -EIO;
+		ret = -EIO;
+		goto unreg_nmi;
 	}
 
 	/*
@@ -1150,7 +1146,15 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 		touch_nmi_watchdog();
 	}
 
-	return 0;
+unreg_nmi:
+	/*
+	 * Clean up the nmi handler. Do this after the callin and callout sync
+	 * to avoid impact of possible long unregister time.
+	 */
+	if (cpu0_nmi_registered)
+		unregister_nmi_handler(NMI_LOCAL, "wake_cpu0");
+
+	return ret;
 }
 
 /**
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug
  2017-08-03 10:58 [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug Vitaly Kuznetsov
@ 2017-08-10 12:01 ` Vitaly Kuznetsov
  2017-08-10 14:23   ` Ingo Molnar
  2017-08-10 16:39 ` [tip:x86/urgent] " tip-bot for Vitaly Kuznetsov
  1 sibling, 1 reply; 6+ messages in thread
From: Vitaly Kuznetsov @ 2017-08-10 12:01 UTC (permalink / raw)
  To: x86; +Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> A hang on CPU0 onlining after a preceding offlining is observed. Trace
> shows that CPU0 is stuck in check_tsc_sync_target() waiting for source
> CPU to run check_tsc_sync_source() but this never happens. Source CPU,
> in its turn, is stuck on synchronize_sched() which is called from
> native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler().
>
> Fix the issue by moving unregister_nmi_handler() from do_boot_cpu() to
> native_cpu_up() after cpu onlining is done.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> It's been awile since my v1 submission, no comments so far. Resending.

Sorry, but 

ping?

I haven't received a single comment on this since the initial submission
on June, 26 - is it so bad? :-)

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug
  2017-08-10 12:01 ` Vitaly Kuznetsov
@ 2017-08-10 14:23   ` Ingo Molnar
  2017-08-10 14:56     ` Vitaly Kuznetsov
  2017-08-14  7:09     ` Thomas Gleixner
  0 siblings, 2 replies; 6+ messages in thread
From: Ingo Molnar @ 2017-08-10 14:23 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin


* Vitaly Kuznetsov <vkuznets@redhat.com> wrote:

> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> 
> > A hang on CPU0 onlining after a preceding offlining is observed. Trace
> > shows that CPU0 is stuck in check_tsc_sync_target() waiting for source
> > CPU to run check_tsc_sync_source() but this never happens. Source CPU,
> > in its turn, is stuck on synchronize_sched() which is called from
> > native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler().
> >
> > Fix the issue by moving unregister_nmi_handler() from do_boot_cpu() to
> > native_cpu_up() after cpu onlining is done.

Looks like a classic ABBA deadlock, due to the use of synchronize_sched() in 
unregister_nmi_handler(), right?

> >
> > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > ---
> > It's been awile since my v1 submission, no comments so far. Resending.
> 
> Sorry, but 
> 
> ping?
> 
> I haven't received a single comment on this since the initial submission
> on June, 26 - is it so bad? :-)

So the fix looks good to me at first sight, but wanted to wait for Thomas to ack 
it - once he gets back from vacation.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug
  2017-08-10 14:23   ` Ingo Molnar
@ 2017-08-10 14:56     ` Vitaly Kuznetsov
  2017-08-14  7:09     ` Thomas Gleixner
  1 sibling, 0 replies; 6+ messages in thread
From: Vitaly Kuznetsov @ 2017-08-10 14:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin

Ingo Molnar <mingo@kernel.org> writes:

> * Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
>> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>> 
>> > A hang on CPU0 onlining after a preceding offlining is observed. Trace
>> > shows that CPU0 is stuck in check_tsc_sync_target() waiting for source
>> > CPU to run check_tsc_sync_source() but this never happens. Source CPU,
>> > in its turn, is stuck on synchronize_sched() which is called from
>> > native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler().
>> >
>> > Fix the issue by moving unregister_nmi_handler() from do_boot_cpu() to
>> > native_cpu_up() after cpu onlining is done.
>
> Looks like a classic ABBA deadlock, due to the use of synchronize_sched() in 
> unregister_nmi_handler(), right?
>

Exactly.

>> >
>> > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> > ---
>> > It's been awile since my v1 submission, no comments so far. Resending.
>> 
>> Sorry, but 
>> 
>> ping?
>> 
>> I haven't received a single comment on this since the initial submission
>> on June, 26 - is it so bad? :-)
>
> So the fix looks good to me at first sight, but wanted to wait for Thomas to ack 
> it - once he gets back from vacation.
>

Thanks!

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tip:x86/urgent] x86/smpboot: Unbreak CPU0 hotplug
  2017-08-03 10:58 [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug Vitaly Kuznetsov
  2017-08-10 12:01 ` Vitaly Kuznetsov
@ 2017-08-10 16:39 ` tip-bot for Vitaly Kuznetsov
  1 sibling, 0 replies; 6+ messages in thread
From: tip-bot for Vitaly Kuznetsov @ 2017-08-10 16:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, tglx, vkuznets, torvalds

Commit-ID:  10e66760fa8ee11f254a69433fc132d04758a5fc
Gitweb:     http://git.kernel.org/tip/10e66760fa8ee11f254a69433fc132d04758a5fc
Author:     Vitaly Kuznetsov <vkuznets@redhat.com>
AuthorDate: Thu, 3 Aug 2017 12:58:18 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Aug 2017 17:02:47 +0200

x86/smpboot: Unbreak CPU0 hotplug

A hang on CPU0 onlining after a preceding offlining is observed. Trace
shows that CPU0 is stuck in check_tsc_sync_target() waiting for source
CPU to run check_tsc_sync_source() but this never happens. Source CPU,
in its turn, is stuck on synchronize_sched() which is called from
native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler().

So it's a classic ABBA deadlock, due to the use of synchronize_sched() in
unregister_nmi_handler().

Fix the bug by moving unregister_nmi_handler() from do_boot_cpu() to
native_cpu_up() after cpu onlining is done.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170803105818.9934-1-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/smpboot.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index b474c8d..54b9e89 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -971,7 +971,8 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle)
  * Returns zero if CPU booted OK, else error code from
  * ->wakeup_secondary_cpu.
  */
-static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
+static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
+		       int *cpu0_nmi_registered)
 {
 	volatile u32 *trampoline_status =
 		(volatile u32 *) __va(real_mode_header->trampoline_status);
@@ -979,7 +980,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	unsigned long start_ip = real_mode_header->trampoline_start;
 
 	unsigned long boot_error = 0;
-	int cpu0_nmi_registered = 0;
 	unsigned long timeout;
 
 	idle->thread.sp = (unsigned long)task_pt_regs(idle);
@@ -1035,7 +1035,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 		boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
 	else
 		boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
-						     &cpu0_nmi_registered);
+						     cpu0_nmi_registered);
 
 	if (!boot_error) {
 		/*
@@ -1080,12 +1080,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 		 */
 		smpboot_restore_warm_reset_vector();
 	}
-	/*
-	 * Clean up the nmi handler. Do this after the callin and callout sync
-	 * to avoid impact of possible long unregister time.
-	 */
-	if (cpu0_nmi_registered)
-		unregister_nmi_handler(NMI_LOCAL, "wake_cpu0");
 
 	return boot_error;
 }
@@ -1093,8 +1087,9 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
 	int apicid = apic->cpu_present_to_apicid(cpu);
+	int cpu0_nmi_registered = 0;
 	unsigned long flags;
-	int err;
+	int err, ret = 0;
 
 	WARN_ON(irqs_disabled());
 
@@ -1131,10 +1126,11 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 
 	common_cpu_up(cpu, tidle);
 
-	err = do_boot_cpu(apicid, cpu, tidle);
+	err = do_boot_cpu(apicid, cpu, tidle, &cpu0_nmi_registered);
 	if (err) {
 		pr_err("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
-		return -EIO;
+		ret = -EIO;
+		goto unreg_nmi;
 	}
 
 	/*
@@ -1150,7 +1146,15 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 		touch_nmi_watchdog();
 	}
 
-	return 0;
+unreg_nmi:
+	/*
+	 * Clean up the nmi handler. Do this after the callin and callout sync
+	 * to avoid impact of possible long unregister time.
+	 */
+	if (cpu0_nmi_registered)
+		unregister_nmi_handler(NMI_LOCAL, "wake_cpu0");
+
+	return ret;
 }
 
 /**

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug
  2017-08-10 14:23   ` Ingo Molnar
  2017-08-10 14:56     ` Vitaly Kuznetsov
@ 2017-08-14  7:09     ` Thomas Gleixner
  1 sibling, 0 replies; 6+ messages in thread
From: Thomas Gleixner @ 2017-08-14  7:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Vitaly Kuznetsov, x86, linux-kernel, Ingo Molnar, H. Peter Anvin

On Thu, 10 Aug 2017, Ingo Molnar wrote:
> * Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> > I haven't received a single comment on this since the initial submission
> > on June, 26 - is it so bad? :-)

Sorry slipped through the cracks.

> So the fix looks good to me at first sight, but wanted to wait for Thomas to ack 
> it - once he gets back from vacation.

Acked-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-08-14  7:09 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-03 10:58 [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug Vitaly Kuznetsov
2017-08-10 12:01 ` Vitaly Kuznetsov
2017-08-10 14:23   ` Ingo Molnar
2017-08-10 14:56     ` Vitaly Kuznetsov
2017-08-14  7:09     ` Thomas Gleixner
2017-08-10 16:39 ` [tip:x86/urgent] " tip-bot for Vitaly Kuznetsov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).