* [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug
@ 2017-08-03 10:58 Vitaly Kuznetsov
2017-08-10 12:01 ` Vitaly Kuznetsov
2017-08-10 16:39 ` [tip:x86/urgent] " tip-bot for Vitaly Kuznetsov
0 siblings, 2 replies; 6+ messages in thread
From: Vitaly Kuznetsov @ 2017-08-03 10:58 UTC (permalink / raw)
To: x86; +Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin
A hang on CPU0 onlining after a preceding offlining is observed. Trace
shows that CPU0 is stuck in check_tsc_sync_target() waiting for source
CPU to run check_tsc_sync_source() but this never happens. Source CPU,
in its turn, is stuck on synchronize_sched() which is called from
native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler().
Fix the issue by moving unregister_nmi_handler() from do_boot_cpu() to
native_cpu_up() after cpu onlining is done.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
It's been awile since my v1 submission, no comments so far. Resending.
---
arch/x86/kernel/smpboot.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index b474c8de7fba..54b9e89d4d6b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -971,7 +971,8 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle)
* Returns zero if CPU booted OK, else error code from
* ->wakeup_secondary_cpu.
*/
-static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
+static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
+ int *cpu0_nmi_registered)
{
volatile u32 *trampoline_status =
(volatile u32 *) __va(real_mode_header->trampoline_status);
@@ -979,7 +980,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
unsigned long start_ip = real_mode_header->trampoline_start;
unsigned long boot_error = 0;
- int cpu0_nmi_registered = 0;
unsigned long timeout;
idle->thread.sp = (unsigned long)task_pt_regs(idle);
@@ -1035,7 +1035,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
else
boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
- &cpu0_nmi_registered);
+ cpu0_nmi_registered);
if (!boot_error) {
/*
@@ -1080,12 +1080,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
*/
smpboot_restore_warm_reset_vector();
}
- /*
- * Clean up the nmi handler. Do this after the callin and callout sync
- * to avoid impact of possible long unregister time.
- */
- if (cpu0_nmi_registered)
- unregister_nmi_handler(NMI_LOCAL, "wake_cpu0");
return boot_error;
}
@@ -1093,8 +1087,9 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
{
int apicid = apic->cpu_present_to_apicid(cpu);
+ int cpu0_nmi_registered = 0;
unsigned long flags;
- int err;
+ int err, ret = 0;
WARN_ON(irqs_disabled());
@@ -1131,10 +1126,11 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
common_cpu_up(cpu, tidle);
- err = do_boot_cpu(apicid, cpu, tidle);
+ err = do_boot_cpu(apicid, cpu, tidle, &cpu0_nmi_registered);
if (err) {
pr_err("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
- return -EIO;
+ ret = -EIO;
+ goto unreg_nmi;
}
/*
@@ -1150,7 +1146,15 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
touch_nmi_watchdog();
}
- return 0;
+unreg_nmi:
+ /*
+ * Clean up the nmi handler. Do this after the callin and callout sync
+ * to avoid impact of possible long unregister time.
+ */
+ if (cpu0_nmi_registered)
+ unregister_nmi_handler(NMI_LOCAL, "wake_cpu0");
+
+ return ret;
}
/**
--
2.13.3
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug
2017-08-03 10:58 [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug Vitaly Kuznetsov
@ 2017-08-10 12:01 ` Vitaly Kuznetsov
2017-08-10 14:23 ` Ingo Molnar
2017-08-10 16:39 ` [tip:x86/urgent] " tip-bot for Vitaly Kuznetsov
1 sibling, 1 reply; 6+ messages in thread
From: Vitaly Kuznetsov @ 2017-08-10 12:01 UTC (permalink / raw)
To: x86; +Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin
Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> A hang on CPU0 onlining after a preceding offlining is observed. Trace
> shows that CPU0 is stuck in check_tsc_sync_target() waiting for source
> CPU to run check_tsc_sync_source() but this never happens. Source CPU,
> in its turn, is stuck on synchronize_sched() which is called from
> native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler().
>
> Fix the issue by moving unregister_nmi_handler() from do_boot_cpu() to
> native_cpu_up() after cpu onlining is done.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> It's been awile since my v1 submission, no comments so far. Resending.
Sorry, but
ping?
I haven't received a single comment on this since the initial submission
on June, 26 - is it so bad? :-)
--
Vitaly
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug
2017-08-10 12:01 ` Vitaly Kuznetsov
@ 2017-08-10 14:23 ` Ingo Molnar
2017-08-10 14:56 ` Vitaly Kuznetsov
2017-08-14 7:09 ` Thomas Gleixner
0 siblings, 2 replies; 6+ messages in thread
From: Ingo Molnar @ 2017-08-10 14:23 UTC (permalink / raw)
To: Vitaly Kuznetsov
Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin
* Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>
> > A hang on CPU0 onlining after a preceding offlining is observed. Trace
> > shows that CPU0 is stuck in check_tsc_sync_target() waiting for source
> > CPU to run check_tsc_sync_source() but this never happens. Source CPU,
> > in its turn, is stuck on synchronize_sched() which is called from
> > native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler().
> >
> > Fix the issue by moving unregister_nmi_handler() from do_boot_cpu() to
> > native_cpu_up() after cpu onlining is done.
Looks like a classic ABBA deadlock, due to the use of synchronize_sched() in
unregister_nmi_handler(), right?
> >
> > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > ---
> > It's been awile since my v1 submission, no comments so far. Resending.
>
> Sorry, but
>
> ping?
>
> I haven't received a single comment on this since the initial submission
> on June, 26 - is it so bad? :-)
So the fix looks good to me at first sight, but wanted to wait for Thomas to ack
it - once he gets back from vacation.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug
2017-08-10 14:23 ` Ingo Molnar
@ 2017-08-10 14:56 ` Vitaly Kuznetsov
2017-08-14 7:09 ` Thomas Gleixner
1 sibling, 0 replies; 6+ messages in thread
From: Vitaly Kuznetsov @ 2017-08-10 14:56 UTC (permalink / raw)
To: Ingo Molnar
Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin
Ingo Molnar <mingo@kernel.org> writes:
> * Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
>> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>>
>> > A hang on CPU0 onlining after a preceding offlining is observed. Trace
>> > shows that CPU0 is stuck in check_tsc_sync_target() waiting for source
>> > CPU to run check_tsc_sync_source() but this never happens. Source CPU,
>> > in its turn, is stuck on synchronize_sched() which is called from
>> > native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler().
>> >
>> > Fix the issue by moving unregister_nmi_handler() from do_boot_cpu() to
>> > native_cpu_up() after cpu onlining is done.
>
> Looks like a classic ABBA deadlock, due to the use of synchronize_sched() in
> unregister_nmi_handler(), right?
>
Exactly.
>> >
>> > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> > ---
>> > It's been awile since my v1 submission, no comments so far. Resending.
>>
>> Sorry, but
>>
>> ping?
>>
>> I haven't received a single comment on this since the initial submission
>> on June, 26 - is it so bad? :-)
>
> So the fix looks good to me at first sight, but wanted to wait for Thomas to ack
> it - once he gets back from vacation.
>
Thanks!
--
Vitaly
^ permalink raw reply [flat|nested] 6+ messages in thread
* [tip:x86/urgent] x86/smpboot: Unbreak CPU0 hotplug
2017-08-03 10:58 [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug Vitaly Kuznetsov
2017-08-10 12:01 ` Vitaly Kuznetsov
@ 2017-08-10 16:39 ` tip-bot for Vitaly Kuznetsov
1 sibling, 0 replies; 6+ messages in thread
From: tip-bot for Vitaly Kuznetsov @ 2017-08-10 16:39 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, peterz, tglx, vkuznets, torvalds
Commit-ID: 10e66760fa8ee11f254a69433fc132d04758a5fc
Gitweb: http://git.kernel.org/tip/10e66760fa8ee11f254a69433fc132d04758a5fc
Author: Vitaly Kuznetsov <vkuznets@redhat.com>
AuthorDate: Thu, 3 Aug 2017 12:58:18 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Aug 2017 17:02:47 +0200
x86/smpboot: Unbreak CPU0 hotplug
A hang on CPU0 onlining after a preceding offlining is observed. Trace
shows that CPU0 is stuck in check_tsc_sync_target() waiting for source
CPU to run check_tsc_sync_source() but this never happens. Source CPU,
in its turn, is stuck on synchronize_sched() which is called from
native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler().
So it's a classic ABBA deadlock, due to the use of synchronize_sched() in
unregister_nmi_handler().
Fix the bug by moving unregister_nmi_handler() from do_boot_cpu() to
native_cpu_up() after cpu onlining is done.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170803105818.9934-1-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/kernel/smpboot.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index b474c8d..54b9e89 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -971,7 +971,8 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle)
* Returns zero if CPU booted OK, else error code from
* ->wakeup_secondary_cpu.
*/
-static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
+static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
+ int *cpu0_nmi_registered)
{
volatile u32 *trampoline_status =
(volatile u32 *) __va(real_mode_header->trampoline_status);
@@ -979,7 +980,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
unsigned long start_ip = real_mode_header->trampoline_start;
unsigned long boot_error = 0;
- int cpu0_nmi_registered = 0;
unsigned long timeout;
idle->thread.sp = (unsigned long)task_pt_regs(idle);
@@ -1035,7 +1035,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
else
boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
- &cpu0_nmi_registered);
+ cpu0_nmi_registered);
if (!boot_error) {
/*
@@ -1080,12 +1080,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
*/
smpboot_restore_warm_reset_vector();
}
- /*
- * Clean up the nmi handler. Do this after the callin and callout sync
- * to avoid impact of possible long unregister time.
- */
- if (cpu0_nmi_registered)
- unregister_nmi_handler(NMI_LOCAL, "wake_cpu0");
return boot_error;
}
@@ -1093,8 +1087,9 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
{
int apicid = apic->cpu_present_to_apicid(cpu);
+ int cpu0_nmi_registered = 0;
unsigned long flags;
- int err;
+ int err, ret = 0;
WARN_ON(irqs_disabled());
@@ -1131,10 +1126,11 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
common_cpu_up(cpu, tidle);
- err = do_boot_cpu(apicid, cpu, tidle);
+ err = do_boot_cpu(apicid, cpu, tidle, &cpu0_nmi_registered);
if (err) {
pr_err("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
- return -EIO;
+ ret = -EIO;
+ goto unreg_nmi;
}
/*
@@ -1150,7 +1146,15 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
touch_nmi_watchdog();
}
- return 0;
+unreg_nmi:
+ /*
+ * Clean up the nmi handler. Do this after the callin and callout sync
+ * to avoid impact of possible long unregister time.
+ */
+ if (cpu0_nmi_registered)
+ unregister_nmi_handler(NMI_LOCAL, "wake_cpu0");
+
+ return ret;
}
/**
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug
2017-08-10 14:23 ` Ingo Molnar
2017-08-10 14:56 ` Vitaly Kuznetsov
@ 2017-08-14 7:09 ` Thomas Gleixner
1 sibling, 0 replies; 6+ messages in thread
From: Thomas Gleixner @ 2017-08-14 7:09 UTC (permalink / raw)
To: Ingo Molnar
Cc: Vitaly Kuznetsov, x86, linux-kernel, Ingo Molnar, H. Peter Anvin
On Thu, 10 Aug 2017, Ingo Molnar wrote:
> * Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> > I haven't received a single comment on this since the initial submission
> > on June, 26 - is it so bad? :-)
Sorry slipped through the cracks.
> So the fix looks good to me at first sight, but wanted to wait for Thomas to ack
> it - once he gets back from vacation.
Acked-by: Thomas Gleixner <tglx@linutronix.de>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-08-14 7:09 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-03 10:58 [PATCH RESEND] x86/smpboot: Unbreak CPU0 hotplug Vitaly Kuznetsov
2017-08-10 12:01 ` Vitaly Kuznetsov
2017-08-10 14:23 ` Ingo Molnar
2017-08-10 14:56 ` Vitaly Kuznetsov
2017-08-14 7:09 ` Thomas Gleixner
2017-08-10 16:39 ` [tip:x86/urgent] " tip-bot for Vitaly Kuznetsov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).