linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] x86/reboot: Don't wait infinitely for IPI completion during reboot
@ 2019-06-28 12:28 Grzegorz Halat
  2019-07-25 13:40 ` [tip:x86/core] x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails tip-bot for Grzegorz Halat
  2019-07-25 14:19 ` [tip:x86/apic] " tip-bot for Grzegorz Halat
  0 siblings, 2 replies; 3+ messages in thread
From: Grzegorz Halat @ 2019-06-28 12:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	x86, Don Zickus, Oleksandr Natalenko, Grzegorz Halat

Hi,
This patch is trying to address a deadlock that may happen during a
reboot.

CPU0 can stop a CPU which is holding a lock and there may be another CPU
with disabled interrupts waiting for the same lock. In this situation,
the CPU waiting for the lock can't be stopped and CPU0 will be
indefinitely waiting for IPI completion.

To avoid indefinitely looping kernel should send NMI after timeout.
It seems to me that this was the original intention of
commit 7d007d21e539 ("x86/reboot: Use NMI to assist in shutting down if
IRQ fails") but the removal of 'wait' in the first loop has been
forgotten.

I'm sending the patch as RFC, maybe there is a better way to handle this
situation. Another thing which wonders me is pr_emerg() after IPI.
Is this safe? I think that IPI can shut down a CPU that is holding a
lock used by pr_emerg(), so there may be a deadlock.

CPU0 is stuck in native_stop_other_cpus() waiting for shutdown of CPU2:

 #6 [ffffb1a7826b7dd0] delay_tsc at ffffffffb640f0b4
 #7 [ffffb1a7826b7dd0] native_stop_other_cpus at ffffffffb5c47ffa 
 #8 [ffffb1a7826b7df0] native_machine_shutdown at ffffffffb5c47132
 #9 [ffffb1a7826b7df8] native_machine_restart at ffffffffb5c47649
#10 [ffffb1a7826b7e00] __do_sys_reboot at ffffffffb5ccff62
#11 [ffffb1a7826b7f38] do_syscall_64 at ffffffffb5c0424b

The loop, in case of reboot() syscall wait==1:
File: linux/arch/x86/kernel/smp.c
218: 		timeout = USEC_PER_SEC;
219: 		while (num_online_cpus() > 1 && (wait || timeout--))
220: 			udelay(1);

CPU1 has been shutdown while holding tasklist_lock:

 #0 [ffff984418083fb8] stop_this_cpu at ffffffffb5c28495  << CPU stopped
 #1 [ffff984418083fc0] smp_reboot_interrupt at ffffffffb5c48169
 #2 [ffff984418083ff0] reboot_interrupt at ffffffffb6600bbf
--- <IRQ stack> ---
 #3 [ffffb1a782417be8] reboot_interrupt at ffffffffb6600bbf
    [exception RIP: delay_tsc+32]
 #4 [ffffb1a782417c98] probe_12098 at ffffffffc0813fab 
 #5 [ffffb1a782417cb0] do_wait at ffffffffb5cae5b4
 #6 [ffffb1a782417d08] optimized_callback at ffffffffb5c56be3
 #7 [ffffb1a782417d98] do_wait at ffffffffb5cae5b3 << read_lock() here
 #8 [ffffb1a782417df0] kernel_wait4 at ffffffffb5cafa4e
 #9 [ffffb1a782417e78] __do_sys_wait4 at ffffffffb5cafb73
#10 [ffffb1a782417f38] do_syscall_64 at ffffffffb5c0424b

Note: probe_12098 is SystemTap probe which injects delay.
I'm using SystemTap to reliably reproduce the issue 

CPU2 is trying to grab tasklist_lock with disabled interrupts:

RFLAGS: 00000006
 #5 [ffffb1a782397e78] queued_write_lock_slowpath at ffffffffb5d0052e
 #6 [ffffb1a782397e88] do_exit at ffffffffb5caf08e
 #7 [ffffb1a782397f08] do_group_exit at ffffffffb5caf8da
 #8 [ffffb1a782397f30] __x64_sys_exit_group at ffffffffb5caf954
 #9 [ffffb1a782397f38] do_syscall_64 at ffffffffb5c0424b

Signed-off-by: Grzegorz Halat <ghalat@redhat.com>
---
 arch/x86/kernel/smp.c | 28 ++++++++++------------------
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 4693e2f3a03e..9186e432b8d6 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -211,30 +211,23 @@ static void native_stop_other_cpus(int wait)
 
 		apic->send_IPI_allbutself(REBOOT_VECTOR);
 
-		/*
-		 * Don't wait longer than a second if the caller
-		 * didn't ask us to wait.
-		 */
+		/* Don't wait longer than a second for IPI completion */
 		timeout = USEC_PER_SEC;
-		while (num_online_cpus() > 1 && (wait || timeout--))
+		while (num_online_cpus() > 1 && timeout--)
 			udelay(1);
 	}
 	
 	/* if the REBOOT_VECTOR didn't work, try with the NMI */
-	if ((num_online_cpus() > 1) && (!smp_no_nmi_ipi))  {
-		if (register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
-					 NMI_FLAG_FIRST, "smp_stop"))
-			/* Note: we ignore failures here */
-			/* Hope the REBOOT_IRQ is good enough */
-			goto finish;
-
-		/* sync above data before sending IRQ */
-		wmb();
-
-		pr_emerg("Shutting down cpus with NMI\n");
+	if (num_online_cpus() > 1) {
+		if (!smp_no_nmi_ipi && !register_nmi_handler(NMI_LOCAL,
+			smp_stop_nmi_callback, NMI_FLAG_FIRST, "smp_stop")){
+			/* sync above data before sending IRQ */
+			wmb();
 
-		apic->send_IPI_allbutself(NMI_VECTOR);
+			pr_emerg("Shutting down cpus with NMI\n");
 
+			apic->send_IPI_allbutself(NMI_VECTOR);
+		}
 		/*
 		 * Don't wait longer than a 10 ms if the caller
 		 * didn't ask us to wait.
@@ -244,7 +237,6 @@ static void native_stop_other_cpus(int wait)
 			udelay(1);
 	}
 
-finish:
 	local_irq_save(flags);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [tip:x86/core] x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails
  2019-06-28 12:28 [PATCH RFC] x86/reboot: Don't wait infinitely for IPI completion during reboot Grzegorz Halat
@ 2019-07-25 13:40 ` tip-bot for Grzegorz Halat
  2019-07-25 14:19 ` [tip:x86/apic] " tip-bot for Grzegorz Halat
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot for Grzegorz Halat @ 2019-07-25 13:40 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, tglx, hpa, ghalat, dzickus, mingo

Commit-ID:  d92e35b76cfcafc31987a2aa186a8e8b4ee84f52
Gitweb:     https://git.kernel.org/tip/d92e35b76cfcafc31987a2aa186a8e8b4ee84f52
Author:     Grzegorz Halat <ghalat@redhat.com>
AuthorDate: Fri, 28 Jun 2019 14:28:13 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 25 Jul 2019 15:35:08 +0200

x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails

A reboot request sends an IPI via the reboot vector and waits for all other
CPUs to stop. If one or more CPUs are in critical regions with interrupts
disabled then the IPI is not handled on those CPUs and the shutdown hangs
if native_stop_other_cpus() is called with the wait argument set.

Such a situation can happen when one CPU was stopped within a lock held
section and another CPU is trying to acquire that lock with interrupts
disabled. There are other scenarios which can cause such a lockup as well.

In theory the shutdown should be attempted by an NMI IPI after the timeout
period elapsed. Though the wait loop after sending the reboot vector IPI
prevents this. It checks the wait request argument and the timeout. If wait
is set, which is true for sys_reboot() then it won't fall through to the
NMI shutdown method after the timeout period has finished.

This was an oversight when the NMI shutdown mechanism was added to handle
the 'reboot IPI is not working' situation. The mechanism was added to deal
with stuck panic shutdowns, which do not have the wait request set, so the
'wait request' case was probably not considered.

Remove the wait check from the post reboot vector IPI wait loop and enforce
that the wait loop in the NMI fallback path is invoked even if NMI IPIs are
disabled or the registration of the NMI handler fails. That second wait
loop will then hang if not all CPUs shutdown and the wait argument is set.

[ tglx: Avoid the hard to parse line break in the NMI fallback path,
  	add comments and massage the changelog ]

Fixes: 7d007d21e539 ("x86/reboot: Use NMI to assist in shutting down if IRQ fails")
Signed-off-by: Grzegorz Halat <ghalat@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Don Zickus <dzickus@redhat.com>
Link: https://lkml.kernel.org/r/20190628122813.15500-1-ghalat@redhat.com
---
 arch/x86/kernel/smp.c | 46 +++++++++++++++++++++++++++-------------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 96421f97e75c..231fa230ebc7 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -179,6 +179,12 @@ asmlinkage __visible void smp_reboot_interrupt(void)
 	irq_exit();
 }
 
+static int register_stop_handler(void)
+{
+	return register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
+				    NMI_FLAG_FIRST, "smp_stop");
+}
+
 static void native_stop_other_cpus(int wait)
 {
 	unsigned long flags;
@@ -212,39 +218,41 @@ static void native_stop_other_cpus(int wait)
 		apic->send_IPI_allbutself(REBOOT_VECTOR);
 
 		/*
-		 * Don't wait longer than a second if the caller
-		 * didn't ask us to wait.
+		 * Don't wait longer than a second for IPI completion. The
+		 * wait request is not checked here because that would
+		 * prevent an NMI shutdown attempt in case that not all
+		 * CPUs reach shutdown state.
 		 */
 		timeout = USEC_PER_SEC;
-		while (num_online_cpus() > 1 && (wait || timeout--))
+		while (num_online_cpus() > 1 && timeout--)
 			udelay(1);
 	}
-	
-	/* if the REBOOT_VECTOR didn't work, try with the NMI */
-	if ((num_online_cpus() > 1) && (!smp_no_nmi_ipi))  {
-		if (register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
-					 NMI_FLAG_FIRST, "smp_stop"))
-			/* Note: we ignore failures here */
-			/* Hope the REBOOT_IRQ is good enough */
-			goto finish;
-
-		/* sync above data before sending IRQ */
-		wmb();
 
-		pr_emerg("Shutting down cpus with NMI\n");
+	/* if the REBOOT_VECTOR didn't work, try with the NMI */
+	if (num_online_cpus() > 1) {
+		/*
+		 * If NMI IPI is enabled, try to register the stop handler
+		 * and send the IPI. In any case try to wait for the other
+		 * CPUs to stop.
+		 */
+		if (!smp_no_nmi_ipi && !register_stop_handler()) {
+			/* Sync above data before sending IRQ */
+			wmb();
 
-		apic->send_IPI_allbutself(NMI_VECTOR);
+			pr_emerg("Shutting down cpus with NMI\n");
 
+			apic->send_IPI_allbutself(NMI_VECTOR);
+		}
 		/*
-		 * Don't wait longer than a 10 ms if the caller
-		 * didn't ask us to wait.
+		 * Don't wait longer than 10 ms if the caller didn't
+		 * reqeust it. If wait is true, the machine hangs here if
+		 * one or more CPUs do not reach shutdown state.
 		 */
 		timeout = USEC_PER_MSEC * 10;
 		while (num_online_cpus() > 1 && (wait || timeout--))
 			udelay(1);
 	}
 
-finish:
 	local_irq_save(flags);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [tip:x86/apic] x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails
  2019-06-28 12:28 [PATCH RFC] x86/reboot: Don't wait infinitely for IPI completion during reboot Grzegorz Halat
  2019-07-25 13:40 ` [tip:x86/core] x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails tip-bot for Grzegorz Halat
@ 2019-07-25 14:19 ` tip-bot for Grzegorz Halat
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot for Grzegorz Halat @ 2019-07-25 14:19 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: mingo, ghalat, linux-kernel, hpa, dzickus, tglx

Commit-ID:  747d5a1bf293dcb33af755a6d285d41b8c1ea010
Gitweb:     https://git.kernel.org/tip/747d5a1bf293dcb33af755a6d285d41b8c1ea010
Author:     Grzegorz Halat <ghalat@redhat.com>
AuthorDate: Fri, 28 Jun 2019 14:28:13 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 25 Jul 2019 16:09:24 +0200

x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails

A reboot request sends an IPI via the reboot vector and waits for all other
CPUs to stop. If one or more CPUs are in critical regions with interrupts
disabled then the IPI is not handled on those CPUs and the shutdown hangs
if native_stop_other_cpus() is called with the wait argument set.

Such a situation can happen when one CPU was stopped within a lock held
section and another CPU is trying to acquire that lock with interrupts
disabled. There are other scenarios which can cause such a lockup as well.

In theory the shutdown should be attempted by an NMI IPI after the timeout
period elapsed. Though the wait loop after sending the reboot vector IPI
prevents this. It checks the wait request argument and the timeout. If wait
is set, which is true for sys_reboot() then it won't fall through to the
NMI shutdown method after the timeout period has finished.

This was an oversight when the NMI shutdown mechanism was added to handle
the 'reboot IPI is not working' situation. The mechanism was added to deal
with stuck panic shutdowns, which do not have the wait request set, so the
'wait request' case was probably not considered.

Remove the wait check from the post reboot vector IPI wait loop and enforce
that the wait loop in the NMI fallback path is invoked even if NMI IPIs are
disabled or the registration of the NMI handler fails. That second wait
loop will then hang if not all CPUs shutdown and the wait argument is set.

[ tglx: Avoid the hard to parse line break in the NMI fallback path,
  	add comments and massage the changelog ]

Fixes: 7d007d21e539 ("x86/reboot: Use NMI to assist in shutting down if IRQ fails")
Signed-off-by: Grzegorz Halat <ghalat@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Don Zickus <dzickus@redhat.com>
Link: https://lkml.kernel.org/r/20190628122813.15500-1-ghalat@redhat.com
---
 arch/x86/kernel/smp.c | 46 +++++++++++++++++++++++++++-------------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 96421f97e75c..231fa230ebc7 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -179,6 +179,12 @@ asmlinkage __visible void smp_reboot_interrupt(void)
 	irq_exit();
 }
 
+static int register_stop_handler(void)
+{
+	return register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
+				    NMI_FLAG_FIRST, "smp_stop");
+}
+
 static void native_stop_other_cpus(int wait)
 {
 	unsigned long flags;
@@ -212,39 +218,41 @@ static void native_stop_other_cpus(int wait)
 		apic->send_IPI_allbutself(REBOOT_VECTOR);
 
 		/*
-		 * Don't wait longer than a second if the caller
-		 * didn't ask us to wait.
+		 * Don't wait longer than a second for IPI completion. The
+		 * wait request is not checked here because that would
+		 * prevent an NMI shutdown attempt in case that not all
+		 * CPUs reach shutdown state.
 		 */
 		timeout = USEC_PER_SEC;
-		while (num_online_cpus() > 1 && (wait || timeout--))
+		while (num_online_cpus() > 1 && timeout--)
 			udelay(1);
 	}
-	
-	/* if the REBOOT_VECTOR didn't work, try with the NMI */
-	if ((num_online_cpus() > 1) && (!smp_no_nmi_ipi))  {
-		if (register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
-					 NMI_FLAG_FIRST, "smp_stop"))
-			/* Note: we ignore failures here */
-			/* Hope the REBOOT_IRQ is good enough */
-			goto finish;
-
-		/* sync above data before sending IRQ */
-		wmb();
 
-		pr_emerg("Shutting down cpus with NMI\n");
+	/* if the REBOOT_VECTOR didn't work, try with the NMI */
+	if (num_online_cpus() > 1) {
+		/*
+		 * If NMI IPI is enabled, try to register the stop handler
+		 * and send the IPI. In any case try to wait for the other
+		 * CPUs to stop.
+		 */
+		if (!smp_no_nmi_ipi && !register_stop_handler()) {
+			/* Sync above data before sending IRQ */
+			wmb();
 
-		apic->send_IPI_allbutself(NMI_VECTOR);
+			pr_emerg("Shutting down cpus with NMI\n");
 
+			apic->send_IPI_allbutself(NMI_VECTOR);
+		}
 		/*
-		 * Don't wait longer than a 10 ms if the caller
-		 * didn't ask us to wait.
+		 * Don't wait longer than 10 ms if the caller didn't
+		 * reqeust it. If wait is true, the machine hangs here if
+		 * one or more CPUs do not reach shutdown state.
 		 */
 		timeout = USEC_PER_MSEC * 10;
 		while (num_online_cpus() > 1 && (wait || timeout--))
 			udelay(1);
 	}
 
-finish:
 	local_irq_save(flags);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-07-25 14:19 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-28 12:28 [PATCH RFC] x86/reboot: Don't wait infinitely for IPI completion during reboot Grzegorz Halat
2019-07-25 13:40 ` [tip:x86/core] x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails tip-bot for Grzegorz Halat
2019-07-25 14:19 ` [tip:x86/apic] " tip-bot for Grzegorz Halat

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).