All of lore.kernel.org
 help / color / mirror / Atom feed
From: Grzegorz Halat <ghalat@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H . Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, Don Zickus <dzickus@redhat.com>,
	Oleksandr Natalenko <oleksandr@redhat.com>,
	Grzegorz Halat <ghalat@redhat.com>
Subject: [PATCH RFC] x86/reboot: Don't wait infinitely for IPI completion during reboot
Date: Fri, 28 Jun 2019 14:28:13 +0200	[thread overview]
Message-ID: <20190628122813.15500-1-ghalat@redhat.com> (raw)

Hi,
This patch is trying to address a deadlock that may happen during a
reboot.

CPU0 can stop a CPU which is holding a lock and there may be another CPU
with disabled interrupts waiting for the same lock. In this situation,
the CPU waiting for the lock can't be stopped and CPU0 will be
indefinitely waiting for IPI completion.

To avoid indefinitely looping kernel should send NMI after timeout.
It seems to me that this was the original intention of
commit 7d007d21e539 ("x86/reboot: Use NMI to assist in shutting down if
IRQ fails") but the removal of 'wait' in the first loop has been
forgotten.

I'm sending the patch as RFC, maybe there is a better way to handle this
situation. Another thing which wonders me is pr_emerg() after IPI.
Is this safe? I think that IPI can shut down a CPU that is holding a
lock used by pr_emerg(), so there may be a deadlock.

CPU0 is stuck in native_stop_other_cpus() waiting for shutdown of CPU2:

 #6 [ffffb1a7826b7dd0] delay_tsc at ffffffffb640f0b4
 #7 [ffffb1a7826b7dd0] native_stop_other_cpus at ffffffffb5c47ffa 
 #8 [ffffb1a7826b7df0] native_machine_shutdown at ffffffffb5c47132
 #9 [ffffb1a7826b7df8] native_machine_restart at ffffffffb5c47649
#10 [ffffb1a7826b7e00] __do_sys_reboot at ffffffffb5ccff62
#11 [ffffb1a7826b7f38] do_syscall_64 at ffffffffb5c0424b

The loop, in case of reboot() syscall wait==1:
File: linux/arch/x86/kernel/smp.c
218: 		timeout = USEC_PER_SEC;
219: 		while (num_online_cpus() > 1 && (wait || timeout--))
220: 			udelay(1);

CPU1 has been shutdown while holding tasklist_lock:

 #0 [ffff984418083fb8] stop_this_cpu at ffffffffb5c28495  << CPU stopped
 #1 [ffff984418083fc0] smp_reboot_interrupt at ffffffffb5c48169
 #2 [ffff984418083ff0] reboot_interrupt at ffffffffb6600bbf
--- <IRQ stack> ---
 #3 [ffffb1a782417be8] reboot_interrupt at ffffffffb6600bbf
    [exception RIP: delay_tsc+32]
 #4 [ffffb1a782417c98] probe_12098 at ffffffffc0813fab 
 #5 [ffffb1a782417cb0] do_wait at ffffffffb5cae5b4
 #6 [ffffb1a782417d08] optimized_callback at ffffffffb5c56be3
 #7 [ffffb1a782417d98] do_wait at ffffffffb5cae5b3 << read_lock() here
 #8 [ffffb1a782417df0] kernel_wait4 at ffffffffb5cafa4e
 #9 [ffffb1a782417e78] __do_sys_wait4 at ffffffffb5cafb73
#10 [ffffb1a782417f38] do_syscall_64 at ffffffffb5c0424b

Note: probe_12098 is SystemTap probe which injects delay.
I'm using SystemTap to reliably reproduce the issue 

CPU2 is trying to grab tasklist_lock with disabled interrupts:

RFLAGS: 00000006
 #5 [ffffb1a782397e78] queued_write_lock_slowpath at ffffffffb5d0052e
 #6 [ffffb1a782397e88] do_exit at ffffffffb5caf08e
 #7 [ffffb1a782397f08] do_group_exit at ffffffffb5caf8da
 #8 [ffffb1a782397f30] __x64_sys_exit_group at ffffffffb5caf954
 #9 [ffffb1a782397f38] do_syscall_64 at ffffffffb5c0424b

Signed-off-by: Grzegorz Halat <ghalat@redhat.com>
---
 arch/x86/kernel/smp.c | 28 ++++++++++------------------
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 4693e2f3a03e..9186e432b8d6 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -211,30 +211,23 @@ static void native_stop_other_cpus(int wait)
 
 		apic->send_IPI_allbutself(REBOOT_VECTOR);
 
-		/*
-		 * Don't wait longer than a second if the caller
-		 * didn't ask us to wait.
-		 */
+		/* Don't wait longer than a second for IPI completion */
 		timeout = USEC_PER_SEC;
-		while (num_online_cpus() > 1 && (wait || timeout--))
+		while (num_online_cpus() > 1 && timeout--)
 			udelay(1);
 	}
 	
 	/* if the REBOOT_VECTOR didn't work, try with the NMI */
-	if ((num_online_cpus() > 1) && (!smp_no_nmi_ipi))  {
-		if (register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
-					 NMI_FLAG_FIRST, "smp_stop"))
-			/* Note: we ignore failures here */
-			/* Hope the REBOOT_IRQ is good enough */
-			goto finish;
-
-		/* sync above data before sending IRQ */
-		wmb();
-
-		pr_emerg("Shutting down cpus with NMI\n");
+	if (num_online_cpus() > 1) {
+		if (!smp_no_nmi_ipi && !register_nmi_handler(NMI_LOCAL,
+			smp_stop_nmi_callback, NMI_FLAG_FIRST, "smp_stop")){
+			/* sync above data before sending IRQ */
+			wmb();
 
-		apic->send_IPI_allbutself(NMI_VECTOR);
+			pr_emerg("Shutting down cpus with NMI\n");
 
+			apic->send_IPI_allbutself(NMI_VECTOR);
+		}
 		/*
 		 * Don't wait longer than a 10 ms if the caller
 		 * didn't ask us to wait.
@@ -244,7 +237,6 @@ static void native_stop_other_cpus(int wait)
 			udelay(1);
 	}
 
-finish:
 	local_irq_save(flags);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
-- 
2.18.1


             reply	other threads:[~2019-06-28 12:29 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-28 12:28 Grzegorz Halat [this message]
2019-07-25 13:40 ` [tip:x86/core] x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails tip-bot for Grzegorz Halat
2019-07-25 14:19 ` [tip:x86/apic] " tip-bot for Grzegorz Halat

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190628122813.15500-1-ghalat@redhat.com \
    --to=ghalat@redhat.com \
    --cc=bp@alien8.de \
    --cc=dzickus@redhat.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=oleksandr@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.