* [PATCH 0/3] Fix NMI IPI timeouts
@ 2018-11-26  2:01 Nicholas Piggin
  2018-11-26  2:01 ` [PATCH 1/3] powerpc/smp: Fix NMI IPI timeout Nicholas Piggin
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Nicholas Piggin @ 2018-11-26  2:01 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Paul Mackerras, Nicholas Piggin

The first two patches fix timeouts and deadlocks when using NMI IPIs.
The xmon deadlock is eventually broken by the sender's timeout, so it
wasn't really noticed. The infinite timeout only hits when the system
is in really bad shape or the sreset handler is broken somehow (e.g.,
running a guest with no KVM test in the handler).

Patch 1 is a necessary fix. Patch 2 is a bigger change; we could live
without it, because the sender will eventually time out and break the
deadlock.

Thanks,
Nick

Nicholas Piggin (3):
  powerpc/smp: Fix NMI IPI timeout
  powerpc/smp: Fix NMI IPI xmon timeout
  powerpc/smp: __smp_send_nmi_ipi static

 arch/powerpc/kernel/smp.c | 93 +++++++++++++--------------------------
 1 file changed, 30 insertions(+), 63 deletions(-)

-- 
2.19.2



* [PATCH 1/3] powerpc/smp: Fix NMI IPI timeout
  2018-11-26  2:01 [PATCH 0/3] Fix NMI IPI timeouts Nicholas Piggin
@ 2018-11-26  2:01 ` Nicholas Piggin
  2019-02-22  9:47   ` [1/3] " Michael Ellerman
  2018-11-26  2:01 ` [PATCH 2/3] powerpc/smp: Fix NMI IPI xmon timeout Nicholas Piggin
  2018-11-26  2:01 ` [PATCH 3/3] powerpc/smp: __smp_send_nmi_ipi static Nicholas Piggin
  2 siblings, 1 reply; 5+ messages in thread
From: Nicholas Piggin @ 2018-11-26  2:01 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Paul Mackerras, Nicholas Piggin

The NMI IPI timeout logic is broken: if __smp_send_nmi_ipi() times out
on the first condition, delay_us will be zero, which sends it into the
second spin loop with no timeout, so it spins forever.
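
To illustrate, here is a minimal userspace model of the two spin loops
(the helper conditions and the safety bound are invented for the demo;
this is a sketch of the pattern, not the kernel code):

/*
 * Model of the two timeout loops in __smp_send_nmi_ipi().
 * targets_pending()/handlers_running() stand in for the real
 * conditions and never clear, simulating unresponsive CPUs.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool targets_pending(void)  { return true; }
static bool handlers_running(void) { return true; }

int main(void)
{
	uint64_t delay_us = 5;		/* caller-supplied timeout */
	unsigned int safety = 0;	/* bounds the demo only */

	while (targets_pending()) {
		if (delay_us) {
			delay_us--;
			if (!delay_us)
				break;	/* timed out, delay_us == 0 */
		}
	}

	printf("entering second loop with delay_us = %llu\n",
	       (unsigned long long)delay_us);

	while (handlers_running()) {
		if (delay_us) {		/* never true: no timeout left */
			delay_us--;
			if (!delay_us)
				break;
		}
		if (++safety == 10)	/* the real loop spins forever */
			break;
	}

	printf("second loop exited only via the demo's safety bound\n");
	return 0;
}

The fix below jumps straight to the timeout handling instead of falling
through into the second loop with delay_us already zero.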

Fixes: 5b73151fff63 ("powerpc: NMI IPI make NMI IPIs fully sychronous")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/smp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 3f15edf25a0d..137196a4248b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -519,7 +519,7 @@ int __smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us, bool
 		if (delay_us) {
 			delay_us--;
 			if (!delay_us)
-				break;
+				goto timeout;
 		}
 	}
 
@@ -530,10 +530,11 @@ int __smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us, bool
 		if (delay_us) {
 			delay_us--;
 			if (!delay_us)
-				break;
+				goto timeout;
 		}
 	}
 
+timeout:
 	if (!cpumask_empty(&nmi_ipi_pending_mask)) {
 		/* Timeout waiting for CPUs to call smp_handle_nmi_ipi */
 		ret = 0;
-- 
2.19.2



* [PATCH 2/3] powerpc/smp: Fix NMI IPI xmon timeout
  2018-11-26  2:01 [PATCH 0/3] Fix NMI IPI timeouts Nicholas Piggin
  2018-11-26  2:01 ` [PATCH 1/3] powerpc/smp: Fix NMI IPI timeout Nicholas Piggin
@ 2018-11-26  2:01 ` Nicholas Piggin
  2018-11-26  2:01 ` [PATCH 3/3] powerpc/smp: __smp_send_nmi_ipi static Nicholas Piggin
  2 siblings, 0 replies; 5+ messages in thread
From: Nicholas Piggin @ 2018-11-26  2:01 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Paul Mackerras, Nicholas Piggin

The xmon debugger IPI handler waits in the callback function while
xmon is still active. This means the targets don't complete the IPI,
and the initiator always times out waiting for them.

Things manage to work after the timeout because there is fallback
logic to keep the NMI IPI state sane in that case, but this is a bit
ugly.

This patch changes the NMI IPI back to half-asynchronous (i.e., wait
for everyone to call in, do not wait for the IPI function to complete),
but the complexity is avoided by going one step further and allowing
new IPIs to be issued before the IPI functions have all completed.

If synchronization against that is required, it is left up to the
caller, but current callers don't require it. In fact, with the
timeout handling, callers must already be able to cope with this.
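
If a caller did need to know when the handler functions themselves
have finished, it could count completions in its own callback. A rough
sketch only (the my_* names are made up for illustration, and this is
not part of the patch; smp_send_nmi_ipi() is the existing wrapper
around __smp_send_nmi_ipi()):

#include <linux/atomic.h>
#include <linux/cpumask.h>
#include <asm/processor.h>
#include <asm/ptrace.h>
#include <asm/smp.h>	/* smp_send_nmi_ipi(), NMI_IPI_ALL_OTHERS */

static atomic_t my_nmi_done = ATOMIC_INIT(0);

static void my_nmi_callback(struct pt_regs *regs)
{
	/* ... do the actual NMI work ... */
	atomic_inc(&my_nmi_done);	/* signal completion to the caller */
}

static void my_nmi_broadcast_and_wait(void)
{
	int others = num_online_cpus() - 1;

	atomic_set(&my_nmi_done, 0);

	/* Returns once all targets have entered the handler (or timed out). */
	smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, my_nmi_callback, 1000000);

	/*
	 * Caller-side wait for the handler functions to complete.
	 * A real caller would also want its own timeout here.
	 */
	while (atomic_read(&my_nmi_done) < others)
		cpu_relax();
}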

Fixes: 5b73151fff63 ("powerpc: NMI IPI make NMI IPIs fully sychronous")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/smp.c | 93 ++++++++++++---------------------------
 1 file changed, 29 insertions(+), 64 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 137196a4248b..6e521a3f67ca 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -358,13 +358,12 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask)
  * NMI IPIs may not be recoverable, so should not be used as ongoing part of
  * a running system. They can be used for crash, debug, halt/reboot, etc.
  *
- * NMI IPIs are globally single threaded. No more than one in progress at
- * any time.
- *
  * The IPI call waits with interrupts disabled until all targets enter the
- * NMI handler, then the call returns.
+ * NMI handler, then returns. Subsequent IPIs can be issued before targets
+ * have returned from their handlers, so there is no guarantee about
+ * concurrency or re-entrancy.
  *
- * No new NMI can be initiated until targets exit the handler.
+ * A new NMI can be issued before all targets exit the handler.
  *
  * The IPI call may time out without all targets entering the NMI handler.
  * In that case, there is some logic to recover (and ignore subsequent
@@ -375,7 +374,7 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask)
 
 static atomic_t __nmi_ipi_lock = ATOMIC_INIT(0);
 static struct cpumask nmi_ipi_pending_mask;
-static int nmi_ipi_busy_count = 0;
+static bool nmi_ipi_busy = false;
 static void (*nmi_ipi_function)(struct pt_regs *) = NULL;
 
 static void nmi_ipi_lock_start(unsigned long *flags)
@@ -414,7 +413,7 @@ static void nmi_ipi_unlock_end(unsigned long *flags)
  */
 int smp_handle_nmi_ipi(struct pt_regs *regs)
 {
-	void (*fn)(struct pt_regs *);
+	void (*fn)(struct pt_regs *) = NULL;
 	unsigned long flags;
 	int me = raw_smp_processor_id();
 	int ret = 0;
@@ -425,29 +424,17 @@ int smp_handle_nmi_ipi(struct pt_regs *regs)
 	 * because the caller may have timed out.
 	 */
 	nmi_ipi_lock_start(&flags);
-	if (!nmi_ipi_busy_count)
-		goto out;
-	if (!cpumask_test_cpu(me, &nmi_ipi_pending_mask))
-		goto out;
-
-	fn = nmi_ipi_function;
-	if (!fn)
-		goto out;
-
-	cpumask_clear_cpu(me, &nmi_ipi_pending_mask);
-	nmi_ipi_busy_count++;
-	nmi_ipi_unlock();
-
-	ret = 1;
-
-	fn(regs);
-
-	nmi_ipi_lock();
-	if (nmi_ipi_busy_count > 1) /* Can race with caller time-out */
-		nmi_ipi_busy_count--;
-out:
+	if (cpumask_test_cpu(me, &nmi_ipi_pending_mask)) {
+		cpumask_clear_cpu(me, &nmi_ipi_pending_mask);
+		fn = READ_ONCE(nmi_ipi_function);
+		WARN_ON_ONCE(!fn);
+		ret = 1;
+	}
 	nmi_ipi_unlock_end(&flags);
 
+	if (fn)
+		fn(regs);
+
 	return ret;
 }
 
@@ -473,7 +460,7 @@ static void do_smp_send_nmi_ipi(int cpu, bool safe)
  * - cpu is the target CPU (must not be this CPU), or NMI_IPI_ALL_OTHERS.
  * - fn is the target callback function.
  * - delay_us > 0 is the delay before giving up waiting for targets to
- *   complete executing the handler, == 0 specifies indefinite delay.
+ *   begin executing the handler, == 0 specifies indefinite delay.
  */
 int __smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us, bool safe)
 {
@@ -487,31 +474,33 @@ int __smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us, bool
 	if (unlikely(!smp_ops))
 		return 0;
 
-	/* Take the nmi_ipi_busy count/lock with interrupts hard disabled */
 	nmi_ipi_lock_start(&flags);
-	while (nmi_ipi_busy_count) {
+	while (nmi_ipi_busy) {
 		nmi_ipi_unlock_end(&flags);
-		spin_until_cond(nmi_ipi_busy_count == 0);
+		spin_until_cond(!nmi_ipi_busy);
 		nmi_ipi_lock_start(&flags);
 	}
-
+	nmi_ipi_busy = true;
 	nmi_ipi_function = fn;
 
+	WARN_ON_ONCE(!cpumask_empty(&nmi_ipi_pending_mask));
+
 	if (cpu < 0) {
 		/* ALL_OTHERS */
 		cpumask_copy(&nmi_ipi_pending_mask, cpu_online_mask);
 		cpumask_clear_cpu(me, &nmi_ipi_pending_mask);
 	} else {
-		/* cpumask starts clear */
 		cpumask_set_cpu(cpu, &nmi_ipi_pending_mask);
 	}
-	nmi_ipi_busy_count++;
+
 	nmi_ipi_unlock();
 
+	/* Interrupts remain hard disabled */
+
 	do_smp_send_nmi_ipi(cpu, safe);
 
 	nmi_ipi_lock();
-	/* nmi_ipi_busy_count is held here, so unlock/lock is okay */
+	/* nmi_ipi_busy is set here, so unlock/lock is okay */
 	while (!cpumask_empty(&nmi_ipi_pending_mask)) {
 		nmi_ipi_unlock();
 		udelay(1);
@@ -519,34 +508,19 @@ int __smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us, bool
 		if (delay_us) {
 			delay_us--;
 			if (!delay_us)
-				goto timeout;
+				break;
 		}
 	}
 
-	while (nmi_ipi_busy_count > 1) {
-		nmi_ipi_unlock();
-		udelay(1);
-		nmi_ipi_lock();
-		if (delay_us) {
-			delay_us--;
-			if (!delay_us)
-				goto timeout;
-		}
-	}
-
-timeout:
 	if (!cpumask_empty(&nmi_ipi_pending_mask)) {
 		/* Timeout waiting for CPUs to call smp_handle_nmi_ipi */
 		ret = 0;
 		cpumask_clear(&nmi_ipi_pending_mask);
 	}
-	if (nmi_ipi_busy_count > 1) {
-		/* Timeout waiting for CPUs to execute fn */
-		ret = 0;
-		nmi_ipi_busy_count = 1;
-	}
 
-	nmi_ipi_busy_count--;
+	nmi_ipi_function = NULL;
+	nmi_ipi_busy = false;
+
 	nmi_ipi_unlock_end(&flags);
 
 	return ret;
@@ -614,17 +588,8 @@ void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
 static void nmi_stop_this_cpu(struct pt_regs *regs)
 {
 	/*
-	 * This is a special case because it never returns, so the NMI IPI
-	 * handling would never mark it as done, which makes any later
-	 * smp_send_nmi_ipi() call spin forever. Mark it done now.
-	 *
 	 * IRQs are already hard disabled by the smp_handle_nmi_ipi.
 	 */
-	nmi_ipi_lock();
-	if (nmi_ipi_busy_count > 1)
-		nmi_ipi_busy_count--;
-	nmi_ipi_unlock();
-
 	spin_begin();
 	while (1)
 		spin_cpu_relax();
-- 
2.19.2



* [PATCH 3/3] powerpc/smp: __smp_send_nmi_ipi static
  2018-11-26  2:01 [PATCH 0/3] Fix NMI IPI timeouts Nicholas Piggin
  2018-11-26  2:01 ` [PATCH 1/3] powerpc/smp: Fix NMI IPI timeout Nicholas Piggin
  2018-11-26  2:01 ` [PATCH 2/3] powerpc/smp: Fix NMI IPI xmon timeout Nicholas Piggin
@ 2018-11-26  2:01 ` Nicholas Piggin
  2 siblings, 0 replies; 5+ messages in thread
From: Nicholas Piggin @ 2018-11-26  2:01 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Paul Mackerras, Nicholas Piggin

Make __smp_send_nmi_ipi() static, as it has no callers outside this
file.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/smp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 6e521a3f67ca..5366d9e7bed4 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -462,7 +462,8 @@ static void do_smp_send_nmi_ipi(int cpu, bool safe)
  * - delay_us > 0 is the delay before giving up waiting for targets to
  *   begin executing the handler, == 0 specifies indefinite delay.
  */
-int __smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us, bool safe)
+static int __smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *),
+				u64 delay_us, bool safe)
 {
 	unsigned long flags;
 	int me = raw_smp_processor_id();
-- 
2.19.2



* Re: [1/3] powerpc/smp: Fix NMI IPI timeout
  2018-11-26  2:01 ` [PATCH 1/3] powerpc/smp: Fix NMI IPI timeout Nicholas Piggin
@ 2019-02-22  9:47   ` Michael Ellerman
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Ellerman @ 2019-02-22  9:47 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Paul Mackerras, Nicholas Piggin

On Mon, 2018-11-26 at 02:01:05 UTC, Nicholas Piggin wrote:
> The NMI IPI timeout logic is broken: if __smp_send_nmi_ipi() times out
> on the first condition, delay_us will be zero, which sends it into the
> second spin loop with no timeout, so it spins forever.
> 
> Fixes: 5b73151fff63 ("powerpc: NMI IPI make NMI IPIs fully sychronous")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1b5fc84aba170bdfe3533396ca9662ce

cheers


