* [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
@ 2020-04-01  0:00 Leonardo Bras
  2020-04-01  3:07   ` kbuild test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Leonardo Bras @ 2020-04-01  0:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Enrico Weigelt, Leonardo Bras, Alexios Zavras, Thomas Gleixner,
	Greg Kroah-Hartman, Christophe Leroy, peterz
  Cc: linuxppc-dev, linux-kernel

During a crash, there is a chance that the CPUs handling the NMI IPI
are holding a spin_lock. If one of these spin_locks is needed by
crashing_cpu, it will cause a deadlock (rtas.lock and printk's
logbuf_lock as of today).

This is a problem if the system has kdump set up: if it crashes for
any reason, the kdump may not be saved for crash analysis.

After the NMI IPI is sent to all other CPUs, force-unlock all
spinlocks needed to finish the crash routine.

Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>

---
Changes from v2:
- Instead of skipping spinlocks, unlock the needed ones.

Changes from v1:
- Exported variable
---
 arch/powerpc/kexec/crash.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
index d488311efab1..8d63fca3242c 100644
--- a/arch/powerpc/kexec/crash.c
+++ b/arch/powerpc/kexec/crash.c
@@ -24,6 +24,7 @@
 #include <asm/smp.h>
 #include <asm/setjmp.h>
 #include <asm/debug.h>
+#include <asm/rtas.h>
 
 /*
  * The primary CPU waits a while for all secondary CPUs to enter. This is to
@@ -49,6 +50,8 @@ static int time_to_dump;
  */
 int crash_wake_offline;
 
+extern raw_spinlock_t logbuf_lock;
+
 #define CRASH_HANDLER_MAX 3
 /* List of shutdown handles */
 static crash_shutdown_t crash_shutdown_handles[CRASH_HANDLER_MAX];
@@ -129,6 +132,13 @@ static void crash_kexec_prepare_cpus(int cpu)
 	/* Would it be better to replace the trap vector here? */
 
 	if (atomic_read(&cpus_in_crash) >= ncpus) {
+		/*
+		 * At this point no other CPU is running, and some of them may
+		 * have been interrupted while holding one of the locks needed
+		 * to complete crashing. Free them so there is no deadlock.
+		 */
+		arch_spin_unlock(&logbuf_lock.raw_lock);
+		arch_spin_unlock(&rtas.lock);
 		printk(KERN_EMERG "IPI complete\n");
 		return;
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
  2020-04-01  0:00 [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash Leonardo Bras
  2020-04-01  3:07   ` kbuild test robot
@ 2020-04-01  3:07   ` kbuild test robot
  2020-04-02 11:28 ` Michael Ellerman
  2 siblings, 0 replies; 28+ messages in thread
From: kbuild test robot @ 2020-04-01  3:07 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: kbuild-all, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Enrico Weigelt, Leonardo Bras, Alexios Zavras,
	Thomas Gleixner, Greg Kroah-Hartman, Christophe Leroy, peterz,
	linuxppc-dev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1455 bytes --]

Hi Leonardo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on paulus-powerpc/kvm-ppc-next linus/master linux/master v5.6 next-20200331]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Leonardo-Bras/ppc-crash-Reset-spinlocks-during-crash/20200401-091600
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-rhel-kconfig (attached as .config)
compiler: powerpc64le-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=9.3.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   powerpc64le-linux-ld: warning: orphan section `.gnu.hash' from `linker stubs' being placed in section `.gnu.hash'
>> powerpc64le-linux-ld: arch/powerpc/kexec/crash.o:(.toc+0x0): undefined reference to `rtas'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 15339 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
  2020-04-01  0:00 [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash Leonardo Bras
@ 2020-04-01  9:26   ` Peter Zijlstra
  2020-04-01  9:26   ` Peter Zijlstra
  2020-04-02 11:28 ` Michael Ellerman
  2 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2020-04-01  9:26 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Enrico Weigelt, Alexios Zavras, Thomas Gleixner,
	Greg Kroah-Hartman, Christophe Leroy, linuxppc-dev, linux-kernel

On Tue, Mar 31, 2020 at 09:00:21PM -0300, Leonardo Bras wrote:
> During a crash, there is chance that the cpus that handle the NMI IPI
> are holding a spin_lock. If this spin_lock is needed by crashing_cpu it
> will cause a deadlock. (rtas.lock and printk logbuf_lock as of today)
> 
> This is a problem if the system has kdump set up, given if it crashes
> for any reason kdump may not be saved for crash analysis.
> 
> After NMI IPI is sent to all other cpus, force unlock all spinlocks
> needed for finishing crash routine.
> 
> Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>

> @@ -129,6 +132,13 @@ static void crash_kexec_prepare_cpus(int cpu)
>  	/* Would it be better to replace the trap vector here? */
>  
>  	if (atomic_read(&cpus_in_crash) >= ncpus) {
> +		/*
> +		 * At this point no other CPU is running, and some of them may
> +		 * have been interrupted while holding one of the locks needed
> +		 * to complete crashing. Free them so there is no deadlock.
> +		 */
> +		arch_spin_unlock(&logbuf_lock.raw_lock);
> +		arch_spin_unlock(&rtas.lock);
>  		printk(KERN_EMERG "IPI complete\n");
>  		return;
>  	}

You might want to add a note to your asm/spinlock.h that you rely on
spin_unlock() unconditionally clearing a lock.

This isn't naturally true for all lock implementations. Consider ticket
locks, doing a surplus unlock will wreck your lock state in that case.
So anybody poking at the powerpc spinlock implementation had better know
you rely on this.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
  2020-04-01  9:26   ` Peter Zijlstra
@ 2020-04-01 23:53     ` Leonardo Bras
  -1 siblings, 0 replies; 28+ messages in thread
From: Leonardo Bras @ 2020-04-01 23:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Enrico Weigelt, Alexios Zavras, Thomas Gleixner,
	Greg Kroah-Hartman, Christophe Leroy, linuxppc-dev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 524 bytes --]

Hello Peter, 

On Wed, 2020-04-01 at 11:26 +0200, Peter Zijlstra wrote:
> You might want to add a note to your asm/spinlock.h that you rely on
> spin_unlock() unconditionally clearing a lock.
> 
> This isn't naturally true for all lock implementations. Consider ticket
> locks, doing a surplus unlock will wreck your lock state in that case.
> So anybody poking at the powerpc spinlock implementation had better know
> you rely on this.

Good idea. I will add this to my changes and generate a v4.

Thank you,

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
  2020-04-01  0:00 [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash Leonardo Bras
  2020-04-01  3:07   ` kbuild test robot
  2020-04-01  9:26   ` Peter Zijlstra
@ 2020-04-02 11:28 ` Michael Ellerman
  2020-04-03  0:37   ` Leonardo Bras
  2020-04-06 18:46   ` Leonardo Bras
  2 siblings, 2 replies; 28+ messages in thread
From: Michael Ellerman @ 2020-04-02 11:28 UTC (permalink / raw)
  To: Leonardo Bras, Benjamin Herrenschmidt, Paul Mackerras,
	Enrico Weigelt, Leonardo Bras, Alexios Zavras, Thomas Gleixner,
	Greg Kroah-Hartman, Christophe Leroy, peterz
  Cc: linuxppc-dev, linux-kernel

Leonardo Bras <leonardo@linux.ibm.com> writes:
> During a crash, there is chance that the cpus that handle the NMI IPI
> are holding a spin_lock. If this spin_lock is needed by crashing_cpu it
> will cause a deadlock. (rtas.lock and printk logbuf_lock as of today)
>
> This is a problem if the system has kdump set up, given if it crashes
> for any reason kdump may not be saved for crash analysis.
>
> After NMI IPI is sent to all other cpus, force unlock all spinlocks
> needed for finishing crash routine.

I'm not convinced this is the right approach.

Busting locks is risky, it could easily cause a crash if data structures
are left in some inconsistent state.

I think we need to make this code more careful about what it's doing.
There's a clue at the top of default_machine_crash_shutdown(), which
calls crash_kexec_prepare_cpus():

	 * This function is only called after the system
	 * has panicked or is otherwise in a critical state.
	 * The minimum amount of code to allow a kexec'd kernel
	 * to run successfully needs to happen here.


You said the "IPI complete" message was the cause of one lockup:

  #0  arch_spin_lock 
  #1  do_raw_spin_lock 
  #2  __raw_spin_lock 
  #3  _raw_spin_lock 
  #4  vprintk_emit 
  #5  vprintk_func
  #7  crash_kexec_prepare_cpus 
  #8  default_machine_crash_shutdown
  #9  machine_crash_shutdown 
  #10 __crash_kexec
  #11 crash_kexec
  #12 oops_end

TBH I think we could just drop that printk() entirely.

Or we could tell printk() that we're in NMI context so that it uses the
percpu buffers.

We should probably do the latter anyway, in case there's any other code
we call that inadvertently calls printk().


The RTAS trace you sent was:

  #0 arch_spin_lock
  #1  lock_rtas () 
  #2  rtas_call (token=8204, nargs=1, nret=1, outputs=0x0)
  #3  ics_rtas_mask_real_irq (hw_irq=4100) 
  #4  machine_kexec_mask_interrupts
  #5  default_machine_crash_shutdown
  #6  machine_crash_shutdown 
  #7  __crash_kexec
  #8  crash_kexec
  #9  oops_end


Which doesn't make it clear who holds the RTAS lock. We really shouldn't
be crashing while holding the RTAS lock, but I guess it could happen.
Can you get a full backtrace?


PAPR says we are not allowed to have multiple CPUs calling RTAS at once,
except for a very small list of RTAS calls. So if we bust the RTAS lock
there's a risk we violate that part of PAPR and crash even harder.

Also it's not specific to kdump, we can't even get through a normal
reboot if we crash with the RTAS lock held.

Anyway here's a patch with some ideas. That allows me to get from a
crash with the RTAS lock held through kdump into the 2nd kernel. But it
only works if it's the crashing CPU that holds the RTAS lock.

cheers

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index c5fa251b8950..44ce74966d60 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -25,6 +25,7 @@
 #include <linux/reboot.h>
 #include <linux/syscalls.h>
 
+#include <asm/debugfs.h>
 #include <asm/prom.h>
 #include <asm/rtas.h>
 #include <asm/hvcall.h>
@@ -65,6 +66,8 @@ unsigned long rtas_rmo_buf;
 void (*rtas_flash_term_hook)(int);
 EXPORT_SYMBOL(rtas_flash_term_hook);
 
+static int rtas_lock_holder = -1;
+
 /* RTAS use home made raw locking instead of spin_lock_irqsave
  * because those can be called from within really nasty contexts
  * such as having the timebase stopped which would lockup with
@@ -76,7 +79,20 @@ static unsigned long lock_rtas(void)
 
 	local_irq_save(flags);
 	preempt_disable();
-	arch_spin_lock(&rtas.lock);
+
+	if (!arch_spin_trylock(&rtas.lock)) {
+		// Couldn't get the lock, do we already hold it?
+		if (rtas_lock_holder == smp_processor_id())
+			// Yes, so we would have deadlocked on ourself. Assume
+			// we're crashing and continue on hopefully ...
+			return flags;
+
+		// No, wait on the lock
+		arch_spin_lock(&rtas.lock);
+	}
+
+	rtas_lock_holder = smp_processor_id();
+
 	return flags;
 }
 
@@ -85,6 +101,8 @@ static void unlock_rtas(unsigned long flags)
 	arch_spin_unlock(&rtas.lock);
 	local_irq_restore(flags);
 	preempt_enable();
+
+	rtas_lock_holder = -1;
 }
 
 /*
@@ -1263,3 +1281,24 @@ void rtas_take_timebase(void)
 	timebase = 0;
 	arch_spin_unlock(&timebase_lock);
 }
+
+static int rtas_crash_set(void *data, u64 val)
+{
+	printk("%s: Taking RTAS lock and then crashing ...\n", __func__);
+	lock_rtas();
+
+	*((volatile int *) 0) = 0;
+
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(fops_rtas_crash, NULL, rtas_crash_set, "%llu\n");
+
+static __init int rtas_crash_debugfs_init(void)
+{
+	debugfs_create_file_unsafe("crash_in_rtas", 0200,
+				   powerpc_debugfs_root, NULL,
+				   &fops_rtas_crash);
+	return 0;
+}
+device_initcall(rtas_crash_debugfs_init);
diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
index d488311efab1..4c52cb58e889 100644
--- a/arch/powerpc/kexec/crash.c
+++ b/arch/powerpc/kexec/crash.c
@@ -15,6 +15,7 @@
 #include <linux/crash_dump.h>
 #include <linux/delay.h>
 #include <linux/irq.h>
+#include <linux/printk.h>
 #include <linux/types.h>
 
 #include <asm/processor.h>
@@ -311,6 +312,8 @@ void default_machine_crash_shutdown(struct pt_regs *regs)
 	unsigned int i;
 	int (*old_handler)(struct pt_regs *regs);
 
+	printk_nmi_enter();
+
 	/*
 	 * This function is only called after the system
 	 * has panicked or is otherwise in a critical state.



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
  2020-04-02 11:28 ` Michael Ellerman
@ 2020-04-03  0:37   ` Leonardo Bras
  2020-04-03  6:41       ` Nicholas Piggin
  2020-04-06 18:46   ` Leonardo Bras
  1 sibling, 1 reply; 28+ messages in thread
From: Leonardo Bras @ 2020-04-03  0:37 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Enrico Weigelt, Alexios Zavras, Thomas Gleixner,
	Greg Kroah-Hartman, Christophe Leroy, peterz
  Cc: linuxppc-dev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 7133 bytes --]

On Thu, 2020-04-02 at 22:28 +1100, Michael Ellerman wrote:
> Leonardo Bras <leonardo@linux.ibm.com> writes:
> > During a crash, there is chance that the cpus that handle the NMI IPI
> > are holding a spin_lock. If this spin_lock is needed by crashing_cpu it
> > will cause a deadlock. (rtas.lock and printk logbuf_lock as of today)
> > 
> > This is a problem if the system has kdump set up, given if it crashes
> > for any reason kdump may not be saved for crash analysis.
> > 
> > After NMI IPI is sent to all other cpus, force unlock all spinlocks
> > needed for finishing crash routine.
> 
> I'm not convinced this is the right approach.

Me neither. I think it's a very hacky solution, but I couldn't think of
anything better at the time.

> Busting locks is risky, it could easily cause a crash if data structures
> are left in some inconsistent state.
> 
> I think we need to make this code more careful about what it's doing.
> There's a clue at the top of default_machine_crash_shutdown(), which
> calls crash_kexec_prepare_cpus():
> 
> 	 * This function is only called after the system
> 	 * has panicked or is otherwise in a critical state.
> 	 * The minimum amount of code to allow a kexec'd kernel
> 	 * to run successfully needs to happen here.
> 
> 
> You said the "IPI complete" message was the cause of one lockup:
> 
>   #0  arch_spin_lock 
>   #1  do_raw_spin_lock 
>   #2  __raw_spin_lock 
>   #3  _raw_spin_lock 
>   #4  vprintk_emit 
>   #5  vprintk_func
>   #7  crash_kexec_prepare_cpus 
>   #8  default_machine_crash_shutdown
>   #9  machine_crash_shutdown 
>   #10 __crash_kexec
>   #11 crash_kexec
>   #12 oops_end
> 
> TBH I think we could just drop that printk() entirely.
> 
> Or we could tell printk() that we're in NMI context so that it uses the
> percpu buffers.
> 
> We should probably do the latter anyway, in case there's any other code
> we call that inadvertently calls printk().
> 

I was not aware of using per-cpu buffers in printk. It may be a nice
solution.

There is another printk call there:
printk("kexec: Starting switchover sequence.\n");
in default_machine_kexec().

Two printk and one rtas call: it's all I could see using a spinlock
after IPI Complete.

> 
> The RTAS trace you sent was:
> 
>   #0 arch_spin_lock
>   #1  lock_rtas () 
>   #2  rtas_call (token=8204, nargs=1, nret=1, outputs=0x0)
>   #3  ics_rtas_mask_real_irq (hw_irq=4100) 
>   #4  machine_kexec_mask_interrupts
>   #5  default_machine_crash_shutdown
>   #6  machine_crash_shutdown 
>   #7  __crash_kexec
>   #8  crash_kexec
>   #9  oops_end
> 
> 
> Which doesn't make it clear who holds the RTAS lock. We really shouldn't
> be crashing while holding the RTAS lock, but I guess it could happen.
> Can you get a full backtrace?
> 

Oh, all traces are from the thread that called the crash, by writing
'c' to sysrq. That is what I am using to reproduce.

#10 bad_page_fault
#11 handle_page_fault
#12 __handle_sysrq (key=99, check_mask=false) 
#13 write_sysrq_trigger 
#14 proc_reg_write
#15 __vfs_write
#16 vfs_write
#17 SYSC_write
#18 SyS_write
#19 system_call

> 
> PAPR says we are not allowed to have multiple CPUs calling RTAS at once,
> except for a very small list of RTAS calls. So if we bust the RTAS lock
> there's a risk we violate that part of PAPR and crash even harder.

Interesting, I was not aware.

> 
> Also it's not specific to kdump, we can't even get through a normal
> reboot if we crash with the RTAS lock held.
> 
> Anyway here's a patch with some ideas. That allows me to get from a
> crash with the RTAS lock held through kdump into the 2nd kernel. But it
> only works if it's the crashing CPU that holds the RTAS lock.
> 

Nice idea. 
But my test environment is just triggering a crash from sysrq, so I
think it would not improve the result, given that this thread is
probably not holding the lock by the time.

I noticed that when rtas is locked, irqs and preemption are also
disabled.

Should the IPI send by crash be able to interrupt a thread with
disabled irqs?

Best regards,
Leonardo Bras


> cheers
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index c5fa251b8950..44ce74966d60 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -25,6 +25,7 @@
>  #include <linux/reboot.h>
>  #include <linux/syscalls.h>
>  
> +#include <asm/debugfs.h>
>  #include <asm/prom.h>
>  #include <asm/rtas.h>
>  #include <asm/hvcall.h>
> @@ -65,6 +66,8 @@ unsigned long rtas_rmo_buf;
>  void (*rtas_flash_term_hook)(int);
>  EXPORT_SYMBOL(rtas_flash_term_hook);
>  
> +static int rtas_lock_holder = -1;
> +
>  /* RTAS use home made raw locking instead of spin_lock_irqsave
>   * because those can be called from within really nasty contexts
>   * such as having the timebase stopped which would lockup with
> @@ -76,7 +79,20 @@ static unsigned long lock_rtas(void)
>  
>  	local_irq_save(flags);
>  	preempt_disable();
> -	arch_spin_lock(&rtas.lock);
> +
> +	if (!arch_spin_trylock(&rtas.lock)) {
> +		// Couldn't get the lock, do we already hold it?
> +		if (rtas_lock_holder == smp_processor_id())
> +			// Yes, so we would have deadlocked on ourself. Assume
> +			// we're crashing and continue on hopefully ...
> +			return flags;
> +
> +		// No, wait on the lock
> +		arch_spin_lock(&rtas.lock);
> +	}
> +
> +	rtas_lock_holder = smp_processor_id();
> +
>  	return flags;
>  }
>  
> @@ -85,6 +101,8 @@ static void unlock_rtas(unsigned long flags)
>  	arch_spin_unlock(&rtas.lock);
>  	local_irq_restore(flags);
>  	preempt_enable();
> +
> +	rtas_lock_holder = -1;
>  }
>  
>  /*
> @@ -1263,3 +1281,24 @@ void rtas_take_timebase(void)
>  	timebase = 0;
>  	arch_spin_unlock(&timebase_lock);
>  }
> +
> +static int rtas_crash_set(void *data, u64 val)
> +{
> +	printk("%s: Taking RTAS lock and then crashing ...\n", __func__);
> +	lock_rtas();
> +
> +	*((volatile int *) 0) = 0;
> +
> +	return 0;
> +}
> +
> +DEFINE_DEBUGFS_ATTRIBUTE(fops_rtas_crash, NULL, rtas_crash_set, "%llu\n");
> +
> +static __init int rtas_crash_debugfs_init(void)
> +{
> +	debugfs_create_file_unsafe("crash_in_rtas", 0200,
> +				   powerpc_debugfs_root, NULL,
> +				   &fops_rtas_crash);
> +	return 0;
> +}
> +device_initcall(rtas_crash_debugfs_init);
> diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
> index d488311efab1..4c52cb58e889 100644
> --- a/arch/powerpc/kexec/crash.c
> +++ b/arch/powerpc/kexec/crash.c
> @@ -15,6 +15,7 @@
>  #include <linux/crash_dump.h>
>  #include <linux/delay.h>
>  #include <linux/irq.h>
> +#include <linux/printk.h>
>  #include <linux/types.h>
>  
>  #include <asm/processor.h>
> @@ -311,6 +312,8 @@ void default_machine_crash_shutdown(struct pt_regs *regs)
>  	unsigned int i;
>  	int (*old_handler)(struct pt_regs *regs);
>  
> +	printk_nmi_enter();
> +
>  	/*
>  	 * This function is only called after the system
>  	 * has panicked or is otherwise in a critical state.
> 
> 

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
  2020-04-03  0:37   ` Leonardo Bras
@ 2020-04-03  6:41       ` Nicholas Piggin
  0 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2020-04-03  6:41 UTC (permalink / raw)
  To: Alexios Zavras, Benjamin Herrenschmidt, Christophe Leroy,
	Greg Kroah-Hartman,
	Enrico
	 Weigelt, Leonardo Bras, Michael Ellerman, Paul Mackerras,
	peterz,
	Thomas
	 Gleixner
  Cc: linux-kernel, linuxppc-dev

Leonardo Bras's on April 3, 2020 10:37 am:
> On Thu, 2020-04-02 at 22:28 +1100, Michael Ellerman wrote:
>> Leonardo Bras <leonardo@linux.ibm.com> writes:
>> > During a crash, there is chance that the cpus that handle the NMI IPI
>> > are holding a spin_lock. If this spin_lock is needed by crashing_cpu it
>> > will cause a deadlock. (rtas.lock and printk logbuf_lock as of today)
>> > 
>> > This is a problem if the system has kdump set up, given if it crashes
>> > for any reason kdump may not be saved for crash analysis.
>> > 
>> > After NMI IPI is sent to all other cpus, force unlock all spinlocks
>> > needed for finishing crash routine.
>> 
>> I'm not convinced this is the right approach.
> 
> Me neither. I think it's a very hacky solution, but I couldn't think of
> anything better at the time.
> 
>> Busting locks is risky, it could easily cause a crash if data structures
>> are left in some inconsistent state.
>> 
>> I think we need to make this code more careful about what it's doing.
>> There's a clue at the top of default_machine_crash_shutdown(), which
>> calls crash_kexec_prepare_cpus():
>> 
>> 	 * This function is only called after the system
>> 	 * has panicked or is otherwise in a critical state.
>> 	 * The minimum amount of code to allow a kexec'd kernel
>> 	 * to run successfully needs to happen here.
>> 
>> 
>> You said the "IPI complete" message was the cause of one lockup:
>> 
>>   #0  arch_spin_lock 
>>   #1  do_raw_spin_lock 
>>   #2  __raw_spin_lock 
>>   #3  _raw_spin_lock 
>>   #4  vprintk_emit 
>>   #5  vprintk_func
>>   #7  crash_kexec_prepare_cpus 
>>   #8  default_machine_crash_shutdown
>>   #9  machine_crash_shutdown 
>>   #10 __crash_kexec
>>   #11 crash_kexec
>>   #12 oops_end
>> 
>> TBH I think we could just drop that printk() entirely.
>> 
>> Or we could tell printk() that we're in NMI context so that it uses the
>> percpu buffers.
>> 
>> We should probably do the latter anyway, in case there's any other code
>> we call that inadvertently calls printk().
>> 
> 
> I was not aware of using per-cpu buffers in printk. It may be a nice
> solution.
> 
> There is another printk call there:
> printk("kexec: Starting switchover sequence.\n");
> in default_machine_kexec().
> 
> Two printk and one rtas call: it's all I could see using a spinlock
> after IPI Complete.
> 
>> 
>> The RTAS trace you sent was:
>> 
>>   #0 arch_spin_lock
>>   #1  lock_rtas () 
>>   #2  rtas_call (token=8204, nargs=1, nret=1, outputs=0x0)
>>   #3  ics_rtas_mask_real_irq (hw_irq=4100) 
>>   #4  machine_kexec_mask_interrupts
>>   #5  default_machine_crash_shutdown
>>   #6  machine_crash_shutdown 
>>   #7  __crash_kexec
>>   #8  crash_kexec
>>   #9  oops_end
>> 
>> 
>> Which doesn't make it clear who holds the RTAS lock. We really shouldn't
>> be crashing while holding the RTAS lock, but I guess it could happen.
>> Can you get a full backtrace?
>> 
> 
> Oh, all traces are from the thread that called the crash, by writing
> 'c' to sysrq. That is what I am using to reproduce.
> 
> #10 bad_page_fault
> #11 handle_page_fault
> #12 __handle_sysrq (key=99, check_mask=false) 
> #13 write_sysrq_trigger 
> #14 proc_reg_write
> #15 __vfs_write
> #16 vfs_write
> #17 SYSC_write
> #18 SyS_write
> #19 system_call
> 
>> 
>> PAPR says we are not allowed to have multiple CPUs calling RTAS at once,
>> except for a very small list of RTAS calls. So if we bust the RTAS lock
>> there's a risk we violate that part of PAPR and crash even harder.
> 
> Interesting, I was not aware.
> 
>> 
>> Also it's not specific to kdump, we can't even get through a normal
>> reboot if we crash with the RTAS lock held.
>> 
>> Anyway here's a patch with some ideas. That allows me to get from a
>> crash with the RTAS lock held through kdump into the 2nd kernel. But it
>> only works if it's the crashing CPU that holds the RTAS lock.
>> 
> 
> Nice idea. 
> But my test environment is just triggering a crash from sysrq, so I
> think it would not improve the result, given that this thread is
> probably not holding the lock at that point.

Crash paths should not take that RTAS lock, it's a massive pain. I'm 
fixing it for machine check, for other crashes I think it can be removed 
too, it just needs to be unpicked. The good thing with crashing is that 
you can reasonably *know* that you're single threaded, so you can 
usually reason through situations like above.

> I noticed that when rtas is locked, irqs and preemption are also
> disabled.
> 
> Should the IPI sent by crash be able to interrupt a thread with
> disabled irqs?

Yes. It's been a bit painful, but in the long term it means that a CPU 
which hangs with interrupts off can be debugged, and it means we can 
take it offline to crash without risking that it will be clobbering what 
we're doing.

Arguably what I should have done is try a regular IPI first, wait a few 
seconds, then NMI IPI.

A couple of problems with that. Firstly it probably avoids this issue 
you hit almost all the time, so it won't get fixed. So when we really 
need the NMI IPI in the field, it'll still be riddled with deadlocks.

Secondly, sending the IPI first in theory can be more intrusive to the 
state that we want to debug. It uses the currently running stack, paca 
save areas, etc. NMI IPI uses its own stack and save regions so it's a 
little more isolated. Maybe this is only a small advantage but I'd like 
to have it if we can.  

Thanks,
Nick


^ permalink raw reply	[flat|nested] 28+ messages in thread


* Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
  2020-04-02 11:28 ` Michael Ellerman
  2020-04-03  0:37   ` Leonardo Bras
@ 2020-04-06 18:46   ` Leonardo Bras
  1 sibling, 0 replies; 28+ messages in thread
From: Leonardo Bras @ 2020-04-06 18:46 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Enrico Weigelt, Alexios Zavras, Thomas Gleixner,
	Greg Kroah-Hartman, Christophe Leroy, peterz
  Cc: linuxppc-dev, linux-kernel


On Thu, 2020-04-02 at 22:28 +1100, Michael Ellerman wrote:
> Leonardo Bras <leonardo@linux.ibm.com> 
> TBH I think we could just drop that printk() entirely.
> 
> Or we could tell printk() that we're in NMI context so that it uses the
> percpu buffers.
> 
> We should probably do the latter anyway, in case there's any other code
> we call that inadvertently calls printk().

Done:
http://patchwork.ozlabs.org/patch/1266956/

About the RTAS call, it will take more time, because I have to
study it properly.

Thank you,


* Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
  2020-04-03  6:41       ` Nicholas Piggin
@ 2020-04-08  2:36         ` Leonardo Bras
  -1 siblings, 0 replies; 28+ messages in thread
From: Leonardo Bras @ 2020-04-08  2:36 UTC (permalink / raw)
  To: Nicholas Piggin, Alexios Zavras, Benjamin Herrenschmidt,
	Christophe Leroy, Greg Kroah-Hartman, Enrico Weigelt,
	Michael Ellerman, Paul Mackerras, peterz, Thomas Gleixner
  Cc: linux-kernel, linuxppc-dev


Hello Nick, Michael,

On Fri, 2020-04-03 at 16:41 +1000, Nicholas Piggin wrote:
[...]
> > > PAPR says we are not allowed to have multiple CPUs calling RTAS at once,
> > > except for a very small list of RTAS calls. So if we bust the RTAS lock
> > > there's a risk we violate that part of PAPR and crash even harder.
> > 
> > Interesting, I was not aware.
> > 
> > > Also it's not specific to kdump, we can't even get through a normal
> > > reboot if we crash with the RTAS lock held.
> > > 
> > > Anyway here's a patch with some ideas. That allows me to get from a
> > > crash with the RTAS lock held through kdump into the 2nd kernel. But it
> > > only works if it's the crashing CPU that holds the RTAS lock.
> > > 
> > 
> > Nice idea. 
> > But my test environment is just triggering a crash from sysrq, so I
> > think it would not improve the result, given that this thread is
> > probably not holding the lock by the time.
> 
> Crash paths should not take that RTAS lock, it's a massive pain. I'm 
> fixing it for machine check, for other crashes I think it can be removed 
> too, it just needs to be unpicked. The good thing with crashing is that 
> you can reasonably *know* that you're single threaded, so you can 
> usually reason through situations like above.
> 
> > I noticed that when rtas is locked, irqs and preemption are also
> > disabled.
> > 
> > Should the IPI send by crash be able to interrupt a thread with
> > disabled irqs?
> 
> Yes. It's been a bit painful, but in the long term it means that a CPU 
> which hangs with interrupts off can be debugged, and it means we can 
> take it offline to crash without risking that it will be clobbering what 
> we're doing.
> 
> Arguably what I should have done is try a regular IPI first, wait a few 
> seconds, then NMI IPI.
> 
> A couple of problems with that. Firstly it probably avoids this issue 
> you hit almost all the time, so it won't get fixed. So when we really 
> need the NMI IPI in the field, it'll still be riddled with deadlocks.
> 
> Secondly, sending the IPI first in theory can be more intrusive to the 
> state that we want to debug. It uses the currently running stack, paca 
> save areas, ec. NMI IPI uses its own stack and save regions so it's a 
> little more isolated. Maybe this is only a small advantage but I'd like 
> to have it if we can.  
> 
> Thanks,
> Nick


I think the printk issue is solved (sent a patch on that), now what is
missing is the rtas call spinlock.

I noticed that rtas.lock is taken in machine_kexec_mask_interrupts(),
which happens after crashing the other CPUs and getting into
real mode.

The following RTAS calls are made for each IRQ with a valid interrupt descriptor:
ibm,int-off : Reset mask bit for that interrupt
ibm,set_xive : Set XIVE priority to 0xff

From what I could understand, these RTAS calls are there to put the next
kexec kernel (the kdump kernel) in a safer environment, so I don't think
it's safe to just remove them.
 
(See commit d6c1a9081080c6c4658acf2a06d851feb2855933)

On the other hand, busting the rtas.lock could be dangerous, because
it's code we can't control.

According to LoPAR, for both of these RTAS calls, we have:

For the PowerPC External Interrupt option: The call must be reentrant
to the number of processors on the platform.
For the PowerPC External Interrupt option: The argument call buffer for
each simultaneous call must be physically unique.

Which I think means these RTAS calls can be made simultaneously.
Would that mean that busting rtas.lock for these calls is safe?

Best regards,
Leonardo Bras



* Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
  2020-04-08  2:36         ` Leonardo Bras
@ 2020-04-08 12:21           ` Michael Ellerman
  -1 siblings, 0 replies; 28+ messages in thread
From: Michael Ellerman @ 2020-04-08 12:21 UTC (permalink / raw)
  To: Leonardo Bras, Nicholas Piggin, Alexios Zavras,
	Benjamin Herrenschmidt, Christophe Leroy, Greg Kroah-Hartman,
	Enrico Weigelt, Paul Mackerras, peterz, Thomas Gleixner
  Cc: linux-kernel, linuxppc-dev

Leonardo Bras <leonardo@linux.ibm.com> writes:
> Hello Nick, Michael,
>
> On Fri, 2020-04-03 at 16:41 +1000, Nicholas Piggin wrote:
> [...]
>> > > PAPR says we are not allowed to have multiple CPUs calling RTAS at once,
>> > > except for a very small list of RTAS calls. So if we bust the RTAS lock
>> > > there's a risk we violate that part of PAPR and crash even harder.
>> > 
>> > Interesting, I was not aware.
>> > 
>> > > Also it's not specific to kdump, we can't even get through a normal
>> > > reboot if we crash with the RTAS lock held.
>> > > 
>> > > Anyway here's a patch with some ideas. That allows me to get from a
>> > > crash with the RTAS lock held through kdump into the 2nd kernel. But it
>> > > only works if it's the crashing CPU that holds the RTAS lock.
>> > > 
>> > 
>> > Nice idea. 
>> > But my test environment is just triggering a crash from sysrq, so I
>> > think it would not improve the result, given that this thread is
>> > probably not holding the lock by the time.
>> 
>> Crash paths should not take that RTAS lock, it's a massive pain. I'm 
>> fixing it for machine check, for other crashes I think it can be removed 
>> too, it just needs to be unpicked. The good thing with crashing is that 
>> you can reasonably *know* that you're single threaded, so you can 
>> usually reason through situations like above.
>> 
>> > I noticed that when rtas is locked, irqs and preemption are also
>> > disabled.
>> > 
>> > Should the IPI send by crash be able to interrupt a thread with
>> > disabled irqs?
>> 
>> Yes. It's been a bit painful, but in the long term it means that a CPU 
>> which hangs with interrupts off can be debugged, and it means we can 
>> take it offline to crash without risking that it will be clobbering what 
>> we're doing.
>> 
>> Arguably what I should have done is try a regular IPI first, wait a few 
>> seconds, then NMI IPI.
>> 
>> A couple of problems with that. Firstly it probably avoids this issue 
>> you hit almost all the time, so it won't get fixed. So when we really 
>> need the NMI IPI in the field, it'll still be riddled with deadlocks.
>> 
>> Secondly, sending the IPI first in theory can be more intrusive to the 
>> state that we want to debug. It uses the currently running stack, paca 
>> save areas, ec. NMI IPI uses its own stack and save regions so it's a 
>> little more isolated. Maybe this is only a small advantage but I'd like 
>> to have it if we can.  
>
> I think the printk issue is solved (sent a patch on that), now what is
> missing is the rtas call spinlock.
>
> I noticed that rtas.lock is taken on machine_kexec_mask_interrupts(),
> which happen after crashing the other threads and getting into
> realmode. 
>
> The following rtas are called each IRQ with valid interrupt descriptor:
> ibm,int-off : Reset mask bit for that interrupt
> ibm,set_xive : Set XIVE priority to 0xff
>
> By what I could understand, these rtas calls happen to put the next
> kexec kernel (kdump kernel) in a safer environment, so I think it's not
> safe to just remove them.

Yes.

> (See commit d6c1a9081080c6c4658acf2a06d851feb2855933)

In hindsight the person who wrote that commit was being lazy. We
*should* have made the 2nd kernel robust against the IRQ state being
messed up.

> On the other hand, busting the rtas.lock could be dangerous, because
> it's code we can't control.
>
> According with LoPAR, for both of these rtas-calls, we have:
>
> For the PowerPC External Interrupt option: The call must be reentrant
> to the number of processors on the platform.
> For the PowerPC External Interrupt option: The argument call buffer for
> each simultaneous call must be physically unique.

Oh well spotted. Where is that in the doc?

> Which I think means this rtas-calls can be done simultaneously.

I think so too. I'll read PAPR in the morning and make sure.

> Would it mean that busting the rtas.lock for these calls would be safe?

What would be better is to make those specific calls not take the global
RTAS lock to begin with.

We should be able to just allocate the rtas_args on the stack, it's only
~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
take the global lock.
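
For illustration, a minimal sketch of that approach (kernel-context
fragment, not standalone-buildable; it assumes rtas_token() and
rtas_call_unlocked() as declared in arch/powerpc/include/asm/rtas.h,
and the function name here is hypothetical):

```c
/* Sketch: mask one interrupt without taking the global rtas.lock.
 * The rtas_args buffer is on the caller's stack, so each simultaneous
 * call uses a physically unique argument buffer, as LoPAR requires for
 * the reentrant ibm,int-off / ibm,set-xive calls.
 */
static void crash_rtas_irq_off(unsigned int hw_irq)
{
	struct rtas_args args;			/* ~80 bytes, on stack */
	int token = rtas_token("ibm,int-off");

	if (token == RTAS_UNKNOWN_SERVICE)
		return;

	/* 1 input (the interrupt number), 1 output (status) */
	rtas_call_unlocked(&args, token, 1, 1, hw_irq);
	/* status, if needed, is be32_to_cpu(args.rets[0]) */
}
```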

cheers



* Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
  2020-04-08 12:21           ` Michael Ellerman
@ 2020-04-08 16:48             ` Leonardo Bras
  -1 siblings, 0 replies; 28+ messages in thread
From: Leonardo Bras @ 2020-04-08 16:48 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Alexios Zavras,
	Benjamin Herrenschmidt, Christophe Leroy, Greg Kroah-Hartman,
	Enrico Weigelt, Paul Mackerras, peterz, Thomas Gleixner
  Cc: linux-kernel, linuxppc-dev


On Wed, 2020-04-08 at 22:21 +1000, Michael Ellerman wrote:
[...]
> > On the other hand, busting the rtas.lock could be dangerous, because
> > it's code we can't control.
> > 
> > According with LoPAR, for both of these rtas-calls, we have:
> > 
> > For the PowerPC External Interrupt option: The call must be reentrant
> > to the number of processors on the platform.
> > For the PowerPC External Interrupt option: The argument call buffer for
> > each simultaneous call must be physically unique.
> 
> Oh well spotted. Where is that in the doc?

In the current LoPAR available from the OpenPOWER Foundation, it's on
page 170, under '7.3.10.2 ibm,set-xive' and '7.3.10.3 ibm,int-off'.

> > Which I think means this rtas-calls can be done simultaneously.
> 
> I think so too. I'll read PAPR in the morning and make sure.
> 
> > Would it mean that busting the rtas.lock for these calls would be safe?
> 
> What would be better is to make those specific calls not take the global
> RTAS lock to begin with.
> 
> We should be able to just allocate the rtas_args on the stack, it's only
> ~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
> take the global lock.

Good idea. I will try getting some work done on this.

Best regards,



* Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
  2020-04-08 12:21           ` Michael Ellerman
@ 2020-04-08 18:00             ` Leonardo Bras
  -1 siblings, 0 replies; 28+ messages in thread
From: Leonardo Bras @ 2020-04-08 18:00 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Alexios Zavras,
	Benjamin Herrenschmidt, Christophe Leroy, Greg Kroah-Hartman,
	Enrico Weigelt, Paul Mackerras, peterz, Thomas Gleixner
  Cc: linux-kernel, linuxppc-dev

On Wed, 2020-04-08 at 22:21 +1000, Michael Ellerman wrote:
> We should be able to just allocate the rtas_args on the stack, it's only
> ~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
> take the global lock.

At this point, would it be a problem to use kmalloc?

Best regards,

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
  2020-04-08 12:21           ` Michael Ellerman
                             ` (2 preceding siblings ...)
  (?)
@ 2020-04-08 22:55           ` Leonardo Bras
  -1 siblings, 0 replies; 28+ messages in thread
From: Leonardo Bras @ 2020-04-08 22:55 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Alexios Zavras,
	Benjamin Herrenschmidt, Christophe Leroy, Greg Kroah-Hartman,
	Enrico Weigelt, Paul Mackerras, peterz, Thomas Gleixner
  Cc: linuxppc-dev, linux-kernel

On Wed, 2020-04-08 at 22:21 +1000, Michael Ellerman wrote:
[...]
> > According to LoPAR, for both of these RTAS calls, we have:
> > 
> > For the PowerPC External Interrupt option: The call must be reentrant
> > to the number of processors on the platform.
> > For the PowerPC External Interrupt option: The argument call buffer for
> > each simultaneous call must be physically unique.
> 
> Oh well spotted. Where is that in the doc?
> 
> > Which I think means these RTAS calls can be done simultaneously.
> 
> I think so too. I'll read PAPR in the morning and make sure.
> 
> > Would it mean that busting the rtas.lock for these calls would be safe?
> 
> What would be better is to make those specific calls not take the global
> RTAS lock to begin with.
> 
> We should be able to just allocate the rtas_args on the stack, it's only
> ~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
> take the global lock.

Hello Michael,

I did the simplest possible version of this change:
http://patchwork.ozlabs.org/patch/1268371/

Where I create rtas_call_reentrant() and replace rtas_call() with it in
all the places that call "ibm,int-on", "ibm,int-off", "ibm,get-xive"
and "ibm,set-xive".

At first, I was planning to create a function that checks whether the
requested token is one of the above, and automatically chooses between
the common and reentrant versions. But that seemed like unnecessary
overhead, since the current calls are few and very straightforward.
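For the record, the discarded alternative could be sketched like this
(rtas_name_is_reentrant() is a hypothetical helper, not in the posted
patch; a real kernel version would compare rtas_token() values rather
than strings):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: report whether a call name is one of those LoPAR
 * declares reentrant, so a wrapper could pick between rtas_call() and
 * rtas_call_reentrant() automatically. */
static int rtas_name_is_reentrant(const char *name)
{
	static const char * const reentrant[] = {
		"ibm,int-on", "ibm,int-off", "ibm,get-xive", "ibm,set-xive",
	};
	unsigned int i;

	for (i = 0; i < sizeof(reentrant) / sizeof(reentrant[0]); i++)
		if (strcmp(name, reentrant[i]) == 0)
			return 1;
	return 0;
}
```

With only four call sites involved, replacing each rtas_call() directly
avoids paying this lookup on every RTAS call.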

What do you think of this approach?

Best regards,
Leonardo Bras

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
  2020-04-08 12:21           ` Michael Ellerman
@ 2020-04-09  0:27             ` Paul Mackerras
  -1 siblings, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2020-04-09  0:27 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Leonardo Bras, Nicholas Piggin, Alexios Zavras,
	Benjamin Herrenschmidt, Christophe Leroy, Greg Kroah-Hartman,
	Enrico Weigelt, peterz, Thomas Gleixner, linux-kernel,
	linuxppc-dev

On Wed, Apr 08, 2020 at 10:21:29PM +1000, Michael Ellerman wrote:
> 
> We should be able to just allocate the rtas_args on the stack, it's only
> ~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
> take the global lock.

Do we instantiate a 64-bit RTAS these days, or is it still 32-bit?
In the old days we had to make sure the RTAS argument buffer was
below the 4GB point.  If that's still necessary then perhaps putting
rtas_args inside the PACA would be the way to go.

Paul.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
  2020-04-09  0:27             ` Paul Mackerras
@ 2020-05-12  3:48               ` Leonardo Bras
  -1 siblings, 0 replies; 28+ messages in thread
From: Leonardo Bras @ 2020-05-12  3:48 UTC (permalink / raw)
  To: Paul Mackerras, Michael Ellerman
  Cc: peterz, linuxppc-dev, linux-kernel, Nicholas Piggin,
	Alexios Zavras, Greg Kroah-Hartman, Thomas Gleixner,
	Enrico Weigelt

Hello Paul, thanks for the reply!

On Thu, 2020-04-09 at 10:27 +1000, Paul Mackerras wrote:
> On Wed, Apr 08, 2020 at 10:21:29PM +1000, Michael Ellerman wrote:
> > We should be able to just allocate the rtas_args on the stack, it's only
> > ~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
> > take the global lock.
> 
> Do we instantiate a 64-bit RTAS these days, or is it still 32-bit?

According to LoPAR, we can use instantiate-rtas or instantiate-rtas-64. 
It looks like we do instantiate-rtas today (grep pointed only to
prom_instantiate_rtas()).

> In the old days we had to make sure the RTAS argument buffer was
> below the 4GB point.  If that's still necessary then perhaps putting
> rtas_args inside the PACA would be the way to go.

Yes, we still need to make sure of this. I will study the PACA further
and try to implement it that way.

Best regards,
Leonardo Bras

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
  2020-04-09  0:27             ` Paul Mackerras
@ 2020-05-12 10:42               ` Michael Ellerman
  -1 siblings, 0 replies; 28+ messages in thread
From: Michael Ellerman @ 2020-05-12 10:42 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Leonardo Bras, Nicholas Piggin, Alexios Zavras,
	Benjamin Herrenschmidt, Christophe Leroy, Greg Kroah-Hartman,
	Enrico Weigelt, peterz, Thomas Gleixner, linux-kernel,
	linuxppc-dev

Paul Mackerras <paulus@ozlabs.org> writes:
> On Wed, Apr 08, 2020 at 10:21:29PM +1000, Michael Ellerman wrote:
>> 
>> We should be able to just allocate the rtas_args on the stack, it's only
>> ~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
>> take the global lock.
>
> Do we instantiate a 64-bit RTAS these days, or is it still 32-bit?

No, yes.

> In the old days we had to make sure the RTAS argument buffer was
> below the 4GB point.

Yes you're right, that's still true.

I was thinking we were on the emergency stack, but we may not be.

> If that's still necessary then perhaps putting rtas_args inside the
> PACA would be the way to go.

Yeah I guess. Allocating a struct within the RMO for each CPU is not
that simple compared to just putting it in the paca.

cheers

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2020-05-12 10:44 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-01  0:00 [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash Leonardo Bras
2020-04-01  3:07 ` kbuild test robot
2020-04-01  3:07   ` kbuild test robot
2020-04-01  3:07   ` kbuild test robot
2020-04-01  9:26 ` Peter Zijlstra
2020-04-01  9:26   ` Peter Zijlstra
2020-04-01 23:53   ` Leonardo Bras
2020-04-01 23:53     ` Leonardo Bras
2020-04-02 11:28 ` Michael Ellerman
2020-04-03  0:37   ` Leonardo Bras
2020-04-03  6:41     ` Nicholas Piggin
2020-04-03  6:41       ` Nicholas Piggin
2020-04-08  2:36       ` Leonardo Bras
2020-04-08  2:36         ` Leonardo Bras
2020-04-08 12:21         ` Michael Ellerman
2020-04-08 12:21           ` Michael Ellerman
2020-04-08 16:48           ` Leonardo Bras
2020-04-08 16:48             ` Leonardo Bras
2020-04-08 18:00           ` Leonardo Bras
2020-04-08 18:00             ` Leonardo Bras
2020-04-08 22:55           ` Leonardo Bras
2020-04-09  0:27           ` Paul Mackerras
2020-04-09  0:27             ` Paul Mackerras
2020-05-12  3:48             ` Leonardo Bras
2020-05-12  3:48               ` Leonardo Bras
2020-05-12 10:42             ` Michael Ellerman
2020-05-12 10:42               ` Michael Ellerman
2020-04-06 18:46   ` Leonardo Bras
