linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] Introduce KGDB_DEBUG_SPINLOCKS
@ 2020-05-22 14:55 Daniel Thompson
  2020-05-22 14:55 ` [RFC PATCH 1/2] debug: Convert dbg_slave_lock to an atomic Daniel Thompson
  2020-05-22 14:55 ` [RFC PATCH 2/2] locking/spinlock/debug: Add checks for kgdb trap safety Daniel Thompson
  0 siblings, 2 replies; 5+ messages in thread
From: Daniel Thompson @ 2020-05-22 14:55 UTC (permalink / raw)
  To: sumit.garg, jason.wessel, dianders
  Cc: Daniel Thompson, kgdb-bugreport, linux-kernel, patches, pmladek,
	sergey.senozhatsky, Peter Zijlstra, Ingo Molnar, Will Deacon

The execution context for kgdb/kdb is pretty much unique. We are running
a debug trap handler with all CPUs parked in a holding loop and with
interrupts disabled. At least one CPU is in an unknowable execution
state (could be NMI, IRQ, irqs disabled, etc) and the others are either
servicing an IRQ or NMI depending on architecture.

Breakpoints (including some implicit breakpoints when serious errors
are detected) can happen on more or less any context, including when we
own important spin locks.

As such spin lock waits should never happen whilst we are executing the
kgdb trap handler used except, occasionally, via an explicit command
from a (forewarned?) local operator.

Currently kdb doesn't meet this criteria (although I think kgdb does)
so I started thinking about what tooling we could employ to reinforce
code review and bring problems to the surface.

The result is a patch that extends DEBUG_SPINLOCKS and checks whether
the execution context is safe. The "except via an explicit command"
aspect (mentioned above) convinced me to make the checks conditional
on KGDB_DEBUG_SPINLOCKS.

Daniel Thompson (2):
  debug: Convert dbg_slave_lock to an atomic
  locking/spinlock/debug: Add checks for kgdb trap safety

 include/linux/kgdb.h            | 16 ++++++++++++++++
 kernel/debug/debug_core.c       |  8 ++++----
 kernel/locking/spinlock_debug.c |  4 ++++
 lib/Kconfig.kgdb                | 11 +++++++++++
 4 files changed, 35 insertions(+), 4 deletions(-)


base-commit: 6a8b55ed4056ea5559ebe4f6a4b247f627870d4c
--
2.25.4


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC PATCH 1/2] debug: Convert dbg_slave_lock to an atomic
  2020-05-22 14:55 [RFC PATCH 0/2] Introduce KGDB_DEBUG_SPINLOCKS Daniel Thompson
@ 2020-05-22 14:55 ` Daniel Thompson
  2020-05-22 16:26   ` Peter Zijlstra
  2020-05-22 14:55 ` [RFC PATCH 2/2] locking/spinlock/debug: Add checks for kgdb trap safety Daniel Thompson
  1 sibling, 1 reply; 5+ messages in thread
From: Daniel Thompson @ 2020-05-22 14:55 UTC (permalink / raw)
  To: sumit.garg, jason.wessel, dianders
  Cc: Daniel Thompson, kgdb-bugreport, linux-kernel, patches, pmladek,
	sergey.senozhatsky, Peter Zijlstra, Ingo Molnar, Will Deacon

Currently the debug core takes and releases dbg_slave_lock, knowing it to
be uncontended, as a means to set and clear a flag that it uses to herd
the other cores in to or out of a holding pen.

Let's convert this to a regular atomic instead. This change is worthwhile
simply for the subtle increase in clarity in a very tangled bit of code.
It is also useful because it removes a raw_spin_lock() from within the
debug trap, which will make it easier to introduce spin lock debugging
code for the debug trap handler.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---
 kernel/debug/debug_core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 2b7c9b67931d..8f43171ddeac 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -121,12 +121,12 @@ static struct kgdb_bkpt		kgdb_break[KGDB_MAX_BREAKPOINTS] = {
 atomic_t			kgdb_active = ATOMIC_INIT(-1);
 EXPORT_SYMBOL_GPL(kgdb_active);
 static DEFINE_RAW_SPINLOCK(dbg_master_lock);
-static DEFINE_RAW_SPINLOCK(dbg_slave_lock);
 
 /*
  * We use NR_CPUs not PERCPU, in case kgdb is used to debug early
  * bootup code (which might not have percpu set up yet):
  */
+static atomic_t			slaves_must_spin;
 static atomic_t			masters_in_kgdb;
 static atomic_t			slaves_in_kgdb;
 static atomic_t			kgdb_break_tasklet_var;
@@ -615,7 +615,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 			dump_stack();
 			kgdb_info[cpu].exception_state &= ~DCPU_WANT_BT;
 		} else if (kgdb_info[cpu].exception_state & DCPU_IS_SLAVE) {
-			if (!raw_spin_is_locked(&dbg_slave_lock))
+			if (!atomic_read(&slaves_must_spin))
 				goto return_normal;
 		} else {
 return_normal:
@@ -677,7 +677,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 	 * CPU in a spin state while the debugger is active
 	 */
 	if (!kgdb_single_step)
-		raw_spin_lock(&dbg_slave_lock);
+		atomic_set(&slaves_must_spin, 1);
 
 #ifdef CONFIG_SMP
 	/* If send_ready set, slaves are already waiting */
@@ -741,7 +741,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 		dbg_io_ops->post_exception();
 
 	if (!kgdb_single_step) {
-		raw_spin_unlock(&dbg_slave_lock);
+		atomic_set(&slaves_must_spin, 0);
 		/* Wait till all the CPUs have quit from the debugger. */
 		while (kgdb_do_roundup && atomic_read(&slaves_in_kgdb))
 			cpu_relax();
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC PATCH 2/2] locking/spinlock/debug: Add checks for kgdb trap safety
  2020-05-22 14:55 [RFC PATCH 0/2] Introduce KGDB_DEBUG_SPINLOCKS Daniel Thompson
  2020-05-22 14:55 ` [RFC PATCH 1/2] debug: Convert dbg_slave_lock to an atomic Daniel Thompson
@ 2020-05-22 14:55 ` Daniel Thompson
  2020-05-22 16:35   ` Peter Zijlstra
  1 sibling, 1 reply; 5+ messages in thread
From: Daniel Thompson @ 2020-05-22 14:55 UTC (permalink / raw)
  To: sumit.garg, jason.wessel, dianders
  Cc: Daniel Thompson, kgdb-bugreport, linux-kernel, patches, pmladek,
	sergey.senozhatsky, Peter Zijlstra, Ingo Molnar, Will Deacon

In general it is not safe to call spin_lock() whilst executing in the
kgdb trap handler. The trap can be entered from all sorts of execution
context (NMI, IRQ, irqs disabled, etc) and the kgdb/kdb needs to be
as resillient as possible.

Currently it is difficult to spot mistakes in the kgdb/kdb logic
(especially so for kdb because it uses more kernel features than
pure-kgdb). Let's provide a means to bring attention to deadlock
risks in the debug code.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---
 include/linux/kgdb.h            | 16 ++++++++++++++++
 kernel/locking/spinlock_debug.c |  4 ++++
 lib/Kconfig.kgdb                | 11 +++++++++++
 3 files changed, 31 insertions(+)

diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index b072aeb1fd78..de30ce8078cf 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -332,4 +332,20 @@ extern void kgdb_panic(const char *msg);
 #define dbg_late_init()
 static inline void kgdb_panic(const char *msg) {}
 #endif /* ! CONFIG_KGDB */
+
+#ifdef CONFIG_KGDB_DEBUG_SPINLOCK
+/**
+ * check_kgdb_context_before() - Check whether to issue a spinlock warning
+ *
+ * Currently this only reports when the master processor violates the
+ * locking rules (because we are using the in_dbg_master() macro since
+ * we are confident that will avoid false positives).
+ *
+ * Return: True if we are executing in the debug trap
+ */
+static inline int check_kgdb_context_before(void) { return in_dbg_master(); }
+#else
+static inline int check_kgdb_context_before(void) { return 0; }
+#endif
+
 #endif /* _KGDB_H_ */
diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index b9d93087ee66..b49789e0fed8 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -12,6 +12,7 @@
 #include <linux/debug_locks.h>
 #include <linux/delay.h>
 #include <linux/export.h>
+#include <linux/kgdb.h>
 
 void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
 			  struct lock_class_key *key, short inner)
@@ -84,6 +85,7 @@ debug_spin_lock_before(raw_spinlock_t *lock)
 	SPIN_BUG_ON(READ_ONCE(lock->owner) == current, lock, "recursion");
 	SPIN_BUG_ON(READ_ONCE(lock->owner_cpu) == raw_smp_processor_id(),
 							lock, "cpu recursion");
+	SPIN_BUG_ON(check_kgdb_context_before(), lock, "in debug trap");
 }
 
 static inline void debug_spin_lock_after(raw_spinlock_t *lock)
@@ -174,6 +176,7 @@ int do_raw_read_trylock(rwlock_t *lock)
 void do_raw_read_unlock(rwlock_t *lock)
 {
 	RWLOCK_BUG_ON(lock->magic != RWLOCK_MAGIC, lock, "bad magic");
+	RWLOCK_BUG_ON(check_kgdb_context_before(), lock, "in debug trap");
 	arch_read_unlock(&lock->raw_lock);
 }
 
@@ -183,6 +186,7 @@ static inline void debug_write_lock_before(rwlock_t *lock)
 	RWLOCK_BUG_ON(lock->owner == current, lock, "recursion");
 	RWLOCK_BUG_ON(lock->owner_cpu == raw_smp_processor_id(),
 							lock, "cpu recursion");
+	RWLOCK_BUG_ON(check_kgdb_context_before(), lock, "in debug trap");
 }
 
 static inline void debug_write_lock_after(rwlock_t *lock)
diff --git a/lib/Kconfig.kgdb b/lib/Kconfig.kgdb
index 933680b59e2d..4d57900d6c53 100644
--- a/lib/Kconfig.kgdb
+++ b/lib/Kconfig.kgdb
@@ -29,6 +29,17 @@ config KGDB_SERIAL_CONSOLE
 	  Share a serial console with kgdb. Sysrq-g must be used
 	  to break in initially.
 
+config KGDB_DEBUG_SPINLOCK
+	bool "KGDB: Check for spin lock usage when system is halted"
+	select DEBUG_SPINLOCK
+	default n
+	help
+	  Say Y here to catch spin lock waiting when we are running
+	  in the kgdb trap handler and report it. When the trap handler
+	  is executing all other system activity is halted and spin lock
+	  contention will lead to deadlock. This makes any spin lock wait
+	  from this execution context risky.
+
 config KGDB_TESTS
 	bool "KGDB: internal test suite"
 	default n
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 1/2] debug: Convert dbg_slave_lock to an atomic
  2020-05-22 14:55 ` [RFC PATCH 1/2] debug: Convert dbg_slave_lock to an atomic Daniel Thompson
@ 2020-05-22 16:26   ` Peter Zijlstra
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2020-05-22 16:26 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: sumit.garg, jason.wessel, dianders, kgdb-bugreport, linux-kernel,
	patches, pmladek, sergey.senozhatsky, Ingo Molnar, Will Deacon

On Fri, May 22, 2020 at 03:55:09PM +0100, Daniel Thompson wrote:

> +static atomic_t			slaves_must_spin;

> +			if (!atomic_read(&slaves_must_spin))

> +		atomic_set(&slaves_must_spin, 1);

> +		atomic_set(&slaves_must_spin, 0);

There is no atomic operation there; this is pointless use of atomic_t.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 2/2] locking/spinlock/debug: Add checks for kgdb trap safety
  2020-05-22 14:55 ` [RFC PATCH 2/2] locking/spinlock/debug: Add checks for kgdb trap safety Daniel Thompson
@ 2020-05-22 16:35   ` Peter Zijlstra
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2020-05-22 16:35 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: sumit.garg, jason.wessel, dianders, kgdb-bugreport, linux-kernel,
	patches, pmladek, sergey.senozhatsky, Ingo Molnar, Will Deacon

On Fri, May 22, 2020 at 03:55:10PM +0100, Daniel Thompson wrote:
> In general it is not safe to call spin_lock() whilst executing in the
> kgdb trap handler. The trap can be entered from all sorts of execution
> context (NMI, IRQ, irqs disabled, etc) and the kgdb/kdb needs to be
> as resillient as possible.
> 
> Currently it is difficult to spot mistakes in the kgdb/kdb logic
> (especially so for kdb because it uses more kernel features than
> pure-kgdb). Let's provide a means to bring attention to deadlock
> risks in the debug code.

I really dislike this thing. Also, commit:

  f6f48e180404 ("lockdep: Teach lockdep about "USED" <- "IN-NMI" inversions")

should be able to trigger here when the kgdb traps are marked as NMI.
x86 will soon have that.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-05-22 16:35 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-22 14:55 [RFC PATCH 0/2] Introduce KGDB_DEBUG_SPINLOCKS Daniel Thompson
2020-05-22 14:55 ` [RFC PATCH 1/2] debug: Convert dbg_slave_lock to an atomic Daniel Thompson
2020-05-22 16:26   ` Peter Zijlstra
2020-05-22 14:55 ` [RFC PATCH 2/2] locking/spinlock/debug: Add checks for kgdb trap safety Daniel Thompson
2020-05-22 16:35   ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).