[RFC PATCH] panic: fix deadlock in panic()

From: Cheng Jian @ 2020-06-03 14:19 UTC
  To: linux-kernel
  Cc: cj.chengjian, chenwandun, xiexiuqi, bobo.shaobowang,
	huawei.libin, pmladek, sergey.senozhatsky, rostedt

A deadlock on logbuf_lock can occur during panic():

	a) The panic CPU is running in non-NMI context.
	b) The panic CPU sends the shutdown IPI via the NMI vector.
	c) One of the CPUs brought down via the NMI vector holds logbuf_lock.
	d) The panic CPU tries to take logbuf_lock and deadlocks.

We try to re-init logbuf_lock in printk_safe_flush_on_panic() to
avoid this deadlock, but that does not work here, for two reasons:

Firstly, checking num_online_cpus() here is inappropriate. When the
CPUs are brought down via the NMI vector, the panic CPU does not wait
long for the other CPUs to stop, so num_online_cpus() may still be
greater than 1 when this problem occurs.

Secondly, printk_safe_flush_on_panic() is called after the panic
notifier callbacks, so if printk() is called in a panic notifier
callback, the deadlock still occurs. For example, if
ftrace_dump_on_oops is set, we print some debug information, which
tries to take logbuf_lock.

To avoid this deadlock, drop the num_online_cpus() check and call
printk_safe_flush_on_panic() before the panic_notifier_list callbacks,
so that the panic CPU attempts to re-init logbuf_lock.

Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
---
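For illustration only, not part of the patch: a minimal user-space
sketch of steps a) to d) and of the effect of the num_online_cpus()
check. It does not use kernel APIs. struct model_lock,
model_flush_on_panic() and model_panic_can_print() are hypothetical
names, and the value 2 for the online CPU count is an assumption that
stands for "the stopped CPU is still counted as online". The ordering
problem with the panic notifiers is not modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for logbuf_lock; not the kernel's raw_spinlock_t. */
struct model_lock {
	bool locked;
};

/*
 * Model of the lock handling in printk_safe_flush_on_panic().
 * check_online_cpus == true  models the code before this patch,
 * check_online_cpus == false models the code after it.
 * Returns true if the panic CPU can take the lock afterwards.
 */
static bool model_flush_on_panic(struct model_lock *lock, int online_cpus,
				 bool check_online_cpus)
{
	if (lock->locked) {
		if (check_online_cpus && online_cpus > 1)
			return false;	/* lock stays held by a stopped CPU */
		lock->locked = false;	/* models raw_spin_lock_init() */
	}
	return true;
}

/*
 * Steps a) to d): a CPU stopped via NMI still holds the lock and is
 * still counted as online when the panic CPU reaches the flush.
 */
static bool model_panic_can_print(bool check_online_cpus)
{
	struct model_lock lock = { .locked = true };	/* step c) */
	int online_cpus = 2;	/* assumed: stopped CPU not yet offline */

	return model_flush_on_panic(&lock, online_cpus, check_online_cpus);
}
```

In this model, model_panic_can_print(true) reports that the lock is
still held (the deadlock case), while model_panic_can_print(false)
reports that the panic CPU can take it.
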
 kernel/panic.c              | 3 +++
 kernel/printk/printk_safe.c | 3 ---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index b69ee9e76cb2..8dbcb2227b60 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -255,6 +255,9 @@ void panic(const char *fmt, ...)
 		crash_smp_send_stop();
 	}
 
+	/* Call flush even twice. It tries harder with a single online CPU */
+	printk_safe_flush_on_panic();
+
 	/*
 	 * Run any panic handlers, including those that might need to
 	 * add information to the kmsg dump output.
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index d9a659a686f3..9ebc1723e1a4 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -269,9 +269,6 @@ void printk_safe_flush_on_panic(void)
 	 * Do not risk a double release when more CPUs are up.
 	 */
 	if (raw_spin_is_locked(&logbuf_lock)) {
-		if (num_online_cpus() > 1)
-			return;
-
 		debug_locks_off();
 		raw_spin_lock_init(&logbuf_lock);
 	}
-- 
2.17.1

