All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/4] printk: reduce deadlocks during panic
@ 2022-02-01 18:57 Stephen Brennan
  2022-02-01 18:57 ` [PATCH v3 1/4] printk: Add panic_in_progress helper Stephen Brennan
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Stephen Brennan @ 2022-02-01 18:57 UTC (permalink / raw)
  To: Sergey Senozhatsky, Petr Mladek; +Cc: Steven Rostedt, linux-kernel, John Ogness

When a caller writes heavily to the kernel log (e.g. writing to
/dev/kmsg in a loop) while another panics, there's currently a high
likelihood of a deadlock (see patch 2 for the full description of this
deadlock).

The principle fix is to disable the optimistic spin once panic_cpu is
set, so the panic CPU doesn't spin waiting for a halted CPU to hand over
the console_sem.

However, this exposed us to a livelock situation, where the panic CPU
holds the console_sem, and another CPU could fill up the log buffer
faster than the consoles could drain it, preventing the panic from
progressing and halting the other CPUs. To avoid this, patch 3 adds a
mechanism to suppress printk (from non-panic-CPU) during panic, if we
reach a threshold of dropped messages.

A major goal with all of these patches is to try to decrease the
likelihood that another CPU is holding the console_sem when we halt it
in panic(). This reduces the odds of needing to break locks and
potentially encountering further deadlocks with the console drivers.

To test, I use the following script, kmsg_panic.sh:

    #!/bin/bash
    date
    # 991 chars (based on log buffer size):
    chars="$(printf 'a%.0s' {1..991})"
    while :; do
        echo $chars > /dev/kmsg
    done &
    echo c > /proc/sysrq-trigger &
    date
    exit

I defined a hang as any time the system did not reboot to a login prompt
on the serial console within 60 seconds. Here are the statistics on
hangs using this script, before and after the patch.

before:  776 hangs / 1484 trials - 52.3%
after :    0 hangs /  15k trials -  0.0%

v3:
    Some mild style changes, none of which affect testing (which has run
    continuously and is now over 15 thousand trials without a hang!)
v2:
    Each patch has minor updates from code reviews. I've re-done testing and
    updated the above statistics. Exact changes are in each patch.

Stephen Brennan (4):
  printk: Add panic_in_progress helper
  printk: disable optimistic spin during panic
  printk: Avoid livelock with heavy printk during panic
  printk: Drop console_sem during panic

 kernel/printk/printk.c | 55 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 54 insertions(+), 1 deletion(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/4] printk: Add panic_in_progress helper
  2022-02-01 18:57 [PATCH v3 0/4] printk: reduce deadlocks during panic Stephen Brennan
@ 2022-02-01 18:57 ` Stephen Brennan
  2022-02-02  1:39   ` Stephen Brennan
  2022-02-01 18:58 ` [PATCH v3 2/4] printk: disable optimistic spin during panic Stephen Brennan
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Stephen Brennan @ 2022-02-01 18:57 UTC (permalink / raw)
  To: Sergey Senozhatsky, Petr Mladek
  Cc: Steven Rostedt, linux-kernel, John Ogness, Stephen Brennan

This will be used help avoid deadlocks during panics. Although it would
be better to include this in linux/panic.h, it would require that header
to include linux/atomic.h as well. On some architectures, this results
in a circular dependency. So instead add the helper directly to
printk.c.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---

Notes:
    v3: Move the helper into printk.c due to circular include
    v2: Switch from macro to static inline function

 kernel/printk/printk.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 57b132b658e1..b33c2861a8fc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1762,6 +1762,11 @@ static DEFINE_RAW_SPINLOCK(console_owner_lock);
 static struct task_struct *console_owner;
 static bool console_waiter;
 
+static bool panic_in_progress(void)
+{
+	return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
+}
+
 /**
  * console_lock_spinning_enable - mark beginning of code where another
  *	thread might safely busy wait
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/4] printk: disable optimistic spin during panic
  2022-02-01 18:57 [PATCH v3 0/4] printk: reduce deadlocks during panic Stephen Brennan
  2022-02-01 18:57 ` [PATCH v3 1/4] printk: Add panic_in_progress helper Stephen Brennan
@ 2022-02-01 18:58 ` Stephen Brennan
  2022-02-01 18:58 ` [PATCH v3 3/4] printk: Avoid livelock with heavy printk " Stephen Brennan
  2022-02-01 18:58 ` [PATCH v3 4/4] printk: Drop console_sem " Stephen Brennan
  3 siblings, 0 replies; 11+ messages in thread
From: Stephen Brennan @ 2022-02-01 18:58 UTC (permalink / raw)
  To: Sergey Senozhatsky, Petr Mladek
  Cc: Steven Rostedt, linux-kernel, John Ogness, Stephen Brennan

A CPU executing with console lock spinning enabled might be halted
during a panic. Before the panicking CPU calls console_flush_on_panic(),
it may call console_trylock(), which attempts to optimistically spin,
deadlocking the panic CPU:

CPU 0 (panic CPU)             CPU 1
-----------------             ------
                              printk() {
                                vprintk_func() {
                                  vprintk_default() {
                                    vprintk_emit() {
                                      console_unlock() {
                                        console_lock_spinning_enable();
                                        ... printing to console ...
panic() {
  crash_smp_send_stop() {
    NMI  -------------------> HALT
  }
  atomic_notifier_call_chain() {
    printk() {
      ...
      console_trylock_spinnning() {
        // optimistic spin infinitely

This hang during panic can be induced when a kdump kernel is loaded, and
crash_kexec_post_notifiers=1 is present on the kernel command line. The
following script which concurrently writes to /dev/kmsg, and triggers a
panic, can result in this hang:

    #!/bin/bash
    date
    # 991 chars (based on log buffer size):
    chars="$(printf 'a%.0s' {1..991})"
    while :; do
        echo $chars > /dev/kmsg
    done &
    echo c > /proc/sysrq-trigger &
    date
    exit

To avoid this deadlock, ensure that console_trylock_spinning() does not
allow spinning once a panic has begun.

Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes")

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b33c2861a8fc..1b96166eea35 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1848,6 +1848,16 @@ static int console_trylock_spinning(void)
 	if (console_trylock())
 		return 1;
 
+	/*
+	 * It's unsafe to spin once a panic has begun. If we are the
+	 * panic CPU, we may have already halted the owner of the
+	 * console_sem. If we are not the panic CPU, then we should
+	 * avoid taking console_sem, so the panic CPU has a better
+	 * chance of cleanly acquiring it later.
+	 */
+	if (panic_in_progress())
+		return 0;
+
 	printk_safe_enter_irqsave(flags);
 
 	raw_spin_lock(&console_owner_lock);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 3/4] printk: Avoid livelock with heavy printk during panic
  2022-02-01 18:57 [PATCH v3 0/4] printk: reduce deadlocks during panic Stephen Brennan
  2022-02-01 18:57 ` [PATCH v3 1/4] printk: Add panic_in_progress helper Stephen Brennan
  2022-02-01 18:58 ` [PATCH v3 2/4] printk: disable optimistic spin during panic Stephen Brennan
@ 2022-02-01 18:58 ` Stephen Brennan
  2022-02-02  0:16     ` kernel test robot
  2022-02-01 18:58 ` [PATCH v3 4/4] printk: Drop console_sem " Stephen Brennan
  3 siblings, 1 reply; 11+ messages in thread
From: Stephen Brennan @ 2022-02-01 18:58 UTC (permalink / raw)
  To: Sergey Senozhatsky, Petr Mladek
  Cc: Steven Rostedt, linux-kernel, John Ogness, Stephen Brennan

During panic(), if another CPU is writing heavily the kernel log (e.g.
via /dev/kmsg), then the panic CPU may livelock writing out its messages
to the console. Note when too many messages are dropped during panic and
suppress further printk, except from the panic CPU. This could result in
some important messages being dropped. However, messages are already
being dropped, so this approach at least prevents a livelock.

Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---

Notes:
    v3: Use pr_warn_once, and don't break the message line
    v2: Add pr_warn when we suppress printk on non-panic CPU

 kernel/printk/printk.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1b96166eea35..cc7bb86f7bfe 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -93,6 +93,12 @@ EXPORT_SYMBOL_GPL(console_drivers);
  */
 int __read_mostly suppress_printk;
 
+/*
+ * During panic, heavy printk by other CPUs can delay the
+ * panic and risk deadlock on console resources.
+ */
+int __read_mostly suppress_panic_printk;
+
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map console_lock_dep_map = {
 	.name = "console_lock"
@@ -2233,6 +2239,10 @@ asmlinkage int vprintk_emit(int facility, int level,
 	if (unlikely(suppress_printk))
 		return 0;
 
+	if (unlikely(suppress_panic_printk) &&
+	    atomic_read(&panic_cpu) != raw_smp_processor_id())
+		return 0;
+
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
 		in_sched = true;
@@ -2618,6 +2628,7 @@ void console_unlock(void)
 {
 	static char ext_text[CONSOLE_EXT_LOG_MAX];
 	static char text[CONSOLE_LOG_MAX];
+	static int panic_console_dropped;
 	unsigned long flags;
 	bool do_cond_resched, retry;
 	struct printk_info info;
@@ -2672,6 +2683,10 @@ void console_unlock(void)
 		if (console_seq != r.info->seq) {
 			console_dropped += r.info->seq - console_seq;
 			console_seq = r.info->seq;
+			if (panic_in_progress() && panic_console_dropped++ > 10) {
+				suppress_panic_printk = 1;
+				pr_warn_once("Too many dropped messages. Supress messages on non-panic CPUs to prevent livelock.\n");
+			}
 		}
 
 		if (suppress_message_printing(r.info->level)) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 4/4] printk: Drop console_sem during panic
  2022-02-01 18:57 [PATCH v3 0/4] printk: reduce deadlocks during panic Stephen Brennan
                   ` (2 preceding siblings ...)
  2022-02-01 18:58 ` [PATCH v3 3/4] printk: Avoid livelock with heavy printk " Stephen Brennan
@ 2022-02-01 18:58 ` Stephen Brennan
  2022-02-04  4:04   ` Sergey Senozhatsky
  3 siblings, 1 reply; 11+ messages in thread
From: Stephen Brennan @ 2022-02-01 18:58 UTC (permalink / raw)
  To: Sergey Senozhatsky, Petr Mladek
  Cc: Steven Rostedt, linux-kernel, John Ogness, Stephen Brennan

If another CPU is in panic, we are about to be halted. Try to gracefully
abandon the console_sem, leaving it free for the panic CPU to grab.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---

Notes:
    v2: Factor check out to a helper, and check at the end of
    console_unlock() to prevent retry as well.

 kernel/printk/printk.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index cc7bb86f7bfe..35676e76482e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2598,6 +2598,25 @@ static int have_callable_console(void)
 	return 0;
 }
 
+/*
+ * Return true when this CPU should unlock console_sem without pushing all
+ * messages to the console. This reduces the chance that the console is
+ * locked when the panic CPU tries to use it.
+ */
+static bool abandon_console_lock_in_panic(void)
+{
+	if (!panic_in_progress())
+		return false;
+
+	/*
+	 * We can use raw_smp_processor_id() here because it is impossible for
+	 * the task to be migrated to the panic_cpu, or away from it. If
+	 * panic_cpu has already been set, and we're not currently executing on
+	 * that CPU, then we never will be.
+	 */
+	return atomic_read(&panic_cpu) != raw_smp_processor_id();
+}
+
 /*
  * Can we actually use the console at this time on this cpu?
  *
@@ -2746,6 +2765,10 @@ void console_unlock(void)
 		if (handover)
 			return;
 
+		/* Allow panic_cpu to take over the consoles safely */
+		if (abandon_console_lock_in_panic())
+			break;
+
 		if (do_cond_resched)
 			cond_resched();
 	}
@@ -2763,7 +2786,7 @@ void console_unlock(void)
 	 * flush, no worries.
 	 */
 	retry = prb_read_valid(prb, next_seq, NULL);
-	if (retry && console_trylock())
+	if (retry && !abandon_console_lock_in_panic() && console_trylock())
 		goto again;
 }
 EXPORT_SYMBOL(console_unlock);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 3/4] printk: Avoid livelock with heavy printk during panic
  2022-02-01 18:58 ` [PATCH v3 3/4] printk: Avoid livelock with heavy printk " Stephen Brennan
@ 2022-02-02  0:16     ` kernel test robot
  0 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2022-02-02  0:16 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: llvm, kbuild-all

Hi Stephen,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.17-rc2 next-20220201]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Stephen-Brennan/printk-reduce-deadlocks-during-panic/20220202-030025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 26291c54e111ff6ba87a164d85d4a4e134b7315c
config: hexagon-buildonly-randconfig-r006-20220130 (https://download.01.org/0day-ci/archive/20220202/202202020859.21684oMh-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 6b1e844b69f15bb7dffaf9365cd2b355d2eb7579)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f8bc56f0cf3ba4bdd9948d9743b42927f8415345
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Stephen-Brennan/printk-reduce-deadlocks-during-panic/20220202-030025
        git checkout f8bc56f0cf3ba4bdd9948d9743b42927f8415345
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash kernel/printk/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> kernel/printk/printk.c:2686:8: error: implicit declaration of function 'panic_in_progress' [-Werror,-Wimplicit-function-declaration]
                           if (panic_in_progress() && panic_console_dropped++ > 10) {
                               ^
   1 error generated.


vim +/panic_in_progress +2686 kernel/printk/printk.c

  2612	
  2613	/**
  2614	 * console_unlock - unlock the console system
  2615	 *
  2616	 * Releases the console_lock which the caller holds on the console system
  2617	 * and the console driver list.
  2618	 *
  2619	 * While the console_lock was held, console output may have been buffered
  2620	 * by printk().  If this is the case, console_unlock(); emits
  2621	 * the output prior to releasing the lock.
  2622	 *
  2623	 * If there is output waiting, we wake /dev/kmsg and syslog() users.
  2624	 *
  2625	 * console_unlock(); may be called from any context.
  2626	 */
  2627	void console_unlock(void)
  2628	{
  2629		static char ext_text[CONSOLE_EXT_LOG_MAX];
  2630		static char text[CONSOLE_LOG_MAX];
  2631		static int panic_console_dropped;
  2632		unsigned long flags;
  2633		bool do_cond_resched, retry;
  2634		struct printk_info info;
  2635		struct printk_record r;
  2636		u64 __maybe_unused next_seq;
  2637	
  2638		if (console_suspended) {
  2639			up_console_sem();
  2640			return;
  2641		}
  2642	
  2643		prb_rec_init_rd(&r, &info, text, sizeof(text));
  2644	
  2645		/*
  2646		 * Console drivers are called with interrupts disabled, so
  2647		 * @console_may_schedule should be cleared before; however, we may
  2648		 * end up dumping a lot of lines, for example, if called from
  2649		 * console registration path, and should invoke cond_resched()
  2650		 * between lines if allowable.  Not doing so can cause a very long
  2651		 * scheduling stall on a slow console leading to RCU stall and
  2652		 * softlockup warnings which exacerbate the issue with more
  2653		 * messages practically incapacitating the system.
  2654		 *
  2655		 * console_trylock() is not able to detect the preemptive
  2656		 * context reliably. Therefore the value must be stored before
  2657		 * and cleared after the "again" goto label.
  2658		 */
  2659		do_cond_resched = console_may_schedule;
  2660	again:
  2661		console_may_schedule = 0;
  2662	
  2663		/*
  2664		 * We released the console_sem lock, so we need to recheck if
  2665		 * cpu is online and (if not) is there at least one CON_ANYTIME
  2666		 * console.
  2667		 */
  2668		if (!can_use_console()) {
  2669			console_locked = 0;
  2670			up_console_sem();
  2671			return;
  2672		}
  2673	
  2674		for (;;) {
  2675			size_t ext_len = 0;
  2676			int handover;
  2677			size_t len;
  2678	
  2679	skip:
  2680			if (!prb_read_valid(prb, console_seq, &r))
  2681				break;
  2682	
  2683			if (console_seq != r.info->seq) {
  2684				console_dropped += r.info->seq - console_seq;
  2685				console_seq = r.info->seq;
> 2686				if (panic_in_progress() && panic_console_dropped++ > 10) {
  2687					suppress_panic_printk = 1;
  2688					pr_warn_once("Too many dropped messages. Supress messages on non-panic CPUs to prevent livelock.\n");
  2689				}
  2690			}
  2691	
  2692			if (suppress_message_printing(r.info->level)) {
  2693				/*
  2694				 * Skip record we have buffered and already printed
  2695				 * directly to the console when we received it, and
  2696				 * record that has level above the console loglevel.
  2697				 */
  2698				console_seq++;
  2699				goto skip;
  2700			}
  2701	
  2702			/* Output to all consoles once old messages replayed. */
  2703			if (unlikely(exclusive_console &&
  2704				     console_seq >= exclusive_console_stop_seq)) {
  2705				exclusive_console = NULL;
  2706			}
  2707	
  2708			/*
  2709			 * Handle extended console text first because later
  2710			 * record_print_text() will modify the record buffer in-place.
  2711			 */
  2712			if (nr_ext_console_drivers) {
  2713				ext_len = info_print_ext_header(ext_text,
  2714							sizeof(ext_text),
  2715							r.info);
  2716				ext_len += msg_print_ext_body(ext_text + ext_len,
  2717							sizeof(ext_text) - ext_len,
  2718							&r.text_buf[0],
  2719							r.info->text_len,
  2720							&r.info->dev_info);
  2721			}
  2722			len = record_print_text(&r,
  2723					console_msg_format & MSG_FORMAT_SYSLOG,
  2724					printk_time);
  2725			console_seq++;
  2726	
  2727			/*
  2728			 * While actively printing out messages, if another printk()
  2729			 * were to occur on another CPU, it may wait for this one to
  2730			 * finish. This task can not be preempted if there is a
  2731			 * waiter waiting to take over.
  2732			 *
  2733			 * Interrupts are disabled because the hand over to a waiter
  2734			 * must not be interrupted until the hand over is completed
  2735			 * (@console_waiter is cleared).
  2736			 */
  2737			printk_safe_enter_irqsave(flags);
  2738			console_lock_spinning_enable();
  2739	
  2740			stop_critical_timings();	/* don't trace print latency */
  2741			call_console_drivers(ext_text, ext_len, text, len);
  2742			start_critical_timings();
  2743	
  2744			handover = console_lock_spinning_disable_and_check();
  2745			printk_safe_exit_irqrestore(flags);
  2746			if (handover)
  2747				return;
  2748	
  2749			if (do_cond_resched)
  2750				cond_resched();
  2751		}
  2752	
  2753		/* Get consistent value of the next-to-be-used sequence number. */
  2754		next_seq = console_seq;
  2755	
  2756		console_locked = 0;
  2757		up_console_sem();
  2758	
  2759		/*
  2760		 * Someone could have filled up the buffer again, so re-check if there's
  2761		 * something to flush. In case we cannot trylock the console_sem again,
  2762		 * there's a new owner and the console_unlock() from them will do the
  2763		 * flush, no worries.
  2764		 */
  2765		retry = prb_read_valid(prb, next_seq, NULL);
  2766		if (retry && console_trylock())
  2767			goto again;
  2768	}
  2769	EXPORT_SYMBOL(console_unlock);
  2770	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 3/4] printk: Avoid livelock with heavy printk during panic
@ 2022-02-02  0:16     ` kernel test robot
  0 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2022-02-02  0:16 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 8011 bytes --]

Hi Stephen,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.17-rc2 next-20220201]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Stephen-Brennan/printk-reduce-deadlocks-during-panic/20220202-030025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 26291c54e111ff6ba87a164d85d4a4e134b7315c
config: hexagon-buildonly-randconfig-r006-20220130 (https://download.01.org/0day-ci/archive/20220202/202202020859.21684oMh-lkp(a)intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 6b1e844b69f15bb7dffaf9365cd2b355d2eb7579)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f8bc56f0cf3ba4bdd9948d9743b42927f8415345
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Stephen-Brennan/printk-reduce-deadlocks-during-panic/20220202-030025
        git checkout f8bc56f0cf3ba4bdd9948d9743b42927f8415345
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash kernel/printk/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> kernel/printk/printk.c:2686:8: error: implicit declaration of function 'panic_in_progress' [-Werror,-Wimplicit-function-declaration]
                           if (panic_in_progress() && panic_console_dropped++ > 10) {
                               ^
   1 error generated.


vim +/panic_in_progress +2686 kernel/printk/printk.c

  2612	
  2613	/**
  2614	 * console_unlock - unlock the console system
  2615	 *
  2616	 * Releases the console_lock which the caller holds on the console system
  2617	 * and the console driver list.
  2618	 *
  2619	 * While the console_lock was held, console output may have been buffered
  2620	 * by printk().  If this is the case, console_unlock(); emits
  2621	 * the output prior to releasing the lock.
  2622	 *
  2623	 * If there is output waiting, we wake /dev/kmsg and syslog() users.
  2624	 *
  2625	 * console_unlock(); may be called from any context.
  2626	 */
  2627	void console_unlock(void)
  2628	{
  2629		static char ext_text[CONSOLE_EXT_LOG_MAX];
  2630		static char text[CONSOLE_LOG_MAX];
  2631		static int panic_console_dropped;
  2632		unsigned long flags;
  2633		bool do_cond_resched, retry;
  2634		struct printk_info info;
  2635		struct printk_record r;
  2636		u64 __maybe_unused next_seq;
  2637	
  2638		if (console_suspended) {
  2639			up_console_sem();
  2640			return;
  2641		}
  2642	
  2643		prb_rec_init_rd(&r, &info, text, sizeof(text));
  2644	
  2645		/*
  2646		 * Console drivers are called with interrupts disabled, so
  2647		 * @console_may_schedule should be cleared before; however, we may
  2648		 * end up dumping a lot of lines, for example, if called from
  2649		 * console registration path, and should invoke cond_resched()
  2650		 * between lines if allowable.  Not doing so can cause a very long
  2651		 * scheduling stall on a slow console leading to RCU stall and
  2652		 * softlockup warnings which exacerbate the issue with more
  2653		 * messages practically incapacitating the system.
  2654		 *
  2655		 * console_trylock() is not able to detect the preemptive
  2656		 * context reliably. Therefore the value must be stored before
  2657		 * and cleared after the "again" goto label.
  2658		 */
  2659		do_cond_resched = console_may_schedule;
  2660	again:
  2661		console_may_schedule = 0;
  2662	
  2663		/*
  2664		 * We released the console_sem lock, so we need to recheck if
  2665		 * cpu is online and (if not) is there at least one CON_ANYTIME
  2666		 * console.
  2667		 */
  2668		if (!can_use_console()) {
  2669			console_locked = 0;
  2670			up_console_sem();
  2671			return;
  2672		}
  2673	
  2674		for (;;) {
  2675			size_t ext_len = 0;
  2676			int handover;
  2677			size_t len;
  2678	
  2679	skip:
  2680			if (!prb_read_valid(prb, console_seq, &r))
  2681				break;
  2682	
  2683			if (console_seq != r.info->seq) {
  2684				console_dropped += r.info->seq - console_seq;
  2685				console_seq = r.info->seq;
> 2686				if (panic_in_progress() && panic_console_dropped++ > 10) {
  2687					suppress_panic_printk = 1;
  2688					pr_warn_once("Too many dropped messages. Supress messages on non-panic CPUs to prevent livelock.\n");
  2689				}
  2690			}
  2691	
  2692			if (suppress_message_printing(r.info->level)) {
  2693				/*
  2694				 * Skip record we have buffered and already printed
  2695				 * directly to the console when we received it, and
  2696				 * record that has level above the console loglevel.
  2697				 */
  2698				console_seq++;
  2699				goto skip;
  2700			}
  2701	
  2702			/* Output to all consoles once old messages replayed. */
  2703			if (unlikely(exclusive_console &&
  2704				     console_seq >= exclusive_console_stop_seq)) {
  2705				exclusive_console = NULL;
  2706			}
  2707	
  2708			/*
  2709			 * Handle extended console text first because later
  2710			 * record_print_text() will modify the record buffer in-place.
  2711			 */
  2712			if (nr_ext_console_drivers) {
  2713				ext_len = info_print_ext_header(ext_text,
  2714							sizeof(ext_text),
  2715							r.info);
  2716				ext_len += msg_print_ext_body(ext_text + ext_len,
  2717							sizeof(ext_text) - ext_len,
  2718							&r.text_buf[0],
  2719							r.info->text_len,
  2720							&r.info->dev_info);
  2721			}
  2722			len = record_print_text(&r,
  2723					console_msg_format & MSG_FORMAT_SYSLOG,
  2724					printk_time);
  2725			console_seq++;
  2726	
  2727			/*
  2728			 * While actively printing out messages, if another printk()
  2729			 * were to occur on another CPU, it may wait for this one to
  2730			 * finish. This task can not be preempted if there is a
  2731			 * waiter waiting to take over.
  2732			 *
  2733			 * Interrupts are disabled because the hand over to a waiter
  2734			 * must not be interrupted until the hand over is completed
  2735			 * (@console_waiter is cleared).
  2736			 */
  2737			printk_safe_enter_irqsave(flags);
  2738			console_lock_spinning_enable();
  2739	
  2740			stop_critical_timings();	/* don't trace print latency */
  2741			call_console_drivers(ext_text, ext_len, text, len);
  2742			start_critical_timings();
  2743	
  2744			handover = console_lock_spinning_disable_and_check();
  2745			printk_safe_exit_irqrestore(flags);
  2746			if (handover)
  2747				return;
  2748	
  2749			if (do_cond_resched)
  2750				cond_resched();
  2751		}
  2752	
  2753		/* Get consistent value of the next-to-be-used sequence number. */
  2754		next_seq = console_seq;
  2755	
  2756		console_locked = 0;
  2757		up_console_sem();
  2758	
  2759		/*
  2760		 * Someone could have filled up the buffer again, so re-check if there's
  2761		 * something to flush. In case we cannot trylock the console_sem again,
  2762		 * there's a new owner and the console_unlock() from them will do the
  2763		 * flush, no worries.
  2764		 */
  2765		retry = prb_read_valid(prb, next_seq, NULL);
  2766		if (retry && console_trylock())
  2767			goto again;
  2768	}
  2769	EXPORT_SYMBOL(console_unlock);
  2770	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/4] printk: Add panic_in_progress helper
  2022-02-01 18:57 ` [PATCH v3 1/4] printk: Add panic_in_progress helper Stephen Brennan
@ 2022-02-02  1:39   ` Stephen Brennan
  0 siblings, 0 replies; 11+ messages in thread
From: Stephen Brennan @ 2022-02-02  1:39 UTC (permalink / raw)
  To: Sergey Senozhatsky, Petr Mladek; +Cc: Steven Rostedt, linux-kernel, John Ogness

Stephen Brennan <stephen.s.brennan@oracle.com> writes:
> This will be used help avoid deadlocks during panics. Although it would
> be better to include this in linux/panic.h, it would require that header
> to include linux/atomic.h as well. On some architectures, this results
> in a circular dependency. So instead add the helper directly to
> printk.c.
>
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> Reviewed-by: Petr Mladek <pmladek@suse.com>
> ---
>
> Notes:
>     v3: Move the helper into printk.c due to circular include
>     v2: Switch from macro to static inline function
>
>  kernel/printk/printk.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 57b132b658e1..b33c2861a8fc 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1762,6 +1762,11 @@ static DEFINE_RAW_SPINLOCK(console_owner_lock);
>  static struct task_struct *console_owner;
>  static bool console_waiter;
>  
> +static bool panic_in_progress(void)
> +{
> +	return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
> +}
> +

Unfortunately this is defined within a (very large)
#ifdef CONFIG_PRINTK section. Thanks to the kbuild bot which will have
earned the reported-by tag on this patch :D

v4 will be tomorrow. Sorry for the noise everyone!

>  /**
>   * console_lock_spinning_enable - mark beginning of code where another
>   *	thread might safely busy wait
> -- 
> 2.30.2

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 4/4] printk: Drop console_sem during panic
  2022-02-01 18:58 ` [PATCH v3 4/4] printk: Drop console_sem " Stephen Brennan
@ 2022-02-04  4:04   ` Sergey Senozhatsky
  2022-02-04 18:53     ` Stephen Brennan
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2022-02-04  4:04 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, linux-kernel,
	John Ogness

On (22/02/01 10:58), Stephen Brennan wrote:
> +/*
> + * Return true when this CPU should unlock console_sem without pushing all
> + * messages to the console. This reduces the chance that the console is
> + * locked when the panic CPU tries to use it.
> + */
> +static bool abandon_console_lock_in_panic(void)
> +{
> +	if (!panic_in_progress())
> +		return false;
> +
> +	/*
> +	 * We can use raw_smp_processor_id() here because it is impossible for
> +	 * the task to be migrated to the panic_cpu, or away from it. If
> +	 * panic_cpu has already been set, and we're not currently executing on
> +	 * that CPU, then we never will be.
> +	 */
> +	return atomic_read(&panic_cpu) != raw_smp_processor_id();
> +}
> +
>  /*
>   * Can we actually use the console at this time on this cpu?
>   *
> @@ -2746,6 +2765,10 @@ void console_unlock(void)
>  		if (handover)
>  			return;
>  
> +		/* Allow panic_cpu to take over the consoles safely */
> +		if (abandon_console_lock_in_panic())
> +			break;

Sorry, why not just `return` like in handover case?

> +
>  		if (do_cond_resched)
>  			cond_resched();
>  	}
> @@ -2763,7 +2786,7 @@ void console_unlock(void)
>  	 * flush, no worries.
>  	 */
>  	retry = prb_read_valid(prb, next_seq, NULL);
> -	if (retry && console_trylock())
> +	if (retry && !abandon_console_lock_in_panic() && console_trylock())
>  		goto again;
>  }
>  EXPORT_SYMBOL(console_unlock);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 4/4] printk: Drop console_sem during panic
  2022-02-04  4:04   ` Sergey Senozhatsky
@ 2022-02-04 18:53     ` Stephen Brennan
  2022-02-08  2:15       ` Sergey Senozhatsky
  0 siblings, 1 reply; 11+ messages in thread
From: Stephen Brennan @ 2022-02-04 18:53 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, linux-kernel,
	John Ogness

Sergey Senozhatsky <senozhatsky@chromium.org> writes:
> On (22/02/01 10:58), Stephen Brennan wrote:
>> +/*
>> + * Return true when this CPU should unlock console_sem without pushing all
>> + * messages to the console. This reduces the chance that the console is
>> + * locked when the panic CPU tries to use it.
>> + */
>> +static bool abandon_console_lock_in_panic(void)
>> +{
>> +	if (!panic_in_progress())
>> +		return false;
>> +
>> +	/*
>> +	 * We can use raw_smp_processor_id() here because it is impossible for
>> +	 * the task to be migrated to the panic_cpu, or away from it. If
>> +	 * panic_cpu has already been set, and we're not currently executing on
>> +	 * that CPU, then we never will be.
>> +	 */
>> +	return atomic_read(&panic_cpu) != raw_smp_processor_id();
>> +}
>> +
>>  /*
>>   * Can we actually use the console at this time on this cpu?
>>   *
>> @@ -2746,6 +2765,10 @@ void console_unlock(void)
>>  		if (handover)
>>  			return;
>>  
>> +		/* Allow panic_cpu to take over the consoles safely */
>> +		if (abandon_console_lock_in_panic())
>> +			break;
>
> Sorry, why not just `return` like in handover case?

We need to drop console_sem before returning, since the whole benefit
here is to increase the chance that console_sem is unlocked when the
panic_cpu halts this CPU.

in the handover case, there's another cpu waiting, and we're essentially
transferring the console_sem ownership to that cpu, so we explicitly
return and skip the unlocking portion.

Does this need some comments to clarify it?

Admittedly if I had a few more lines of context in the diff, you would
see the console unlock directly after the loop; it's a bit clearer when
you're looking at the function as whole:

2768 		/* Allow panic_cpu to take over the consoles safely */
2769 		if (abandon_console_lock_in_panic())
2770 			break;
2771 
2772 		if (do_cond_resched)
2773 			cond_resched();
2774 	}
2775 
2776 	/* Get consistent value of the next-to-be-used sequence number. */
2777 	next_seq = console_seq;
2778 
2779 	console_locked = 0;
2780 	up_console_sem();

Stephen

>
>> +
>>  		if (do_cond_resched)
>>  			cond_resched();
>>  	}
>> @@ -2763,7 +2786,7 @@ void console_unlock(void)
>>  	 * flush, no worries.
>>  	 */
>>  	retry = prb_read_valid(prb, next_seq, NULL);
>> -	if (retry && console_trylock())
>> +	if (retry && !abandon_console_lock_in_panic() && console_trylock())
>>  		goto again;
>>  }
>>  EXPORT_SYMBOL(console_unlock);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 4/4] printk: Drop console_sem during panic
  2022-02-04 18:53     ` Stephen Brennan
@ 2022-02-08  2:15       ` Sergey Senozhatsky
  0 siblings, 0 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2022-02-08  2:15 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, linux-kernel,
	John Ogness

On (22/02/04 10:53), Stephen Brennan wrote:
> Sergey Senozhatsky <senozhatsky@chromium.org> writes:
> > On (22/02/01 10:58), Stephen Brennan wrote:
> >> +/*
> >> + * Return true when this CPU should unlock console_sem without pushing all
> >> + * messages to the console. This reduces the chance that the console is
> >> + * locked when the panic CPU tries to use it.
> >> + */
> >> +static bool abandon_console_lock_in_panic(void)
> >> +{
> >> +	if (!panic_in_progress())
> >> +		return false;
> >> +
> >> +	/*
> >> +	 * We can use raw_smp_processor_id() here because it is impossible for
> >> +	 * the task to be migrated to the panic_cpu, or away from it. If
> >> +	 * panic_cpu has already been set, and we're not currently executing on
> >> +	 * that CPU, then we never will be.
> >> +	 */
> >> +	return atomic_read(&panic_cpu) != raw_smp_processor_id();
> >> +}
> >> +
> >>  /*
> >>   * Can we actually use the console at this time on this cpu?
> >>   *
> >> @@ -2746,6 +2765,10 @@ void console_unlock(void)
> >>  		if (handover)
> >>  			return;
> >>  
> >> +		/* Allow panic_cpu to take over the consoles safely */
> >> +		if (abandon_console_lock_in_panic())
> >> +			break;
> >
> > Sorry, why not just `return` like in handover case?
> 
> We need to drop console_sem before returning, since the whole benefit
> here is to increase the chance that console_sem is unlocked when the
> panic_cpu halts this CPU.

Yes, that makes sense.

> in the handover case, there's another cpu waiting, and we're essentially
> transferring the console_sem ownership to that cpu, so we explicitly
> return and skip the unlocking portion.
> 
> Does this need some comments to clarify it?

No. Everything looks good. Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-02-08  2:16 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-01 18:57 [PATCH v3 0/4] printk: reduce deadlocks during panic Stephen Brennan
2022-02-01 18:57 ` [PATCH v3 1/4] printk: Add panic_in_progress helper Stephen Brennan
2022-02-02  1:39   ` Stephen Brennan
2022-02-01 18:58 ` [PATCH v3 2/4] printk: disable optimistic spin during panic Stephen Brennan
2022-02-01 18:58 ` [PATCH v3 3/4] printk: Avoid livelock with heavy printk " Stephen Brennan
2022-02-02  0:16   ` kernel test robot
2022-02-02  0:16     ` kernel test robot
2022-02-01 18:58 ` [PATCH v3 4/4] printk: Drop console_sem " Stephen Brennan
2022-02-04  4:04   ` Sergey Senozhatsky
2022-02-04 18:53     ` Stephen Brennan
2022-02-08  2:15       ` Sergey Senozhatsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.