All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] printk: reduce deadlocks during panic
@ 2022-01-21 19:02 Stephen Brennan
  2022-01-21 19:02 ` [PATCH 1/4] panic: Add panic_in_progress helper Stephen Brennan
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Stephen Brennan @ 2022-01-21 19:02 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, John Ogness
  Cc: Stephen Brennan, Sergey Senozhatsky, linux-kernel

When a caller writes heavily to the kernel log (e.g. writing to
/dev/kmsg in a loop) while another panics, there's currently a high
likelihood of a deadlock (see patch 2 for the full description of this
deadlock).

The principle fix is to disable the optimistic spin once panic_cpu is
set, so the panic CPU doesn't spin waiting for a halted CPU to hand over
the console_sem.

However, this exposed us to a livelock situation, where the panic CPU
holds the console_sem, and another CPU could fill up the log buffer
faster than the consoles could drain it, preventing the panic from
progressing and halting the other CPUs. To avoid this, patch 3 adds a
mechanism to suppress printk (from non-panic-CPU) during panic, if we
reach a threshold of dropped messages.

A major goal with all of these patches is to try to decrease the
likelihood that another CPU is holding the console_sem when we halt it
in panic(). This reduces the odds of needing to break locks and
potentially encountering further deadlocks with the console drivers.

To test, I use the following script, kmsg_panic.sh:

    #!/bin/bash
    date
    # 991 chars (based on log buffer size):
    chars="$(printf 'a%.0s' {1..991})"
    while :; do
        echo $chars > /dev/kmsg
    done &
    echo c > /proc/sysrq-trigger &
    date
    exit

I defined a hang as any time the system did not reboot to a login prompt
on the serial console within 60 seconds. Here are the statistics on
hangs using this script, before and after the patch.

before:  776 hangs / 1484 trials - 52.3%
after :    0 hangs / 1238 trials -  0.0%


Stephen Brennan (4):
  panic: Add panic_in_progress helper
  printk: disable optimistic spin during panic
  printk: Avoid livelock with heavy printk during panic
  printk: Drop console_sem during panic

 include/linux/panic.h  |  3 +++
 kernel/printk/printk.c | 31 ++++++++++++++++++++++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/4] panic: Add panic_in_progress helper
  2022-01-21 19:02 [PATCH 0/4] printk: reduce deadlocks during panic Stephen Brennan
@ 2022-01-21 19:02 ` Stephen Brennan
  2022-01-25 11:48   ` Petr Mladek
  2022-01-21 19:02 ` [PATCH 2/4] printk: disable optimistic spin during panic Stephen Brennan
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Stephen Brennan @ 2022-01-21 19:02 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, John Ogness
  Cc: Stephen Brennan, Sergey Senozhatsky, linux-kernel

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
We cannot define a static inline without including linux/atomic.h, so
I just added a macro for convenience in later patches. Since macros were
the only option, I didn't include a helper for
panic_in_progress_different_cpu().

 include/linux/panic.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/panic.h b/include/linux/panic.h
index f5844908a089..8e8bd50494d5 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -45,6 +45,9 @@ extern bool crash_kexec_post_notifiers;
 extern atomic_t panic_cpu;
 #define PANIC_CPU_INVALID	-1
 
+#define panic_in_progress()				\
+	unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID)
+
 /*
  * Only to be used by arch init code. If the user over-wrote the default
  * CONFIG_PANIC_TIMEOUT, honor it.
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/4] printk: disable optimistic spin during panic
  2022-01-21 19:02 [PATCH 0/4] printk: reduce deadlocks during panic Stephen Brennan
  2022-01-21 19:02 ` [PATCH 1/4] panic: Add panic_in_progress helper Stephen Brennan
@ 2022-01-21 19:02 ` Stephen Brennan
  2022-01-25 12:42   ` Petr Mladek
  2022-01-26  9:18   ` Sergey Senozhatsky
  2022-01-21 19:02 ` [PATCH 3/4] printk: Avoid livelock with heavy printk " Stephen Brennan
  2022-01-21 19:02 ` [PATCH 4/4] printk: Drop console_sem " Stephen Brennan
  3 siblings, 2 replies; 21+ messages in thread
From: Stephen Brennan @ 2022-01-21 19:02 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, John Ogness
  Cc: Stephen Brennan, Sergey Senozhatsky, linux-kernel

A CPU executing with console lock spinning enabled might be halted
during a panic. Before the panicking CPU calls console_flush_on_panic(),
it may call console_trylock(), which attempts to optimistically spin,
deadlocking the panic CPU:

CPU 0 (panic CPU)             CPU 1
-----------------             ------
                              printk() {
                                vprintk_func() {
                                  vprintk_default() {
                                    vprintk_emit() {
                                      console_unlock() {
                                        console_lock_spinning_enable();
                                        ... printing to console ...
panic() {
  crash_smp_send_stop() {
    NMI  -------------------> HALT
  }
  atomic_notifier_call_chain() {
    printk() {
      ...
      console_trylock_spinnning() {
        // optimistic spin infinitely

This hang during panic can be induced when a kdump kernel is loaded, and
crash_kexec_post_notifiers=1 is present on the kernel command line. The
following script which concurrently writes to /dev/kmsg, and triggers a
panic, can result in this hang:

    #!/bin/bash
    date
    # 991 chars (based on log buffer size):
    chars="$(printf 'a%.0s' {1..991})"
    while :; do
        echo $chars > /dev/kmsg
    done &
    echo c > /proc/sysrq-trigger &
    date
    exit

To avoid this deadlock, ensure that console_trylock_spinning() does not
allow spinning once a panic has begun.

Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes")

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 kernel/printk/printk.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 57b132b658e1..20b4b71a1a07 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1843,6 +1843,16 @@ static int console_trylock_spinning(void)
 	if (console_trylock())
 		return 1;
 
+	/*
+	 * It's unsafe to spin once a panic has begun. If we are the
+	 * panic CPU, we may have already halted the owner of the
+	 * console_sem. If we are not the panic CPU, then we should
+	 * avoid taking console_sem, so the panic CPU has a better
+	 * chance of cleanly acquiring it later.
+	 */
+	if (panic_in_progress())
+		return 0;
+
 	printk_safe_enter_irqsave(flags);
 
 	raw_spin_lock(&console_owner_lock);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/4] printk: Avoid livelock with heavy printk during panic
  2022-01-21 19:02 [PATCH 0/4] printk: reduce deadlocks during panic Stephen Brennan
  2022-01-21 19:02 ` [PATCH 1/4] panic: Add panic_in_progress helper Stephen Brennan
  2022-01-21 19:02 ` [PATCH 2/4] printk: disable optimistic spin during panic Stephen Brennan
@ 2022-01-21 19:02 ` Stephen Brennan
  2022-01-25 14:25   ` Petr Mladek
  2022-01-21 19:02 ` [PATCH 4/4] printk: Drop console_sem " Stephen Brennan
  3 siblings, 1 reply; 21+ messages in thread
From: Stephen Brennan @ 2022-01-21 19:02 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, John Ogness
  Cc: Stephen Brennan, Sergey Senozhatsky, linux-kernel

During a panic(), if another CPU is writing heavily the kernel log (e.g.
via /dev/kmsg), then the panic CPU may livelock writing out its messages
to the console. Note when too many messages are dropped during panic and
suppress further printk, except from the panic CPU.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 kernel/printk/printk.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 20b4b71a1a07..ca253ac07615 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -93,6 +93,12 @@ EXPORT_SYMBOL_GPL(console_drivers);
  */
 int __read_mostly suppress_printk;
 
+/*
+ * During panic, heavy printk by other CPUs can delay the
+ * panic and risk deadlock on console resources.
+ */
+int __read_mostly suppress_panic_printk;
+
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map console_lock_dep_map = {
 	.name = "console_lock"
@@ -2228,6 +2234,10 @@ asmlinkage int vprintk_emit(int facility, int level,
 	if (unlikely(suppress_printk))
 		return 0;
 
+	if (unlikely(suppress_panic_printk) &&
+	    atomic_read(&panic_cpu) != raw_smp_processor_id())
+		return 0;
+
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
 		in_sched = true;
@@ -2613,6 +2623,7 @@ void console_unlock(void)
 {
 	static char ext_text[CONSOLE_EXT_LOG_MAX];
 	static char text[CONSOLE_LOG_MAX];
+	static int panic_console_dropped;
 	unsigned long flags;
 	bool do_cond_resched, retry;
 	struct printk_info info;
@@ -2667,6 +2678,8 @@ void console_unlock(void)
 		if (console_seq != r.info->seq) {
 			console_dropped += r.info->seq - console_seq;
 			console_seq = r.info->seq;
+			if (panic_in_progress() && panic_console_dropped++ > 10)
+				suppress_panic_printk = 1;
 		}
 
 		if (suppress_message_printing(r.info->level)) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/4] printk: Drop console_sem during panic
  2022-01-21 19:02 [PATCH 0/4] printk: reduce deadlocks during panic Stephen Brennan
                   ` (2 preceding siblings ...)
  2022-01-21 19:02 ` [PATCH 3/4] printk: Avoid livelock with heavy printk " Stephen Brennan
@ 2022-01-21 19:02 ` Stephen Brennan
  2022-01-24 16:12   ` John Ogness
  3 siblings, 1 reply; 21+ messages in thread
From: Stephen Brennan @ 2022-01-21 19:02 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, John Ogness
  Cc: Stephen Brennan, Sergey Senozhatsky, linux-kernel

If another CPU is in panic, we are about to be halted. Try to gracefully
drop console_sem and allow the panic CPU to grab it easily.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 kernel/printk/printk.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ca253ac07615..c2dc8ebd9509 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2668,7 +2668,7 @@ void console_unlock(void)
 
 	for (;;) {
 		size_t ext_len = 0;
-		int handover;
+		int handover, pcpu;
 		size_t len;
 
 skip:
@@ -2739,6 +2739,12 @@ void console_unlock(void)
 		if (handover)
 			return;
 
+		/* Allow panic_cpu to take over the consoles safely */
+		pcpu = atomic_read(&panic_cpu);
+		if (unlikely(pcpu != PANIC_CPU_INVALID &&
+		    pcpu != raw_smp_processor_id()))
+			break;
+
 		if (do_cond_resched)
 			cond_resched();
 	}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] printk: Drop console_sem during panic
  2022-01-21 19:02 ` [PATCH 4/4] printk: Drop console_sem " Stephen Brennan
@ 2022-01-24 16:12   ` John Ogness
  2022-01-24 16:26     ` John Ogness
  2022-01-25 15:04     ` Petr Mladek
  0 siblings, 2 replies; 21+ messages in thread
From: John Ogness @ 2022-01-24 16:12 UTC (permalink / raw)
  To: Stephen Brennan, Petr Mladek, Sergey Senozhatsky, Steven Rostedt
  Cc: Stephen Brennan, Sergey Senozhatsky, linux-kernel

On 2022-01-21, Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
> If another CPU is in panic, we are about to be halted. Try to gracefully
> drop console_sem and allow the panic CPU to grab it easily.
>
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> ---
>  kernel/printk/printk.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ca253ac07615..c2dc8ebd9509 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2668,7 +2668,7 @@ void console_unlock(void)
>  
>  	for (;;) {
>  		size_t ext_len = 0;
> -		int handover;
> +		int handover, pcpu;
>  		size_t len;
>  
>  skip:
> @@ -2739,6 +2739,12 @@ void console_unlock(void)
>  		if (handover)
>  			return;
>  
> +		/* Allow panic_cpu to take over the consoles safely */
> +		pcpu = atomic_read(&panic_cpu);
> +		if (unlikely(pcpu != PANIC_CPU_INVALID &&
> +		    pcpu != raw_smp_processor_id()))
> +			break;
> +

Keep in mind that after the "break", this context will try to re-acquire
the console lock and continue printing. That is a pretty small window
for the panic CPU to attempt a trylock.

Perhaps the retry after the loop should also be avoided for non-panic
CPUs. This would rely on the panic CPU taking over (as your comment
suggests will happen). Since the panic-CPU calls pr_emerg() as the final
record before drifting off to neverland, that is probably OK.

Something like:

@@ -2731,7 +2731,8 @@ void console_unlock(void)
 	 * there's a new owner and the console_unlock() from them will do the
 	 * flush, no worries.
 	 */
-	retry = prb_read_valid(prb, next_seq, NULL);
+	retry = (pcpu != raw_smp_processor_id()) &&
+		prb_read_valid(prb, next_seq, NULL);
 	if (retry && console_trylock())
 		goto again;
 }

I would also like to see a comment about why it is acceptable to use
raw_smp_processor_id() in a context that has migration
enabled. Something like: raw_smp_processor_id() can be used because this
context cannot be migrated to the panic CPU.

John Ogness

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] printk: Drop console_sem during panic
  2022-01-24 16:12   ` John Ogness
@ 2022-01-24 16:26     ` John Ogness
  2022-01-25 15:04     ` Petr Mladek
  1 sibling, 0 replies; 21+ messages in thread
From: John Ogness @ 2022-01-24 16:26 UTC (permalink / raw)
  To: Stephen Brennan, Petr Mladek, Sergey Senozhatsky, Steven Rostedt
  Cc: Stephen Brennan, Sergey Senozhatsky, linux-kernel

On 2022-01-24, John Ogness <john.ogness@linutronix.de> wrote:
> Something like:
>
> @@ -2731,7 +2731,8 @@ void console_unlock(void)
>  	 * there's a new owner and the console_unlock() from them will do the
>  	 * flush, no worries.
>  	 */
> -	retry = prb_read_valid(prb, next_seq, NULL);
> +	retry = (pcpu != raw_smp_processor_id()) &&
> +		prb_read_valid(prb, next_seq, NULL);
>  	if (retry && console_trylock())
>  		goto again;
>  }

Sorry, that hunk is obviously garbage. I meant something like:

@@ -2731,7 +2731,10 @@ void console_unlock(void)
 	 * there's a new owner and the console_unlock() from them will do the
 	 * flush, no worries.
 	 */
-	retry = prb_read_valid(prb, next_seq, NULL);
+	if (panic_in_progress())
+		retry = (pcpu == raw_smp_processor_id()) && prb_read_valid(prb, next_seq, NULL);
+	else
+		retry = prb_read_valid(prb, next_seq, NULL);
 	if (retry && console_trylock())
 		goto again;
 }

I'm sure there is a cleaner way to code that.

John

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/4] panic: Add panic_in_progress helper
  2022-01-21 19:02 ` [PATCH 1/4] panic: Add panic_in_progress helper Stephen Brennan
@ 2022-01-25 11:48   ` Petr Mladek
  2022-01-26 17:37     ` Stephen Brennan
  0 siblings, 1 reply; 21+ messages in thread
From: Petr Mladek @ 2022-01-25 11:48 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Sergey Senozhatsky, Steven Rostedt, John Ogness,
	Sergey Senozhatsky, linux-kernel

On Fri 2022-01-21 11:02:19, Stephen Brennan wrote:

Please, add explanation why the new helper is added. It will be
used in printk code to reduce risk of deadlocks during panic().

> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> ---
> We cannot define a static inline without including linux/atomic.h, so
> I just added a macro for convenience in later patches. Since macros were
> the only option, I didn't include a helper for
> panic_in_progress_different_cpu().

What is the exact problem with including atomic.h and using static
inline, please?

IMHO, the define is not a real solution. The macro won't be usable
without including atomic.h. So, it would work only by chance.

But it is possible that I miss something.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] printk: disable optimistic spin during panic
  2022-01-21 19:02 ` [PATCH 2/4] printk: disable optimistic spin during panic Stephen Brennan
@ 2022-01-25 12:42   ` Petr Mladek
  2022-01-26  9:18   ` Sergey Senozhatsky
  1 sibling, 0 replies; 21+ messages in thread
From: Petr Mladek @ 2022-01-25 12:42 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Sergey Senozhatsky, Steven Rostedt, John Ogness,
	Sergey Senozhatsky, linux-kernel

On Fri 2022-01-21 11:02:20, Stephen Brennan wrote:
> A CPU executing with console lock spinning enabled might be halted
> during a panic. Before the panicking CPU calls console_flush_on_panic(),
> it may call console_trylock(), which attempts to optimistically spin,
> deadlocking the panic CPU:
> 
> CPU 0 (panic CPU)             CPU 1
> -----------------             ------
>                               printk() {
>                                 vprintk_func() {
>                                   vprintk_default() {
>                                     vprintk_emit() {
>                                       console_unlock() {
>                                         console_lock_spinning_enable();
>                                         ... printing to console ...
> panic() {
>   crash_smp_send_stop() {
>     NMI  -------------------> HALT
>   }
>   atomic_notifier_call_chain() {
>     printk() {
>       ...
>       console_trylock_spinnning() {
>         // optimistic spin infinitely
> 
> This hang during panic can be induced when a kdump kernel is loaded, and
> crash_kexec_post_notifiers=1 is present on the kernel command line. The
> following script which concurrently writes to /dev/kmsg, and triggers a
> panic, can result in this hang:
> 
>     #!/bin/bash
>     date
>     # 991 chars (based on log buffer size):
>     chars="$(printf 'a%.0s' {1..991})"
>     while :; do
>         echo $chars > /dev/kmsg
>     done &
>     echo c > /proc/sysrq-trigger &
>     date
>     exit
> 
> To avoid this deadlock, ensure that console_trylock_spinning() does not
> allow spinning once a panic has begun.
> 
> Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes")
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>

Looks good to me:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/4] printk: Avoid livelock with heavy printk during panic
  2022-01-21 19:02 ` [PATCH 3/4] printk: Avoid livelock with heavy printk " Stephen Brennan
@ 2022-01-25 14:25   ` Petr Mladek
  0 siblings, 0 replies; 21+ messages in thread
From: Petr Mladek @ 2022-01-25 14:25 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Sergey Senozhatsky, Steven Rostedt, John Ogness,
	Sergey Senozhatsky, linux-kernel

On Fri 2022-01-21 11:02:21, Stephen Brennan wrote:
> During a panic(), if another CPU is writing heavily the kernel log (e.g.
> via /dev/kmsg), then the panic CPU may livelock writing out its messages
> to the console. Note when too many messages are dropped during panic and
> suppress further printk, except from the panic CPU.

I would add something like:

"It might cause that some useful messages are lost. But messages are
being lost anyway and this at least avoids the possible livelock."

> ---
>  kernel/printk/printk.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 20b4b71a1a07..ca253ac07615 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2667,6 +2678,8 @@ void console_unlock(void)
>  		if (console_seq != r.info->seq) {
>  			console_dropped += r.info->seq - console_seq;
>  			console_seq = r.info->seq;
> +			if (panic_in_progress() && panic_console_dropped++ > 10)
> +				suppress_panic_printk = 1;

It would be great to let the user know, something like:

				pr_warn("Too many dropped messages. Supress messages on non-panic CPUs to prevent livelock.\n");



>  		}
>  
>  		if (suppress_message_printing(r.info->level)) {

Otherwise, it looks good to me.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] printk: Drop console_sem during panic
  2022-01-24 16:12   ` John Ogness
  2022-01-24 16:26     ` John Ogness
@ 2022-01-25 15:04     ` Petr Mladek
  1 sibling, 0 replies; 21+ messages in thread
From: Petr Mladek @ 2022-01-25 15:04 UTC (permalink / raw)
  To: John Ogness
  Cc: Stephen Brennan, Sergey Senozhatsky, Steven Rostedt,
	Sergey Senozhatsky, linux-kernel

On Mon 2022-01-24 17:18:42, John Ogness wrote:
> On 2022-01-21, Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
> > If another CPU is in panic, we are about to be halted. Try to gracefully
> > drop console_sem and allow the panic CPU to grab it easily.
> >
> > Suggested-by: Petr Mladek <pmladek@suse.com>
> > Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> > ---
> >  kernel/printk/printk.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index ca253ac07615..c2dc8ebd9509 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2668,7 +2668,7 @@ void console_unlock(void)
> >  
> >  	for (;;) {
> >  		size_t ext_len = 0;
> > -		int handover;
> > +		int handover, pcpu;
> >  		size_t len;
> >  
> >  skip:
> > @@ -2739,6 +2739,12 @@ void console_unlock(void)
> >  		if (handover)
> >  			return;
> >  
> > +		/* Allow panic_cpu to take over the consoles safely */
> > +		pcpu = atomic_read(&panic_cpu);
> > +		if (unlikely(pcpu != PANIC_CPU_INVALID &&
> > +		    pcpu != raw_smp_processor_id()))
> > +			break;
> > +
> 
> Keep in mind that after the "break", this context will try to re-acquire
> the console lock and continue printing. That is a pretty small window
> for the panic CPU to attempt a trylock.
>
> Perhaps the retry after the loop should also be avoided for non-panic
> CPUs. This would rely on the panic CPU taking over (as your comment
> suggests will happen). Since the panic-CPU calls pr_emerg() as the final
> record before drifting off to neverland, that is probably OK.

Great catch!

> Something like:
> 
> @@ -2731,7 +2731,8 @@ void console_unlock(void)
>  	 * there's a new owner and the console_unlock() from them will do the
>  	 * flush, no worries.
>  	 */
> -	retry = prb_read_valid(prb, next_seq, NULL);
> +	retry = (pcpu != raw_smp_processor_id()) &&
> +		prb_read_valid(prb, next_seq, NULL);
>  	if (retry && console_trylock())
>  		goto again;
>  }
> 
> I would also like to see a comment about why it is acceptable to use
> raw_smp_processor_id() in a context that has migration
> enabled. Something like: raw_smp_processor_id() can be used because this
> context cannot be migrated to the panic CPU.

Yup. It would be nice to mention this.

We actually need the same check in both locations. I would either put
it into a helper or I would pass the result via a variable:

The helper might look like:

/*
 * Return true when this CPU should unlock console_sem without pushing
 * all messages to the console. It would allow to call console in
 * panic CPU a safe way even after other CPUs are stopped.
 *
 * It can be called safely even in preemptive context because panic
 * CPU does not longer schedule.
 */
static bool abandon_console_lock_in_panic()
{
	if (!panic_in_progress())
		return false;

	return atomic_read(&panic_cpu) != raw_smp_processor_id();
}

Note that I used panic_in_progress() because it makes the code more
readable. Using pcpu variable is small optimization. But it has effect
only during panic() where it is not important. It is slow path anyway.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] printk: disable optimistic spin during panic
  2022-01-21 19:02 ` [PATCH 2/4] printk: disable optimistic spin during panic Stephen Brennan
  2022-01-25 12:42   ` Petr Mladek
@ 2022-01-26  9:18   ` Sergey Senozhatsky
  2022-01-26  9:45     ` John Ogness
  1 sibling, 1 reply; 21+ messages in thread
From: Sergey Senozhatsky @ 2022-01-26  9:18 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, John Ogness,
	Sergey Senozhatsky, linux-kernel

On (22/01/21 11:02), Stephen Brennan wrote:
> A CPU executing with console lock spinning enabled might be halted
> during a panic. Before the panicking CPU calls console_flush_on_panic(),
> it may call console_trylock(), which attempts to optimistically spin,
> deadlocking the panic CPU:
> 
> CPU 0 (panic CPU)             CPU 1
> -----------------             ------
>                               printk() {
>                                 vprintk_func() {
>                                   vprintk_default() {
>                                     vprintk_emit() {
>                                       console_unlock() {
>                                         console_lock_spinning_enable();
>                                         ... printing to console ...
> panic() {
>   crash_smp_send_stop() {
>     NMI  -------------------> HALT
>   }
>   atomic_notifier_call_chain() {
>     printk() {
>       ...
>       console_trylock_spinnning() {
>         // optimistic spin infinitely

[..]

> +++ b/kernel/printk/printk.c
> @@ -1843,6 +1843,16 @@ static int console_trylock_spinning(void)
>  	if (console_trylock())
>  		return 1;
>  
> +	/*
> +	 * It's unsafe to spin once a panic has begun. If we are the
> +	 * panic CPU, we may have already halted the owner of the
> +	 * console_sem. If we are not the panic CPU, then we should
> +	 * avoid taking console_sem, so the panic CPU has a better
> +	 * chance of cleanly acquiring it later.
> +	 */
> +	if (panic_in_progress())
> +		return 0;

Is there something that prevents panic CPU from NMI hlt CPU which is
in console_trylock() under raw_spin_lock_irqsave()?

 CPU0				CPU1
				console_trylock_spinnning()
				 console_trylock()
				  down_trylock()
				   raw_spin_lock_irqsave(&sem->lock)

 panic()
  crash_smp_send_stop()
   NMI 			-> 		HALT

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] printk: disable optimistic spin during panic
  2022-01-26  9:18   ` Sergey Senozhatsky
@ 2022-01-26  9:45     ` John Ogness
  2022-01-26 10:06       ` Sergey Senozhatsky
  0 siblings, 1 reply; 21+ messages in thread
From: John Ogness @ 2022-01-26  9:45 UTC (permalink / raw)
  To: Sergey Senozhatsky, Stephen Brennan
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Sergey Senozhatsky, linux-kernel

> Is there something that prevents panic CPU from NMI hlt CPU which is
> in console_trylock() under raw_spin_lock_irqsave()?
>
>  CPU0				CPU1
> 				console_trylock_spinnning()
> 				 console_trylock()
> 				  down_trylock()
> 				   raw_spin_lock_irqsave(&sem->lock)
>
>  panic()
>   crash_smp_send_stop()
>    NMI 			-> 		HALT

This is a good point. I wonder if console_flush_on_panic() should
perform a sema_init() before it does console_trylock().

John

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] printk: disable optimistic spin during panic
  2022-01-26  9:45     ` John Ogness
@ 2022-01-26 10:06       ` Sergey Senozhatsky
  2022-01-26 18:15         ` Stephen Brennan
  0 siblings, 1 reply; 21+ messages in thread
From: Sergey Senozhatsky @ 2022-01-26 10:06 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Stephen Brennan, Petr Mladek, Steven Rostedt,
	Sergey Senozhatsky, linux-kernel

On (22/01/26 10:51), John Ogness wrote:
> > Is there something that prevents panic CPU from NMI hlt CPU which is
> > in console_trylock() under raw_spin_lock_irqsave()?
> >
> >  CPU0				CPU1
> > 				console_trylock_spinnning()
> > 				 console_trylock()
> > 				  down_trylock()
> > 				   raw_spin_lock_irqsave(&sem->lock)
> >
> >  panic()
> >   crash_smp_send_stop()
> >    NMI 			-> 		HALT
> 
> This is a good point. I wonder if console_flush_on_panic() should
> perform a sema_init() before it does console_trylock().

A long time ago there was zap_locks() function in printk, that used
to re-init console semaphore and logbuf spin_lock, but _only_ in case
of printk recursion (which was never reliable)

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/printk/printk.c?h=v4.9.297#n1557

This has been superseded by printk_safe per-CPU buffers so we removed
that function.

So it could be that may be we want to introduce something similar to
zap_locks() again.

All reasonable serial consoles drivers should take oops_in_progress into
consideration in ->write(), so we probably don't care for console_drivers
spinlocks, etc. but potentially can do a bit better on the printk side.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/4] panic: Add panic_in_progress helper
  2022-01-25 11:48   ` Petr Mladek
@ 2022-01-26 17:37     ` Stephen Brennan
  0 siblings, 0 replies; 21+ messages in thread
From: Stephen Brennan @ 2022-01-26 17:37 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, John Ogness,
	Sergey Senozhatsky, linux-kernel

Petr Mladek <pmladek@suse.com> writes:

> On Fri 2022-01-21 11:02:19, Stephen Brennan wrote:
>
> Please, add explanation why the new helper is added. It will be
> used in printk code to reduce risk of deadlocks during panic().
>
>> Suggested-by: Petr Mladek <pmladek@suse.com>
>> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
>> ---
>> We cannot define a static inline without including linux/atomic.h, so
>> I just added a macro for convenience in later patches. Since macros were
>> the only option, I didn't include a helper for
>> panic_in_progress_different_cpu().
>
> What is the exact problem with including atomic.h and using static
> inline, please?

Hi Petr,

linux/panic.h is a commonly included header, and so I didn't want to
pollute the namespaces of many other compilation units. (In fact, for
stable kernels we avoid changing headers to avoid changing the hashes
produced by genksyms). It could impact compile time too.

But I suppose I was prematurely optimizing. I can make them proper
static inlines instead :)

Stephen

>
> IMHO, the define is not a real solution. The macro won't be usable
> without including atomic.h. So, it would work only by chance.
>
> But it is possible that I miss something.
>
> Best Regards,
> Petr

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] printk: disable optimistic spin during panic
  2022-01-26 10:06       ` Sergey Senozhatsky
@ 2022-01-26 18:15         ` Stephen Brennan
  2022-01-27  7:11           ` Sergey Senozhatsky
  0 siblings, 1 reply; 21+ messages in thread
From: Stephen Brennan @ 2022-01-26 18:15 UTC (permalink / raw)
  To: Sergey Senozhatsky, John Ogness
  Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt,
	Sergey Senozhatsky, linux-kernel

Sergey Senozhatsky <senozhatsky@chromium.org> writes:

> On (22/01/26 10:51), John Ogness wrote:
>> > Is there something that prevents panic CPU from NMI hlt CPU which is
>> > in console_trylock() under raw_spin_lock_irqsave()?
>> >
>> >  CPU0				CPU1
>> > 				console_trylock_spinnning()
>> > 				 console_trylock()
>> > 				  down_trylock()
>> > 				   raw_spin_lock_irqsave(&sem->lock)
>> >
>> >  panic()
>> >   crash_smp_send_stop()
>> >    NMI 			-> 		HALT
>> 
>> This is a good point. I wonder if console_flush_on_panic() should
>> perform a sema_init() before it does console_trylock().
>
> A long time ago there was zap_locks() function in printk, that used
> to re-init console semaphore and logbuf spin_lock, but _only_ in case
> of printk recursion (which was never reliable)
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/printk/printk.c?h=v4.9.297#n1557
>
> This has been superseded by printk_safe per-CPU buffers so we removed
> that function.
>
> So it could be that may be we want to introduce something similar to
> zap_locks() again.
>
> All reasonable serial consoles drivers should take oops_in_progress into
> consideration in ->write(), so we probably don't care for console_drivers
> spinlocks, etc. but potentially can do a bit better on the printk side.

I see the concern here. If a CPU is halted while holding
console_sem.lock spinlock, then the very next printk would hang, since
each vprintk_emit() does a trylock.

Now in my thousands of iterations of tests, I haven't been lucky enough
to interrupt a CPU in the middle of this critical section. The critical
section itself is incredibly short and so it's hard to do it. Not
impossible, I'd imagine.

We can't fix it in console_flush_on_panic(), because that is called much
later, after we've called the panic notifiers, which definitely
printk(). If we wanted to re-initialize the console_sem, we'd want it
done earlier in panic(), directly after the NMI was sent.

My understanding was that we can't be too cautious regarding the console
drivers. Sure, they _shouldn't_ have any race conditions, but once we're
in panic we're better off avoiding the console drivers unless it's our
last choice. So, is it worth re-initializing the console_sem early in
panic, which forces all the subsequent printk to go out to the consoles?
I don't know.

One alternative is to do __printk_safe_enter() at the beginning of
panic. This effectively guarantees that no printk will hit the console
drivers or even attempt to grab the console_sem. Then, we can do the
kmsg_dump, do a crash_kexec if configured, and only when all options
have been exhausted would we reinitialize the console_sem and flush to
the console. Maybe this is too cautious, but it is an alternative.

Stephen

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] printk: disable optimistic spin during panic
  2022-01-26 18:15         ` Stephen Brennan
@ 2022-01-27  7:11           ` Sergey Senozhatsky
  2022-01-27  9:09             ` John Ogness
  2022-01-27 11:38             ` Petr Mladek
  0 siblings, 2 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2022-01-27  7:11 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Sergey Senozhatsky, John Ogness, Petr Mladek, Steven Rostedt,
	Sergey Senozhatsky, linux-kernel

On (22/01/26 10:15), Stephen Brennan wrote:
[..]
> > On (22/01/26 10:51), John Ogness wrote:
> >> > Is there something that prevents panic CPU from NMI hlt CPU which is
> >> > in console_trylock() under raw_spin_lock_irqsave()?
> >> >
> >> >  CPU0				CPU1
> >> > 				console_trylock_spinnning()
> >> > 				 console_trylock()
> >> > 				  down_trylock()
> >> > 				   raw_spin_lock_irqsave(&sem->lock)
> >> >
> >> >  panic()
> >> >   crash_smp_send_stop()
> >> >    NMI 			-> 		HALT
> >> 
> >> This is a good point. I wonder if console_flush_on_panic() should
> >> perform a sema_init() before it does console_trylock().
> >
> > A long time ago there was zap_locks() function in printk, that used
> > to re-init console semaphore and logbuf spin_lock, but _only_ in case
> > of printk recursion (which was never reliable)
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/printk/printk.c?h=v4.9.297#n1557
> >
> > This has been superseded by printk_safe per-CPU buffers so we removed
> > that function.
> >
> > So it could be that may be we want to introduce something similar to
> > zap_locks() again.
> >
> > All reasonable serial consoles drivers should take oops_in_progress into
> > consideration in ->write(), so we probably don't care for console_drivers
> > spinlocks, etc. but potentially can do a bit better on the printk side.
> 
> I see the concern here. If a CPU is halted while holding
> console_sem.lock spinlock, then the very next printk would hang, since
> each vprintk_emit() does a trylock.

Right. So I also thought about placing panic_in_progress() somewhere in
console_trylock() and make it fail for anything that is not a panic CPU.

> Now in my thousands of iterations of tests, I haven't been lucky enough
> to interrupt a CPU in the middle of this critical section. The critical
> section itself is incredibly short and so it's hard to do it. Not
> impossible, I'd imagine.

I can imagine that the race window is really small, and I'm not insisting
on fixing it right now (or ever for that matter).

Basically, we now have two different "something bad is in progress"
that affect two different ends of the calls stack. bust_spinlocks()
sets oops_in_progress and affects console drivers' spinlocks, but has
no meaning to any other printk locks. And then we have panic_in_progress()
which is meaningful to some printk locks, but not to all of them, and is
meaningless to console drivers, because those look at oops_in_progress.

If printk folks are fine with that then I'm also fine.

> We can't fix it in console_flush_on_panic(), because that is called much
> later, after we've called the panic notifiers, which definitely
> printk(). If we wanted to re-initialize the console_sem, we'd want it
> done earlier in panic(), directly after the NMI was sent.

Right.

> My understanding was that we can't be too cautious regarding the console
> drivers. Sure, they _shouldn't_ have any race conditions, but once we're
> in panic we're better off avoiding the console drivers unless it's our
> last choice. So, is it worth re-initializing the console_sem early in
> panic, which forces all the subsequent printk to go out to the consoles?
> I don't know.
>
> One alternative is to do __printk_safe_enter() at the beginning of
> panic. This effectively guarantees that no printk will hit the console
> drivers or even attempt to grab the console_sem. Then, we can do the
> kmsg_dump, do a crash_kexec if configured, and only when all options
> have been exhausted would we reinitialize the console_sem and flush to
> the console. Maybe this is too cautious, but it is an alternative.

Back in the days we also had this idea of "detaching" non-panic CPUs from
printk() by overwriting their printk function pointers.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] printk: disable optimistic spin during panic
  2022-01-27  7:11           ` Sergey Senozhatsky
@ 2022-01-27  9:09             ` John Ogness
  2022-01-27 11:38             ` Petr Mladek
  1 sibling, 0 replies; 21+ messages in thread
From: John Ogness @ 2022-01-27  9:09 UTC (permalink / raw)
  To: Sergey Senozhatsky, Stephen Brennan
  Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt,
	Sergey Senozhatsky, linux-kernel

On 2022-01-27, Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> Right. So I also thought about placing panic_in_progress() somewhere in
> console_trylock() and make it fail for anything that is not a panic
> CPU.

I think this is a good idea and console_trylock() is the correct place
for that.

> Back in the days we also had this idea of "detaching" non-panic CPUs from
> printk() by overwriting their printk function pointers.

We need to keep in mind that printk() is no longer the problem. The
records are stored locklessly. The problem is the
console_trylock()/console_unlock() within vprintk_emit(). IMHO adding a
panic check in console_trylock() should solve that race sufficiently.

John

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] printk: disable optimistic spin during panic
  2022-01-27  7:11           ` Sergey Senozhatsky
  2022-01-27  9:09             ` John Ogness
@ 2022-01-27 11:38             ` Petr Mladek
  2022-01-27 12:43               ` John Ogness
  1 sibling, 1 reply; 21+ messages in thread
From: Petr Mladek @ 2022-01-27 11:38 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Stephen Brennan, John Ogness, Steven Rostedt, Sergey Senozhatsky,
	linux-kernel

On Thu 2022-01-27 16:11:08, Sergey Senozhatsky wrote:
> On (22/01/26 10:15), Stephen Brennan wrote:
> [..]
> > > On (22/01/26 10:51), John Ogness wrote:
> > >> > Is there something that prevents panic CPU from NMI hlt CPU which is
> > >> > in console_trylock() under raw_spin_lock_irqsave()?
> > >> >
> > >> >  CPU0				CPU1
> > >> > 				console_trylock_spinnning()
> > >> > 				 console_trylock()
> > >> > 				  down_trylock()
> > >> > 				   raw_spin_lock_irqsave(&sem->lock)
> > >> >
> > >> >  panic()
> > >> >   crash_smp_send_stop()
> > >> >    NMI 			-> 		HALT
> > >> 
> > >> This is a good point. I wonder if console_flush_on_panic() should
> > >> perform a sema_init() before it does console_trylock().
> > >
> > > A long time ago there was zap_locks() function in printk, that used
> > > to re-init console semaphore and logbuf spin_lock, but _only_ in case
> > > of printk recursion (which was never reliable)
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/printk/printk.c?h=v4.9.297#n1557
> > >
> > > This has been superseded by printk_safe per-CPU buffers so we removed
> > > that function.
> > >
> > > So it could be that may be we want to introduce something similar to
> > > zap_locks() again.
> > >
> > > All reasonable serial consoles drivers should take oops_in_progress into
> > > consideration in ->write(), so we probably don't care for console_drivers
> > > spinlocks, etc. but potentially can do a bit better on the printk side.
> > 
> > I see the concern here. If a CPU is halted while holding
> > console_sem.lock spinlock, then the very next printk would hang, since
> > each vprintk_emit() does a trylock.
> 
> Right. So I also thought about placing panic_in_progress() somewhere in
> console_trylock() and make it fail for anything that is not a panic CPU.
> 
> > Now in my thousands of iterations of tests, I haven't been lucky enough
> > to interrupt a CPU in the middle of this critical section. The critical
> > section itself is incredibly short and so it's hard to do it. Not
> > impossible, I'd imagine.
> 
> I can imagine that the race window is really small, and I'm not insisting
> on fixing it right now (or ever for that matter).
>
> Basically, we now have two different "something bad is in progress"
> that affect two different ends of the calls stack. bust_spinlocks()
> sets oops_in_progress and affects console drivers' spinlocks, but has
> no meaning to any other printk locks. And then we have panic_in_progress()
> which is meaningful to some printk locks, but not to all of them, and is
> meaningless to console drivers, because those look at oops_in_progress.

Good point! It looks a bit non-consistent and I am sure that we could
do better.

Well, my view is that there are contexts:

1. oops_in_progress is used when printing OOps report and panic().

   The important thing is that the system might continue working
   after OOps when "panic_on_oops" is not defined.

   Many console drivers allow to enter locked section even when the lock
   is already teaken. But they prevent double unlock.

   The aim is to show OOps messages on the console because they system
   might silently die otherwise.


2. The new panic_in_progress() context says that the system is really
   dying.

   The system should try hard to show the messages on console but
   still a safe way. It should not risk breaking other ways to store
   debugging information: kdump, kmsg_dump, and panic notifiers.

   I mention notifiers because some inform the hypervisor about the
   panic. The hypervisor could get some debugging information as well.


3. console_flush_on_panic() is the last attempt to show to messages
   on the console.

   It is done after kdump, kmsg_dump, and notifiers. consoles are
   the only concern here. So it could be more agressive and risk more.


Honestly, I never thought much about oops_in_panic context. All
the past discussion were focused on panic.

Also we focused on lowering the risk by introducing lockless algorithms
or fixing bugs. I can't remember moving some risky operation earlier.
Also we were reluctant to change the risk level (algorithm) when it
was not obvious or at least promising that it makes things better.


> If printk folks are fine with that then I'm also fine.
> 
> > We can't fix it in console_flush_on_panic(), because that is called much
> > later, after we've called the panic notifiers, which definitely
> > printk(). If we wanted to re-initialize the console_sem, we'd want it
> > done earlier in panic(), directly after the NMI was sent.

I am not sure if it is worth the risk. You want to reinitialize the
semaphore because a small race window in the internal spin lock.
But it will allow to enter a lot of code that is guarded by console_sem.

I mean that chance of dealock caused by the internal semaohore spin
lock is super small. In compare, a lot of tricky code is guarded
by console_sem. It looks like a big risk to ignore the semaphore
early in panic().

A better solution would be to use raw_spin_trylock_irqsave() in
down_trylock().

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] printk: disable optimistic spin during panic
  2022-01-27 11:38             ` Petr Mladek
@ 2022-01-27 12:43               ` John Ogness
  2022-01-27 14:25                 ` Petr Mladek
  0 siblings, 1 reply; 21+ messages in thread
From: John Ogness @ 2022-01-27 12:43 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky
  Cc: Stephen Brennan, Steven Rostedt, Sergey Senozhatsky, linux-kernel

On 2022-01-27, Petr Mladek <pmladek@suse.com> wrote:
> I mean that chance of dealock caused by the internal semaohore spin
> lock is super small. In compare, a lot of tricky code is guarded
> by console_sem. It looks like a big risk to ignore the semaphore
> early in panic().

Agreed.

> A better solution would be to use raw_spin_trylock_irqsave() in
> down_trylock().

down_trylock() is attempting to decrement a semaphore. It should not
fail just because another CPU is also in the process of
decrementing/incrementing the semaphore.

Maybe a down_trylock_cond() could be introduced where the trylock could
fail if a given condition is not met. The function would need to
implement its own internal trylock spin loop to check the condition. But
then we could pass in a condition for it to abort. For example, when in
panic and we are not the panic CPU.

John

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] printk: disable optimistic spin during panic
  2022-01-27 12:43               ` John Ogness
@ 2022-01-27 14:25                 ` Petr Mladek
  0 siblings, 0 replies; 21+ messages in thread
From: Petr Mladek @ 2022-01-27 14:25 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Stephen Brennan, Steven Rostedt,
	Sergey Senozhatsky, linux-kernel

On Thu 2022-01-27 13:49:44, John Ogness wrote:
> On 2022-01-27, Petr Mladek <pmladek@suse.com> wrote:
> > I mean that chance of dealock caused by the internal semaohore spin
> > lock is super small. In compare, a lot of tricky code is guarded
> > by console_sem. It looks like a big risk to ignore the semaphore
> > early in panic().
> 
> Agreed.
> 
> > A better solution would be to use raw_spin_trylock_irqsave() in
> > down_trylock().
> 
> down_trylock() is attempting to decrement a semaphore. It should not
> fail just because another CPU is also in the process of
> decrementing/incrementing the semaphore.

IMHO, it does not matter. As you say, raw_spin_trylock_irqsave() fails
only when another process is about to release or take the semaphore.
The semaphore is usually taken for a long time. The tiny window when
the counter is manipulated is negligible.

I mean, if down_trylock() fails because of raw_spin_trylock_irqsave()
failure then it is few instructions from failing even with the lock.

> Maybe a down_trylock_cond() could be introduced where the trylock could
> fail if a given condition is not met. The function would need to
> implement its own internal trylock spin loop to check the condition. But
> then we could pass in a condition for it to abort. For example, when in
> panic and we are not the panic CPU.

This looks too complicated.

Another solution would be to introduce panic_down_trylock() variant
of down_trylock() that will use raw_spin_trylock_irqsave(). The normal
down_trylock() might still use the raw_spin_lock_irqsave().

Well, this should get discussed with the locking people.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-01-27 14:26 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-21 19:02 [PATCH 0/4] printk: reduce deadlocks during panic Stephen Brennan
2022-01-21 19:02 ` [PATCH 1/4] panic: Add panic_in_progress helper Stephen Brennan
2022-01-25 11:48   ` Petr Mladek
2022-01-26 17:37     ` Stephen Brennan
2022-01-21 19:02 ` [PATCH 2/4] printk: disable optimistic spin during panic Stephen Brennan
2022-01-25 12:42   ` Petr Mladek
2022-01-26  9:18   ` Sergey Senozhatsky
2022-01-26  9:45     ` John Ogness
2022-01-26 10:06       ` Sergey Senozhatsky
2022-01-26 18:15         ` Stephen Brennan
2022-01-27  7:11           ` Sergey Senozhatsky
2022-01-27  9:09             ` John Ogness
2022-01-27 11:38             ` Petr Mladek
2022-01-27 12:43               ` John Ogness
2022-01-27 14:25                 ` Petr Mladek
2022-01-21 19:02 ` [PATCH 3/4] printk: Avoid livelock with heavy printk " Stephen Brennan
2022-01-25 14:25   ` Petr Mladek
2022-01-21 19:02 ` [PATCH 4/4] printk: Drop console_sem " Stephen Brennan
2022-01-24 16:12   ` John Ogness
2022-01-24 16:26     ` John Ogness
2022-01-25 15:04     ` Petr Mladek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.