All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Brennan <stephen.s.brennan@oracle.com>
To: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	linux-kernel@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>,
	John Ogness <john.ogness@linutronix.de>
Subject: Re: [PATCH v4 0/4] printk: reduce deadlocks during panic
Date: Thu, 10 Feb 2022 12:06:44 -0800	[thread overview]
Message-ID: <8f08bb64-ee8a-9555-f4a1-6d55d3c77531@oracle.com> (raw)
In-Reply-To: <YgTZPQEay6T/nhu6@alley>

On 2/10/22 01:22, Petr Mladek wrote:
> On Wed 2022-02-02 09:18:17, Stephen Brennan wrote:
>> When a caller writes heavily to the kernel log (e.g. writing to
>> /dev/kmsg in a loop) while another panics, there's currently a high
>> likelihood of a deadlock (see patch 2 for the full description of this
>> deadlock).
>>
>> The principle fix is to disable the optimistic spin once panic_cpu is
>> set, so the panic CPU doesn't spin waiting for a halted CPU to hand over
>> the console_sem.
>>
>> However, this exposed us to a livelock situation, where the panic CPU
>> holds the console_sem, and another CPU could fill up the log buffer
>> faster than the consoles could drain it, preventing the panic from
>> progressing and halting the other CPUs. To avoid this, patch 3 adds a
>> mechanism to suppress printk (from non-panic-CPU) during panic, if we
>> reach a threshold of dropped messages.
>>
>> A major goal with all of these patches is to try to decrease the
>> likelihood that another CPU is holding the console_sem when we halt it
>> in panic(). This reduces the odds of needing to break locks and
>> potentially encountering further deadlocks with the console drivers.
>>
>> To test, I use the following script, kmsg_panic.sh:
>>
>>      #!/bin/bash
>>      date
>>      # 991 chars (based on log buffer size):
>>      chars="$(printf 'a%.0s' {1..991})"
>>      while :; do
>>          echo $chars > /dev/kmsg
>>      done &
>>      echo c > /proc/sysrq-trigger &
>>      date
>>      exit
>>
>> I defined a hang as any time the system did not reboot to a login prompt
>> on the serial console within 60 seconds. Here are the statistics on
>> hangs using this script, before and after the patch.
>>
>> before:  776 hangs / 1484 trials - 52.3%
>> after :    0 hangs /  15k trials -  0.0%
>>
>> Stephen Brennan (4):
>>    printk: Add panic_in_progress helper
>>    printk: disable optimistic spin during panic
>>    printk: Avoid livelock with heavy printk during panic
>>    printk: Drop console_sem during panic
>>
>>   kernel/printk/printk.c | 55 +++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 54 insertions(+), 1 deletion(-)
> 
> For the entire patchset:
> 
> Reviewed-by: Petr Mladek <pmladek@suse.com>
> 
> It looks ready for linux-next from my POV. I am going to push it early
> next week unless anyone complains in the meantime.
Thank you Petr! It occurs to me that some of this could be 
stable-worthy, depending on your feelings on it. Patches 1-3 resolve 
real bugs on customer systems, and they'd apply back a decent way. 1-2 
apply all the way back to 4.14, and 3 would apply with some minor 
changes. I suppose the question is whether they are simple enough. Patch 
4 is useful but I don't have a real reproducer for a bug it fixes, so I 
wouldn't say it's stable worthy.

Of course we have the logbuf_lock in 5.10 and previous, and if a CPU is 
halted holding that lock, then printk hangs even before the optimistic 
spinning. I have patches which reinitialize those locks after the CPUs 
are halted if necessary. I think they are reasonable for stable - printk 
is guaranteed to hang without doing this, so in the worst case you trade 
a hang during a panic, with some other sort of printk log buffer bug 
during a panic. But in the common case, you eliminate the hang. I can 
send that patch to linux-stable as well.

What do you think about these patches and stable?

Thanks again,
Stephen

> 
> Best Regards,
> Petr Mladek

  reply	other threads:[~2022-02-10 20:07 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-02 17:18 [PATCH v4 0/4] printk: reduce deadlocks during panic Stephen Brennan
2022-02-02 17:18 ` [PATCH v4 1/4] printk: Add panic_in_progress helper Stephen Brennan
2022-02-02 17:18 ` [PATCH v4 2/4] printk: disable optimistic spin during panic Stephen Brennan
2022-02-02 17:18 ` [PATCH v4 3/4] printk: Avoid livelock with heavy printk " Stephen Brennan
2022-02-02 17:18 ` [PATCH v4 4/4] printk: Drop console_sem " Stephen Brennan
2022-02-10  9:22 ` [PATCH v4 0/4] printk: reduce deadlocks " Petr Mladek
2022-02-10 20:06   ` Stephen Brennan [this message]
2022-02-14 13:54     ` Petr Mladek
2022-02-15 23:24       ` Stephen Brennan
2022-02-14  2:27   ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8f08bb64-ee8a-9555-f4a1-6d55d3c77531@oracle.com \
    --to=stephen.s.brennan@oracle.com \
    --cc=john.ogness@linutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.