archive mirror
 help / color / mirror / Atom feed
From: Sergey Senozhatsky <>
To: Petr Mladek <>
Cc: Sergey Senozhatsky <>,
	Konstantin Khlebnikov <>,
	Sergey Senozhatsky <>,
	Steven Rostedt <>,
	Peter Zijlstra <>,, John Ogness <>,
	Thomas Gleixner <>,
	Andrew Morton <>,
	Petr Tesarik <>,
Subject: Re: [PATCH 1/2] printk/panic: Access the main printk log in panic() only when safe
Date: Tue, 23 Jul 2019 12:13:40 +0900	[thread overview]
Message-ID: <20190723031340.GA19463@jagdpanzerIV> (raw)
In-Reply-To: <>

On (07/19/19 14:57), Petr Mladek wrote:
> > Where do nested printk()-s come from? Which one of the following
> > scenarios you cover in commit message:
> > 
> > scenario 1
> > 
> > - we have CPUB which holds logbuf_lock
> > - we have CPUA which panic()-s the system, but can't bring CPUB down,
> >   so logbuf_lock stays locked on remote CPU
> No, this scenario is not affected by this patch. It would always lead to
> a deadlock.

Agreed, in many cases we won't be able to panic() the system properly,
deadlocking somewhere in smp_send_stop().

> > scenario 2
> > 
> > - we have CPUA which holds logbuf_lock
> > - we have panic() on CPUA, but it cannot bring down some other CPUB
> >   so logbuf_lock stays locked on local CPU, and it cannot re-init
> >   logbuf.
>   + Before:
>       + printk_safe_flush_on_panic() will keep logbuf_lock locked
> 	and do nothing.
>       + kmsg_dump(), console_unblank(), or console_flush_on_panic()
> 	will deadlock when they try to get logbuf_lock(). They will
> 	not be able to process any single line.
>   + After:
>       + printk_bust_lock_safe() will keep logbuf_lock locked
>       + All functions using logbuf_lock will not get called.
> 	We will not see the messages (as previously) but the
> 	system will not deadlock.
> But there is one more scenario 3:


>   - we have CPUB which loops or is deadlocked in IRQ context
>   - we have CPUA which panic()-s the system, but can't bring CPUB down,
>     so logbuf_lock might be takes and release from time to time
>     by CPUB


This is the only case when we actually need to pay attention to
num_online_cpus(), because there is an active logbuf_lock owner;
in any other case we can unconditionally re-init printk() locks.

But there is more to it.

Note, that the problem in scenario 3 is bigger than just logbuf_lock.
Regardless of logbuf implementation we will not be able to panic()
the system.

If we have a never ending source of printk() messages, coming from
misbehaving CPU which stuck in printing loop in IRQ context, then
flush_on_panic() will never end or kmsg dump will never stop, etc.
We need to cut off misbehaving CPUs. Panic CPU waits (for up to 1
second?) in smp_send_stop() for secondary CPUs to die, if some
secondary CPUs are still chatty after that then most likely those
CPUs don't have anything good to say, just a pointless flood of same
messages over and over again; which, however, will not let panic
CPU to proceed.

And this is where the idea of "disconnecting" those CPUs from main
logbuf come from.

So what we can do:
- smp_send_stop()
- disconnect all-but-self from logbuf (via printk-safe)
- wait for 1 or 2 more extra seconds for secondary CPUs to leave
  console_unlock() and to redirect printks to per-CPU buffers
- after that we are sort of good-to-go: re-init printk locks
  and do kmsg_dump, flush_on_panic().

Note, misbehaving CPUs will write to per-CPU buffers, they are not
expected to be able to flush per-CPU buffers to the main logbuf. That
will require enabled IRQs, which should deliver stop IPI. But we can
do even more - just disable print_safe irq work on disconnect CPUs.

So, shall we try one more time with the "disconnect" misbehaving CPUs
approach? I can send an RFC patch.


  reply	other threads:[~2019-07-23  3:13 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-16  7:28 [PATCH 0/2] panic/printk/x86: Prevent some more printk-related deadlocks in panic() Petr Mladek
2019-07-16  7:28 ` [PATCH 1/2] printk/panic: Access the main printk log in panic() only when safe Petr Mladek
2019-07-17  9:56   ` Sergey Senozhatsky
2019-07-18  8:36     ` Petr Mladek
2019-07-18  9:49       ` Sergey Senozhatsky
2019-07-19 12:57         ` Petr Mladek
2019-07-23  3:13           ` Sergey Senozhatsky [this message]
2019-07-24 12:27             ` Petr Mladek
2019-07-31  6:08               ` Sergey Senozhatsky
2019-07-31  6:59                 ` Sergey Senozhatsky
2019-07-18 10:11   ` Sergey Senozhatsky
2019-07-16  7:28 ` [PATCH 2/2] printk/panic/x86: Allow to access printk log buffer after crash_smp_send_stop() Petr Mladek
2019-07-18 10:47   ` Sergey Senozhatsky
2019-07-18 11:07     ` Konstantin Khlebnikov
2019-07-18 11:29       ` Sergey Senozhatsky
2019-07-19 12:19         ` Petr Mladek
2019-07-18  9:59 ` [PATCH 0/2] panic/printk/x86: Prevent some more printk-related deadlocks in panic() Konstantin Khlebnikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190723031340.GA19463@jagdpanzerIV \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).