linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <peterz@infradead.org>,
	x86@kernel.org, John Ogness <john.ogness@linutronix.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Petr Tesarik <ptesarik@suse.cz>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] printk/panic: Access the main printk log in panic() only when safe
Date: Wed, 24 Jul 2019 14:27:11 +0200	[thread overview]
Message-ID: <20190724122711.qquevkcuge24bhdd@pathway.suse.cz> (raw)
In-Reply-To: <20190723031340.GA19463@jagdpanzerIV>

On Tue 2019-07-23 12:13:40, Sergey Senozhatsky wrote:
> On (07/19/19 14:57), Petr Mladek wrote:
> > But there is one more scenario 3:
> 
> Yes!
> 
> >   - we have CPUB which loops or is deadlocked in IRQ context
> > 
> >   - we have CPUA which panic()-s the system, but can't bring CPUB down,
> >     so logbuf_lock might be takes and release from time to time
> >     by CPUB
> 
> Great!
> 
> This is the only case when we actually need to pay attention to
> num_online_cpus(), because there is an active logbuf_lock owner;
> in any other case we can unconditionally re-init printk() locks.
> 
> But there is more to it.
> 
> Note, that the problem in scenario 3 is bigger than just logbuf_lock.
> Regardless of logbuf implementation we will not be able to panic()
> the system.
> 
> If we have a never ending source of printk() messages, coming from
> misbehaving CPU which stuck in printing loop in IRQ context, then
> flush_on_panic() will never end or kmsg dump will never stop, etc.

Yes.

> We need to cut off misbehaving CPUs. Panic CPU waits (for up to 1
> second?) in smp_send_stop() for secondary CPUs to die, if some
> secondary CPUs are still chatty after that then most likely those
> CPUs don't have anything good to say, just a pointless flood of same
> messages over and over again; which, however, will not let panic
> CPU to proceed.

Makes sense.

> And this is where the idea of "disconnecting" those CPUs from main
> logbuf come from.
> 
> So what we can do:
> - smp_send_stop()
> - disconnect all-but-self from logbuf (via printk-safe)

printk_safe is not really necessary. As you wrote, nobody
is interested into messages from CPUs that are supposed
to be stopped.

It might be enough to set some global variable, for example,
with the CPU number that is calling panic() and is the only
one allowed to print messages from this point on.

It might even be used to force console lock owner to leave
the cycle immediately.

> - wait for 1 or 2 more extra seconds for secondary CPUs to leave
>   console_unlock() and to redirect printks to per-CPU buffers
> - after that we are sort of good-to-go: re-init printk locks
>   and do kmsg_dump, flush_on_panic().

Sounds good.

> Note, misbehaving CPUs will write to per-CPU buffers, they are not
> expected to be able to flush per-CPU buffers to the main logbuf. That
> will require enabled IRQs, which should deliver stop IPI. But we can
> do even more - just disable print_safe irq work on disconnect CPUs.
>
> So, shall we try one more time with the "disconnect" misbehaving CPUs
> approach? I can send an RFC patch.

IMHO, it will be acceptable only when it is reasonably simple and
straightforward. The panic() code is full of special hacks and
it is already complicated to follow all the twists.

Especially because this scenario came up from a theoretical
discussion. Note that my original, real bug report, was
with logbuf_lock NMI, enabled kdump notifiers, ...

Best Regards,
Petr

  reply	other threads:[~2019-07-24 12:27 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-16  7:28 [PATCH 0/2] panic/printk/x86: Prevent some more printk-related deadlocks in panic() Petr Mladek
2019-07-16  7:28 ` [PATCH 1/2] printk/panic: Access the main printk log in panic() only when safe Petr Mladek
2019-07-17  9:56   ` Sergey Senozhatsky
2019-07-18  8:36     ` Petr Mladek
2019-07-18  9:49       ` Sergey Senozhatsky
2019-07-19 12:57         ` Petr Mladek
2019-07-23  3:13           ` Sergey Senozhatsky
2019-07-24 12:27             ` Petr Mladek [this message]
2019-07-31  6:08               ` Sergey Senozhatsky
2019-07-31  6:59                 ` Sergey Senozhatsky
2019-07-18 10:11   ` Sergey Senozhatsky
2019-07-16  7:28 ` [PATCH 2/2] printk/panic/x86: Allow to access printk log buffer after crash_smp_send_stop() Petr Mladek
2019-07-18 10:47   ` Sergey Senozhatsky
2019-07-18 11:07     ` Konstantin Khlebnikov
2019-07-18 11:29       ` Sergey Senozhatsky
2019-07-19 12:19         ` Petr Mladek
2019-07-18  9:59 ` [PATCH 0/2] panic/printk/x86: Prevent some more printk-related deadlocks in panic() Konstantin Khlebnikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190724122711.qquevkcuge24bhdd@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=john.ogness@linutronix.de \
    --cc=koct9i@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=ptesarik@suse.cz \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).