All of lore.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <peterz@infradead.org>,
	x86@kernel.org, John Ogness <john.ogness@linutronix.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Petr Tesarik <ptesarik@suse.cz>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] printk/panic: Access the main printk log in panic() only when safe
Date: Fri, 19 Jul 2019 14:57:53 +0200	[thread overview]
Message-ID: <20190719125753.miniwfq4nhicy76n@pathway.suse.cz> (raw)
In-Reply-To: <20190718094934.GA10041@jagdpanzerIV>

On Thu 2019-07-18 18:49:34, Sergey Senozhatsky wrote:
> On (07/18/19 10:36), Petr Mladek wrote:
> > On Wed 2019-07-17 18:56:15, Sergey Senozhatsky wrote:
> > > On (07/16/19 09:28), Petr Mladek wrote:
> > > > Kernel tries hard to store and show printk messages when panicking. Even
> > > > logbuf_lock gets re-initialized when only one CPU is running after
> > > > smp_send_stop().
> > > > 
> > > > Unfortunately, smp_send_stop() might fail on architectures that do not
> > > > use NMI as a fallback. Then printk log buffer might stay locked and
> > > > a deadlock is almost inevitable.
> > > 
> > > I'd say that deadlock is still almost inevitable.
> > > 
> > > panic-CPU syncs with the printing-CPU before it attempts to SMP_STOP.
> > > If there is an active printing-CPU, which is looping in console_unlock(),
> > > taking logbuf_lock in order to msg_print_text() and stuff, then panic-CPU
> > > will spin on console_owner waiting for that printing-CPU to handover
> > > printing duties.
> > > 
> > > 	pr_emerg("Kernel panic - not syncing");
> > > 	smp_send_stop();
> > 
> > Good point. I forgot the handover logic. Well, it is enabled only
> > around call_console_drivers(). Therefore it is not under
> > lockbuf_lock.
> > 
> > I had in mind some infinite loop or deadlock in vprintk_store().
> > There was at least one long time ago (warning triggered
> > by leap second).
> > 
> > 
> > > If printing-CPU goes nuts under logbuf_lock, has corrupted IDT or anything
> > > else, then we will not progress with panic(). panic-CPU will deadlock. If
> > > not on
> > > 	pr_emerg("Kernel panic - not syncing")
> > > 
> > > then on another pr_emerg(), right before the NMI-fallback.
> > 
> > Nested printk() should not be problem thanks to printk_safe.
> 
> Where do nested printk()-s come from? Which one of the following
> scenarios you cover in commit message:
> 
> scenario 1
> 
> - we have CPUB which holds logbuf_lock
> - we have CPUA which panic()-s the system, but can't bring CPUB down,
>   so logbuf_lock stays locked on remote CPU

No, this scenario is not affected by this patch. It would always lead to
a deadlock.

> scenario 2
> 
> - we have CPUA which holds logbuf_lock
> - we have panic() on CPUA, but it cannot bring down some other CPUB
>   so logbuf_lock stays locked on local CPU, and it cannot re-init
>   logbuf.

This scenario should get better handled by this patch. The difference
will be when smp_send_stop() is not able to stop all CPUs:

  + Before:
      + printk_safe_flush_on_panic() will keep logbuf_lock locked
	and do nothing.

      + kmsg_dump(), console_unblank(), or console_flush_on_panic()
	will deadlock when they try to get logbuf_lock(). They will
	not be able to process any single line.

  + After:
      + printk_bust_lock_safe() will keep logbuf_lock locked

      + All functions using logbuf_lock will not get called.
	We will not see the messages (as previously) but the
	system will not deadlock.


But there is one more scenario 3:

  - we have CPUB which loops or is deadlocked in IRQ context

  - we have CPUA which panic()-s the system, but can't bring CPUB down,
    so logbuf_lock might be takes and release from time to time
    by CPUB

Hmm, this scenario might be handled a bit _worse_ by this patch:

  + Before:
      + printk_safe_flush_on_panic() will not touch logbuf_lock
	The messages will get flushed according to the state of
	logbuf_lock at the moment when it is being checked.

      + kmsg_dump(), console_unblank(), or console_flush_on_panic()
	will be able to do their job.

  + After:
      +  printk_safe_flush_on_panic(), kmsg_dump(), console_unblank(),
	 and console_flush_on_panic() could finish the job. But they
	 will get called _only_ when logbuf_lock is released at
	 the moment when it is being checked by printk_bust_lock_safe().


Resume:

From my POV, the 3rd scenario is the most likely one. Therefore this
patch would make more bad than good.

It might be possible to somehow detect if lockbuf_lock is released
from time the time on the non-stopped CPU. But it would be hairy.
IMHO, it is not worth it.

Thanks a lot for helping me to sort the ideas. I suggest to forget
this patch and work on lockless ringbuffer.

Best Regards,
Petr

  reply	other threads:[~2019-07-19 12:57 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-16  7:28 [PATCH 0/2] panic/printk/x86: Prevent some more printk-related deadlocks in panic() Petr Mladek
2019-07-16  7:28 ` [PATCH 1/2] printk/panic: Access the main printk log in panic() only when safe Petr Mladek
2019-07-17  9:56   ` Sergey Senozhatsky
2019-07-18  8:36     ` Petr Mladek
2019-07-18  9:49       ` Sergey Senozhatsky
2019-07-19 12:57         ` Petr Mladek [this message]
2019-07-23  3:13           ` Sergey Senozhatsky
2019-07-24 12:27             ` Petr Mladek
2019-07-31  6:08               ` Sergey Senozhatsky
2019-07-31  6:59                 ` Sergey Senozhatsky
2019-07-18 10:11   ` Sergey Senozhatsky
2019-07-16  7:28 ` [PATCH 2/2] printk/panic/x86: Allow to access printk log buffer after crash_smp_send_stop() Petr Mladek
2019-07-18 10:47   ` Sergey Senozhatsky
2019-07-18 11:07     ` Konstantin Khlebnikov
2019-07-18 11:29       ` Sergey Senozhatsky
2019-07-19 12:19         ` Petr Mladek
2019-07-18  9:59 ` [PATCH 0/2] panic/printk/x86: Prevent some more printk-related deadlocks in panic() Konstantin Khlebnikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190719125753.miniwfq4nhicy76n@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=john.ogness@linutronix.de \
    --cc=koct9i@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=ptesarik@suse.cz \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.