All of lore.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <peterz@infradead.org>,
	x86@kernel.org, John Ogness <john.ogness@linutronix.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Petr Tesarik <ptesarik@suse.cz>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] printk/panic/x86: Allow to access printk log buffer after crash_smp_send_stop()
Date: Fri, 19 Jul 2019 14:19:07 +0200	[thread overview]
Message-ID: <20190719121907.4hrfp4dxdhtjygbi@pathway.suse.cz> (raw)
In-Reply-To: <20190718112954.GA1774@jagdpanzerIV>

On Thu 2019-07-18 20:29:54, Sergey Senozhatsky wrote:
> On (07/18/19 14:07), Konstantin Khlebnikov wrote:
> > > Let me test the waters. Criticize the following idea:
> > >
> > > Can we, sort of, disconnect "supposed to be dead" CPUs from printk()
> > > so then we can unconditionally re-init printk() from panic-CPU?
> > >
> > > We have per-CPU printk_state; so panic-CPU can set, let's say,
> > > DEAD_CPUS_TELL_NO_TALES bit on all CPUs but self, and vprintk_func()
> > > will do nothing if DEAD_CPUS_TELL_NO_TALES bit set on particular
> > > CPU. Foreign CPUs are not even supposed to be alive, and smp_send_stop()
> > > waits for IPI acks from secondary CPUs long enough on average (need
> > > to check that) so if one of the CPUs is misbehaving and doesn't want
> > > to die (geez...) we will just "disconnect" it from printk() to minimize
> > > possible logbuf/console drivers interventions and then proceed with
> > > panic; assuming that misbehaving CPUs are actually up to something
> > > sane. Sometimes, you know, in some cases, those CPUs are already dead:
> > > either accidentally powered off, or went completely nuts and do nothing,
> > > etc. etc. but we still can kdump() and console_flush_on_panic().
> > 
> > Good idea.
> > Panic-CPU could just increment state to reroute printk into 'safe'
> > per-cpu buffer.
> 
> Yeah, that's better.
> 
> So we can do something like this
> 
> @@ -269,15 +269,21 @@ void printk_safe_flush_on_panic(void)
>  	 * Make sure that we could access the main ring buffer.
>  	 * Do not risk a double release when more CPUs are up.
>  	 */
> -	if (raw_spin_is_locked(&logbuf_lock)) {
> -		if (num_online_cpus() > 1)
> -			return;
> +	debug_locks_off();
> +	raw_spin_lock_init(&logbuf_lock);

Hmm, the check for num_online_cpus() is there to avoid deadlock
caused be double unlock. The unconditional init would bring it back.

It is about what compromise we use. We either try to get the messages
out, init the lock and risk a deadlock. Or we do not risk the
deadlock.

The current code is not consistent. printk_safe_flush_on_panic()
is conservative and does not risk the deadlock. While kmsg_dump(),
console_unblank() and console_flush_on_panic() risk the deadlock.

The 1st patch tries to make it more consistent. It makes all
the above functions as conservative as printk_safe_flush_on_panic()
regarding logbuf_lock. While they still risk a deadlock with
console-specific locks.

The reasoning might be:

  + The code under logbuf_lock mostly just manipulates strings. There
    are no nested locks. Infinite loops would most likely happen also
    during normal operation. By other words, it is not easy to stay
    locked in logbuf_lock (he, he, the last famous words).

  + On the other hand, a lot of code is done with disabled interrupts.
    It is easier to imagine that a CPU could not get stopped by normal
    interrupt because some deadlock or infinite loop in interrupt
    context. It might even be a code calling console_unlock().
    Re-initializing logbuf_lock might be unnecessary risk.

  + We are probably more relaxed with console specific locks because
    most of them are ignored in panic after bust_spinlocks().


Anyway, this patchset is primary about logbuf_lock related deadlocks.
I needed a fix for our customers. I do not want to spend too much
time on upstreaming. IMHO, it is better to invest into the lockless
ring buffer.

If we could agree that the patches makes things better (more
consistent, more safe in kdump with enabled notifiers) and
the complexity is acceptable. Then let's accept them
(with some trivial improvements).

If they are too controversial, risky, or complex then
let's abandon them and concentrate on lockless ring buffer.

Best Regards,
Petr

  reply	other threads:[~2019-07-19 12:19 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-16  7:28 [PATCH 0/2] panic/printk/x86: Prevent some more printk-related deadlocks in panic() Petr Mladek
2019-07-16  7:28 ` [PATCH 1/2] printk/panic: Access the main printk log in panic() only when safe Petr Mladek
2019-07-17  9:56   ` Sergey Senozhatsky
2019-07-18  8:36     ` Petr Mladek
2019-07-18  9:49       ` Sergey Senozhatsky
2019-07-19 12:57         ` Petr Mladek
2019-07-23  3:13           ` Sergey Senozhatsky
2019-07-24 12:27             ` Petr Mladek
2019-07-31  6:08               ` Sergey Senozhatsky
2019-07-31  6:59                 ` Sergey Senozhatsky
2019-07-18 10:11   ` Sergey Senozhatsky
2019-07-16  7:28 ` [PATCH 2/2] printk/panic/x86: Allow to access printk log buffer after crash_smp_send_stop() Petr Mladek
2019-07-18 10:47   ` Sergey Senozhatsky
2019-07-18 11:07     ` Konstantin Khlebnikov
2019-07-18 11:29       ` Sergey Senozhatsky
2019-07-19 12:19         ` Petr Mladek [this message]
2019-07-18  9:59 ` [PATCH 0/2] panic/printk/x86: Prevent some more printk-related deadlocks in panic() Konstantin Khlebnikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190719121907.4hrfp4dxdhtjygbi@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=john.ogness@linutronix.de \
    --cc=koct9i@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=ptesarik@suse.cz \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.