linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] printk: drop in_nmi check from printk_safe_flush_on_panic()
@ 2018-05-30  7:03 Sergey Senozhatsky
  2018-05-30  7:24 ` Petr Mladek
  2018-05-31 14:05 ` Petr Mladek
  0 siblings, 2 replies; 8+ messages in thread
From: Sergey Senozhatsky @ 2018-05-30  7:03 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt
  Cc: Peter Zijlstra, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Drop the in_nmi() check from printk_safe_flush_on_panic()
and attempt to re-init (IOW unlock) locked logbuf spinlock
from panic CPU regardless of its context. Otherwise,
theoretically, we can deadlock on logbuf trying to flush
per-CPU buffers:
a) Panic CPU is running in non-NMI context
b) Panic CPU sends out shutdown IPI via reboot vector
c) Panic CPU fails to stop all remote CPUs
d) Panic CPU sends out shutdown IPI via NMI vector
   One of the CPUs that we bring down via NMI vector can hold
   logbuf spin lock (theoretically).

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk_safe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..baa80de5d8ec 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -278,7 +278,7 @@ void printk_safe_flush_on_panic(void)
 	 * Make sure that we could access the main ring buffer.
 	 * Do not risk a double release when more CPUs are up.
 	 */
-	if (in_nmi() && raw_spin_is_locked(&logbuf_lock)) {
+	if (raw_spin_is_locked(&logbuf_lock)) {
 		if (num_online_cpus() > 1)
 			return;
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] printk: drop in_nmi check from printk_safe_flush_on_panic()
  2018-05-30  7:03 [PATCH] printk: drop in_nmi check from printk_safe_flush_on_panic() Sergey Senozhatsky
@ 2018-05-30  7:24 ` Petr Mladek
  2018-05-30  7:51   ` Sergey Senozhatsky
  2018-05-31 14:05 ` Petr Mladek
  1 sibling, 1 reply; 8+ messages in thread
From: Petr Mladek @ 2018-05-30  7:24 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Peter Zijlstra, linux-kernel, Sergey Senozhatsky

On Wed 2018-05-30 16:03:50, Sergey Senozhatsky wrote:
> Drop the in_nmi() check from printk_safe_flush_on_panic()
> and attempt to re-init (IOW unlock) locked logbuf spinlock
> from panic CPU regardless of its context. Otherwise,
> theoretically, we can deadlock on logbuf trying to flush
> per-CPU buffers:
> a) Panic CPU is running in non-NMI context
> b) Panic CPU sends out shutdown IPI via reboot vector
> c) Panic CPU fails to stop all remote CPUs
> d) Panic CPU sends out shutdown IPI via NMI vector
>    One of the CPUs that we bring down via NMI vector can hold
>    logbuf spin lock (theoretically).
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

Acked-by: Petr Mladek <pmladek@suse.com>

Just to be sure. IMHO, it is not worth nominating this patch for
stable. It is not a regression fix. I see it as a continuous
improving of the handling in various corner cases. And I see this
as a distant corner case.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] printk: drop in_nmi check from printk_safe_flush_on_panic()
  2018-05-30  7:24 ` Petr Mladek
@ 2018-05-30  7:51   ` Sergey Senozhatsky
  2018-05-30  8:48     ` Petr Mladek
  0 siblings, 1 reply; 8+ messages in thread
From: Sergey Senozhatsky @ 2018-05-30  7:51 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Peter Zijlstra, linux-kernel,
	Sergey Senozhatsky

On (05/30/18 09:24), Petr Mladek wrote:
> Acked-by: Petr Mladek <pmladek@suse.com>

Thanks.

> Just to be sure. IMHO, it is not worth nominating this patch for
> stable. It is not a regression fix. I see it as a continuous
> improving of the handling in various corner cases. And I see this
> as a distant corner case.

Yep, agreed.

***

A random thought [not suggesting anything]:

Given that we call printk() before SMP stop and that some of
smp_send_stop() call printk(), may be we can switch panic()
to printk_safe() mode and return it back to normal printk()
mode right before printk_safe_flush_on_panic(). So all possible
printk()-s that can happen in between (printk_safe_enter()
printk_safe_exit()) will not access the logbuf spin lock, yet
we still will try to flush all per-CPU buffers a bit later.

It probably doesn't sound like a very good/solid idea, just
wondering what will people say.

Very schematically,

---

diff --git a/kernel/panic.c b/kernel/panic.c
index 42e487488554..98a0493a59d3 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -148,6 +148,7 @@ void panic(const char *fmt, ...)
 	 * after setting panic_cpu) from invoking panic() again.
 	 */
 	local_irq_disable();
+	__printk_safe_enter();
 
 	/*
 	 * It's possible to come here directly from a panic-assertion and
@@ -217,6 +218,7 @@ void panic(const char *fmt, ...)
 	 */
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
+	__printk_safe_exit();
 	/* Call flush even twice. It tries harder with a single online CPU */
 	printk_safe_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] printk: drop in_nmi check from printk_safe_flush_on_panic()
  2018-05-30  7:51   ` Sergey Senozhatsky
@ 2018-05-30  8:48     ` Petr Mladek
  2018-05-30  9:55       ` Sergey Senozhatsky
  0 siblings, 1 reply; 8+ messages in thread
From: Petr Mladek @ 2018-05-30  8:48 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Peter Zijlstra, linux-kernel, Sergey Senozhatsky

On Wed 2018-05-30 16:51:05, Sergey Senozhatsky wrote:
> On (05/30/18 09:24), Petr Mladek wrote:
> > Acked-by: Petr Mladek <pmladek@suse.com>
> 
> Thanks.
> 
> > Just to be sure. IMHO, it is not worth nominating this patch for
> > stable. It is not a regression fix. I see it as a continuous
> > improving of the handling in various corner cases. And I see this
> > as a distant corner case.
> 
> Yep, agreed.
> 
> ***
> 
> A random thought [not suggesting anything]:
> 
> Given that we call printk() before SMP stop and that some of
> smp_send_stop() call printk(), may be we can switch panic()
> to printk_safe() mode and return it back to normal printk()
> mode right before printk_safe_flush_on_panic(). So all possible
> printk()-s that can happen in between (printk_safe_enter()
> printk_safe_exit()) will not access the logbuf spin lock, yet
> we still will try to flush all per-CPU buffers a bit later.
> 
> It probably doesn't sound like a very good/solid idea, just
> wondering what will people say.
> 
> Very schematically,
> 
> ---
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 42e487488554..98a0493a59d3 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -148,6 +148,7 @@ void panic(const char *fmt, ...)
>  	 * after setting panic_cpu) from invoking panic() again.
>  	 */
>  	local_irq_disable();
> +	__printk_safe_enter();

I understand why you came with it but I am against this change without
a proper research. This would redirect too valuable messages into
a buffer of a limited size and postpone flushing them to the consoles.

We would need to really carefully compare chances where this would
help and where it would make things worse. There is a high chance
that we could come with a better solution once we have the analyze.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] printk: drop in_nmi check from printk_safe_flush_on_panic()
  2018-05-30  8:48     ` Petr Mladek
@ 2018-05-30  9:55       ` Sergey Senozhatsky
  2018-05-30 10:00         ` Sergey Senozhatsky
  0 siblings, 1 reply; 8+ messages in thread
From: Sergey Senozhatsky @ 2018-05-30  9:55 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Peter Zijlstra, linux-kernel,
	Sergey Senozhatsky

On (05/30/18 10:48), Petr Mladek wrote:
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 42e487488554..98a0493a59d3 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -148,6 +148,7 @@ void panic(const char *fmt, ...)
> >  	 * after setting panic_cpu) from invoking panic() again.
> >  	 */
> >  	local_irq_disable();
> > +	__printk_safe_enter();
> 
> I understand why you came with it but I am against this change without
> a proper research. This would redirect too valuable messages into
> a buffer of a limited size and postpone flushing them to the consoles.
> 
> We would need to really carefully compare chances where this would
> help and where it would make things worse. There is a high chance
> that we could come with a better solution once we have the analyze.

I agree, sure.

The thing is, we, in fact, already invoke panic() in printk_safe mode.
Sometimes.

Namely,

  nmi_panic() -> panic()

is invoked while we are in printk_nmi(), so all printk()-s go
to the per-CPU buffers. So, at least to some extent, panic()
in printk_safe context is not something never seen before. Just
saying.

	-ss

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] printk: drop in_nmi check from printk_safe_flush_on_panic()
  2018-05-30  9:55       ` Sergey Senozhatsky
@ 2018-05-30 10:00         ` Sergey Senozhatsky
  2018-05-31 14:21           ` Petr Mladek
  0 siblings, 1 reply; 8+ messages in thread
From: Sergey Senozhatsky @ 2018-05-30 10:00 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Petr Mladek, Steven Rostedt, Peter Zijlstra, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

On (05/30/18 18:55), Sergey Senozhatsky wrote:
> > 
> > I understand why you came with it but I am against this change without
> > a proper research. This would redirect too valuable messages into
> > a buffer of a limited size and postpone flushing them to the consoles.
> > 
> > We would need to really carefully compare chances where this would
> > help and where it would make things worse. There is a high chance
> > that we could come with a better solution once we have the analyze.
> 
> I agree, sure.
> 
> The thing is, we, in fact, already invoke panic() in printk_safe mode.
> Sometimes.
> 
> Namely,
> 
>   nmi_panic() -> panic()
> 
> is invoked while we are in printk_nmi(), so all printk()-s go
> to the per-CPU buffers. So, at least to some extent, panic()
> in printk_safe context is not something never seen before. Just
> saying.

Well, we have a PRINTK_NMI_DEFERRED_CONTEXT_MASK mode for
printk_nmi(). May be we can [if need be] come up with the same trick
for printk_safe_panic() mode. If logbuf spin_lock is unlocked, then
we use the main logbuf, if it is locked, we redirect printk to per-CPU
buffers and then flush it via printk_safe_flush_on_panic(), which will
re-init (unlock) the logbuf.

	-ss

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] printk: drop in_nmi check from printk_safe_flush_on_panic()
  2018-05-30  7:03 [PATCH] printk: drop in_nmi check from printk_safe_flush_on_panic() Sergey Senozhatsky
  2018-05-30  7:24 ` Petr Mladek
@ 2018-05-31 14:05 ` Petr Mladek
  1 sibling, 0 replies; 8+ messages in thread
From: Petr Mladek @ 2018-05-31 14:05 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Peter Zijlstra, linux-kernel, Sergey Senozhatsky

On Wed 2018-05-30 16:03:50, Sergey Senozhatsky wrote:
> Drop the in_nmi() check from printk_safe_flush_on_panic()
> and attempt to re-init (IOW unlock) locked logbuf spinlock
> from panic CPU regardless of its context. Otherwise,
> theoretically, we can deadlock on logbuf trying to flush
> per-CPU buffers:
> a) Panic CPU is running in non-NMI context
> b) Panic CPU sends out shutdown IPI via reboot vector
> c) Panic CPU fails to stop all remote CPUs
> d) Panic CPU sends out shutdown IPI via NMI vector
>    One of the CPUs that we bring down via NMI vector can hold
>    logbuf spin lock (theoretically).
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

I have pushed this into printk.git, branch for-4.18.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] printk: drop in_nmi check from printk_safe_flush_on_panic()
  2018-05-30 10:00         ` Sergey Senozhatsky
@ 2018-05-31 14:21           ` Petr Mladek
  0 siblings, 0 replies; 8+ messages in thread
From: Petr Mladek @ 2018-05-31 14:21 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Peter Zijlstra, linux-kernel, Sergey Senozhatsky

On Wed 2018-05-30 19:00:37, Sergey Senozhatsky wrote:
> On (05/30/18 18:55), Sergey Senozhatsky wrote:
> > The thing is, we, in fact, already invoke panic() in printk_safe mode.
> > Sometimes.
> > 
> > Namely,
> > 
> >   nmi_panic() -> panic()
> > 
> > is invoked while we are in printk_nmi(), so all printk()-s go
> > to the per-CPU buffers. So, at least to some extent, panic()
> > in printk_safe context is not something never seen before. Just
> > saying.
> 
> Well, we have a PRINTK_NMI_DEFERRED_CONTEXT_MASK mode for
> printk_nmi(). May be we can [if need be] come up with the same trick
> for printk_safe_panic() mode. If logbuf spin_lock is unlocked, then
> we use the main logbuf, if it is locked, we redirect printk to per-CPU
> buffers and then flush it via printk_safe_flush_on_panic(), which will
> re-init (unlock) the logbuf.

All these checks are racy. Now, I believe that it might really prevent
a deadlock in some situations but it might also cause loosing messages
in other situations (never flushed buffer). I am sorry but I am still
unable to decide if it is worth the risk.

I would want to keep it as is until anyone comes with a more detailed
analyze or until we get some bug reports.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-05-31 14:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-30  7:03 [PATCH] printk: drop in_nmi check from printk_safe_flush_on_panic() Sergey Senozhatsky
2018-05-30  7:24 ` Petr Mladek
2018-05-30  7:51   ` Sergey Senozhatsky
2018-05-30  8:48     ` Petr Mladek
2018-05-30  9:55       ` Sergey Senozhatsky
2018-05-30 10:00         ` Sergey Senozhatsky
2018-05-31 14:21           ` Petr Mladek
2018-05-31 14:05 ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).