All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] printk/nmi: restore printk_func in nmi_panic
@ 2016-02-26  3:37 Sergey Senozhatsky
  2016-02-26 14:57 ` Petr Mladek
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2016-02-26  3:37 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Jan Kara, Peter Zijlstra, Steven Rostedt,
	Ingo Molnar, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

When watchdog detects a hardlockup and calls nmi_panic() `printk_func'
must be restored via printk_nmi_exit() call, so panic() will be able
to flush nmi buf and show backtrace and panic message. We also better
explicitly ask nmi to printk_nmi_flush() in console_flush_on_panic(),
because it may be too late to rely on irq work.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 include/linux/kernel.h | 6 ++++--
 kernel/printk/printk.c | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index f4fa2b2..3ee33d5 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -469,10 +469,12 @@ do {									\
 	cpu = raw_smp_processor_id();					\
 	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, cpu);	\
 									\
-	if (old_cpu == PANIC_CPU_INVALID)				\
+	if (old_cpu == PANIC_CPU_INVALID) {				\
+		printk_nmi_exit();					\
 		panic(fmt, ##__VA_ARGS__);				\
-	else if (old_cpu != cpu)					\
+	} else if (old_cpu != cpu) {					\
 		nmi_panic_self_stop(regs);				\
+	}								\
 } while (0)
 
 /*
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9917f69..13a4022 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2395,6 +2395,7 @@ void console_flush_on_panic(void)
 	 */
 	console_trylock();
 	console_may_schedule = 0;
+	printk_nmi_flush();
 	console_unlock();
 }
 
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/nmi: restore printk_func in nmi_panic
  2016-02-26  3:37 [PATCH] printk/nmi: restore printk_func in nmi_panic Sergey Senozhatsky
@ 2016-02-26 14:57 ` Petr Mladek
  2016-02-27  2:19   ` Sergey Senozhatsky
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Mladek @ 2016-02-26 14:57 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Jan Kara, Peter Zijlstra, Steven Rostedt,
	Ingo Molnar, linux-kernel, Sergey Senozhatsky

On Fri 2016-02-26 12:37:20, Sergey Senozhatsky wrote:
> When watchdog detects a hardlockup and calls nmi_panic() `printk_func'
> must be restored via printk_nmi_exit() call, so panic() will be able
> to flush nmi buf and show backtrace and panic message. We also better
> explicitly ask nmi to printk_nmi_flush() in console_flush_on_panic(),
> because it may be too late to rely on irq work.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  include/linux/kernel.h | 6 ++++--
>  kernel/printk/printk.c | 1 +
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index f4fa2b2..3ee33d5 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -469,10 +469,12 @@ do {									\
>  	cpu = raw_smp_processor_id();					\
>  	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, cpu);	\
>  									\
> -	if (old_cpu == PANIC_CPU_INVALID)				\
> +	if (old_cpu == PANIC_CPU_INVALID) {				\
> +		printk_nmi_exit();					\

This might end up in a deadlock that printk_nmi() wanted to avoid.

I think about a compromise. We should try to get the messages
out only when kdump is not enabled. If the kdump is enabled
we should avoid the risk a the deadlock here and let people
to find the messages in the dump.

I am going to play with this a bit.

Best Regards,
Petr

PS: I am sorry for not responding much about the printk problems
this week. I have attended a training and did not have much time
for a real work.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/nmi: restore printk_func in nmi_panic
  2016-02-26 14:57 ` Petr Mladek
@ 2016-02-27  2:19   ` Sergey Senozhatsky
  2016-02-27  3:09     ` Sergey Senozhatsky
  2016-02-29 10:31     ` Petr Mladek
  0 siblings, 2 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2016-02-27  2:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Andrew Morton, Jan Kara, Peter Zijlstra,
	Steven Rostedt, Ingo Molnar, linux-kernel, Sergey Senozhatsky

Hello Petr,

On (02/26/16 15:57), Petr Mladek wrote:
> On Fri 2016-02-26 12:37:20, Sergey Senozhatsky wrote:
> > When watchdog detects a hardlockup and calls nmi_panic() `printk_func'
> > must be restored via printk_nmi_exit() call, so panic() will be able
> > to flush nmi buf and show backtrace and panic message. We also better
> > explicitly ask nmi to printk_nmi_flush() in console_flush_on_panic(),
> > because it may be too late to rely on irq work.
> > 
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > ---
> >  include/linux/kernel.h | 6 ++++--
> >  kernel/printk/printk.c | 1 +
> >  2 files changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index f4fa2b2..3ee33d5 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -469,10 +469,12 @@ do {									\
> >  	cpu = raw_smp_processor_id();					\
> >  	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, cpu);	\
> >  									\
> > -	if (old_cpu == PANIC_CPU_INVALID)				\
> > +	if (old_cpu == PANIC_CPU_INVALID) {				\
> > +		printk_nmi_exit();					\
> 
> This might end up in a deadlock that printk_nmi() wanted to avoid.

aha, I see.

> I think about a compromise. We should try to get the messages
> out only when kdump is not enabled.

can we zap_locks() if we are on nmi_panic()->panic()->console_flush_on_panic() path?
console_flush_on_panic() is happening after we send out smp_send_stop().


> PS: I am sorry for not responding much about the printk problems
> this week. I have attended a training and did not have much time
> for a real work.

no prob.

	-ss

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/nmi: restore printk_func in nmi_panic
  2016-02-27  2:19   ` Sergey Senozhatsky
@ 2016-02-27  3:09     ` Sergey Senozhatsky
  2016-02-27  3:33       ` Sergey Senozhatsky
  2016-02-29 10:31     ` Petr Mladek
  1 sibling, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2016-02-27  3:09 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Andrew Morton, Jan Kara, Peter Zijlstra,
	Steven Rostedt, Ingo Molnar, linux-kernel, Sergey Senozhatsky

On (02/27/16 11:19), Sergey Senozhatsky wrote:
[..]
> > I think about a compromise. We should try to get the messages
> > out only when kdump is not enabled.
> 
> can we zap_locks() if we are on nmi_panic()->panic()->console_flush_on_panic() path?
> console_flush_on_panic() is happening after we send out smp_send_stop().

can something like this do the trick?

====8<====

>From 4186873bb4574b4bbb227e7448d56599849de0bd Mon Sep 17 00:00:00 2001
From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: [PATCH] printk/nmi: restore printk_func in nmi_panic

When watchdog detects a hardlockup and calls nmi_panic() `printk_func'
must be restored via printk_nmi_exit() call, so panic() will be able
to flush nmi buf and show backtrace and panic message. We also better
explicitly ask nmi to printk_nmi_flush() in console_flush_on_panic(),
because it may be too late to rely on irq work.

Factor out __zap_locks(), and call it from console_flush_on_panic().
This is normally not needed, because logbuf_lock always comes with
IRQ disable/enable magic, but we can panic() from nmi.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 include/linux/kernel.h |  6 ++++--
 kernel/printk/printk.c | 27 ++++++++++++++++-----------
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index f4fa2b2..3ee33d5 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -469,10 +469,12 @@ do {									\
 	cpu = raw_smp_processor_id();					\
 	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, cpu);	\
 									\
-	if (old_cpu == PANIC_CPU_INVALID)				\
+	if (old_cpu == PANIC_CPU_INVALID) {				\
+		printk_nmi_exit();					\
 		panic(fmt, ##__VA_ARGS__);				\
-	else if (old_cpu != cpu)					\
+	} else if (old_cpu != cpu) {					\
 		nmi_panic_self_stop(regs);				\
+	}								\
 } while (0)
 
 /*
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9917f69..0a318ed 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1462,6 +1462,15 @@ static void call_console_drivers(int level,
 	}
 }
 
+static void __zap_locks(void)
+{
+	debug_locks_off();
+	/* If a crash is occurring, make sure we can't deadlock */
+	raw_spin_lock_init(&logbuf_lock);
+	/* And make sure that we print immediately */
+	sema_init(&console_sem, 1);
+}
+
 /*
  * Zap console related locks when oopsing.
  * To leave time for slow consoles to print a full oops,
@@ -1477,11 +1486,7 @@ static void zap_locks(void)
 
 	oops_timestamp = jiffies;
 
-	debug_locks_off();
-	/* If a crash is occurring, make sure we can't deadlock */
-	raw_spin_lock_init(&logbuf_lock);
-	/* And make sure that we print immediately */
-	sema_init(&console_sem, 1);
+	__zap_locks();
 }
 
 int printk_delay_msec __read_mostly;
@@ -2386,15 +2391,15 @@ void console_unblank(void)
  */
 void console_flush_on_panic(void)
 {
-	/*
-	 * If someone else is holding the console lock, trylock will fail
-	 * and may_schedule may be set.  Ignore and proceed to unlock so
-	 * that messages are flushed out.  As this can be called from any
-	 * context and we don't want to get preempted while flushing,
-	 * ensure may_schedule is cleared.
+	__zap_locks();
+
+	/* As this can be called from any context and we don't want
+	 * to get preempted while flushing, ensure may_schedule is
+	 * cleared.
 	 */
 	console_trylock();
 	console_may_schedule = 0;
+	printk_nmi_flush();
 	console_unlock();
 }
 
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/nmi: restore printk_func in nmi_panic
  2016-02-27  3:09     ` Sergey Senozhatsky
@ 2016-02-27  3:33       ` Sergey Senozhatsky
  2016-02-28  3:52         ` Sergey Senozhatsky
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2016-02-27  3:33 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, Andrew Morton, Jan Kara,
	Peter Zijlstra, Steven Rostedt, Ingo Molnar, linux-kernel

On (02/27/16 12:09), Sergey Senozhatsky wrote:
> On (02/27/16 11:19), Sergey Senozhatsky wrote:
> [..]
> > > I think about a compromise. We should try to get the messages
> > > out only when kdump is not enabled.
> > 
> > can we zap_locks() if we are on nmi_panic()->panic()->console_flush_on_panic() path?
> > console_flush_on_panic() is happening after we send out smp_send_stop().
> 
> can something like this do the trick?

hm, no. it can't.

I forgot to move printk_nmi_exit() from nmi_panic() to panic(). so
it should have been:

	panic()
		...
		printk_nmi_exit()
		console_flush_on_panic()
			__zap_locks()
			printk_nmi_flush()
			console_unlock()

but this __zap_locks() can _in theory_ race with irq_work->printk_nmi_flush().
so we need something more than this...

	-ss

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/nmi: restore printk_func in nmi_panic
  2016-02-27  3:33       ` Sergey Senozhatsky
@ 2016-02-28  3:52         ` Sergey Senozhatsky
  0 siblings, 0 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2016-02-28  3:52 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Jan Kara, Peter Zijlstra, Steven Rostedt,
	Ingo Molnar, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

Petr, what do you think about this?

1) __zap_locks() in console_flush_on_panic()
2) add printk_nmi_flush_on_panic() -- disable irq work, exit nmi, flush nmi


---
 include/linux/printk.h |  2 ++
 kernel/printk/nmi.c    | 28 +++++++++++++++++++---------
 kernel/printk/printk.c | 27 ++++++++++++++++-----------
 3 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 51dd6b8..19d52d4 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -127,11 +127,13 @@ extern void printk_nmi_init(void);
 extern void printk_nmi_enter(void);
 extern void printk_nmi_exit(void);
 extern void printk_nmi_flush(void);
+extern void printk_nmi_flush_on_panic(void);
 #else
 static inline void printk_nmi_init(void) { }
 static inline void printk_nmi_enter(void) { }
 static inline void printk_nmi_exit(void) { }
 static inline void printk_nmi_flush(void) { }
+extern void printk_nmi_flush_on_panic(void) { }
 #endif /* PRINTK_NMI */
 
 #ifdef CONFIG_PRINTK
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index 98e9e80..c3fde8a 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -52,6 +52,15 @@ struct nmi_seq_buf {
 static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
 
 /*
+ * The lock has two functions. First, one reader has to flush all
+ * available message to make the lockless synchronization with
+ * writers easier. Second, we do not want to mix messages from
+ * different CPUs. This is especially important when printing
+ * a backtrace.
+ */
+static DEFINE_RAW_SPINLOCK(read_lock);
+
+/*
  * Safe printk() for NMI context. It uses a per-CPU buffer to
  * store the message. NMIs are not nested, so there is always only
  * one writer running. But the buffer might get flushed from another
@@ -115,19 +124,10 @@ static void print_nmi_seq_line(struct nmi_seq_buf *s, int start, int end)
  */
 static void __printk_nmi_flush(struct irq_work *work)
 {
-	static raw_spinlock_t read_lock =
-		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
 	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
 	size_t len, size;
 	int i, last_i;
 
-	/*
-	 * The lock has two functions. First, one reader has to flush all
-	 * available message to make the lockless synchronization with
-	 * writers easier. Second, we do not want to mix messages from
-	 * different CPUs. This is especially important when printing
-	 * a backtrace.
-	 */
 	raw_spin_lock(&read_lock);
 
 	i = 0;
@@ -220,3 +220,13 @@ void printk_nmi_exit(void)
 {
 	this_cpu_write(printk_func, vprintk_default);
 }
+
+void printk_nmi_flush_on_panic(void)
+{
+	raw_spin_lock(&read_lock);
+	printk_nmi_exit();
+	printk_nmi_irq_ready = 0;
+	raw_spin_unlock(&read_lock);
+
+	printk_nmi_flush();
+}
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9917f69..fc5e6d4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1462,6 +1462,15 @@ static void call_console_drivers(int level,
 	}
 }
 
+static void __zap_locks(void)
+{
+	debug_locks_off();
+	/* If a crash is occurring, make sure we can't deadlock */
+	raw_spin_lock_init(&logbuf_lock);
+	/* And make sure that we print immediately */
+	sema_init(&console_sem, 1);
+}
+
 /*
  * Zap console related locks when oopsing.
  * To leave time for slow consoles to print a full oops,
@@ -1477,11 +1486,7 @@ static void zap_locks(void)
 
 	oops_timestamp = jiffies;
 
-	debug_locks_off();
-	/* If a crash is occurring, make sure we can't deadlock */
-	raw_spin_lock_init(&logbuf_lock);
-	/* And make sure that we print immediately */
-	sema_init(&console_sem, 1);
+	__zap_locks();
 }
 
 int printk_delay_msec __read_mostly;
@@ -2386,15 +2391,15 @@ void console_unblank(void)
  */
 void console_flush_on_panic(void)
 {
-	/*
-	 * If someone else is holding the console lock, trylock will fail
-	 * and may_schedule may be set.  Ignore and proceed to unlock so
-	 * that messages are flushed out.  As this can be called from any
-	 * context and we don't want to get preempted while flushing,
-	 * ensure may_schedule is cleared.
+	__zap_locks();
+
+	/* As this can be called from any context and we don't want
+	 * to get preempted while flushing, ensure may_schedule is
+	 * cleared.
 	 */
 	console_trylock();
 	console_may_schedule = 0;
+	printk_nmi_flush_on_panic();
 	console_unlock();
 }
 
-- 
2.7.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/nmi: restore printk_func in nmi_panic
  2016-02-27  2:19   ` Sergey Senozhatsky
  2016-02-27  3:09     ` Sergey Senozhatsky
@ 2016-02-29 10:31     ` Petr Mladek
  2016-02-29 11:19       ` Sergey Senozhatsky
  1 sibling, 1 reply; 11+ messages in thread
From: Petr Mladek @ 2016-02-29 10:31 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, Jan Kara, Peter Zijlstra,
	Steven Rostedt, Ingo Molnar, linux-kernel

On Sat 2016-02-27 11:19:44, Sergey Senozhatsky wrote:
> Hello Petr,
> 
> On (02/26/16 15:57), Petr Mladek wrote:
> > On Fri 2016-02-26 12:37:20, Sergey Senozhatsky wrote:
> > > When watchdog detects a hardlockup and calls nmi_panic() `printk_func'
> > > must be restored via printk_nmi_exit() call, so panic() will be able
> > > to flush nmi buf and show backtrace and panic message. We also better
> > > explicitly ask nmi to printk_nmi_flush() in console_flush_on_panic(),
> > > because it may be too late to rely on irq work.
> > > 
> > > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > > ---
> > >  include/linux/kernel.h | 6 ++++--
> > >  kernel/printk/printk.c | 1 +
> > >  2 files changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > > index f4fa2b2..3ee33d5 100644
> > > --- a/include/linux/kernel.h
> > > +++ b/include/linux/kernel.h
> > > @@ -469,10 +469,12 @@ do {									\
> > >  	cpu = raw_smp_processor_id();					\
> > >  	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, cpu);	\
> > >  									\
> > > -	if (old_cpu == PANIC_CPU_INVALID)				\
> > > +	if (old_cpu == PANIC_CPU_INVALID) {				\
> > > +		printk_nmi_exit();					\
> > 
> > This might end up in a deadlock that printk_nmi() wanted to avoid.
> 
> aha, I see.
> 
> > I think about a compromise. We should try to get the messages
> > out only when kdump is not enabled.
> 
> can we zap_locks() if we are on nmi_panic()->panic()->console_flush_on_panic() path?

That is the problem. zap_locks() is not a solution.

First, it handles only lockbuf_lock and console_sem. There are other
locks used by particular consoles that might cause a deadlock.

Second, re-initializing locks is dangerous of its own. If they are
released by some other CPU that is still running, you might end up
in a deadlock because of a double release. In fact, I think that it
actually increases the risk. If there are more than 2 CPUs than
it is more likely that a printk is running on another CPU than
on the current one.


Peter Zijlstra had an idea of using early console in this case.
I am not sure but I guess that it does not have any internal locks.
But there is still the other problem with the double release.

I am afraid that the only solution is to make it configurable.
Some people might want to risk the deadlock and try to see the messages
on console. Others might rather want to get the crashdump for sure
with the cost that they will need to extract the NMI messages
from the per-CPU buffers.


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/nmi: restore printk_func in nmi_panic
  2016-02-29 10:31     ` Petr Mladek
@ 2016-02-29 11:19       ` Sergey Senozhatsky
  2016-03-01  9:24         ` Sergey Senozhatsky
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2016-02-29 11:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, Jan Kara,
	Peter Zijlstra, Steven Rostedt, Ingo Molnar, linux-kernel

On (02/29/16 11:31), Petr Mladek wrote:
> > > > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > > > index f4fa2b2..3ee33d5 100644
> > > > --- a/include/linux/kernel.h
> > > > +++ b/include/linux/kernel.h
> > > > @@ -469,10 +469,12 @@ do {									\
> > > >  	cpu = raw_smp_processor_id();					\
> > > >  	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, cpu);	\
> > > >  									\
> > > > -	if (old_cpu == PANIC_CPU_INVALID)				\
> > > > +	if (old_cpu == PANIC_CPU_INVALID) {				\
> > > > +		printk_nmi_exit();					\
> > > 
> > > This might end up in a deadlock that printk_nmi() wanted to avoid.
> > 
> > aha, I see.
> > 
> > > I think about a compromise. We should try to get the messages
> > > out only when kdump is not enabled.
> > 
> > can we zap_locks() if we are on nmi_panic()->panic()->console_flush_on_panic() path?
> 
> That is the problem. zap_locks() is not a solution.
> 
> First, it handles only lockbuf_lock and console_sem. There are other
> locks used by particular consoles that might cause a deadlock.

yes, well, that's true for panic() in general. we can't take care of
all of the locks that possibly could have been in busy state at the
time CPU received STOP_IPI or entered panic(). we can't even safely
iterate all of the consoles and call ->reset() on them (provided that
such struct console callback will ever be implemented) because
smp_send_stop() is free to leave some CPUs online.


> Second, re-initializing locks is dangerous of its own. If they are
> released by some other CPU that is still running, you might end up
> in a deadlock because of a double release. In fact, I think that it
> actually increases the risk. If there are more than 2 CPUs than
> it is more likely that a printk is running on another CPU than
> on the current one.

panic calls debug_locks_off(), so chances *seem* to be lower.

hm... we are (sort of) on the safe side if we know that smp_send_stop() has
stopped all the CPUs but panic cpu; so zap_locks() is safe (to some extent of
course) when we know that num_online_cpus() == 1 in console_flush_on_panic().

	-ss

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/nmi: restore printk_func in nmi_panic
  2016-02-29 11:19       ` Sergey Senozhatsky
@ 2016-03-01  9:24         ` Sergey Senozhatsky
  2016-03-01 11:05           ` Petr Mladek
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2016-03-01  9:24 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Jan Kara, Peter Zijlstra, Steven Rostedt,
	Ingo Molnar, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

Hello,

On (02/29/16 20:19), Sergey Senozhatsky wrote:
[..]
> > That is the problem. zap_locks() is not a solution.
> > 
> > First, it handles only lockbuf_lock and console_sem. There are other
> > locks used by particular consoles that might cause a deadlock.
> 
> yes, well, that's true for panic() in general.

Petr, what do you think of this (added PRINTK_NMI_FLUSH_ON_PANIC)?

1) zap_locks() in console_flush_on_panic()
2) add PRINTK_NMI_FLUSH_ON_PANIC symbols
3) add printk_nmi_flush_on_panic()

---
 include/linux/printk.h |  2 ++
 init/Kconfig           |  9 +++++++++
 kernel/printk/nmi.c    | 31 ++++++++++++++++++++++---------
 kernel/printk/printk.c | 27 ++++++++++++++++-----------
 4 files changed, 49 insertions(+), 20 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 51dd6b8..19d52d4 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -127,11 +127,13 @@ extern void printk_nmi_init(void);
 extern void printk_nmi_enter(void);
 extern void printk_nmi_exit(void);
 extern void printk_nmi_flush(void);
+extern void printk_nmi_flush_on_panic(void);
 #else
 static inline void printk_nmi_init(void) { }
 static inline void printk_nmi_enter(void) { }
 static inline void printk_nmi_exit(void) { }
 static inline void printk_nmi_flush(void) { }
+extern void printk_nmi_flush_on_panic(void) { }
 #endif /* PRINTK_NMI */
 
 #ifdef CONFIG_PRINTK
diff --git a/init/Kconfig b/init/Kconfig
index 651ec15..05921a6 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -883,6 +883,15 @@ config NMI_LOG_BUF_SHIFT
 		     13 =>   8 KB for each CPU
 		     12 =>   4 KB for each CPU
 
+config PRINTK_NMI_FLUSH_ON_PANIC
+	bool "Try to flush NMI logs to consoles on panic"
+	depends on PRINTK_NMI
+	help
+	  Attempt to flush (printk) NMI per-cpu logs to consoles on panic.
+	  This may be risky and may end up in deadlock; printk will attempt
+	  to zap its locks to reduce this possibility, but there are still
+	  chances left.
+
 #
 # Architectures with an unreliable sched_clock() should select this:
 #
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index 98e9e80..9da8be3 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -52,6 +52,15 @@ struct nmi_seq_buf {
 static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
 
 /*
+ * The lock has two functions. First, one reader has to flush all
+ * available message to make the lockless synchronization with
+ * writers easier. Second, we do not want to mix messages from
+ * different CPUs. This is especially important when printing
+ * a backtrace.
+ */
+static DEFINE_RAW_SPINLOCK(read_lock);
+
+/*
  * Safe printk() for NMI context. It uses a per-CPU buffer to
  * store the message. NMIs are not nested, so there is always only
  * one writer running. But the buffer might get flushed from another
@@ -115,19 +124,10 @@ static void print_nmi_seq_line(struct nmi_seq_buf *s, int start, int end)
  */
 static void __printk_nmi_flush(struct irq_work *work)
 {
-	static raw_spinlock_t read_lock =
-		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
 	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
 	size_t len, size;
 	int i, last_i;
 
-	/*
-	 * The lock has two functions. First, one reader has to flush all
-	 * available message to make the lockless synchronization with
-	 * writers easier. Second, we do not want to mix messages from
-	 * different CPUs. This is especially important when printing
-	 * a backtrace.
-	 */
 	raw_spin_lock(&read_lock);
 
 	i = 0;
@@ -220,3 +220,16 @@ void printk_nmi_exit(void)
 {
 	this_cpu_write(printk_func, vprintk_default);
 }
+
+void printk_nmi_flush_on_panic(void)
+{
+	if (!IS_ENABLED(CONFIG_PRINTK_NMI_FLUSH_ON_PANIC))
+		return;
+
+	raw_spin_lock(&read_lock);
+	printk_nmi_exit();
+	printk_nmi_irq_ready = 0;
+	raw_spin_unlock(&read_lock);
+
+	printk_nmi_flush();
+}
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9917f69..fc5e6d4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1462,6 +1462,15 @@ static void call_console_drivers(int level,
 	}
 }
 
+static void __zap_locks(void)
+{
+	debug_locks_off();
+	/* If a crash is occurring, make sure we can't deadlock */
+	raw_spin_lock_init(&logbuf_lock);
+	/* And make sure that we print immediately */
+	sema_init(&console_sem, 1);
+}
+
 /*
  * Zap console related locks when oopsing.
  * To leave time for slow consoles to print a full oops,
@@ -1477,11 +1486,7 @@ static void zap_locks(void)
 
 	oops_timestamp = jiffies;
 
-	debug_locks_off();
-	/* If a crash is occurring, make sure we can't deadlock */
-	raw_spin_lock_init(&logbuf_lock);
-	/* And make sure that we print immediately */
-	sema_init(&console_sem, 1);
+	__zap_locks();
 }
 
 int printk_delay_msec __read_mostly;
@@ -2386,15 +2391,15 @@ void console_unblank(void)
  */
 void console_flush_on_panic(void)
 {
-	/*
-	 * If someone else is holding the console lock, trylock will fail
-	 * and may_schedule may be set.  Ignore and proceed to unlock so
-	 * that messages are flushed out.  As this can be called from any
-	 * context and we don't want to get preempted while flushing,
-	 * ensure may_schedule is cleared.
+	__zap_locks();
+
+	/* As this can be called from any context and we don't want
+	 * to get preempted while flushing, ensure may_schedule is
+	 * cleared.
 	 */
 	console_trylock();
 	console_may_schedule = 0;
+	printk_nmi_flush_on_panic();
 	console_unlock();
 }
 
-- 
2.8.0.rc0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/nmi: restore printk_func in nmi_panic
  2016-03-01  9:24         ` Sergey Senozhatsky
@ 2016-03-01 11:05           ` Petr Mladek
  2016-03-01 13:14             ` Sergey Senozhatsky
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Mladek @ 2016-03-01 11:05 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Jan Kara, Peter Zijlstra, Steven Rostedt,
	Ingo Molnar, linux-kernel, Sergey Senozhatsky

On Tue 2016-03-01 18:24:26, Sergey Senozhatsky wrote:
> Hello,
> 
> On (02/29/16 20:19), Sergey Senozhatsky wrote:
> [..]
> > > That is the problem. zap_locks() is not a solution.
> > > 
> > > First, it handles only lockbuf_lock and console_sem. There are other
> > > locks used by particular consoles that might cause a deadlock.
> > 
> > yes, well, that's true for panic() in general.
> 
> Petr, what do you think of this (added PRINTK_NMI_FLUSH_ON_PANIC)?
> 
> 1) zap_locks() in console_flush_on_panic()
> 2) add PRINTK_NMI_FLUSH_ON_PANIC symbols
> 3) add printk_nmi_flush_on_panic()

This is definitely better than nothing. Well, it seems that
the printk/NMI patches that motivated this patch will be
removed for a while, see
http://thread.gmane.org/gmane.linux.ports.arm.kernel/482845/focus=483002

I want to play with the panic handling a bit more and make it better
working out of box. It might be enough to put the messages into
the rind buffer when crashdump is going to be produced. Also
there is still the idea about using the lock-less early console.
I think that the solution from this patch might be the last
fallback.

Thanks a lot for proposals,
Petr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/nmi: restore printk_func in nmi_panic
  2016-03-01 11:05           ` Petr Mladek
@ 2016-03-01 13:14             ` Sergey Senozhatsky
  0 siblings, 0 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2016-03-01 13:14 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Andrew Morton, Jan Kara, Peter Zijlstra,
	Steven Rostedt, Ingo Molnar, linux-kernel, Sergey Senozhatsky

On (03/01/16 12:05), Petr Mladek wrote:
[..]
> > > yes, well, that's true for panic() in general.
> > 
> > Petr, what do you think of this (added PRINTK_NMI_FLUSH_ON_PANIC)?
> > 
> > 1) zap_locks() in console_flush_on_panic()
> > 2) add PRINTK_NMI_FLUSH_ON_PANIC symbols
> > 3) add printk_nmi_flush_on_panic()
> 
> This is definitely better than nothing. Well, it seems that
> the printk/NMI patches that motivated this patch will be
> removed for a while, see
> http://thread.gmane.org/gmane.linux.ports.arm.kernel/482845/focus=483002
> 

Oh, thanks a lot for the link! wasn't aware of it.

> I want to play with the panic handling a bit more and make it better
> working out of box. It might be enough to put the messages into
> the rind buffer when crashdump is going to be produced. Also
> there is still the idea about using the lock-less early console.
> I think that the solution from this patch might be the last
> fallback.
> 
> Thanks a lot for proposals,

OK, good to know that.

	-ss

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-03-01 13:16 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-26  3:37 [PATCH] printk/nmi: restore printk_func in nmi_panic Sergey Senozhatsky
2016-02-26 14:57 ` Petr Mladek
2016-02-27  2:19   ` Sergey Senozhatsky
2016-02-27  3:09     ` Sergey Senozhatsky
2016-02-27  3:33       ` Sergey Senozhatsky
2016-02-28  3:52         ` Sergey Senozhatsky
2016-02-29 10:31     ` Petr Mladek
2016-02-29 11:19       ` Sergey Senozhatsky
2016-03-01  9:24         ` Sergey Senozhatsky
2016-03-01 11:05           ` Petr Mladek
2016-03-01 13:14             ` Sergey Senozhatsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.