linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] panic: add an option to replay all the printk message in buffer
@ 2019-04-25 13:32 Feng Tang
  2019-04-26  7:49 ` Petr Mladek
  0 siblings, 1 reply; 8+ messages in thread
From: Feng Tang @ 2019-04-25 13:32 UTC (permalink / raw)
  To: Andrew Morton, Petr Mladek, Steven Rostedt, Sergey Senozhatsky,
	linux-kernel
  Cc: Feng Tang, Aaro Koskinen, Kees Cook, Borislav Petkov

Currently on panic, kernel will lower the loglevel and print out
pending printk msg only with console_flush_on_panic().

Add an option for users to configure the "panic_print" to replay
all dmesg in buffer, some of which they may have never seen due
to the loglevel setting, which will help panic debugging .

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Aaro Koskinen <aaro.koskinen@nokia.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
---
Changelog:

  v4:
    - keep the original console_flush_on_panic() inside
      panic(), as suggested by Petr Mladek

  v3:
    - Solve the compile issue when CONFIG_PRINTK=n, found by
      Andrew Morton
    - Add enum for parameter of console_flush_on_panic()
    - move the printk replay code to console_flush_on_panic()

  v2:
    - Add a new func in printk.c dedicated for the replaying, as
      suggested by Petr Mladek
    - Combine the 2 patches in v1 into one suggested by both Petr
      and Sergey

 Documentation/admin-guide/kernel-parameters.txt | 1 +
 arch/powerpc/kernel/traps.c                     | 2 +-
 include/linux/console.h                         | 7 ++++++-
 kernel/panic.c                                  | 6 +++++-
 kernel/printk/printk.c                          | 8 +++++++-
 5 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2b8ee90..7b15c94 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3135,6 +3135,7 @@
 			bit 2: print timer info
 			bit 3: print locks info if CONFIG_LOCKDEP is on
 			bit 4: print ftrace buffer
+			bit 5: print all printk messages in buffer
 
 	panic_on_warn	panic() instead of WARN().  Useful to cause kdump
 			on a WARN().
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 1fd45a8..0fb785e 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -179,7 +179,7 @@ extern void panic_flush_kmsg_end(void)
 	kmsg_dump(KMSG_DUMP_PANIC);
 	bust_spinlocks(0);
 	debug_locks_off();
-	console_flush_on_panic();
+	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
 }
 
 static unsigned long oops_begin(struct pt_regs *regs)
diff --git a/include/linux/console.h b/include/linux/console.h
index ec9bdb3..d09951d 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -166,6 +166,11 @@ struct console {
 extern int console_set_on_cmdline;
 extern struct console *early_console;
 
+enum con_flush_mode {
+	CONSOLE_FLUSH_PENDING,
+	CONSOLE_REPLAY_ALL,
+};
+
 extern int add_preferred_console(char *name, int idx, char *options);
 extern void register_console(struct console *);
 extern int unregister_console(struct console *);
@@ -175,7 +180,7 @@ extern int console_trylock(void);
 extern void console_unlock(void);
 extern void console_conditional_schedule(void);
 extern void console_unblank(void);
-extern void console_flush_on_panic(void);
+extern void console_flush_on_panic(enum con_flush_mode mode);
 extern struct tty_driver *console_device(int *);
 extern void console_stop(struct console *);
 extern void console_start(struct console *);
diff --git a/kernel/panic.c b/kernel/panic.c
index 0ae0d73..f795feb 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -51,6 +51,7 @@ EXPORT_SYMBOL_GPL(panic_timeout);
 #define PANIC_PRINT_TIMER_INFO		0x00000004
 #define PANIC_PRINT_LOCK_INFO		0x00000008
 #define PANIC_PRINT_FTRACE_INFO		0x00000010
+#define PANIC_PRINT_ALL_PRINTK_MSG	0x00000020
 unsigned long panic_print;
 
 ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
@@ -134,6 +135,9 @@ EXPORT_SYMBOL(nmi_panic);
 
 static void panic_print_sys_info(void)
 {
+	if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
+		console_flush_on_panic(CONSOLE_REPLAY_ALL);
+
 	if (panic_print & PANIC_PRINT_TASK_INFO)
 		show_state();
 
@@ -277,7 +281,7 @@ void panic(const char *fmt, ...)
 	 * panic() is not being callled from OOPS.
 	 */
 	debug_locks_off();
-	console_flush_on_panic();
+	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
 
 	panic_print_sys_info();
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 02ca827..54f93de 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2525,10 +2525,11 @@ void console_unblank(void)
 
 /**
  * console_flush_on_panic - flush console content on panic
+ * @mode: flush all messages in buffer or just the pending ones
  *
  * Immediately output all pending messages no matter what.
  */
-void console_flush_on_panic(void)
+void console_flush_on_panic(enum con_flush_mode mode)
 {
 	/*
 	 * If someone else is holding the console lock, trylock will fail
@@ -2539,6 +2540,11 @@ void console_flush_on_panic(void)
 	 */
 	console_trylock();
 	console_may_schedule = 0;
+
+	if (mode == CONSOLE_REPLAY_ALL) {
+		console_seq = log_first_seq;
+		console_idx = log_first_idx;
+	}
 	console_unlock();
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] panic: add an option to replay all the printk message in buffer
  2019-04-25 13:32 [PATCH v4] panic: add an option to replay all the printk message in buffer Feng Tang
@ 2019-04-26  7:49 ` Petr Mladek
  2019-04-26 13:53   ` Sergey Senozhatsky
  0 siblings, 1 reply; 8+ messages in thread
From: Petr Mladek @ 2019-04-26  7:49 UTC (permalink / raw)
  To: Feng Tang
  Cc: Andrew Morton, Steven Rostedt, Sergey Senozhatsky, linux-kernel,
	Aaro Koskinen, Kees Cook, Borislav Petkov

On Thu 2019-04-25 21:32:17, Feng Tang wrote:
> Currently on panic, kernel will lower the loglevel and print out
> pending printk msg only with console_flush_on_panic().
> 
> Add an option for users to configure the "panic_print" to replay
> all dmesg in buffer, some of which they may have never seen due
> to the loglevel setting, which will help panic debugging .
> 
> @@ -2539,6 +2540,11 @@ void console_flush_on_panic(void)
>  	 */
>  	console_trylock();
>  	console_may_schedule = 0;
> +
> +	if (mode == CONSOLE_REPLAY_ALL) {
> +		console_seq = log_first_seq;
> +		console_idx = log_first_idx;

Ah, log_first_seq and log_first_idx are synchronized by
logbuf_log.

console_flush_on_panic(CONSOLE_REPLAY_ALL) is called
when only one CPU is running but it is not guaranteed.

Therefore we should use:

	if (mode == CONSOLE_REPLAY_ALL) {
		unsigned long flags;

		logbuf_lock_irqsave(flags);
		console_seq = log_first_seq;
		console_idx = log_first_idx;
		logbuf_unlock_irqrestore(flags);
	}

I am really sorry that I have missed it yesterday.

Anyway, with the above locking, feel free to add:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] panic: add an option to replay all the printk message in buffer
  2019-04-26  7:49 ` Petr Mladek
@ 2019-04-26 13:53   ` Sergey Senozhatsky
  2019-04-26 14:14     ` Petr Mladek
  0 siblings, 1 reply; 8+ messages in thread
From: Sergey Senozhatsky @ 2019-04-26 13:53 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Feng Tang, Andrew Morton, Steven Rostedt, Sergey Senozhatsky,
	linux-kernel, Aaro Koskinen, Kees Cook, Borislav Petkov

On (04/26/19 09:49), Petr Mladek wrote:
> On Thu 2019-04-25 21:32:17, Feng Tang wrote:
> > Currently on panic, kernel will lower the loglevel and print out
> > pending printk msg only with console_flush_on_panic().
> > 
> > Add an option for users to configure the "panic_print" to replay
> > all dmesg in buffer, some of which they may have never seen due
> > to the loglevel setting, which will help panic debugging .
> > 
> > @@ -2539,6 +2540,11 @@ void console_flush_on_panic(void)
> >  	 */
> >  	console_trylock();
> >  	console_may_schedule = 0;
> > +
> > +	if (mode == CONSOLE_REPLAY_ALL) {
> > +		console_seq = log_first_seq;
> > +		console_idx = log_first_idx;
> 
> Ah, log_first_seq and log_first_idx are synchronized by
> logbuf_log.
> 
> console_flush_on_panic(CONSOLE_REPLAY_ALL) is called
> when only one CPU is running but it is not guaranteed.
> 
> Therefore we should use:
> 
> 	if (mode == CONSOLE_REPLAY_ALL) {
> 		unsigned long flags;
> 
> 		logbuf_lock_irqsave(flags);
> 		console_seq = log_first_seq;
> 		console_idx = log_first_idx;
> 		logbuf_unlock_irqrestore(flags);
> 	}

I thought about it, and I don't think I see how we can race with
anything here.

Suppose we have panic on CPUA and cactive CPUB in console_unlock():

- if it's not in atomic context, then the moment it does

	call_console_drivers();
	printk_safe_exit_irqrestore(flags);  << IPI

  IPI will take it down.

- If IPI doesn't take it down, then NMI will.

- But, more importantly, if that CPUB is in atomic context, then panic
  CPUA will spin, waiting for that CPUB to handoff printing, before
  panic CPU will even try to stop all CPUs.

	pr_emerg("Kernel panic - not syncing: %s\n", buf)

  is the point of 'synchronization' - panic CPU will wait for
  current console owner.

  Hmm, we might have a bit of a problem here, maybe.

	-ss

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] panic: add an option to replay all the printk message in buffer
  2019-04-26 13:53   ` Sergey Senozhatsky
@ 2019-04-26 14:14     ` Petr Mladek
  2019-04-26 16:43       ` Sergey Senozhatsky
  0 siblings, 1 reply; 8+ messages in thread
From: Petr Mladek @ 2019-04-26 14:14 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Feng Tang, Andrew Morton, Steven Rostedt, Sergey Senozhatsky,
	linux-kernel, Aaro Koskinen, Kees Cook, Borislav Petkov

On Fri 2019-04-26 22:53:16, Sergey Senozhatsky wrote:
> On (04/26/19 09:49), Petr Mladek wrote:
> > On Thu 2019-04-25 21:32:17, Feng Tang wrote:
> > > Currently on panic, kernel will lower the loglevel and print out
> > > pending printk msg only with console_flush_on_panic().
> > > 
> > > Add an option for users to configure the "panic_print" to replay
> > > all dmesg in buffer, some of which they may have never seen due
> > > to the loglevel setting, which will help panic debugging .
> > > 
> > > @@ -2539,6 +2540,11 @@ void console_flush_on_panic(void)
> > >  	 */
> > >  	console_trylock();
> > >  	console_may_schedule = 0;
> > > +
> > > +	if (mode == CONSOLE_REPLAY_ALL) {
> > > +		console_seq = log_first_seq;
> > > +		console_idx = log_first_idx;
> > 
> > Ah, log_first_seq and log_first_idx are synchronized by
> > logbuf_log.
> > 
> > console_flush_on_panic(CONSOLE_REPLAY_ALL) is called
> > when only one CPU is running but it is not guaranteed.
> > 
> > Therefore we should use:
> > 
> > 	if (mode == CONSOLE_REPLAY_ALL) {
> > 		unsigned long flags;
> > 
> > 		logbuf_lock_irqsave(flags);
> > 		console_seq = log_first_seq;
> > 		console_idx = log_first_idx;
> > 		logbuf_unlock_irqrestore(flags);
> > 	}
> 
> I thought about it, and I don't think I see how we can race with
> anything here.
> 
> Suppose we have panic on CPUA and cactive CPUB in console_unlock():
> 
> - if it's not in atomic context, then the moment it does
> 
> 	call_console_drivers();
> 	printk_safe_exit_irqrestore(flags);  << IPI
> 
>   IPI will take it down.
> 
> - If IPI doesn't take it down, then NMI will.

Then I wonder why, for example, native_stop_other_cpus() waits
10ms at maximum after sending the NMIs. What is the state
of the CPUs that miss this deadline?


> - But, more importantly, if that CPUB is in atomic context, then panic
>   CPUA will spin, waiting for that CPUB to handoff printing, before
>   panic CPU will even try to stop all CPUs.
> 
> 	pr_emerg("Kernel panic - not syncing: %s\n", buf)
> 
>   is the point of 'synchronization' - panic CPU will wait for
>   current console owner.

"Synchronization point" is too strong formulation.
The console waiter logic is effective but it does not always
work. The current console owner must be calling the console
drivers.


>   Hmm, we might have a bit of a problem here, maybe.

Hmm, the printk() might wait forever when NMI stopped
the current console owner in the console driver code
or with the logbuf_lock taken.

The console waiter logic might get solved by clearing
the console_owner in console_flush_on_panic(). It can't
be much worse, we already ignore console_lock() there, ...

The best solution for the logbuf_lock scenario would be
the lock less log buffer.


Anyway, do we really need to have length discussion about
whether the locks are needed? They will not break anything.
And we could ignore one scenario in case of problems.
Also we will not need to think about it when the
call is moved anywhere else.

Finally, I could imagine that people would want to replay
the log also from other locations when debugging nasty bugs.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] panic: add an option to replay all the printk message in buffer
  2019-04-26 14:14     ` Petr Mladek
@ 2019-04-26 16:43       ` Sergey Senozhatsky
  2019-04-26 17:16         ` Sergey Senozhatsky
  0 siblings, 1 reply; 8+ messages in thread
From: Sergey Senozhatsky @ 2019-04-26 16:43 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Feng Tang, Andrew Morton, Steven Rostedt,
	Sergey Senozhatsky, linux-kernel, Aaro Koskinen, Kees Cook,
	Borislav Petkov

On (04/26/19 16:14), Petr Mladek wrote:
>
> Then I wonder why, for example, native_stop_other_cpus() waits
> 10ms at maximum after sending the NMIs. What is the state
> of the CPUs that miss this deadline?

Well, I saw a case when CPU was forcibly powered off (embedded),
for instance.

> > - But, more importantly, if that CPUB is in atomic context, then panic
> >   CPUA will spin, waiting for that CPUB to handoff printing, before
> >   panic CPU will even try to stop all CPUs.
> > 
> > 	pr_emerg("Kernel panic - not syncing: %s\n", buf)
> > 
> >   is the point of 'synchronization' - panic CPU will wait for
> >   current console owner.
> 
> "Synchronization point" is too strong formulation.

But it is sort of synchronized. That's why console_owner patch
set solved the panic-printk deadlock which Google folks reported
a while ago.

> The console waiter logic is effective but it does not always
> work. The current console owner must be calling the console
> drivers.
>
> >   Hmm, we might have a bit of a problem here, maybe.
>
> Hmm, the printk() might wait forever when NMI stopped
> the current console owner in the console driver code
> or with the logbuf_lock taken.

I guess this is why we re-init logbuf lock from panic,
however, we don't do anything with the console_owner.

> The console waiter logic might get solved by clearing
> the console_owner in console_flush_on_panic(). It can't
> be much worse, we already ignore console_lock() there, ...

Right.

[..]
> Anyway, do we really need to have length discussion about
> whether the locks are needed? They will not break anything.

I'm not objecting v5 nor your request to add that locking there.
I'm talking about different things.

	-ss

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] panic: add an option to replay all the printk message in buffer
  2019-04-26 16:43       ` Sergey Senozhatsky
@ 2019-04-26 17:16         ` Sergey Senozhatsky
  2019-04-29 11:44           ` Petr Mladek
  0 siblings, 1 reply; 8+ messages in thread
From: Sergey Senozhatsky @ 2019-04-26 17:16 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Feng Tang, Andrew Morton, Steven Rostedt, Sergey Senozhatsky,
	linux-kernel, Aaro Koskinen, Kees Cook, Borislav Petkov,
	Sergey Senozhatsky

On (04/27/19 01:43), Sergey Senozhatsky wrote:
[..]
> > The console waiter logic is effective but it does not always
> > work. The current console owner must be calling the console
> > drivers.
> >
> > >   Hmm, we might have a bit of a problem here, maybe.
> >
> > Hmm, the printk() might wait forever when NMI stopped
> > the current console owner in the console driver code
> > or with the logbuf_lock taken.
> 
> I guess this is why we re-init logbuf lock from panic,
> however, we don't do anything with the console_owner.
> 
> > The console waiter logic might get solved by clearing
> > the console_owner in console_flush_on_panic(). It can't
> > be much worse, we already ignore console_lock() there, ...

Hmm, or maybe we are fine... console_waiter logic should work
before we send out stop IPI/NMI from panic CPU. When we call
flush_on_panic() console_unlock() clears console_owner, so
panic_print_sys_info() should not deadlock on console_owner.

It's probably only problematic if we kill a console_owner
CPU and then try to printk() (from smp_send_stop()) before
we do flush_on_panic()->console_unlock().

	-ss

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] panic: add an option to replay all the printk message in buffer
  2019-04-26 17:16         ` Sergey Senozhatsky
@ 2019-04-29 11:44           ` Petr Mladek
  2019-04-30  6:27             ` Sergey Senozhatsky
  0 siblings, 1 reply; 8+ messages in thread
From: Petr Mladek @ 2019-04-29 11:44 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Feng Tang, Andrew Morton, Steven Rostedt, Sergey Senozhatsky,
	linux-kernel, Aaro Koskinen, Kees Cook, Borislav Petkov

On Sat 2019-04-27 02:16:40, Sergey Senozhatsky wrote:
> On (04/27/19 01:43), Sergey Senozhatsky wrote:
> [..]
> > > The console waiter logic is effective but it does not always
> > > work. The current console owner must be calling the console
> > > drivers.
> > >
> > > >   Hmm, we might have a bit of a problem here, maybe.
> > >
> > > Hmm, the printk() might wait forever when NMI stopped
> > > the current console owner in the console driver code
> > > or with the logbuf_lock taken.
> > 
> > I guess this is why we re-init logbuf lock from panic,
> > however, we don't do anything with the console_owner.

> > > The console waiter logic might get solved by clearing
> > > the console_owner in console_flush_on_panic(). It can't
> > > be much worse, we already ignore console_lock() there, ...
> 
> Hmm, or maybe we are fine... console_waiter logic should work
> before we send out stop IPI/NMI from panic CPU. When we call
> flush_on_panic() console_unlock() clears console_owner, so
> panic_print_sys_info() should not deadlock on console_owner.

Good point!

> It's probably only problematic if we kill a console_owner
> CPU and then try to printk() (from smp_send_stop()) before
> we do flush_on_panic()->console_unlock().

Yup. There are called several functions between smp_send_stop()
and console_flush_on_panic().

The question is if it is worth a code complication. We could
never 100% guarantee that printk() would work in panic().
I more and more understand what Peter Zijlstra means
by the duct taping.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] panic: add an option to replay all the printk message in buffer
  2019-04-29 11:44           ` Petr Mladek
@ 2019-04-30  6:27             ` Sergey Senozhatsky
  0 siblings, 0 replies; 8+ messages in thread
From: Sergey Senozhatsky @ 2019-04-30  6:27 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Feng Tang, Andrew Morton, Steven Rostedt,
	Sergey Senozhatsky, linux-kernel, Aaro Koskinen, Kees Cook,
	Borislav Petkov

On (04/29/19 13:44), Petr Mladek wrote:
> On Sat 2019-04-27 02:16:40, Sergey Senozhatsky wrote:
> > On (04/27/19 01:43), Sergey Senozhatsky wrote:
> > [..]
> > > > The console waiter logic is effective but it does not always
> > > > work. The current console owner must be calling the console
> > > > drivers.
> > > >
> > > > >   Hmm, we might have a bit of a problem here, maybe.
> > > >
> > > > Hmm, the printk() might wait forever when NMI stopped
> > > > the current console owner in the console driver code
> > > > or with the logbuf_lock taken.
> > > 
> > > I guess this is why we re-init logbuf lock from panic,
> > > however, we don't do anything with the console_owner.
> 
> > > > The console waiter logic might get solved by clearing
> > > > the console_owner in console_flush_on_panic(). It can't
> > > > be much worse, we already ignore console_lock() there, ...
> > 
> > Hmm, or maybe we are fine... console_waiter logic should work
> > before we send out stop IPI/NMI from panic CPU. When we call
> > flush_on_panic() console_unlock() clears console_owner, so
> > panic_print_sys_info() should not deadlock on console_owner.
> 
> Good point!
> 
> > It's probably only problematic if we kill a console_owner
> > CPU and then try to printk() (from smp_send_stop()) before
> > we do flush_on_panic()->console_unlock().
> 
> Yup. There are called several functions between smp_send_stop()
> and console_flush_on_panic().
> 
> The question is if it is worth a code complication. We could
> never 100% guarantee that printk() would work in panic().
> I more and more understand what Peter Zijlstra means
> by the duct taping.

Agreed.

	-ss

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-04-30  5:29 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-25 13:32 [PATCH v4] panic: add an option to replay all the printk message in buffer Feng Tang
2019-04-26  7:49 ` Petr Mladek
2019-04-26 13:53   ` Sergey Senozhatsky
2019-04-26 14:14     ` Petr Mladek
2019-04-26 16:43       ` Sergey Senozhatsky
2019-04-26 17:16         ` Sergey Senozhatsky
2019-04-29 11:44           ` Petr Mladek
2019-04-30  6:27             ` Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).