All of lore.kernel.org
 help / color / mirror / Atom feed
* + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
@ 2018-12-04  7:15 akpm
  2018-12-04 10:10 ` Sergey Senozhatsky
  2018-12-04 10:20 ` Petr Mladek
  0 siblings, 2 replies; 25+ messages in thread
From: akpm @ 2018-12-04  7:15 UTC (permalink / raw)
  To: bp, feng.tang, keescook, mm-commits, pmladek, sergey.senozhatsky,
	stable, tglx


The patch titled
     Subject: panic: Avoid the extra noise dmesg
has been added to the -mm tree.  Its filename is
     panic-avoid-the-extra-noise-dmesg.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/panic-avoid-the-extra-noise-dmesg.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/panic-avoid-the-extra-noise-dmesg.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Feng Tang <feng.tang@intel.com>
Subject: panic: Avoid the extra noise dmesg

When kernel panic happens, it will first print the panic call stack,
then the ending msg like:

[   35.743249] ---[ end Kernel panic - not syncing: Fatal exception
[   35.749975] ------------[ cut here ]------------

The above message are very useful for debugging.

But if system is configured to not reboot on panic, say the "panic_timeout"
parameter equals 0, it will likely print out many noisy message like
WARN() call stack for each and every CPU except the panic one, messages
like below:

	WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
	Call Trace:
	<IRQ>
	try_to_wake_up
	default_wake_function
	autoremove_wake_function
	__wake_up_common
	__wake_up_common_lock
	__wake_up
	wake_up_klogd_work_func
	irq_work_run_list
	irq_work_tick
	update_process_times
	tick_sched_timer
	__hrtimer_run_queues
	hrtimer_interrupt
	smp_apic_timer_interrupt
	apic_timer_interrupt

For people working in console mode, the screen will first show the panic
call stack, but immediately overridded by these noisy extra messages, which
makes debugging much more difficult, as the original context gets lost on
screen.

Also these noisy messages will confuse some users, as I have seen many bug
reporters posted the noisy message into bugzilla, instead of the real panic
call stack and context.

Removing the "local_irq_enable" will avoid the noisy message.

The justification for the removing is: when code runs to this point, it
means user has chosed to not reboot, or do any special handling by using
the panic notifier method, no much point in re-enabling the interrupt.

Link: http://lkml.kernel.org/r/1543902228-23834-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---


--- a/kernel/panic.c~panic-avoid-the-extra-noise-dmesg
+++ a/kernel/panic.c
@@ -322,7 +322,6 @@ void panic(const char *fmt, ...)
 	}
 #endif
 	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
-	local_irq_enable();
 	for (i = 0; ; i += PANIC_TIMER_STEP) {
 		touch_softlockup_watchdog();
 		if (i >= i_next) {
_

Patches currently in -mm which might be from feng.tang@intel.com are

panic-add-options-to-print-system-info-when-panic-happens.patch
kernel-sysctl-add-panic_print-into-sysctl.patch
panic-avoid-the-extra-noise-dmesg.patch

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-04  7:15 + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree akpm
@ 2018-12-04 10:10 ` Sergey Senozhatsky
  2018-12-04 10:20 ` Petr Mladek
  1 sibling, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2018-12-04 10:10 UTC (permalink / raw)
  To: Feng Tang
  Cc: bp, feng.tang, keescook, mm-commits, pmladek, sergey.senozhatsky,
	stable, tglx, akpm, peterz, rostedt

On (12/03/18 23:15), akpm@linux-foundation.org wrote:
[..]
> Feng Tang wrote:
>
[..]
> 
> For people working in console mode, the screen will first show the panic
> call stack, but immediately overridded by these noisy extra messages, which
> makes debugging much more difficult, as the original context gets lost on
> screen.
> 
> Also these noisy messages will confuse some users, as I have seen many bug
> reporters posted the noisy message into bugzilla, instead of the real panic
> call stack and context.
> 
> Removing the "local_irq_enable" will avoid the noisy message.
> 
> The justification for the removing is: when code runs to this point, it
> means user has chosed to not reboot, or do any special handling by using
> the panic notifier method, no much point in re-enabling the interrupt.

[..]


> @@ -322,7 +322,6 @@ void panic(const char *fmt, ...)
>  	}
>  #endif
>  	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
> -	local_irq_enable();
>  	for (i = 0; ; i += PANIC_TIMER_STEP) {
>  		touch_softlockup_watchdog();
>  		if (i >= i_next) {

Hmm, looking at 2.4 kernel:

---  panic() ---
...
        sti();
        for(;;) {
#if defined(CONFIG_X86) && defined(CONFIG_VT)
                extern void panic_blink(void);
                panic_blink();
#endif
                CHECK_EMERGENCY_SYNC
        }
----------------

CHECK_EMERGENCY_SYNC is

#define CHECK_EMERGENCY_SYNC                    \
        if (emergency_sync_scheduled)           \
                do_emergency_sync();

And emergency_sync_scheduled is set by sysrq.


I wonder if this is still the case - sysrq over serial, for instance,
we want local irqs for that.

	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-04  7:15 + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree akpm
  2018-12-04 10:10 ` Sergey Senozhatsky
@ 2018-12-04 10:20 ` Petr Mladek
  2018-12-04 15:49   ` Feng Tang
  1 sibling, 1 reply; 25+ messages in thread
From: Petr Mladek @ 2018-12-04 10:20 UTC (permalink / raw)
  To: akpm
  Cc: bp, feng.tang, keescook, mm-commits, sergey.senozhatsky, stable,
	tglx, Steven Rostedt, Peter Zijlstra

On Mon 2018-12-03 23:15:31, Andrew Morton wrote:
> 
> The patch titled
>      Subject: panic: Avoid the extra noise dmesg
> has been added to the -mm tree.  Its filename is
>      panic-avoid-the-extra-noise-dmesg.patch
> 
> This patch should soon appear at
>     http://ozlabs.org/~akpm/mmots/broken-out/panic-avoid-the-extra-noise-dmesg.patch
> and later at
>     http://ozlabs.org/~akpm/mmotm/broken-out/panic-avoid-the-extra-noise-dmesg.patch
> 
> Before you just go and hit "reply", please:
>    a) Consider who else should be cc'ed
>    b) Prefer to cc a suitable mailing list as well
>    c) Ideally: find the original patch on the mailing list and do a
>       reply-to-all to that, adding suitable additional cc's
> 
> *** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
> 
> The -mm tree is included into linux-next and is updated
> there every 3-4 working days
> 
> ------------------------------------------------------
> From: Feng Tang <feng.tang@intel.com>
> Subject: panic: Avoid the extra noise dmesg
> 
> When kernel panic happens, it will first print the panic call stack,
> then the ending msg like:
> 
> [   35.743249] ---[ end Kernel panic - not syncing: Fatal exception
> [   35.749975] ------------[ cut here ]------------
> 
> The above message are very useful for debugging.
> 
> But if system is configured to not reboot on panic, say the "panic_timeout"
> parameter equals 0, it will likely print out many noisy message like
> WARN() call stack for each and every CPU except the panic one, messages
> like below:
> 
> 	WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
> 	Call Trace:
> 	<IRQ>
> 	try_to_wake_up
> 	default_wake_function
> 	autoremove_wake_function
> 	__wake_up_common
> 	__wake_up_common_lock
> 	__wake_up
> 	wake_up_klogd_work_func
> 	irq_work_run_list
> 	irq_work_tick
> 	update_process_times
> 	tick_sched_timer
> 	__hrtimer_run_queues
> 	hrtimer_interrupt
> 	smp_apic_timer_interrupt
> 	apic_timer_interrupt

I guess that it is a warning about migrating tasks to an offline CPU.

> For people working in console mode, the screen will first show the panic
> call stack, but immediately overridded by these noisy extra messages, which
> makes debugging much more difficult, as the original context gets lost on
> screen.
> 
> Also these noisy messages will confuse some users, as I have seen many bug
> reporters posted the noisy message into bugzilla, instead of the real panic
> call stack and context.
> 
> Removing the "local_irq_enable" will avoid the noisy message.
> 
> The justification for the removing is: when code runs to this point, it
> means user has chosed to not reboot, or do any special handling by using
> the panic notifier method, no much point in re-enabling the interrupt.
> 
> Link: http://lkml.kernel.org/r/1543902228-23834-1-git-send-email-feng.tang@intel.com
> Signed-off-by: Feng Tang <feng.tang@intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
> 
> --- a/kernel/panic.c~panic-avoid-the-extra-noise-dmesg
> +++ a/kernel/panic.c
> @@ -322,7 +322,6 @@ void panic(const char *fmt, ...)
>  	}
>  #endif
>  	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
> -	local_irq_enable();
>  	for (i = 0; ; i += PANIC_TIMER_STEP) {
>  		touch_softlockup_watchdog();
>  		if (i >= i_next) {

Hmm, this calls panic_blink(). It seems that it depends on workqueues
and the scheduler:

  + led_panic_blink()
    + led_trigger_event()
      + led_set_brightness()
        + schedule_work(set_brightness_work)

I guess that blinking might be important in some situations and
on some devices. On the other hand, we are interested only into
blinking from this point on.

The easiest solution seems to be to make a noop from printk().
For example, we could add a global flag:

int panic_blinking;

and add the following into vprintk_func()

	/*
	 * Do not push away real panic() message by warnings from led
	 * blinking code.
	 */
	if (panic_blinking)
		return 0;

How does that sound?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-04 10:20 ` Petr Mladek
@ 2018-12-04 15:49   ` Feng Tang
  2018-12-04 16:01     ` Petr Mladek
  2018-12-05  2:26     ` Sergey Senozhatsky
  0 siblings, 2 replies; 25+ messages in thread
From: Feng Tang @ 2018-12-04 15:49 UTC (permalink / raw)
  To: Petr Mladek
  Cc: akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx,
	Steven Rostedt, Peter Zijlstra, Sasha Levin

+ Sasha 

Thanks Petr and Sergey for the reviews.


On Tue, Dec 04, 2018 at 11:20:33AM +0100, Petr Mladek wrote:
> On Mon 2018-12-03 23:15:31, Andrew Morton wrote:
> > 
> > ------------------------------------------------------
> > From: Feng Tang <feng.tang@intel.com>
> > Subject: panic: Avoid the extra noise dmesg
> > 
> > When kernel panic happens, it will first print the panic call stack,
> > then the ending msg like:
> > 
> > [   35.743249] ---[ end Kernel panic - not syncing: Fatal exception
> > [   35.749975] ------------[ cut here ]------------
> > 
> > The above message are very useful for debugging.
> > 
> > But if system is configured to not reboot on panic, say the "panic_timeout"
> > parameter equals 0, it will likely print out many noisy message like
> > WARN() call stack for each and every CPU except the panic one, messages
> > like below:
> > 
> > 	WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
> > 	Call Trace:
> > 	<IRQ>
> > 	try_to_wake_up
> > 	default_wake_function
> > 	autoremove_wake_function
> > 	__wake_up_common
> > 	__wake_up_common_lock
> > 	__wake_up
> > 	wake_up_klogd_work_func
> > 	irq_work_run_list
> > 	irq_work_tick
> > 	update_process_times
> > 	tick_sched_timer
> > 	__hrtimer_run_queues
> > 	hrtimer_interrupt
> > 	smp_apic_timer_interrupt
> > 	apic_timer_interrupt
> 
> I guess that it is a warning about migrating tasks to an offline CPU.
 
My v1 patch was trying to add some hacky code into architecture code to
address several WARN()s directly, but that turns out to be very hacky and
involve much code for many archs.

> > For people working in console mode, the screen will first show the panic
> > call stack, but immediately overridded by these noisy extra messages, which
> > makes debugging much more difficult, as the original context gets lost on
> > screen.
> > 
> > Also these noisy messages will confuse some users, as I have seen many bug
> > reporters posted the noisy message into bugzilla, instead of the real panic
> > call stack and context.
> > 
> > Removing the "local_irq_enable" will avoid the noisy message.
> > 
> > The justification for the removing is: when code runs to this point, it
> > means user has chosed to not reboot, or do any special handling by using
> > the panic notifier method, no much point in re-enabling the interrupt.
> > 
> > Link: http://lkml.kernel.org/r/1543902228-23834-1-git-send-email-feng.tang@intel.com
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Petr Mladek <pmladek@suse.com>
> > Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > ---
> > 
> > 
> > --- a/kernel/panic.c~panic-avoid-the-extra-noise-dmesg
> > +++ a/kernel/panic.c
> > @@ -322,7 +322,6 @@ void panic(const char *fmt, ...)
> >  	}
> >  #endif
> >  	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
> > -	local_irq_enable();
> >  	for (i = 0; ; i += PANIC_TIMER_STEP) {
> >  		touch_softlockup_watchdog();
> >  		if (i >= i_next) {
> 
> Hmm, this calls panic_blink(). It seems that it depends on workqueues
> and the scheduler:
> 
>   + led_panic_blink()
>     + led_trigger_event()
>       + led_set_brightness()
>         + schedule_work(set_brightness_work)
> 
> I guess that blinking might be important in some situations and
> on some devices. On the other hand, we are interested only into
> blinking from this point on.
> 
> The easiest solution seems to be to make a noop from printk().
> For example, we could add a global flag:
> 
> int panic_blinking;
> 
> and add the following into vprintk_func()
> 
> 	/*
> 	 * Do not push away real panic() message by warnings from led
> 	 * blinking code.
> 	 */
> 	if (panic_blinking)
> 		return 0;
> 
> How does that sound?

This should be able to achieve the same goal. 

One thing I can think of is what mentioned by Sergey that some sysrq
handler may want to print out something, but it should mostly be
covered by 2 other panic debug print patches, which will print out
task/mem/timer/lock/ftrace info runtime on demand.

Thanks,
Feng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-04 15:49   ` Feng Tang
@ 2018-12-04 16:01     ` Petr Mladek
  2018-12-05  1:53       ` Feng Tang
  2018-12-05  2:26     ` Sergey Senozhatsky
  1 sibling, 1 reply; 25+ messages in thread
From: Petr Mladek @ 2018-12-04 16:01 UTC (permalink / raw)
  To: Feng Tang
  Cc: akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx,
	Steven Rostedt, Peter Zijlstra, Sasha Levin

On Tue 2018-12-04 23:49:36, Feng Tang wrote:
> + Sasha 
> 
> Thanks Petr and Sergey for the reviews.
> 
> 
> On Tue, Dec 04, 2018 at 11:20:33AM +0100, Petr Mladek wrote:
> > On Mon 2018-12-03 23:15:31, Andrew Morton wrote:
> > > 
> > > ------------------------------------------------------
> > > From: Feng Tang <feng.tang@intel.com>
> > > Subject: panic: Avoid the extra noise dmesg
> > > 
> > > When kernel panic happens, it will first print the panic call stack,
> > > then the ending msg like:
> > > 
> > > [   35.743249] ---[ end Kernel panic - not syncing: Fatal exception
> > > [   35.749975] ------------[ cut here ]------------
> > > 
> > > The above message are very useful for debugging.
> > > 
> > > But if system is configured to not reboot on panic, say the "panic_timeout"
> > > parameter equals 0, it will likely print out many noisy message like
> > > WARN() call stack for each and every CPU except the panic one, messages
> > > like below:
> > > 
> > > 	WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
> > > 	Call Trace:
> > > 	<IRQ>
> > > 	try_to_wake_up
> > > 	default_wake_function
> > > 	autoremove_wake_function
> > > 	__wake_up_common
> > > 	__wake_up_common_lock
> > > 	__wake_up
> > > 	wake_up_klogd_work_func
> > > 	irq_work_run_list
> > > 	irq_work_tick
> > > 	update_process_times
> > > 	tick_sched_timer
> > > 	__hrtimer_run_queues
> > > 	hrtimer_interrupt
> > > 	smp_apic_timer_interrupt
> > > 	apic_timer_interrupt
> > 
> > I guess that it is a warning about migrating tasks to an offline CPU.
>  
> My v1 patch was trying to add some hacky code into architecture code to
> address several WARN()s directly, but that turns out to be very hacky and
> involve much code for many archs.
> 
> > > For people working in console mode, the screen will first show the panic
> > > call stack, but immediately overridded by these noisy extra messages, which
> > > makes debugging much more difficult, as the original context gets lost on
> > > screen.
> > > 
> > > Also these noisy messages will confuse some users, as I have seen many bug
> > > reporters posted the noisy message into bugzilla, instead of the real panic
> > > call stack and context.
> > > 
> > > Removing the "local_irq_enable" will avoid the noisy message.
> > > 
> > > The justification for the removing is: when code runs to this point, it
> > > means user has chosed to not reboot, or do any special handling by using
> > > the panic notifier method, no much point in re-enabling the interrupt.
> > > 
> > > Link: http://lkml.kernel.org/r/1543902228-23834-1-git-send-email-feng.tang@intel.com
> > > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Borislav Petkov <bp@suse.de>
> > > Cc: Petr Mladek <pmladek@suse.com>
> > > Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > > Cc: <stable@vger.kernel.org>
> > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > > ---
> > > 
> > > 
> > > --- a/kernel/panic.c~panic-avoid-the-extra-noise-dmesg
> > > +++ a/kernel/panic.c
> > > @@ -322,7 +322,6 @@ void panic(const char *fmt, ...)
> > >  	}
> > >  #endif
> > >  	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
> > > -	local_irq_enable();
> > >  	for (i = 0; ; i += PANIC_TIMER_STEP) {
> > >  		touch_softlockup_watchdog();
> > >  		if (i >= i_next) {
> > 
> > Hmm, this calls panic_blink(). It seems that it depends on workqueues
> > and the scheduler:
> > 
> >   + led_panic_blink()
> >     + led_trigger_event()
> >       + led_set_brightness()
> >         + schedule_work(set_brightness_work)
> > 
> > I guess that blinking might be important in some situations and
> > on some devices. On the other hand, we are interested only into
> > blinking from this point on.
> > 
> > The easiest solution seems to be to make a noop from printk().
> > For example, we could add a global flag:
> > 
> > int panic_blinking;
> > 
> > and add the following into vprintk_func()
> > 
> > 	/*
> > 	 * Do not push away real panic() message by warnings from led
> > 	 * blinking code.
> > 	 */
> > 	if (panic_blinking)
> > 		return 0;
> > 
> > How does that sound?
> 
> This should be able to achieve the same goal. 
> 
> One thing I can think of is what mentioned by Sergey that some sysrq
> handler may want to print out something, but it should mostly be
> covered by 2 other panic debug print patches, which will print out
> task/mem/timer/lock/ftrace info runtime on demand.

I think that we could simply clear panic_blinking from
__handle_sysrq(). The user will still be able to capture the screen
before touching the keyboard. But it will keep the things simple.

I hope that we did not miss anything else. Anyway, the approach with
making printk a nop still looks like the best maintainable solution
to me.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-04 16:01     ` Petr Mladek
@ 2018-12-05  1:53       ` Feng Tang
  2018-12-05  2:50         ` Sergey Senozhatsky
  0 siblings, 1 reply; 25+ messages in thread
From: Feng Tang @ 2018-12-05  1:53 UTC (permalink / raw)
  To: Petr Mladek
  Cc: akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx,
	Steven Rostedt, Peter Zijlstra, Sasha Levin

Hi Petr,

On Tue, Dec 04, 2018 at 05:01:52PM +0100, Petr Mladek wrote:
> On Tue 2018-12-04 23:49:36, Feng Tang wrote:
> > + Sasha 
> > 
> > Thanks Petr and Sergey for the reviews.
> > 
> > 
> > On Tue, Dec 04, 2018 at 11:20:33AM +0100, Petr Mladek wrote:
> > > On Mon 2018-12-03 23:15:31, Andrew Morton wrote:
> > > > 
> > > > ------------------------------------------------------
> > > > From: Feng Tang <feng.tang@intel.com>
> > > > Subject: panic: Avoid the extra noise dmesg
> > > > 
> > > > When kernel panic happens, it will first print the panic call stack,
> > > > then the ending msg like:
> > > > 
> > > > [   35.743249] ---[ end Kernel panic - not syncing: Fatal exception
> > > > [   35.749975] ------------[ cut here ]------------
> > > > 
> > > > The above message are very useful for debugging.
> > > > 
> > > > But if system is configured to not reboot on panic, say the "panic_timeout"
> > > > parameter equals 0, it will likely print out many noisy message like
> > > > WARN() call stack for each and every CPU except the panic one, messages
> > > > like below:
> > > > 
> > > > 	WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
> > > > 	Call Trace:
> > > > 	<IRQ>
> > > > 	try_to_wake_up
> > > > 	default_wake_function
> > > > 	autoremove_wake_function
> > > > 	__wake_up_common
> > > > 	__wake_up_common_lock
> > > > 	__wake_up
> > > > 	wake_up_klogd_work_func
> > > > 	irq_work_run_list
> > > > 	irq_work_tick
> > > > 	update_process_times
> > > > 	tick_sched_timer
> > > > 	__hrtimer_run_queues
> > > > 	hrtimer_interrupt
> > > > 	smp_apic_timer_interrupt
> > > > 	apic_timer_interrupt
> > > 
> > > I guess that it is a warning about migrating tasks to an offline CPU.
> >  
> > My v1 patch was trying to add some hacky code into architecture code to
> > address several WARN()s directly, but that turns out to be very hacky and
> > involve much code for many archs.
> > 
> > > > For people working in console mode, the screen will first show the panic
> > > > call stack, but immediately overridded by these noisy extra messages, which
> > > > makes debugging much more difficult, as the original context gets lost on
> > > > screen.
> > > > 
> > > > Also these noisy messages will confuse some users, as I have seen many bug
> > > > reporters posted the noisy message into bugzilla, instead of the real panic
> > > > call stack and context.
> > > > 
> > > > Removing the "local_irq_enable" will avoid the noisy message.
> > > > 
> > > > The justification for the removing is: when code runs to this point, it
> > > > means user has chosed to not reboot, or do any special handling by using
> > > > the panic notifier method, no much point in re-enabling the interrupt.
> > > > 
> > > > Link: http://lkml.kernel.org/r/1543902228-23834-1-git-send-email-feng.tang@intel.com
> > > > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Kees Cook <keescook@chromium.org>
> > > > Cc: Borislav Petkov <bp@suse.de>
> > > > Cc: Petr Mladek <pmladek@suse.com>
> > > > Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > > > Cc: <stable@vger.kernel.org>
> > > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > > > ---
> > > > 
> > > > 
> > > > --- a/kernel/panic.c~panic-avoid-the-extra-noise-dmesg
> > > > +++ a/kernel/panic.c
> > > > @@ -322,7 +322,6 @@ void panic(const char *fmt, ...)
> > > >  	}
> > > >  #endif
> > > >  	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
> > > > -	local_irq_enable();
> > > >  	for (i = 0; ; i += PANIC_TIMER_STEP) {
> > > >  		touch_softlockup_watchdog();
> > > >  		if (i >= i_next) {
> > > 
> > > Hmm, this calls panic_blink(). It seems that it depends on workqueues
> > > and the scheduler:
> > > 
> > >   + led_panic_blink()
> > >     + led_trigger_event()
> > >       + led_set_brightness()
> > >         + schedule_work(set_brightness_work)
> > > 
> > > I guess that blinking might be important in some situations and
> > > on some devices. On the other hand, we are interested only into
> > > blinking from this point on.
> > > 
> > > The easiest solution seems to be to make a noop from printk().
> > > For example, we could add a global flag:
> > > 
> > > int panic_blinking;
> > > 
> > > and add the following into vprintk_func()
> > > 
> > > 	/*
> > > 	 * Do not push away real panic() message by warnings from led
> > > 	 * blinking code.
> > > 	 */
> > > 	if (panic_blinking)
> > > 		return 0;
> > > 
> > > How does that sound?
> > 
> > This should be able to achieve the same goal. 
> > 
> > One thing I can think of is what mentioned by Sergey that some sysrq
> > handler may want to print out something, but it should mostly be
> > covered by 2 other panic debug print patches, which will print out
> > task/mem/timer/lock/ftrace info runtime on demand.
> 
> I think that we could simply clear panic_blinking from
> __handle_sysrq(). The user will still be able to capture the screen
> before touching the keyboard. But it will keep the things simple.
> 
> I hope that we did not miss anything else. Anyway, the approach with
> making printk a nop still looks like the best maintainable solution
> to me.
 
I will setup a platform which can handle sysrq request and try your
suggestion. thanks,

- Feng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-04 15:49   ` Feng Tang
  2018-12-04 16:01     ` Petr Mladek
@ 2018-12-05  2:26     ` Sergey Senozhatsky
  2018-12-05  2:47       ` Feng Tang
  1 sibling, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2018-12-05  2:26 UTC (permalink / raw)
  To: Feng Tang
  Cc: Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky,
	stable, tglx, Steven Rostedt, Peter Zijlstra, Sasha Levin

On (12/04/18 23:49), Feng Tang wrote:
> This should be able to achieve the same goal. 
> 
> One thing I can think of is what mentioned by Sergey that some sysrq
> handler may want to print out something, but it should mostly be
> covered by 2 other panic debug print patches, which will print out
> task/mem/timer/lock/ftrace info runtime on demand.

Well, not all sysrq handlers just printk stuff; some do sane things,
like emergency sync, umount, etc.

	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-05  2:26     ` Sergey Senozhatsky
@ 2018-12-05  2:47       ` Feng Tang
  2018-12-05  2:57         ` Sergey Senozhatsky
  0 siblings, 1 reply; 25+ messages in thread
From: Feng Tang @ 2018-12-05  2:47 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky,
	stable, tglx, Steven Rostedt, Peter Zijlstra, Sasha Levin

Hi Sergey,

On Wed, Dec 05, 2018 at 11:26:54AM +0900, Sergey Senozhatsky wrote:
> On (12/04/18 23:49), Feng Tang wrote:
> > This should be able to achieve the same goal. 
> > 
> > One thing I can think of is what mentioned by Sergey that some sysrq
> > handler may want to print out something, but it should mostly be
> > covered by 2 other panic debug print patches, which will print out
> > task/mem/timer/lock/ftrace info runtime on demand.
> 
> Well, not all sysrq handlers just printk stuff; some do sane things,
> like emergency sync, umount, etc.

Yes, I understand. I said that by assuming we will keep the irq enabled and
use the panic_blinking to control the flag, as suggested by Petr and you.

With irq enabled, these sync, umount will be completed as usual.

Btw, just FYI,  I just tried the sysrq (using minicom CTL + A + F + 'magic key'),
it works with system is running, but failed after I trigger a panic, I will
check more though I'm not very familiar with sysrq yet.

Thanks,
Feng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-05  1:53       ` Feng Tang
@ 2018-12-05  2:50         ` Sergey Senozhatsky
  2018-12-05  3:05           ` Sergey Senozhatsky
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2018-12-05  2:50 UTC (permalink / raw)
  To: Feng Tang
  Cc: Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky,
	stable, tglx, Steven Rostedt, Peter Zijlstra, Sasha Levin

On (12/05/18 09:53), Feng Tang wrote:
> > I think that we could simply clear panic_blinking from
> > __handle_sysrq(). The user will still be able to capture the screen
> > before touching the keyboard. But it will keep the things simple.
> > 
> > I hope that we did not miss anything else. Anyway, the approach with
> > making printk a nop still looks like the best maintainable solution
> > to me.
>  
> I will setup a platform which can handle sysrq request and try your
> suggestion. thanks,

I don't entirely understand this patch series, sorry. So you want to
keep local IRQs disabled to, supposedly, have less printk-s between
dump_stack() from panic CPU and "end Kernel panic" marker; yet at the
same time you add *significantly* more printk-s between dump_stack()
from panic CPU and "end Kernel panic" marker.

panic_print_sys_info() can be very verbose, and it happens much later
than dump_stack() from panic CPU. So you are guaranteed to have same
problems you are trying to avoid: "the original context gets
lost on screen" and "confused people post bad bug reports".

Am I missing something?

Dunno. Just a bunch of ideas (raw ideas).
Is something like below going to work for you instead?

---

@@ -327,6 +327,9 @@ void panic(const char *fmt, ...)
 #endif
 	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
 	local_irq_enable();
+
+	dump_stack();
+
 	for (i = 0; ; i += PANIC_TIMER_STEP) {
 		touch_softlockup_watchdog();
 		if (i >= i_next) {

---

Or... *Maybe* you can even do a ratelimited dump_stack() from that
PANIC_TIMER_STEP loop. Say, one dump_stack() every 10 minutes. The
WARN_ON noise should stop at some point.

	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-05  2:47       ` Feng Tang
@ 2018-12-05  2:57         ` Sergey Senozhatsky
  2018-12-05  5:29           ` Sergey Senozhatsky
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2018-12-05  2:57 UTC (permalink / raw)
  To: Feng Tang
  Cc: Sergey Senozhatsky, Petr Mladek, akpm, bp, keescook, mm-commits,
	sergey.senozhatsky, stable, tglx, Steven Rostedt, Peter Zijlstra,
	Sasha Levin

On (12/05/18 10:47), Feng Tang wrote:
> 
> Btw, just FYI,  I just tried the sysrq (using minicom CTL + A + F + 'magic key'),
> it works with system is running, but failed after I trigger a panic, I will
> check more though I'm not very familiar with sysrq yet.
> 

Thanks! I'm not entirely sure if people still do this. I just saw the
code in linux 2.4 and assumed that this still might be the case.

Well, "theoretically, why not" :) The system can panic() because
of misbehaving video driver, for instance, so the emergency sync
still should be possible.

But maybe it's all irrelevant, I don't know.

	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-05  2:50         ` Sergey Senozhatsky
@ 2018-12-05  3:05           ` Sergey Senozhatsky
  2018-12-05  3:27             ` Feng Tang
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2018-12-05  3:05 UTC (permalink / raw)
  To: Feng Tang
  Cc: Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky,
	stable, tglx, Steven Rostedt, Peter Zijlstra, Sasha Levin,
	Sergey Senozhatsky

On (12/05/18 11:50), Sergey Senozhatsky wrote:
> 
> panic_print_sys_info() can be very verbose, and it happens much later
> than dump_stack() from panic CPU. So you are guaranteed to have same
> problems you are trying to avoid: "the original context gets
> lost on screen" and "confused people post bad bug reports".
> 

And it probably would be _a bit_ better to do panic_print_sys_info()
before console_flush_on_panic().

---
        debug_locks_off();
-       console_flush_on_panic();
-
        panic_print_sys_info();
+       console_flush_on_panic();
---

	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-05  3:05           ` Sergey Senozhatsky
@ 2018-12-05  3:27             ` Feng Tang
  0 siblings, 0 replies; 25+ messages in thread
From: Feng Tang @ 2018-12-05  3:27 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky,
	stable, tglx, Steven Rostedt, Peter Zijlstra, Sasha Levin

Hi Sergey,

On Wed, Dec 05, 2018 at 12:05:55PM +0900, Sergey Senozhatsky wrote:
> On (12/05/18 11:50), Sergey Senozhatsky wrote:
> > 
> > panic_print_sys_info() can be very verbose, and it happens much later
> > than dump_stack() from panic CPU. So you are guaranteed to have same
> > problems you are trying to avoid: "the original context gets
> > lost on screen" and "confused people post bad bug reports".
> > 
> 
> And it probably would be _a bit_ better to do panic_print_sys_info()
> before console_flush_on_panic().
> 
> ---
>         debug_locks_off();
> -       console_flush_on_panic();
> -
>         panic_print_sys_info();
> +       console_flush_on_panic();
> ---

Current situation of kernel panic print is like this:

	1. local_irq_disable() 
	2. panic stack dump ----> Msg A
	3. local_irq_enable()
	4. lots of WARN()   ----> Msg B 

Msg A is what we need, and Msg B is want we want to eliminate.

https://lkml.org/lkml/2018/11/27/459 has the example log for
kernel panic.

My v3 patch is to remove the step 3, keep the IRQ disabled to
avoid Msg B.

Given the sysrq and led blinking concerns brought by you and Petr,
I'm trying to test the panic_blink way while keeping the step 3
of local_irq_enable()

Thanks,
Feng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-05  2:57         ` Sergey Senozhatsky
@ 2018-12-05  5:29           ` Sergey Senozhatsky
  2018-12-05  8:00             ` Sergey Senozhatsky
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2018-12-05  5:29 UTC (permalink / raw)
  To: Feng Tang, Peter Zijlstra
  Cc: Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky,
	stable, tglx, Steven Rostedt, Sasha Levin, Sergey Senozhatsky,
	Andi Kleen

On (12/05/18 11:57), Sergey Senozhatsky wrote:
> On (12/05/18 10:47), Feng Tang wrote:
> > 
> > Btw, just FYI,  I just tried the sysrq (using minicom CTL + A + F + 'magic key'),
> > it works with system is running, but failed after I trigger a panic, I will
> > check more though I'm not very familiar with sysrq yet.

OK... So, apparently, what's happening is panic() calls smp_send_stop().
And smp_send_stop()->native_stop_other_cpus() on x86 disables local APIC.
So no fun anymore.


If I keep APIC enabled on panic CPU, then I have my keyboard working,
including PageUp-PageDown scrolling, sysrq handling, and so on.

I think I'm not the only one who'd want scrollback to work after panic
(yes, fremebuffer for debugging).


Andi Kleen [1] wrote (Cc-ed):
: Oops/warnings are getting longer and longer, often scrolling away
: from the screen, and if the kernel crashes backscroll does not work
: anymore, so precious information is lost.


PeterZ,
  for those folks who sometimes have to use framebuffer for debugging
  (just a trivial "let me scrollback and see the panic backtrace") and
  not always have access to serial console, can we have local APIC
  enabled on the panic_cpu? Or is it a terrible thing to ask for?


[1] https://lore.kernel.org/lkml/878tcvt592.fsf@linux.intel.com/T/#u

	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-05  5:29           ` Sergey Senozhatsky
@ 2018-12-05  8:00             ` Sergey Senozhatsky
  2018-12-05 15:46               ` Feng Tang
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2018-12-05  8:00 UTC (permalink / raw)
  To: Feng Tang, Peter Zijlstra
  Cc: Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky,
	stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen,
	Sergey Senozhatsky

On (12/05/18 14:29), Sergey Senozhatsky wrote:
> On (12/05/18 11:57), Sergey Senozhatsky wrote:
> > On (12/05/18 10:47), Feng Tang wrote:
>
> OK... So, apparently, what's happening is panic() calls smp_send_stop().
> And smp_send_stop()->native_stop_other_cpus() on x86 disables local APIC.
> So no fun anymore.
>
> If I keep APIC enabled on panic CPU, then I have my keyboard working,
> including PageUp-PageDown scrolling, sysrq handling, and so on.

Meeh, it's much harder than this. Worked on one machine only.

> PeterZ,
>   for those folks who sometimes have to use framebuffer for debugging
>   (just a trivial "let me scrollback and see the panic backtrace") and
>   not always have access to serial console, can we have local APIC
>   enabled on the panic_cpu? Or is it a terrible thing to ask for?

The question remains: can we have input irqs on panic_cpu after panic?

	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-05  8:00             ` Sergey Senozhatsky
@ 2018-12-05 15:46               ` Feng Tang
  2018-12-06  3:58                 ` Feng Tang
  0 siblings, 1 reply; 25+ messages in thread
From: Feng Tang @ 2018-12-05 15:46 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Peter Zijlstra, Petr Mladek, akpm, bp, keescook, mm-commits,
	sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin,
	Andi Kleen

On Wed, Dec 05, 2018 at 05:00:44PM +0900, Sergey Senozhatsky wrote:
> On (12/05/18 14:29), Sergey Senozhatsky wrote:
> > On (12/05/18 11:57), Sergey Senozhatsky wrote:
> > > On (12/05/18 10:47), Feng Tang wrote:
> >
> > OK... So, apparently, what's happening is panic() calls smp_send_stop().
> > And smp_send_stop()->native_stop_other_cpus() on x86 disables local APIC.
> > So no fun anymore.
> >
> > If I keep APIC enabled on panic CPU, then I have my keyboard working,
> > including PageUp-PageDown scrolling, sysrq handling, and so on.
> 
> Meeh, it's much harder than this. Worked on one machine only.

Same here, I tried on several platforms and hardly get the sysrq magic key
working, though it works while system is running.

And it make me wondering if those workqueue dependent led blinking code
can still really work.

> 
> > PeterZ,
> >   for those folks who sometimes have to use framebuffer for debugging
> >   (just a trivial "let me scrollback and see the panic backtrace") and
> >   not always have access to serial console, can we have local APIC
> >   enabled on the panic_cpu? Or is it a terrible thing to ask for?
> 
> The question remains: can we have input irqs on panic_cpu after panic?

With current kernel, the irq is still enabled for the panic CPU, I did
see timer and some device's ISR get called, but they will only fire IRQ
one time after panic triggered.

Thanks,
Feng

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-05 15:46               ` Feng Tang
@ 2018-12-06  3:58                 ` Feng Tang
  2018-12-07  9:50                   ` Sergey Senozhatsky
  0 siblings, 1 reply; 25+ messages in thread
From: Feng Tang @ 2018-12-06  3:58 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Peter Zijlstra, Petr Mladek, akpm, bp, keescook, mm-commits,
	sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin,
	Andi Kleen

On Wed, Dec 05, 2018 at 11:46:20PM +0800, Feng Tang wrote:
> On Wed, Dec 05, 2018 at 05:00:44PM +0900, Sergey Senozhatsky wrote:
> > On (12/05/18 14:29), Sergey Senozhatsky wrote:
> > > On (12/05/18 11:57), Sergey Senozhatsky wrote:
> > > > On (12/05/18 10:47), Feng Tang wrote:
> > >
> > > OK... So, apparently, what's happening is panic() calls smp_send_stop().
> > > And smp_send_stop()->native_stop_other_cpus() on x86 disables local APIC.
> > > So no fun anymore.
> > >
> > > If I keep APIC enabled on panic CPU, then I have my keyboard working,
> > > including PageUp-PageDown scrolling, sysrq handling, and so on.
> > 
> > Meeh, it's much harder than this. Worked on one machine only.
> 
> Same here, I tried on several platforms and hardly get the sysrq magic key
> working, though it works while system is running.
> 
> And it make me wondering if those workqueue dependent led blinking code
> can still really work.

Also, IMHO, if we need a panic blink method, it should better be simple
and robust with only HW registers access plus delay function, as I'm not
sure if the scheduling can still work. 

Anyway, can I propose to make the "local_irq_enable" conditional and off
by default, and add a warning.

@@ -295,7 +295,10 @@ void panic(const char *fmt, ...)
 	}
 #endif
 	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
-	local_irq_enable();
+	if (panic_keep_irq_on)
+		local_irq_enable();
+	else
+		pr_emerg("Please add panic_keep_irq_on to cmdline, if you want blink/sysrq work");
 	for (i = 0; ; i += PANIC_TIMER_STEP) {
 		touch_softlockup_watchdog();
 		if (i >= i_next) {

Any comments? thanks!

- Feng

> 
> > 
> > > PeterZ,
> > >   for those folks who sometimes have to use framebuffer for debugging
> > >   (just a trivial "let me scrollback and see the panic backtrace") and
> > >   not always have access to serial console, can we have local APIC
> > >   enabled on the panic_cpu? Or is it a terrible thing to ask for?
> > 
> > The question remains: can we have input irqs on panic_cpu after panic?
> 
> With current kernel, the irq is still enabled for the panic CPU, I did
> see timer and some device's ISR get called, but they will only fire IRQ
> one time after panic triggered.
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-06  3:58                 ` Feng Tang
@ 2018-12-07  9:50                   ` Sergey Senozhatsky
  2018-12-10  9:45                     ` Feng Tang
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2018-12-07  9:50 UTC (permalink / raw)
  To: Feng Tang
  Cc: Sergey Senozhatsky, Peter Zijlstra, Petr Mladek, akpm, bp,
	keescook, mm-commits, sergey.senozhatsky, stable, tglx,
	Steven Rostedt, Sasha Levin, Andi Kleen

On (12/06/18 11:58), Feng Tang wrote:
> > Same here, I tried on several platforms and hardly get the sysrq magic key
> > working, though it works while system is running.
> > 
> > And it make me wondering if those workqueue dependent led blinking code
> > can still really work.
> 
> Also, IMHO, if we need a panic blink method, it should better be simple
> and robust with only HW registers access plus delay function, as I'm not
> sure if the scheduling can still work. 
> 
> Anyway, can I propose to make the "local_irq_enable" conditional and off
> by default, and add a warning.

I'm not sure what to do about this. I think that the behaviour is platform
specific. For instance, arm64 keeps secondary CPUs in a busy loop
	while (1)
		cpu_relax();

(masked out) and on panic_cpu disables only SDEI (interrupts from firmware,
if I got it right); so it seems that arm64 can handle IRQs after panic. And
if there are platforms that handle IRQ (including sysrq) after panic, then
both options - making printk a noop or keeping local irqs off - maybe can
cause some problems. Or maybe not. We better ask arch people.

Personally, on my x86 laptop, I'd prefer the srollback to work after panic.
Just my 5 cents.

	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-07  9:50                   ` Sergey Senozhatsky
@ 2018-12-10  9:45                     ` Feng Tang
  2018-12-10 15:57                       ` Petr Mladek
  2018-12-11  8:00                       ` Sergey Senozhatsky
  0 siblings, 2 replies; 25+ messages in thread
From: Feng Tang @ 2018-12-10  9:45 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Peter Zijlstra, Petr Mladek, akpm, bp, keescook, mm-commits,
	sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin,
	Andi Kleen, linux-kernel

Hi Sergey,

+ lkml.

On Fri, Dec 07, 2018 at 06:50:04PM +0900, Sergey Senozhatsky wrote:
> On (12/06/18 11:58), Feng Tang wrote:
> > > Same here, I tried on several platforms and hardly get the sysrq magic key
> > > working, though it works while system is running.
> > > 
> > > And it make me wondering if those workqueue dependent led blinking code
> > > can still really work.
> > 
> > Also, IMHO, if we need a panic blink method, it should better be simple
> > and robust with only HW registers access plus delay function, as I'm not
> > sure if the scheduling can still work. 
> > 
> > Anyway, can I propose to make the "local_irq_enable" conditional and off
> > by default, and add a warning.
> 
> I'm not sure what to do about this. I think that the behaviour is platform
> specific. For instance, arm64 keeps secondary CPUs in a busy loop
> 	while (1)
> 		cpu_relax();

Yes, it's similar to x86's handling for non-panic CPU.

> 
> (masked out) and on panic_cpu disables only SDEI (interrupts from firmware,
> if I got it right); so it seems that arm64 can handle IRQs after panic. And
> if there are platforms that handle IRQ (including sysrq) after panic, then
> both options - making printk a noop or keeping local irqs off - maybe can
> cause some problems. Or maybe not. We better ask arch people.

Yes, this is very valid concern. And after Petr and you raised it, I did
some experiments with 3 x86 platforms at my hand, one Apollolake IOT device
with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key
all works well before panic, and fails after panic. But I did remember the
PageUp/PageDown key worked on some laptop years ago. And you actually raised a
good question: what do we expect for the post-panic kernel? 

For the v4 patch, my thought is, for experienced developers to make
sysrq/panic_blink work, it's easy to add "panic_keep_irq_on" to kernel cmdline,
or runtime change it by 
 "echo Y > /sys/module/kernel/parameters/panic_keep_irq_on"
while for normal user, they can by default see the clean panic call stack
either on a screen or a serial console. 

Thanks,
Feng

> 
> Personally, on my x86 laptop, I'd prefer the srollback to work after panic.
> Just my 5 cents.
> 
> 	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-10  9:45                     ` Feng Tang
@ 2018-12-10 15:57                       ` Petr Mladek
  2018-12-11  8:07                         ` Sergey Senozhatsky
  2018-12-11  8:00                       ` Sergey Senozhatsky
  1 sibling, 1 reply; 25+ messages in thread
From: Petr Mladek @ 2018-12-10 15:57 UTC (permalink / raw)
  To: Feng Tang
  Cc: Sergey Senozhatsky, Peter Zijlstra, akpm, bp, keescook,
	mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt,
	Sasha Levin, Andi Kleen, linux-kernel

On Mon 2018-12-10 17:45:54, Feng Tang wrote:
> Hi Sergey,
> 
> + lkml.
> 
> On Fri, Dec 07, 2018 at 06:50:04PM +0900, Sergey Senozhatsky wrote:
> > On (12/06/18 11:58), Feng Tang wrote:
> > > > Same here, I tried on several platforms and hardly get the sysrq magic key
> > > > working, though it works while system is running.
> > > > 
> > > > And it make me wondering if those workqueue dependent led blinking code
> > > > can still really work.
> > > 
> > > Also, IMHO, if we need a panic blink method, it should better be simple
> > > and robust with only HW registers access plus delay function, as I'm not
> > > sure if the scheduling can still work. 
> > > 
> > > Anyway, can I propose to make the "local_irq_enable" conditional and off
> > > by default, and add a warning.
> > 
> > I'm not sure what to do about this. I think that the behaviour is platform
> > specific. For instance, arm64 keeps secondary CPUs in a busy loop
> > 	while (1)
> > 		cpu_relax();
> 
> Yes, it's similar to x86's handling for non-panic CPU.
> 
> > 
> > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware,
> > if I got it right); so it seems that arm64 can handle IRQs after panic. And
> > if there are platforms that handle IRQ (including sysrq) after panic, then
> > both options - making printk a noop or keeping local irqs off - maybe can
> > cause some problems. Or maybe not. We better ask arch people.
> 
> Yes, this is very valid concern. And after Petr and you raised it, I did
> some experiments with 3 x86 platforms at my hand, one Apollolake IOT device
> with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key
> all works well before panic, and fails after panic. But I did remember the
> PageUp/PageDown key worked on some laptop years ago. And you actually raised a
> good question: what do we expect for the post-panic kernel?

I am not sure why it does not work. But it would be nice if sysrq
worked.


> For the v4 patch, my thought is, for experienced developers to make
> sysrq/panic_blink work, it's easy to add "panic_keep_irq_on" to kernel cmdline,
> or runtime change it by
>  "echo Y > /sys/module/kernel/parameters/panic_keep_irq_on"
> while for normal user, they can by default see the clean panic call stack
> either on a screen or a serial console.

I see this problematic from two reasons:

First, the blinking and especially sysrq might be handy but they
are disabled by default. It is too late to do anything when the
system is panicing.

Second, new parameters or configure options should be avoided
whenever possible. They are easy to implement but the are less
easy to use. The current amount of them is already overhelming.


I still think that calming down printk() is acceptable when
it can be restored from sysrq.

I think that only few people might be interested into debugging
post-panic problems. We could print a warning for them about
that printk() has got disabled.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-10  9:45                     ` Feng Tang
  2018-12-10 15:57                       ` Petr Mladek
@ 2018-12-11  8:00                       ` Sergey Senozhatsky
  1 sibling, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2018-12-11  8:00 UTC (permalink / raw)
  To: Feng Tang
  Cc: Sergey Senozhatsky, Peter Zijlstra, Petr Mladek, akpm, bp,
	keescook, mm-commits, sergey.senozhatsky, stable, tglx,
	Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel

Hi,

On (12/10/18 17:45), Feng Tang wrote:
> Yes, this is very valid concern. And after Petr and you raised it, I did
> some experiments with 3 x86 platforms at my hand, one Apollolake IOT device
> with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key
> all works well before panic, and fails after panic. But I did remember the
> PageUp/PageDown key worked on some laptop years ago. And you actually raised a
> good question: what do we expect for the post-panic kernel?

Yeah.
It used to be case that people expected some things to work after panic.

> For the v4 patch, my thought is, for experienced developers to make
> sysrq/panic_blink work, it's easy to add "panic_keep_irq_on" to kernel cmdline,
> or runtime change it by
>  "echo Y > /sys/module/kernel/parameters/panic_keep_irq_on"
> while for normal user, they can by default see the clean panic call stack
> either on a screen or a serial console.

Before we move on, just a quick question, since I wasn't Cc-ed to v1
and v2 of this patch - did you have a chance to ask x86 people if they
can help in any way? Asking to make sure that we are not fixing a _maybe_
x86-specific problem in arch-independent/common code.


/* offtopic */

LOL, wish this was a "dumb-and-ugly-solutions" contest; I'm pretty sure
I'd take the first prize with this one:

---
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 04adc8d60aed..40f643bb7fdc 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -181,6 +181,16 @@ asmlinkage __visible void smp_reboot_interrupt(void)
 	irq_exit();
 }
 
+static void native_smp_suppress_reschedule(int cpu)
+{
+}
+
+static void native_smp_to_up(void)
+{
+	WARN_ON_ONCE(num_online_cpus() > 1);
+	smp_ops.smp_send_reschedule = native_smp_suppress_reschedule;
+}
+
 static void native_stop_other_cpus(int wait)
 {
 	unsigned long flags;
@@ -250,6 +260,7 @@ static void native_stop_other_cpus(int wait)
 	local_irq_save(flags);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
+	native_smp_to_up();
 	local_irq_restore(flags);
 }
---

If the system is not SMP anymore (hlt non-panic CPUs) - rewrite
some smp_ops pointers to NOOP stubs to suppress some of those
warnings.

I know it's utterly awful.

	-ss

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-10 15:57                       ` Petr Mladek
@ 2018-12-11  8:07                         ` Sergey Senozhatsky
  2018-12-11  8:22                           ` Petr Mladek
  2018-12-11  8:32                           ` Feng Tang
  0 siblings, 2 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2018-12-11  8:07 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Feng Tang, Sergey Senozhatsky, Peter Zijlstra, akpm, bp,
	keescook, mm-commits, sergey.senozhatsky, stable, tglx,
	Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel

On (12/10/18 16:57), Petr Mladek wrote:
> > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware,
> > > if I got it right); so it seems that arm64 can handle IRQs after panic. And
> > > if there are platforms that handle IRQ (including sysrq) after panic, then
> > > both options - making printk a noop or keeping local irqs off - maybe can
> > > cause some problems. Or maybe not. We better ask arch people.
> > 
> > Yes, this is very valid concern. And after Petr and you raised it, I did
> > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device
> > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key
> > all works well before panic, and fails after panic. But I did remember the
> > PageUp/PageDown key worked on some laptop years ago. And you actually raised a
> > good question: what do we expect for the post-panic kernel?
> 
> I am not sure why it does not work. But it would be nice if sysrq
> worked.

Absolutely.

[..]
> I still think that calming down printk() is acceptable when
> it can be restored from sysrq.

I would agree; peeking one of the two solutions, printk patch is
probably preferable.

> I think that only few people might be interested into debugging
> post-panic problems. We could print a warning for them about
> that printk() has got disabled.

Dunno. This _maybe_ (speculation!) can upset folks on those platforms
that have sysrq working after panic. printk is a common code.

I'm probably missing a lot of things here, but just in case, I'm not
sure at which point the idea of patching some files under arch/x86
directory was ruled out and why.

	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-11  8:07                         ` Sergey Senozhatsky
@ 2018-12-11  8:22                           ` Petr Mladek
  2018-12-11  8:26                             ` Sergey Senozhatsky
  2018-12-11  8:32                           ` Feng Tang
  1 sibling, 1 reply; 25+ messages in thread
From: Petr Mladek @ 2018-12-11  8:22 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Feng Tang, Peter Zijlstra, akpm, bp, keescook, mm-commits,
	sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin,
	Andi Kleen, linux-kernel

On Tue 2018-12-11 17:07:43, Sergey Senozhatsky wrote:
> On (12/10/18 16:57), Petr Mladek wrote:
> > > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware,
> > > > if I got it right); so it seems that arm64 can handle IRQs after panic. And
> > > > if there are platforms that handle IRQ (including sysrq) after panic, then
> > > > both options - making printk a noop or keeping local irqs off - maybe can
> > > > cause some problems. Or maybe not. We better ask arch people.
> > > 
> > > Yes, this is very valid concern. And after Petr and you raised it, I did
> > > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device
> > > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key
> > > all works well before panic, and fails after panic. But I did remember the
> > > PageUp/PageDown key worked on some laptop years ago. And you actually raised a
> > > good question: what do we expect for the post-panic kernel?
> > 
> > I am not sure why it does not work. But it would be nice if sysrq
> > worked.
> 
> Absolutely.
> 
> [..]
> > I still think that calming down printk() is acceptable when
> > it can be restored from sysrq.
> 
> I would agree; peeking one of the two solutions, printk patch is
> probably preferable.
> 
> > I think that only few people might be interested into debugging
> > post-panic problems. We could print a warning for them about
> > that printk() has got disabled.
> 
> Dunno. This _maybe_ (speculation!) can upset folks on those platforms
> that have sysrq working after panic. printk is a common code.
> 
> I'm probably missing a lot of things here, but just in case, I'm not
> sure at which point the idea of patching some files under arch/x86
> directory was ruled out and why.

I suggested to clear the panic_blinking (or whatever name) in
__handle_sysrq(). The idea is that sysrq needs manual intervention.
It allows to see the original message before it gets overridden
by a potential sysrq-related output.

It assumes that sysrq is the only interesting operation when
printk() might be useful at this state.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-11  8:22                           ` Petr Mladek
@ 2018-12-11  8:26                             ` Sergey Senozhatsky
  0 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2018-12-11  8:26 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Feng Tang, Peter Zijlstra, akpm, bp,
	keescook, mm-commits, sergey.senozhatsky, stable, tglx,
	Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel

On (12/11/18 09:22), Petr Mladek wrote:
> I suggested to clear the panic_blinking (or whatever name) in
> __handle_sysrq(). The idea is that sysrq needs manual intervention.
> It allows to see the original message before it gets overridden
> by a potential sysrq-related output.

Ah, indeed. Sounds good.

> It assumes that sysrq is the only interesting operation when
> printk() might be useful at this state.

Agreed. Makes sense.

	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-11  8:07                         ` Sergey Senozhatsky
  2018-12-11  8:22                           ` Petr Mladek
@ 2018-12-11  8:32                           ` Feng Tang
  2018-12-11  9:08                             ` Sergey Senozhatsky
  1 sibling, 1 reply; 25+ messages in thread
From: Feng Tang @ 2018-12-11  8:32 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Peter Zijlstra, akpm, bp, keescook, mm-commits,
	sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin,
	Andi Kleen, linux-kernel

Hi Sergey,

On Tue, Dec 11, 2018 at 05:07:43PM +0900, Sergey Senozhatsky wrote:
> On (12/10/18 16:57), Petr Mladek wrote:
> > > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware,
> > > > if I got it right); so it seems that arm64 can handle IRQs after panic. And
> > > > if there are platforms that handle IRQ (including sysrq) after panic, then
> > > > both options - making printk a noop or keeping local irqs off - maybe can
> > > > cause some problems. Or maybe not. We better ask arch people.
> > > 
> > > Yes, this is very valid concern. And after Petr and you raised it, I did
> > > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device
> > > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key
> > > all works well before panic, and fails after panic. But I did remember the
> > > PageUp/PageDown key worked on some laptop years ago. And you actually raised a
> > > good question: what do we expect for the post-panic kernel?
> > 
> > I am not sure why it does not work. But it would be nice if sysrq
> > worked.
> 
> Absolutely.
> 
> [..]
> > I still think that calming down printk() is acceptable when
> > it can be restored from sysrq.
> 
> I would agree; peeking one of the two solutions, printk patch is
> probably preferable.
> 
> > I think that only few people might be interested into debugging
> > post-panic problems. We could print a warning for them about
> > that printk() has got disabled.
> 
> Dunno. This _maybe_ (speculation!) can upset folks on those platforms
> that have sysrq working after panic. printk is a common code.
> 
> I'm probably missing a lot of things here, but just in case, I'm not
> sure at which point the idea of patching some files under arch/x86
> directory was ruled out and why.

Here is the v1 patch: https://lkml.org/lkml/2018/10/11/304 

And actually no one ruled out the v1 patch :), I don't have HW of other
archs like arm/ppc, so I just read some of the arch code, and found
most of them use the similar flow like x86, that's why I chosed to
finding a soluton inside panic.c itself.

Thanks,
Feng


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-11  8:32                           ` Feng Tang
@ 2018-12-11  9:08                             ` Sergey Senozhatsky
  0 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2018-12-11  9:08 UTC (permalink / raw)
  To: Feng Tang, Peter Zijlstra, Petr Mladek, Steven Rostedt,
	Thomas Gleixner, Frederic Weisbecker
  Cc: Sergey Senozhatsky, akpm, bp, keescook, mm-commits,
	sergey.senozhatsky, stable, Sasha Levin, Andi Kleen,
	linux-kernel

Adding Frederic,

https://lkml.org/lkml/2018/10/11/304


On (12/11/18 16:32), Feng Tang wrote:
> Here is the v1 patch: https://lkml.org/lkml/2018/10/11/304
>
> And actually no one ruled out the v1 patch :), I don't have HW of other
> archs like arm/ppc, so I just read some of the arch code, and found
> most of them use the similar flow like x86, that's why I chosed to
> finding a soluton inside panic.c itself.

Interesting. So if the problem is that we need to clear cpu bit in
several cpumaks (e.g. nohz.idle_cpus_mask) when we stop_this_cpu(),
then I'd say let's clear cpumasks which are needed to be clear (doing
some of the things which sched_cpu_dying() does, except that we need
it on !CONFIG_HOTPLUG_CPU systems too). The idea of notifiers also looks
interesting.


x86 and sched gurus, can you please help?

	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2018-12-11  9:08 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-04  7:15 + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree akpm
2018-12-04 10:10 ` Sergey Senozhatsky
2018-12-04 10:20 ` Petr Mladek
2018-12-04 15:49   ` Feng Tang
2018-12-04 16:01     ` Petr Mladek
2018-12-05  1:53       ` Feng Tang
2018-12-05  2:50         ` Sergey Senozhatsky
2018-12-05  3:05           ` Sergey Senozhatsky
2018-12-05  3:27             ` Feng Tang
2018-12-05  2:26     ` Sergey Senozhatsky
2018-12-05  2:47       ` Feng Tang
2018-12-05  2:57         ` Sergey Senozhatsky
2018-12-05  5:29           ` Sergey Senozhatsky
2018-12-05  8:00             ` Sergey Senozhatsky
2018-12-05 15:46               ` Feng Tang
2018-12-06  3:58                 ` Feng Tang
2018-12-07  9:50                   ` Sergey Senozhatsky
2018-12-10  9:45                     ` Feng Tang
2018-12-10 15:57                       ` Petr Mladek
2018-12-11  8:07                         ` Sergey Senozhatsky
2018-12-11  8:22                           ` Petr Mladek
2018-12-11  8:26                             ` Sergey Senozhatsky
2018-12-11  8:32                           ` Feng Tang
2018-12-11  9:08                             ` Sergey Senozhatsky
2018-12-11  8:00                       ` Sergey Senozhatsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.