linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] panic: Ensure preemption is disabled during panic()
@ 2019-10-02 12:35 Will Deacon
  2019-10-02 20:58 ` Kees Cook
  2019-10-02 21:45 ` Andrew Morton
  0 siblings, 2 replies; 11+ messages in thread
From: Will Deacon @ 2019-10-02 12:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arm-kernel, contact, Will Deacon, Russell King,
	Greg Kroah-Hartman, Ingo Molnar, Kees Cook, Andrew Morton,
	stable

Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
calling CPU in an infinite loop, but with interrupts and preemption
enabled. From this state, userspace can continue to be scheduled,
despite the system being "dead" as far as the kernel is concerned. This
is easily reproducible on arm64 when booting with "nosmp" on the command
line; a couple of shell scripts print out a periodic "Ping" message
whilst another triggers a crash by writing to /proc/sysrq-trigger:

  | sysrq: Trigger a crash
  | Kernel panic - not syncing: sysrq triggered crash
  | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
  | Hardware name: linux,dummy-virt (DT)
  | Call trace:
  |  dump_backtrace+0x0/0x148
  |  show_stack+0x14/0x20
  |  dump_stack+0xa0/0xc4
  |  panic+0x140/0x32c
  |  sysrq_handle_reboot+0x0/0x20
  |  __handle_sysrq+0x124/0x190
  |  write_sysrq_trigger+0x64/0x88
  |  proc_reg_write+0x60/0xa8
  |  __vfs_write+0x18/0x40
  |  vfs_write+0xa4/0x1b8
  |  ksys_write+0x64/0xf0
  |  __arm64_sys_write+0x14/0x20
  |  el0_svc_common.constprop.0+0xb0/0x168
  |  el0_svc_handler+0x28/0x78
  |  el0_svc+0x8/0xc
  | Kernel Offset: disabled
  | CPU features: 0x0002,24002004
  | Memory Limit: none
  | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
  |  Ping 2!
  |  Ping 1!
  |  Ping 1!
  |  Ping 2!

The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise
local interrupts are disabled in 'smp_send_stop()'.

Disable preemption in 'panic()' before re-enabling interrupts.

Cc: Russell King <linux@armlinux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone
Reported-by: Xogium <contact@xogium.me>
Signed-off-by: Will Deacon <will@kernel.org>
---
 kernel/panic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/panic.c b/kernel/panic.c
index 47e8ebccc22b..f470a038b05b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -180,6 +180,7 @@ void panic(const char *fmt, ...)
 	 * after setting panic_cpu) from invoking panic() again.
 	 */
 	local_irq_disable();
+	preempt_disable_notrace();
 
 	/*
 	 * It's possible to come here directly from a panic-assertion and
-- 
2.23.0.444.g18eeb5a265-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] panic: Ensure preemption is disabled during panic()
  2019-10-02 12:35 [PATCH] panic: Ensure preemption is disabled during panic() Will Deacon
@ 2019-10-02 20:58 ` Kees Cook
  2019-10-03 20:56   ` Will Deacon
  2019-10-02 21:45 ` Andrew Morton
  1 sibling, 1 reply; 11+ messages in thread
From: Kees Cook @ 2019-10-02 20:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-arm-kernel, contact, Russell King,
	Greg Kroah-Hartman, Ingo Molnar, Andrew Morton, stable,
	Feng Tang, Petr Mladek, Steven Rostedt, Sergey Senozhatsky

On Wed, Oct 02, 2019 at 01:35:38PM +0100, Will Deacon wrote:
> Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
> calling CPU in an infinite loop, but with interrupts and preemption
> enabled. From this state, userspace can continue to be scheduled,
> despite the system being "dead" as far as the kernel is concerned. This
> is easily reproducible on arm64 when booting with "nosmp" on the command
> line; a couple of shell scripts print out a periodic "Ping" message
> whilst another triggers a crash by writing to /proc/sysrq-trigger:
> 
>   | sysrq: Trigger a crash
>   | Kernel panic - not syncing: sysrq triggered crash
>   | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
>   | Hardware name: linux,dummy-virt (DT)
>   | Call trace:
>   |  dump_backtrace+0x0/0x148
>   |  show_stack+0x14/0x20
>   |  dump_stack+0xa0/0xc4
>   |  panic+0x140/0x32c
>   |  sysrq_handle_reboot+0x0/0x20
>   |  __handle_sysrq+0x124/0x190
>   |  write_sysrq_trigger+0x64/0x88
>   |  proc_reg_write+0x60/0xa8
>   |  __vfs_write+0x18/0x40
>   |  vfs_write+0xa4/0x1b8
>   |  ksys_write+0x64/0xf0
>   |  __arm64_sys_write+0x14/0x20
>   |  el0_svc_common.constprop.0+0xb0/0x168
>   |  el0_svc_handler+0x28/0x78
>   |  el0_svc+0x8/0xc
>   | Kernel Offset: disabled
>   | CPU features: 0x0002,24002004
>   | Memory Limit: none
>   | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
>   |  Ping 2!
>   |  Ping 1!
>   |  Ping 1!
>   |  Ping 2!
> 
> The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise
> local interrupts are disabled in 'smp_send_stop()'.
> 
> Disable preemption in 'panic()' before re-enabling interrupts.

Is this perhaps the correct solution for what commit c39ea0b9dd24 ("panic:
avoid the extra noise dmesg") was trying to fix?

Either way:

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

> 
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: <stable@vger.kernel.org>
> Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone
> Reported-by: Xogium <contact@xogium.me>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  kernel/panic.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 47e8ebccc22b..f470a038b05b 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -180,6 +180,7 @@ void panic(const char *fmt, ...)
>  	 * after setting panic_cpu) from invoking panic() again.
>  	 */
>  	local_irq_disable();
> +	preempt_disable_notrace();
>  
>  	/*
>  	 * It's possible to come here directly from a panic-assertion and
> -- 
> 2.23.0.444.g18eeb5a265-goog
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] panic: Ensure preemption is disabled during panic()
  2019-10-02 12:35 [PATCH] panic: Ensure preemption is disabled during panic() Will Deacon
  2019-10-02 20:58 ` Kees Cook
@ 2019-10-02 21:45 ` Andrew Morton
  2019-10-03 20:53   ` Will Deacon
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2019-10-02 21:45 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-arm-kernel, contact, Russell King,
	Greg Kroah-Hartman, Ingo Molnar, Kees Cook, stable

On Wed,  2 Oct 2019 13:35:38 +0100 Will Deacon <will@kernel.org> wrote:

> Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
> calling CPU in an infinite loop, but with interrupts and preemption
> enabled. From this state, userspace can continue to be scheduled,
> despite the system being "dead" as far as the kernel is concerned. This
> is easily reproducible on arm64 when booting with "nosmp" on the command
> line; a couple of shell scripts print out a periodic "Ping" message
> whilst another triggers a crash by writing to /proc/sysrq-trigger:
> 
>   | sysrq: Trigger a crash
>   | Kernel panic - not syncing: sysrq triggered crash
>   | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
>   | Hardware name: linux,dummy-virt (DT)
>   | Call trace:
>   |  dump_backtrace+0x0/0x148
>   |  show_stack+0x14/0x20
>   |  dump_stack+0xa0/0xc4
>   |  panic+0x140/0x32c
>   |  sysrq_handle_reboot+0x0/0x20
>   |  __handle_sysrq+0x124/0x190
>   |  write_sysrq_trigger+0x64/0x88
>   |  proc_reg_write+0x60/0xa8
>   |  __vfs_write+0x18/0x40
>   |  vfs_write+0xa4/0x1b8
>   |  ksys_write+0x64/0xf0
>   |  __arm64_sys_write+0x14/0x20
>   |  el0_svc_common.constprop.0+0xb0/0x168
>   |  el0_svc_handler+0x28/0x78
>   |  el0_svc+0x8/0xc
>   | Kernel Offset: disabled
>   | CPU features: 0x0002,24002004
>   | Memory Limit: none
>   | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
>   |  Ping 2!
>   |  Ping 1!
>   |  Ping 1!
>   |  Ping 2!
> 
> The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise
> local interrupts are disabled in 'smp_send_stop()'.
> 
> Disable preemption in 'panic()' before re-enabling interrupts.
> 
> ...
>
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -180,6 +180,7 @@ void panic(const char *fmt, ...)
>  	 * after setting panic_cpu) from invoking panic() again.
>  	 */
>  	local_irq_disable();
> +	preempt_disable_notrace();
>  
>  	/*
>  	 * It's possible to come here directly from a panic-assertion and

We still do a lot of stuff (kexec, kgdb, etc) after this
preempt_disable() and I worry that something in there will now trigger
a might_sleep() warning as a result?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] panic: Ensure preemption is disabled during panic()
  2019-10-02 21:45 ` Andrew Morton
@ 2019-10-03 20:53   ` Will Deacon
  0 siblings, 0 replies; 11+ messages in thread
From: Will Deacon @ 2019-10-03 20:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-arm-kernel, contact, Russell King,
	Greg Kroah-Hartman, Ingo Molnar, Kees Cook, stable

Hi Andrew,

Thanks for having a look.

On Wed, Oct 02, 2019 at 02:45:58PM -0700, Andrew Morton wrote:
> On Wed,  2 Oct 2019 13:35:38 +0100 Will Deacon <will@kernel.org> wrote:
> > Disable preemption in 'panic()' before re-enabling interrupts.
> > 
> > ...
> >
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -180,6 +180,7 @@ void panic(const char *fmt, ...)
> >  	 * after setting panic_cpu) from invoking panic() again.
> >  	 */
> >  	local_irq_disable();
> > +	preempt_disable_notrace();
> >  
> >  	/*
> >  	 * It's possible to come here directly from a panic-assertion and
> 
> We still do a lot of stuff (kexec, kgdb, etc) after this
> preempt_disable() and I worry that something in there will now trigger
> a might_sleep() warning as a result?

Given that interrupts are already disabled at this point, I don't think
we'll get any additional warnings here by disabling preemption as well.

Will

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] panic: Ensure preemption is disabled during panic()
  2019-10-02 20:58 ` Kees Cook
@ 2019-10-03 20:56   ` Will Deacon
  2019-10-04  9:11     ` Petr Mladek
  0 siblings, 1 reply; 11+ messages in thread
From: Will Deacon @ 2019-10-03 20:56 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, linux-arm-kernel, contact, Russell King,
	Greg Kroah-Hartman, Ingo Molnar, Andrew Morton, stable,
	Feng Tang, Petr Mladek, Steven Rostedt, Sergey Senozhatsky

Hi Kees,

On Wed, Oct 02, 2019 at 01:58:46PM -0700, Kees Cook wrote:
> On Wed, Oct 02, 2019 at 01:35:38PM +0100, Will Deacon wrote:
> > Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
> > calling CPU in an infinite loop, but with interrupts and preemption
> > enabled. From this state, userspace can continue to be scheduled,
> > despite the system being "dead" as far as the kernel is concerned. This
> > is easily reproducible on arm64 when booting with "nosmp" on the command
> > line; a couple of shell scripts print out a periodic "Ping" message
> > whilst another triggers a crash by writing to /proc/sysrq-trigger:
> > 
> >   | sysrq: Trigger a crash
> >   | Kernel panic - not syncing: sysrq triggered crash
> >   | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
> >   | Hardware name: linux,dummy-virt (DT)
> >   | Call trace:
> >   |  dump_backtrace+0x0/0x148
> >   |  show_stack+0x14/0x20
> >   |  dump_stack+0xa0/0xc4
> >   |  panic+0x140/0x32c
> >   |  sysrq_handle_reboot+0x0/0x20
> >   |  __handle_sysrq+0x124/0x190
> >   |  write_sysrq_trigger+0x64/0x88
> >   |  proc_reg_write+0x60/0xa8
> >   |  __vfs_write+0x18/0x40
> >   |  vfs_write+0xa4/0x1b8
> >   |  ksys_write+0x64/0xf0
> >   |  __arm64_sys_write+0x14/0x20
> >   |  el0_svc_common.constprop.0+0xb0/0x168
> >   |  el0_svc_handler+0x28/0x78
> >   |  el0_svc+0x8/0xc
> >   | Kernel Offset: disabled
> >   | CPU features: 0x0002,24002004
> >   | Memory Limit: none
> >   | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
> >   |  Ping 2!
> >   |  Ping 1!
> >   |  Ping 1!
> >   |  Ping 2!
> > 
> > The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise
> > local interrupts are disabled in 'smp_send_stop()'.
> > 
> > Disable preemption in 'panic()' before re-enabling interrupts.
> 
> Is this perhaps the correct solution for what commit c39ea0b9dd24 ("panic:
> avoid the extra noise dmesg") was trying to fix?

Hmm, maybe, although that looks like it's focussed more on irq handling
than preemption. I've deliberately left the irq part alone, since I think
having magic sysrq work via the keyboard interrupt is desirable from the
panic loop.

> Reviewed-by: Kees Cook <keescook@chromium.org>

Thanks!

Will

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] panic: Ensure preemption is disabled during panic()
  2019-10-03 20:56   ` Will Deacon
@ 2019-10-04  9:11     ` Petr Mladek
  2019-10-04  9:29       ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Mladek @ 2019-10-04  9:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: Kees Cook, Russell King, Sergey Senozhatsky, Steven Rostedt,
	Feng Tang, Andrew Morton, Greg Kroah-Hartman, linux-arm-kernel,
	Ingo Molnar, linux-kernel, stable, contact

On Thu 2019-10-03 21:56:34, Will Deacon wrote:
> Hi Kees,
> 
> On Wed, Oct 02, 2019 at 01:58:46PM -0700, Kees Cook wrote:
> > On Wed, Oct 02, 2019 at 01:35:38PM +0100, Will Deacon wrote:
> > > Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
> > > calling CPU in an infinite loop, but with interrupts and preemption
> > > enabled. From this state, userspace can continue to be scheduled,
> > > despite the system being "dead" as far as the kernel is concerned. This
> > > is easily reproducible on arm64 when booting with "nosmp" on the command
> > > line; a couple of shell scripts print out a periodic "Ping" message
> > > whilst another triggers a crash by writing to /proc/sysrq-trigger:
> > > 
> > >   | sysrq: Trigger a crash
> > >   | Kernel panic - not syncing: sysrq triggered crash
> > >   | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
> > >   | Hardware name: linux,dummy-virt (DT)
> > >   | Call trace:
> > >   |  dump_backtrace+0x0/0x148
> > >   |  show_stack+0x14/0x20
> > >   |  dump_stack+0xa0/0xc4
> > >   |  panic+0x140/0x32c
> > >   |  sysrq_handle_reboot+0x0/0x20
> > >   |  __handle_sysrq+0x124/0x190
> > >   |  write_sysrq_trigger+0x64/0x88
> > >   |  proc_reg_write+0x60/0xa8
> > >   |  __vfs_write+0x18/0x40
> > >   |  vfs_write+0xa4/0x1b8
> > >   |  ksys_write+0x64/0xf0
> > >   |  __arm64_sys_write+0x14/0x20
> > >   |  el0_svc_common.constprop.0+0xb0/0x168
> > >   |  el0_svc_handler+0x28/0x78
> > >   |  el0_svc+0x8/0xc
> > >   | Kernel Offset: disabled
> > >   | CPU features: 0x0002,24002004
> > >   | Memory Limit: none
> > >   | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
> > >   |  Ping 2!
> > >   |  Ping 1!
> > >   |  Ping 1!
> > >   |  Ping 2!
> > > 
> > > The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise
> > > local interrupts are disabled in 'smp_send_stop()'.
> > > 
> > > Disable preemption in 'panic()' before re-enabling interrupts.
> > 
> > Is this perhaps the correct solution for what commit c39ea0b9dd24 ("panic:
> > avoid the extra noise dmesg") was trying to fix?
> 
> Hmm, maybe, although that looks like it's focussed more on irq handling
> than preemption.

Exactly, the backtrace mentioned in commit c39ea0b9dd24 ("panic: avoid
the extra noise dmesg") is printed by wake_up() called from
wake_up_klogd_work_func(). It is irq_work. Therefore disabling
preemption would not prevent this.


> I've deliberately left the irq part alone, since I think
> having magic sysrq work via the keyboard interrupt is desirable from the
> panic loop.

I agree that we should keep sysrq working.

One pity thing is that led_panic_blink() in
leds/drivers/trigger/ledtrig-panic.c uses workqueues:

  + led_panic_blink()
    + led_trigger_event()
      + led_set_brightness()
	+ schedule_work()

It means that it depends on the scheduler. I guess that it
does not work in many panic situations. But this patch
will always block it.

I agree that it is strange that userspace still works at
this stage. But does it cause any real problems?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] panic: Ensure preemption is disabled during panic()
  2019-10-04  9:11     ` Petr Mladek
@ 2019-10-04  9:29       ` Russell King - ARM Linux admin
  2019-10-04 10:49         ` Will Deacon
  0 siblings, 1 reply; 11+ messages in thread
From: Russell King - ARM Linux admin @ 2019-10-04  9:29 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Will Deacon, Kees Cook, Sergey Senozhatsky, Steven Rostedt,
	Feng Tang, Andrew Morton, Greg Kroah-Hartman, linux-arm-kernel,
	Ingo Molnar, linux-kernel, stable, contact

On Fri, Oct 04, 2019 at 11:11:42AM +0200, Petr Mladek wrote:
> On Thu 2019-10-03 21:56:34, Will Deacon wrote:
> > Hi Kees,
> > 
> > On Wed, Oct 02, 2019 at 01:58:46PM -0700, Kees Cook wrote:
> > > On Wed, Oct 02, 2019 at 01:35:38PM +0100, Will Deacon wrote:
> > > > Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
> > > > calling CPU in an infinite loop, but with interrupts and preemption
> > > > enabled. From this state, userspace can continue to be scheduled,
> > > > despite the system being "dead" as far as the kernel is concerned. This
> > > > is easily reproducible on arm64 when booting with "nosmp" on the command
> > > > line; a couple of shell scripts print out a periodic "Ping" message
> > > > whilst another triggers a crash by writing to /proc/sysrq-trigger:
> > > > 
> > > >   | sysrq: Trigger a crash
> > > >   | Kernel panic - not syncing: sysrq triggered crash
> > > >   | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
> > > >   | Hardware name: linux,dummy-virt (DT)
> > > >   | Call trace:
> > > >   |  dump_backtrace+0x0/0x148
> > > >   |  show_stack+0x14/0x20
> > > >   |  dump_stack+0xa0/0xc4
> > > >   |  panic+0x140/0x32c
> > > >   |  sysrq_handle_reboot+0x0/0x20
> > > >   |  __handle_sysrq+0x124/0x190
> > > >   |  write_sysrq_trigger+0x64/0x88
> > > >   |  proc_reg_write+0x60/0xa8
> > > >   |  __vfs_write+0x18/0x40
> > > >   |  vfs_write+0xa4/0x1b8
> > > >   |  ksys_write+0x64/0xf0
> > > >   |  __arm64_sys_write+0x14/0x20
> > > >   |  el0_svc_common.constprop.0+0xb0/0x168
> > > >   |  el0_svc_handler+0x28/0x78
> > > >   |  el0_svc+0x8/0xc
> > > >   | Kernel Offset: disabled
> > > >   | CPU features: 0x0002,24002004
> > > >   | Memory Limit: none
> > > >   | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
> > > >   |  Ping 2!
> > > >   |  Ping 1!
> > > >   |  Ping 1!
> > > >   |  Ping 2!
> > > > 
> > > > The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise
> > > > local interrupts are disabled in 'smp_send_stop()'.
> > > > 
> > > > Disable preemption in 'panic()' before re-enabling interrupts.
> > > 
> > > Is this perhaps the correct solution for what commit c39ea0b9dd24 ("panic:
> > > avoid the extra noise dmesg") was trying to fix?
> > 
> > Hmm, maybe, although that looks like it's focussed more on irq handling
> > than preemption.
> 
> Exactly, the backtrace mentioned in commit c39ea0b9dd24 ("panic: avoid
> the extra noise dmesg") is printed by wake_up() called from
> wake_up_klogd_work_func(). It is irq_work. Therefore disabling
> preemption would not prevent this.
> 
> 
> > I've deliberately left the irq part alone, since I think
> > having magic sysrq work via the keyboard interrupt is desirable from the
> > panic loop.
> 
> I agree that we should keep sysrq working.
> 
> One pity thing is that led_panic_blink() in
> leds/drivers/trigger/ledtrig-panic.c uses workqueues:
> 
>   + led_panic_blink()
>     + led_trigger_event()
>       + led_set_brightness()
> 	+ schedule_work()
> 
> It means that it depends on the scheduler. I guess that it
> does not work in many panic situations. But this patch
> will always block it.
> 
> I agree that it is strange that userspace still works at
> this stage. But does it cause any real problems?

Yes, there are watchdog drivers that continue to pat their watchdog
after the kernel has panic'd.  It makes watchdogs useless (which is
exactly how this problem was discovered.)

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] panic: Ensure preemption is disabled during panic()
  2019-10-04  9:29       ` Russell King - ARM Linux admin
@ 2019-10-04 10:49         ` Will Deacon
  2019-10-04 11:15           ` Petr Mladek
  2019-10-07  8:02           ` Jiri Kosina
  0 siblings, 2 replies; 11+ messages in thread
From: Will Deacon @ 2019-10-04 10:49 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Petr Mladek, Kees Cook, Sergey Senozhatsky, Steven Rostedt,
	Feng Tang, Andrew Morton, Greg Kroah-Hartman, linux-arm-kernel,
	Ingo Molnar, linux-kernel, stable, contact

On Fri, Oct 04, 2019 at 10:29:17AM +0100, Russell King - ARM Linux admin wrote:
> On Fri, Oct 04, 2019 at 11:11:42AM +0200, Petr Mladek wrote:
> > On Thu 2019-10-03 21:56:34, Will Deacon wrote:
> > > I've deliberately left the irq part alone, since I think
> > > having magic sysrq work via the keyboard interrupt is desirable from the
> > > panic loop.
> > 
> > I agree that we should keep sysrq working.
> > 
> > One pity thing is that led_panic_blink() in
> > leds/drivers/trigger/ledtrig-panic.c uses workqueues:
> > 
> >   + led_panic_blink()
> >     + led_trigger_event()
> >       + led_set_brightness()
> > 	+ schedule_work()
> > 
> > It means that it depends on the scheduler. I guess that it
> > does not work in many panic situations. But this patch
> > will always block it.
> > 
> > I agree that it is strange that userspace still works at
> > this stage. But does it cause any real problems?
> 
> Yes, there are watchdog drivers that continue to pat their watchdog
> after the kernel has panic'd.  It makes watchdogs useless (which is
> exactly how this problem was discovered.)

Indeed, and I think the LED blinking is already unreliable if the
brightness operation needs to sleep. For example, if the kernel isn't
preemptible or the work gets queued up on a different CPU which is
sitting in panic_smp_self_stop().

Will

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] panic: Ensure preemption is disabled during panic()
  2019-10-04 10:49         ` Will Deacon
@ 2019-10-04 11:15           ` Petr Mladek
  2019-10-04 13:51             ` Feng Tang
  2019-10-07  8:02           ` Jiri Kosina
  1 sibling, 1 reply; 11+ messages in thread
From: Petr Mladek @ 2019-10-04 11:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: Russell King - ARM Linux admin, Kees Cook, Sergey Senozhatsky,
	Steven Rostedt, Feng Tang, Andrew Morton, Greg Kroah-Hartman,
	linux-arm-kernel, Ingo Molnar, linux-kernel, stable, contact

On Fri 2019-10-04 11:49:48, Will Deacon wrote:
> On Fri, Oct 04, 2019 at 10:29:17AM +0100, Russell King - ARM Linux admin wrote:
> > On Fri, Oct 04, 2019 at 11:11:42AM +0200, Petr Mladek wrote:
> > > On Thu 2019-10-03 21:56:34, Will Deacon wrote:
> > > > I've deliberately left the irq part alone, since I think
> > > > having magic sysrq work via the keyboard interrupt is desirable from the
> > > > panic loop.
> > > 
> > > I agree that we should keep sysrq working.
> > > 
> > > One pity thing is that led_panic_blink() in
> > > leds/drivers/trigger/ledtrig-panic.c uses workqueues:
> > > 
> > >   + led_panic_blink()
> > >     + led_trigger_event()
> > >       + led_set_brightness()
> > > 	+ schedule_work()
> > > 
> > > It means that it depends on the scheduler. I guess that it
> > > does not work in many panic situations. But this patch
> > > will always block it.
> > > 
> > > I agree that it is strange that userspace still works at
> > > this stage. But does it cause any real problems?
> > 
> > Yes, there are watchdog drivers that continue to pat their watchdog
> > after the kernel has panic'd.  It makes watchdogs useless (which is
> > exactly how this problem was discovered.)
> 
> Indeed, and I think the LED blinking is already unreliable if the
> brightness operation needs to sleep. For example, if the kernel isn't
> preemptible or the work gets queued up on a different CPU which is
> sitting in panic_smp_self_stop().

To make it clear. I do not want to block this patch. I just wanted
to point out the problem. I am not sure how the blinking is important
these days. Well, I could imagine that it might be useful on some
embedded devices.

Another question is how many people want to end up with dead system
these days. The watchdogs are likely used in data centers. I guess
that automatic reboot in panic() is a better choice there.

Anyway, it might make sense to remove the panic blinking code when
it will not have a chance to work.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] panic: Ensure preemption is disabled during panic()
  2019-10-04 11:15           ` Petr Mladek
@ 2019-10-04 13:51             ` Feng Tang
  0 siblings, 0 replies; 11+ messages in thread
From: Feng Tang @ 2019-10-04 13:51 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Will Deacon, Russell King - ARM Linux admin, Kees Cook,
	Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
	Greg Kroah-Hartman, linux-arm-kernel, Ingo Molnar, linux-kernel,
	stable, contact

On Fri, Oct 04, 2019 at 01:15:21PM +0200, Petr Mladek wrote:
> On Fri 2019-10-04 11:49:48, Will Deacon wrote:
> > On Fri, Oct 04, 2019 at 10:29:17AM +0100, Russell King - ARM Linux admin wrote:
> > > On Fri, Oct 04, 2019 at 11:11:42AM +0200, Petr Mladek wrote:
> > > > On Thu 2019-10-03 21:56:34, Will Deacon wrote:
> > > > > I've deliberately left the irq part alone, since I think
> > > > > having magic sysrq work via the keyboard interrupt is desirable from the
> > > > > panic loop.
> > > > 
> > > > I agree that we should keep sysrq working.
> > > > 
> > > > One pity thing is that led_panic_blink() in
> > > > leds/drivers/trigger/ledtrig-panic.c uses workqueues:
> > > > 
> > > >   + led_panic_blink()
> > > >     + led_trigger_event()
> > > >       + led_set_brightness()
> > > > 	+ schedule_work()
> > > > 
> > > > It means that it depends on the scheduler. I guess that it
> > > > does not work in many panic situations. But this patch
> > > > will always block it.
> > > > 
> > > > I agree that it is strange that userspace still works at
> > > > this stage. But does it cause any real problems?
> > > 
> > > Yes, there are watchdog drivers that continue to pat their watchdog
> > > after the kernel has panic'd.  It makes watchdogs useless (which is
> > > exactly how this problem was discovered.)
> > 
> > Indeed, and I think the LED blinking is already unreliable if the
> > brightness operation needs to sleep. For example, if the kernel isn't
> > preemptible or the work gets queued up on a different CPU which is
> > sitting in panic_smp_self_stop().
> 
> To make it clear. I do not want to block this patch. I just wanted
> to point out the problem. I am not sure how the blinking is important
> these days. Well, I could imagine that it might be useful on some
> embedded devices.

When reviewing the c39ea0b9dd24 ("panic: avoid the extra noise dmesg"),
there was similar discussion about what's the expectation for kernel
when panic happens, as the earlier version patch is simply keeping the
the local irq disabled, which may break the sysrq and the panic blink
code, so we turned to handling it together with printk.

> 
> Another question is how many people want to end up with dead system
> these days. The watchdogs are likely used in data centers. I guess
> that automatic reboot in panic() is a better choice there.
> 
> Anyway, it might make sense to remove the panic blinking code when
> it will not have a chance to work.

I was also wondering if the panic blinking code still really works
on any platforms.

Thanks,
Feng

> 
> Best Regards,
> Petr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] panic: Ensure preemption is disabled during panic()
  2019-10-04 10:49         ` Will Deacon
  2019-10-04 11:15           ` Petr Mladek
@ 2019-10-07  8:02           ` Jiri Kosina
  1 sibling, 0 replies; 11+ messages in thread
From: Jiri Kosina @ 2019-10-07  8:02 UTC (permalink / raw)
  To: Will Deacon
  Cc: Russell King - ARM Linux admin, Petr Mladek, Kees Cook,
	Sergey Senozhatsky, Steven Rostedt, Feng Tang, Andrew Morton,
	Greg Kroah-Hartman, linux-arm-kernel, Ingo Molnar, linux-kernel,
	stable, contact

On Fri, 4 Oct 2019, Will Deacon wrote:

> Indeed, and I think the LED blinking is already unreliable if the
> brightness operation needs to sleep. 

One thing is that led_set_brightness() can probably be forced to avoid the 
workqueue scheduling, by setting LED_BLINK_SW on the device (e.g. by 
issuing led_set_software_blink() during panic).

But I am afraid this still won't solve the issue completely, as USB 
keyboards need workqueues for blinking the LEDs in for URB management.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-10-07  8:02 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-02 12:35 [PATCH] panic: Ensure preemption is disabled during panic() Will Deacon
2019-10-02 20:58 ` Kees Cook
2019-10-03 20:56   ` Will Deacon
2019-10-04  9:11     ` Petr Mladek
2019-10-04  9:29       ` Russell King - ARM Linux admin
2019-10-04 10:49         ` Will Deacon
2019-10-04 11:15           ` Petr Mladek
2019-10-04 13:51             ` Feng Tang
2019-10-07  8:02           ` Jiri Kosina
2019-10-02 21:45 ` Andrew Morton
2019-10-03 20:53   ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).