linux-kernel.vger.kernel.org archive mirror
* [question] panic() during reboot -f (reboot syscall)
@ 2019-03-06 13:29 Petr Mladek
  2019-03-10 17:56 ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: Petr Mladek @ 2019-03-06 13:29 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Rafael J. Wysocky, Andrew Morton,
	Eric W. Biederman, linux-ext4, Thomas Gleixner, Andy Shevchenko,
	Peter Zijlstra, Jan Kara

Hello,

I wonder if it is "normal" to get panic() when the system is rebooted
using "reboot -f". It looks a bit weird to me.

In our case, the panic() was triggered from the code of an ext4
filesystem that was mounted with "errors=panic":

  crash> bt
  PID: 3984   TASK: ffff887db1f6c180  CPU: 32  COMMAND: "bash"
  #0 [ffff887e637bf9a8] machine_kexec at ffffffff81059c5c
  #1 [ffff887e637bf9f8] __crash_kexec at ffffffff81119e0a
  #2 [ffff887e637bfab8] panic at ffffffff81193c31
  #3 [ffff887e637bfb30] ext4_handle_error at ffffffffa0229faa [ext4]
  #4 [ffff887e637bfb40] __ext4_error_inode at ffffffffa022a12a [ext4]
  #5 [ffff887e637bfbe0] __ext4_get_inode_loc at ffffffffa02096a5 [ext4]
  #6 [ffff887e637bfc40] ext4_iget at ffffffffa020c028 [ext4]
  #7 [ffff887e637bfcc0] ext4_lookup at ffffffffa0216ca0 [ext4]
  #8 [ffff887e637bfce8] lookup_real at ffffffff81218e3f
  #9 [ffff887e637bfd00] __lookup_hash at ffffffff8121916f
  #10 [ffff887e637bfd20] walk_component at ffffffff8121b50f
  #11 [ffff887e637bfd70] path_lookupat at ffffffff8121ca30
  #12 [ffff887e637bfd98] filename_lookup at ffffffff8121e58c
  #13 [ffff887e637bfe98] vfs_fstatat at ffffffff81214549
  #14 [ffff887e637bfed8] SYSC_newstat at ffffffff812149ca
  #15 [ffff887e637bff50] entry_SYSCALL_64_fastpath at ffffffff8161de61
      RIP: 00007f9db8d3ebe5  RSP: 00007ffda081cf68  RFLAGS: 00000246
      RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f9db8d3ebe5
      RDX: 00000000013c7fa0  RSI: 00000000013c7fa0  RDI: 00000000013c7f40
      RBP: 00007f9db943bee0   R8: 00000000013c7f40   R9: 00000000000b0000
      R10: 000000007af2c337  R11: 0000000000000246  R12: 00000000013c7fa0
      R13: 00000000013c7fa0  R14: 0000000000000008  R15: 00000000013c7f80
      ORIG_RAX: 0000000000000004  CS: 0033  SS: 002b


Now, "reboot -f" just calls the reboot() syscall. I do not see
anything that would stop processes. It intentionally does not even
stop the other CPUs, see the commit cf7df378aa4ff7da
("reboot: rigrate shutdown/reboot to boot cpu").

But it shuts down devices very early, via:

  + kernel_restart()
    + kernel_restart_prepare()
      + blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
      + device_shutdown()

As a result, devices are shut down while processes are still running.
Filesystem code returns errors because the underlying disk device was
removed. This causes panic() because of the "errors=panic" mount option.
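
The sequence described above can be modelled in a few lines of userspace
C. This is only a toy sketch, not kernel code: struct toy_system and the
toy_*() helpers are hypothetical names made up for illustration, and the
-EIO return value is an assumption about how the failed read surfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model of the sequence above; none of this is kernel code. */
struct toy_system {
	bool disk_present;  /* cleared by the modelled device_shutdown() */
	bool errors_panic;  /* models the ext4 "errors=panic" mount option */
	bool panicked;      /* set when the modelled panic() would fire */
};

/* Models kernel_restart_prepare(): devices go away, tasks keep running. */
static void toy_restart_prepare(struct toy_system *sys)
{
	/* The reboot notifier chain would run here. */
	sys->disk_present = false; /* modelled device_shutdown() */
}

/* Models a stat() issued by a task that is still running. */
static int toy_stat(struct toy_system *sys)
{
	if (sys->disk_present)
		return 0;
	if (sys->errors_panic)
		sys->panicked = true; /* modelled ext4_handle_error() -> panic() */
	return -EIO;
}

/* Walks through the race: the same lookup works before and panics after. */
static void toy_race_demo(void)
{
	struct toy_system sys = { .disk_present = true, .errors_panic = true };

	assert(toy_stat(&sys) == 0);    /* disk still there */
	toy_restart_prepare(&sys);      /* reboot() syscall in progress */
	assert(toy_stat(&sys) == -EIO); /* lookup now fails */
	assert(sys.panicked);           /* and "errors=panic" fires */
}
```

Calling toy_race_demo() runs the three steps in order. With
errors_panic set to false, the second toy_stat() still fails but
nothing panics.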


My understanding is that userspace is responsible for a "clean" reboot.
The "reboot" command normally stops services, kills processes,
syncs disks, and unmounts filesystems before it calls the "reboot"
syscall.

In other words, it looks like the panic() is possible by design.
But it looks a bit weird. Any opinion?

Best Regards,
Petr


* Re: [question] panic() during reboot -f (reboot syscall)
  2019-03-06 13:29 [question] panic() during reboot -f (reboot syscall) Petr Mladek
@ 2019-03-10 17:56 ` Linus Torvalds
  2019-03-12 21:29   ` Eric W. Biederman
  0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2019-03-10 17:56 UTC
  To: Petr Mladek
  Cc: Linux List Kernel Mailing, Rafael J. Wysocky, Andrew Morton,
	Eric W. Biederman, linux-ext4, Thomas Gleixner, Andy Shevchenko,
	Peter Zijlstra, Jan Kara

On Wed, Mar 6, 2019 at 5:29 AM Petr Mladek <pmladek@suse.com> wrote:
>
> I wonder if it is "normal" to get panic() when the system is rebooted
> using "reboot -f". It looks a bit weird to me.

No, a panic is never normal (except possibly for test modules etc, of course).

> Now, "reboot -f" just calls the reboot() syscall. I do not see
> anything that would stop processes.

There isn't supposed to be anything. It's meant for "things are
screwed up, just reboot *now* without doing anything else".

The "reboot now" is basically meant to be a poor man's power cycle.

> But it shuts down devices very early, via:
>
>   + kernel_restart()
>     + kernel_restart_prepare()
>       + blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
>       + device_shutdown()

The problem is that there are conflicting goals here, and the kernel
doesn't even *know* if this is supposed to be a normal clean reboot,
or a "reboot -f" that just shuts down everything.

On a nice clean reboot (where init has shut everything down) we
obviously _do_ want to shut devices down etc. Quite often you need to
do it just to make sure they come up nicely again (because the
firmware isn't even always re-initializing things properly on a soft
reboot).

But on a "reboot -f", user space _hasn't_ cleaned up, and just wants
things to reboot. But the kernel doesn't really know. It just gets the
reboot system call in both cases.

> In other words, it looks like the panic() is possible by design.
> But it looks a bit weird. Any opinion?

It's definitely not "by design", but it might be unavoidable in this case.

Of course, "unavoidable" is relative. There could be workarounds that
are reasonably ok in practice.

Like having the filesystem panic code see "oh, system_state isn't
SYSTEM_RUNNING, so I shouldn't be panicking".
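
A rough userspace sketch of that kind of check follows. It is not the
ext4 code: toy_handle_fs_error() and enum toy_fs_action are hypothetical
names, and the enum below redeclares only a subset of the kernel's
system_state values so that the example compiles on its own. It assumes
that system_state has already left SYSTEM_RUNNING by the time devices
are shut down.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of the kernel's system_state values, redeclared for this toy. */
enum system_states { SYSTEM_RUNNING, SYSTEM_RESTART };

/* Hypothetical outcome of a filesystem error handler. */
enum toy_fs_action { TOY_FS_REPORT_ONLY, TOY_FS_PANIC };

/*
 * Decide what to do on a filesystem error. "errors_panic" models the
 * "errors=panic" mount option; the panic is suppressed once the system
 * is no longer in SYSTEM_RUNNING.
 */
static enum toy_fs_action toy_handle_fs_error(enum system_states state,
					      bool errors_panic)
{
	if (!errors_panic)
		return TOY_FS_REPORT_ONLY;
	if (state != SYSTEM_RUNNING)
		return TOY_FS_REPORT_ONLY; /* reboot under way, do not panic */
	return TOY_FS_PANIC;
}

/* Shows the three interesting combinations. */
static void toy_workaround_demo(void)
{
	assert(toy_handle_fs_error(SYSTEM_RUNNING, true) == TOY_FS_PANIC);
	assert(toy_handle_fs_error(SYSTEM_RESTART, true) == TOY_FS_REPORT_ONLY);
	assert(toy_handle_fs_error(SYSTEM_RUNNING, false) == TOY_FS_REPORT_ONLY);
}
```

The trade-off is that a genuine corruption detected during shutdown
would only be reported and would no longer panic.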

                Linus


* Re: [question] panic() during reboot -f (reboot syscall)
  2019-03-10 17:56 ` Linus Torvalds
@ 2019-03-12 21:29   ` Eric W. Biederman
  2019-03-13  8:23     ` Peter Zijlstra
  0 siblings, 1 reply; 4+ messages in thread
From: Eric W. Biederman @ 2019-03-12 21:29 UTC
  To: Linus Torvalds
  Cc: Petr Mladek, Linux List Kernel Mailing, Rafael J. Wysocky,
	Andrew Morton, linux-ext4, Thomas Gleixner, Andy Shevchenko,
	Peter Zijlstra, Jan Kara

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Like having the filesystem panic code see "oh, system_state isn't
> SYSTEM_RUNNING, so I shouldn't be panicking".

I wonder if there is an easy way to get the scheduler to not schedule
userspace processes once the reboot system call has started.  That
sounds like the simple way to avoid this kind of confusion.

Eric



* Re: [question] panic() during reboot -f (reboot syscall)
  2019-03-12 21:29   ` Eric W. Biederman
@ 2019-03-13  8:23     ` Peter Zijlstra
  0 siblings, 0 replies; 4+ messages in thread
From: Peter Zijlstra @ 2019-03-13  8:23 UTC
  To: Eric W. Biederman
  Cc: Linus Torvalds, Petr Mladek, Linux List Kernel Mailing,
	Rafael J. Wysocky, Andrew Morton, linux-ext4, Thomas Gleixner,
	Andy Shevchenko, Jan Kara

On Tue, Mar 12, 2019 at 04:29:25PM -0500, Eric W. Biederman wrote:
> I wonder if there is an easy way to get the scheduler to not schedule
> userspace processes once the reboot system call has started.  That
> sounds like the simple way to avoid this kind of confusion.

That sounds like adding code to a hotpath that is 'never' used.
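
To make the cost concrete, here is a toy userspace sketch of what such a
check might look like. It has nothing to do with the real scheduler:
struct toy_task, toy_reboot_in_progress and toy_pick_next() are
hypothetical names, and the linear scan only stands in for a real
pick-next decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal task descriptor. */
struct toy_task {
	const char *name;
	bool is_user; /* true for a userspace process */
};

/* Would be set when the reboot system call starts. */
static bool toy_reboot_in_progress;

/* Return the first eligible task, or NULL if the CPU should idle. */
static const struct toy_task *toy_pick_next(const struct toy_task *rq, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* This branch runs on every pick, reboot or not. */
		if (toy_reboot_in_progress && rq[i].is_user)
			continue;
		return &rq[i];
	}
	return NULL;
}

/* Before reboot the user task is picked; afterwards it is skipped. */
static void toy_sched_demo(void)
{
	const struct toy_task rq[] = {
		{ .name = "bash", .is_user = true },
		{ .name = "kworker", .is_user = false },
	};

	toy_reboot_in_progress = false;
	assert(toy_pick_next(rq, 2) == &rq[0]);
	toy_reboot_in_progress = true;
	assert(toy_pick_next(rq, 2) == &rq[1]);
}
```

The extra branch is evaluated on every scheduling decision for the
whole uptime of the machine, yet it only ever matters during the last
moments before a forced reboot.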
