linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christian Brauner <christian.brauner@ubuntu.com>
To: chenqiwu <qiwuchen55@gmail.com>
Cc: peterz@infradead.org, mingo@kernel.org, kernel-team@android.com,
	linux-kernel@vger.kernel.org, chenqiwu@xiaomi.com,
	oleg@redhat.com
Subject: Re: [PATCH] kernel/exit: do panic earlier to get coredump if global init task exit
Date: Sat, 14 Dec 2019 12:48:11 +0100	[thread overview]
Message-ID: <20191214114810.ftfxtx4qaa5nzji4@wittgenstein> (raw)
In-Reply-To: <20191214062704.GA5580@cqw-OptiPlex-7050>

On Sat, Dec 14, 2019 at 02:27:04PM +0800, chenqiwu wrote:
> On Thu, Dec 12, 2019 at 12:05:14PM +0100, Christian Brauner wrote:
> > On Thu, Dec 12, 2019 at 11:08:38AM +0100, Oleg Nesterov wrote:
> > > can't you use is_global_init() && group_dead ?
> > 
> > Seems reasonable.
> > Looks like we can move
> > group_dead = atomic_dec_and_test(&tsk->signal->live);
> > further up...
> > 
> > (Ideally I'd like to have a test for this to ensure that this lets you
> > capture a global init coredump but that might be tricky. But since you've
> > seem to have run into this case maybe you even have something that could
> > be turned into a test? (Similar to how we already have a purely opt-in
> > test for pstore.))
> > 
> > Christian
> 
> Hi all,
> I agree that using is_global_init() && group_dead is more reasonable.
> 
> The crash isuee happened on a Android phone by reboot stress test.
> panic log:
> [   84.048521] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [   84.048521]
> [   84.048540] CPU: 2 PID: 1 Comm: init Tainted: G S         O    4.14.117-perf-g8035d1a #1
> [   84.048544] Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 RAPHAEL (DT)
> [   84.048550] Call trace:
> [   84.048564]  dump_backtrace+0x0/0x268
> [   84.048569]  show_stack+0x14/0x20
> [   84.048577]  dump_stack+0xc4/0x100
> [   84.048584]  panic+0x1f0/0x410
> [   84.048591]  complete_and_exit+0x0/0x20
> [   84.048596]  do_group_exit+0x8c/0xa0
> [   84.048602]  get_signal+0x1c0/0x790
> [   84.048608]  do_notify_resume+0x184/0xc30
> [   84.048613]  work_pending+0x8/0x10
> 
> From the kdump loaded by crash utility, all threads of global init have exited,
> the group_dead value of global init has truned to 0 by atomic_dec_and_test().
> crash> ps init
>    PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
> >     1      0   2  ffffffcd77526000  ??   0.0       0      0  init
>     534      1   4  ffffffcd6b9a9000  ZO   0.0       0      0  init
>     535      1   4  ffffffcd6b9aa000  ZO   0.0       0      0  init
> crash> ps -g 1
> PID: 1      TASK: ffffffcd77526000  CPU: 2   COMMAND: "init"
>   (no threads)
> crash> struct task_struct.signal ffffffcd77526000
>   signal = 0xffffffcd77530000
> crash> struct signal_struct 0xffffffcd77530000
> struct signal_struct {
>   sigcnt = {
>     counter = 1
>   },
>   live = {
>     counter = 0
>   },
>   nr_threads = 1,
>   thread_head = {
>     next = 0xffffffcd77526730,
>     prev = 0xffffffcd77526730
>   },
>   group_exit_code = 11,
>   notify_count = 0,
>   group_exit_task = 0x0,
>   group_stop_count = 0,
>   flags = 4,
>   ...
>  }
> 
> However, as Christian said, the test for this is tricky since we must
> make sure all of init threads exited. I make a test for is_global_init()
> and send a SIGSEGV signal to global init task in userspace. The phone
> crash imeddiately and reboot to collect kdump. Then I extract the coredump
> of global init task from kdump successfully.
> (gdb) core-file core.1.init
> Core was generated by `/system/bin/init second_stage'.
> #0  _exit () at bionic/libc/arch-arm64/syscalls/_exit.S:9
> 9           cmn     x0, #(MAX_ERRNO + 1)
> (gdb) bt
> #0  _exit () at bionic/libc/arch-arm64/syscalls/_exit.S:9
> #1  0x00000055606db11c in android::init::InstallRebootSignalHandlers()::$_14::operator()(int) const (this=<optimized out>, signal=11)
>     at system/core/init/reboot_utils.cpp:141
> #2  0x00000055606db100 in android::init::InstallRebootSignalHandlers()::$_14::__invoke(int) (signal=11) at system/core/init/reboot_utils.cpp:138
> #3  0x0000007f8de236a0 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> 
> So from the following test, we have confidence that the following patch can help us

Thanks. Can you please resend this as a proper patch for v2?

Christian

      reply	other threads:[~2019-12-14 11:53 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-12  6:14 [PATCH] kernel/exit: do panic earlier to get coredump if global init task exit qiwuchen55
2019-12-12  9:51 ` Oleg Nesterov
2019-12-12 10:08   ` Oleg Nesterov
2019-12-12 11:05     ` Christian Brauner
2019-12-14  6:27       ` chenqiwu
2019-12-14 11:48         ` Christian Brauner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191214114810.ftfxtx4qaa5nzji4@wittgenstein \
    --to=christian.brauner@ubuntu.com \
    --cc=chenqiwu@xiaomi.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=qiwuchen55@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).