linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] kernel/exit: do panic earlier to get coredump if global init task exit
@ 2019-12-16  3:18 qiwuchen55
  2019-12-16 10:27 ` Christian Brauner
  2019-12-16 17:28 ` Oleg Nesterov
  0 siblings, 2 replies; 11+ messages in thread
From: qiwuchen55 @ 2019-12-16  3:18 UTC (permalink / raw)
  To: christian.brauner, peterz, mingo, oleg
  Cc: kernel-team, linux-kernel, chenqiwu

From: chenqiwu <chenqiwu@xiaomi.com>

When global init task get a chance to be killed, panic will happen in
later calling steps by do_exit()->exit_notify()->forget_original_parent()
->find_child_reaper() if all init threads have exited.

However, it's hard to extract the coredump of init task from a kernel
crashdump, since exit_mm() has released its mm before panic. In order
to get the backtrace of init task in userspace, it's better to do panic
earlier at the beginning of exitting route.

It's worth noting that we must take case of a multi-threaded init exitting
issue. We need the test for is_global_init() && group_dead to ensure that
it is all of init threads exiting and not just the current thread.

Signed-off-by: chenqiwu <chenqiwu@xiaomi.com>
---
changes in v2:
 - using is_global_init() && group_dead as panic condition.
 - move up group_dead = atomic_dec_and_test(&tsk->signal->live).
 - add comment for this change in do_exit().
---
 kernel/exit.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index bcbd598..33364c8 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -517,10 +517,6 @@ static struct task_struct *find_child_reaper(struct task_struct *father,
 	}
 
 	write_unlock_irq(&tasklist_lock);
-	if (unlikely(pid_ns == &init_pid_ns)) {
-		panic("Attempted to kill init! exitcode=0x%08x\n",
-			father->signal->group_exit_code ?: father->exit_code);
-	}
 
 	list_for_each_entry_safe(p, n, dead, ptrace_entry) {
 		list_del_init(&p->ptrace_entry);
@@ -728,6 +724,14 @@ void __noreturn do_exit(long code)
 		panic("Attempted to kill the idle task!");
 
 	/*
+	 * If all threads of global init have exited, do panic imeddiately
+	 * to get the coredump to find any clue for init task in userspace.
+	 */
+	group_dead = atomic_dec_and_test(&tsk->signal->live);
+	if (unlikely(is_global_init(tsk) && group_dead))
+		panic("Attempted to kill init! exitcode=0x%08lx\n", code);
+
+	/*
 	 * If do_exit is called because this processes oopsed, it's possible
 	 * that get_fs() was left as KERNEL_DS, so reset it to USER_DS before
 	 * continuing. Amongst other possible reasons, this is to prevent
@@ -764,7 +768,6 @@ void __noreturn do_exit(long code)
 	if (tsk->mm)
 		sync_mm_rss(tsk->mm);
 	acct_update_integrals(tsk);
-	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
 #ifdef CONFIG_POSIX_TIMERS
 		hrtimer_cancel(&tsk->signal->real_timer);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] kernel/exit: do panic earlier to get coredump if global init task exit
  2019-12-16  3:18 [PATCH v2] kernel/exit: do panic earlier to get coredump if global init task exit qiwuchen55
@ 2019-12-16 10:27 ` Christian Brauner
  2019-12-16 17:28 ` Oleg Nesterov
  1 sibling, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2019-12-16 10:27 UTC (permalink / raw)
  To: qiwuchen55; +Cc: peterz, mingo, oleg, kernel-team, linux-kernel, chenqiwu

On Mon, Dec 16, 2019 at 11:18:44AM +0800, qiwuchen55@gmail.com wrote:
> From: chenqiwu <chenqiwu@xiaomi.com>
> 
> When global init task get a chance to be killed, panic will happen in
> later calling steps by do_exit()->exit_notify()->forget_original_parent()
> ->find_child_reaper() if all init threads have exited.
> 
> However, it's hard to extract the coredump of init task from a kernel
> crashdump, since exit_mm() has released its mm before panic. In order
> to get the backtrace of init task in userspace, it's better to do panic
> earlier at the beginning of exitting route.
> 
> It's worth noting that we must take case of a multi-threaded init exitting
> issue. We need the test for is_global_init() && group_dead to ensure that
> it is all of init threads exiting and not just the current thread.
> 
> Signed-off-by: chenqiwu <chenqiwu@xiaomi.com>

Just a comment and a nit I can fixup myself.

Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>

Oleg, wdyt?

> ---
> changes in v2:
>  - using is_global_init() && group_dead as panic condition.
>  - move up group_dead = atomic_dec_and_test(&tsk->signal->live).
>  - add comment for this change in do_exit().
> ---
>  kernel/exit.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index bcbd598..33364c8 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -517,10 +517,6 @@ static struct task_struct *find_child_reaper(struct task_struct *father,
>  	}
>  
>  	write_unlock_irq(&tasklist_lock);
> -	if (unlikely(pid_ns == &init_pid_ns)) {
> -		panic("Attempted to kill init! exitcode=0x%08x\n",
> -			father->signal->group_exit_code ?: father->exit_code);
> -	}

This made me queasy at first but from what I can see this is safe to remove.
We could've only gotten here if:
task == child_reaper && !find_alive_thread() && pid_ns == init_pid_ns
that should be equivalent to the new
is_global_init(tsk) && group_dead
check, i.e. I think there's no condition we might loose by removing this
check.

>  
>  	list_for_each_entry_safe(p, n, dead, ptrace_entry) {
>  		list_del_init(&p->ptrace_entry);
> @@ -728,6 +724,14 @@ void __noreturn do_exit(long code)
>  		panic("Attempted to kill the idle task!");
>  
>  	/*
> +	 * If all threads of global init have exited, do panic imeddiately

Nit: s/imeddiately/immediately/

but I can fix this up when applying.

> +	 * to get the coredump to find any clue for init task in userspace.
> +	 */
> +	group_dead = atomic_dec_and_test(&tsk->signal->live);
> +	if (unlikely(is_global_init(tsk) && group_dead))
> +		panic("Attempted to kill init! exitcode=0x%08lx\n", code);
> +
> +	/*
>  	 * If do_exit is called because this processes oopsed, it's possible
>  	 * that get_fs() was left as KERNEL_DS, so reset it to USER_DS before
>  	 * continuing. Amongst other possible reasons, this is to prevent
> @@ -764,7 +768,6 @@ void __noreturn do_exit(long code)
>  	if (tsk->mm)
>  		sync_mm_rss(tsk->mm);
>  	acct_update_integrals(tsk);
> -	group_dead = atomic_dec_and_test(&tsk->signal->live);
>  	if (group_dead) {
>  #ifdef CONFIG_POSIX_TIMERS
>  		hrtimer_cancel(&tsk->signal->real_timer);
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] kernel/exit: do panic earlier to get coredump if global init task exit
  2019-12-16  3:18 [PATCH v2] kernel/exit: do panic earlier to get coredump if global init task exit qiwuchen55
  2019-12-16 10:27 ` Christian Brauner
@ 2019-12-16 17:28 ` Oleg Nesterov
  2019-12-16 17:44   ` Christian Brauner
  1 sibling, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2019-12-16 17:28 UTC (permalink / raw)
  To: qiwuchen55
  Cc: christian.brauner, peterz, mingo, kernel-team, linux-kernel, chenqiwu

On 12/16, qiwuchen55@gmail.com wrote:
>
> +	 * If all threads of global init have exited, do panic imeddiately
> +	 * to get the coredump to find any clue for init task in userspace.
> +	 */
> +	group_dead = atomic_dec_and_test(&tsk->signal->live);
> +	if (unlikely(is_global_init(tsk) && group_dead))
> +		panic("Attempted to kill init! exitcode=0x%08lx\n", code);
                                                                    ^^^^

No, we should not throw out the useful info, please use

	signal->group_exit_code ?: code

as the current code does.

And I am worried atomic_dec_and_test() is called too early...

Say, acct_process() can report the exit while some sub-thread sleeps
in PTRACE_EVENT_EXIT? I'd prefer to not move it up too much, at least
before exit_signals().

Oleg.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] kernel/exit: do panic earlier to get coredump if global init task exit
  2019-12-16 17:28 ` Oleg Nesterov
@ 2019-12-16 17:44   ` Christian Brauner
  2019-12-17 10:50     ` chenqiwu
  2019-12-17 14:14     ` Oleg Nesterov
  0 siblings, 2 replies; 11+ messages in thread
From: Christian Brauner @ 2019-12-16 17:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: qiwuchen55, peterz, mingo, kernel-team, linux-kernel, chenqiwu

On Mon, Dec 16, 2019 at 06:28:41PM +0100, Oleg Nesterov wrote:
> On 12/16, qiwuchen55@gmail.com wrote:
> >
> > +	 * If all threads of global init have exited, do panic imeddiately
> > +	 * to get the coredump to find any clue for init task in userspace.
> > +	 */
> > +	group_dead = atomic_dec_and_test(&tsk->signal->live);
> > +	if (unlikely(is_global_init(tsk) && group_dead))
> > +		panic("Attempted to kill init! exitcode=0x%08lx\n", code);
>                                                                     ^^^^
> 
> No, we should not throw out the useful info, please use
> 
> 	signal->group_exit_code ?: code
> 
> as the current code does.
> 
> And I am worried atomic_dec_and_test() is called too early...
> 
> Say, acct_process() can report the exit while some sub-thread sleeps

Hm, I'm not following here. I might just be slow. acct_process() doesn't
seem to report exit status and has been called after group_dead before.

Christian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] kernel/exit: do panic earlier to get coredump if global init task exit
  2019-12-16 17:44   ` Christian Brauner
@ 2019-12-17 10:50     ` chenqiwu
  2019-12-17 14:25       ` Oleg Nesterov
  2019-12-17 14:14     ` Oleg Nesterov
  1 sibling, 1 reply; 11+ messages in thread
From: chenqiwu @ 2019-12-17 10:50 UTC (permalink / raw)
  To: Christian Brauner, oleg, peterz, mingo
  Cc: kernel-team, linux-kernel, chenqiwu

On Mon, Dec 16, 2019 at 06:44:11PM +0100, Christian Brauner wrote:
> On Mon, Dec 16, 2019 at 06:28:41PM +0100, Oleg Nesterov wrote:
> > On 12/16, qiwuchen55@gmail.com wrote:
> > >
> > > +	 * If all threads of global init have exited, do panic imeddiately
> > > +	 * to get the coredump to find any clue for init task in userspace.
> > > +	 */
> > > +	group_dead = atomic_dec_and_test(&tsk->signal->live);
> > > +	if (unlikely(is_global_init(tsk) && group_dead))
> > > +		panic("Attempted to kill init! exitcode=0x%08lx\n", code);
> >                                                                     ^^^^
> > 
> > No, we should not throw out the useful info, please use
> > 
> > 	signal->group_exit_code ?: code
> > 
> > as the current code does.
> > 
> > And I am worried atomic_dec_and_test() is called too early...
> > 
> > Say, acct_process() can report the exit while some sub-thread sleeps
> 
> Hm, I'm not following here. I might just be slow. acct_process() doesn't
> seem to report exit status and has been called after group_dead before.
> 
> Christian

I agree using signal->group_exit_code ?: code is better, because there is a
possibilty that the exit of main init thread is called by complete_and_exit()
not by a normal exit route (get_signal()->do_group_exit()). For this case,
signal->group_exit_code maybe NULL, so the arg of code should be the exitcode.

And I agree we should not move up group_dead too much. What Oleg has mentioned
is just one case, what's more, there is a possibilty that some sub-threads sleep
in exit_signals()->threadgroup_change_begin() and not do real exit, the main
thread will do panic if all sub-threads do exit since the group_dead is 0.

For workaroud such case, I think the patch should be:

diff --git a/kernel/exit.c b/kernel/exit.c
index bcbd598..58d90e1 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -517,10 +517,6 @@ static struct task_struct *find_child_reaper(struct task_struct *father,
        }

        write_unlock_irq(&tasklist_lock);
-       if (unlikely(pid_ns == &init_pid_ns)) {
-               panic("Attempted to kill init! exitcode=0x%08x\n",
-                       father->signal->group_exit_code ?: father->exit_code);
-       }

        list_for_each_entry_safe(p, n, dead, ptrace_entry) {
                list_del_init(&p->ptrace_entry);
@@ -728,6 +724,15 @@ void __noreturn do_exit(long code)
                panic("Attempted to kill the idle task!");

        /*
+        * If all threads of global init have exited, do panic imeddiately
+        * to get the coredump to find any clue for init task in userspace.
+        */
+       if (unlikely(is_global_init(tsk) &&
+               (atomic_read(&tsk->signal->live) == 1)))
+               panic("Attempted to kill init! exitcode=0x%08x\n",
+                       tsk->signal->group_exit_code ?: (int)code);
+
+       /*
         * If do_exit is called because this processes oopsed, it's possible
         * that get_fs() was left as KERNEL_DS, so reset it to USER_DS before
         * continuing. Amongst other possible reasons, this is to prevent




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] kernel/exit: do panic earlier to get coredump if global init task exit
  2019-12-16 17:44   ` Christian Brauner
  2019-12-17 10:50     ` chenqiwu
@ 2019-12-17 14:14     ` Oleg Nesterov
  1 sibling, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2019-12-17 14:14 UTC (permalink / raw)
  To: Christian Brauner
  Cc: qiwuchen55, peterz, mingo, kernel-team, linux-kernel, chenqiwu

On 12/16, Christian Brauner wrote:
>
> On Mon, Dec 16, 2019 at 06:28:41PM +0100, Oleg Nesterov wrote:
> >
> > And I am worried atomic_dec_and_test() is called too early...
> >
> > Say, acct_process() can report the exit while some sub-thread sleeps
>
> Hm, I'm not following here. I might just be slow. acct_process() doesn't
> seem to report exit status and has been called after group_dead before.

Yes, but with this patch group_dead becomes true before the process is
"dead enough".

Suppose a process has 2 threads, T1, and T2.

T1 exits, decrements signal->live, and sleeps in PTRACE_EVENT_EXIT.

T2 exits, decrements signal->live, group_dead == T, it calls acct_process()
which reports the fact this process has exited. Probably not a real problem,
but still strange, this process has not exited yet, it is not even a zombie.

Oleg.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] kernel/exit: do panic earlier to get coredump if global init task exit
  2019-12-17 10:50     ` chenqiwu
@ 2019-12-17 14:25       ` Oleg Nesterov
  2019-12-17 14:56         ` chenqiwu
  0 siblings, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2019-12-17 14:25 UTC (permalink / raw)
  To: chenqiwu
  Cc: Christian Brauner, peterz, mingo, kernel-team, linux-kernel, chenqiwu

On 12/17, chenqiwu wrote:
>
> @@ -728,6 +724,15 @@ void __noreturn do_exit(long code)
>                 panic("Attempted to kill the idle task!");
>
>         /*
> +        * If all threads of global init have exited, do panic imeddiately
> +        * to get the coredump to find any clue for init task in userspace.
> +        */
> +       if (unlikely(is_global_init(tsk) &&
> +               (atomic_read(&tsk->signal->live) == 1)))

Well, I guess this will work in practice, but in theory this is racy.

Suppose that signal->live == 2 and both threads exit in parallel. They
both can see tsk->signal->live == 2 before atomic_dec_and_test().

If you are fine with this race I won't object, but please add a comment.

But why can't you simply do

	--- x/kernel/exit.c
	+++ x/kernel/exit.c
	@@ -786,6 +786,8 @@ void __noreturn do_exit(long code)
		acct_update_integrals(tsk);
		group_dead = atomic_dec_and_test(&tsk->signal->live);
		if (group_dead) {
	+		if (unlikely(is_global_init(tsk)
	+			panic(...);
	 #ifdef CONFIG_POSIX_TIMERS
			hrtimer_cancel(&tsk->signal->real_timer);
			exit_itimers(tsk->signal);

?

Oleg.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] kernel/exit: do panic earlier to get coredump if global init task exit
  2019-12-17 14:25       ` Oleg Nesterov
@ 2019-12-17 14:56         ` chenqiwu
  2019-12-17 15:23           ` Oleg Nesterov
  0 siblings, 1 reply; 11+ messages in thread
From: chenqiwu @ 2019-12-17 14:56 UTC (permalink / raw)
  To: Oleg Nesterov, christian.brauner, peterz, mingo
  Cc: kernel-team, linux-kernel, chenqiwu

On Tue, Dec 17, 2019 at 03:25:15PM +0100, Oleg Nesterov wrote:
> On 12/17, chenqiwu wrote:
> >
> > @@ -728,6 +724,15 @@ void __noreturn do_exit(long code)
> >                 panic("Attempted to kill the idle task!");
> >
> >         /*
> > +        * If all threads of global init have exited, do panic imeddiately
> > +        * to get the coredump to find any clue for init task in userspace.
> > +        */
> > +       if (unlikely(is_global_init(tsk) &&
> > +               (atomic_read(&tsk->signal->live) == 1)))
> 
> Well, I guess this will work in practice, but in theory this is racy.
> 
> Suppose that signal->live == 2 and both threads exit in parallel. They
> both can see tsk->signal->live == 2 before atomic_dec_and_test().
> 
> If you are fine with this race I won't object, but please add a comment.
> 
> But why can't you simply do
> 
> 	--- x/kernel/exit.c
> 	+++ x/kernel/exit.c
> 	@@ -786,6 +786,8 @@ void __noreturn do_exit(long code)
> 		acct_update_integrals(tsk);
> 		group_dead = atomic_dec_and_test(&tsk->signal->live);
> 		if (group_dead) {
> 	+		if (unlikely(is_global_init(tsk)
> 	+			panic(...);
> 	 #ifdef CONFIG_POSIX_TIMERS
> 			hrtimer_cancel(&tsk->signal->real_timer);
> 			exit_itimers(tsk->signal);
> 
> ?
> 
> Oleg.
>

Oh, yeah, thanks for your reminds! But in fact, I think atomic_read()
can avoid the racy even if both threads exit in parallel, since it is
an atomic operation forever. I agree your simply modify, is there any
other questions?



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] kernel/exit: do panic earlier to get coredump if global init task exit
  2019-12-17 14:56         ` chenqiwu
@ 2019-12-17 15:23           ` Oleg Nesterov
  2019-12-17 19:33             ` Peter Zijlstra
  2019-12-18  8:08             ` chenqiwu
  0 siblings, 2 replies; 11+ messages in thread
From: Oleg Nesterov @ 2019-12-17 15:23 UTC (permalink / raw)
  To: chenqiwu
  Cc: christian.brauner, peterz, mingo, kernel-team, linux-kernel, chenqiwu

On 12/17, chenqiwu wrote:
>
> But in fact, I think atomic_read()
> can avoid the racy even if both threads exit in parallel, since it is
> an atomic operation forever.

Hmm, not sure I understand. atomic_read() is just READ_ONCE(), it can't be
re-ordered but that is all.

How can it avoid the race if it is called before atomic_dec_and_test() ?

Again, suppose that we have 2 exiting threads and signal->live == 2. With
your patch each thread does atomic_read() before atomic_dec_and_test(),
both threads can observe atomic_read(signal->live) == 2 simply because
the counter was not decremented yet.

Oleg.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] kernel/exit: do panic earlier to get coredump if global init task exit
  2019-12-17 15:23           ` Oleg Nesterov
@ 2019-12-17 19:33             ` Peter Zijlstra
  2019-12-18  8:08             ` chenqiwu
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2019-12-17 19:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: chenqiwu, christian.brauner, mingo, kernel-team, linux-kernel, chenqiwu

On Tue, Dec 17, 2019 at 04:23:33PM +0100, Oleg Nesterov wrote:
> On 12/17, chenqiwu wrote:
> >
> > But in fact, I think atomic_read()
> > can avoid the racy even if both threads exit in parallel, since it is
> > an atomic operation forever.
> 
> Hmm, not sure I understand. atomic_read() is just READ_ONCE(), it can't be
> re-ordered but that is all.

That, atomic_read() is just a read. It doesn't modify the variable,
therefore there isn't anything 'atomic' even remotely possible.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] kernel/exit: do panic earlier to get coredump if global init task exit
  2019-12-17 15:23           ` Oleg Nesterov
  2019-12-17 19:33             ` Peter Zijlstra
@ 2019-12-18  8:08             ` chenqiwu
  1 sibling, 0 replies; 11+ messages in thread
From: chenqiwu @ 2019-12-18  8:08 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: christian.brauner, peterz, mingo, kernel-team, linux-kernel

On Tue, Dec 17, 2019 at 04:23:33PM +0100, Oleg Nesterov wrote:
> On 12/17, chenqiwu wrote:
> >
> > But in fact, I think atomic_read()
> > can avoid the racy even if both threads exit in parallel, since it is
> > an atomic operation forever.
> 
> Hmm, not sure I understand. atomic_read() is just READ_ONCE(), it can't be
> re-ordered but that is all.
> 
> How can it avoid the race if it is called before atomic_dec_and_test() ?
> 
> Again, suppose that we have 2 exiting threads and signal->live == 2. With
> your patch each thread does atomic_read() before atomic_dec_and_test(),
> both threads can observe atomic_read(signal->live) == 2 simply because
> the counter was not decremented yet.
> 
> Oleg.
>

Yeah, I agree your idea.
Now, if there are no further questions, I will resend this as a proper patch for v3.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-12-18  8:08 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-16  3:18 [PATCH v2] kernel/exit: do panic earlier to get coredump if global init task exit qiwuchen55
2019-12-16 10:27 ` Christian Brauner
2019-12-16 17:28 ` Oleg Nesterov
2019-12-16 17:44   ` Christian Brauner
2019-12-17 10:50     ` chenqiwu
2019-12-17 14:25       ` Oleg Nesterov
2019-12-17 14:56         ` chenqiwu
2019-12-17 15:23           ` Oleg Nesterov
2019-12-17 19:33             ` Peter Zijlstra
2019-12-18  8:08             ` chenqiwu
2019-12-17 14:14     ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).