linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V3] exit: trigger panic when global init has exited
@ 2021-03-17 12:51 Qianli Zhao
  2021-03-17 14:38 ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Qianli Zhao @ 2021-03-17 12:51 UTC (permalink / raw)
  To: christian, axboe, ebiederm, oleg, tglx, pcc; +Cc: linux-kernel, zhaoqianli

From: Qianli Zhao <zhaoqianli@xiaomi.com>

When init sub-threads running on different CPUs exit at the same time,
zap_pid_ns_processe()->BUG() may be happened.
And every thread status is abnormal after exit(PF_EXITING set,task->mm=NULL etc),
which makes it difficult to parse coredump from fulldump normally.
In order to fix the above problem, when any one init has been set to SIGNAL_GROUP_EXIT,
trigger panic immediately, and prevent other init threads from continuing to exit

[   24.705376] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
[   24.705382] CPU: 4 PID: 552 Comm: init Tainted: G S         O    4.14.180-perf-g4483caa8ae80-dirty #1
[   24.705390] kernel BUG at include/linux/pid_namespace.h:98!

PID: 552   CPU: 4   COMMAND: "init"
PID: 1     CPU: 7   COMMAND: "init"
core4                           core7
...                             sys_exit_group()
                                do_group_exit()
                                   - sig->flags = SIGNAL_GROUP_EXIT
                                   - zap_other_threads()
                                do_exit() //PF_EXITING is set
ret_to_user()
do_notify_resume()
get_signal()
    - signal_group_exit
    - goto fatal;
do_group_exit()
do_exit() //PF_EXITING is set
    - panic("Attempted to kill init! exitcode=0x%08x\n")
                                exit_notify()
                                find_alive_thread() //no alive sub-threads
                                zap_pid_ns_processes()//CONFIG_PID_NS is not set
                                BUG()

Signed-off-by: Qianli Zhao <zhaoqianli@xiaomi.com>
---
V3:
- Use group_dead instead of thread_group_empty() to test single init exit.

V2:
- Changelog update
- Remove wrong useage of SIGNAL_UNKILLABLE. 
- Add thread_group_empty() test to handle single init thread exit
---
 kernel/exit.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 04029e3..32b74e4 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -766,6 +766,17 @@ void __noreturn do_exit(long code)
 
 	validate_creds_for_do_exit(tsk);
 
+	group_dead = atomic_dec_and_test(&tsk->signal->live);
+	/*
+	 * If global init has exited,
+	 * panic immediately to get a useable coredump.
+	 */
+	if (unlikely(is_global_init(tsk) &&
+	    (group_dead || (tsk->signal->flags & SIGNAL_GROUP_EXIT)))) {
+			panic("Attempted to kill init! exitcode=0x%08x\n",
+				tsk->signal->group_exit_code ?: (int)code);
+	}
+
 	/*
 	 * We're taking recursive faults here in do_exit. Safest is to just
 	 * leave this task alone and wait for reboot.
@@ -784,16 +795,7 @@ void __noreturn do_exit(long code)
 	if (tsk->mm)
 		sync_mm_rss(tsk->mm);
 	acct_update_integrals(tsk);
-	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
-		/*
-		 * If the last thread of global init has exited, panic
-		 * immediately to get a useable coredump.
-		 */
-		if (unlikely(is_global_init(tsk)))
-			panic("Attempted to kill init! exitcode=0x%08x\n",
-				tsk->signal->group_exit_code ?: (int)code);
-
 #ifdef CONFIG_POSIX_TIMERS
 		hrtimer_cancel(&tsk->signal->real_timer);
 		exit_itimers(tsk->signal);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-17 12:51 [PATCH V3] exit: trigger panic when global init has exited Qianli Zhao
@ 2021-03-17 14:38 ` Oleg Nesterov
  2021-03-18  2:47   ` qianli zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2021-03-17 14:38 UTC (permalink / raw)
  To: Qianli Zhao
  Cc: christian, axboe, ebiederm, tglx, pcc, linux-kernel, zhaoqianli

On 03/17, Qianli Zhao wrote:
>
> From: Qianli Zhao <zhaoqianli@xiaomi.com>
>
> When init sub-threads running on different CPUs exit at the same time,
> zap_pid_ns_processe()->BUG() may be happened.

and why do you think your patch can't prevent this?

Sorry, I must have missed something. But it seems to me that you are trying
to fix the wrong problem. Yes, zap_pid_ns_processes() must not be called in
the root namespace, and this has nothing to do with CONFIG_PID_NS.

> And every thread status is abnormal after exit(PF_EXITING set,task->mm=NULL etc),
> which makes it difficult to parse coredump from fulldump normally.
> In order to fix the above problem, when any one init has been set to SIGNAL_GROUP_EXIT,
> trigger panic immediately, and prevent other init threads from continuing to exit
>
> [   24.705376] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
> [   24.705382] CPU: 4 PID: 552 Comm: init Tainted: G S         O    4.14.180-perf-g4483caa8ae80-dirty #1
> [   24.705390] kernel BUG at include/linux/pid_namespace.h:98!
>
> PID: 552   CPU: 4   COMMAND: "init"
> PID: 1     CPU: 7   COMMAND: "init"
> core4                           core7
> ...                             sys_exit_group()
>                                 do_group_exit()
>                                    - sig->flags = SIGNAL_GROUP_EXIT
>                                    - zap_other_threads()
>                                 do_exit() //PF_EXITING is set
> ret_to_user()
> do_notify_resume()
> get_signal()
>     - signal_group_exit
>     - goto fatal;
> do_group_exit()
> do_exit() //PF_EXITING is set
>     - panic("Attempted to kill init! exitcode=0x%08x\n")
>                                 exit_notify()
>                                 find_alive_thread() //no alive sub-threads
>                                 zap_pid_ns_processes()//CONFIG_PID_NS is not set
>                                 BUG()
>
> Signed-off-by: Qianli Zhao <zhaoqianli@xiaomi.com>
> ---
> V3:
> - Use group_dead instead of thread_group_empty() to test single init exit.
>
> V2:
> - Changelog update
> - Remove wrong useage of SIGNAL_UNKILLABLE.
> - Add thread_group_empty() test to handle single init thread exit
> ---
>  kernel/exit.c | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 04029e3..32b74e4 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -766,6 +766,17 @@ void __noreturn do_exit(long code)
>
>  	validate_creds_for_do_exit(tsk);
>
> +	group_dead = atomic_dec_and_test(&tsk->signal->live);
> +	/*
> +	 * If global init has exited,
> +	 * panic immediately to get a useable coredump.
> +	 */
> +	if (unlikely(is_global_init(tsk) &&
> +	    (group_dead || (tsk->signal->flags & SIGNAL_GROUP_EXIT)))) {
> +			panic("Attempted to kill init! exitcode=0x%08x\n",
> +				tsk->signal->group_exit_code ?: (int)code);
> +	}
> +
>  	/*
>  	 * We're taking recursive faults here in do_exit. Safest is to just
>  	 * leave this task alone and wait for reboot.
> @@ -784,16 +795,7 @@ void __noreturn do_exit(long code)
>  	if (tsk->mm)
>  		sync_mm_rss(tsk->mm);
>  	acct_update_integrals(tsk);
> -	group_dead = atomic_dec_and_test(&tsk->signal->live);
>  	if (group_dead) {
> -		/*
> -		 * If the last thread of global init has exited, panic
> -		 * immediately to get a useable coredump.
> -		 */
> -		if (unlikely(is_global_init(tsk)))
> -			panic("Attempted to kill init! exitcode=0x%08x\n",
> -				tsk->signal->group_exit_code ?: (int)code);
> -
>  #ifdef CONFIG_POSIX_TIMERS
>  		hrtimer_cancel(&tsk->signal->real_timer);
>  		exit_itimers(tsk->signal);
> --
> 1.9.1
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-17 14:38 ` Oleg Nesterov
@ 2021-03-18  2:47   ` qianli zhao
  2021-03-18 18:04     ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: qianli zhao @ 2021-03-18  2:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: christian, axboe, Eric W. Biederman, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

Hi,Oleg

Thank you for your reply.

>> When init sub-threads running on different CPUs exit at the same time,
>> zap_pid_ns_processe()->BUG() may be happened.

> and why do you think your patch can't prevent this?

> Sorry, I must have missed something. But it seems to me that you are trying
> to fix the wrong problem. Yes, zap_pid_ns_processes() must not be called in
> the root namespace, and this has nothing to do with CONFIG_PID_NS.

Yes, i try to fix this exception by test SIGNAL_GROUP_EXIT and call
panic before setting PF_EXITING to prevent zap_pid_ns_processes()
being called when init do_exit().
In addition, the patch also protects the init process state to
successfully get usable init coredump.

In my test,this patch works.

Oleg Nesterov <oleg@redhat.com> 于2021年3月17日周三 下午10:38写道:
>
> On 03/17, Qianli Zhao wrote:
> >
> > From: Qianli Zhao <zhaoqianli@xiaomi.com>
> >
> > When init sub-threads running on different CPUs exit at the same time,
> > zap_pid_ns_processe()->BUG() may be happened.
>
> and why do you think your patch can't prevent this?
>
> Sorry, I must have missed something. But it seems to me that you are trying
> to fix the wrong problem. Yes, zap_pid_ns_processes() must not be called in
> the root namespace, and this has nothing to do with CONFIG_PID_NS.
>
> > And every thread status is abnormal after exit(PF_EXITING set,task->mm=NULL etc),
> > which makes it difficult to parse coredump from fulldump normally.
> > In order to fix the above problem, when any one init has been set to SIGNAL_GROUP_EXIT,
> > trigger panic immediately, and prevent other init threads from continuing to exit
> >
> > [   24.705376] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
> > [   24.705382] CPU: 4 PID: 552 Comm: init Tainted: G S         O    4.14.180-perf-g4483caa8ae80-dirty #1
> > [   24.705390] kernel BUG at include/linux/pid_namespace.h:98!
> >
> > PID: 552   CPU: 4   COMMAND: "init"
> > PID: 1     CPU: 7   COMMAND: "init"
> > core4                           core7
> > ...                             sys_exit_group()
> >                                 do_group_exit()
> >                                    - sig->flags = SIGNAL_GROUP_EXIT
> >                                    - zap_other_threads()
> >                                 do_exit() //PF_EXITING is set
> > ret_to_user()
> > do_notify_resume()
> > get_signal()
> >     - signal_group_exit
> >     - goto fatal;
> > do_group_exit()
> > do_exit() //PF_EXITING is set
> >     - panic("Attempted to kill init! exitcode=0x%08x\n")
> >                                 exit_notify()
> >                                 find_alive_thread() //no alive sub-threads
> >                                 zap_pid_ns_processes()//CONFIG_PID_NS is not set
> >                                 BUG()
> >
> > Signed-off-by: Qianli Zhao <zhaoqianli@xiaomi.com>
> > ---
> > V3:
> > - Use group_dead instead of thread_group_empty() to test single init exit.
> >
> > V2:
> > - Changelog update
> > - Remove wrong useage of SIGNAL_UNKILLABLE.
> > - Add thread_group_empty() test to handle single init thread exit
> > ---
> >  kernel/exit.c | 20 +++++++++++---------
> >  1 file changed, 11 insertions(+), 9 deletions(-)
> >
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index 04029e3..32b74e4 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -766,6 +766,17 @@ void __noreturn do_exit(long code)
> >
> >       validate_creds_for_do_exit(tsk);
> >
> > +     group_dead = atomic_dec_and_test(&tsk->signal->live);
> > +     /*
> > +      * If global init has exited,
> > +      * panic immediately to get a useable coredump.
> > +      */
> > +     if (unlikely(is_global_init(tsk) &&
> > +         (group_dead || (tsk->signal->flags & SIGNAL_GROUP_EXIT)))) {
> > +                     panic("Attempted to kill init! exitcode=0x%08x\n",
> > +                             tsk->signal->group_exit_code ?: (int)code);
> > +     }
> > +
> >       /*
> >        * We're taking recursive faults here in do_exit. Safest is to just
> >        * leave this task alone and wait for reboot.
> > @@ -784,16 +795,7 @@ void __noreturn do_exit(long code)
> >       if (tsk->mm)
> >               sync_mm_rss(tsk->mm);
> >       acct_update_integrals(tsk);
> > -     group_dead = atomic_dec_and_test(&tsk->signal->live);
> >       if (group_dead) {
> > -             /*
> > -              * If the last thread of global init has exited, panic
> > -              * immediately to get a useable coredump.
> > -              */
> > -             if (unlikely(is_global_init(tsk)))
> > -                     panic("Attempted to kill init! exitcode=0x%08x\n",
> > -                             tsk->signal->group_exit_code ?: (int)code);
> > -
> >  #ifdef CONFIG_POSIX_TIMERS
> >               hrtimer_cancel(&tsk->signal->real_timer);
> >               exit_itimers(tsk->signal);
> > --
> > 1.9.1
> >
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-18  2:47   ` qianli zhao
@ 2021-03-18 18:04     ` Oleg Nesterov
  2021-03-18 19:08       ` Eric W. Biederman
  2021-03-19  5:08       ` qianli zhao
  0 siblings, 2 replies; 20+ messages in thread
From: Oleg Nesterov @ 2021-03-18 18:04 UTC (permalink / raw)
  To: qianli zhao
  Cc: christian, axboe, Eric W. Biederman, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

On 03/18, qianli zhao wrote:
>
> Hi,Oleg
>
> Thank you for your reply.
>
> >> When init sub-threads running on different CPUs exit at the same time,
> >> zap_pid_ns_processe()->BUG() may be happened.
>
> > and why do you think your patch can't prevent this?
>
> > Sorry, I must have missed something. But it seems to me that you are trying
> > to fix the wrong problem. Yes, zap_pid_ns_processes() must not be called in
> > the root namespace, and this has nothing to do with CONFIG_PID_NS.
>
> Yes, i try to fix this exception by test SIGNAL_GROUP_EXIT and call
> panic before setting PF_EXITING to prevent zap_pid_ns_processes()
> being called when init do_exit().

Ah, I didn't notice your patch does atomic_dec_and_test(signal->live)
before exit_signals() which sets PF_EXITING. Thanks for correcting me.

So yes, I was wrong, your patch can prevent this. Although I'd like to
recheck if every do-something-if-group-dead action is correct in the
case we have a non-PF_EXITING thread...

But then I don't understand the SIGNAL_GROUP_EXIT check added by your
patch. Do we really need it if we want to avoid zap_pid_ns_processes()
when the global init exits?

> In addition, the patch also protects the init process state to
> successfully get usable init coredump.

Could you spell please?

Does this connect to SIGNAL_GROUP_EXIT check? Do you mean that you want
to panic earlier, before other init's sub-threads exit?

Thanks,

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-18 18:04     ` Oleg Nesterov
@ 2021-03-18 19:08       ` Eric W. Biederman
  2021-03-19  6:33         ` qianli zhao
                           ` (2 more replies)
  2021-03-19  5:08       ` qianli zhao
  1 sibling, 3 replies; 20+ messages in thread
From: Eric W. Biederman @ 2021-03-18 19:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: qianli zhao, christian, axboe, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

Oleg Nesterov <oleg@redhat.com> writes:

> On 03/18, qianli zhao wrote:
>>
>> Hi,Oleg
>>
>> Thank you for your reply.
>>
>> >> When init sub-threads running on different CPUs exit at the same time,
>> >> zap_pid_ns_processe()->BUG() may be happened.
>>
>> > and why do you think your patch can't prevent this?
>>
>> > Sorry, I must have missed something. But it seems to me that you are trying
>> > to fix the wrong problem. Yes, zap_pid_ns_processes() must not be called in
>> > the root namespace, and this has nothing to do with CONFIG_PID_NS.
>>
>> Yes, i try to fix this exception by test SIGNAL_GROUP_EXIT and call
>> panic before setting PF_EXITING to prevent zap_pid_ns_processes()
>> being called when init do_exit().
>
> Ah, I didn't notice your patch does atomic_dec_and_test(signal->live)
> before exit_signals() which sets PF_EXITING. Thanks for correcting me.
>
> So yes, I was wrong, your patch can prevent this. Although I'd like to
> recheck if every do-something-if-group-dead action is correct in the
> case we have a non-PF_EXITING thread...
>
> But then I don't understand the SIGNAL_GROUP_EXIT check added by your
> patch. Do we really need it if we want to avoid zap_pid_ns_processes()
> when the global init exits?
>
>> In addition, the patch also protects the init process state to
>> successfully get usable init coredump.
>
> Could you spell please?
>
> Does this connect to SIGNAL_GROUP_EXIT check? Do you mean that you want
> to panic earlier, before other init's sub-threads exit?

That is my understanding.

As I understand it this patch has two purposes:
1. Avoid the BUG_ON in zap_pid_ns_processes when !CONFIG_PID_NS
2. panic as early as possible so exiting threads don't removing
   interesting debugging state.


It is a bit tricky to tell if the movement of the decrement of
signal->live is safe.  That affects current_is_single threaded
which is used by unshare, setns of the time namespace, and setting
the selinux part of creds.

The usage in kernel/cgroup/cgroup.c:css_task_iter_advance seems safe.
Hmm, Maybe not.  Today cgroup_thread_change_begin is held around
setting PF_EXITING before signal->live is decremented.  So there seem to
be some subtle cgroup dependencies.

The usages of group_dead in do_exit seem safe, as except for the new
one everything is the same.

We could definitely take advantage of knowing group_dead in exit_signals
to simplify it's optimization to not rerouting signals to living
threads.


I think if we are going to move the decrement of signal->live that
should be it's own patch and be accompanied with a good description of
why it is safe instead of having the decrement of signal->live be there
as a side effect of another change.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-18 18:04     ` Oleg Nesterov
  2021-03-18 19:08       ` Eric W. Biederman
@ 2021-03-19  5:08       ` qianli zhao
  2021-03-19 16:32         ` Oleg Nesterov
  1 sibling, 1 reply; 20+ messages in thread
From: qianli zhao @ 2021-03-19  5:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: christian, axboe, Eric W. Biederman, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

Hi,Oleg

> But then I don't understand the SIGNAL_GROUP_EXIT check added by your
> patch. Do we really need it if we want to avoid zap_pid_ns_processes()
> when the global init exits?

I think check SIGNAL_GROUP_EXIT is necessary,or panic() will happen
after all init sub-threads do_exit(),so the following two situations
will happen:
1.According to the timing in the changelog,
zap_pid_ns_processes()->BUG() maybe happened.
2.The key variables of each init sub-threads will be in the exit
state(such task->mm=NULL,task->flags=PF_EXITING,task->nsproxy=NULL),resulting
in the failure to parse coredump from fulldump.

So i think check SIGNAL_GROUP_EXIT is a simple and effective way to
prevent these

> Does this connect to SIGNAL_GROUP_EXIT check? Do you mean that you want
> to panic earlier, before other init's sub-threads exit?

Yes, my patch just want panic earlier before other init's sub-threads exit

Oleg Nesterov <oleg@redhat.com> 于2021年3月19日周五 上午2:05写道:
>
> On 03/18, qianli zhao wrote:
> >
> > Hi,Oleg
> >
> > Thank you for your reply.
> >
> > >> When init sub-threads running on different CPUs exit at the same time,
> > >> zap_pid_ns_processe()->BUG() may be happened.
> >
> > > and why do you think your patch can't prevent this?
> >
> > > Sorry, I must have missed something. But it seems to me that you are trying
> > > to fix the wrong problem. Yes, zap_pid_ns_processes() must not be called in
> > > the root namespace, and this has nothing to do with CONFIG_PID_NS.
> >
> > Yes, i try to fix this exception by test SIGNAL_GROUP_EXIT and call
> > panic before setting PF_EXITING to prevent zap_pid_ns_processes()
> > being called when init do_exit().
>
> Ah, I didn't notice your patch does atomic_dec_and_test(signal->live)
> before exit_signals() which sets PF_EXITING. Thanks for correcting me.
>
> So yes, I was wrong, your patch can prevent this. Although I'd like to
> recheck if every do-something-if-group-dead action is correct in the
> case we have a non-PF_EXITING thread...
>
> But then I don't understand the SIGNAL_GROUP_EXIT check added by your
> patch. Do we really need it if we want to avoid zap_pid_ns_processes()
> when the global init exits?
>
> > In addition, the patch also protects the init process state to
> > successfully get usable init coredump.
>
> Could you spell please?
>
> Does this connect to SIGNAL_GROUP_EXIT check? Do you mean that you want
> to panic earlier, before other init's sub-threads exit?
>
> Thanks,
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-18 19:08       ` Eric W. Biederman
@ 2021-03-19  6:33         ` qianli zhao
  2021-03-19 16:34           ` Oleg Nesterov
  2021-03-19 16:26         ` Oleg Nesterov
  2021-03-21 16:00         ` qianli zhao
  2 siblings, 1 reply; 20+ messages in thread
From: qianli zhao @ 2021-03-19  6:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, christian, axboe, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

Hi,Eric

> As I understand it this patch has two purposes:
> 1. Avoid the BUG_ON in zap_pid_ns_processes when !CONFIG_PID_NS
> 2. panic as early as possible so exiting threads don't removing
>   interesting debugging state.

Your understanding is very correct,this is what my patch wants to do

> I think if we are going to move the decrement of signal->live that
> should be it's own patch and be accompanied with a good description of
> why it is safe instead of having the decrement of signal->live be there
> as a side effect of another change.

I will think about the risks of movement of the decrement of
signal->live before exit_signal().
If is difficult to judge movement of the decrement of signal->live is
safe,how about only test 'signal->live==1' not use group_dead?

Such as:
diff --git a/kernel/exit.c b/kernel/exit.c
index 04029e3..87f3595 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -767,6 +767,17 @@ void __noreturn do_exit(long code)
        validate_creds_for_do_exit(tsk);

        /*
+        * If global init has exited,
+        * panic immediately to get a useable coredump.
+        */
+       if (unlikely(is_global_init(tsk) &&
+           ((atomic_read(&tsk->signal->live) == 1) ||    /*current is
last init thread*/
+            (tsk->signal->flags & SIGNAL_GROUP_EXIT)))) {
+                       panic("Attempted to kill init! exitcode=0x%08x\n",
+                               tsk->signal->group_exit_code ?: (int)code);
+       }
+
+       /*
         * We're taking recursive faults here in do_exit. Safest is to just
         * leave this task alone and wait for reboot.
         */
@@ -784,16 +795,9 @@ void __noreturn do_exit(long code)
        if (tsk->mm)
                sync_mm_rss(tsk->mm);
        acct_update_integrals(tsk);
+
        group_dead = atomic_dec_and_test(&tsk->signal->live);
        if (group_dead) {
-               /*
-                * If the last thread of global init has exited, panic
-                * immediately to get a useable coredump.
-                */
-               if (unlikely(is_global_init(tsk)))
-                       panic("Attempted to kill init! exitcode=0x%08x\n",
-                               tsk->signal->group_exit_code ?: (int)code);
-

Eric W. Biederman <ebiederm@xmission.com> 于2021年3月19日周五 上午3:09写道:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > On 03/18, qianli zhao wrote:
> >>
> >> Hi,Oleg
> >>
> >> Thank you for your reply.
> >>
> >> >> When init sub-threads running on different CPUs exit at the same time,
> >> >> zap_pid_ns_processe()->BUG() may be happened.
> >>
> >> > and why do you think your patch can't prevent this?
> >>
> >> > Sorry, I must have missed something. But it seems to me that you are trying
> >> > to fix the wrong problem. Yes, zap_pid_ns_processes() must not be called in
> >> > the root namespace, and this has nothing to do with CONFIG_PID_NS.
> >>
> >> Yes, i try to fix this exception by test SIGNAL_GROUP_EXIT and call
> >> panic before setting PF_EXITING to prevent zap_pid_ns_processes()
> >> being called when init do_exit().
> >
> > Ah, I didn't notice your patch does atomic_dec_and_test(signal->live)
> > before exit_signals() which sets PF_EXITING. Thanks for correcting me.
> >
> > So yes, I was wrong, your patch can prevent this. Although I'd like to
> > recheck if every do-something-if-group-dead action is correct in the
> > case we have a non-PF_EXITING thread...
> >
> > But then I don't understand the SIGNAL_GROUP_EXIT check added by your
> > patch. Do we really need it if we want to avoid zap_pid_ns_processes()
> > when the global init exits?
> >
> >> In addition, the patch also protects the init process state to
> >> successfully get usable init coredump.
> >
> > Could you spell please?
> >
> > Does this connect to SIGNAL_GROUP_EXIT check? Do you mean that you want
> > to panic earlier, before other init's sub-threads exit?
>
> That is my understanding.
>
> As I understand it this patch has two purposes:
> 1. Avoid the BUG_ON in zap_pid_ns_processes when !CONFIG_PID_NS
> 2. panic as early as possible so exiting threads don't removing
>    interesting debugging state.
>
>
> It is a bit tricky to tell if the movement of the decrement of
> signal->live is safe.  That affects current_is_single threaded
> which is used by unshare, setns of the time namespace, and setting
> the selinux part of creds.
>
> The usage in kernel/cgroup/cgroup.c:css_task_iter_advance seems safe.
> Hmm, Maybe not.  Today cgroup_thread_change_begin is held around
> setting PF_EXITING before signal->live is decremented.  So there seem to
> be some subtle cgroup dependencies.
>
> The usages of group_dead in do_exit seem safe, as except for the new
> one everything is the same.
>
> We could definitely take advantage of knowing group_dead in exit_signals
> to simplify it's optimization to not rerouting signals to living
> threads.
>
>
> I think if we are going to move the decrement of signal->live that
> should be it's own patch and be accompanied with a good description of
> why it is safe instead of having the decrement of signal->live be there
> as a side effect of another change.
>
> Eric

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-18 19:08       ` Eric W. Biederman
  2021-03-19  6:33         ` qianli zhao
@ 2021-03-19 16:26         ` Oleg Nesterov
  2021-03-21 16:00         ` qianli zhao
  2 siblings, 0 replies; 20+ messages in thread
From: Oleg Nesterov @ 2021-03-19 16:26 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: qianli zhao, christian, axboe, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

On 03/18, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > On 03/18, qianli zhao wrote:
> >>
> >> In addition, the patch also protects the init process state to
> >> successfully get usable init coredump.
> >
> > Could you spell please?
> >
> > Does this connect to SIGNAL_GROUP_EXIT check? Do you mean that you want
> > to panic earlier, before other init's sub-threads exit?
>
> That is my understanding.
>
> As I understand it this patch has two purposes:
> 1. Avoid the BUG_ON in zap_pid_ns_processes when !CONFIG_PID_NS
> 2. panic as early as possible so exiting threads don't removing
>    interesting debugging state.

Yes, this was my understanding too, but the changelog didn't look
clear to me.

And I'd say that it is not that we want to avoid BUG_ON() in
zap_pid_ns_processes() when !CONFIG_PID_NS, we want to avoid
zap_pid_ns_processes() in the root namespace, regardless of
CONFIG_PID_NS.

> It is a bit tricky to tell if the movement of the decrement of
> signal->live is safe.

Agreed, this was my concern. I see nothing wrong at first glance,
but I can easily miss something.

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-19  5:08       ` qianli zhao
@ 2021-03-19 16:32         ` Oleg Nesterov
  2021-03-21 13:04           ` qianli zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2021-03-19 16:32 UTC (permalink / raw)
  To: qianli zhao
  Cc: christian, axboe, Eric W. Biederman, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

On 03/19, qianli zhao wrote:
>
> > But then I don't understand the SIGNAL_GROUP_EXIT check added by your
> > patch. Do we really need it if we want to avoid zap_pid_ns_processes()
> > when the global init exits?
>
> I think check SIGNAL_GROUP_EXIT is necessary,or panic() will happen
> after all init sub-threads do_exit(),so the following two situations
> will happen:
> 1.According to the timing in the changelog,
> zap_pid_ns_processes()->BUG() maybe happened.

How? Perhaps I missed something again, but I don't think this is possible.

zap_pid_ns_processes() simply won't be called, find_child_reaper() will
see the !PF_EXITING thread which calls panic().

So I think this should be documented somehow, at least in the changelog.

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-19  6:33         ` qianli zhao
@ 2021-03-19 16:34           ` Oleg Nesterov
  0 siblings, 0 replies; 20+ messages in thread
From: Oleg Nesterov @ 2021-03-19 16:34 UTC (permalink / raw)
  To: qianli zhao
  Cc: Eric W. Biederman, christian, axboe, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

On 03/19, qianli zhao wrote:
>
> I will think about the risks of movement of the decrement of
> signal->live before exit_signal().
> If is difficult to judge movement of the decrement of signal->live is
> safe,how about only test 'signal->live==1' not use group_dead?
>
> Such as:
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 04029e3..87f3595 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -767,6 +767,17 @@ void __noreturn do_exit(long code)
>         validate_creds_for_do_exit(tsk);
>
>         /*
> +        * If global init has exited,
> +        * panic immediately to get a useable coredump.
> +        */
> +       if (unlikely(is_global_init(tsk) &&
> +           ((atomic_read(&tsk->signal->live) == 1) ||    /*current is
> last init thread*/

Just suppose signal->live == 2 and both init's sub-threads exit at the
same time. They both can see signal->live == 2, panic() won't be called.

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-19 16:32         ` Oleg Nesterov
@ 2021-03-21 13:04           ` qianli zhao
  2021-03-22 16:37             ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: qianli zhao @ 2021-03-21 13:04 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: christian, axboe, Eric W. Biederman, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

Hi,Oleg

> How? Perhaps I missed something again, but I don't think this is possible.

> zap_pid_ns_processes() simply won't be called, find_child_reaper() will
> see the !PF_EXITING thread which calls panic().

> So I think this should be documented somehow, at least in the changelog.

This problem occurs when both two init threads enter the do_exit,
One of the init thread is syscall sys_exit_group,and set SIGNAL_GROUP_EXIT
The other init thread perform ret_to_user()->get_signal() and found
SIGNAL_GROUP_EXIT is set,then do_group_exit()->do_exit(),since there
are no alive init threads it finally goes to
zap_pid_ns_processes() and BUG().

Please refer to the sequence diagram below(that has been written into
the changelog),and callstack is also below:
kernel log:
[   24.705376] Kernel panic - not syncing: Attempted to kill init!
exitcode=0x00007f00
[   24.705382] CPU: 4 PID: 552 Comm: init Tainted: G S         O
4.14.180-perf-g4483caa8ae80-dirty #1
[   24.705390] kernel BUG at include/linux/pid_namespace.h:98!

PID: 1      TASK: ffffffc973126900  CPU: 7   COMMAND: "init"
 #0 [ffffff800805ba60] perf_trace_kernel_panic_late at ffffff99ac0bcbcc
 #1 [ffffff800805bac0] die at ffffff99ac08dc64
 #2 [ffffff800805bb10] bug_handler at ffffff99ac08e398
 #3 [ffffff800805bbc0] brk_handler at ffffff99ac08529c
 #4 [ffffff800805bc80] do_debug_exception at ffffff99ac0814e4
->exception
    /home/work/courbet-r-stable-build/kernel/msm-4.14/include/linux/pid_namespace.h:
98
96static inline void zap_pid_ns_processes(struct pid_namespace *ns)
97{
98 BUG();
99}
 #5 [ffffff800805bdf0] el1_dbg at ffffff99ac083298
 #6 [ffffff800805be20] do_exit at ffffff99ac0c22e8
 #7 [ffffff800805be80] do_group_exit at ffffff99ac0c2658
 #8 [ffffff800805beb0] sys_exit_group at ffffff99ac0c266c
 #9 [ffffff800805bff0] el0_svc_naked at ffffff99ac083cfc

PID: 552    TASK: ffffffc9613c8f00  CPU: 4   COMMAND: "init"
 #0 [ffffff801455b870] __delay at ffffff99ad32cc14
 #1 [ffffff801455b8b0] __const_udelay at ffffff99ad32cd10
 #2 [ffffff801455b8c0] msm_trigger_wdog_bite at ffffff99ac5d5be0
 #3 [ffffff801455b980] do_msm_restart at ffffff99acccc3f8
 #4 [ffffff801455b9b0] machine_restart at ffffff99ac085dd0
 #5 [ffffff801455b9d0] emergency_restart at ffffff99ac0eb6dc
 #6 [ffffff801455baf0] panic at ffffff99ac0bd008
   ========> /home/work/courbet-r-stable-build/kernel/msm-4.14/kernel/exit.c:
842
836 if (group_dead) {
837 /*
838 * If the last thread of global init has exited, panic
839 * immediately to get a useable coredump.
840 */
841 if (unlikely(is_global_init(tsk)))
842 panic("Attempted to kill init! exitcode=0x%08x\n",
843 tsk->signal->group_exit_code ?: (int)code);

 #7 [ffffff801455bb70] do_exit at ffffff99ac0c257c
#8 [ffffff801455bbd0] do_group_exit at ffffff99ac0c2644
#9 [ffffff801455bcc0] get_signal at ffffff99ac0d1384
#10 [ffffff801455be60] do_notify_resume at ffffff99ac08b2a8
#11 [ffffff801455bff0] work_pending at ffffff99ac083b8c

core4                                  core7
...                                     sys_exit_group()
                                        do_group_exit()
                                          - sig->flags = SIGNAL_GROUP_EXIT
                                          - zap_other_threads()
                                       do_exit() //PF_EXITING is set
ret_to_user()
do_notify_resume()
get_signal()
    - signal_group_exit
    - goto fatal;
do_group_exit()
do_exit() //PF_EXITING is set
    - panic("Attempted to kill init! exitcode=0x%08x\n")
                                        exit_notify()
                                        find_alive_thread() //no alive
sub-threads
                                       zap_pid_ns_processes()
//CONFIG_PID_NS is not set
                                       BUG()

Oleg Nesterov <oleg@redhat.com> 于2021年3月20日周六 上午12:32写道:
>
> On 03/19, qianli zhao wrote:
> >
> > > But then I don't understand the SIGNAL_GROUP_EXIT check added by your
> > > patch. Do we really need it if we want to avoid zap_pid_ns_processes()
> > > when the global init exits?
> >
> > I think check SIGNAL_GROUP_EXIT is necessary,or panic() will happen
> > after all init sub-threads do_exit(),so the following two situations
> > will happen:
> > 1.According to the timing in the changelog,
> > zap_pid_ns_processes()->BUG() maybe happened.
>
> How? Perhaps I missed something again, but I don't think this is possible.
>
> zap_pid_ns_processes() simply won't be called, find_child_reaper() will
> see the !PF_EXITING thread which calls panic().
>
> So I think this should be documented somehow, at least in the changelog.
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-18 19:08       ` Eric W. Biederman
  2021-03-19  6:33         ` qianli zhao
  2021-03-19 16:26         ` Oleg Nesterov
@ 2021-03-21 16:00         ` qianli zhao
  2021-03-22 17:07           ` Oleg Nesterov
  2 siblings, 1 reply; 20+ messages in thread
From: qianli zhao @ 2021-03-21 16:00 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, christian, axboe, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

Hi,Eric,Oleg

> It is a bit tricky to tell if the movement of the decrement of
> signal->live is safe

It is hard to say whether it is safe or not,below are just a few of my thoughts

> That affects current_is_single threaded
> which is used by unshare, setns of the time namespace, and setting
> the selinux part of creds.

I think is ok in current_is_single_threaded,in check_unshare_flags()
and selinux_setprocattr()
change "signal->live--" position won't change the result,There is no
other dependency change to this patch and these function.

> The usage in kernel/cgroup/cgroup.c:css_task_iter_advance seems safe.
> Hmm, Maybe not.  Today cgroup_thread_change_begin is held around
> setting PF_EXITING before signal->live is decremented.  So there seem to
> be some subtle cgroup dependencies.

Moving the decrement position should only affect between new and old
code position of movement of the decrement of
signal->live.In this range,i think
acct_update_integrals(),sync_mm_rss() mainly updated some data,only
exit_signals() and sched_exit() need
attention.
cgroup_threadgroup_change_begin() is called in exit_signals(),and
css_task_iter_advance used "signal->live",it seems like it might be a
little related.
cgroup_threadgroup_change_begin() just give stable threadgroup for
user,and css_task_iter_advance only check if group is dead, decrement
of
signal->live and sets PF_EXITING seems like safe.

> I think if we are going to move the decrement of signal->live that
> should be it's own patch and be accompanied with a good description of
> why it is safe instead of having the decrement of signal->live be there
> as a side effect of another change.

I'm not sure how to describe whether this move is safe or not,from my
analysis, no side effects have been found.
Would you like tell me how to prove that or give me some suggestion?



Thanks

Eric W. Biederman <ebiederm@xmission.com> 于2021年3月19日周五 上午3:09写道:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > On 03/18, qianli zhao wrote:
> >>
> >> Hi,Oleg
> >>
> >> Thank you for your reply.
> >>
> >> >> When init sub-threads running on different CPUs exit at the same time,
> >> >> zap_pid_ns_processe()->BUG() may be happened.
> >>
> >> > and why do you think your patch can't prevent this?
> >>
> >> > Sorry, I must have missed something. But it seems to me that you are trying
> >> > to fix the wrong problem. Yes, zap_pid_ns_processes() must not be called in
> >> > the root namespace, and this has nothing to do with CONFIG_PID_NS.
> >>
> >> Yes, i try to fix this exception by test SIGNAL_GROUP_EXIT and call
> >> panic before setting PF_EXITING to prevent zap_pid_ns_processes()
> >> being called when init do_exit().
> >
> > Ah, I didn't notice your patch does atomic_dec_and_test(signal->live)
> > before exit_signals() which sets PF_EXITING. Thanks for correcting me.
> >
> > So yes, I was wrong, your patch can prevent this. Although I'd like to
> > recheck if every do-something-if-group-dead action is correct in the
> > case we have a non-PF_EXITING thread...
> >
> > But then I don't understand the SIGNAL_GROUP_EXIT check added by your
> > patch. Do we really need it if we want to avoid zap_pid_ns_processes()
> > when the global init exits?
> >
> >> In addition, the patch also protects the init process state to
> >> successfully get usable init coredump.
> >
> > Could you spell please?
> >
> > Does this connect to SIGNAL_GROUP_EXIT check? Do you mean that you want
> > to panic earlier, before other init's sub-threads exit?
>
> That is my understanding.
>
> As I understand it this patch has two purposes:
> 1. Avoid the BUG_ON in zap_pid_ns_processes when !CONFIG_PID_NS
> 2. panic as early as possible so exiting threads don't removing
>    interesting debugging state.
>
>
> It is a bit tricky to tell if the movement of the decrement of
> signal->live is safe.  That affects current_is_single threaded
> which is used by unshare, setns of the time namespace, and setting
> the selinux part of creds.
>
> The usage in kernel/cgroup/cgroup.c:css_task_iter_advance seems safe.
> Hmm, Maybe not.  Today cgroup_thread_change_begin is held around
> setting PF_EXITING before signal->live is decremented.  So there seem to
> be some subtle cgroup dependencies.
>
> The usages of group_dead in do_exit seem safe, as except for the new
> one everything is the same.
>
> We could definitely take advantage of knowing group_dead in exit_signals
> to simplify it's optimization to not rerouting signals to living
> threads.
>
>
> I think if we are going to move the decrement of signal->live that
> should be it's own patch and be accompanied with a good description of
> why it is safe instead of having the decrement of signal->live be there
> as a side effect of another change.
>
> Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-21 13:04           ` qianli zhao
@ 2021-03-22 16:37             ` Oleg Nesterov
  2021-03-23  3:14               ` qianli zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2021-03-22 16:37 UTC (permalink / raw)
  To: qianli zhao
  Cc: christian, axboe, Eric W. Biederman, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

Hi,

It seems that we don't understand each other.

If we move atomic_dec_and_test(signal->live) and do

	if (group_dead && is_global_init)
		panic(...);


before setting PF_EXITING like your patch does, then zap_pid_ns_processes()
simply won't be called.

Because:

On 03/21, qianli zhao wrote:
>
> Hi,Oleg
>
> > How? Perhaps I missed something again, but I don't think this is possible.
>
> > zap_pid_ns_processes() simply won't be called, find_child_reaper() will
> > see the !PF_EXITING thread which calls panic().
>
> > So I think this should be documented somehow, at least in the changelog.
>
> This problem occurs when both two init threads enter the do_exit,
> One of the init thread is syscall sys_exit_group,and set SIGNAL_GROUP_EXIT
> The other init thread perform ret_to_user()->get_signal() and found
> SIGNAL_GROUP_EXIT is set,then do_group_exit()->do_exit(),since there
> are no alive init threads it finally goes to
> zap_pid_ns_processes()

No, there is at least one alive init thread. If they all have exited, we have
the thread which calls panic() above.

> and BUG().

so we don't need the SIGNAL_GROUP_EXIT check to avoid this BUG().

What have I missed?

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-21 16:00         ` qianli zhao
@ 2021-03-22 17:07           ` Oleg Nesterov
  2021-03-22 17:09             ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2021-03-22 17:07 UTC (permalink / raw)
  To: qianli zhao
  Cc: Eric W. Biederman, christian, axboe, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

On 03/22, qianli zhao wrote:
>
> Moving the decrement position should only affect between new and old
> code position of movement of the decrement of
> signal->live.

Why do you think so? It can affect _any_ code which runs under
"if (group_dead)". Again, I don't see anything wrong, but I didn't even
try to audit these code paths.

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-22 17:07           ` Oleg Nesterov
@ 2021-03-22 17:09             ` Oleg Nesterov
  0 siblings, 0 replies; 20+ messages in thread
From: Oleg Nesterov @ 2021-03-22 17:09 UTC (permalink / raw)
  To: qianli zhao
  Cc: Eric W. Biederman, christian, axboe, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

On 03/22, Oleg Nesterov wrote:
>
> On 03/22, qianli zhao wrote:
> >
> > Moving the decrement position should only affect between new and old
> > code position of movement of the decrement of
> > signal->live.
>
> Why do you think so? It can affect _any_ code which runs under
> "if (group_dead)". Again, I don't see anything wrong, but I didn't even
> try to audit these code paths.

forgot to mention...

and any code outside of do_exit() which checks signal->live.

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-22 16:37             ` Oleg Nesterov
@ 2021-03-23  3:14               ` qianli zhao
  2021-03-23  9:00                 ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: qianli zhao @ 2021-03-23  3:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: christian, axboe, Eric W. Biederman, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

Hi,Oleg

> No, there is at least one alive init thread. If they all have exited, we have
> the thread which calls panic() above.

By current logic, setting PF_EXITING(exit_signals()) is before the
panic(),find_alive_thread() determines the PF_EXITING of all child
threads, the panic thread's PF_EXITING has been set before panic(),so
find_alive_thread() thinks this thread also dead, resulting in
find_alive_thread returning NULL.It is possible to trigger a
zap_pid_ns_processes()->BUG() in this case.
========
exit_signals(tsk);  /* sets PF_EXITING */
...
group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
if (unlikely(is_global_init(tsk)))
panic("Attempted to kill init!
exitcode=0x%08x\n",-------------------->//PF_EXITING has been set
tsk->signal->group_exit_code ?: (int)code);

=======

> Why do you think so? It can affect _any_ code which runs under
> "if (group_dead)". Again, I don't see anything wrong, but I didn't even
> try to audit these code paths.

Yes,all places where checked the "signal->live" may be affected,but
even before my changes, each program that checks "signal->live" may
get different state(group_dead or not), depending on the timing of the
caller,this situation will not change after my change.
After my patch,"signal->live--" and other variable are set in a
different order(such as signal->live and PF_EXITING),this can cause
abnormalities in the logic associated with these two variables,that is
my thinking.
Of course, check all the "signal->live--" path is definitely
necessary,it's just the case above that we need more attention.

Thanks

Oleg Nesterov <oleg@redhat.com> 于2021年3月23日周二 上午12:37写道:
>
> Hi,
>
> It seems that we don't understand each other.
>
> If we move atomic_dec_and_test(signal->live) and do
>
>         if (group_dead && is_global_init)
>                 panic(...);
>
>
> before setting PF_EXITING like your patch does, then zap_pid_ns_processes()
> simply won't be called.
>
> Because:
>
> On 03/21, qianli zhao wrote:
> >
> > Hi,Oleg
> >
> > > How? Perhaps I missed something again, but I don't think this is possible.
> >
> > > zap_pid_ns_processes() simply won't be called, find_child_reaper() will
> > > see the !PF_EXITING thread which calls panic().
> >
> > > So I think this should be documented somehow, at least in the changelog.
> >
> > This problem occurs when both two init threads enter the do_exit,
> > One of the init thread is syscall sys_exit_group,and set SIGNAL_GROUP_EXIT
> > The other init thread perform ret_to_user()->get_signal() and found
> > SIGNAL_GROUP_EXIT is set,then do_group_exit()->do_exit(),since there
> > are no alive init threads it finally goes to
> > zap_pid_ns_processes()
>
> No, there is at least one alive init thread. If they all have exited, we have
> the thread which calls panic() above.
>
> > and BUG().
>
> so we don't need the SIGNAL_GROUP_EXIT check to avoid this BUG().
>
> What have I missed?
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-23  3:14               ` qianli zhao
@ 2021-03-23  9:00                 ` Oleg Nesterov
  2021-03-23 11:23                   ` qianli zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2021-03-23  9:00 UTC (permalink / raw)
  To: qianli zhao
  Cc: christian, axboe, Eric W. Biederman, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

On 03/23, qianli zhao wrote:
>
> Hi,Oleg
>
> > No, there is at least one alive init thread. If they all have exited, we have
> > the thread which calls panic() above.
>
> By current logic, setting PF_EXITING(exit_signals()) is before the
> panic(),

You certainly don't understand me :/

Please read my email you quoted below. I didn't mean the current logic.
I meant the logic after your patch which moves atomic_dec_and_test() and
panic() before exit_signals().

Oleg.

> find_alive_thread() determines the PF_EXITING of all child
> threads, the panic thread's PF_EXITING has been set before panic(),so
> find_alive_thread() thinks this thread also dead, resulting in
> find_alive_thread returning NULL.It is possible to trigger a
> zap_pid_ns_processes()->BUG() in this case.
> ========
> exit_signals(tsk);  /* sets PF_EXITING */
> ...
> group_dead = atomic_dec_and_test(&tsk->signal->live);
> if (group_dead) {
> if (unlikely(is_global_init(tsk)))
> panic("Attempted to kill init!
> exitcode=0x%08x\n",-------------------->//PF_EXITING has been set
> tsk->signal->group_exit_code ?: (int)code);
> 
> =======
> 
> > Why do you think so? It can affect _any_ code which runs under
> > "if (group_dead)". Again, I don't see anything wrong, but I didn't even
> > try to audit these code paths.
> 
> Yes,all places where checked the "signal->live" may be affected,but
> even before my changes, each program that checks "signal->live" may
> get different state(group_dead or not), depending on the timing of the
> caller,this situation will not change after my change.
> After my patch,"signal->live--" and other variable are set in a
> different order(such as signal->live and PF_EXITING),this can cause
> abnormalities in the logic associated with these two variables,that is
> my thinking.
> Of course, check all the "signal->live--" path is definitely
> necessary,it's just the case above that we need more attention.
> 
> Thanks
> 
> Oleg Nesterov <oleg@redhat.com> 于2021年3月23日周二 上午12:37写道:
> >
> > Hi,
> >
> > It seems that we don't understand each other.
> >
> > If we move atomic_dec_and_test(signal->live) and do
> >
> >         if (group_dead && is_global_init)
> >                 panic(...);
> >
> >
> > before setting PF_EXITING like your patch does, then zap_pid_ns_processes()
> > simply won't be called.
> >
> > Because:
> >
> > On 03/21, qianli zhao wrote:
> > >
> > > Hi,Oleg
> > >
> > > > How? Perhaps I missed something again, but I don't think this is possible.
> > >
> > > > zap_pid_ns_processes() simply won't be called, find_child_reaper() will
> > > > see the !PF_EXITING thread which calls panic().
> > >
> > > > So I think this should be documented somehow, at least in the changelog.
> > >
> > > This problem occurs when both two init threads enter the do_exit,
> > > One of the init thread is syscall sys_exit_group,and set SIGNAL_GROUP_EXIT
> > > The other init thread perform ret_to_user()->get_signal() and found
> > > SIGNAL_GROUP_EXIT is set,then do_group_exit()->do_exit(),since there
> > > are no alive init threads it finally goes to
> > > zap_pid_ns_processes()
> >
> > No, there is at least one alive init thread. If they all have exited, we have
> > the thread which calls panic() above.
> >
> > > and BUG().
> >
> > so we don't need the SIGNAL_GROUP_EXIT check to avoid this BUG().
> >
> > What have I missed?
> >
> > Oleg.
> >
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-23  9:00                 ` Oleg Nesterov
@ 2021-03-23 11:23                   ` qianli zhao
  2021-03-24 18:12                     ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: qianli zhao @ 2021-03-23 11:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: christian, axboe, Eric W. Biederman, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

Hi,Oleg

> You certainly don't understand me :/

> Please read my email you quoted below. I didn't mean the current logic.
> I meant the logic after your patch which moves atomic_dec_and_test() and
> panic() before exit_signals().

Sorry, I think I see what you mean now.

You mean that after apply my patch,SIGNAL_GROUP_EXIT no longer needs
to be tested or avoid zap_pid_ns_processes()->BUG().
Yes,your consideration is correct.
But,my patch has another purpose,protect some key variables(such
as:task->mm,task->nsproxy,etc) to recover init coredump from
fulldump,if sub-threads finish do_exit(),these variables of sub-task
will be lost,and we cannot parse the coredump of the init process
through the tool normally such as "gcore".


Oleg Nesterov <oleg@redhat.com> 于2021年3月23日周二 下午5:00写道:
>
> On 03/23, qianli zhao wrote:
> >
> > Hi,Oleg
> >
> > > No, there is at least one alive init thread. If they all have exited, we have
> > > the thread which calls panic() above.
> >
> > By current logic, setting PF_EXITING(exit_signals()) is before the
> > panic(),
>
> You certainly don't understand me :/
>
> Please read my email you quoted below. I didn't mean the current logic.
> I meant the logic after your patch which moves atomic_dec_and_test() and
> panic() before exit_signals().
>
> Oleg.
>
> > find_alive_thread() determines the PF_EXITING of all child
> > threads, the panic thread's PF_EXITING has been set before panic(),so
> > find_alive_thread() thinks this thread also dead, resulting in
> > find_alive_thread returning NULL.It is possible to trigger a
> > zap_pid_ns_processes()->BUG() in this case.
> > ========
> > exit_signals(tsk);  /* sets PF_EXITING */
> > ...
> > group_dead = atomic_dec_and_test(&tsk->signal->live);
> > if (group_dead) {
> > if (unlikely(is_global_init(tsk)))
> > panic("Attempted to kill init!
> > exitcode=0x%08x\n",-------------------->//PF_EXITING has been set
> > tsk->signal->group_exit_code ?: (int)code);
> >
> > =======
> >
> > > Why do you think so? It can affect _any_ code which runs under
> > > "if (group_dead)". Again, I don't see anything wrong, but I didn't even
> > > try to audit these code paths.
> >
> > Yes,all places where checked the "signal->live" may be affected,but
> > even before my changes, each program that checks "signal->live" may
> > get different state(group_dead or not), depending on the timing of the
> > caller,this situation will not change after my change.
> > After my patch,"signal->live--" and other variable are set in a
> > different order(such as signal->live and PF_EXITING),this can cause
> > abnormalities in the logic associated with these two variables,that is
> > my thinking.
> > Of course, check all the "signal->live--" path is definitely
> > necessary,it's just the case above that we need more attention.
> >
> > Thanks
> >
> > Oleg Nesterov <oleg@redhat.com> 于2021年3月23日周二 上午12:37写道:
> > >
> > > Hi,
> > >
> > > It seems that we don't understand each other.
> > >
> > > If we move atomic_dec_and_test(signal->live) and do
> > >
> > >         if (group_dead && is_global_init)
> > >                 panic(...);
> > >
> > >
> > > before setting PF_EXITING like your patch does, then zap_pid_ns_processes()
> > > simply won't be called.
> > >
> > > Because:
> > >
> > > On 03/21, qianli zhao wrote:
> > > >
> > > > Hi,Oleg
> > > >
> > > > > How? Perhaps I missed something again, but I don't think this is possible.
> > > >
> > > > > zap_pid_ns_processes() simply won't be called, find_child_reaper() will
> > > > > see the !PF_EXITING thread which calls panic().
> > > >
> > > > > So I think this should be documented somehow, at least in the changelog.
> > > >
> > > > This problem occurs when both two init threads enter the do_exit,
> > > > One of the init thread is syscall sys_exit_group,and set SIGNAL_GROUP_EXIT
> > > > The other init thread perform ret_to_user()->get_signal() and found
> > > > SIGNAL_GROUP_EXIT is set,then do_group_exit()->do_exit(),since there
> > > > are no alive init threads it finally goes to
> > > > zap_pid_ns_processes()
> > >
> > > No, there is at least one alive init thread. If they all have exited, we have
> > > the thread which calls panic() above.
> > >
> > > > and BUG().
> > >
> > > so we don't need the SIGNAL_GROUP_EXIT check to avoid this BUG().
> > >
> > > What have I missed?
> > >
> > > Oleg.
> > >
> >
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-23 11:23                   ` qianli zhao
@ 2021-03-24 18:12                     ` Oleg Nesterov
  2021-03-25  3:00                       ` qianli zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2021-03-24 18:12 UTC (permalink / raw)
  To: qianli zhao
  Cc: christian, axboe, Eric W. Biederman, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

Hi,

On 03/23, qianli zhao wrote:
>
> Hi,Oleg
>
> > You certainly don't understand me :/
>
> > Please read my email you quoted below. I didn't mean the current logic.
> > I meant the logic after your patch which moves atomic_dec_and_test() and
> > panic() before exit_signals().
>
> Sorry, I think I see what you mean now.
>
> You mean that after apply my patch,SIGNAL_GROUP_EXIT no longer needs
> to be tested or avoid zap_pid_ns_processes()->BUG().
> Yes,your consideration is correct.

OK, great

> But,my patch has another purpose,protect some key variables(such
> as:task->mm,task->nsproxy,etc) to recover init coredump from
> fulldump,if sub-threads finish do_exit(),

Yes I know.

But the purpose of this SIGNAL_GROUP_EXIT check is not clear and not
documented. That is why I said it should be documented at least in the
changelog.

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V3] exit: trigger panic when global init has exited
  2021-03-24 18:12                     ` Oleg Nesterov
@ 2021-03-25  3:00                       ` qianli zhao
  0 siblings, 0 replies; 20+ messages in thread
From: qianli zhao @ 2021-03-25  3:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: christian, axboe, Eric W. Biederman, Thomas Gleixner,
	Peter Collingbourne, linux-kernel, Qianli Zhao

>> But,my patch has another purpose,protect some key variables(such
>> as:task->mm,task->nsproxy,etc) to recover init coredump from
>> fulldump,if sub-threads finish do_exit(),

> Yes I know.

> But the purpose of this SIGNAL_GROUP_EXIT check is not clear and not
> documented. That is why I said it should be documented at least in the
> changelog.

Ok.
I will update the changelog as you suggest.

Oleg Nesterov <oleg@redhat.com> 于2021年3月25日周四 上午2:12写道:
>
> Hi,
>
> On 03/23, qianli zhao wrote:
> >
> > Hi,Oleg
> >
> > > You certainly don't understand me :/
> >
> > > Please read my email you quoted below. I didn't mean the current logic.
> > > I meant the logic after your patch which moves atomic_dec_and_test() and
> > > panic() before exit_signals().
> >
> > Sorry, I think I see what you mean now.
> >
> > You mean that after apply my patch,SIGNAL_GROUP_EXIT no longer needs
> > to be tested or avoid zap_pid_ns_processes()->BUG().
> > Yes,your consideration is correct.
>
> OK, great
>
> > But,my patch has another purpose,protect some key variables(such
> > as:task->mm,task->nsproxy,etc) to recover init coredump from
> > fulldump,if sub-threads finish do_exit(),
>
> Yes I know.
>
> But the purpose of this SIGNAL_GROUP_EXIT check is not clear and not
> documented. That is why I said it should be documented at least in the
> changelog.
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-03-25  3:01 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-17 12:51 [PATCH V3] exit: trigger panic when global init has exited Qianli Zhao
2021-03-17 14:38 ` Oleg Nesterov
2021-03-18  2:47   ` qianli zhao
2021-03-18 18:04     ` Oleg Nesterov
2021-03-18 19:08       ` Eric W. Biederman
2021-03-19  6:33         ` qianli zhao
2021-03-19 16:34           ` Oleg Nesterov
2021-03-19 16:26         ` Oleg Nesterov
2021-03-21 16:00         ` qianli zhao
2021-03-22 17:07           ` Oleg Nesterov
2021-03-22 17:09             ` Oleg Nesterov
2021-03-19  5:08       ` qianli zhao
2021-03-19 16:32         ` Oleg Nesterov
2021-03-21 13:04           ` qianli zhao
2021-03-22 16:37             ` Oleg Nesterov
2021-03-23  3:14               ` qianli zhao
2021-03-23  9:00                 ` Oleg Nesterov
2021-03-23 11:23                   ` qianli zhao
2021-03-24 18:12                     ` Oleg Nesterov
2021-03-25  3:00                       ` qianli zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).