Re: [PATCH V3] exit: trigger panic when global init has exited

From: qianli zhao <zhaoqianligood@gmail.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>,
	christian@brauner.io, axboe@kernel.dk,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Collingbourne <pcc@google.com>,
	linux-kernel@vger.kernel.org, Qianli Zhao <zhaoqianli@xiaomi.com>
Subject: Re: [PATCH V3] exit: trigger panic when global init has exited
Date: Fri, 19 Mar 2021 14:33:23 +0800	[thread overview]
Message-ID: <CAPx_LQHahNDvUkv08RZgUvbKZtdHNaSNRA1XqVDkNiwv5D=fXg@mail.gmail.com> (raw)
In-Reply-To: <m1pmzwb7pd.fsf@fess.ebiederm.org>

Hi,Eric

> As I understand it this patch has two purposes:
> 1. Avoid the BUG_ON in zap_pid_ns_processes when !CONFIG_PID_NS
> 2. panic as early as possible so exiting threads don't removing
>   interesting debugging state.

Your understanding is very correct,this is what my patch wants to do

> I think if we are going to move the decrement of signal->live that
> should be it's own patch and be accompanied with a good description of
> why it is safe instead of having the decrement of signal->live be there
> as a side effect of another change.

I will think about the risks of movement of the decrement of
signal->live before exit_signal().
If is difficult to judge movement of the decrement of signal->live is
safe,how about only test 'signal->live==1' not use group_dead?

Such as:

diff --git a/kernel/exit.c b/kernel/exit.c
index 04029e3..87f3595 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -767,6 +767,17 @@ void __noreturn do_exit(long code)
        validate_creds_for_do_exit(tsk);

        /*
+        * If global init has exited,
+        * panic immediately to get a useable coredump.
+        */
+       if (unlikely(is_global_init(tsk) &&
+           ((atomic_read(&tsk->signal->live) == 1) ||    /*current is
last init thread*/
+            (tsk->signal->flags & SIGNAL_GROUP_EXIT)))) {
+                       panic("Attempted to kill init! exitcode=0x%08x\n",
+                               tsk->signal->group_exit_code ?: (int)code);
+       }
+
+       /*
         * We're taking recursive faults here in do_exit. Safest is to just
         * leave this task alone and wait for reboot.
         */
@@ -784,16 +795,9 @@ void __noreturn do_exit(long code)
        if (tsk->mm)
                sync_mm_rss(tsk->mm);
        acct_update_integrals(tsk);
+
        group_dead = atomic_dec_and_test(&tsk->signal->live);
        if (group_dead) {
-               /*
-                * If the last thread of global init has exited, panic
-                * immediately to get a useable coredump.
-                */
-               if (unlikely(is_global_init(tsk)))
-                       panic("Attempted to kill init! exitcode=0x%08x\n",
-                               tsk->signal->group_exit_code ?: (int)code);
-

Eric W. Biederman <ebiederm@xmission.com> 于2021年3月19日周五 上午3:09写道：
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > On 03/18, qianli zhao wrote:
> >>
> >> Hi,Oleg
> >>
> >> Thank you for your reply.
> >>
> >> >> When init sub-threads running on different CPUs exit at the same time,
> >> >> zap_pid_ns_processe()->BUG() may be happened.
> >>
> >> > and why do you think your patch can't prevent this?
> >>
> >> > Sorry, I must have missed something. But it seems to me that you are trying
> >> > to fix the wrong problem. Yes, zap_pid_ns_processes() must not be called in
> >> > the root namespace, and this has nothing to do with CONFIG_PID_NS.
> >>
> >> Yes, i try to fix this exception by test SIGNAL_GROUP_EXIT and call
> >> panic before setting PF_EXITING to prevent zap_pid_ns_processes()
> >> being called when init do_exit().
> >
> > Ah, I didn't notice your patch does atomic_dec_and_test(signal->live)
> > before exit_signals() which sets PF_EXITING. Thanks for correcting me.
> >
> > So yes, I was wrong, your patch can prevent this. Although I'd like to
> > recheck if every do-something-if-group-dead action is correct in the
> > case we have a non-PF_EXITING thread...
> >
> > But then I don't understand the SIGNAL_GROUP_EXIT check added by your
> > patch. Do we really need it if we want to avoid zap_pid_ns_processes()
> > when the global init exits?
> >
> >> In addition, the patch also protects the init process state to
> >> successfully get usable init coredump.
> >
> > Could you spell please?
> >
> > Does this connect to SIGNAL_GROUP_EXIT check? Do you mean that you want
> > to panic earlier, before other init's sub-threads exit?
>
> That is my understanding.
>
> As I understand it this patch has two purposes:
> 1. Avoid the BUG_ON in zap_pid_ns_processes when !CONFIG_PID_NS
> 2. panic as early as possible so exiting threads don't removing
>    interesting debugging state.
>
>
> It is a bit tricky to tell if the movement of the decrement of
> signal->live is safe.  That affects current_is_single threaded
> which is used by unshare, setns of the time namespace, and setting
> the selinux part of creds.
>
> The usage in kernel/cgroup/cgroup.c:css_task_iter_advance seems safe.
> Hmm, Maybe not.  Today cgroup_thread_change_begin is held around
> setting PF_EXITING before signal->live is decremented.  So there seem to
> be some subtle cgroup dependencies.
>
> The usages of group_dead in do_exit seem safe, as except for the new
> one everything is the same.
>
> We could definitely take advantage of knowing group_dead in exit_signals
> to simplify it's optimization to not rerouting signals to living
> threads.
>
>
> I think if we are going to move the decrement of signal->live that
> should be it's own patch and be accompanied with a good description of
> why it is safe instead of having the decrement of signal->live be there
> as a side effect of another change.
>
> Eric