From: Oleg Nesterov <oleg@redhat.com>
To: Qianli Zhao <zhaoqianligood@gmail.com>
Cc: christian@brauner.io, axboe@kernel.dk, ebiederm@xmission.com,
	tglx@linutronix.de, pcc@google.com, linux-kernel@vger.kernel.org,
	zhaoqianli@xiaomi.com
Subject: Re: [PATCH] exit: trigger panic when init process is set to SIGNAL_GROUP_EXIT
Date: Tue, 9 Mar 2021 19:26:58 +0100
Message-ID: <20210309182657.GA1408@redhat.com>
In-Reply-To: <1615296712-175334-1-git-send-email-zhaoqianligood@gmail.com>

On 03/09, Qianli Zhao wrote:
>
> From: Qianli Zhao <zhaoqianli@xiaomi.com>
>
> Once any init thread finds SIGNAL_GROUP_EXIT set, trigger a panic immediately
> instead of waiting until the last thread of global init has exited, and do not
> allow the other init threads to exit. This protects the task/memory state of
> all sub-threads so that a reliable init coredump can be obtained.

To be honest, I don't understand the changelog. It seems that you want
to uglify the kernel to simplify the debugging of buggy init? Or what?

Nor can I understand the patch. I fail to understand the games with
SIGNAL_UNKILLABLE and ->siglock.

And iiuc, with this patch the kernel will crash if init's sub-thread execs:
signal_group_exit() returns true in this case.

Oleg.

> [   24.705376] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
> [   24.705382] CPU: 4 PID: 552 Comm: init Tainted: G S         O    4.14.180-perf-g4483caa8ae80-dirty #1
> [   24.705390] kernel BUG at include/linux/pid_namespace.h:98!
> 
> PID: 552   CPU: 4   COMMAND: "init"
> PID: 1     CPU: 7   COMMAND: "init"
> core4				core7
> ...				sys_exit_group()
> 				do_group_exit()
> 				    - sig->flags = SIGNAL_GROUP_EXIT
> 				    - zap_other_threads()
> 				do_exit()
> 				    - PF_EXITING is set
> ret_to_user()
> do_notify_resume()
> get_signal()
>     - signal_group_exit
>     - goto fatal;
> do_group_exit()
> do_exit()
>     - PF_EXITING is set
>     - panic("Attempted to kill init! exitcode=0x%08x\n")
> 				exit_notify()
> 				find_alive_thread() //no alive sub-threads
> 				zap_pid_ns_processes()//CONFIG_PID_NS is not set
> 				BUG()
> 
> Signed-off-by: Qianli Zhao <zhaoqianli@xiaomi.com>
> ---
> We hit an init crash, but we could not extract an init coredump from the full dump. We also
> saw a BUG() triggered from zap_pid_ns_processes().
> 
> From the crash dump we can get the following information:
> 1. "Attempted to kill init": the init process was killed.
> - Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
> 2. At the same time as the init crash, a BUG() was triggered on another core.
> - [   24.705390] kernel BUG at include/linux/pid_namespace.h:98!
> 3. When an init thread calls exit_mm(), that thread's task->mm becomes NULL, which makes it hard to extract a coredump.
> 
> To fix the issue and save a complete coredump, trigger a panic as soon as an init thread is found with SIGNAL_GROUP_EXIT set,
> and do not allow the other sub-threads to exit; they just wait for the reboot.
> 
> PID: 1      TASK: ffffffc973126900  CPU: 7   COMMAND: "init"
>  #0 [ffffff800805ba60] perf_trace_kernel_panic_late at ffffff99ac0bcbcc
>  #1 [ffffff800805bac0] die at ffffff99ac08dc64
>  #2 [ffffff800805bb10] bug_handler at ffffff99ac08e398
>  #3 [ffffff800805bbc0] brk_handler at ffffff99ac08529c
>  #4 [ffffff800805bc80] do_debug_exception at ffffff99ac0814e4
>  #5 [ffffff800805bdf0] el1_dbg at ffffff99ac083298
> ->Exception
>     /home/work/courbet-r-stable-build/kernel/msm-4.14/include/linux/pid_namespace.h: 98
>  #6 [ffffff800805be20] do_exit at ffffff99ac0c22e8
>  #7 [ffffff800805be80] do_group_exit at ffffff99ac0c2658
>  #8 [ffffff800805beb0] sys_exit_group at ffffff99ac0c266c
>  #9 [ffffff800805bff0] el0_svc_naked at ffffff99ac083cf
> ->SYSCALLNO: 5e (__NR_exit_group) 
> 
> PID: 552    TASK: ffffffc9613c8f00  CPU: 4   COMMAND: "init"
>  #0 [ffffff801455b870] __delay at ffffff99ad32cc14
>  #1 [ffffff801455b8b0] __const_udelay at ffffff99ad32cd10
>  #2 [ffffff801455b8c0] msm_trigger_wdog_bite at ffffff99ac5d5be0
>  #3 [ffffff801455b980] do_msm_restart at ffffff99acccc3f8
>  #4 [ffffff801455b9b0] machine_restart at ffffff99ac085dd0
>  #5 [ffffff801455b9d0] emergency_restart at ffffff99ac0eb6dc
>  #6 [ffffff801455baf0] panic at ffffff99ac0bd008
>  #7 [ffffff801455bb70] do_exit at ffffff99ac0c257c
>     /home/work/courbet-r-stable-build/kernel/msm-4.14/kernel/exit.c: 842
>  #8 [ffffff801455bbd0] do_group_exit at ffffff99ac0c2644
>  #9 [ffffff801455bcc0] get_signal at ffffff99ac0d1384
> #10 [ffffff801455be60] do_notify_resume at ffffff99ac08b2a8
> #11 [ffffff801455bff0] work_pending at ffffff99ac083b8c
> 
> ---
>  kernel/exit.c | 29 +++++++++++++++++++++--------
>  1 file changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index ef2fb929..6b2da22 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -758,6 +758,27 @@ void __noreturn do_exit(long code)
>  	validate_creds_for_do_exit(tsk);
>  
>  	/*
> +	 * Once init group is marked for death,
> +	 * panic immediately to get a useable coredump
> +	 */
> +	if (unlikely(is_global_init(tsk) &&
> +	    signal_group_exit(tsk->signal))) {
> +		spin_lock_irq(&tsk->sighand->siglock);
> +		if (!(tsk->signal->flags & SIGNAL_UNKILLABLE)) {
> +			tsk->signal->flags |= SIGNAL_UNKILLABLE;
> +			spin_unlock_irq(&tsk->sighand->siglock);
> +			panic("Attempted to kill init! exitcode=0x%08x\n",
> +				tsk->signal->group_exit_code ?: (int)code);
> +		} else {
> +			/* init sub-thread is dying, just wait for reboot */
> +			spin_unlock_irq(&tsk->sighand->siglock);
> +			futex_exit_recursive(tsk);
> +			set_current_state(TASK_UNINTERRUPTIBLE);
> +			schedule();
> +		}
> +	}
> +
> +	/*
>  	 * We're taking recursive faults here in do_exit. Safest is to just
>  	 * leave this task alone and wait for reboot.
>  	 */
> @@ -776,14 +797,6 @@ void __noreturn do_exit(long code)
>  	acct_update_integrals(tsk);
>  	group_dead = atomic_dec_and_test(&tsk->signal->live);
>  	if (group_dead) {
> -		/*
> -		 * If the last thread of global init has exited, panic
> -		 * immediately to get a useable coredump.
> -		 */
> -		if (unlikely(is_global_init(tsk)))
> -			panic("Attempted to kill init! exitcode=0x%08x\n",
> -				tsk->signal->group_exit_code ?: (int)code);
> -
>  #ifdef CONFIG_POSIX_TIMERS
>  		hrtimer_cancel(&tsk->signal->real_timer);
>  		exit_itimers(tsk->signal);
> -- 
> 1.9.1
> 


Thread overview: 8+ messages
2021-03-09 13:31 [PATCH] exit: trigger panic when init process is set to SIGNAL_GROUP_EXIT Qianli Zhao
2021-03-09 18:26 ` Oleg Nesterov [this message]
2021-03-10  3:59   ` qianli zhao
2021-03-10 16:44     ` Eric W. Biederman
2021-03-10 17:32       ` Oleg Nesterov
2021-03-10 19:07         ` Eric W. Biederman
2021-03-10 22:13         ` Eric W. Biederman
2021-03-11  4:40       ` qianli zhao
