From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751497AbdCCBL6 (ORCPT ); Thu, 2 Mar 2017 20:11:58 -0500 Received: from out02.mta.xmission.com ([166.70.13.232]:40169 "EHLO out02.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751255AbdCCBL4 (ORCPT ); Thu, 2 Mar 2017 20:11:56 -0500 From: ebiederm@xmission.com (Eric W. Biederman) To: Oleg Nesterov Cc: Andrew Morton , Aleksa Sarai , Andy Lutomirski , Attila Fazekas , Jann Horn , Kees Cook , Michal Hocko , Ulrich Obergfell , linux-kernel@vger.kernel.org References: <20170213141452.GA30203@redhat.com> <20170224160354.GA845@redhat.com> Date: Thu, 02 Mar 2017 19:05:50 -0600 In-Reply-To: <20170224160354.GA845@redhat.com> (Oleg Nesterov's message of "Fri, 24 Feb 2017 17:03:54 +0100") Message-ID: <87shmv6ufl.fsf@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1cjbjs-0003fD-Vr;;;mid=<87shmv6ufl.fsf@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=67.3.234.240;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX19bhKxXKhCQ+w60qO7d1AKPWrOPROCQbt0= X-SA-Exim-Connect-IP: 67.3.234.240 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 1.5 XMNoVowels Alpha-numberic number with no vowels * 0.0 TVD_RCVD_IP Message was received from an IP address * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa07 1397; Body=1 Fuz1=1 Fuz2=1] * 0.0 T_TooManySym_01 4+ unique symbols in subject X-Spam-DCC: XMission; sa07 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: *;Oleg Nesterov X-Spam-Relay-Country: X-Spam-Timing: total 431 ms - load_scoreonly_sql: 0.04 (0.0%), signal_user_changed: 3.5 (0.8%), b_tie_ro: 2.4 (0.6%), parse: 1.08 (0.3%), extract_message_metadata: 4.9 (1.1%), get_uri_detail_list: 2.8 (0.7%), tests_pri_-1000: 3.7 (0.9%), tests_pri_-950: 1.18 (0.3%), tests_pri_-900: 0.97 (0.2%), tests_pri_-400: 30 (6.9%), check_bayes: 29 (6.7%), b_tokenize: 11 (2.6%), b_tok_get_all: 9 (2.1%), b_comp_prob: 3.1 (0.7%), b_tok_touch_all: 3.3 (0.8%), b_finish: 0.72 (0.2%), tests_pri_0: 372 (86.3%), check_dkim_signature: 0.48 (0.1%), check_dkim_adsp: 2.8 (0.6%), tests_pri_500: 4.6 (1.1%), rewrite_mail: 0.00 (0.0%) Subject: Re: [PATCH 0/2] fix the traced mt-exec deadlock X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Oleg Nesterov writes: > Eric, > > our discussion was a bit confusing, and it seems that we did not > fully convince each other. So let me ask what do you finally think > about this fix. > > Let me repeat. Even if I do not agree with some of your objections, > I do agree that 1/2 does not look nice and clean. And we seem to > agree that either way, with or without this fix, we need more changes > in this area. > > But we need a simple and backportable fix for stable trees, say for > rhel7. This bug was reported many times, and this is the simplest > solution I was able to find. I am not 100% convinced that we need a backportable solution, but regardless my experience is that good clean solutions are almost always easier to backport that something we just settle for. The patch below needs a little more looking and testing but arguably it is sufficient. It implements autoreaping for non-leader threads always. We might want to limit this to the case of exec. For exec there is a strong argument that no one cares as the exit_code/status returned by these exiting threads has always been 0. Maybe that is ok. Exit_success but it certainly seems wrong in the case of exec. It would really be better to have an exit status that you went away with exec. I know it has always been 0 as fs/exec.c does not set sig->group_exit_code so the value initialized in copy_signal from kmem_cache_zalloc is used. AKA 0. This change has very little to do with the cred_guard_mutex. Just a question of which semantics we want to export. In my preliminary testing this seems a viable. My big problem with your previous patch is that it is in no way obviously correct. Nor is it well explained. All signs that someone is pushing too hard and something better can be achieved if someone steps back a little. When the following patch can be made to solve the same problem we certainly need an explanation in the commit comment on why we are using much more code for the same job. diff --git a/kernel/exit.c b/kernel/exit.c index 5cfbd595f918..5a432e0dcf4a 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -690,7 +690,9 @@ static void exit_notify(struct task_struct *tsk, int group_dead) thread_group_empty(tsk) && !ptrace_reparented(tsk) ? tsk->exit_signal : SIGCHLD; - autoreap = do_notify_parent(tsk, sig); + do_notify_parent(tsk, sig); + /* Autoreap threads even when ptraced */ + autoreap = !thread_group_leader(tsk); } else if (thread_group_leader(tsk)) { autoreap = thread_group_empty(tsk) && do_notify_parent(tsk, tsk->exit_signal); @@ -699,8 +701,6 @@ static void exit_notify(struct task_struct *tsk, int group_dead) } tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE; - if (tsk->exit_state == EXIT_DEAD) - list_add(&tsk->ptrace_entry, &dead); /* mt-exec, de_thread() is waiting for group leader */ if (unlikely(tsk->signal->notify_count < 0)) @@ -711,6 +711,8 @@ static void exit_notify(struct task_struct *tsk, int group_dead) list_del_init(&p->ptrace_entry); release_task(p); } + if (autoreap) + release_task(tsk); } #ifdef CONFIG_DEBUG_STACK_USAGE Eric