From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752353AbdCCTXF (ORCPT ); Fri, 3 Mar 2017 14:23:05 -0500 Received: from out01.mta.xmission.com ([166.70.13.231]:49257 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752082AbdCCTW5 (ORCPT ); Fri, 3 Mar 2017 14:22:57 -0500 From: ebiederm@xmission.com (Eric W. Biederman) To: Oleg Nesterov Cc: Andrew Morton , Aleksa Sarai , Andy Lutomirski , Attila Fazekas , Jann Horn , Kees Cook , Michal Hocko , Ulrich Obergfell , linux-kernel@vger.kernel.org, References: <20170213141452.GA30203@redhat.com> <20170224160354.GA845@redhat.com> <87shmv6ufl.fsf@xmission.com> <20170303173326.GA17899@redhat.com> Date: Fri, 03 Mar 2017 12:23:31 -0600 In-Reply-To: <20170303173326.GA17899@redhat.com> (Oleg Nesterov's message of "Fri, 3 Mar 2017 18:33:26 +0100") Message-ID: <87tw7axlr0.fsf@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1cjrw6-0003Uk-I5;;;mid=<87tw7axlr0.fsf@xmission.com>;;;hst=in01.mta.xmission.com;;;ip=67.3.234.240;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX18xlUk76OICTBuCtKnDg3QTatPJE7x0a6s= X-SA-Exim-Connect-IP: 67.3.234.240 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.0 TVD_RCVD_IP Message was received from an IP address * 1.5 XMNoVowels Alpha-numberic number with no vowels * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa06 1397; Body=1 Fuz1=1 Fuz2=1] * 0.0 T_TooManySym_01 4+ unique symbols in subject X-Spam-DCC: XMission; sa06 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: *;Oleg Nesterov X-Spam-Relay-Country: X-Spam-Timing: total 405 ms - load_scoreonly_sql: 0.04 (0.0%), signal_user_changed: 2.8 (0.7%), b_tie_ro: 1.97 (0.5%), parse: 1.22 (0.3%), extract_message_metadata: 7 (1.6%), get_uri_detail_list: 4.4 (1.1%), tests_pri_-1000: 4.0 (1.0%), tests_pri_-950: 1.23 (0.3%), tests_pri_-900: 1.03 (0.3%), tests_pri_-400: 34 (8.3%), check_bayes: 33 (8.1%), b_tokenize: 11 (2.7%), b_tok_get_all: 13 (3.2%), b_comp_prob: 3.2 (0.8%), b_tok_touch_all: 3.4 (0.8%), b_finish: 0.63 (0.2%), tests_pri_0: 342 (84.4%), check_dkim_signature: 0.55 (0.1%), check_dkim_adsp: 2.6 (0.7%), tests_pri_500: 3.6 (0.9%), rewrite_mail: 0.00 (0.0%) Subject: Re: [PATCH 0/2] fix the traced mt-exec deadlock X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Cc'd linux-api as we are talking about a deliberate user visible API change here. Oleg Nesterov writes: > On 03/02, Eric W. Biederman wrote: >> >> Oleg Nesterov writes: >> >> > our discussion was a bit confusing, and it seems that we did not >> > fully convince each other. So let me ask what do you finally think >> > about this fix. >> > >> > Let me repeat. Even if I do not agree with some of your objections, >> > I do agree that 1/2 does not look nice and clean. And we seem to >> > agree that either way, with or without this fix, we need more changes >> > in this area. >> > >> > But we need a simple and backportable fix for stable trees, say for >> > rhel7. This bug was reported many times, and this is the simplest >> > solution I was able to find. >> >> I am not 100% convinced that we need a backportable solution, > > I think we need, this was already requested, > >> but >> regardless my experience is that good clean solutions are almost always >> easier to backport that something we just settle for. > > Sure. > >> The patch below needs a little more looking and testing but arguably >> it is sufficient. >> >> It implements autoreaping for non-leader threads always. We might want >> to limit this to the case of exec. > > I should have mentioned this. Of course, this change was proposed from the > very beginning, when this problem was reported first time. And of course > I like this fix much more than my patch (but yes, we shouldd limit it to > the case of exec). > > The only problem is that it is incompatible change, and I have no idea what > can be broken. > > Trivial test-case: > > void *thread(void *arg) > { > for (;;) > pause(); > return NULL; > } > > int main(void) > { > pthread_t pt; > pthread_create(&pt, NULL, thread, NULL); > execlp("true", "true", NULL); > } > > with your patch applied > > $ strace -f ./test > strace: wait4(__WALL): No child processes > > Yes, this is just a warning, but still. Now we need to change strace. And we > can't know what else can be confused/broken by this change. > > man(ptrace) even documents that all other threads except the thread group leader > report death as if they exited via _exit(2). > > Yes, yes, it also says that other threads stop in PTRACE_EVENT_EXIT stop, > so 2/2 (which we need with your change too) is is not compatible too and > I am worried, but: > > - this was never really true, an already exiting thread won't stop > if it races with exec > > - PTRACE_O_TRACEEXEC is not used that often, it never really worked > > - man(ptrace) also mentions that PTRACE_EVENT_EXIT behaviour may be > change in the future. > > In short. Of course I considered this change. Looks too risky to me. But. > I will be happy if you send this change and take all the blame ;) I won't > argue (if you limit it to execve case). Unfortunately you have found what already looks like a userspace regression. So I don't think we can do that. I do think the user space visible change of modifying a multi-threaded exec no to wait for the other threads to be reaped is the least dangerous change. The big lesson for me, and what was not obvious from your change description is that we are changing the user space visible semantics of exec+ptrace and that cred_guard_mutex is not at all the problem (as we always take cred_guard_mutex in a killable or interruptible way). So I tentatively agree that we need to deliberately change the semantics of exec to not wait for zombies in fs/exec.c:de_thread. That is what I would expect of exec and that seems to the minimal change. As part of that change I expect we want to call __cleanup_sighand from do_exit rather than release_task. Semantically we sould not need the sighand_struct for zombie process. >> --- a/kernel/exit.c >> +++ b/kernel/exit.c >> @@ -690,7 +690,9 @@ static void exit_notify(struct task_struct *tsk, int group_dead) >> thread_group_empty(tsk) && >> !ptrace_reparented(tsk) ? >> tsk->exit_signal : SIGCHLD; >> - autoreap = do_notify_parent(tsk, sig); >> + do_notify_parent(tsk, sig); >> + /* Autoreap threads even when ptraced */ >> + autoreap = !thread_group_leader(tsk); >> } else if (thread_group_leader(tsk)) { >> autoreap = thread_group_empty(tsk) && >> do_notify_parent(tsk, tsk->exit_signal); > > This is all you need, > >> @@ -699,8 +701,6 @@ static void exit_notify(struct task_struct *tsk, int group_dead) >> } >> >> tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE; >> - if (tsk->exit_state == EXIT_DEAD) >> - list_add(&tsk->ptrace_entry, &dead); >> >> /* mt-exec, de_thread() is waiting for group leader */ >> if (unlikely(tsk->signal->notify_count < 0)) >> @@ -711,6 +711,8 @@ static void exit_notify(struct task_struct *tsk, int group_dead) >> list_del_init(&p->ptrace_entry); >> release_task(p); >> } >> + if (autoreap) >> + release_task(tsk); > > These 2 changes are not needed. release_task(tsk) will be called by > list_for_each_entry_safe() above if autoreap == T. Except for the practical case that for threads that are ptraced tsk->ptrace_entry is already in use. Which means we can't use list_add(&tsk->ptrace_entry, &dead). Or in short we get a really pretty oops because we corrupt one of the ptrace lists if we don't have my extra two hunks. But I agree logically they are separate changes. Eric