From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from out03.mta.xmission.com ([166.70.13.233]:41547 "EHLO out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751099AbcKIAh0 (ORCPT ); Tue, 8 Nov 2016 19:37:26 -0500 From: ebiederm@xmission.com (Eric W. Biederman) To: Oleg Nesterov Cc: Jann Horn , Alexander Viro , Roland McGrath , John Johansen , James Morris , "Serge E. Hallyn" , Paul Moore , Stephen Smalley , Eric Paris , Casey Schaufler , Kees Cook , Andrew Morton , Janis Danisevskis , Seth Forshee , Thomas Gleixner , Benjamin LaHaise , Ben Hutchings , Andy Lutomirski , Linus Torvalds , Krister Johansen , linux-fsdevel@vger.kernel.org, linux-security-module@vger.kernel.org, security@kernel.org References: <1477863998-3298-1-git-send-email-jann@thejh.net> <1477863998-3298-2-git-send-email-jann@thejh.net> <20161102181806.GB1112@redhat.com> <20161102205011.GF8196@pc.thejh.net> <20161103181225.GA11212@redhat.com> <87k2cj2x6j.fsf@xmission.com> <87k2cjuw6h.fsf@xmission.com> <20161104180416.GA19221@redhat.com> <20161104184505.GA21320@redhat.com> <20161105145623.GA21207@redhat.com> Date: Tue, 08 Nov 2016 18:34:57 -0600 In-Reply-To: <20161105145623.GA21207@redhat.com> (Oleg Nesterov's message of "Sat, 5 Nov 2016 15:56:23 +0100") Message-ID: <87inrxsd72.fsf@xmission.com> MIME-Version: 1.0 Content-Type: text/plain Subject: Re: [PATCH v3 1/8] exec: introduce cred_guard_light Sender: linux-fsdevel-owner@vger.kernel.org List-ID: Oleg Nesterov writes: > On 11/04, Oleg Nesterov wrote: >> >> On 11/04, Oleg Nesterov wrote: >> > >> > On 11/04, Eric W. Biederman wrote: >> > > >> > > The following mostly correct patch modifies zap_other_threads in >> > > the case of a de_thread to not wait for zombies to be reaped. The only >> > > case that cares is ptrace (as threads are self reaping). So I don't >> > > think this will cause any problems except removing the strace -f race. >> > >> > From my previous email: >> > >> > So the only plan I currently have is change de_thread() to wait until >> > other threads pass exit_notify() or even exit_signals(), but I don't >> > like this. >> > >> > And yes, I don't like this, but perhaps this is what we should do. >> > >> > The patch is incomplete and racy (afaics), and the SIGNAL_GROUP_EXIT >> > checks doesn't look right, but off course technically this change should >> > be simple enough. >> > >> > But not that simple. Just for example, the exiting sub-threads should >> > not run with ->group_leader pointing to nowhere, in case it was reaped >> > by de_thread. >> >> Not to mention other potential problems outside of ptrace/exec. For example >> userns_install() can fail after mt-exec even without ptrace, simply because >> thread_group_empty() can be false. Sure, easy to fix, and probably _install() >> should use signal->live anyway, but still. >> >> And I didn't mention the fun with sighand unsharing. We simply can't do this >> until all sub-threads go away. IOW, your patch breaks the usage of ->siglock. >> The execing thread and the zombie threads will use different locks to, say, >> remove the task from thread-group. Again, this is fixable, but not that >> simple. >> >> > And we have another problem with PTRACE_EVENT_EXIT which can lead to the >> > same deadlock. Unfortunately, the semantics of PTRACE_EVENT_EXIT was never >> > defined. But this change will add the user-visible change. >> > >> > And if we add the user-visible changes, then perhaps we could simply untrace >> > the traced sub-threads on exec. This change is simple, we do not even need >> > to touch exec/de_thread, we could just change exit_notify() to ignore ->ptrace >> > if exec is in progress. But I'm afraid we can't do this. > > So I was thinking about something like below. Untested, probably buggy/incomplete > too, but hopefully can work. > > flush_old_exec() calls the new kill_sub_threads() helper which waits until > all the sub-threads pass exit_notify(). > > de_thread() is called after install_exec_creds(), it is simplified and waits > for thread_group_empty() without cred_guard_mutex. > > But again, I do not really like this, and we need to do something with > PTRACE_EVENT_EXIT anyway, this needs another/separate change. User-visible. > > And I disagree that this has nothing to do with cred_guard_mutex. And in any > case we should narrow its scope in do_execve() path. Why do we take it so early? > Why do we need to do, say, copy_strings() with this lock held? The original > motivation for this has gone, acct_arg_size() can work just fine even if > multiple threads call sys_execve(). The little piece of this puzzle that I understand is that we don't want to ptrace_attach while a processes is in the middle of exec. The name cred_guard_mutex is odd for that, but that is what I see that lock doing. But ptrace really needs to consider either the original creds and mm or the final creds and mm. Halfway states are problem. Solution to avoid that may simply be some code motion that allows the mutex to have a smaller hold time. Eric