From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from out02.mta.xmission.com ([166.70.13.232]:49087 "EHLO out02.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932541AbcKPUHP (ORCPT ); Wed, 16 Nov 2016 15:07:15 -0500 From: ebiederm@xmission.com (Eric W. Biederman) To: Oleg Nesterov Cc: Jann Horn , Alexander Viro , Roland McGrath , John Johansen , James Morris , "Serge E. Hallyn" , Paul Moore , Stephen Smalley , Eric Paris , Casey Schaufler , Kees Cook , Andrew Morton , Janis Danisevskis , Seth Forshee , Thomas Gleixner , Benjamin LaHaise , Ben Hutchings , Andy Lutomirski , Linus Torvalds , Krister Johansen , linux-fsdevel@vger.kernel.org, linux-security-module@vger.kernel.org, security@kernel.org References: <1477863998-3298-1-git-send-email-jann@thejh.net> <1477863998-3298-2-git-send-email-jann@thejh.net> <20161102181806.GB1112@redhat.com> <20161102205011.GF8196@pc.thejh.net> <20161103181225.GA11212@redhat.com> <87k2cj2x6j.fsf@xmission.com> <87k2cjuw6h.fsf@xmission.com> <20161104180416.GA19221@redhat.com> <20161104184505.GA21320@redhat.com> <20161105145623.GA21207@redhat.com> Date: Wed, 16 Nov 2016 14:03:12 -0600 In-Reply-To: <20161105145623.GA21207@redhat.com> (Oleg Nesterov's message of "Sat, 5 Nov 2016 15:56:23 +0100") Message-ID: <87h977dwfz.fsf@xmission.com> MIME-Version: 1.0 Content-Type: text/plain Subject: Re: [PATCH v3 1/8] exec: introduce cred_guard_light Sender: linux-fsdevel-owner@vger.kernel.org List-ID: Oleg Nesterov writes: > On 11/04, Oleg Nesterov wrote: >> >> On 11/04, Oleg Nesterov wrote: >> > >> > On 11/04, Eric W. Biederman wrote: >> > > >> > > The following mostly correct patch modifies zap_other_threads in >> > > the case of a de_thread to not wait for zombies to be reaped. The only >> > > case that cares is ptrace (as threads are self reaping). So I don't >> > > think this will cause any problems except removing the strace -f race. >> > >> > From my previous email: >> > >> > So the only plan I currently have is change de_thread() to wait until >> > other threads pass exit_notify() or even exit_signals(), but I don't >> > like this. >> > >> > And yes, I don't like this, but perhaps this is what we should do. >> > >> > The patch is incomplete and racy (afaics), and the SIGNAL_GROUP_EXIT >> > checks doesn't look right, but off course technically this change should >> > be simple enough. >> > >> > But not that simple. Just for example, the exiting sub-threads should >> > not run with ->group_leader pointing to nowhere, in case it was reaped >> > by de_thread. >> >> Not to mention other potential problems outside of ptrace/exec. For example >> userns_install() can fail after mt-exec even without ptrace, simply because >> thread_group_empty() can be false. Sure, easy to fix, and probably _install() >> should use signal->live anyway, but still. >> >> And I didn't mention the fun with sighand unsharing. We simply can't do this >> until all sub-threads go away. IOW, your patch breaks the usage of ->siglock. >> The execing thread and the zombie threads will use different locks to, say, >> remove the task from thread-group. Again, this is fixable, but not that >> simple. >> >> > And we have another problem with PTRACE_EVENT_EXIT which can lead to the >> > same deadlock. Unfortunately, the semantics of PTRACE_EVENT_EXIT was never >> > defined. But this change will add the user-visible change. >> > >> > And if we add the user-visible changes, then perhaps we could simply untrace >> > the traced sub-threads on exec. This change is simple, we do not even need >> > to touch exec/de_thread, we could just change exit_notify() to ignore ->ptrace >> > if exec is in progress. But I'm afraid we can't do this. > > So I was thinking about something like below. Untested, probably buggy/incomplete > too, but hopefully can work. > > flush_old_exec() calls the new kill_sub_threads() helper which waits until > all the sub-threads pass exit_notify(). > > de_thread() is called after install_exec_creds(), it is simplified and waits > for thread_group_empty() without cred_guard_mutex. > > But again, I do not really like this, and we need to do something with > PTRACE_EVENT_EXIT anyway, this needs another/separate change. User-visible. > > And I disagree that this has nothing to do with cred_guard_mutex. And in any > case we should narrow its scope in do_execve() path. Why do we take it so early? > Why do we need to do, say, copy_strings() with this lock held? The original > motivation for this has gone, acct_arg_size() can work just fine even if > multiple threads call sys_execve(). > > I'll try to discuss the possible changes in LSM hooks with Jann, I still think > that this is what we actually need to do. At least try to do, possibly this is > too complicated. The code below looks interesting. Am I wrong or do we get the PTRACE_EVENT_EXIT case wrong for the multi-threaded exec's when we don't exec from the primary thread? AKA I think the primary thread will pass through ptrace_event(PTRACE_EVENT_EXIT) before we steal it's thread and likewise the thread that calls exec won't pass through ptrace_event(PTRACE_EVENT_EXIT). Which I suspect gives us quite a bit of lattitude to skip that notification entirely without notifying userspace. We need to test to be certain that both gdb and strace can cope. But I do suspect we could just throw ptrace_event(PTRACE_EVENT_EXIT) out in the case of a multi-threaded exec and no one would care. Eric