From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from out03.mta.xmission.com ([166.70.13.233]:37372 "EHLO out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1761889AbcKDN3g (ORCPT ); Fri, 4 Nov 2016 09:29:36 -0400 From: ebiederm@xmission.com (Eric W. Biederman) To: Oleg Nesterov Cc: Jann Horn , Alexander Viro , Roland McGrath , John Johansen , James Morris , "Serge E. Hallyn" , Paul Moore , Stephen Smalley , Eric Paris , Casey Schaufler , Kees Cook , Andrew Morton , Janis Danisevskis , Seth Forshee , Thomas Gleixner , Benjamin LaHaise , Ben Hutchings , Andy Lutomirski , Linus Torvalds , Krister Johansen , linux-fsdevel@vger.kernel.org, linux-security-module@vger.kernel.org, security@kernel.org References: <1477863998-3298-1-git-send-email-jann@thejh.net> <1477863998-3298-2-git-send-email-jann@thejh.net> <20161102181806.GB1112@redhat.com> <20161102205011.GF8196@pc.thejh.net> <20161103181225.GA11212@redhat.com> Date: Fri, 04 Nov 2016 08:26:28 -0500 In-Reply-To: <20161103181225.GA11212@redhat.com> (Oleg Nesterov's message of "Thu, 3 Nov 2016 19:12:25 +0100") Message-ID: <87k2cj2x6j.fsf@xmission.com> MIME-Version: 1.0 Content-Type: text/plain Subject: Re: [PATCH v3 1/8] exec: introduce cred_guard_light Sender: linux-fsdevel-owner@vger.kernel.org List-ID: Oleg Nesterov writes: > On 11/02, Jann Horn wrote: >> >> On Wed, Nov 02, 2016 at 07:18:06PM +0100, Oleg Nesterov wrote: >> > On 10/30, Jann Horn wrote: >> > > >> > > This is a new per-threadgroup lock that can often be taken instead of >> > > cred_guard_mutex and has less deadlock potential. I'm doing this because >> > > Oleg Nesterov mentioned the potential for deadlocks, in particular if a >> > > debugged task is stuck in execve, trying to get rid of a ptrace-stopped >> > > thread, and the debugger attempts to inspect procfs files of the debugged >> > > task. >> > >> > Yes, but let me repeat that we need to fix this anyway. So I don't really >> > understand why should we add yet another mutex. >> >> execve() only takes the new mutex immediately after de_thread(), so this >> problem shouldn't occur there. > > Yes, I see. > >> Basically, I think that I'm not making the >> problem worse with my patches this way. > > In a sense that it doesn't add the new deadlocks, I agree. But it adds > yet another per-process mutex while we already have the similar one, > >> I believe that it should be possible to convert most existing users of the >> cred_guard_mutex to the new cred_guard_light - exceptions to that that I >> see are: >> >> - PTRACE_ATTACH > > This is the main problem afaics. So "strace -f" can hang if it races > with mt-exec. And we need to fix this. I constantly forget about this > problem, but I tried many times to find a reasonable solution, still > can't. > > IMO, it would be nice to rework the lsm hooks, so that we could take > cred_guard_mutex after de_thread() (like your cred_guard_light) or > at least drop it earlier, but unlikely this is possible... > > So the only plan I currently have is change de_thread() to wait until > other threads pass exit_notify() or even exit_signals(), but I don't > like this. > >> - SECCOMP_FILTER_FLAG_TSYNC (sets NO_NEW_PRIVS on remote task) > > I forgot about this one... Need to re-check but at first glance this > is not a real problem. > >> Beyond that, conceptually, the new cred_guard_light could also be turned >> into a read-write mutex > > Not sure I understand how this can help... doesn't matter. > > My point is, imo you should not add the new mutex. Just use the old > one in (say) 4/8 (which I do not personally like as you know ;), this > won't add the new problem. > > >> It seems to me like SECCOMP_FILTER_FLAG_TSYNC doesn't really have >> deadlocking issues. > > Yes, agreed. > >> PTRACE_ATTACH isn't that clear to me; if a debugger >> tries to attach to a newly spawned thread while another ptraced thread is >> dying because of de_thread() in a third thread, that might still cause >> the debugger to deadlock, right? > > This is the trivial test-case I wrote when the problem was initially > reported. And damn, I always knew that cred_guard_mutex needs fixes, > but somehow I completely forgot that it is used by PTRACE_ATTACH when > I was going to try to remove from fs/proc a long ago. > > void *thread(void *arg) > { > ptrace(PTRACE_TRACEME, 0,0,0); > return NULL; > } > > int main(void) > { > int pid = fork(); > > if (!pid) { > pthread_t pt; > pthread_create(&pt, NULL, thread, NULL); > pthread_join(pt, NULL); > execlp("echo", "echo", "passed", NULL); > } > > sleep(1); > // or anything else which needs ->cred_guard_mutex, > // say open(/proc/$pid/mem) > ptrace(PTRACE_ATTACH, pid, 0,0); > kill(pid, SIGCONT); > > return 0; > } > > The problem is trivial. The execing thread waits until its sub-thread > goes away, it should be reaped by the tracer, the tracer waits for > cred_guard_mutex. There is a bug here but I don't believe it has anything to do with the cred_guard_mutex. If we reach zap_other_threads fundamentally the tracer should not be able to block the traced thread from exiting. Those are the semantics described in the comments in the code. I have poked things a little and have a half fix for that but the fix appears to be the wrong, but enlightening. AKA the following prevents the hang of your test case. diff --git a/kernel/signal.c b/kernel/signal.c index 75761acc77cf..a6f83450500e 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1200,7 +1200,7 @@ int zap_other_threads(struct task_struct *p) if (t->exit_state) continue; sigaddset(&t->pending.signal, SIGKILL); - signal_wake_up(t, 1); + signal_wake_up_state(t, TASK_WAKEKILL | __TASK_TRACED); } return count; It looks like somewhere on the exit path the traced thread is blocking without setting TASK_WAKEKILL. Eric