From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com ([209.132.183.28]:49492 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759218AbcKCROP (ORCPT ); Thu, 3 Nov 2016 13:14:15 -0400 Date: Thu, 3 Nov 2016 19:12:25 +0100 From: Oleg Nesterov To: Jann Horn Cc: Alexander Viro , Roland McGrath , John Johansen , James Morris , "Serge E. Hallyn" , Paul Moore , Stephen Smalley , Eric Paris , Casey Schaufler , Kees Cook , Andrew Morton , Janis Danisevskis , Seth Forshee , "Eric W. Biederman" , Thomas Gleixner , Benjamin LaHaise , Ben Hutchings , Andy Lutomirski , Linus Torvalds , Krister Johansen , linux-fsdevel@vger.kernel.org, linux-security-module@vger.kernel.org, security@kernel.org Subject: Re: [PATCH v3 1/8] exec: introduce cred_guard_light Message-ID: <20161103181225.GA11212@redhat.com> References: <1477863998-3298-1-git-send-email-jann@thejh.net> <1477863998-3298-2-git-send-email-jann@thejh.net> <20161102181806.GB1112@redhat.com> <20161102205011.GF8196@pc.thejh.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161102205011.GF8196@pc.thejh.net> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On 11/02, Jann Horn wrote: > > On Wed, Nov 02, 2016 at 07:18:06PM +0100, Oleg Nesterov wrote: > > On 10/30, Jann Horn wrote: > > > > > > This is a new per-threadgroup lock that can often be taken instead of > > > cred_guard_mutex and has less deadlock potential. I'm doing this because > > > Oleg Nesterov mentioned the potential for deadlocks, in particular if a > > > debugged task is stuck in execve, trying to get rid of a ptrace-stopped > > > thread, and the debugger attempts to inspect procfs files of the debugged > > > task. > > > > Yes, but let me repeat that we need to fix this anyway. So I don't really > > understand why should we add yet another mutex. > > execve() only takes the new mutex immediately after de_thread(), so this > problem shouldn't occur there. Yes, I see. > Basically, I think that I'm not making the > problem worse with my patches this way. In a sense that it doesn't add the new deadlocks, I agree. But it adds yet another per-process mutex while we already have the similar one, > I believe that it should be possible to convert most existing users of the > cred_guard_mutex to the new cred_guard_light - exceptions to that that I > see are: > > - PTRACE_ATTACH This is the main problem afaics. So "strace -f" can hang if it races with mt-exec. And we need to fix this. I constantly forget about this problem, but I tried many times to find a reasonable solution, still can't. IMO, it would be nice to rework the lsm hooks, so that we could take cred_guard_mutex after de_thread() (like your cred_guard_light) or at least drop it earlier, but unlikely this is possible... So the only plan I currently have is change de_thread() to wait until other threads pass exit_notify() or even exit_signals(), but I don't like this. > - SECCOMP_FILTER_FLAG_TSYNC (sets NO_NEW_PRIVS on remote task) I forgot about this one... Need to re-check but at first glance this is not a real problem. > Beyond that, conceptually, the new cred_guard_light could also be turned > into a read-write mutex Not sure I understand how this can help... doesn't matter. My point is, imo you should not add the new mutex. Just use the old one in (say) 4/8 (which I do not personally like as you know ;), this won't add the new problem. > It seems to me like SECCOMP_FILTER_FLAG_TSYNC doesn't really have > deadlocking issues. Yes, agreed. > PTRACE_ATTACH isn't that clear to me; if a debugger > tries to attach to a newly spawned thread while another ptraced thread is > dying because of de_thread() in a third thread, that might still cause > the debugger to deadlock, right? This is the trivial test-case I wrote when the problem was initially reported. And damn, I always knew that cred_guard_mutex needs fixes, but somehow I completely forgot that it is used by PTRACE_ATTACH when I was going to try to remove from fs/proc a long ago. void *thread(void *arg) { ptrace(PTRACE_TRACEME, 0,0,0); return NULL; } int main(void) { int pid = fork(); if (!pid) { pthread_t pt; pthread_create(&pt, NULL, thread, NULL); pthread_join(pt, NULL); execlp("echo", "echo", "passed", NULL); } sleep(1); // or anything else which needs ->cred_guard_mutex, // say open(/proc/$pid/mem) ptrace(PTRACE_ATTACH, pid, 0,0); kill(pid, SIGCONT); return 0; } The problem is trivial. The execing thread waits until its sub-thread goes away, it should be reaped by the tracer, the tracer waits for cred_guard_mutex. > security_bprm_set_creds() is called in prepare_binprm(), which is > executed very early in do_execveat_common(), at a point where failures > should still be graceful (return an error code instead of killing the > whole process), Yes, yes. Oleg.