From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:49492 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1759218AbcKCROP (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
        Thu, 3 Nov 2016 13:14:15 -0400
Date: Thu, 3 Nov 2016 19:12:25 +0100
From: Oleg Nesterov <oleg@redhat.com>
To: Jann Horn <jann@thejh.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
        Roland McGrath <roland@hack.frob.com>,
        John Johansen <john.johansen@canonical.com>,
        James Morris <james.l.morris@oracle.com>,
        "Serge E. Hallyn" <serge@hallyn.com>,
        Paul Moore <aul@paul-moore.com>,
        Stephen Smalley <sds@tycho.nsa.gov>,
        Eric Paris <eparis@parisplace.org>,
        Casey Schaufler <casey@schaufler-ca.com>,
        Kees Cook <keescook@chromium.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Janis Danisevskis <jdanis@google.com>,
        Seth Forshee <seth.forshee@canonical.com>,
        "Eric W. Biederman" <ebiederm@xmission.com>,
        Thomas Gleixner <tglx@linutronix.de>,
        Benjamin LaHaise <bcrl@kvack.org>,
        Ben Hutchings <ben@decadent.org.uk>,
        Andy Lutomirski <luto@amacapital.net>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Krister Johansen <kjlx@templeofstupid.com>,
        linux-fsdevel@vger.kernel.org,
        linux-security-module@vger.kernel.org, security@kernel.org
Subject: Re: [PATCH v3 1/8] exec: introduce cred_guard_light
Message-ID: <20161103181225.GA11212@redhat.com>
References: <1477863998-3298-1-git-send-email-jann@thejh.net> <1477863998-3298-2-git-send-email-jann@thejh.net> <20161102181806.GB1112@redhat.com> <20161102205011.GF8196@pc.thejh.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161102205011.GF8196@pc.thejh.net>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On 11/02, Jann Horn wrote:
>
> On Wed, Nov 02, 2016 at 07:18:06PM +0100, Oleg Nesterov wrote:
> > On 10/30, Jann Horn wrote:
> > >
> > > This is a new per-threadgroup lock that can often be taken instead of
> > > cred_guard_mutex and has less deadlock potential. I'm doing this because
> > > Oleg Nesterov mentioned the potential for deadlocks, in particular if a
> > > debugged task is stuck in execve, trying to get rid of a ptrace-stopped
> > > thread, and the debugger attempts to inspect procfs files of the debugged
> > > task.
> >
> > Yes, but let me repeat that we need to fix this anyway. So I don't really
> > understand why should we add yet another mutex.
>
> execve() only takes the new mutex immediately after de_thread(), so this
> problem shouldn't occur there.

Yes, I see.

> Basically, I think that I'm not making the
> problem worse with my patches this way.

In a sense that it doesn't add the new deadlocks, I agree. But it adds
yet another per-process mutex while we already have the similar one,

> I believe that it should be possible to convert most existing users of the
> cred_guard_mutex to the new cred_guard_light - exceptions to that that I
> see are:
>
>  - PTRACE_ATTACH

This is the main problem afaics. So "strace -f" can hang if it races
with mt-exec. And we need to fix this. I constantly forget about this
problem, but I tried many times to find a reasonable solution, still
can't.

IMO, it would be nice to rework the lsm hooks, so that we could take
cred_guard_mutex after de_thread() (like your cred_guard_light) or
at least drop it earlier, but unlikely this is possible...

So the only plan I currently have is change de_thread() to wait until
other threads pass exit_notify() or even exit_signals(), but I don't
like this.

>  - SECCOMP_FILTER_FLAG_TSYNC (sets NO_NEW_PRIVS on remote task)

I forgot about this one... Need to re-check but at first glance this
is not a real problem.

> Beyond that, conceptually, the new cred_guard_light could also be turned
> into a read-write mutex

Not sure I understand how this can help... doesn't matter.

My point is, imo you should not add the new mutex. Just use the old
one in (say) 4/8 (which I do not personally like as you know ;), this
won't add the new problem.


> It seems to me like SECCOMP_FILTER_FLAG_TSYNC doesn't really have
> deadlocking issues.

Yes, agreed.

> PTRACE_ATTACH isn't that clear to me; if a debugger
> tries to attach to a newly spawned thread while another ptraced thread is
> dying because of de_thread() in a third thread, that might still cause
> the debugger to deadlock, right?

This is the trivial test-case I wrote when the problem was initially
reported. And damn, I always knew that cred_guard_mutex needs fixes,
but somehow I completely forgot that it is used by PTRACE_ATTACH when
I was going to try to remove from fs/proc a long ago.

	void *thread(void *arg)
	{
		ptrace(PTRACE_TRACEME, 0,0,0);
		return NULL;
	}

	int main(void)
	{
		int pid = fork();

		if (!pid) {
			pthread_t pt;
			pthread_create(&pt, NULL, thread, NULL);
			pthread_join(pt, NULL);
			execlp("echo", "echo", "passed", NULL);
		}

		sleep(1);
		// or anything else which needs ->cred_guard_mutex,
		// say open(/proc/$pid/mem)
		ptrace(PTRACE_ATTACH, pid, 0,0);
		kill(pid, SIGCONT);

		return 0;
	}

The problem is trivial. The execing thread waits until its sub-thread
goes away, it should be reaped by the tracer, the tracer waits for
cred_guard_mutex.

> security_bprm_set_creds() is called in prepare_binprm(), which is
> executed very early in do_execveat_common(), at a point where failures
> should still be graceful (return an error code instead of killing the
> whole process),

Yes, yes.

Oleg.