All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jann@thejh.net>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Roland McGrath <roland@hack.frob.com>,
	John Johansen <john.johansen@canonical.com>,
	James Morris <james.l.morris@oracle.com>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Paul Moore <aul@paul-moore.com>,
	Stephen Smalley <sds@tycho.nsa.gov>,
	Eric Paris <eparis@parisplace.org>,
	Casey Schaufler <casey@schaufler-ca.com>,
	Kees Cook <keescook@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Janis Danisevskis <jdanis@google.com>,
	Seth Forshee <seth.forshee@canonical.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Benjamin LaHaise <bcrl@kvack.org>,
	Ben Hutchings <ben@decadent.org.uk>,
	Andy Lutomirski <luto@amacapital.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Krister Johansen <kjlx@templeofstupid.com>,
	linux-fsdevel@vger.kernel.org,
	linux-security-module@vger.kernel.org, security@kernel.org
Subject: Re: [PATCH v3 1/8] exec: introduce cred_guard_light
Date: Fri, 04 Nov 2016 10:00:38 -0500	[thread overview]
Message-ID: <87k2cjuw6h.fsf@xmission.com> (raw)
In-Reply-To: <87k2cj2x6j.fsf@xmission.com> (Eric W. Biederman's message of "Fri, 04 Nov 2016 08:26:28 -0500")

ebiederm@xmission.com (Eric W. Biederman) writes:

> Oleg Nesterov <oleg@redhat.com> writes:
>
>> On 11/02, Jann Horn wrote:
>>>
>>> On Wed, Nov 02, 2016 at 07:18:06PM +0100, Oleg Nesterov wrote:
>>> > On 10/30, Jann Horn wrote:
>>> > >
>>> > > This is a new per-threadgroup lock that can often be taken instead of
>>> > > cred_guard_mutex and has less deadlock potential. I'm doing this because
>>> > > Oleg Nesterov mentioned the potential for deadlocks, in particular if a
>>> > > debugged task is stuck in execve, trying to get rid of a ptrace-stopped
>>> > > thread, and the debugger attempts to inspect procfs files of the debugged
>>> > > task.
>>> >
>>> > Yes, but let me repeat that we need to fix this anyway. So I don't really
>>> > understand why should we add yet another mutex.
>>>
>>> execve() only takes the new mutex immediately after de_thread(), so this
>>> problem shouldn't occur there.
>>
>> Yes, I see.
>>
>>> Basically, I think that I'm not making the
>>> problem worse with my patches this way.
>>
>> In a sense that it doesn't add the new deadlocks, I agree. But it adds
>> yet another per-process mutex while we already have the similar one,
>>
>>> I believe that it should be possible to convert most existing users of the
>>> cred_guard_mutex to the new cred_guard_light - exceptions to that that I
>>> see are:
>>>
>>>  - PTRACE_ATTACH
>>
>> This is the main problem afaics. So "strace -f" can hang if it races
>> with mt-exec. And we need to fix this. I constantly forget about this
>> problem, but I tried many times to find a reasonable solution, still
>> can't.
>>
>> IMO, it would be nice to rework the lsm hooks, so that we could take
>> cred_guard_mutex after de_thread() (like your cred_guard_light) or
>> at least drop it earlier, but unlikely this is possible...
>>
>> So the only plan I currently have is change de_thread() to wait until
>> other threads pass exit_notify() or even exit_signals(), but I don't
>> like this.
>>
>>>  - SECCOMP_FILTER_FLAG_TSYNC (sets NO_NEW_PRIVS on remote task)
>>
>> I forgot about this one... Need to re-check but at first glance this
>> is not a real problem.
>>
>>> Beyond that, conceptually, the new cred_guard_light could also be turned
>>> into a read-write mutex
>>
>> Not sure I understand how this can help... doesn't matter.
>>
>> My point is, imo you should not add the new mutex. Just use the old
>> one in (say) 4/8 (which I do not personally like as you know ;), this
>> won't add the new problem.
>>
>>
>>> It seems to me like SECCOMP_FILTER_FLAG_TSYNC doesn't really have
>>> deadlocking issues.
>>
>> Yes, agreed.
>>
>>> PTRACE_ATTACH isn't that clear to me; if a debugger
>>> tries to attach to a newly spawned thread while another ptraced thread is
>>> dying because of de_thread() in a third thread, that might still cause
>>> the debugger to deadlock, right?
>>
>> This is the trivial test-case I wrote when the problem was initially
>> reported. And damn, I always knew that cred_guard_mutex needs fixes,
>> but somehow I completely forgot that it is used by PTRACE_ATTACH when
>> I was going to try to remove from fs/proc a long ago.
>>
>> 	void *thread(void *arg)
>> 	{
>> 		ptrace(PTRACE_TRACEME, 0,0,0);
>> 		return NULL;
>> 	}
>>
>> 	int main(void)
>> 	{
>> 		int pid = fork();
>>
>> 		if (!pid) {
>> 			pthread_t pt;
>> 			pthread_create(&pt, NULL, thread, NULL);
>> 			pthread_join(pt, NULL);
>> 			execlp("echo", "echo", "passed", NULL);
>> 		}
>>
>> 		sleep(1);
>> 		// or anything else which needs ->cred_guard_mutex,
>> 		// say open(/proc/$pid/mem)
>> 		ptrace(PTRACE_ATTACH, pid, 0,0);
>> 		kill(pid, SIGCONT);
>>
>> 		return 0;
>> 	}
>>
>> The problem is trivial. The execing thread waits until its sub-thread
>> goes away, it should be reaped by the tracer, the tracer waits for
>> cred_guard_mutex.
>
> There is a bug here but I don't believe it has anything to do with
> the cred_guard_mutex.
>
> If we reach zap_other_threads fundamentally the tracer should not
> be able to block the traced thread from exiting.  Those are the
> semantics described in the comments in the code.
>
> I have poked things a little and have a half fix for that but
> the fix appears to be the wrong, but enlightening.
>
> AKA the following prevents the hang of your test case.
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 75761acc77cf..a6f83450500e 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1200,7 +1200,7 @@ int zap_other_threads(struct task_struct *p)
>  		if (t->exit_state)
>  			continue;
>  		sigaddset(&t->pending.signal, SIGKILL);
> -		signal_wake_up(t, 1);
> +		signal_wake_up_state(t, TASK_WAKEKILL | __TASK_TRACED);
>  	}
>  
>  	return count;
>
> It looks like somewhere on the exit path the traced thread is blocking
> without setting TASK_WAKEKILL.

Apologies there was a testing mistake and that patch does not actually
help anything.

The following mostly correct patch modifies zap_other_threads in
the case of a de_thread to not wait for zombies to be reaped.  The only
case that cares is ptrace (as threads are self reaping).  So I don't
think this will cause any problems except removing the strace -f race.

Not waiting for zombies to be reaped in de_thread keeps the kernel from
holding the cred_guard_mutex while waiting for userspace.  Which should
mean we don't have to move it.

Not waiting for zombies to be reaped should also speed of mt-exec.  So I
think this is a benefit all around.


diff --git a/kernel/exit.c b/kernel/exit.c
index 9d68c45ebbe3..8c8556cab655 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -109,7 +109,8 @@ static void __exit_signal(struct task_struct *tsk)
 		 * If there is any task waiting for the group exit
 		 * then notify it:
 		 */
-		if (sig->notify_count > 0 && !--sig->notify_count)
+		if ((sig->flags & SIGNAL_GROUP_EXIT) &&
+		    sig->notify_count > 0 && !--sig->notify_count)
 			wake_up_process(sig->group_exit_task);
 
 		if (tsk == sig->curr_target)
@@ -690,6 +691,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 	if (tsk->exit_state == EXIT_DEAD)
 		list_add(&tsk->ptrace_entry, &dead);
 
+	if (!(tsk->signal->flags & SIGNAL_GROUP_EXIT) &&
+	    tsk->signal->notify_count > 0 && !--tsk->signal->notify_count)
+		wake_up_process(tsk->signal->group_exit_task);
+
 	/* mt-exec, de_thread() is waiting for group leader */
 	if (unlikely(tsk->signal->notify_count < 0))
 		wake_up_process(tsk->signal->group_exit_task);
diff --git a/kernel/signal.c b/kernel/signal.c
index 75761acc77cf..a3a5cd8dad0f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1194,7 +1194,9 @@ int zap_other_threads(struct task_struct *p)
 
 	while_each_thread(p, t) {
 		task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
-		count++;
+		if ((t->signal->flags & SIGNAL_GROUP_EXIT) ||
+		    !t->exit_state)
+			count++;
 
 		/* Don't bother with already dead threads */
 		if (t->exit_state)


Eric

  reply	other threads:[~2016-11-04 15:08 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-30 21:46 [PATCH v3 0/8] Various fixes related to ptrace_may_access() Jann Horn
2016-10-30 21:46 ` [PATCH v3 1/8] exec: introduce cred_guard_light Jann Horn
2016-11-02 18:18   ` Oleg Nesterov
2016-11-02 20:50     ` Jann Horn
2016-11-02 21:38       ` Ben Hutchings
2016-11-02 21:54         ` Jann Horn
2016-11-03 18:12       ` Oleg Nesterov
2016-11-03 21:17         ` Jann Horn
2016-11-04 13:26         ` Eric W. Biederman
2016-11-04 15:00           ` Eric W. Biederman [this message]
2016-11-04 18:04             ` Oleg Nesterov
2016-11-04 18:45               ` Oleg Nesterov
2016-11-05 14:56                 ` Oleg Nesterov
2016-11-09  0:34                   ` Eric W. Biederman
2016-11-16 20:03                   ` Eric W. Biederman
2016-11-08 22:02                 ` Kees Cook
2016-11-08 22:46                   ` Eric W. Biederman
2016-11-08 22:56                     ` Benjamin LaHaise
2016-11-08 23:33                       ` Eric W. Biederman
2016-10-30 21:46 ` [PATCH v3 2/8] exec: add privunit to task_struct Jann Horn
2016-10-30 21:46 ` [PATCH v3 3/8] proc: use open()-time creds for ptrace checks Jann Horn
2016-10-30 21:46 ` [PATCH v3 4/8] futex: don't leak robust_list pointer Jann Horn
2016-10-30 21:46 ` [PATCH v3 5/8] proc: lock properly in ptrace_may_access callers Jann Horn
2016-10-30 21:46 ` [PATCH v3 6/8] fs/proc: fix attr access check Jann Horn
2016-10-30 21:46 ` [PATCH v3 7/8] proc: fix timerslack_ns handling Jann Horn
2016-10-30 21:46 ` [PATCH v3 8/8] Documentation: add security/ptrace_checks.txt Jann Horn
2016-11-01 23:57 ` [PATCH v3 0/8] Various fixes related to ptrace_may_access() Linus Torvalds
2016-11-02 18:38   ` Oleg Nesterov
2016-11-02 21:40     ` Jann Horn
2016-11-03 19:09   ` Andrew Morton
2016-11-03 20:01     ` Jann Horn
2016-11-04  0:57   ` James Morris

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k2cjuw6h.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=akpm@linux-foundation.org \
    --cc=aul@paul-moore.com \
    --cc=bcrl@kvack.org \
    --cc=ben@decadent.org.uk \
    --cc=casey@schaufler-ca.com \
    --cc=eparis@parisplace.org \
    --cc=james.l.morris@oracle.com \
    --cc=jann@thejh.net \
    --cc=jdanis@google.com \
    --cc=john.johansen@canonical.com \
    --cc=keescook@chromium.org \
    --cc=kjlx@templeofstupid.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=oleg@redhat.com \
    --cc=roland@hack.frob.com \
    --cc=sds@tycho.nsa.gov \
    --cc=security@kernel.org \
    --cc=serge@hallyn.com \
    --cc=seth.forshee@canonical.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.