linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 0/2] fix the traced mt-exec deadlock
       [not found]       ` <20170303173326.GA17899-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-03-03 18:23         ` Eric W. Biederman
  2017-03-03 18:59           ` Eric W. Biederman
       [not found]           ` <87tw7axlr0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 2 replies; 42+ messages in thread
From: Eric W. Biederman @ 2017-03-03 18:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA


Cc'd linux-api as we are talking about a deliberate user visible API
change here.

Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> On 03/02, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>
>> > our discussion was a bit confusing, and it seems that we did not
>> > fully convince each other. So let me ask what do you finally think
>> > about this fix.
>> >
>> > Let me repeat. Even if I do not agree with some of your objections,
>> > I do agree that 1/2 does not look nice and clean. And we seem to
>> > agree that either way, with or without this fix, we need more changes
>> > in this area.
>> >
>> > But we need a simple and backportable fix for stable trees, say for
>> > rhel7. This bug was reported many times, and this is the simplest
>> > solution I was able to find.
>>
>> I am not 100% convinced that we need a backportable solution,
>
> I think we need, this was already requested,
>
>> but
>> regardless my experience is that good clean solutions are almost always
>> easier to backport that something we just settle for.
>
> Sure.
>
>> The patch below needs a little more looking and testing but arguably
>> it is sufficient.
>>
>> It implements autoreaping for non-leader threads always.  We might want
>> to limit this to the case of exec.
>
> I should have mentioned this. Of course, this change was proposed from the
> very beginning, when this problem was reported first time. And of course
> I like this fix much more than my patch (but yes, we shouldd limit it to
> the case of exec).
>
> The only problem is that it is incompatible change, and I have no idea what
> can be broken.
>
> Trivial test-case:
>
> 	void *thread(void *arg)
> 	{
> 		for (;;)
> 			pause();
> 		return NULL;
> 	}
>
> 	int main(void)
> 	{
> 		pthread_t pt;
> 		pthread_create(&pt, NULL, thread, NULL);
> 		execlp("true", "true", NULL);
> 	}
>
> with your patch applied
>
> 	$ strace -f ./test
> 	strace: wait4(__WALL): No child processes
>
> Yes, this is just a warning, but still. Now we need to change strace. And we
> can't know what else can be confused/broken by this change.
>
> man(ptrace) even documents that all other threads except the thread group leader
> report death as if they exited via _exit(2).
>
> Yes, yes, it also says that other threads stop in PTRACE_EVENT_EXIT stop,
> so 2/2 (which we need with your change too) is is not compatible too and
> I am worried, but:
>
> 	- this was never really true, an already exiting thread won't stop
> 	  if it races with exec
>
> 	- PTRACE_O_TRACEEXEC is not used that often, it never really worked
>
> 	- man(ptrace) also mentions that PTRACE_EVENT_EXIT behaviour may be
> 	  change in the future.
>
> In short. Of course I considered this change. Looks too risky to me. But.
> I will be happy if you send this change and take all the blame ;) I won't
> argue (if you limit it to execve case).

Unfortunately you have found what already looks like a userspace
regression.  So I don't think we can do that.

I do think the user space visible change of modifying a multi-threaded
exec no to wait  for the other threads to be reaped is the least
dangerous change.

The big lesson for me, and what was not obvious from your change
description is that we are changing the user space visible semantics
of exec+ptrace and that cred_guard_mutex is not at all the problem (as
we always take cred_guard_mutex in a killable or interruptible way).

So I tentatively agree that we need to deliberately change the semantics
of exec to not wait for zombies in fs/exec.c:de_thread.  That is what I
would expect of exec and that seems to the minimal change.

As part of that change I expect we want to call __cleanup_sighand from
do_exit rather than release_task.  Semantically we sould not need the
sighand_struct for zombie process.

>> --- a/kernel/exit.c
>> +++ b/kernel/exit.c
>> @@ -690,7 +690,9 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
>>  				thread_group_empty(tsk) &&
>>  				!ptrace_reparented(tsk) ?
>>  			tsk->exit_signal : SIGCHLD;
>> -		autoreap = do_notify_parent(tsk, sig);
>> +		do_notify_parent(tsk, sig);
>> +		/* Autoreap threads even when ptraced */
>> +		autoreap = !thread_group_leader(tsk);
>>  	} else if (thread_group_leader(tsk)) {
>>  		autoreap = thread_group_empty(tsk) &&
>>  			do_notify_parent(tsk, tsk->exit_signal);
>
> This is all you need,
>
>> @@ -699,8 +701,6 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
>>  	}
>>
>>  	tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;
>> -	if (tsk->exit_state == EXIT_DEAD)
>> -		list_add(&tsk->ptrace_entry, &dead);
>>
>>  	/* mt-exec, de_thread() is waiting for group leader */
>>  	if (unlikely(tsk->signal->notify_count < 0))
>> @@ -711,6 +711,8 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
>>  		list_del_init(&p->ptrace_entry);
>>  		release_task(p);
>>  	}
>> +	if (autoreap)
>> +		release_task(tsk);
>
> These 2 changes are not needed. release_task(tsk) will be called by
> list_for_each_entry_safe() above if autoreap == T.

Except for the practical case that for threads that are ptraced
tsk->ptrace_entry is already in use.  Which means we can't use
list_add(&tsk->ptrace_entry, &dead).

Or in short we get a really pretty oops because we corrupt one of the
ptrace lists if we don't have my extra two hunks.  But I agree logically
they are separate changes.

Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/2] fix the traced mt-exec deadlock
  2017-03-03 18:23         ` [PATCH 0/2] fix the traced mt-exec deadlock Eric W. Biederman
@ 2017-03-03 18:59           ` Eric W. Biederman
       [not found]             ` <87d1dyw5iw.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
       [not found]           ` <87tw7axlr0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-03-03 18:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel, linux-api

ebiederm@xmission.com (Eric W. Biederman) writes:

> The big lesson for me, and what was not obvious from your change
> description is that we are changing the user space visible semantics
> of exec+ptrace and that cred_guard_mutex is not at all the problem (as
> we always take cred_guard_mutex in a killable or interruptible way).

Just to follow up.

Because the cred_guard_mutex is fine as is we don't need to move
de_thread out from under cred_guard_mutex.  We just need to change
de_thread to wait until all of the other threads are zombies.
Which should remove about half your proposed patch.

The other key thing is that knowning it isn't cred_guard_mutex let's us
know that this kind of deadlock goes all of the way back to when
CLONE_THREAD was merged into the kernel.

Insteresingly enough looking at zap_other_threads and notify_count I
have found a second bug.  When a multi-threaded processes becomes a
zombie we don't send the notification to the parent process until the
non-leader threads have been reaped.  Which means ptrace can mess up
sending SIGCHLD to the parent.

Now arguably that might be what is desirable but I don't think so.  If
we aren't ptracing a thread then I don't think we want to delay sending
SIGCHLD to the parent.

So this whole area of the semantics of a ptrace'd multi-threaded process
exiting/exec'ing looks like it needs a thorough going over.

Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/2] fix the traced mt-exec deadlock
       [not found]             ` <87d1dyw5iw.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2017-03-03 20:06               ` Eric W. Biederman
       [not found]                 ` <87tw7aunuh.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-03-03 20:06 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) writes:

> ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) writes:
>
>> The big lesson for me, and what was not obvious from your change
>> description is that we are changing the user space visible semantics
>> of exec+ptrace and that cred_guard_mutex is not at all the problem (as
>> we always take cred_guard_mutex in a killable or interruptible way).
>
> Just to follow up.
>
> Because the cred_guard_mutex is fine as is we don't need to move
> de_thread out from under cred_guard_mutex.  We just need to change
> de_thread to wait until all of the other threads are zombies.
> Which should remove about half your proposed patch.
>
> The other key thing is that knowning it isn't cred_guard_mutex let's us
> know that this kind of deadlock goes all of the way back to when
> CLONE_THREAD was merged into the kernel.
>
> Insteresingly enough looking at zap_other_threads and notify_count I
> have found a second bug.  When a multi-threaded processes becomes a
> zombie we don't send the notification to the parent process until the
> non-leader threads have been reaped.  Which means ptrace can mess up
> sending SIGCHLD to the parent.

Bah. I was misreading the code.  Nothing but exec uses notify_count
and group_exit_task.

Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.
       [not found]                 ` <87tw7aunuh.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2017-03-03 20:11                   ` Eric W. Biederman
  2017-03-04 17:03                     ` Oleg Nesterov
  0 siblings, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-03-03 20:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA


Ever since CLONE_THREAD support was added to the kernel it has been
possible to dead-lock userspace by ptracing a process and not reaping
it's child threads.  With use of the cred_guard_mutex in proc the ways
userspace can unknowningly trigger a dead-lock have grown.  Which makes
a simple -f strace on a program that performs a mult-threaded exec
unsafe.

        #include <unistd.h>
        #include <signal.h>
        #include <sys/ptrace.h>
        #include <pthread.h>

        void *thread(void *arg)
        {
                ptrace(PTRACE_TRACEME, 0,0,0);
                return NULL;
        }

        int main(void)
        {
                int pid = fork();

                if (!pid) {
                        pthread_t pt;
                        pthread_create(&pt, NULL, thread, NULL);
                        pthread_join(pt, NULL);
                        execlp("echo", "echo", "passed", NULL);
                }

                sleep(1);
                // or anything else which needs ->cred_guard_mutex,
                // say open(/proc/$pid/mem)
                ptrace(PTRACE_ATTACH, pid, 0,0);
                kill(pid, SIGCONT);

                return 0;
        }

Sovle this by modifying exec to only wait until all of the other
threads are zombies, and not waiting until the other threads
are reaped.

As well as simplifying the userspace semantics this simplifies
the code.

Reported-by: Ulrich Obergfell <uobergfe-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Reported-by: Attila Fazekas <afazekas-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Reported-by: Aleksa Sarai <asarai-IBi9RG/b67k@public.gmane.org>
Reported-by: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Fixes: v2.4.0
Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/exec.c             | 31 +++++--------------------------
 include/linux/sched.h |  5 ++---
 kernel/exit.c         | 18 +++++-------------
 kernel/signal.c       |  6 +-----
 4 files changed, 13 insertions(+), 47 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 698a86094f76..f78205a72304 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1065,11 +1065,8 @@ static int de_thread(struct task_struct *tsk)
 	}
 
 	sig->group_exit_task = tsk;
-	sig->notify_count = zap_other_threads(tsk);
-	if (!thread_group_leader(tsk))
-		sig->notify_count--;
-
-	while (sig->notify_count) {
+	zap_other_threads(tsk);
+	while (atomic_read(&sig->live) > 1) {
 		__set_current_state(TASK_KILLABLE);
 		spin_unlock_irq(lock);
 		schedule();
@@ -1081,29 +1078,13 @@ static int de_thread(struct task_struct *tsk)
 
 	/*
 	 * At this point all other threads have exited, all we have to
-	 * do is to wait for the thread group leader to become inactive,
-	 * and to assume its PID:
+	 * do is to assume the PID of the thread group leader.
 	 */
 	if (!thread_group_leader(tsk)) {
 		struct task_struct *leader = tsk->group_leader;
 
-		for (;;) {
-			threadgroup_change_begin(tsk);
-			write_lock_irq(&tasklist_lock);
-			/*
-			 * Do this under tasklist_lock to ensure that
-			 * exit_notify() can't miss ->group_exit_task
-			 */
-			sig->notify_count = -1;
-			if (likely(leader->exit_state))
-				break;
-			__set_current_state(TASK_KILLABLE);
-			write_unlock_irq(&tasklist_lock);
-			threadgroup_change_end(tsk);
-			schedule();
-			if (unlikely(__fatal_signal_pending(tsk)))
-				goto killed;
-		}
+		threadgroup_change_begin(tsk);
+		write_lock_irq(&tasklist_lock);
 
 		/*
 		 * The only record we have of the real-time age of a
@@ -1163,7 +1144,6 @@ static int de_thread(struct task_struct *tsk)
 	}
 
 	sig->group_exit_task = NULL;
-	sig->notify_count = 0;
 
 no_thread_group:
 	/* we have changed execution domain */
@@ -1204,7 +1184,6 @@ static int de_thread(struct task_struct *tsk)
 	/* protects against exit_notify() and __exit_signal() */
 	read_lock(&tasklist_lock);
 	sig->group_exit_task = NULL;
-	sig->notify_count = 0;
 	read_unlock(&tasklist_lock);
 	return -EAGAIN;
 }
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6261bfc12853..ff6aa76beac5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -711,11 +711,10 @@ struct signal_struct {
 	/* thread group exit support */
 	int			group_exit_code;
 	/* overloaded:
-	 * - notify group_exit_task when ->count is equal to notify_count
+	 * - notify group_exit_task when ->live is equal to 1
 	 * - everyone except group_exit_task is stopped during signal delivery
 	 *   of fatal signals, group_exit_task processes the signal.
 	 */
-	int			notify_count;
 	struct task_struct	*group_exit_task;
 
 	/* thread group stop support, overloads group_exit_code too */
@@ -2763,7 +2762,7 @@ extern __must_check bool do_notify_parent(struct task_struct *, int);
 extern void __wake_up_parent(struct task_struct *p, struct task_struct *parent);
 extern void force_sig(int, struct task_struct *);
 extern int send_sig(int, struct task_struct *, int);
-extern int zap_other_threads(struct task_struct *p);
+extern void zap_other_threads(struct task_struct *p);
 extern struct sigqueue *sigqueue_alloc(void);
 extern void sigqueue_free(struct sigqueue *);
 extern int send_sigqueue(struct sigqueue *,  struct task_struct *, int group);
diff --git a/kernel/exit.c b/kernel/exit.c
index 5cfbd595f918..dc9bc2bdb45a 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -111,13 +111,6 @@ static void __exit_signal(struct task_struct *tsk)
 		tty = sig->tty;
 		sig->tty = NULL;
 	} else {
-		/*
-		 * If there is any task waiting for the group exit
-		 * then notify it:
-		 */
-		if (sig->notify_count > 0 && !--sig->notify_count)
-			wake_up_process(sig->group_exit_task);
-
 		if (tsk == sig->curr_target)
 			sig->curr_target = next_thread(tsk);
 	}
@@ -701,10 +694,6 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 	tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;
 	if (tsk->exit_state == EXIT_DEAD)
 		list_add(&tsk->ptrace_entry, &dead);
-
-	/* mt-exec, de_thread() is waiting for group leader */
-	if (unlikely(tsk->signal->notify_count < 0))
-		wake_up_process(tsk->signal->group_exit_task);
 	write_unlock_irq(&tasklist_lock);
 
 	list_for_each_entry_safe(p, n, &dead, ptrace_entry) {
@@ -740,7 +729,7 @@ static inline void check_stack_usage(void) {}
 void __noreturn do_exit(long code)
 {
 	struct task_struct *tsk = current;
-	int group_dead;
+	int group_left, group_dead;
 	TASKS_RCU(int tasks_rcu_i);
 
 	profile_task_exit(tsk);
@@ -809,7 +798,8 @@ void __noreturn do_exit(long code)
 	if (tsk->mm)
 		sync_mm_rss(tsk->mm);
 	acct_update_integrals(tsk);
-	group_dead = atomic_dec_and_test(&tsk->signal->live);
+	group_left = atomic_dec_return(&tsk->signal->live);
+	group_dead = group_left == 0;
 	if (group_dead) {
 #ifdef CONFIG_POSIX_TIMERS
 		hrtimer_cancel(&tsk->signal->real_timer);
@@ -818,6 +808,8 @@ void __noreturn do_exit(long code)
 		if (tsk->mm)
 			setmax_mm_hiwater_rss(&tsk->signal->maxrss, tsk->mm);
 	}
+	if ((group_left == 1) && tsk->signal->group_exit_task)
+		wake_up_process(tsk->signal->group_exit_task);
 	acct_collect(code, group_dead);
 	if (group_dead)
 		tty_audit_exit();
diff --git a/kernel/signal.c b/kernel/signal.c
index fbbab5a7c84d..17312c71f1ae 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1191,16 +1191,14 @@ force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
 /*
  * Nuke all other threads in the group.
  */
-int zap_other_threads(struct task_struct *p)
+void zap_other_threads(struct task_struct *p)
 {
 	struct task_struct *t = p;
-	int count = 0;
 
 	p->signal->group_stop_count = 0;
 
 	while_each_thread(p, t) {
 		task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
-		count++;
 
 		/* Don't bother with already dead threads */
 		if (t->exit_state)
@@ -1208,8 +1206,6 @@ int zap_other_threads(struct task_struct *p)
 		sigaddset(&t->pending.signal, SIGKILL);
 		signal_wake_up(t, 1);
 	}
-
-	return count;
 }
 
 struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/2] fix the traced mt-exec deadlock
       [not found]           ` <87tw7axlr0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2017-03-04 16:54             ` Oleg Nesterov
  0 siblings, 0 replies; 42+ messages in thread
From: Oleg Nesterov @ 2017-03-04 16:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On 03/03, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>
> >> @@ -699,8 +701,6 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
> >>  	}
> >>
> >>  	tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;
> >> -	if (tsk->exit_state == EXIT_DEAD)
> >> -		list_add(&tsk->ptrace_entry, &dead);
> >>
> >>  	/* mt-exec, de_thread() is waiting for group leader */
> >>  	if (unlikely(tsk->signal->notify_count < 0))
> >> @@ -711,6 +711,8 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
> >>  		list_del_init(&p->ptrace_entry);
> >>  		release_task(p);
> >>  	}
> >> +	if (autoreap)
> >> +		release_task(tsk);
> >
> > These 2 changes are not needed. release_task(tsk) will be called by
> > list_for_each_entry_safe() above if autoreap == T.
>
> Except for the practical case that for threads that are ptraced
> tsk->ptrace_entry is already in use.  Which means we can't use
> list_add(&tsk->ptrace_entry, &dead).

Yes, I was wrong here, thanks for correcting me.

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.
  2017-03-03 20:11                   ` [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped Eric W. Biederman
@ 2017-03-04 17:03                     ` Oleg Nesterov
  2017-03-30  8:07                       ` Eric W. Biederman
  0 siblings, 1 reply; 42+ messages in thread
From: Oleg Nesterov @ 2017-03-04 17:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel, linux-api

On 03/03, Eric W. Biederman wrote:
>
> Ever since CLONE_THREAD support was added to the kernel it has been
> possible to dead-lock userspace by ptracing a process and not reaping
> it's child threads.

Hmm. I disagree... I do not think this is a bug. But lets discuss this
separately, perhaps I misunderstood you.

> With use of the cred_guard_mutex in proc the ways
> userspace can unknowningly trigger a dead-lock have grown.

I think this particular problem did not exist until cred_guard_mutex
was introduced. Debugger can obviously "delay" exec if it doesn't
reap a zombie sub-thread, but this is another thing and not a bug imo.


> Sovle this by modifying exec to only wait until all of the other
> threads are zombies, and not waiting until the other threads
> are reaped.

This patch looks wrong in many ways.

> @@ -1065,11 +1065,8 @@ static int de_thread(struct task_struct *tsk)
>  	}
>
>  	sig->group_exit_task = tsk;
> -	sig->notify_count = zap_other_threads(tsk);
> -	if (!thread_group_leader(tsk))
> -		sig->notify_count--;
> -
> -	while (sig->notify_count) {
> +	zap_other_threads(tsk);
> +	while (atomic_read(&sig->live) > 1) {
>  		__set_current_state(TASK_KILLABLE);
>  		spin_unlock_irq(lock);
>  		schedule();

Very nice. So de_thread() returns as soon as all other threads decrement
signal->live in do_exit(). Before they do, say, exit_mm(). This is already
wrong, for example this breaks OOM. Plus a lot more problems afaics,  but
lets ignore this.

Note that de_thread() also unshares ->sighand before return. So in the
case of mt exec it will likely see oldsighand->count != 1 and alloc the
new sighand_struct and this breaks the locking.

Because the execing thread will use newsighand->siglock to protect its
signal_struct while the zombie threads will use oldsighand->siglock to
protect the same signal struct. Yes, tasklist_lock + the fact irq_disable
implies rcu_lock mostly save us but not entirely, say, a foreign process
doing __send_signal() can take the right or the wrong lock depending on
/dev/random.


> @@ -818,6 +808,8 @@ void __noreturn do_exit(long code)
>  		if (tsk->mm)
>  			setmax_mm_hiwater_rss(&tsk->signal->maxrss, tsk->mm);
>  	}
> +	if ((group_left == 1) && tsk->signal->group_exit_task)
> +		wake_up_process(tsk->signal->group_exit_task);

This is racy, but this is minor.

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.
  2017-03-04 17:03                     ` Oleg Nesterov
@ 2017-03-30  8:07                       ` Eric W. Biederman
       [not found]                         ` <8760ir192p.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-03-30  8:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel, linux-api

Oleg Nesterov <oleg@redhat.com> writes:

> On 03/03, Eric W. Biederman wrote:
>> @@ -1065,11 +1065,8 @@ static int de_thread(struct task_struct *tsk)
>>  	}
>>
>>  	sig->group_exit_task = tsk;
>> -	sig->notify_count = zap_other_threads(tsk);
>> -	if (!thread_group_leader(tsk))
>> -		sig->notify_count--;
>> -
>> -	while (sig->notify_count) {
>> +	zap_other_threads(tsk);
>> +	while (atomic_read(&sig->live) > 1) {
>>  		__set_current_state(TASK_KILLABLE);
>>  		spin_unlock_irq(lock);
>>  		schedule();
>
> Very nice. So de_thread() returns as soon as all other threads decrement
> signal->live in do_exit(). Before they do, say, exit_mm(). This is already
> wrong, for example this breaks OOM. Plus a lot more problems afaics,  but
> lets ignore this.

Which means that we need to keep sig->notify_count.

> Note that de_thread() also unshares ->sighand before return. So in the
> case of mt exec it will likely see oldsighand->count != 1 and alloc the
> new sighand_struct and this breaks the locking.
>
> Because the execing thread will use newsighand->siglock to protect its
> signal_struct while the zombie threads will use oldsighand->siglock to
> protect the same signal struct. Yes, tasklist_lock + the fact irq_disable
> implies rcu_lock mostly save us but not entirely, say, a foreign process
> doing __send_signal() can take the right or the wrong lock depending on
> /dev/random.

Which leads to the question how can we get back tot he 2.4 behavior
of freeing sighand_struct in do_exit?

At which point as soon as we free sighand_struct if we are the last
to dying thread notify de_thread and everything works.

There are only two uses of sighand->siglock that I can see in
release_task:  __ptrace_unlink and __exit_signal.

For what __ptrace_unlink is doing we should just be able to skip
acquiring of siglock if PF_EXITING is set.

__exit_signal is a little more interesting but half of what it is
doing looks like it was pulled out of do_exit and just needs to
be put back.

Which probably adds up to 4 or 5 small carefully written patches to sort
out that part of the exit path, but I think it leads to a very simple
and straight forward result.  With the only side effect that we
can exec a process and still have a debugger reaping zombie threads.

Oleg does that sound reasonable to you?

Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [RFC][PATCH 0/2] exec: Fixing ptrace'd mulit-threaded hang
       [not found]                         ` <8760ir192p.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2017-04-01  5:11                           ` Eric W. Biederman
       [not found]                             ` <878tnkpv8h.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2017-04-02 16:15                           ` [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped Oleg Nesterov
  1 sibling, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-01  5:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA


I spent a little more time with this and only waiting until the killed
thread are zombies (and not reaped as we do today) really looks like
the right fix.

Oleg the following two patches work on top of your PTRACE_EVENT_EXIT
change and probably need a little more cleanup until they are ready
for serious posting.

That said I want to I want to post the code so I have a change at
some feedback before I prepare the final round of patches.

These patches only handle the case when sighand_struct is not
shared between different multi-threaded processes.  The general
case is solvable but that is a quite a bit more code.

Eric W. Biederman (2):
      sighand: Count each thread group once in sighand_struct
      exec: If possible don't wait for ptraced threads to be reaped

 fs/exec.c                    | 15 ++++++++++-----
 include/linux/sched/signal.h |  2 +-
 kernel/exit.c                | 15 ++++++++++-----
 kernel/fork.c                |  6 ++++--
 kernel/signal.c              |  8 ++++++--
 5 files changed, 31 insertions(+), 15 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [RFC][PATCH 1/2] sighand: Count each thread group once in sighand_struct
       [not found]                             ` <878tnkpv8h.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2017-04-01  5:14                               ` Eric W. Biederman
  2017-04-01  5:16                               ` [RFC][PATCH 2/2] exec: If possible don't wait for ptraced threads to be reaped Eric W. Biederman
                                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-01  5:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA


In practice either a thread group is either using a sighand_struct or
it isn't.  Therefore simplify things a bit and only increment the
count in sighand_struct when a new thread group is created that uses
the existing sighand_struct, and only decrement the count in
sighand_struct when a thread group exits.

As well as standing on it's own merits this has the potential to simply
de_thread.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 kernel/exit.c | 2 +-
 kernel/fork.c | 6 ++++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index e126ebf2400c..8c5b3e106298 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -163,9 +163,9 @@ static void __exit_signal(struct task_struct *tsk)
 	tsk->sighand = NULL;
 	spin_unlock(&sighand->siglock);
 
-	__cleanup_sighand(sighand);
 	clear_tsk_thread_flag(tsk, TIF_SIGPENDING);
 	if (group_dead) {
+		__cleanup_sighand(sighand);
 		flush_sigqueue(&sig->shared_pending);
 		tty_kref_put(tty);
 	}
diff --git a/kernel/fork.c b/kernel/fork.c
index 6c463c80e93d..fe6f1bf32bb9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1295,7 +1295,8 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
 	struct sighand_struct *sig;
 
 	if (clone_flags & CLONE_SIGHAND) {
-		atomic_inc(&current->sighand->count);
+		if (!(clone_flags & CLONE_THREAD))
+			atomic_inc(&current->sighand->count);
 		return 0;
 	}
 	sig = kmem_cache_alloc(sighand_cachep, GFP_KERNEL);
@@ -1896,7 +1897,8 @@ static __latent_entropy struct task_struct *copy_process(
 	if (!(clone_flags & CLONE_THREAD))
 		free_signal_struct(p->signal);
 bad_fork_cleanup_sighand:
-	__cleanup_sighand(p->sighand);
+	if (!(clone_flags & CLONE_THREAD))
+		__cleanup_sighand(p->sighand);
 bad_fork_cleanup_fs:
 	exit_fs(p); /* blocking */
 bad_fork_cleanup_files:
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC][PATCH 2/2] exec: If possible don't wait for ptraced threads to be reaped
       [not found]                             ` <878tnkpv8h.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2017-04-01  5:14                               ` [RFC][PATCH 1/2] sighand: Count each thread group once in sighand_struct Eric W. Biederman
@ 2017-04-01  5:16                               ` Eric W. Biederman
       [not found]                                 ` <87vaqooggs.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2017-04-02 15:38                               ` [RFC][PATCH 0/2] exec: Fixing ptrace'd mulit-threaded hang Oleg Nesterov
  2017-04-02 22:50                               ` [RFC][PATCH v2 0/5] " Eric W. Biederman
  3 siblings, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-01  5:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA


Take advantage of the situation when sighand->count == 1 to only wait
for threads to reach EXIT_ZOMBIE instead of EXIT_DEAD in de_thread.
Only old old linux threading libraries use CLONE_SIGHAND without
CLONE_THREAD.  So this situation should be present most of the time.

This allows ptracing through a multi-threaded exec without the danger
of stalling the exec.  As historically exec waits for the other
threads to be reaped in de_thread before completing.  This is
necessary as it is not safe to unshare the sighand_struct until all of
the other threads in this thread group are reaped, because the lock to
serialize threads in a thread group siglock lives in sighand_struct.

When oldsighand->count == 1 we know that there are no other
users and unsharing the sighand struct in exec is pointless.
This makes it safe to only wait for threads to become zombies
as the siglock won't change during exec and release_task
will use the samve siglock for the old threads as for
the new threads.

Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/exec.c                    | 15 ++++++++++-----
 include/linux/sched/signal.h |  2 +-
 kernel/exit.c                | 13 +++++++++----
 kernel/signal.c              |  8 ++++++--
 4 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 65145a3df065..0fd29342bbe4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1052,6 +1052,7 @@ static int de_thread(struct task_struct *tsk)
 	struct signal_struct *sig = tsk->signal;
 	struct sighand_struct *oldsighand = tsk->sighand;
 	spinlock_t *lock = &oldsighand->siglock;
+	bool may_hang;
 
 	if (thread_group_empty(tsk))
 		goto no_thread_group;
@@ -1069,9 +1070,10 @@ static int de_thread(struct task_struct *tsk)
 		return -EAGAIN;
 	}
 
+	may_hang = atomic_read(&oldsighand->count) != 1;
 	sig->group_exit_task = tsk;
-	sig->notify_count = zap_other_threads(tsk);
-	if (!thread_group_leader(tsk))
+	sig->notify_count = zap_other_threads(tsk, may_hang ? 1 : -1);
+	if (may_hang && !thread_group_leader(tsk))
 		sig->notify_count--;
 
 	while (sig->notify_count) {
@@ -1092,9 +1094,10 @@ static int de_thread(struct task_struct *tsk)
 	if (!thread_group_leader(tsk)) {
 		struct task_struct *leader = tsk->group_leader;
 
-		for (;;) {
-			cgroup_threadgroup_change_begin(tsk);
-			write_lock_irq(&tasklist_lock);
+		cgroup_threadgroup_change_begin(tsk);
+		write_lock_irq(&tasklist_lock);
+
+		for (;may_hang;) {
 			/*
 			 * Do this under tasklist_lock to ensure that
 			 * exit_notify() can't miss ->group_exit_task
@@ -1108,6 +1111,8 @@ static int de_thread(struct task_struct *tsk)
 			schedule();
 			if (unlikely(__fatal_signal_pending(tsk)))
 				goto killed;
+			cgroup_threadgroup_change_begin(tsk);
+			write_lock_irq(&tasklist_lock);
 		}
 
 		/*
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 2cf446704cd4..187a9e980d3a 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -298,7 +298,7 @@ extern __must_check bool do_notify_parent(struct task_struct *, int);
 extern void __wake_up_parent(struct task_struct *p, struct task_struct *parent);
 extern void force_sig(int, struct task_struct *);
 extern int send_sig(int, struct task_struct *, int);
-extern int zap_other_threads(struct task_struct *p);
+extern int zap_other_threads(struct task_struct *p, int do_count);
 extern struct sigqueue *sigqueue_alloc(void);
 extern void sigqueue_free(struct sigqueue *);
 extern int send_sigqueue(struct sigqueue *,  struct task_struct *, int group);
diff --git a/kernel/exit.c b/kernel/exit.c
index 8c5b3e106298..972df5ebf79f 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -712,6 +712,8 @@ static void forget_original_parent(struct task_struct *father,
  */
 static void exit_notify(struct task_struct *tsk, int group_dead)
 {
+	struct sighand_struct *sighand = tsk->sighand;
+	struct signal_struct *signal = tsk->signal;
 	bool autoreap;
 	struct task_struct *p, *n;
 	LIST_HEAD(dead);
@@ -739,9 +741,12 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 	if (tsk->exit_state == EXIT_DEAD)
 		list_add(&tsk->ptrace_entry, &dead);
 
-	/* mt-exec, de_thread() is waiting for group leader */
-	if (unlikely(tsk->signal->notify_count < 0))
-		wake_up_process(tsk->signal->group_exit_task);
+	spin_lock(&sighand->siglock);
+	/* mt-exec, de_thread is waiting for threads to exit */
+	if (signal->notify_count < 0 && !++signal->notify_count)
+		wake_up_process(signal->group_exit_task);
+
+	spin_unlock(&sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
 
 	list_for_each_entry_safe(p, n, &dead, ptrace_entry) {
@@ -975,7 +980,7 @@ do_group_exit(int exit_code)
 		else {
 			sig->group_exit_code = exit_code;
 			sig->flags = SIGNAL_GROUP_EXIT;
-			zap_other_threads(current);
+			zap_other_threads(current, 0);
 		}
 		spin_unlock_irq(&sighand->siglock);
 	}
diff --git a/kernel/signal.c b/kernel/signal.c
index 986ef55641ea..e3a5bc239345 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1196,7 +1196,7 @@ force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
 /*
  * Nuke all other threads in the group.
  */
-int zap_other_threads(struct task_struct *p)
+int zap_other_threads(struct task_struct *p, int do_count)
 {
 	struct task_struct *t = p;
 	int count = 0;
@@ -1205,13 +1205,17 @@ int zap_other_threads(struct task_struct *p)
 
 	while_each_thread(p, t) {
 		task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
-		count++;
+		if (do_count > 0)
+			count++;
 
 		/* Don't bother with already dead threads */
 		if (t->exit_state)
 			continue;
 		sigaddset(&t->pending.signal, SIGKILL);
 		signal_wake_up(t, 1);
+
+		if (do_count < 0)
+			count--;
 	}
 
 	return count;
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH 2/2] exec: If possible don't wait for ptraced threads to be reaped
       [not found]                                 ` <87vaqooggs.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2017-04-02 15:35                                   ` Oleg Nesterov
       [not found]                                     ` <20170402153517.GA12637-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Oleg Nesterov @ 2017-04-02 15:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On 04/01, Eric W. Biederman wrote:
>
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1052,6 +1052,7 @@ static int de_thread(struct task_struct *tsk)
>  	struct signal_struct *sig = tsk->signal;
>  	struct sighand_struct *oldsighand = tsk->sighand;
>  	spinlock_t *lock = &oldsighand->siglock;
> +	bool may_hang;
>
>  	if (thread_group_empty(tsk))
>  		goto no_thread_group;
> @@ -1069,9 +1070,10 @@ static int de_thread(struct task_struct *tsk)
>  		return -EAGAIN;
>  	}
>
> +	may_hang = atomic_read(&oldsighand->count) != 1;
>  	sig->group_exit_task = tsk;
> -	sig->notify_count = zap_other_threads(tsk);
> -	if (!thread_group_leader(tsk))
> +	sig->notify_count = zap_other_threads(tsk, may_hang ? 1 : -1);

Eric, this is amazing. So with this patch exec does different things depening
on whether sighand is shared with another CLONE_SIGHAND task or not. To me
this doesn't look sane in any case.

And of course you do realize that it doesn't solve the problem entirely? If I
modify my test-case a little bit

	int xxx(void *arg)
	{
		for (;;)
			pause();
	}

	void *thread(void *arg)
	{
		ptrace(PTRACE_TRACEME, 0,0,0);
		return NULL;
	}

	int main(void)
	{
		int pid = fork();

		if (!pid) {
			pthread_t pt;
			char stack[16 * 1024];

			clone(xxx, stack + 16*1024, CLONE_SIGHAND|CLONE_VM, NULL);

			pthread_create(&pt, NULL, thread, NULL);
			pthread_join(pt, NULL);
			execlp("echo", "echo", "passed", NULL);
		}

		sleep(1);
		// or anything else which needs ->cred_guard_mutex,
		// say open(/proc/$pid/mem)
		ptrace(PTRACE_ATTACH, pid, 0,0);
		kill(pid, SIGCONT);

		return 0;
	}

it should deadlock the same way?

So what is the point to make the, well imo insane, patch if it doesn't solve
the problem?

And btw zap_other_threads(may_hang == 0) is racy. Either you need tasklist or
exit_notify() should set tsk->exit_state under siglock, otherwise zap() can
return the wrong count.

Finally. This patch creates the nice security hole. Let me modify my test-case
again:

	void *thread(void *arg)
	{
		ptrace(PTRACE_TRACEME, 0,0,0);
		return NULL;
	}

	int main(void)
	{
		int pid = fork();

		if (!pid) {
			pthread_t pt;
			pthread_create(&pt, NULL, thread, NULL);
			pthread_join(pt, NULL);
			execlp(path-to-setuid-binary, args);
		}

		sleep(1);

		// Now we can send the signals to setiuid app
		kill(pid+1, ANYSIGNAL);

		return 0;
	}

I see another email from your with another proposal. I disagree, will reply soon.

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH 0/2] exec: Fixing ptrace'd mulit-threaded hang
       [not found]                             ` <878tnkpv8h.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2017-04-01  5:14                               ` [RFC][PATCH 1/2] sighand: Count each thread group once in sighand_struct Eric W. Biederman
  2017-04-01  5:16                               ` [RFC][PATCH 2/2] exec: If possible don't wait for ptraced threads to be reaped Eric W. Biederman
@ 2017-04-02 15:38                               ` Oleg Nesterov
  2017-04-02 22:50                               ` [RFC][PATCH v2 0/5] " Eric W. Biederman
  3 siblings, 0 replies; 42+ messages in thread
From: Oleg Nesterov @ 2017-04-02 15:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On 04/01, Eric W. Biederman wrote:
>
> These patches only handle the case when sighand_struct is not
> shared between different multi-threaded processes.

Ah, I didn't read 0/2 when looked at 2/2, so at least you documented this.

> The general
> case is solvable but that is a quite a bit more code.

I don't think so, please see my reply to 2/2.

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.
       [not found]                         ` <8760ir192p.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2017-04-01  5:11                           ` [RFC][PATCH 0/2] exec: Fixing ptrace'd mulit-threaded hang Eric W. Biederman
@ 2017-04-02 16:15                           ` Oleg Nesterov
       [not found]                             ` <20170402161518.GC12637-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 42+ messages in thread
From: Oleg Nesterov @ 2017-04-02 16:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On 03/30, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>
> > Very nice. So de_thread() returns as soon as all other threads decrement
> > signal->live in do_exit(). Before they do, say, exit_mm(). This is already
> > wrong, for example this breaks OOM. Plus a lot more problems afaics,  but
> > lets ignore this.
>
> Which means that we need to keep sig->notify_count.

Yes, although we need to make it less ugly.

> > Note that de_thread() also unshares ->sighand before return. So in the
> > case of mt exec it will likely see oldsighand->count != 1 and alloc the
> > new sighand_struct and this breaks the locking.
> >
> > Because the execing thread will use newsighand->siglock to protect its
> > signal_struct while the zombie threads will use oldsighand->siglock to
> > protect the same signal struct. Yes, tasklist_lock + the fact irq_disable
> > implies rcu_lock mostly save us but not entirely, say, a foreign process
> > doing __send_signal() can take the right or the wrong lock depending on
> > /dev/random.
>
> Which leads to the question how can we get back tot he 2.4 behavior
> of freeing sighand_struct in do_exit?
>
> At which point as soon as we free sighand_struct if we are the last
> to dying thread notify de_thread and everything works.

I was thinking about the similar option, see below, but decided that we
should not do this at least right now.

> For what __ptrace_unlink is doing we should just be able to skip
> acquiring of siglock if PF_EXITING is set.

We can even remove it from release_task() path, this is simple.

> __exit_signal is a little more interesting but half of what it is
> doing looks like it was pulled out of do_exit and just needs to
> be put back.

That is. I think we should actually unhash the exiting sub-thread even
if it is traced. IOW, remove it from thread/pid/parent/etc lists and
nullify its ->sighand. IMO, whatever we do thread_group_empty(current)
should be true after exec.

So the exiting sub-trace should look almost a EXIT_DEAD task except it
still should report to debugger.

But this is dangerous. Say, wait4(upid <= 0) becomes unsafe because
task_pid_type(PIDTYPE_PGID) won't work.

> Which probably adds up to 4 or 5 small carefully written patches to sort
> out that part of the exit path,

Perhaps I am wrong, but I think you underestimate the problems, and it is
not clear to me if we really want this.

=========================================================================
Anyway, Eric, even if we can and want to do this, why we can't do this on
top of my fix?

I simply fail to understand why you dislike it that much. Yes it is not
pretty, I said this many times, but it is safe in that it doesn't really
change the current behaviour.

I am much more worried about 2/2 you didn't argue with, this patch _can_
break something and this is obviously not good even if PTRACE_EVENT_EXIT
was always broken.

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH 2/2] exec: If possible don't wait for ptraced threads to be reaped
       [not found]                                     ` <20170402153517.GA12637-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-04-02 18:53                                       ` Eric W. Biederman
       [not found]                                         ` <877f32k5ew.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-02 18:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> On 04/01, Eric W. Biederman wrote:
>>
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1052,6 +1052,7 @@ static int de_thread(struct task_struct *tsk)
>>  	struct signal_struct *sig = tsk->signal;
>>  	struct sighand_struct *oldsighand = tsk->sighand;
>>  	spinlock_t *lock = &oldsighand->siglock;
>> +	bool may_hang;
>>
>>  	if (thread_group_empty(tsk))
>>  		goto no_thread_group;
>> @@ -1069,9 +1070,10 @@ static int de_thread(struct task_struct *tsk)
>>  		return -EAGAIN;
>>  	}
>>
>> +	may_hang = atomic_read(&oldsighand->count) != 1;
>>  	sig->group_exit_task = tsk;
>> -	sig->notify_count = zap_other_threads(tsk);
>> -	if (!thread_group_leader(tsk))
>> +	sig->notify_count = zap_other_threads(tsk, may_hang ? 1 : -1);
>
> Eric, this is amazing. So with this patch exec does different things depening
> on whether sighand is shared with another CLONE_SIGHAND task or not. To me
> this doesn't look sane in any case.

It is a 99% solution that makes it possible to talk about and review
letting the exec continue after the subthreads are killed but not
reaped.

Sigh I should have made may_hang say:

may_hang = (atomic_read(&oldsignand->count) != 1) && (sig->nr_threads > 1)

Which covers all know ways userspace actually uses these clone flags.

> And btw zap_other_threads(may_hang == 0) is racy. Either you need tasklist or
> exit_notify() should set tsk->exit_state under siglock, otherwise zap() can
> return the wrong count.

zap_other_thread(tsk, 0) only gets called in the case where we don't
care about the return value.  It does not get called from fs/exec.c

> Finally. This patch creates the nice security hole. Let me modify my test-case
> again:
>
> 	void *thread(void *arg)
> 	{
> 		ptrace(PTRACE_TRACEME, 0,0,0);
> 		return NULL;
> 	}
>
> 	int main(void)
> 	{
> 		int pid = fork();
>
> 		if (!pid) {
> 			pthread_t pt;
> 			pthread_create(&pt, NULL, thread, NULL);
> 			pthread_join(pt, NULL);
> 			execlp(path-to-setuid-binary, args);
> 		}
>
> 		sleep(1);
>
> 		// Now we can send the signals to setiuid app
> 		kill(pid+1, ANYSIGNAL);
>
> 		return 0;
> 	}

That is a substantive objection, and something that definitely needs
to get fixed.   Can you think of anything else?

Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.
       [not found]                             ` <20170402161518.GC12637-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-04-02 21:07                               ` Eric W. Biederman
       [not found]                                 ` <87inmmbjsq.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-02 21:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> Perhaps I am wrong, but I think you underestimate the problems, and it is
> not clear to me if we really want this.

I worked through quite a bit of it and I realized a few fundamental
issues.  The task struct must remain visible until it is reaped
and we use siglock to protect in unhash process to protect that reaping.
Further tsk->sighand == NULL winds up being a flag used to tell
if release_task has been called.

To get an usable count on sighand struct all that needed to be done
was to change the reference counting of sighand_struct to count
processes and not threads.

Which is what I wound up posting.

> Anyway, Eric, even if we can and want to do this, why we can't do this on
> top of my fix?

Because your reduction in scope of cred_guard_mutex is fundamentally
broken and unnecessary.

> I simply fail to understand why you dislike it that much. Yes it is not
> pretty, I said this many times, but it is safe in that it doesn't really
> change the current behaviour.

No it is not safe.  And it promotes wrong thinking which is even more
dangerous.

I reviewed the code and cred_guard_mutex needs to cover what it covers.

> I am much more worried about 2/2 you didn't argue with, this patch _can_
> break something and this is obviously not good even if PTRACE_EVENT_EXIT
> was always broken.

I don't know who actually useses PTRACE_O_TRACEEXIT so I don't actually
know what the implications of changing it are.  Let's see...

gdb - no
upstart - no
lldb - yes
strace - no


It looks like lldb is worth testing with your PTRACE_EVENT_EXIT change
to see if anything breaks.

I think we can get away with changing the exec case but it does look
worth testing.  I hadn't realized you hadn't looked to see what was
using PTRACE_O_TRACEEXIT to see if any part of userspace cares.

Hmm.  This is interesting.  From the strace documentation:
> Tracer cannot assume that ptrace-stopped tracee exists. There are many
> scenarios when tracee may die while stopped (such as SIGKILL).
> Therefore, tracer must always be prepared to handle ESRCH error on any
> ptrace operation. Unfortunately, the same error is returned if tracee
> exists but is not ptrace-stopped (for commands which require stopped
> tracee), or if it is not traced by process which issued ptrace call.
> Tracer needs to keep track of stopped/running state, and interpret
> ESRCH as "tracee died unexpectedly" only if it knows that tracee has
> been observed to enter ptrace-stop. Note that there is no guarantee
> that waitpid(WNOHANG) will reliably report tracee's death status if
> ptrace operation returned ESRCH. waitpid(WNOHANG) may return 0 instead.
> IOW: tracee may be "not yet fully dead" but already refusing ptrace
> ops.

If delivering a second SIGKILL to a ptraced stopped processes will
make it continue we have a very interesting out..

When we stop in ptrace_stop we stop in TASK_TRACED == (TASK_WAKEKILL|__TASK_TRACED)

Delivery of a SIGKILL to that task has queue SIGKILL
and call signal_wake_up_state(t, TASK_WAKEKILL).
Which becomes wake_up_state(t, TASK_INTERRUPTIBLE | TASK_WAKEKILL)
Which wakes up the process.

So userspace can absolutely kill a processes in PTRACE_EVENT_EXIT
before the tracers find it.

Therefore we are only talking a quality of implementation issue
if we actually stop and wait for the tracer or not.

....

Which brings us to your PTRACE_EVENT_EXIT patch.

I think may_ptrace_stop is tested in the wrong place,
and is probably buggy.

- We should send the signal in all cases except when the ptracing parent
  does not exist aka (!current->ptrace).  The siginfo contains enough
  information to understand what happened if anyone is listening.

- Then we should send the group stop.

- Then if we don't want to wait we should:
  __set_current_state(TASK_RUNNING)

- Then we should drop the locks and only call freezable_schedule if
  we want to wait.

That way userspace thinks someone else just sent a SIGKILL and killed
the thread before it had a chance to look (which is effectively what
we are doing).  That sounds idea for both core-dumps and this case.

Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [RFC][PATCH v2 0/5] exec: Fixing ptrace'd mulit-threaded hang
       [not found]                             ` <878tnkpv8h.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
                                                 ` (2 preceding siblings ...)
  2017-04-02 15:38                               ` [RFC][PATCH 0/2] exec: Fixing ptrace'd mulit-threaded hang Oleg Nesterov
@ 2017-04-02 22:50                               ` Eric W. Biederman
       [not found]                                 ` <874ly6a0h1.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  3 siblings, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-02 22:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA


Oleg your comment about kill being able to send signal was an important
dimension I had missed thank you.

This patchset just denies the case of SIGHAND between different
multi-threaded processes as I don't think anyone cares.  I can
fix that if anyone cares but I am not certain we actally do.

I have reworked the ptrace notification code so that we always
send notifications if we can but don't wait if it is a coredump
or an exec.  Which simpilifies the code nicely.

A few more tweaks are needed before a final version but I think
things are compelling.

 fs/exec.c                    |  23 ++-------
 include/linux/sched/signal.h |   1 +
 kernel/exit.c                |  20 ++++----
 kernel/fork.c                |  14 +++++-
 kernel/ptrace.c              |   4 ++
 kernel/signal.c              | 112 +++++++++++++++++++------------------------
 6 files changed, 78 insertions(+), 96 deletions(-)

Eric W. Biederman (5):
      ptrace: Don't wait in PTRACE_O_TRACEEXIT for exec or coredump
      sighand: Count each thread group once in sighand_struct
      clone: Disallown CLONE_THREAD with a shared sighand_struct
      exec: If possible don't wait for ptraced threads to be reaped
      signal: Don't allow accessing signal_struct by old threads after exec

Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [RFC][PATCH v2 1/5] ptrace: Don't wait in PTRACE_O_TRACEEXIT for exec or coredump
       [not found]                                 ` <874ly6a0h1.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2017-04-02 22:51                                   ` Eric W. Biederman
  2017-04-05 16:19                                     ` Oleg Nesterov
  2017-04-02 22:51                                   ` [RFC][PATCH v2 2/5] sighand: Count each thread group once in sighand_struct Eric W. Biederman
                                                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-02 22:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA


Take advantage of the fact that no ptrace stop will wait for
a debugger if another process sends SIGKILL to the waiting task.

In the case of exec and coredump which have many interesting deadlock
opportunities and no one tests the what happens if you are ptraced
during exec or coredump act like another SIGKILL was immediately
sent to the process and don't stop.

Keep sending the signal to the tracer so that this appears like
the worst case where someone else sent the process a SIGKILL before
the tracer could react.  So all non-buggy tracers must support
this case.

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 kernel/signal.c | 103 +++++++++++++++++++++++---------------------------------
 1 file changed, 42 insertions(+), 61 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 7e59ebc2c25e..11fa736eb2ae 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1740,30 +1740,6 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	spin_unlock_irqrestore(&sighand->siglock, flags);
 }
 
-static inline int may_ptrace_stop(void)
-{
-	if (!likely(current->ptrace))
-		return 0;
-	/*
-	 * Are we in the middle of do_coredump?
-	 * If so and our tracer is also part of the coredump stopping
-	 * is a deadlock situation, and pointless because our tracer
-	 * is dead so don't allow us to stop.
-	 * If SIGKILL was already sent before the caller unlocked
-	 * ->siglock we must see ->core_state != NULL. Otherwise it
-	 * is safe to enter schedule().
-	 *
-	 * This is almost outdated, a task with the pending SIGKILL can't
-	 * block in TASK_TRACED. But PTRACE_EVENT_EXIT can be reported
-	 * after SIGKILL was already dequeued.
-	 */
-	if (unlikely(current->mm->core_state) &&
-	    unlikely(current->mm == current->parent->mm))
-		return 0;
-
-	return 1;
-}
-
 /*
  * Return non-zero if there is a SIGKILL that should be waking us up.
  * Called with the siglock held.
@@ -1789,6 +1765,8 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	__releases(&current->sighand->siglock)
 	__acquires(&current->sighand->siglock)
 {
+	bool ptrace_parent;
+	bool ptrace_wait;
 	bool gstop_done = false;
 
 	if (arch_ptrace_stop_needed(exit_code, info)) {
@@ -1842,53 +1820,56 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
-	if (may_ptrace_stop()) {
-		/*
-		 * Notify parents of the stop.
-		 *
-		 * While ptraced, there are two parents - the ptracer and
-		 * the real_parent of the group_leader.  The ptracer should
-		 * know about every stop while the real parent is only
-		 * interested in the completion of group stop.  The states
-		 * for the two don't interact with each other.  Notify
-		 * separately unless they're gonna be duplicates.
-		 */
+	ptrace_parent = likely(current->ptrace);
+	/*
+	 * Notify parents of the stop.
+	 *
+	 * While ptraced, there are two parents - the ptracer and
+	 * the real_parent of the group_leader.  The ptracer should
+	 * know about every stop while the real parent is only
+	 * interested in the completion of group stop.  The states
+	 * for the two don't interact with each other.  Notify
+	 * separately unless they're gonna be duplicates.
+	 */
+	if (ptrace_parent)
 		do_notify_parent_cldstop(current, true, why);
-		if (gstop_done && ptrace_reparented(current))
-			do_notify_parent_cldstop(current, false, why);
+	if (gstop_done && (ptrace_reparented(current) || !ptrace_parent))
+		do_notify_parent_cldstop(current, false, why);
 
-		/*
-		 * Don't want to allow preemption here, because
-		 * sys_ptrace() needs this task to be inactive.
-		 *
-		 * XXX: implement read_unlock_no_resched().
-		 */
-		preempt_disable();
-		read_unlock(&tasklist_lock);
-		preempt_enable_no_resched();
-		freezable_schedule();
-	} else {
-		/*
-		 * By the time we got the lock, our tracer went away.
-		 * Don't drop the lock yet, another tracer may come.
-		 *
-		 * If @gstop_done, the ptracer went away between group stop
-		 * completion and here.  During detach, it would have set
-		 * JOBCTL_STOP_PENDING on us and we'll re-enter
-		 * TASK_STOPPED in do_signal_stop() on return, so notifying
-		 * the real parent of the group stop completion is enough.
-		 */
-		if (gstop_done)
-			do_notify_parent_cldstop(current, false, why);
+	/*
+	 * Always wait for the traceer if the process is traced unless
+	 * the process is in the middle of an exec or core dump.
+	 *
+	 * In exec we risk a deadlock with de_thread waiting for the
+	 * thread to be killed, the tracer, and acquring cred_guard_mutex.
+	 *
+	 * In coredump we risk a deadlock if the tracer is also part
+	 * of the coredump stopping.
+	 */
+	ptrace_wait = ptrace_parent &&
+		!current->signal->group_exit_task &&
+		!current->mm->core_state;
 
+	if (!ptrace_wait) {
 		/* tasklist protects us from ptrace_freeze_traced() */
 		__set_current_state(TASK_RUNNING);
 		if (clear_code)
 			current->exit_code = 0;
-		read_unlock(&tasklist_lock);
 	}
 
 	/*
+	 * Don't want to allow preemption here, because
+	 * sys_ptrace() needs this task to be inactive.
+	 *
+	 * XXX: implement read_unlock_no_resched().
+	 */
+	preempt_disable();
+	read_unlock(&tasklist_lock);
+	preempt_enable_no_resched();
+	if (ptrace_wait)
+		freezable_schedule();
+
+	/*
 	 * We are back.  Now reacquire the siglock before touching
 	 * last_siginfo, so that we are sure to have synchronized with
 	 * any signal-sending on another CPU that wants to examine it.
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC][PATCH v2 2/5] sighand: Count each thread group once in sighand_struct
       [not found]                                 ` <874ly6a0h1.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2017-04-02 22:51                                   ` [RFC][PATCH v2 1/5] ptrace: Don't wait in PTRACE_O_TRACEEXIT for exec or coredump Eric W. Biederman
@ 2017-04-02 22:51                                   ` Eric W. Biederman
  2017-04-02 22:52                                   ` [RFC][PATCH v2 3/5] clone: Disallown CLONE_THREAD with a shared sighand_struct Eric W. Biederman
                                                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-02 22:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA


In practice either a thread group is either using a sighand_struct or
it isn't.  Therefore simplify things a bit and only increment the
count in sighand_struct when a new thread group is created that uses
the existing sighand_struct, and only decrement the count in
sighand_struct when a thread group exits.

As well as standing on it's own merits this has the potential to simply
de_thread.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 kernel/exit.c | 2 +-
 kernel/fork.c | 6 ++++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index e126ebf2400c..8c5b3e106298 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -163,9 +163,9 @@ static void __exit_signal(struct task_struct *tsk)
 	tsk->sighand = NULL;
 	spin_unlock(&sighand->siglock);
 
-	__cleanup_sighand(sighand);
 	clear_tsk_thread_flag(tsk, TIF_SIGPENDING);
 	if (group_dead) {
+		__cleanup_sighand(sighand);
 		flush_sigqueue(&sig->shared_pending);
 		tty_kref_put(tty);
 	}
diff --git a/kernel/fork.c b/kernel/fork.c
index 6c463c80e93d..fe6f1bf32bb9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1295,7 +1295,8 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
 	struct sighand_struct *sig;
 
 	if (clone_flags & CLONE_SIGHAND) {
-		atomic_inc(&current->sighand->count);
+		if (!(clone_flags & CLONE_THREAD))
+			atomic_inc(&current->sighand->count);
 		return 0;
 	}
 	sig = kmem_cache_alloc(sighand_cachep, GFP_KERNEL);
@@ -1896,7 +1897,8 @@ static __latent_entropy struct task_struct *copy_process(
 	if (!(clone_flags & CLONE_THREAD))
 		free_signal_struct(p->signal);
 bad_fork_cleanup_sighand:
-	__cleanup_sighand(p->sighand);
+	if (!(clone_flags & CLONE_THREAD))
+		__cleanup_sighand(p->sighand);
 bad_fork_cleanup_fs:
 	exit_fs(p); /* blocking */
 bad_fork_cleanup_files:
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC][PATCH v2 3/5] clone: Disallown CLONE_THREAD with a shared sighand_struct
       [not found]                                 ` <874ly6a0h1.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2017-04-02 22:51                                   ` [RFC][PATCH v2 1/5] ptrace: Don't wait in PTRACE_O_TRACEEXIT for exec or coredump Eric W. Biederman
  2017-04-02 22:51                                   ` [RFC][PATCH v2 2/5] sighand: Count each thread group once in sighand_struct Eric W. Biederman
@ 2017-04-02 22:52                                   ` Eric W. Biederman
       [not found]                                     ` <87k2728lrp.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2017-04-02 22:53                                   ` [RFC][PATCH v2 4/5] exec: If possible don't wait for ptraced threads to be reaped Eric W. Biederman
  2017-04-02 22:57                                   ` [RFC][PATCH v2 5/5] signal: Don't allow accessing signal_struct by old threads after exec Eric W. Biederman
  4 siblings, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-02 22:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA


Old threading libraries used CLONE_SIGHAND without clone thread.
Modern threadding libraries always use CLONE_SIGHAND | CLONE_THREAD.

Therefore let's simplify our lives and stop supporting a case no one cares about.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 kernel/fork.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/fork.c b/kernel/fork.c
index fe6f1bf32bb9..0632ac1180be 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1515,6 +1515,13 @@ static __latent_entropy struct task_struct *copy_process(
 	if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
 		return ERR_PTR(-EINVAL);
 
+	/* Disallow CLONE_THREAD with a shared SIGHAND structure.  No
+	 * one cares and supporting it leads to unnecessarily complex
+	 * code.
+	 */
+	if ((clone_flags & CLONE_THREAD) && (atomic_read(&current->sighand->count) > 1))
+		return ERR_PTR(-EINVAL);
+
 	/*
 	 * Shared signal handlers imply shared VM. By way of the above,
 	 * thread groups also imply shared VM. Blocking this case allows
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC][PATCH v2 4/5] exec: If possible don't wait for ptraced threads to be reaped
       [not found]                                 ` <874ly6a0h1.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
                                                     ` (2 preceding siblings ...)
  2017-04-02 22:52                                   ` [RFC][PATCH v2 3/5] clone: Disallown CLONE_THREAD with a shared sighand_struct Eric W. Biederman
@ 2017-04-02 22:53                                   ` Eric W. Biederman
  2017-04-05 16:15                                     ` Oleg Nesterov
  2017-04-02 22:57                                   ` [RFC][PATCH v2 5/5] signal: Don't allow accessing signal_struct by old threads after exec Eric W. Biederman
  4 siblings, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-02 22:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA


Take advantage of the situation when sighand->count == 1 to only wait
for threads to reach EXIT_ZOMBIE instead of EXIT_DEAD in de_thread.
Only old old linux threading libraries use CLONE_SIGHAND without
CLONE_THREAD.  So this situation should be present most of the time.

This allows ptracing through a multi-threaded exec without the danger
of stalling the exec.  As historically exec waits for the other
threads to be reaped in de_thread before completing.  This is
necessary as it is not safe to unshare the sighand_struct until all of
the other threads in this thread group are reaped, because the lock to
serialize threads in a thread group siglock lives in sighand_struct.

When oldsighand->count == 1 we know that there are no other
users and unsharing the sighand struct in exec is pointless.
This makes it safe to only wait for threads to become zombies
as the siglock won't change during exec and release_task
will use the samve siglock for the old threads as for
the new threads.

Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/exec.c       | 22 ++--------------------
 kernel/exit.c   | 18 ++++++++----------
 kernel/signal.c |  2 +-
 3 files changed, 11 insertions(+), 31 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 65145a3df065..303a114b00ce 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1071,9 +1071,6 @@ static int de_thread(struct task_struct *tsk)
 
 	sig->group_exit_task = tsk;
 	sig->notify_count = zap_other_threads(tsk);
-	if (!thread_group_leader(tsk))
-		sig->notify_count--;
-
 	while (sig->notify_count) {
 		__set_current_state(TASK_KILLABLE);
 		spin_unlock_irq(lock);
@@ -1092,23 +1089,8 @@ static int de_thread(struct task_struct *tsk)
 	if (!thread_group_leader(tsk)) {
 		struct task_struct *leader = tsk->group_leader;
 
-		for (;;) {
-			cgroup_threadgroup_change_begin(tsk);
-			write_lock_irq(&tasklist_lock);
-			/*
-			 * Do this under tasklist_lock to ensure that
-			 * exit_notify() can't miss ->group_exit_task
-			 */
-			sig->notify_count = -1;
-			if (likely(leader->exit_state))
-				break;
-			__set_current_state(TASK_KILLABLE);
-			write_unlock_irq(&tasklist_lock);
-			cgroup_threadgroup_change_end(tsk);
-			schedule();
-			if (unlikely(__fatal_signal_pending(tsk)))
-				goto killed;
-		}
+		cgroup_threadgroup_change_begin(tsk);
+		write_lock_irq(&tasklist_lock);
 
 		/*
 		 * The only record we have of the real-time age of a
diff --git a/kernel/exit.c b/kernel/exit.c
index 8c5b3e106298..955c96e3fc12 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -118,13 +118,6 @@ static void __exit_signal(struct task_struct *tsk)
 		tty = sig->tty;
 		sig->tty = NULL;
 	} else {
-		/*
-		 * If there is any task waiting for the group exit
-		 * then notify it:
-		 */
-		if (sig->notify_count > 0 && !--sig->notify_count)
-			wake_up_process(sig->group_exit_task);
-
 		if (tsk == sig->curr_target)
 			sig->curr_target = next_thread(tsk);
 	}
@@ -712,6 +705,8 @@ static void forget_original_parent(struct task_struct *father,
  */
 static void exit_notify(struct task_struct *tsk, int group_dead)
 {
+	struct sighand_struct *sighand = tsk->sighand;
+	struct signal_struct *signal = tsk->signal;
 	bool autoreap;
 	struct task_struct *p, *n;
 	LIST_HEAD(dead);
@@ -739,9 +734,12 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 	if (tsk->exit_state == EXIT_DEAD)
 		list_add(&tsk->ptrace_entry, &dead);
 
-	/* mt-exec, de_thread() is waiting for group leader */
-	if (unlikely(tsk->signal->notify_count < 0))
-		wake_up_process(tsk->signal->group_exit_task);
+	spin_lock(&sighand->siglock);
+	/* mt-exec, de_thread is waiting for threads to exit */
+	if (signal->notify_count > 0 && !--signal->notify_count)
+		wake_up_process(signal->group_exit_task);
+
+	spin_unlock(&sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
 
 	list_for_each_entry_safe(p, n, &dead, ptrace_entry) {
diff --git a/kernel/signal.c b/kernel/signal.c
index 11fa736eb2ae..fd75ba33ee3d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1205,13 +1205,13 @@ int zap_other_threads(struct task_struct *p)
 
 	while_each_thread(p, t) {
 		task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
-		count++;
 
 		/* Don't bother with already dead threads */
 		if (t->exit_state)
 			continue;
 		sigaddset(&t->pending.signal, SIGKILL);
 		signal_wake_up(t, 1);
+		count++;
 	}
 
 	return count;
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC][PATCH v2 5/5] signal: Don't allow accessing signal_struct by old threads after exec
       [not found]                                 ` <874ly6a0h1.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
                                                     ` (3 preceding siblings ...)
  2017-04-02 22:53                                   ` [RFC][PATCH v2 4/5] exec: If possible don't wait for ptraced threads to be reaped Eric W. Biederman
@ 2017-04-02 22:57                                   ` Eric W. Biederman
       [not found]                                     ` <87zify76z9.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  4 siblings, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-02 22:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA


Add exec_id to signal_struct and compare it at a few choice moments.

I believe this closes the security holes that letting the zombie
threads linger after exec opens up.

The problem is that old threads may have different creds after a setuid
exec, and then formerly shared resources may change.  So signal sending
and accesses by proc need to be blocked.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/exec.c                    | 1 +
 include/linux/sched/signal.h | 1 +
 kernel/fork.c                | 1 +
 kernel/ptrace.c              | 4 ++++
 kernel/signal.c              | 7 ++++++-
 5 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/exec.c b/fs/exec.c
index 303a114b00ce..730dee8bb2f8 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1323,6 +1323,7 @@ void setup_new_exec(struct linux_binprm * bprm)
 	/* An exec changes our domain. We are no longer part of the thread
 	   group */
 	current->self_exec_id++;
+	current->signal->exec_id = current->self_exec_id;
 	flush_signal_handlers(current, 0);
 }
 EXPORT_SYMBOL(setup_new_exec);
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 2cf446704cd4..63ae951ee330 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -80,6 +80,7 @@ struct signal_struct {
 	atomic_t		live;
 	int			nr_threads;
 	struct list_head	thread_head;
+	u32			exec_id;
 
 	wait_queue_head_t	wait_chldexit;	/* for wait4() */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 0632ac1180be..a442fa099842 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1387,6 +1387,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 
 	sig->oom_score_adj = current->signal->oom_score_adj;
 	sig->oom_score_adj_min = current->signal->oom_score_adj_min;
+	sig->exec_id = current->self_exec_id;
 
 	mutex_init(&sig->cred_guard_mutex);
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 0af928712174..cc6b10b1ffbe 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -277,6 +277,10 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
 	 * or halting the specified task is impossible.
 	 */
 
+	/* Don't allow inspecting a thread after exec */
+	if (task->self_exec_id != task->signal->exec_id)
+		return 1;
+
 	/* Don't let security modules deny introspection */
 	if (same_thread_group(task, current))
 		return 0;
diff --git a/kernel/signal.c b/kernel/signal.c
index fd75ba33ee3d..fe8dcdb622f5 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -995,6 +995,10 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 			from_ancestor_ns || (info == SEND_SIG_FORCED)))
 		goto ret;
 
+	/* Don't allow thread group signals after exec */
+	if (group && (t->signal->exec_id != t->self_exec_id))
+		goto ret;
+
 	pending = group ? &t->signal->shared_pending : &t->pending;
 	/*
 	 * Short-circuit ignored signals and support queuing
@@ -1247,7 +1251,8 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
 		 * must see ->sighand == NULL.
 		 */
 		spin_lock(&sighand->siglock);
-		if (likely(sighand == tsk->sighand)) {
+		if (likely((sighand == tsk->sighand) &&
+			   (tsk->self_exec_id == tsk->signal->exec_id))) {
 			rcu_read_unlock();
 			break;
 		}
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH 2/2] exec: If possible don't wait for ptraced threads to be reaped
       [not found]                                         ` <877f32k5ew.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2017-04-03 18:12                                           ` Oleg Nesterov
  2017-04-03 21:04                                             ` Eric W. Biederman
  0 siblings, 1 reply; 42+ messages in thread
From: Oleg Nesterov @ 2017-04-03 18:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On 04/02, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>
> > And btw zap_other_threads(may_hang == 0) is racy. Either you need tasklist or
> > exit_notify() should set tsk->exit_state under siglock, otherwise zap() can
> > return the wrong count.
>
> zap_other_thread(tsk, 0) only gets called in the case where we don't
> care about the return value.  It does not get called from fs/exec.c

I meant that may_hang == 0 implies zap_other_threads(do_count => -1) which should
return the number of threads which didn't pass exit_notify(). The returned value
can be wrong unless you change exit_notify() to set exit_state under siglock.

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.
       [not found]                                 ` <87inmmbjsq.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2017-04-03 18:37                                   ` Oleg Nesterov
       [not found]                                     ` <20170403183728.GB31390-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-09-04  3:19                                   ` [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped Robert O'Callahan
  1 sibling, 1 reply; 42+ messages in thread
From: Oleg Nesterov @ 2017-04-03 18:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

Eric,

I see another series from you, but I simply failed to force myself to read
it carefully. Because at first glance it makes me really sad, I do dislike
it even if it is correct. Yes, yes, sure, I can be wrong. Will try tomorrow.

On 04/02, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>
> > Anyway, Eric, even if we can and want to do this, why we can't do this on
> > top of my fix?
>
> Because your reduction in scope of cred_guard_mutex is fundamentally
> broken and unnecessary.

And you never explained why it is wrong, or I failed to understand you.

> > I simply fail to understand why you dislike it that much. Yes it is not
> > pretty, I said this many times, but it is safe in that it doesn't really
> > change the current behaviour.
>
> No it is not safe.  And it promotes wrong thinking which is even more
> dangerous.

So please explain why it is not safe and why it is dangerous.

Just in case, if you mean flush_signal_handlers() outside of cred_guard_mutex,
please explain what I have missed in case you still think this is wrong.

> I reviewed the code and cred_guard_mutex needs to cover what it covers.

I strongly, strongly disagree. Its scope is unnecessary huge, we should narrow
it in any case, even if the current code was not bugy. But this is almost
offtopic, lets discuss this separately.

> > I am much more worried about 2/2 you didn't argue with, this patch _can_
> > break something and this is obviously not good even if PTRACE_EVENT_EXIT
> > was always broken.
>
> I don't know who actually useses PTRACE_O_TRACEEXIT so I don't actually
> know what the implications of changing it are.  Let's see...

And nobody knows ;) This is the problem, even the clear ptrace bugfix can
break something, this happened before and we had to revert the obviously-
correct patches; the bug was already used as feature.

> If delivering a second SIGKILL
...
> So userspace can absolutely kill a processes in PTRACE_EVENT_EXIT
> before the tracers find it.
>
> Therefore we are only talking a quality of implementation issue
> if we actually stop and wait for the tracer or not.

Oh, this is another story, needs another discussion. We really need some
changes in this area, we need to distinguish SIGKILL sent from user-space
and (say) from group-exit, and we need to decide when should we stop.

But at least I think the tracee should never stop if SIGKILL comes from
user space. And yes ptrace_stop() is ugly and wrong, just look at the
arch_ptrace_stop_needed() check. The problem, again, is that any fix will
be user-visible.

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH 2/2] exec: If possible don't wait for ptraced threads to be reaped
  2017-04-03 18:12                                           ` Oleg Nesterov
@ 2017-04-03 21:04                                             ` Eric W. Biederman
  2017-04-05 16:44                                               ` Oleg Nesterov
  0 siblings, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-03 21:04 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel, linux-api

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/02, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@redhat.com> writes:
>>
>> > And btw zap_other_threads(may_hang == 0) is racy. Either you need tasklist or
>> > exit_notify() should set tsk->exit_state under siglock, otherwise zap() can
>> > return the wrong count.
>>
>> zap_other_thread(tsk, 0) only gets called in the case where we don't
>> care about the return value.  It does not get called from fs/exec.c
>
> I meant that may_hang == 0 implies zap_other_threads(do_count => -1) which should
> return the number of threads which didn't pass exit_notify(). The returned value
> can be wrong unless you change exit_notify() to set exit_state under
> siglock.

Interesting an existing bug.  I won't deny that one.

Subtle to catch but easy enough to fix.

Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.
       [not found]                                     ` <20170403183728.GB31390-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-04-03 22:49                                       ` Eric W. Biederman
  2017-04-03 22:49                                       ` scope of cred_guard_mutex Eric W. Biederman
  1 sibling, 0 replies; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-03 22:49 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> Eric,
>
> I see another series from you, but I simply failed to force myself to read
> it carefully. Because at first glance it makes me really sad, I do dislike
> it even if it is correct. Yes, yes, sure, I can be wrong. Will try
> tomorrow.

Yes.  I needed to get my thoughts concrete.  I missed fixing the race in
zap_other_threads.  But overall I think things are moving in a good
direction.

>>
>> I don't know who actually useses PTRACE_O_TRACEEXIT so I don't actually
>> know what the implications of changing it are.  Let's see...
>
> And nobody knows ;) This is the problem, even the clear ptrace bugfix can
> break something, this happened before and we had to revert the obviously-
> correct patches; the bug was already used as feature.

Yes that is the challenge of changing userspace.   Which is why it helps
to test as much of a userspace change as possible.  Or to get very
clever, and figure out how to avoid the userspace change.

So I think it is worth knowing the lldb actually uses
PTRACE_O_TRACEEXIT.  So we can test at least some programs to verify
that all is well.

I don't see any way around cleaning up PTRACE_O_TRACEEXIT.  As
we fundamentally have the non-thread-group-leader exec problem.
We have to reap that previous leader thread with release_task.
Which means we can't stop for a PTRACE_O_TRACEEXIT.


>> If delivering a second SIGKILL
> ...
>> So userspace can absolutely kill a processes in PTRACE_EVENT_EXIT
>> before the tracers find it.
>>
>> Therefore we are only talking a quality of implementation issue
>> if we actually stop and wait for the tracer or not.
>
> Oh, this is another story, needs another discussion. We really need some
> changes in this area, we need to distinguish SIGKILL sent from user-space
> and (say) from group-exit, and we need to decide when should we stop.
>
> But at least I think the tracee should never stop if SIGKILL comes from
> user space. And yes ptrace_stop() is ugly and wrong, just look at the
> arch_ptrace_stop_needed() check. The problem, again, is that any fix will
> be user-visible.

The only issue I see is that arch_ptrace_stop() may sleep (sparc and
ia64 do as they flush the register stack to memory).  As the
code may sleep it means we can't set TASK_TRACED until after calling
arch_ptrace_stop().

My inclination is to just solve that by saying:
if (!sigkill_pending(current))
	set_current_task(TASK_TRACED);

That removes the special case.  We have to handle SIGKILL being
delivered immediately after set_current_state in any event.  And as we
are talking about something that happens on rare architecutres I don't
see any problem with tweaking that code at all.

It is closely enough related I will fold that into the next version of
my patch.

Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* scope of cred_guard_mutex.
       [not found]                                     ` <20170403183728.GB31390-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-04-03 22:49                                       ` Eric W. Biederman
@ 2017-04-03 22:49                                       ` Eric W. Biederman
  2017-04-05 16:08                                         ` Oleg Nesterov
       [not found]                                         ` <87fuhpjeco.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  1 sibling, 2 replies; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-03 22:49 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> On 04/02, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>
>> > Anyway, Eric, even if we can and want to do this, why we can't do this on
>> > top of my fix?
>>
>> Because your reduction in scope of cred_guard_mutex is fundamentally
>> broken and unnecessary.
>
> And you never explained why it is wrong, or I failed to understand you.
>
>> > I simply fail to understand why you dislike it that much. Yes it is not
>> > pretty, I said this many times, but it is safe in that it doesn't really
>> > change the current behaviour.
>>
>> No it is not safe.  And it promotes wrong thinking which is even more
>> dangerous.
>
> So please explain why it is not safe and why it is dangerous.
>
> Just in case, if you mean flush_signal_handlers() outside of cred_guard_mutex,
> please explain what I have missed in case you still think this is wrong.

>> I reviewed the code and cred_guard_mutex needs to cover what it covers.
>
> I strongly, strongly disagree. Its scope is unnecessary huge, we should narrow
> it in any case, even if the current code was not bugy. But this is almost
> offtopic, lets discuss this separately.

You have asked why I have problems with your patch and so I am going to
try to explain.  Partly I want to see a clean set of patches that we
can merge into Linus's tree before we make any compromises.  Because the
work preparing a clean patchset may inform us of something better.  Plus
we need to make something clean and long term maintainable in any event.

Partly I object because your understanding and my understanding of
cred_guard_mutex are very different.

As I read and understand the code the job of cred_guard_mutex is to keep
ptrace (and other threads of the proccess) from interferring in
exec and to ensure old resources are accessed with permission checks
using our original credentials and that new and modified resources are
accessed with permission checks using our new credentials.


I object to your patch in particular because you deliberately mess up
the part of only making old resources available with old creds and
new resources available with new creds.  Even if the current permission
checks are a don't care it still remains conceptually wrong.  And
conceptually wrong tends code tends towards maintenance problems
and real surprises when someone makes small changes to the code.  Which
is what I mean when I say your patch is dangerous.


AKA What I see neededing to be protected looks something like:
	mutex_lock();
        new_cred = compute_new_cred(tsk);
        new_mm = compute_new_mm(tsk);
        tsk->mm = new_mm;
        tsk->cred = new_cred;
        zap_other_threads(tsk);
        update_sighand(tsk);
        update_signal(tsk);
        do_close_on_exec();
        update_tsk_fields(tsk);
        mutex_unlock();

The only way I can see of reducing the scope of cred_guard_mutex is
performing work in such a way that ptrace and the other threads can't
interfere and then taking the lock.  Computing the new mm and the new
credentials are certainly candidates for that kind of treatment.

Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: scope of cred_guard_mutex.
  2017-04-03 22:49                                       ` scope of cred_guard_mutex Eric W. Biederman
@ 2017-04-05 16:08                                         ` Oleg Nesterov
  2017-04-05 16:11                                           ` Kees Cook
  2017-04-05 17:53                                           ` Eric W. Biederman
       [not found]                                         ` <87fuhpjeco.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  1 sibling, 2 replies; 42+ messages in thread
From: Oleg Nesterov @ 2017-04-05 16:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel, linux-api

On 04/03, Eric W. Biederman wrote:
>
> You have asked why I have problems with your patch and so I am going to
> try to explain.  Partly I want to see a clean set of patches that we
> can merge into Linus's tree before we make any compromises.  Because the
> work preparing a clean patchset may inform us of something better.  Plus
> we need to make something clean and long term maintainable in any event.
>
> Partly I object because your understanding and my understanding of
> cred_guard_mutex are very different.

And I think there is another problem, your understanding and my understanding
of "clean" differ too much and it seems that we can not convince each other ;)

The last series looks buggy (I'll send more emails later today), but the
main problem is that - in my opinion! - your approach is "obviously wrong
and much less clean". But yes, yes, I understand that this is my opinion,
and I can be wrong.

Eric, I think we need more CC's. Linus, probably security list, the more
the better.

I am going to resend my series with more CC's, then you can nack it and
explain what you think we should do. Perhaps someone else will suggest
a better solution, or at least review the patches. OK?

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: scope of cred_guard_mutex.
  2017-04-05 16:08                                         ` Oleg Nesterov
@ 2017-04-05 16:11                                           ` Kees Cook
  2017-04-05 17:53                                           ` Eric W. Biederman
  1 sibling, 0 replies; 42+ messages in thread
From: Kees Cook @ 2017-04-05 16:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, Andrew Morton, Aleksa Sarai, Andy Lutomirski,
	Attila Fazekas, Jann Horn, Michal Hocko, Ulrich Obergfell, LKML,
	Linux API

On Wed, Apr 5, 2017 at 9:08 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 04/03, Eric W. Biederman wrote:
>>
>> You have asked why I have problems with your patch and so I am going to
>> try to explain.  Partly I want to see a clean set of patches that we
>> can merge into Linus's tree before we make any compromises.  Because the
>> work preparing a clean patchset may inform us of something better.  Plus
>> we need to make something clean and long term maintainable in any event.
>>
>> Partly I object because your understanding and my understanding of
>> cred_guard_mutex are very different.
>
> And I think there is another problem, your understanding and my understanding
> of "clean" differ too much and it seems that we can not convince each other ;)
>
> The last series looks buggy (I'll send more emails later today), but the
> main problem is that - in my opinion! - your approach is "obviously wrong
> and much less clean". But yes, yes, I understand that this is my opinion,
> and I can be wrong.
>
> Eric, I think we need more CC's. Linus, probably security list, the more
> the better.
>
> I am going to resend my series with more CC's, then you can nack it and
> explain what you think we should do. Perhaps someone else will suggest
> a better solution, or at least review the patches. OK?

I've been following along, but it seems like there are a lot of edge
cases in these changes. I'll try to meaningfully comment on the coming
emails... having code examples of why various things will/won't work
go a long way for helping understand what's safe or not...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH v2 4/5] exec: If possible don't wait for ptraced threads to be reaped
  2017-04-02 22:53                                   ` [RFC][PATCH v2 4/5] exec: If possible don't wait for ptraced threads to be reaped Eric W. Biederman
@ 2017-04-05 16:15                                     ` Oleg Nesterov
  0 siblings, 0 replies; 42+ messages in thread
From: Oleg Nesterov @ 2017-04-05 16:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel, linux-api

On 04/02, Eric W. Biederman wrote:
>
> Take advantage of the situation when sighand->count == 1 to only wait
> for threads to reach EXIT_ZOMBIE instead of EXIT_DEAD in de_thread.

Let me comment this patch first, it looks mostly fine to me.

And note that this is what my patch does too: exec() waits until
all threads pass exit_notify() and drops cred_guard_mutex.

However, with my patch patch exec() then waits until all threads
disappear. This is uglifies the code but this is simple and safe.

With your patches exec doesn't do another wait and succeeds after
the 1st wait. I think this is wrong and the next patch is not enough.

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH v2 5/5] signal: Don't allow accessing signal_struct by old threads after exec
       [not found]                                     ` <87zify76z9.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2017-04-05 16:18                                       ` Oleg Nesterov
       [not found]                                         ` <20170405161812.GD14536-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Oleg Nesterov @ 2017-04-05 16:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On 04/02, Eric W. Biederman wrote:
>
> Add exec_id to signal_struct and compare it at a few choice moments.

I really dislike this change no matter what, sorry.

Firstly, task_struct->*_exec_id should simply die (I already have the
patch), or at least they should be moved into signal_struct simply
because this is per-process thing.

> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -995,6 +995,10 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
>  			from_ancestor_ns || (info == SEND_SIG_FORCED)))
>  		goto ret;
>
> +	/* Don't allow thread group signals after exec */
> +	if (group && (t->signal->exec_id != t->self_exec_id))
> +		goto ret;

Hmm. Either we do not need this exec_id check at all, or we should not
take "group" into account; a fatal signal (say SIGKILL) will kill the
whole thread-group.

> @@ -1247,7 +1251,8 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
>  		 * must see ->sighand == NULL.
>  		 */
>  		spin_lock(&sighand->siglock);
> -		if (likely(sighand == tsk->sighand)) {
> +		if (likely((sighand == tsk->sighand) &&
> +			   (tsk->self_exec_id == tsk->signal->exec_id))) {

Oh, this doesn't look good to me. Yes, with your approach we probably need
this to, say, ensure that posix-cpu-timer can't kill the process after exec,
but I'd rather add the exit_state check into run_posix_timers().

But OK, suppose that we fix the problems with signal-after-exec.

====================================================================
Now lets fix another problem. A mt exec suceeds and apllication does
sys_seccomp(SECCOMP_FILTER_FLAG_TSYNC) which fails because it finds
another (zombie) SECCOMP_MODE_FILTER thread.

And after we fix this problem, what else we will need to fix?


I really think that - whatever we do - there should be no other threads
after exec, even zombies.

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH v2 1/5] ptrace: Don't wait in PTRACE_O_TRACEEXIT for exec or coredump
  2017-04-02 22:51                                   ` [RFC][PATCH v2 1/5] ptrace: Don't wait in PTRACE_O_TRACEEXIT for exec or coredump Eric W. Biederman
@ 2017-04-05 16:19                                     ` Oleg Nesterov
  0 siblings, 0 replies; 42+ messages in thread
From: Oleg Nesterov @ 2017-04-05 16:19 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel, linux-api

On 04/02, Eric W. Biederman wrote:
>
> In the case of exec and coredump which have many interesting deadlock
> opportunities

So this patch is very close to my 2/2 one-liner, except

	- you removed the current->mm == current->parent->mm check

	  I didn't do this on purpose, because even the->core_state
	  is not really needed if we check ->group_exit_task, this
	  need more changes anyway, but I won't argue.

	- With your patch we send the notification to debugger even
	  if we are not going to stop.

	  This is not wrong, but why? This is pointless, nobody rely
	  on SIGCHLD, if nothing else it doesn't queue.

	  Again, I won't argue, but this complicates both the patch
	  and the code for no reason. Unless I missed something.

> Keep sending the signal to the tracer so that this appears like
> the worst case where someone else sent the process a SIGKILL before
> the tracer could react.  So all non-buggy tracers must support
> this case.

Well, I can't understand the changelog. Sure, debugger must support
this case, but obviously this can break things anyway.

For example. The coredumping thread must stop in PTRACE_EVENT_EXIT.
There is a tool (I don't remember its name) which does
ptrace_attach(PTRACE_SEIZE, PTRACE_O_TRACEEXIT) after the coredump
was already started, closes the pipe, and reads the registers when
this thread actually exits.

This patch or my 2/2 should not break it, ->group_exit_task will be
cleared after do_coredump(), but unfortunately something else can be
broken.

So I think the changelog should mention that yes, this is the user
visible change which _can_ break something anyway.

In short. I will be really happy if this patch comes from you, not me ;)

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH v2 3/5] clone: Disallown CLONE_THREAD with a shared sighand_struct
       [not found]                                     ` <87k2728lrp.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2017-04-05 16:24                                       ` Oleg Nesterov
  2017-04-05 17:34                                         ` Eric W. Biederman
  0 siblings, 1 reply; 42+ messages in thread
From: Oleg Nesterov @ 2017-04-05 16:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On 04/02, Eric W. Biederman wrote:
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1515,6 +1515,13 @@ static __latent_entropy struct task_struct *copy_process(
>  	if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
>  		return ERR_PTR(-EINVAL);
>
> +	/* Disallow CLONE_THREAD with a shared SIGHAND structure.  No
> +	 * one cares

Well, can't resists... I won't argue, but we can't know if no one cares
or not. I agree that most probably this won't break something, but who
knows... I am always scared when we add the incompatible changes.

> and supporting it leads to unnecessarily complex
> +	 * code.
> +	 */
> +	if ((clone_flags & CLONE_THREAD) && (atomic_read(&current->sighand->count) > 1))
> +		return ERR_PTR(-EINVAL);

Perhaps the comment should explain why we do this and say that
sighand-unsharing in de_thread() depends on this.

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH 2/2] exec: If possible don't wait for ptraced threads to be reaped
  2017-04-03 21:04                                             ` Eric W. Biederman
@ 2017-04-05 16:44                                               ` Oleg Nesterov
  0 siblings, 0 replies; 42+ messages in thread
From: Oleg Nesterov @ 2017-04-05 16:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel, linux-api

On 04/03, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > I meant that may_hang == 0 implies zap_other_threads(do_count => -1) which should
> > return the number of threads which didn't pass exit_notify(). The returned value
> > can be wrong unless you change exit_notify() to set exit_state under
> > siglock.

but I forgot to add that, of course, this problem is very minor because
we can only miss a thread which is already at the end of exit_notify()
so nothing bad can happen.

But imo should be fixed anyway, simply because this looks wrong/racy.
Your recent 4/5 has the same problem.

> Interesting an existing bug.

Hmm... what do you mean? The current code looks fine.

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH v2 3/5] clone: Disallown CLONE_THREAD with a shared sighand_struct
  2017-04-05 16:24                                       ` Oleg Nesterov
@ 2017-04-05 17:34                                         ` Eric W. Biederman
  2017-04-05 18:11                                           ` Oleg Nesterov
  0 siblings, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-05 17:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel, linux-api

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/02, Eric W. Biederman wrote:
>>
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -1515,6 +1515,13 @@ static __latent_entropy struct task_struct *copy_process(
>>  	if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
>>  		return ERR_PTR(-EINVAL);
>>
>> +	/* Disallow CLONE_THREAD with a shared SIGHAND structure.  No
>> +	 * one cares
>
> Well, can't resists... I won't argue, but we can't know if no one cares
> or not. I agree that most probably this won't break something, but who
> knows... I am always scared when we add the incompatible changes.

I agree that changing userspace semantics is something to be very
careful with.  But at least for purposes of discussion I think this is a
good patch.

I can avoid this change but it requires moving sighand->siglock
into signal_struct and introducing a new spinlock into sighand_struct
to just guard the signal handlers.

However I think the change to move siglock would be a distraction from
the larger issues of this patchset.

Once we address the core issues I will be happy to revisit this.

>> and supporting it leads to unnecessarily complex
>> +	 * code.
>> +	 */
>> +	if ((clone_flags & CLONE_THREAD) && (atomic_read(&current->sighand->count) > 1))
>> +		return ERR_PTR(-EINVAL);
>
> Perhaps the comment should explain why we do this and say that
> sighand-unsharing in de_thread() depends on this.

That would be a better comment.

Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: scope of cred_guard_mutex.
  2017-04-05 16:08                                         ` Oleg Nesterov
  2017-04-05 16:11                                           ` Kees Cook
@ 2017-04-05 17:53                                           ` Eric W. Biederman
  2017-04-05 18:15                                             ` Oleg Nesterov
  1 sibling, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-05 17:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel, linux-api

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/03, Eric W. Biederman wrote:
>>
>> You have asked why I have problems with your patch and so I am going to
>> try to explain.  Partly I want to see a clean set of patches that we
>> can merge into Linus's tree before we make any compromises.  Because the
>> work preparing a clean patchset may inform us of something better.  Plus
>> we need to make something clean and long term maintainable in any event.
>>
>> Partly I object because your understanding and my understanding of
>> cred_guard_mutex are very different.
>
> And I think there is another problem, your understanding and my understanding
> of "clean" differ too much and it seems that we can not convince each other ;)

We have barely begun.  You have not shown anyone what your idea of a
clean fix actually is.  All I have seen from you is a quick hack that is
a hack that is back-portable.

Focusing on the back port is the wrong order to solve the issue in.  We
need to solve this in an upstream mergable and maintainable way and then
we can worry about backports.

>From a userspace perspective.  I find anything in the kernel blocking on
a zombie to be just wrong.  A zombie is dead.  Waiting for a zombie should
in all cases be optional.  The system should not break if we don't reap
a zombie.

You have made a clear case that the zombies need to exist for strace -f
to wait on.  So since the zombies must exist we should make them follow
normal zombie rules.

With your change exec still blocks waiting for zombies.

Furthermore you have to violate the reasonable rule that:
* pre-exec resources are guarded with the pre-exec process cred.
* post-exec resources are guarded with the post-exec process cred.

So from a userspace perspective I think your semantics are absolutely
insane.

We also need clean code to implement all of this (which I am still
inching towards).  But we need to be implementing something that makes
sense from a userspace perspective.


> The last series looks buggy (I'll send more emails later today), but the
> main problem is that - in my opinion! - your approach is "obviously wrong
> and much less clean". But yes, yes, I understand that this is my opinion,
> and I can be wrong.

How is changing fixing the implementation so that we don't block waiting
for zombies to be reaped wrong?

> Eric, I think we need more CC's. Linus, probably security list, the more
> the better.
>
> I am going to resend my series with more CC's, then you can nack it and
> explain what you think we should do. Perhaps someone else will suggest
> a better solution, or at least review the patches. OK?

I will be happy to look but my primary objectionions to your patch were:

- You implemented a hack for backporting rather than fixing things
  cleanly the first time.

- You made comments about cred_guard_mutex and it's scope that when I
  reviewed the code were false.  cred_guard_mutex although probably the
  wrong locking structure is semantically and fundamentally where it
  needs to be.  We can optimize it but we can't change what is protected
  to make our lives easier.

Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH v2 3/5] clone: Disallown CLONE_THREAD with a shared sighand_struct
  2017-04-05 17:34                                         ` Eric W. Biederman
@ 2017-04-05 18:11                                           ` Oleg Nesterov
  0 siblings, 0 replies; 42+ messages in thread
From: Oleg Nesterov @ 2017-04-05 18:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel, linux-api

On 04/05, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> I agree that changing userspace semantics is something to be very
> careful with.  But at least for purposes of discussion I think this is a
> good patch.

I agree that we need it with your approach,

but imo it would be much better to not depend on the subtle changes
like this. My 2/2 or your 1/5 are already bad enough.

> I can avoid this change but it requires moving sighand->siglock
> into signal_struct and introducing a new spinlock into sighand_struct
> to just guard the signal handlers.

Oh, this looks much, much worse to me.

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: scope of cred_guard_mutex.
  2017-04-05 17:53                                           ` Eric W. Biederman
@ 2017-04-05 18:15                                             ` Oleg Nesterov
  0 siblings, 0 replies; 42+ messages in thread
From: Oleg Nesterov @ 2017-04-05 18:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel, linux-api

On 04/05, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> - You made comments about cred_guard_mutex and it's scope that when I
>   reviewed the code were false.

Too late for me. I'll try to read other emails from you and reply tomorrow.

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH v2 5/5] signal: Don't allow accessing signal_struct by old threads after exec
       [not found]                                         ` <20170405161812.GD14536-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-04-05 18:16                                           ` Eric W. Biederman
       [not found]                                             ` <87zifu90to.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2017-04-05 18:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> On 04/02, Eric W. Biederman wrote:
>>
>> Add exec_id to signal_struct and compare it at a few choice moments.
>
> I really dislike this change no matter what, sorry.
>
> Firstly, task_struct->*_exec_id should simply die (I already have the
> patch), or at least they should be moved into signal_struct simply
> because this is per-process thing.

I am quite happy to find a better way to implement this.  More than
anything this was my proof of concept that it is possible to close
the security holes created if we allow our zombies to be normal zombies.


>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -995,6 +995,10 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
>>  			from_ancestor_ns || (info == SEND_SIG_FORCED)))
>>  		goto ret;
>>
>> +	/* Don't allow thread group signals after exec */
>> +	if (group && (t->signal->exec_id != t->self_exec_id))
>> +		goto ret;
>
> Hmm. Either we do not need this exec_id check at all, or we should not
> take "group" into account; a fatal signal (say SIGKILL) will kill the
> whole thread-group.

Wow.  Those are crazy semantics for fatal signals.  Sending a tkill
should not affect the entire thread group.  Oleg I think this is a bug
you introduced and likely requires a separate fix.

I really don't understand the logic in:

commit 5fcd835bf8c2cde06404559b1904e2f1dfcb4567
Author: Oleg Nesterov <oleg-6lXkIZvqkOAvJsYlp49lxw@public.gmane.org>
Date:   Wed Apr 30 00:52:55 2008 -0700

    signals: use __group_complete_signal() for the specific signals too
    
    Based on Pavel Emelyanov's suggestion.
    
    Rename __group_complete_signal() to complete_signal() and use it to process
    the specific signals too.  To do this we simply add the "int group" argument.
    
    This allows us to greatly simply the signal-sending code and adds a useful
    behaviour change.  We can avoid the unneeded wakeups for the private signals
    because wants_signal() is more clever than sigismember(blocked), but more
    importantly we now take into account the fatal specific signals too.
    
    The latter allows us to kill some subtle checks in handle_stop_signal() and
    makes the specific/group signal's behaviour more consistent.  For example,
    currently sigtimedwait(FATAL_SIGNAL) behaves differently depending on was the
    signal sent by kill() or tkill() if the signal was not blocked.
    
    And.  This allows us to tweak/fix the behaviour when the specific signal is
    sent to the dying/dead ->group_leader.
    
    Signed-off-by: Pavel Emelyanov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
    Signed-off-by: Oleg Nesterov <oleg-6lXkIZvqkOAvJsYlp49lxw@public.gmane.org>
    Cc: Roland McGrath <roland-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    Signed-off-by: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
    Signed-off-by: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>

>> @@ -1247,7 +1251,8 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
>>  		 * must see ->sighand == NULL.
>>  		 */
>>  		spin_lock(&sighand->siglock);
>> -		if (likely(sighand == tsk->sighand)) {
>> +		if (likely((sighand == tsk->sighand) &&
>> +			   (tsk->self_exec_id == tsk->signal->exec_id))) {
>
> Oh, this doesn't look good to me. Yes, with your approach we probably need
> this to, say, ensure that posix-cpu-timer can't kill the process after exec,
> but I'd rather add the exit_state check into run_posix_timers().

The entire point of lock_task_sighand is to not operate on
tasks/processes that have exited. The fact it even sighand in there is
deceptive because it is all about siglock and nothing to do with
sighand.

> But OK, suppose that we fix the problems with signal-after-exec.
>
> ====================================================================
> Now lets fix another problem. A mt exec suceeds and apllication does
> sys_seccomp(SECCOMP_FILTER_FLAG_TSYNC) which fails because it finds
> another (zombie) SECCOMP_MODE_FILTER thread.
>
> And after we fix this problem, what else we will need to fix?
>
>
> I really think that - whatever we do - there should be no other threads
> after exec, even zombies.

I see where you are coming from.

I need to stare at this a bit longer.  Because you are right.  Reusing
the signal_struct and leaving zombies around is very prone to bugs.  So
it is not very maintainable.

I suspect the answer here is to simply allocate a new sighand_struct and
a new signal_struct if there we are not single threaded by the time we
get down to the end of de_thread.

However even if it is a case of whack-a-mole semantically
not-blocking-for-zombies looks like the right thing to do and we need to
figure out how to do it maintainably.

Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH v2 5/5] signal: Don't allow accessing signal_struct by old threads after exec
       [not found]                                             ` <87zifu90to.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2017-04-06 15:48                                               ` Oleg Nesterov
  0 siblings, 0 replies; 42+ messages in thread
From: Oleg Nesterov @ 2017-04-06 15:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Eugene Syromiatnikov

On 04/05, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>
> >> --- a/kernel/signal.c
> >> +++ b/kernel/signal.c
> >> @@ -995,6 +995,10 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
> >>  			from_ancestor_ns || (info == SEND_SIG_FORCED)))
> >>  		goto ret;
> >>
> >> +	/* Don't allow thread group signals after exec */
> >> +	if (group && (t->signal->exec_id != t->self_exec_id))
> >> +		goto ret;
> >
> > Hmm. Either we do not need this exec_id check at all, or we should not
> > take "group" into account; a fatal signal (say SIGKILL) will kill the
> > whole thread-group.
>
> Wow.  Those are crazy semantics for fatal signals.  Sending a tkill
> should not affect the entire thread group.

How so? SIGKILL or any fatal signal should kill the whole process, even if
it was sent by tkill().

> Oleg I think this is a bug
> you introduced and likely requires a separate fix.
>
> I really don't understand the logic in:
>
> commit 5fcd835bf8c2cde06404559b1904e2f1dfcb4567
> Author: Oleg Nesterov <oleg-6lXkIZvqkOAvJsYlp49lxw@public.gmane.org>
> Date:   Wed Apr 30 00:52:55 2008 -0700
>
>     signals: use __group_complete_signal() for the specific signals too

No. You can even forget about "send" path for the moment. Just suppose that
a thread dequeues SIGKILL sent by tkill(). In this case it will call
do_group_exit() and kill the group anyway. It is not possible to kill an
individual thread, and linux never did this.

Afaics, this commit also fixes the case when SIGKILL can be lost when tkill()
races with the exiting target. Or if the target is a zombie-leader. Exactly
because they obviously can't dequeue SIGKILL.

Plus we want to shutdown the whole thread-group "asap", that is why
complete_signal() sets SIGNAL_GROUP_EXIT and sends SIGKILL to other threads
in the "send" path.

This btw reminds me that we want to do the same with sig_kernel_coredump()
signals too, but this is not simple.

> >> @@ -1247,7 +1251,8 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
> >>  		 * must see ->sighand == NULL.
> >>  		 */
> >>  		spin_lock(&sighand->siglock);
> >> -		if (likely(sighand == tsk->sighand)) {
> >> +		if (likely((sighand == tsk->sighand) &&
> >> +			   (tsk->self_exec_id == tsk->signal->exec_id))) {
> >
> > Oh, this doesn't look good to me. Yes, with your approach we probably need
> > this to, say, ensure that posix-cpu-timer can't kill the process after exec,
> > but I'd rather add the exit_state check into run_posix_timers().
>
> The entire point of lock_task_sighand is to not operate on
> tasks/processes that have exited.

Well, the entire point of lock_task_sighand() is take ->siglock if possible.

> The fact it even sighand in there is
> deceptive because it is all about siglock and nothing to do with
> sighand.

Not sure I understand what you mean...

Yes, lock_task_sighand() can obviously fail, and yes the failure is used
as an indication that this thread has gone. But a zombie thread controlled
by the parent/debugger has not gone yet.

> > ====================================================================
> > Now lets fix another problem. A mt exec suceeds and apllication does
> > sys_seccomp(SECCOMP_FILTER_FLAG_TSYNC) which fails because it finds
> > another (zombie) SECCOMP_MODE_FILTER thread.
> >
> > And after we fix this problem, what else we will need to fix?
> >
> >
> > I really think that - whatever we do - there should be no other threads
> > after exec, even zombies.
>
> I see where you are coming from.
>
> I need to stare at this a bit longer.  Because you are right.  Reusing
> the signal_struct and leaving zombies around is very prone to bugs.  So
> it is not very maintainable.

Yes, yes, yes. This is what I was arguing with.

> I suspect the answer here is to simply allocate a new sighand_struct and
> a new signal_struct if there we are not single threaded by the time we
> get down to the end of de_thread.

May be. Not sure. Looks very nontrivial.

And I still think that if we do this, we should fix the bug first, then try
to do something like this.

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: scope of cred_guard_mutex.
       [not found]                                         ` <87fuhpjeco.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2017-04-06 15:55                                           ` Oleg Nesterov
       [not found]                                             ` <20170406155540.GC7444-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Oleg Nesterov @ 2017-04-06 15:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Eugene Syromiatnikov

On 04/03, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>
> >> I reviewed the code and cred_guard_mutex needs to cover what it covers.
> >
> > I strongly, strongly disagree. Its scope is unnecessary huge, we should narrow
> > it in any case, even if the current code was not bugy. But this is almost
> > offtopic, lets discuss this separately.

And let me repeat/clarify.

I meant, I see absolutely no reason to do copy_strings() with this mutex held,
for example. And note that copy_strings() can use a lot of memory/time, it can
trigger oom,swapping, etc. I also think that we can probably do
check_unsafe_exec() which in particular sets LSM_UNSAFE_ at bit later, but I
am less sure about this and this needs more work.

And perhaps more changes like this to narrow the scope of this mutex. And I
thought you were already agree with this?


> You have asked why I have problems with your patch and so I am going to
> try to explain.  Partly I want to see a clean set of patches that we
> can merge into Linus's tree before we make any compromises.

Sure, me too. I do not see a simple and clean fix, your attempts were wrong
so far and imo were worse even if they were correct.

And this makes me think again that we need to restart this discusion with
more CC's.

> Partly I object because your understanding and my understanding of
> cred_guard_mutex are very different.
>
> As I read and understand the code the job of cred_guard_mutex is to keep
> ptrace (and other threads of the proccess) from interferring in
> exec and to ensure old resources are accessed with permission checks
> using our original credentials and that new and modified resources are
> accessed with permission checks using our new credentials.

Yes, this is clear.

> I object to your patch in particular because you deliberately mess up
> the part of only making old resources available with old creds and
> new resources available with new creds.

Could you spell please? I don't understand.

> AKA What I see neededing to be protected looks something like:
> 	mutex_lock();
>         new_cred = compute_new_cred(tsk);
>         new_mm = compute_new_mm(tsk);
>         tsk->mm = new_mm;
>         tsk->cred = new_cred;
>         zap_other_threads(tsk);
>         update_sighand(tsk);
>         update_signal(tsk);
>         do_close_on_exec();
>         update_tsk_fields(tsk);
>         mutex_unlock();
>
> The only way I can see of reducing the scope of cred_guard_mutex is
> performing work in such a way that ptrace and the other threads can't
> interfere and then taking the lock.  Computing the new mm and the new
> credentials are certainly candidates for that kind of treatment.

OK. And yes, I am not sure this all is optimal, but didn't I say from
the very beginning that unlikely we can change this?

Now let me quote your next email:

On 04/05, Eric W. Biederman wrote:
>
> We have barely begun.  You have not shown anyone what your idea of a
> clean fix actually is.  All I have seen from you is a quick hack that is
> a hack that is back-portable.

Yes. I really tried to make a back-portable fix. Otherwise I would start
with ->notify_count cleanups.

> With your change exec still blocks waiting for zombies.

And this is what we currently do. Whether we should do this may be debatable,
but why do you blame my patch?

> Furthermore you have to violate the reasonable rule that:
> * pre-exec resources are guarded with the pre-exec process cred.
> * post-exec resources are guarded with the post-exec process cred.

For example?

> So from a userspace perspective I think your semantics are absolutely
> insane.

Which semantics were changed by my patch?

> - You made comments about cred_guard_mutex and it's scope that when I
>   reviewed the code were false.

Which of my comments was wrong?

> We can optimize it

And this is what I meant when I said "we should narrow it in any case"

> but we can't change what is protected

I am not that sure. But a) I did not even try to suggest to change anything
in this area right now, and b) I said that unlikely this is possible.

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: scope of cred_guard_mutex.
       [not found]                                             ` <20170406155540.GC7444-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-04-07 22:07                                               ` Kees Cook
  0 siblings, 0 replies; 42+ messages in thread
From: Kees Cook @ 2017-04-07 22:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, Andrew Morton, Aleksa Sarai, Andy Lutomirski,
	Attila Fazekas, Jann Horn, Michal Hocko, Ulrich Obergfell, LKML,
	Linux API, Eugene Syromiatnikov

On Thu, Apr 6, 2017 at 8:55 AM, Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> And this makes me think again that we need to restart this discusion with
> more CC's.

I'm a fan of that; I've not been able to follow this thread as it
seems to have gone far from the original deadlock problem. :) I've
seen issues with ptrace, zombies, and now exec. I'm lost. :P

>> Partly I object because your understanding and my understanding of
>> cred_guard_mutex are very different.
>>
>> As I read and understand the code the job of cred_guard_mutex is to keep
>> ptrace (and other threads of the proccess) from interferring in
>> exec and to ensure old resources are accessed with permission checks
>> using our original credentials and that new and modified resources are
>> accessed with permission checks using our new credentials.
>
> Yes, this is clear.

Maybe stupid idea: can we get a patch that just adds this kind of
documentation somewhere in the source? If we can agree on the purpose
of cred_guard_mutex, and get it into the code, that seems like a good
step in discussion...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.
       [not found]                                 ` <87inmmbjsq.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2017-04-03 18:37                                   ` Oleg Nesterov
@ 2017-09-04  3:19                                   ` Robert O'Callahan
  1 sibling, 0 replies; 42+ messages in thread
From: Robert O'Callahan @ 2017-09-04  3:19 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, Andrew Morton, Aleksa Sarai, Andy Lutomirski,
	Attila Fazekas, Jann Horn, Kees Cook, Michal Hocko,
	Ulrich Obergfell, open list, linux-api-u79uwXL29TY76Z2rM5mHXA

Sorry about replying to this old thread, but...

On Mon, Apr 3, 2017 at 9:07 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> I don't know who actually useses PTRACE_O_TRACEEXIT so I don't actually
> know what the implications of changing it are.  Let's see...
>
> gdb - no
> upstart - no
> lldb - yes
> strace - no

For the record, rr uses PTRACE_O_TRACEEXIT.

When a thread exits we need to examine its address space to find its
robust futex list and record the changes that will be performed by the
kernel as it cleans up that list. So, any failures to deliver
PTRACE_EVENT_EXIT are potentially problematic for us because we won't
get a chance to examine the address space before it disappears.

Rob
-- 
lbir ye,ea yer.tnietoehr  rdn rdsme,anea lurpr  edna e hnysnenh hhe uresyf toD
selthor  stor  edna  siewaoeodm  or v sstvr  esBa  kbvted,t rdsme,aoreseoouoto
o l euetiuruewFa  kbn e hnystoivateweh uresyf tulsa rehr  rdm  or rnea lurpr
.a war hsrer holsa rodvted,t  nenh hneireseoouot.tniesiewaoeivatewt sstvr  esn

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2017-09-04  3:19 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20170213141452.GA30203@redhat.com>
     [not found] ` <20170224160354.GA845@redhat.com>
     [not found]   ` <87shmv6ufl.fsf@xmission.com>
     [not found]     ` <20170303173326.GA17899@redhat.com>
     [not found]       ` <20170303173326.GA17899-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-03-03 18:23         ` [PATCH 0/2] fix the traced mt-exec deadlock Eric W. Biederman
2017-03-03 18:59           ` Eric W. Biederman
     [not found]             ` <87d1dyw5iw.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2017-03-03 20:06               ` Eric W. Biederman
     [not found]                 ` <87tw7aunuh.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2017-03-03 20:11                   ` [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped Eric W. Biederman
2017-03-04 17:03                     ` Oleg Nesterov
2017-03-30  8:07                       ` Eric W. Biederman
     [not found]                         ` <8760ir192p.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2017-04-01  5:11                           ` [RFC][PATCH 0/2] exec: Fixing ptrace'd mulit-threaded hang Eric W. Biederman
     [not found]                             ` <878tnkpv8h.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2017-04-01  5:14                               ` [RFC][PATCH 1/2] sighand: Count each thread group once in sighand_struct Eric W. Biederman
2017-04-01  5:16                               ` [RFC][PATCH 2/2] exec: If possible don't wait for ptraced threads to be reaped Eric W. Biederman
     [not found]                                 ` <87vaqooggs.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2017-04-02 15:35                                   ` Oleg Nesterov
     [not found]                                     ` <20170402153517.GA12637-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-04-02 18:53                                       ` Eric W. Biederman
     [not found]                                         ` <877f32k5ew.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2017-04-03 18:12                                           ` Oleg Nesterov
2017-04-03 21:04                                             ` Eric W. Biederman
2017-04-05 16:44                                               ` Oleg Nesterov
2017-04-02 15:38                               ` [RFC][PATCH 0/2] exec: Fixing ptrace'd mulit-threaded hang Oleg Nesterov
2017-04-02 22:50                               ` [RFC][PATCH v2 0/5] " Eric W. Biederman
     [not found]                                 ` <874ly6a0h1.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2017-04-02 22:51                                   ` [RFC][PATCH v2 1/5] ptrace: Don't wait in PTRACE_O_TRACEEXIT for exec or coredump Eric W. Biederman
2017-04-05 16:19                                     ` Oleg Nesterov
2017-04-02 22:51                                   ` [RFC][PATCH v2 2/5] sighand: Count each thread group once in sighand_struct Eric W. Biederman
2017-04-02 22:52                                   ` [RFC][PATCH v2 3/5] clone: Disallown CLONE_THREAD with a shared sighand_struct Eric W. Biederman
     [not found]                                     ` <87k2728lrp.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2017-04-05 16:24                                       ` Oleg Nesterov
2017-04-05 17:34                                         ` Eric W. Biederman
2017-04-05 18:11                                           ` Oleg Nesterov
2017-04-02 22:53                                   ` [RFC][PATCH v2 4/5] exec: If possible don't wait for ptraced threads to be reaped Eric W. Biederman
2017-04-05 16:15                                     ` Oleg Nesterov
2017-04-02 22:57                                   ` [RFC][PATCH v2 5/5] signal: Don't allow accessing signal_struct by old threads after exec Eric W. Biederman
     [not found]                                     ` <87zify76z9.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2017-04-05 16:18                                       ` Oleg Nesterov
     [not found]                                         ` <20170405161812.GD14536-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-04-05 18:16                                           ` Eric W. Biederman
     [not found]                                             ` <87zifu90to.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2017-04-06 15:48                                               ` Oleg Nesterov
2017-04-02 16:15                           ` [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped Oleg Nesterov
     [not found]                             ` <20170402161518.GC12637-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-04-02 21:07                               ` Eric W. Biederman
     [not found]                                 ` <87inmmbjsq.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2017-04-03 18:37                                   ` Oleg Nesterov
     [not found]                                     ` <20170403183728.GB31390-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-04-03 22:49                                       ` Eric W. Biederman
2017-04-03 22:49                                       ` scope of cred_guard_mutex Eric W. Biederman
2017-04-05 16:08                                         ` Oleg Nesterov
2017-04-05 16:11                                           ` Kees Cook
2017-04-05 17:53                                           ` Eric W. Biederman
2017-04-05 18:15                                             ` Oleg Nesterov
     [not found]                                         ` <87fuhpjeco.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2017-04-06 15:55                                           ` Oleg Nesterov
     [not found]                                             ` <20170406155540.GC7444-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-04-07 22:07                                               ` Kees Cook
2017-09-04  3:19                                   ` [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped Robert O'Callahan
     [not found]           ` <87tw7axlr0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2017-03-04 16:54             ` [PATCH 0/2] fix the traced mt-exec deadlock Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).