linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Aleksa Sarai <asarai@suse.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Attila Fazekas <afazekas@redhat.com>, Jann Horn <jann@thejh.net>,
	Kees Cook <keescook@chromium.org>,
	Michal Hocko <mhocko@kernel.org>,
	Ulrich Obergfell <uobergfe@redhat.com>,
	linux-kernel@vger.kernel.org, <linux-api@vger.kernel.org>
Subject: Re: [PATCH 0/2] fix the traced mt-exec deadlock
Date: Fri, 03 Mar 2017 12:23:31 -0600	[thread overview]
Message-ID: <87tw7axlr0.fsf@xmission.com> (raw)
In-Reply-To: <20170303173326.GA17899@redhat.com> (Oleg Nesterov's message of "Fri, 3 Mar 2017 18:33:26 +0100")


Cc'd linux-api as we are talking about a deliberate user visible API
change here.

Oleg Nesterov <oleg@redhat.com> writes:

> On 03/02, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@redhat.com> writes:
>>
>> > our discussion was a bit confusing, and it seems that we did not
>> > fully convince each other. So let me ask what do you finally think
>> > about this fix.
>> >
>> > Let me repeat. Even if I do not agree with some of your objections,
>> > I do agree that 1/2 does not look nice and clean. And we seem to
>> > agree that either way, with or without this fix, we need more changes
>> > in this area.
>> >
>> > But we need a simple and backportable fix for stable trees, say for
>> > rhel7. This bug was reported many times, and this is the simplest
>> > solution I was able to find.
>>
>> I am not 100% convinced that we need a backportable solution,
>
> I think we need, this was already requested,
>
>> but
>> regardless my experience is that good clean solutions are almost always
>> easier to backport that something we just settle for.
>
> Sure.
>
>> The patch below needs a little more looking and testing but arguably
>> it is sufficient.
>>
>> It implements autoreaping for non-leader threads always.  We might want
>> to limit this to the case of exec.
>
> I should have mentioned this. Of course, this change was proposed from the
> very beginning, when this problem was reported first time. And of course
> I like this fix much more than my patch (but yes, we shouldd limit it to
> the case of exec).
>
> The only problem is that it is incompatible change, and I have no idea what
> can be broken.
>
> Trivial test-case:
>
> 	void *thread(void *arg)
> 	{
> 		for (;;)
> 			pause();
> 		return NULL;
> 	}
>
> 	int main(void)
> 	{
> 		pthread_t pt;
> 		pthread_create(&pt, NULL, thread, NULL);
> 		execlp("true", "true", NULL);
> 	}
>
> with your patch applied
>
> 	$ strace -f ./test
> 	strace: wait4(__WALL): No child processes
>
> Yes, this is just a warning, but still. Now we need to change strace. And we
> can't know what else can be confused/broken by this change.
>
> man(ptrace) even documents that all other threads except the thread group leader
> report death as if they exited via _exit(2).
>
> Yes, yes, it also says that other threads stop in PTRACE_EVENT_EXIT stop,
> so 2/2 (which we need with your change too) is is not compatible too and
> I am worried, but:
>
> 	- this was never really true, an already exiting thread won't stop
> 	  if it races with exec
>
> 	- PTRACE_O_TRACEEXEC is not used that often, it never really worked
>
> 	- man(ptrace) also mentions that PTRACE_EVENT_EXIT behaviour may be
> 	  change in the future.
>
> In short. Of course I considered this change. Looks too risky to me. But.
> I will be happy if you send this change and take all the blame ;) I won't
> argue (if you limit it to execve case).

Unfortunately you have found what already looks like a userspace
regression.  So I don't think we can do that.

I do think the user space visible change of modifying a multi-threaded
exec no to wait  for the other threads to be reaped is the least
dangerous change.

The big lesson for me, and what was not obvious from your change
description is that we are changing the user space visible semantics
of exec+ptrace and that cred_guard_mutex is not at all the problem (as
we always take cred_guard_mutex in a killable or interruptible way).

So I tentatively agree that we need to deliberately change the semantics
of exec to not wait for zombies in fs/exec.c:de_thread.  That is what I
would expect of exec and that seems to the minimal change.

As part of that change I expect we want to call __cleanup_sighand from
do_exit rather than release_task.  Semantically we sould not need the
sighand_struct for zombie process.

>> --- a/kernel/exit.c
>> +++ b/kernel/exit.c
>> @@ -690,7 +690,9 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
>>  				thread_group_empty(tsk) &&
>>  				!ptrace_reparented(tsk) ?
>>  			tsk->exit_signal : SIGCHLD;
>> -		autoreap = do_notify_parent(tsk, sig);
>> +		do_notify_parent(tsk, sig);
>> +		/* Autoreap threads even when ptraced */
>> +		autoreap = !thread_group_leader(tsk);
>>  	} else if (thread_group_leader(tsk)) {
>>  		autoreap = thread_group_empty(tsk) &&
>>  			do_notify_parent(tsk, tsk->exit_signal);
>
> This is all you need,
>
>> @@ -699,8 +701,6 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
>>  	}
>>
>>  	tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;
>> -	if (tsk->exit_state == EXIT_DEAD)
>> -		list_add(&tsk->ptrace_entry, &dead);
>>
>>  	/* mt-exec, de_thread() is waiting for group leader */
>>  	if (unlikely(tsk->signal->notify_count < 0))
>> @@ -711,6 +711,8 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
>>  		list_del_init(&p->ptrace_entry);
>>  		release_task(p);
>>  	}
>> +	if (autoreap)
>> +		release_task(tsk);
>
> These 2 changes are not needed. release_task(tsk) will be called by
> list_for_each_entry_safe() above if autoreap == T.

Except for the practical case that for threads that are ptraced
tsk->ptrace_entry is already in use.  Which means we can't use
list_add(&tsk->ptrace_entry, &dead).

Or in short we get a really pretty oops because we corrupt one of the
ptrace lists if we don't have my extra two hunks.  But I agree logically
they are separate changes.

Eric

  reply	other threads:[~2017-03-03 19:23 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-13 14:14 [PATCH 0/2] fix the traced mt-exec deadlock Oleg Nesterov
2017-02-13 14:15 ` [PATCH 1/2] exec: don't wait for zombie threads with cred_guard_mutex held Oleg Nesterov
2017-02-13 16:12   ` kbuild test robot
2017-02-13 16:47     ` Oleg Nesterov
2017-02-13 16:39   ` kbuild test robot
2017-02-13 17:27   ` Mika Penttilä
2017-02-13 18:01     ` Oleg Nesterov
2017-02-13 18:04   ` [PATCH V2 " Oleg Nesterov
2017-02-16 11:42     ` Eric W. Biederman
2017-02-20 15:22       ` Oleg Nesterov
2017-02-20 15:36         ` Oleg Nesterov
2017-02-20 22:30         ` Eric W. Biederman
2017-02-21 17:53           ` Oleg Nesterov
2017-02-21 20:20             ` Eric W. Biederman
2017-02-22 17:41               ` Oleg Nesterov
2017-02-17  4:42     ` Eric W. Biederman
2017-02-20 15:50       ` Oleg Nesterov
2017-02-13 14:15 ` [PATCH 2/2] ptrace: ensure PTRACE_EVENT_EXIT won't stop if the tracee is killed by exec Oleg Nesterov
2017-02-24 16:03 ` [PATCH 0/2] fix the traced mt-exec deadlock Oleg Nesterov
2017-03-03  1:05   ` Eric W. Biederman
2017-03-03 17:33     ` Oleg Nesterov
2017-03-03 18:23       ` Eric W. Biederman [this message]
2017-03-03 18:59         ` Eric W. Biederman
2017-03-03 20:06           ` Eric W. Biederman
2017-03-03 20:11             ` [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped Eric W. Biederman
2017-03-04 17:03               ` Oleg Nesterov
2017-03-30  8:07                 ` Eric W. Biederman
2017-04-01  5:11                   ` [RFC][PATCH 0/2] exec: Fixing ptrace'd mulit-threaded hang Eric W. Biederman
2017-04-01  5:14                     ` [RFC][PATCH 1/2] sighand: Count each thread group once in sighand_struct Eric W. Biederman
2017-04-01  5:16                     ` [RFC][PATCH 2/2] exec: If possible don't wait for ptraced threads to be reaped Eric W. Biederman
2017-04-02 15:35                       ` Oleg Nesterov
2017-04-02 18:53                         ` Eric W. Biederman
2017-04-03 18:12                           ` Oleg Nesterov
2017-04-03 21:04                             ` Eric W. Biederman
2017-04-05 16:44                               ` Oleg Nesterov
2017-04-02 15:38                     ` [RFC][PATCH 0/2] exec: Fixing ptrace'd mulit-threaded hang Oleg Nesterov
2017-04-02 22:50                     ` [RFC][PATCH v2 0/5] " Eric W. Biederman
2017-04-02 22:51                       ` [RFC][PATCH v2 1/5] ptrace: Don't wait in PTRACE_O_TRACEEXIT for exec or coredump Eric W. Biederman
2017-04-05 16:19                         ` Oleg Nesterov
2017-04-02 22:51                       ` [RFC][PATCH v2 2/5] sighand: Count each thread group once in sighand_struct Eric W. Biederman
2017-04-02 22:52                       ` [RFC][PATCH v2 3/5] clone: Disallown CLONE_THREAD with a shared sighand_struct Eric W. Biederman
2017-04-05 16:24                         ` Oleg Nesterov
2017-04-05 17:34                           ` Eric W. Biederman
2017-04-05 18:11                             ` Oleg Nesterov
2017-04-02 22:53                       ` [RFC][PATCH v2 4/5] exec: If possible don't wait for ptraced threads to be reaped Eric W. Biederman
2017-04-05 16:15                         ` Oleg Nesterov
2017-04-02 22:57                       ` [RFC][PATCH v2 5/5] signal: Don't allow accessing signal_struct by old threads after exec Eric W. Biederman
2017-04-05 16:18                         ` Oleg Nesterov
2017-04-05 18:16                           ` Eric W. Biederman
2017-04-06 15:48                             ` Oleg Nesterov
2017-04-02 16:15                   ` [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped Oleg Nesterov
2017-04-02 21:07                     ` Eric W. Biederman
2017-04-03 18:37                       ` Oleg Nesterov
2017-04-03 22:49                         ` Eric W. Biederman
2017-04-03 22:49                         ` scope of cred_guard_mutex Eric W. Biederman
2017-04-05 16:08                           ` Oleg Nesterov
2017-04-05 16:11                             ` Kees Cook
2017-04-05 17:53                             ` Eric W. Biederman
2017-04-05 18:15                               ` Oleg Nesterov
2017-04-06 15:55                           ` Oleg Nesterov
2017-04-07 22:07                             ` Kees Cook
2017-09-04  3:19                       ` [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped Robert O'Callahan
2017-03-04 16:54         ` [PATCH 0/2] fix the traced mt-exec deadlock Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tw7axlr0.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=afazekas@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=asarai@suse.com \
    --cc=jann@thejh.net \
    --cc=keescook@chromium.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mhocko@kernel.org \
    --cc=oleg@redhat.com \
    --cc=uobergfe@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).