All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Aleksa Sarai <asarai@suse.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Attila Fazekas <afazekas@redhat.com>, Jann Horn <jann@thejh.net>,
	Kees Cook <keescook@chromium.org>,
	Michal Hocko <mhocko@kernel.org>,
	Ulrich Obergfell <uobergfe@redhat.com>,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org
Subject: Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.
Date: Sun, 02 Apr 2017 16:07:17 -0500	[thread overview]
Message-ID: <87inmmbjsq.fsf@xmission.com> (raw)
In-Reply-To: <20170402161518.GC12637@redhat.com> (Oleg Nesterov's message of "Sun, 2 Apr 2017 18:15:18 +0200")

Oleg Nesterov <oleg@redhat.com> writes:

> Perhaps I am wrong, but I think you underestimate the problems, and it is
> not clear to me if we really want this.

I worked through quite a bit of it and I realized a few fundamental
issues.  The task struct must remain visible until it is reaped
and we use siglock to protect in unhash process to protect that reaping.
Further tsk->sighand == NULL winds up being a flag used to tell
if release_task has been called.

To get an usable count on sighand struct all that needed to be done
was to change the reference counting of sighand_struct to count
processes and not threads.

Which is what I wound up posting.

> Anyway, Eric, even if we can and want to do this, why we can't do this on
> top of my fix?

Because your reduction in scope of cred_guard_mutex is fundamentally
broken and unnecessary.

> I simply fail to understand why you dislike it that much. Yes it is not
> pretty, I said this many times, but it is safe in that it doesn't really
> change the current behaviour.

No it is not safe.  And it promotes wrong thinking which is even more
dangerous.

I reviewed the code and cred_guard_mutex needs to cover what it covers.

> I am much more worried about 2/2 you didn't argue with, this patch _can_
> break something and this is obviously not good even if PTRACE_EVENT_EXIT
> was always broken.

I don't know who actually useses PTRACE_O_TRACEEXIT so I don't actually
know what the implications of changing it are.  Let's see...

gdb - no
upstart - no
lldb - yes
strace - no


It looks like lldb is worth testing with your PTRACE_EVENT_EXIT change
to see if anything breaks.

I think we can get away with changing the exec case but it does look
worth testing.  I hadn't realized you hadn't looked to see what was
using PTRACE_O_TRACEEXIT to see if any part of userspace cares.

Hmm.  This is interesting.  From the strace documentation:
> Tracer cannot assume that ptrace-stopped tracee exists. There are many
> scenarios when tracee may die while stopped (such as SIGKILL).
> Therefore, tracer must always be prepared to handle ESRCH error on any
> ptrace operation. Unfortunately, the same error is returned if tracee
> exists but is not ptrace-stopped (for commands which require stopped
> tracee), or if it is not traced by process which issued ptrace call.
> Tracer needs to keep track of stopped/running state, and interpret
> ESRCH as "tracee died unexpectedly" only if it knows that tracee has
> been observed to enter ptrace-stop. Note that there is no guarantee
> that waitpid(WNOHANG) will reliably report tracee's death status if
> ptrace operation returned ESRCH. waitpid(WNOHANG) may return 0 instead.
> IOW: tracee may be "not yet fully dead" but already refusing ptrace
> ops.

If delivering a second SIGKILL to a ptraced stopped processes will
make it continue we have a very interesting out..

When we stop in ptrace_stop we stop in TASK_TRACED == (TASK_WAKEKILL|__TASK_TRACED)

Delivery of a SIGKILL to that task has queue SIGKILL
and call signal_wake_up_state(t, TASK_WAKEKILL).
Which becomes wake_up_state(t, TASK_INTERRUPTIBLE | TASK_WAKEKILL)
Which wakes up the process.

So userspace can absolutely kill a processes in PTRACE_EVENT_EXIT
before the tracers find it.

Therefore we are only talking a quality of implementation issue
if we actually stop and wait for the tracer or not.

....

Which brings us to your PTRACE_EVENT_EXIT patch.

I think may_ptrace_stop is tested in the wrong place,
and is probably buggy.

- We should send the signal in all cases except when the ptracing parent
  does not exist aka (!current->ptrace).  The siginfo contains enough
  information to understand what happened if anyone is listening.

- Then we should send the group stop.

- Then if we don't want to wait we should:
  __set_current_state(TASK_RUNNING)

- Then we should drop the locks and only call freezable_schedule if
  we want to wait.

That way userspace thinks someone else just sent a SIGKILL and killed
the thread before it had a chance to look (which is effectively what
we are doing).  That sounds idea for both core-dumps and this case.

Eric

WARNING: multiple messages have this Message-ID (diff)
From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
To: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Aleksa Sarai <asarai-IBi9RG/b67k@public.gmane.org>,
	Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>,
	Attila Fazekas <afazekas-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Jann Horn <jann-XZ1E9jl8jIdeoWH0uzbU5w@public.gmane.org>,
	Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>,
	Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Ulrich Obergfell
	<uobergfe-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.
Date: Sun, 02 Apr 2017 16:07:17 -0500	[thread overview]
Message-ID: <87inmmbjsq.fsf@xmission.com> (raw)
In-Reply-To: <20170402161518.GC12637-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> (Oleg Nesterov's message of "Sun, 2 Apr 2017 18:15:18 +0200")

Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> Perhaps I am wrong, but I think you underestimate the problems, and it is
> not clear to me if we really want this.

I worked through quite a bit of it and I realized a few fundamental
issues.  The task struct must remain visible until it is reaped
and we use siglock to protect in unhash process to protect that reaping.
Further tsk->sighand == NULL winds up being a flag used to tell
if release_task has been called.

To get an usable count on sighand struct all that needed to be done
was to change the reference counting of sighand_struct to count
processes and not threads.

Which is what I wound up posting.

> Anyway, Eric, even if we can and want to do this, why we can't do this on
> top of my fix?

Because your reduction in scope of cred_guard_mutex is fundamentally
broken and unnecessary.

> I simply fail to understand why you dislike it that much. Yes it is not
> pretty, I said this many times, but it is safe in that it doesn't really
> change the current behaviour.

No it is not safe.  And it promotes wrong thinking which is even more
dangerous.

I reviewed the code and cred_guard_mutex needs to cover what it covers.

> I am much more worried about 2/2 you didn't argue with, this patch _can_
> break something and this is obviously not good even if PTRACE_EVENT_EXIT
> was always broken.

I don't know who actually useses PTRACE_O_TRACEEXIT so I don't actually
know what the implications of changing it are.  Let's see...

gdb - no
upstart - no
lldb - yes
strace - no


It looks like lldb is worth testing with your PTRACE_EVENT_EXIT change
to see if anything breaks.

I think we can get away with changing the exec case but it does look
worth testing.  I hadn't realized you hadn't looked to see what was
using PTRACE_O_TRACEEXIT to see if any part of userspace cares.

Hmm.  This is interesting.  From the strace documentation:
> Tracer cannot assume that ptrace-stopped tracee exists. There are many
> scenarios when tracee may die while stopped (such as SIGKILL).
> Therefore, tracer must always be prepared to handle ESRCH error on any
> ptrace operation. Unfortunately, the same error is returned if tracee
> exists but is not ptrace-stopped (for commands which require stopped
> tracee), or if it is not traced by process which issued ptrace call.
> Tracer needs to keep track of stopped/running state, and interpret
> ESRCH as "tracee died unexpectedly" only if it knows that tracee has
> been observed to enter ptrace-stop. Note that there is no guarantee
> that waitpid(WNOHANG) will reliably report tracee's death status if
> ptrace operation returned ESRCH. waitpid(WNOHANG) may return 0 instead.
> IOW: tracee may be "not yet fully dead" but already refusing ptrace
> ops.

If delivering a second SIGKILL to a ptraced stopped processes will
make it continue we have a very interesting out..

When we stop in ptrace_stop we stop in TASK_TRACED == (TASK_WAKEKILL|__TASK_TRACED)

Delivery of a SIGKILL to that task has queue SIGKILL
and call signal_wake_up_state(t, TASK_WAKEKILL).
Which becomes wake_up_state(t, TASK_INTERRUPTIBLE | TASK_WAKEKILL)
Which wakes up the process.

So userspace can absolutely kill a processes in PTRACE_EVENT_EXIT
before the tracers find it.

Therefore we are only talking a quality of implementation issue
if we actually stop and wait for the tracer or not.

....

Which brings us to your PTRACE_EVENT_EXIT patch.

I think may_ptrace_stop is tested in the wrong place,
and is probably buggy.

- We should send the signal in all cases except when the ptracing parent
  does not exist aka (!current->ptrace).  The siginfo contains enough
  information to understand what happened if anyone is listening.

- Then we should send the group stop.

- Then if we don't want to wait we should:
  __set_current_state(TASK_RUNNING)

- Then we should drop the locks and only call freezable_schedule if
  we want to wait.

That way userspace thinks someone else just sent a SIGKILL and killed
the thread before it had a chance to look (which is effectively what
we are doing).  That sounds idea for both core-dumps and this case.

Eric

  reply	other threads:[~2017-04-02 21:12 UTC|newest]

Thread overview: 93+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-13 14:14 [PATCH 0/2] fix the traced mt-exec deadlock Oleg Nesterov
2017-02-13 14:15 ` [PATCH 1/2] exec: don't wait for zombie threads with cred_guard_mutex held Oleg Nesterov
2017-02-13 16:12   ` kbuild test robot
2017-02-13 16:47     ` Oleg Nesterov
2017-02-13 16:39   ` kbuild test robot
2017-02-13 17:27   ` Mika Penttilä
2017-02-13 18:01     ` Oleg Nesterov
2017-02-13 18:04   ` [PATCH V2 " Oleg Nesterov
2017-02-16 11:42     ` Eric W. Biederman
2017-02-20 15:22       ` Oleg Nesterov
2017-02-20 15:36         ` Oleg Nesterov
2017-02-20 22:30         ` Eric W. Biederman
2017-02-21 17:53           ` Oleg Nesterov
2017-02-21 20:20             ` Eric W. Biederman
2017-02-22 17:41               ` Oleg Nesterov
2017-02-17  4:42     ` Eric W. Biederman
2017-02-20 15:50       ` Oleg Nesterov
2017-02-13 14:15 ` [PATCH 2/2] ptrace: ensure PTRACE_EVENT_EXIT won't stop if the tracee is killed by exec Oleg Nesterov
2017-02-24 16:03 ` [PATCH 0/2] fix the traced mt-exec deadlock Oleg Nesterov
2017-03-03  1:05   ` Eric W. Biederman
2017-03-03 17:33     ` Oleg Nesterov
2017-03-03 18:23       ` Eric W. Biederman
2017-03-03 18:23         ` Eric W. Biederman
2017-03-03 18:59         ` Eric W. Biederman
2017-03-03 18:59           ` Eric W. Biederman
2017-03-03 20:06           ` Eric W. Biederman
2017-03-03 20:06             ` Eric W. Biederman
2017-03-03 20:11             ` [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped Eric W. Biederman
2017-03-03 20:11               ` Eric W. Biederman
2017-03-04 17:03               ` Oleg Nesterov
2017-03-30  8:07                 ` Eric W. Biederman
2017-04-01  5:11                   ` [RFC][PATCH 0/2] exec: Fixing ptrace'd mulit-threaded hang Eric W. Biederman
2017-04-01  5:11                     ` Eric W. Biederman
2017-04-01  5:14                     ` [RFC][PATCH 1/2] sighand: Count each thread group once in sighand_struct Eric W. Biederman
2017-04-01  5:14                       ` Eric W. Biederman
2017-04-01  5:16                     ` [RFC][PATCH 2/2] exec: If possible don't wait for ptraced threads to be reaped Eric W. Biederman
2017-04-01  5:16                       ` Eric W. Biederman
2017-04-02 15:35                       ` Oleg Nesterov
2017-04-02 15:35                         ` Oleg Nesterov
2017-04-02 18:53                         ` Eric W. Biederman
2017-04-02 18:53                           ` Eric W. Biederman
2017-04-03 18:12                           ` Oleg Nesterov
2017-04-03 18:12                             ` Oleg Nesterov
2017-04-03 21:04                             ` Eric W. Biederman
2017-04-05 16:44                               ` Oleg Nesterov
2017-04-02 15:38                     ` [RFC][PATCH 0/2] exec: Fixing ptrace'd mulit-threaded hang Oleg Nesterov
2017-04-02 15:38                       ` Oleg Nesterov
2017-04-02 22:50                     ` [RFC][PATCH v2 0/5] " Eric W. Biederman
2017-04-02 22:50                       ` Eric W. Biederman
2017-04-02 22:51                       ` [RFC][PATCH v2 1/5] ptrace: Don't wait in PTRACE_O_TRACEEXIT for exec or coredump Eric W. Biederman
2017-04-02 22:51                         ` Eric W. Biederman
2017-04-05 16:19                         ` Oleg Nesterov
2017-04-02 22:51                       ` [RFC][PATCH v2 2/5] sighand: Count each thread group once in sighand_struct Eric W. Biederman
2017-04-02 22:51                         ` Eric W. Biederman
2017-04-02 22:52                       ` [RFC][PATCH v2 3/5] clone: Disallown CLONE_THREAD with a shared sighand_struct Eric W. Biederman
2017-04-02 22:52                         ` Eric W. Biederman
2017-04-05 16:24                         ` Oleg Nesterov
2017-04-05 16:24                           ` Oleg Nesterov
2017-04-05 17:34                           ` Eric W. Biederman
2017-04-05 18:11                             ` Oleg Nesterov
2017-04-02 22:53                       ` [RFC][PATCH v2 4/5] exec: If possible don't wait for ptraced threads to be reaped Eric W. Biederman
2017-04-02 22:53                         ` Eric W. Biederman
2017-04-05 16:15                         ` Oleg Nesterov
2017-04-02 22:57                       ` [RFC][PATCH v2 5/5] signal: Don't allow accessing signal_struct by old threads after exec Eric W. Biederman
2017-04-02 22:57                         ` Eric W. Biederman
2017-04-05 16:18                         ` Oleg Nesterov
2017-04-05 16:18                           ` Oleg Nesterov
2017-04-05 18:16                           ` Eric W. Biederman
2017-04-05 18:16                             ` Eric W. Biederman
2017-04-06 15:48                             ` Oleg Nesterov
2017-04-06 15:48                               ` Oleg Nesterov
2017-04-02 16:15                   ` [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped Oleg Nesterov
2017-04-02 16:15                     ` Oleg Nesterov
2017-04-02 21:07                     ` Eric W. Biederman [this message]
2017-04-02 21:07                       ` Eric W. Biederman
2017-04-03 18:37                       ` Oleg Nesterov
2017-04-03 18:37                         ` Oleg Nesterov
2017-04-03 22:49                         ` Eric W. Biederman
2017-04-03 22:49                           ` Eric W. Biederman
2017-04-03 22:49                         ` scope of cred_guard_mutex Eric W. Biederman
2017-04-03 22:49                           ` Eric W. Biederman
2017-04-05 16:08                           ` Oleg Nesterov
2017-04-05 16:11                             ` Kees Cook
2017-04-05 17:53                             ` Eric W. Biederman
2017-04-05 18:15                               ` Oleg Nesterov
2017-04-06 15:55                           ` Oleg Nesterov
2017-04-06 15:55                             ` Oleg Nesterov
2017-04-07 22:07                             ` Kees Cook
2017-04-07 22:07                               ` Kees Cook
2017-09-04  3:19                       ` [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped Robert O'Callahan
2017-09-04  3:19                         ` Robert O'Callahan
2017-03-04 16:54         ` [PATCH 0/2] fix the traced mt-exec deadlock Oleg Nesterov
2017-03-04 16:54           ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87inmmbjsq.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=afazekas@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=asarai@suse.com \
    --cc=jann@thejh.net \
    --cc=keescook@chromium.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mhocko@kernel.org \
    --cc=oleg@redhat.com \
    --cc=uobergfe@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.