linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH][RFC] fix PTRACE_KILL
@ 2021-08-16 21:50 Al Viro
  2021-08-17 18:21 ` Eric W. Biederman
  0 siblings, 1 reply; 5+ messages in thread
From: Al Viro @ 2021-08-16 21:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Eric Biederman, Oleg Nesterov, linux-kernel

[Cc'd to security@k.o, *NOT* because I consider it a serious security hole;
it's just that the odds of catching relevant reviewers there are higher
than on l-k and there doesn't seem to be any lists where that would be
on-topic.  My apologies for misuse of security@k.o ;-/]

Current implementation is racy in quite a few ways - we check that
the child is traced by us and use ptrace_resume() to feed it
SIGKILL, provided that it's still alive.

What we do not do is making sure that the victim is in ptrace stop;
as the result, it can go and violate all kinds of assumptions,
starting with "child->sighand won't change under ptrace_resume()",
"child->ptrace won't get changed under user_disable_single_step()",
etc.

Note that ptrace(2) manpage has this to say:
    
PTRACE_KILL
      Send  the  tracee a SIGKILL to terminate it.  (addr and data are
      ignored.)
    
      This operation is deprecated; do not use it!   Instead,  send  a
      SIGKILL  directly  using kill(2) or tgkill(2).  The problem with
      PTRACE_KILL is that it requires the  tracee  to  be  in  signal-
      delivery-stop,  otherwise  it  may  not work (i.e., may complete
      successfully but won't kill the tracee).  By contrast, sending a
      SIGKILL directly has no such limitation.
    
So let it check (under tasklist_lock) that the victim is traced by us
and call sig_send_info() to feed it SIGKILL.  It's easier that trying
to force ptrace_resume() into handling that mess and it's less brittle
that way.
    
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index f8589bf8d7dc..7f46be488b36 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1220,11 +1220,6 @@ int ptrace_request(struct task_struct *child, long request,
 	case PTRACE_CONT:
 		return ptrace_resume(child, request, data);
 
-	case PTRACE_KILL:
-		if (child->exit_state)	/* already dead */
-			return 0;
-		return ptrace_resume(child, request, SIGKILL);
-
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 	case PTRACE_GETREGSET:
 	case PTRACE_SETREGSET: {
@@ -1270,6 +1265,20 @@ int ptrace_request(struct task_struct *child, long request,
 	return ret;
 }
 
+static int ptrace_kill(struct task_struct *child)
+{
+	int ret = -ESRCH;
+
+	read_lock(&tasklist_lock);
+	if (child->ptrace && child->parent == current) {
+		if (!child->exit_state)
+			send_sig_info(SIGKILL, SEND_SIG_PRIV, child);
+		ret = 0;
+	}
+	read_unlock(&tasklist_lock);
+	return ret;
+}
+
 #ifndef arch_ptrace_attach
 #define arch_ptrace_attach(child)	do { } while (0)
 #endif
@@ -1304,8 +1313,12 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		goto out_put_task_struct;
 	}
 
-	ret = ptrace_check_attach(child, request == PTRACE_KILL ||
-				  request == PTRACE_INTERRUPT);
+	if (request == PTRACE_KILL) {
+		ret = ptrace_kill(child);
+		goto out_put_task_struct;
+	}
+
+	ret = ptrace_check_attach(child, request == PTRACE_INTERRUPT);
 	if (ret < 0)
 		goto out_put_task_struct;
 
@@ -1449,8 +1462,12 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_long_t, request, compat_long_t, pid,
 		goto out_put_task_struct;
 	}
 
-	ret = ptrace_check_attach(child, request == PTRACE_KILL ||
-				  request == PTRACE_INTERRUPT);
+	if (request == PTRACE_KILL) {
+		ret = ptrace_kill(child);
+		goto out_put_task_struct;
+	}
+
+	ret = ptrace_check_attach(child, request == PTRACE_INTERRUPT);
 	if (!ret) {
 		ret = compat_arch_ptrace(child, request, addr, data);
 		if (ret || request != PTRACE_DETACH)

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH][RFC] fix PTRACE_KILL
  2021-08-16 21:50 [PATCH][RFC] fix PTRACE_KILL Al Viro
@ 2021-08-17 18:21 ` Eric W. Biederman
  2021-08-25  5:12   ` Al Viro
  0 siblings, 1 reply; 5+ messages in thread
From: Eric W. Biederman @ 2021-08-17 18:21 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Oleg Nesterov, linux-kernel

Al Viro <viro@zeniv.linux.org.uk> writes:

> [Cc'd to security@k.o, *NOT* because I consider it a serious security hole;
> it's just that the odds of catching relevant reviewers there are higher
> than on l-k and there doesn't seem to be any lists where that would be
> on-topic.  My apologies for misuse of security@k.o ;-/]

Hmm.  I don't see security@kernel.org Cc'd.

> Current implementation is racy in quite a few ways - we check that
> the child is traced by us and use ptrace_resume() to feed it
> SIGKILL, provided that it's still alive.
>
> What we do not do is making sure that the victim is in ptrace stop;
> as the result, it can go and violate all kinds of assumptions,
> starting with "child->sighand won't change under ptrace_resume()",
> "child->ptrace won't get changed under user_disable_single_step()",
> etc.
>
> Note that ptrace(2) manpage has this to say:
>     
> PTRACE_KILL
>       Send  the  tracee a SIGKILL to terminate it.  (addr and data are
>       ignored.)
>     
>       This operation is deprecated; do not use it!   Instead,  send  a
>       SIGKILL  directly  using kill(2) or tgkill(2).  The problem with
>       PTRACE_KILL is that it requires the  tracee  to  be  in  signal-
>       delivery-stop,  otherwise  it  may  not work (i.e., may complete
>       successfully but won't kill the tracee).  By contrast, sending a
>       SIGKILL directly has no such limitation.
>     
> So let it check (under tasklist_lock) that the victim is traced by us
> and call sig_send_info() to feed it SIGKILL.  It's easier that trying
> to force ptrace_resume() into handling that mess and it's less brittle
> that way.

I took a quick look and despite being deprecated PTRACE_KILL appears
to still have some active users (like gcc-10).  So that seems to rule
out just removing PTRACE_KILL.

I looked at the bug that PTRACE_KILL only kills a process when it is
stopped and it is present in Linux 1.0.  Given that I expect userspace
applications are ok with the current semantics rather than the intended
semantics.

The current semantics also include the weirdness that PTRACE_KILL only
kills a process when it is stopped in ptrace_signal, and not at other
ptrace stops.

So rather than fix the code to do what was intended 27 years ago,
why don't we accept the fact that PTRACE_KILL is equivalent
to PTRACE_CONT with data = SIGKILL.

If there are regressions or we really care we can tweak the return value
to return 0 instead of -ESRCH when the process is not stopped.

Something like this:

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index f8589bf8d7dc..f40f0a0ff70a 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1221,8 +1221,6 @@ int ptrace_request(struct task_struct *child, long request,
 		return ptrace_resume(child, request, data);
 
 	case PTRACE_KILL:
-		if (child->exit_state)	/* already dead */
-			return 0;
 		return ptrace_resume(child, request, SIGKILL);
 
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
@@ -1304,8 +1302,7 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		goto out_put_task_struct;
 	}
 
-	ret = ptrace_check_attach(child, request == PTRACE_KILL ||
-				  request == PTRACE_INTERRUPT);
+	ret = ptrace_check_attach(child, request == PTRACE_INTERRUPT);
 	if (ret < 0)
 		goto out_put_task_struct;
 
@@ -1449,8 +1446,7 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_long_t, request, compat_long_t, pid,
 		goto out_put_task_struct;
 	}
 
-	ret = ptrace_check_attach(child, request == PTRACE_KILL ||
-				  request == PTRACE_INTERRUPT);
+	ret = ptrace_check_attach(child, request == PTRACE_INTERRUPT);
 	if (!ret) {
 		ret = compat_arch_ptrace(child, request, addr, data);
 		if (ret || request != PTRACE_DETACH)

Eric

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH][RFC] fix PTRACE_KILL
  2021-08-17 18:21 ` Eric W. Biederman
@ 2021-08-25  5:12   ` Al Viro
  2021-08-27 18:54     ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Al Viro @ 2021-08-25  5:12 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, Oleg Nesterov, linux-kernel

On Tue, Aug 17, 2021 at 01:21:03PM -0500, Eric W. Biederman wrote:

> > So let it check (under tasklist_lock) that the victim is traced by us
> > and call sig_send_info() to feed it SIGKILL.  It's easier that trying
> > to force ptrace_resume() into handling that mess and it's less brittle
> > that way.
> 
> I took a quick look and despite being deprecated PTRACE_KILL appears
> to still have some active users (like gcc-10).  So that seems to rule

????
What the hell would gcc do with ptrace() to start with, let alone with
PTRACE_KILL?
<greps>

Egads...  
void ThreadSuspender::KillAllThreads() {
  for (uptr i = 0; i < suspended_threads_list_.ThreadCount(); i++)
    internal_ptrace(PTRACE_KILL, suspended_threads_list_.GetThreadID(i),
                    nullptr, nullptr);
}
in libsanitizer/sanitizer_common/sanitizer_stoptheworld_linux_libcdep.cc

OK, obvious comments regarding the authors' sanity aside, here "send
SIGKILL" would clearly be fine.

> I looked at the bug that PTRACE_KILL only kills a process when it is
> stopped and it is present in Linux 1.0.  Given that I expect userspace
> applications are ok with the current semantics rather than the intended
> semantics.

The history is trickier than that.

Originally (and that's all the way back to v6), ptrace(2) failed for
everything that hadn't been in stopped state and traced by the caller.
The only exception had been ptrace(pid, 0) (== PTRACE_TRACEME).
The only stop back then had been in signal delivery.

Predecessor of PTRACE_KILL had been ptrace(pid, 8) (before any enums and
yes, the constant is still the same).  It worked by arranging for exit()
in the context of tracee; see /usr/sys/ken/sig.c in v6 tree (procxmt()
for payload to be run in child, ptrace() for syscall itself).

At some point between 4.3BSD and 4.4BSD PT_ATTACH had been added
(== our PTRACE_ATTACH), and it obviously had checks of its own
that did _not_ require the target to be stopped (let alone to be
already traced by the caller).

In Linux ptrace(2) had been added in 0.95 (well, at some point between
0.12 and 0.95).  The first departure from the usual model had happened
in 0.99pl4.  For some reason PTRACE_DETACH had "target must be stopped"
check removed.  In 0.99pl14 PTRACE_KILL had joined PTRACE_DETACH in
that weirdness, in 0.99pl14b PTRACE_DETACH went back to normal, but
PTRACE_KILL had stuck that way.  No idea why; ask Linus, maybe he remembers.
It's been 28 years, though, so...

At that point we had two ptrace stops - in signal delivery and in
syscall entry (strace stuff).

PTRACE_KILL had become unpleasantly racy - it would make the child
runnable and set child->exit_code to SIGKILL, expecting the child to
pick it from there once it regains the timeslice.  If the child had
been *not* in ptrace stop, all of that would have no effect whatsoever.

ptrace(pid, PTRACE_CONT, ..., SIGKILL) still worked reliably at
that point.  That had changed 9 years later, when we had added new
ptrace stops.  execve() of a traced process used to raise SIGTRAP,
with the usual signal delivery happening on the way to userland.
In a0691b116f6a "Add new ptrace event tracing mechanism" we had
suddenly grown a bunch of (optional) stops, by way of ptrace_event() -
on exec/fork/vfork/clone/exit.  Unfortunately, in each of those stops
ptrace(pid, PTRACE_CONT, ..., signr) ended up with signr quietly ignored.
And in any of them PTRACE_KILL (by then a close relative of PTRACE_CONT)
would quietly do nothing.  Note that ptrace stop on syscall entry/exit
did explicitly raise the signal requested by PTRACE_CONT; the new ones
had not bothered.

Those stops had been uncommon.  However, in 2004 (two years down the
road) Roland's 0cc0515ba006 "[PATCH] make single-step into signal
delivery stop in handler" had added a ptrace stop on single-stepping
into a handler.  And PTRACE_CONT/SIGKILL there will have that SIGKILL
(or any other signal number we pass) completely ignored.  Worse, changing
that behaviour is impossible - if we make PTRACE_CONT in that state
deliver a new signal, we risk breaking any debugger that relies upon
the fact that in such situation ptrace(..., PTRACE_CONT, ...., signr)
ignores signr and had done so for the last 17 years.

By that point signr passing by PTRACE_CONT had become unreliable.
There'd been no good way to tell that stop from the normal stop
on SIGTRAP delivery, short of having examined the state during
the previous stop *and* remembering that you've done PTRACE_SINGLESTEP
from there.  Both parts are required, unfortunately.  Worse, it's not
universal on all architectures - arm64, hexagon, ia64, microblaze,
openrisc, parisc, powerpc, s390, sh, x86 and uml do it, the rest do
not.

Take a look at gdb source and grep for PTRACE_KILL in there; complaints
are impressive (and well-founded).  They end up doing sending SIGKILL
by kill(2); in gdbserver they follow that with PTRACE_KILL, perhaps on
the theory that there's no such thing as overkill...

In 2011 we had another ptrace stop join that bunch - group stop handling
for traced processes.  At some point seccomp shite had been added to
the "events" pile.  All of those have signr (and PTRACE_KILL) ignored,
but by then the thing had already been FUBAR.

> The current semantics also include the weirdness that PTRACE_KILL only
> kills a process when it is stopped in ptrace_signal, and not at other
> ptrace stops.
> 
> So rather than fix the code to do what was intended 27 years ago,
> why don't we accept the fact that PTRACE_KILL is equivalent
> to PTRACE_CONT with data = SIGKILL.

Because it changes behaviour, making it fail visibly for threads not in
ptrace stop?  I agree that it should have been acting that way all along,
but... the same 27 (28, really) years of the current "no error if it's
not stopped" behaviour are there.

> If there are regressions or we really care we can tweak the return value
> to return 0 instead of -ESRCH when the process is not stopped.

How?  You obviously can't blindly change -ESRCH to 0 - it *is* the expected
error when the recepient is not traced by us, after all.  So we'd have to
do... what exactly?  Special-case PTRACE_KILL in ptrace_check_attach()
*and* in its callers?  Open-code ptrace_check_attach() for PTRACE_KILL,
turning it into "yes, call ptrace_freeze_traced(), bailing out if
it fails, but bailing out with 0 instead of -ESRCH", with ptrace_resume()
done in the same function (ptrace_kill()?) we open-code it into?

Frankly, all variants I see are both ugly and hard on the reader.  In
the code that is already way too subtle...

BTW, glibc has an implementation for Hurd in sysdeps/mach/hurd/ptrace.c:
    case PTRACE_KILL:
      va_start (ap, request);
      pid = va_arg (ap, pid_t);
      va_end (ap);
      /* SIGKILL always just terminates the task,
         so normal kill is just the same when traced.  */
      return __kill (pid, SIGKILL);
Which is to say, there it is turned into a signal right on libc level...


It's a mess.  We have two problems with userland API:
	1) PTRACE_KILL doesn't return an error when the target is not stopped.
	2) PTRACE_KILL *and* PTRACE_CONT are unreliable even for stopped
targets - some ptrace stops ignore the signal passed with PTRACE_CONT (and
ignore SIGKILL from PTRACE_KILL).  PTRACE_CONT side of that is impossible
to fix by now.

And that's on top of the internal problems with ptrace_resume() done on
non-stopped child.

The change I'd proposed makes PTRACE_KILL deliver SIGKILL regardless of
the target state; yours is arguably what we should've done from the very
beginning (what we used to have prior to 0.99pl14 and what all other
Unices had been doing all along), but it's a visible change wrt error
returns and I don't see any sane way to paper over that part.

Linus, what would you prefer?  I've no strong preferences here...

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][RFC] fix PTRACE_KILL
  2021-08-25  5:12   ` Al Viro
@ 2021-08-27 18:54     ` Linus Torvalds
  2021-08-27 22:05       ` Eric W. Biederman
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2021-08-27 18:54 UTC (permalink / raw)
  To: Al Viro; +Cc: Eric W. Biederman, Oleg Nesterov, Linux Kernel Mailing List

[ Sorry, this got missed by other stuff in my inbox ]

On Tue, Aug 24, 2021 at 10:12 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> The change I'd proposed makes PTRACE_KILL deliver SIGKILL regardless of
> the target state; yours is arguably what we should've done from the very
> beginning (what we used to have prior to 0.99pl14 and what all other
> Unices had been doing all along), but it's a visible change wrt error
> returns and I don't see any sane way to paper over that part.
>
> Linus, what would you prefer?  I've no strong preferences here...

Honestly, I have no huge preferences either, simply because this has
clearly been broken so long that people who care have worked around
the breakage already, or it just didn't matter enough.

Your patch looks fine to me, although looking at it I get the feeling
that we might as well do it inside "ptrace_check_attach()", and that
we should have just passed that function the request (instead of that
odd "ignore_state" argument).

ptrace_check_attach() already gets that tasklist_lock that the code
requires, and could easily do the PTRACE_KILL checking. So I think
that might be an additional cleanup.

But your patch is targeted and doesn't look wrong to me either.

I don't think Eric's patch works, because if I read that one right, it
would actually do that "wait_task_inactive()" which is very much
against the whole point of PTRACE_KILL.  We want to kill the target
asap, and regardless of where it is stuck.

              Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][RFC] fix PTRACE_KILL
  2021-08-27 18:54     ` Linus Torvalds
@ 2021-08-27 22:05       ` Eric W. Biederman
  0 siblings, 0 replies; 5+ messages in thread
From: Eric W. Biederman @ 2021-08-27 22:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, Oleg Nesterov, Linux Kernel Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> [ Sorry, this got missed by other stuff in my inbox ]
>
> On Tue, Aug 24, 2021 at 10:12 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> The change I'd proposed makes PTRACE_KILL deliver SIGKILL regardless of
>> the target state; yours is arguably what we should've done from the very
>> beginning (what we used to have prior to 0.99pl14 and what all other
>> Unices had been doing all along), but it's a visible change wrt error
>> returns and I don't see any sane way to paper over that part.
>>
>> Linus, what would you prefer?  I've no strong preferences here...
>
> Honestly, I have no huge preferences either, simply because this has
> clearly been broken so long that people who care have worked around
> the breakage already, or it just didn't matter enough.

The two choices are change PTRACE_KILL into:
"kill(SIGKILL)" or "ptrace(PTRACE_CONT, SIGKILL)".

When trying to figure out how userspace understands PTRACE_KILL
today, I have found at least one piece of userspace that
figures the definition of PTRACE_KILL is PTRACE_CONT with the signal
SIGKILL.

Frankly I find that a fair characterization, because except for the
return code when the task is not stopped PTRACE_KILL works exactly
like PTRACE_CONT with signal SIGKILL.

As there are real userspace visible differences between PTRACE_CONT and
kill(SIGKILL) I think we should go with the PTRACE_CONT definition
as that is the least likely to break userspace, and the only thing we
really care about here is making it so that malicious userspace can
not cause the kernel to misbehave.

> I don't think Eric's patch works, because if I read that one right, it
> would actually do that "wait_task_inactive()" which is very much
> against the whole point of PTRACE_KILL.  We want to kill the target
> asap, and regardless of where it is stuck.

It will do wait_task_inactive in one of the cases where PTRACE_KILL is
broken today.  But since the cost of a wait_task_inactive looks very
much like the cost of a spin_lock I don't see the problem.  The code in
ptrace_check_attach already waits on several other spin locks.

The code in ptrace_attach verifies that the target task in in state
TASK_TRACED.  Which means that the target task is in ptrace_stop
and has already done "set_special_state(TASK_TRACED)".  At that point
there are two possibilities.  Either the target task will have
reached freezable_schedule in ptrace_stop and wait_task_inactive won't
block at all, or wait_task_inactive will need to spin waiting for
the target task to reach freezable_schedule.

There is nothing in ptrace_stop that can sleep as all of the work
happens under a spin lock so the worst case wait in wait_task_inactive
should be quite short.


All of that said it is probably worth adding to my patch the logic so
that when ptrace_freeze_traced fails or wait_task_inactive fails the
code bails out immediately and returns 0 instead of -ESRCH.  Just to
avoid any userspace behavioral differences in PTRACE_KILL.  So we don't
even need to consider regressions.

Eric

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-08-27 22:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-16 21:50 [PATCH][RFC] fix PTRACE_KILL Al Viro
2021-08-17 18:21 ` Eric W. Biederman
2021-08-25  5:12   ` Al Viro
2021-08-27 18:54     ` Linus Torvalds
2021-08-27 22:05       ` Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).