* [PATCH][RFC] fix PTRACE_KILL @ 2021-08-16 21:50 Al Viro 2021-08-17 18:21 ` Eric W. Biederman 0 siblings, 1 reply; 5+ messages in thread From: Al Viro @ 2021-08-16 21:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: Eric Biederman, Oleg Nesterov, linux-kernel [Cc'd to security@k.o, *NOT* because I consider it a serious security hole; it's just that the odds of catching relevant reviewers there are higher than on l-k and there doesn't seem to be any lists where that would be on-topic. My apologies for misuse of security@k.o ;-/] Current implementation is racy in quite a few ways - we check that the child is traced by us and use ptrace_resume() to feed it SIGKILL, provided that it's still alive. What we do not do is making sure that the victim is in ptrace stop; as the result, it can go and violate all kinds of assumptions, starting with "child->sighand won't change under ptrace_resume()", "child->ptrace won't get changed under user_disable_single_step()", etc. Note that ptrace(2) manpage has this to say: PTRACE_KILL Send the tracee a SIGKILL to terminate it. (addr and data are ignored.) This operation is deprecated; do not use it! Instead, send a SIGKILL directly using kill(2) or tgkill(2). The problem with PTRACE_KILL is that it requires the tracee to be in signal- delivery-stop, otherwise it may not work (i.e., may complete successfully but won't kill the tracee). By contrast, sending a SIGKILL directly has no such limitation. So let it check (under tasklist_lock) that the victim is traced by us and call sig_send_info() to feed it SIGKILL. It's easier that trying to force ptrace_resume() into handling that mess and it's less brittle that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- diff --git a/kernel/ptrace.c b/kernel/ptrace.c index f8589bf8d7dc..7f46be488b36 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -1220,11 +1220,6 @@ int ptrace_request(struct task_struct *child, long request, case PTRACE_CONT: return ptrace_resume(child, request, data); - case PTRACE_KILL: - if (child->exit_state) /* already dead */ - return 0; - return ptrace_resume(child, request, SIGKILL); - #ifdef CONFIG_HAVE_ARCH_TRACEHOOK case PTRACE_GETREGSET: case PTRACE_SETREGSET: { @@ -1270,6 +1265,20 @@ int ptrace_request(struct task_struct *child, long request, return ret; } +static int ptrace_kill(struct task_struct *child) +{ + int ret = -ESRCH; + + read_lock(&tasklist_lock); + if (child->ptrace && child->parent == current) { + if (!child->exit_state) + send_sig_info(SIGKILL, SEND_SIG_PRIV, child); + ret = 0; + } + read_unlock(&tasklist_lock); + return ret; +} + #ifndef arch_ptrace_attach #define arch_ptrace_attach(child) do { } while (0) #endif @@ -1304,8 +1313,12 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr, goto out_put_task_struct; } - ret = ptrace_check_attach(child, request == PTRACE_KILL || - request == PTRACE_INTERRUPT); + if (request == PTRACE_KILL) { + ret = ptrace_kill(child); + goto out_put_task_struct; + } + + ret = ptrace_check_attach(child, request == PTRACE_INTERRUPT); if (ret < 0) goto out_put_task_struct; @@ -1449,8 +1462,12 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_long_t, request, compat_long_t, pid, goto out_put_task_struct; } - ret = ptrace_check_attach(child, request == PTRACE_KILL || - request == PTRACE_INTERRUPT); + if (request == PTRACE_KILL) { + ret = ptrace_kill(child); + goto out_put_task_struct; + } + + ret = ptrace_check_attach(child, request == PTRACE_INTERRUPT); if (!ret) { ret = compat_arch_ptrace(child, request, addr, data); if (ret || request != PTRACE_DETACH) ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH][RFC] fix PTRACE_KILL 2021-08-16 21:50 [PATCH][RFC] fix PTRACE_KILL Al Viro @ 2021-08-17 18:21 ` Eric W. Biederman 2021-08-25 5:12 ` Al Viro 0 siblings, 1 reply; 5+ messages in thread From: Eric W. Biederman @ 2021-08-17 18:21 UTC (permalink / raw) To: Al Viro; +Cc: Linus Torvalds, Oleg Nesterov, linux-kernel Al Viro <viro@zeniv.linux.org.uk> writes: > [Cc'd to security@k.o, *NOT* because I consider it a serious security hole; > it's just that the odds of catching relevant reviewers there are higher > than on l-k and there doesn't seem to be any lists where that would be > on-topic. My apologies for misuse of security@k.o ;-/] Hmm. I don't see security@kernel.org Cc'd. > Current implementation is racy in quite a few ways - we check that > the child is traced by us and use ptrace_resume() to feed it > SIGKILL, provided that it's still alive. > > What we do not do is making sure that the victim is in ptrace stop; > as the result, it can go and violate all kinds of assumptions, > starting with "child->sighand won't change under ptrace_resume()", > "child->ptrace won't get changed under user_disable_single_step()", > etc. > > Note that ptrace(2) manpage has this to say: > > PTRACE_KILL > Send the tracee a SIGKILL to terminate it. (addr and data are > ignored.) > > This operation is deprecated; do not use it! Instead, send a > SIGKILL directly using kill(2) or tgkill(2). The problem with > PTRACE_KILL is that it requires the tracee to be in signal- > delivery-stop, otherwise it may not work (i.e., may complete > successfully but won't kill the tracee). By contrast, sending a > SIGKILL directly has no such limitation. > > So let it check (under tasklist_lock) that the victim is traced by us > and call sig_send_info() to feed it SIGKILL. It's easier that trying > to force ptrace_resume() into handling that mess and it's less brittle > that way. I took a quick look and despite being deprecated PTRACE_KILL appears to still have some active users (like gcc-10). So that seems to rule out just removing PTRACE_KILL. I looked at the bug that PTRACE_KILL only kills a process when it is stopped and it is present in Linux 1.0. Given that I expect userspace applications are ok with the current semantics rather than the intended semantics. The current semantics also include the weirdness that PTRACE_KILL only kills a process when it is stopped in ptrace_signal, and not at other ptrace stops. So rather than fix the code to do what was intended 27 years ago, why don't we accept the fact that PTRACE_KILL is equivalent to PTRACE_CONT with data = SIGKILL. If there are regressions or we really care we can tweak the return value to return 0 instead of -ESRCH when the process is not stopped. Something like this: diff --git a/kernel/ptrace.c b/kernel/ptrace.c index f8589bf8d7dc..f40f0a0ff70a 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -1221,8 +1221,6 @@ int ptrace_request(struct task_struct *child, long request, return ptrace_resume(child, request, data); case PTRACE_KILL: - if (child->exit_state) /* already dead */ - return 0; return ptrace_resume(child, request, SIGKILL); #ifdef CONFIG_HAVE_ARCH_TRACEHOOK @@ -1304,8 +1302,7 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr, goto out_put_task_struct; } - ret = ptrace_check_attach(child, request == PTRACE_KILL || - request == PTRACE_INTERRUPT); + ret = ptrace_check_attach(child, request == PTRACE_INTERRUPT); if (ret < 0) goto out_put_task_struct; @@ -1449,8 +1446,7 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_long_t, request, compat_long_t, pid, goto out_put_task_struct; } - ret = ptrace_check_attach(child, request == PTRACE_KILL || - request == PTRACE_INTERRUPT); + ret = ptrace_check_attach(child, request == PTRACE_INTERRUPT); if (!ret) { ret = compat_arch_ptrace(child, request, addr, data); if (ret || request != PTRACE_DETACH) Eric ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH][RFC] fix PTRACE_KILL 2021-08-17 18:21 ` Eric W. Biederman @ 2021-08-25 5:12 ` Al Viro 2021-08-27 18:54 ` Linus Torvalds 0 siblings, 1 reply; 5+ messages in thread From: Al Viro @ 2021-08-25 5:12 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Linus Torvalds, Oleg Nesterov, linux-kernel On Tue, Aug 17, 2021 at 01:21:03PM -0500, Eric W. Biederman wrote: > > So let it check (under tasklist_lock) that the victim is traced by us > > and call sig_send_info() to feed it SIGKILL. It's easier that trying > > to force ptrace_resume() into handling that mess and it's less brittle > > that way. > > I took a quick look and despite being deprecated PTRACE_KILL appears > to still have some active users (like gcc-10). So that seems to rule ???? What the hell would gcc do with ptrace() to start with, let alone with PTRACE_KILL? <greps> Egads... void ThreadSuspender::KillAllThreads() { for (uptr i = 0; i < suspended_threads_list_.ThreadCount(); i++) internal_ptrace(PTRACE_KILL, suspended_threads_list_.GetThreadID(i), nullptr, nullptr); } in libsanitizer/sanitizer_common/sanitizer_stoptheworld_linux_libcdep.cc OK, obvious comments regarding the authors' sanity aside, here "send SIGKILL" would clearly be fine. > I looked at the bug that PTRACE_KILL only kills a process when it is > stopped and it is present in Linux 1.0. Given that I expect userspace > applications are ok with the current semantics rather than the intended > semantics. The history is trickier than that. Originally (and that's all the way back to v6), ptrace(2) failed for everything that hadn't been in stopped state and traced by the caller. The only exception had been ptrace(pid, 0) (== PTRACE_TRACEME). The only stop back then had been in signal delivery. Predecessor of PTRACE_KILL had been ptrace(pid, 8) (before any enums and yes, the constant is still the same). It worked by arranging for exit() in the context of tracee; see /usr/sys/ken/sig.c in v6 tree (procxmt() for payload to be run in child, ptrace() for syscall itself). At some point between 4.3BSD and 4.4BSD PT_ATTACH had been added (== our PTRACE_ATTACH), and it obviously had checks of its own that did _not_ require the target to be stopped (let alone to be already traced by the caller). In Linux ptrace(2) had been added in 0.95 (well, at some point between 0.12 and 0.95). The first departure from the usual model had happened in 0.99pl4. For some reason PTRACE_DETACH had "target must be stopped" check removed. In 0.99pl14 PTRACE_KILL had joined PTRACE_DETACH in that weirdness, in 0.99pl14b PTRACE_DETACH went back to normal, but PTRACE_KILL had stuck that way. No idea why; ask Linus, maybe he remembers. It's been 28 years, though, so... At that point we had two ptrace stops - in signal delivery and in syscall entry (strace stuff). PTRACE_KILL had become unpleasantly racy - it would make the child runnable and set child->exit_code to SIGKILL, expecting the child to pick it from there once it regains the timeslice. If the child had been *not* in ptrace stop, all of that would have no effect whatsoever. ptrace(pid, PTRACE_CONT, ..., SIGKILL) still worked reliably at that point. That had changed 9 years later, when we had added new ptrace stops. execve() of a traced process used to raise SIGTRAP, with the usual signal delivery happening on the way to userland. In a0691b116f6a "Add new ptrace event tracing mechanism" we had suddenly grown a bunch of (optional) stops, by way of ptrace_event() - on exec/fork/vfork/clone/exit. Unfortunately, in each of those stops ptrace(pid, PTRACE_CONT, ..., signr) ended up with signr quietly ignored. And in any of them PTRACE_KILL (by then a close relative of PTRACE_CONT) would quietly do nothing. Note that ptrace stop on syscall entry/exit did explicitly raise the signal requested by PTRACE_CONT; the new ones had not bothered. Those stops had been uncommon. However, in 2004 (two years down the road) Roland's 0cc0515ba006 "[PATCH] make single-step into signal delivery stop in handler" had added a ptrace stop on single-stepping into a handler. And PTRACE_CONT/SIGKILL there will have that SIGKILL (or any other signal number we pass) completely ignored. Worse, changing that behaviour is impossible - if we make PTRACE_CONT in that state deliver a new signal, we risk breaking any debugger that relies upon the fact that in such situation ptrace(..., PTRACE_CONT, ...., signr) ignores signr and had done so for the last 17 years. By that point signr passing by PTRACE_CONT had become unreliable. There'd been no good way to tell that stop from the normal stop on SIGTRAP delivery, short of having examined the state during the previous stop *and* remembering that you've done PTRACE_SINGLESTEP from there. Both parts are required, unfortunately. Worse, it's not universal on all architectures - arm64, hexagon, ia64, microblaze, openrisc, parisc, powerpc, s390, sh, x86 and uml do it, the rest do not. Take a look at gdb source and grep for PTRACE_KILL in there; complaints are impressive (and well-founded). They end up doing sending SIGKILL by kill(2); in gdbserver they follow that with PTRACE_KILL, perhaps on the theory that there's no such thing as overkill... In 2011 we had another ptrace stop join that bunch - group stop handling for traced processes. At some point seccomp shite had been added to the "events" pile. All of those have signr (and PTRACE_KILL) ignored, but by then the thing had already been FUBAR. > The current semantics also include the weirdness that PTRACE_KILL only > kills a process when it is stopped in ptrace_signal, and not at other > ptrace stops. > > So rather than fix the code to do what was intended 27 years ago, > why don't we accept the fact that PTRACE_KILL is equivalent > to PTRACE_CONT with data = SIGKILL. Because it changes behaviour, making it fail visibly for threads not in ptrace stop? I agree that it should have been acting that way all along, but... the same 27 (28, really) years of the current "no error if it's not stopped" behaviour are there. > If there are regressions or we really care we can tweak the return value > to return 0 instead of -ESRCH when the process is not stopped. How? You obviously can't blindly change -ESRCH to 0 - it *is* the expected error when the recepient is not traced by us, after all. So we'd have to do... what exactly? Special-case PTRACE_KILL in ptrace_check_attach() *and* in its callers? Open-code ptrace_check_attach() for PTRACE_KILL, turning it into "yes, call ptrace_freeze_traced(), bailing out if it fails, but bailing out with 0 instead of -ESRCH", with ptrace_resume() done in the same function (ptrace_kill()?) we open-code it into? Frankly, all variants I see are both ugly and hard on the reader. In the code that is already way too subtle... BTW, glibc has an implementation for Hurd in sysdeps/mach/hurd/ptrace.c: case PTRACE_KILL: va_start (ap, request); pid = va_arg (ap, pid_t); va_end (ap); /* SIGKILL always just terminates the task, so normal kill is just the same when traced. */ return __kill (pid, SIGKILL); Which is to say, there it is turned into a signal right on libc level... It's a mess. We have two problems with userland API: 1) PTRACE_KILL doesn't return an error when the target is not stopped. 2) PTRACE_KILL *and* PTRACE_CONT are unreliable even for stopped targets - some ptrace stops ignore the signal passed with PTRACE_CONT (and ignore SIGKILL from PTRACE_KILL). PTRACE_CONT side of that is impossible to fix by now. And that's on top of the internal problems with ptrace_resume() done on non-stopped child. The change I'd proposed makes PTRACE_KILL deliver SIGKILL regardless of the target state; yours is arguably what we should've done from the very beginning (what we used to have prior to 0.99pl14 and what all other Unices had been doing all along), but it's a visible change wrt error returns and I don't see any sane way to paper over that part. Linus, what would you prefer? I've no strong preferences here... ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH][RFC] fix PTRACE_KILL 2021-08-25 5:12 ` Al Viro @ 2021-08-27 18:54 ` Linus Torvalds 2021-08-27 22:05 ` Eric W. Biederman 0 siblings, 1 reply; 5+ messages in thread From: Linus Torvalds @ 2021-08-27 18:54 UTC (permalink / raw) To: Al Viro; +Cc: Eric W. Biederman, Oleg Nesterov, Linux Kernel Mailing List [ Sorry, this got missed by other stuff in my inbox ] On Tue, Aug 24, 2021 at 10:12 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > The change I'd proposed makes PTRACE_KILL deliver SIGKILL regardless of > the target state; yours is arguably what we should've done from the very > beginning (what we used to have prior to 0.99pl14 and what all other > Unices had been doing all along), but it's a visible change wrt error > returns and I don't see any sane way to paper over that part. > > Linus, what would you prefer? I've no strong preferences here... Honestly, I have no huge preferences either, simply because this has clearly been broken so long that people who care have worked around the breakage already, or it just didn't matter enough. Your patch looks fine to me, although looking at it I get the feeling that we might as well do it inside "ptrace_check_attach()", and that we should have just passed that function the request (instead of that odd "ignore_state" argument). ptrace_check_attach() already gets that tasklist_lock that the code requires, and could easily do the PTRACE_KILL checking. So I think that might be an additional cleanup. But your patch is targeted and doesn't look wrong to me either. I don't think Eric's patch works, because if I read that one right, it would actually do that "wait_task_inactive()" which is very much against the whole point of PTRACE_KILL. We want to kill the target asap, and regardless of where it is stuck. Linus ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH][RFC] fix PTRACE_KILL 2021-08-27 18:54 ` Linus Torvalds @ 2021-08-27 22:05 ` Eric W. Biederman 0 siblings, 0 replies; 5+ messages in thread From: Eric W. Biederman @ 2021-08-27 22:05 UTC (permalink / raw) To: Linus Torvalds; +Cc: Al Viro, Oleg Nesterov, Linux Kernel Mailing List Linus Torvalds <torvalds@linux-foundation.org> writes: > [ Sorry, this got missed by other stuff in my inbox ] > > On Tue, Aug 24, 2021 at 10:12 PM Al Viro <viro@zeniv.linux.org.uk> wrote: >> >> The change I'd proposed makes PTRACE_KILL deliver SIGKILL regardless of >> the target state; yours is arguably what we should've done from the very >> beginning (what we used to have prior to 0.99pl14 and what all other >> Unices had been doing all along), but it's a visible change wrt error >> returns and I don't see any sane way to paper over that part. >> >> Linus, what would you prefer? I've no strong preferences here... > > Honestly, I have no huge preferences either, simply because this has > clearly been broken so long that people who care have worked around > the breakage already, or it just didn't matter enough. The two choices are change PTRACE_KILL into: "kill(SIGKILL)" or "ptrace(PTRACE_CONT, SIGKILL)". When trying to figure out how userspace understands PTRACE_KILL today, I have found at least one piece of userspace that figures the definition of PTRACE_KILL is PTRACE_CONT with the signal SIGKILL. Frankly I find that a fair characterization, because except for the return code when the task is not stopped PTRACE_KILL works exactly like PTRACE_CONT with signal SIGKILL. As there are real userspace visible differences between PTRACE_CONT and kill(SIGKILL) I think we should go with the PTRACE_CONT definition as that is the least likely to break userspace, and the only thing we really care about here is making it so that malicious userspace can not cause the kernel to misbehave. > I don't think Eric's patch works, because if I read that one right, it > would actually do that "wait_task_inactive()" which is very much > against the whole point of PTRACE_KILL. We want to kill the target > asap, and regardless of where it is stuck. It will do wait_task_inactive in one of the cases where PTRACE_KILL is broken today. But since the cost of a wait_task_inactive looks very much like the cost of a spin_lock I don't see the problem. The code in ptrace_check_attach already waits on several other spin locks. The code in ptrace_attach verifies that the target task in in state TASK_TRACED. Which means that the target task is in ptrace_stop and has already done "set_special_state(TASK_TRACED)". At that point there are two possibilities. Either the target task will have reached freezable_schedule in ptrace_stop and wait_task_inactive won't block at all, or wait_task_inactive will need to spin waiting for the target task to reach freezable_schedule. There is nothing in ptrace_stop that can sleep as all of the work happens under a spin lock so the worst case wait in wait_task_inactive should be quite short. All of that said it is probably worth adding to my patch the logic so that when ptrace_freeze_traced fails or wait_task_inactive fails the code bails out immediately and returns 0 instead of -ESRCH. Just to avoid any userspace behavioral differences in PTRACE_KILL. So we don't even need to consider regressions. Eric ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-08-27 22:06 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-08-16 21:50 [PATCH][RFC] fix PTRACE_KILL Al Viro 2021-08-17 18:21 ` Eric W. Biederman 2021-08-25 5:12 ` Al Viro 2021-08-27 18:54 ` Linus Torvalds 2021-08-27 22:05 ` Eric W. Biederman
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.