From: ebiederm@xmission.com (Eric W. Biederman) To: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org>, Aleksa Sarai <asarai@suse.com>, Andy Lutomirski <luto@amacapital.net>, Attila Fazekas <afazekas@redhat.com>, Jann Horn <jann@thejh.net>, Kees Cook <keescook@chromium.org>, Michal Hocko <mhocko@kernel.org>, Ulrich Obergfell <uobergfe@redhat.com>, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org Subject: Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped. Date: Sun, 02 Apr 2017 16:07:17 -0500 [thread overview] Message-ID: <87inmmbjsq.fsf@xmission.com> (raw) In-Reply-To: <20170402161518.GC12637@redhat.com> (Oleg Nesterov's message of "Sun, 2 Apr 2017 18:15:18 +0200") Oleg Nesterov <oleg@redhat.com> writes: > Perhaps I am wrong, but I think you underestimate the problems, and it is > not clear to me if we really want this. I worked through quite a bit of it and I realized a few fundamental issues. The task struct must remain visible until it is reaped and we use siglock to protect in unhash process to protect that reaping. Further tsk->sighand == NULL winds up being a flag used to tell if release_task has been called. To get an usable count on sighand struct all that needed to be done was to change the reference counting of sighand_struct to count processes and not threads. Which is what I wound up posting. > Anyway, Eric, even if we can and want to do this, why we can't do this on > top of my fix? Because your reduction in scope of cred_guard_mutex is fundamentally broken and unnecessary. > I simply fail to understand why you dislike it that much. Yes it is not > pretty, I said this many times, but it is safe in that it doesn't really > change the current behaviour. No it is not safe. And it promotes wrong thinking which is even more dangerous. I reviewed the code and cred_guard_mutex needs to cover what it covers. > I am much more worried about 2/2 you didn't argue with, this patch _can_ > break something and this is obviously not good even if PTRACE_EVENT_EXIT > was always broken. I don't know who actually useses PTRACE_O_TRACEEXIT so I don't actually know what the implications of changing it are. Let's see... gdb - no upstart - no lldb - yes strace - no It looks like lldb is worth testing with your PTRACE_EVENT_EXIT change to see if anything breaks. I think we can get away with changing the exec case but it does look worth testing. I hadn't realized you hadn't looked to see what was using PTRACE_O_TRACEEXIT to see if any part of userspace cares. Hmm. This is interesting. From the strace documentation: > Tracer cannot assume that ptrace-stopped tracee exists. There are many > scenarios when tracee may die while stopped (such as SIGKILL). > Therefore, tracer must always be prepared to handle ESRCH error on any > ptrace operation. Unfortunately, the same error is returned if tracee > exists but is not ptrace-stopped (for commands which require stopped > tracee), or if it is not traced by process which issued ptrace call. > Tracer needs to keep track of stopped/running state, and interpret > ESRCH as "tracee died unexpectedly" only if it knows that tracee has > been observed to enter ptrace-stop. Note that there is no guarantee > that waitpid(WNOHANG) will reliably report tracee's death status if > ptrace operation returned ESRCH. waitpid(WNOHANG) may return 0 instead. > IOW: tracee may be "not yet fully dead" but already refusing ptrace > ops. If delivering a second SIGKILL to a ptraced stopped processes will make it continue we have a very interesting out.. When we stop in ptrace_stop we stop in TASK_TRACED == (TASK_WAKEKILL|__TASK_TRACED) Delivery of a SIGKILL to that task has queue SIGKILL and call signal_wake_up_state(t, TASK_WAKEKILL). Which becomes wake_up_state(t, TASK_INTERRUPTIBLE | TASK_WAKEKILL) Which wakes up the process. So userspace can absolutely kill a processes in PTRACE_EVENT_EXIT before the tracers find it. Therefore we are only talking a quality of implementation issue if we actually stop and wait for the tracer or not. .... Which brings us to your PTRACE_EVENT_EXIT patch. I think may_ptrace_stop is tested in the wrong place, and is probably buggy. - We should send the signal in all cases except when the ptracing parent does not exist aka (!current->ptrace). The siginfo contains enough information to understand what happened if anyone is listening. - Then we should send the group stop. - Then if we don't want to wait we should: __set_current_state(TASK_RUNNING) - Then we should drop the locks and only call freezable_schedule if we want to wait. That way userspace thinks someone else just sent a SIGKILL and killed the thread before it had a chance to look (which is effectively what we are doing). That sounds idea for both core-dumps and this case. Eric
WARNING: multiple messages have this Message-ID (diff)
From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) To: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>, Aleksa Sarai <asarai-IBi9RG/b67k@public.gmane.org>, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>, Attila Fazekas <afazekas-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, Jann Horn <jann-XZ1E9jl8jIdeoWH0uzbU5w@public.gmane.org>, Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>, Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>, Ulrich Obergfell <uobergfe-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Subject: Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped. Date: Sun, 02 Apr 2017 16:07:17 -0500 [thread overview] Message-ID: <87inmmbjsq.fsf@xmission.com> (raw) In-Reply-To: <20170402161518.GC12637-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> (Oleg Nesterov's message of "Sun, 2 Apr 2017 18:15:18 +0200") Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes: > Perhaps I am wrong, but I think you underestimate the problems, and it is > not clear to me if we really want this. I worked through quite a bit of it and I realized a few fundamental issues. The task struct must remain visible until it is reaped and we use siglock to protect in unhash process to protect that reaping. Further tsk->sighand == NULL winds up being a flag used to tell if release_task has been called. To get an usable count on sighand struct all that needed to be done was to change the reference counting of sighand_struct to count processes and not threads. Which is what I wound up posting. > Anyway, Eric, even if we can and want to do this, why we can't do this on > top of my fix? Because your reduction in scope of cred_guard_mutex is fundamentally broken and unnecessary. > I simply fail to understand why you dislike it that much. Yes it is not > pretty, I said this many times, but it is safe in that it doesn't really > change the current behaviour. No it is not safe. And it promotes wrong thinking which is even more dangerous. I reviewed the code and cred_guard_mutex needs to cover what it covers. > I am much more worried about 2/2 you didn't argue with, this patch _can_ > break something and this is obviously not good even if PTRACE_EVENT_EXIT > was always broken. I don't know who actually useses PTRACE_O_TRACEEXIT so I don't actually know what the implications of changing it are. Let's see... gdb - no upstart - no lldb - yes strace - no It looks like lldb is worth testing with your PTRACE_EVENT_EXIT change to see if anything breaks. I think we can get away with changing the exec case but it does look worth testing. I hadn't realized you hadn't looked to see what was using PTRACE_O_TRACEEXIT to see if any part of userspace cares. Hmm. This is interesting. From the strace documentation: > Tracer cannot assume that ptrace-stopped tracee exists. There are many > scenarios when tracee may die while stopped (such as SIGKILL). > Therefore, tracer must always be prepared to handle ESRCH error on any > ptrace operation. Unfortunately, the same error is returned if tracee > exists but is not ptrace-stopped (for commands which require stopped > tracee), or if it is not traced by process which issued ptrace call. > Tracer needs to keep track of stopped/running state, and interpret > ESRCH as "tracee died unexpectedly" only if it knows that tracee has > been observed to enter ptrace-stop. Note that there is no guarantee > that waitpid(WNOHANG) will reliably report tracee's death status if > ptrace operation returned ESRCH. waitpid(WNOHANG) may return 0 instead. > IOW: tracee may be "not yet fully dead" but already refusing ptrace > ops. If delivering a second SIGKILL to a ptraced stopped processes will make it continue we have a very interesting out.. When we stop in ptrace_stop we stop in TASK_TRACED == (TASK_WAKEKILL|__TASK_TRACED) Delivery of a SIGKILL to that task has queue SIGKILL and call signal_wake_up_state(t, TASK_WAKEKILL). Which becomes wake_up_state(t, TASK_INTERRUPTIBLE | TASK_WAKEKILL) Which wakes up the process. So userspace can absolutely kill a processes in PTRACE_EVENT_EXIT before the tracers find it. Therefore we are only talking a quality of implementation issue if we actually stop and wait for the tracer or not. .... Which brings us to your PTRACE_EVENT_EXIT patch. I think may_ptrace_stop is tested in the wrong place, and is probably buggy. - We should send the signal in all cases except when the ptracing parent does not exist aka (!current->ptrace). The siginfo contains enough information to understand what happened if anyone is listening. - Then we should send the group stop. - Then if we don't want to wait we should: __set_current_state(TASK_RUNNING) - Then we should drop the locks and only call freezable_schedule if we want to wait. That way userspace thinks someone else just sent a SIGKILL and killed the thread before it had a chance to look (which is effectively what we are doing). That sounds idea for both core-dumps and this case. Eric
next prev parent reply other threads:[~2017-04-02 21:12 UTC|newest] Thread overview: 93+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-02-13 14:14 [PATCH 0/2] fix the traced mt-exec deadlock Oleg Nesterov 2017-02-13 14:15 ` [PATCH 1/2] exec: don't wait for zombie threads with cred_guard_mutex held Oleg Nesterov 2017-02-13 16:12 ` kbuild test robot 2017-02-13 16:47 ` Oleg Nesterov 2017-02-13 16:39 ` kbuild test robot 2017-02-13 17:27 ` Mika Penttilä 2017-02-13 18:01 ` Oleg Nesterov 2017-02-13 18:04 ` [PATCH V2 " Oleg Nesterov 2017-02-16 11:42 ` Eric W. Biederman 2017-02-20 15:22 ` Oleg Nesterov 2017-02-20 15:36 ` Oleg Nesterov 2017-02-20 22:30 ` Eric W. Biederman 2017-02-21 17:53 ` Oleg Nesterov 2017-02-21 20:20 ` Eric W. Biederman 2017-02-22 17:41 ` Oleg Nesterov 2017-02-17 4:42 ` Eric W. Biederman 2017-02-20 15:50 ` Oleg Nesterov 2017-02-13 14:15 ` [PATCH 2/2] ptrace: ensure PTRACE_EVENT_EXIT won't stop if the tracee is killed by exec Oleg Nesterov 2017-02-24 16:03 ` [PATCH 0/2] fix the traced mt-exec deadlock Oleg Nesterov 2017-03-03 1:05 ` Eric W. Biederman 2017-03-03 17:33 ` Oleg Nesterov 2017-03-03 18:23 ` Eric W. Biederman 2017-03-03 18:23 ` Eric W. Biederman 2017-03-03 18:59 ` Eric W. Biederman 2017-03-03 18:59 ` Eric W. Biederman 2017-03-03 20:06 ` Eric W. Biederman 2017-03-03 20:06 ` Eric W. Biederman 2017-03-03 20:11 ` [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped Eric W. Biederman 2017-03-03 20:11 ` Eric W. Biederman 2017-03-04 17:03 ` Oleg Nesterov 2017-03-30 8:07 ` Eric W. Biederman 2017-04-01 5:11 ` [RFC][PATCH 0/2] exec: Fixing ptrace'd mulit-threaded hang Eric W. Biederman 2017-04-01 5:11 ` Eric W. Biederman 2017-04-01 5:14 ` [RFC][PATCH 1/2] sighand: Count each thread group once in sighand_struct Eric W. Biederman 2017-04-01 5:14 ` Eric W. Biederman 2017-04-01 5:16 ` [RFC][PATCH 2/2] exec: If possible don't wait for ptraced threads to be reaped Eric W. Biederman 2017-04-01 5:16 ` Eric W. Biederman 2017-04-02 15:35 ` Oleg Nesterov 2017-04-02 15:35 ` Oleg Nesterov 2017-04-02 18:53 ` Eric W. Biederman 2017-04-02 18:53 ` Eric W. Biederman 2017-04-03 18:12 ` Oleg Nesterov 2017-04-03 18:12 ` Oleg Nesterov 2017-04-03 21:04 ` Eric W. Biederman 2017-04-05 16:44 ` Oleg Nesterov 2017-04-02 15:38 ` [RFC][PATCH 0/2] exec: Fixing ptrace'd mulit-threaded hang Oleg Nesterov 2017-04-02 15:38 ` Oleg Nesterov 2017-04-02 22:50 ` [RFC][PATCH v2 0/5] " Eric W. Biederman 2017-04-02 22:50 ` Eric W. Biederman 2017-04-02 22:51 ` [RFC][PATCH v2 1/5] ptrace: Don't wait in PTRACE_O_TRACEEXIT for exec or coredump Eric W. Biederman 2017-04-02 22:51 ` Eric W. Biederman 2017-04-05 16:19 ` Oleg Nesterov 2017-04-02 22:51 ` [RFC][PATCH v2 2/5] sighand: Count each thread group once in sighand_struct Eric W. Biederman 2017-04-02 22:51 ` Eric W. Biederman 2017-04-02 22:52 ` [RFC][PATCH v2 3/5] clone: Disallown CLONE_THREAD with a shared sighand_struct Eric W. Biederman 2017-04-02 22:52 ` Eric W. Biederman 2017-04-05 16:24 ` Oleg Nesterov 2017-04-05 16:24 ` Oleg Nesterov 2017-04-05 17:34 ` Eric W. Biederman 2017-04-05 18:11 ` Oleg Nesterov 2017-04-02 22:53 ` [RFC][PATCH v2 4/5] exec: If possible don't wait for ptraced threads to be reaped Eric W. Biederman 2017-04-02 22:53 ` Eric W. Biederman 2017-04-05 16:15 ` Oleg Nesterov 2017-04-02 22:57 ` [RFC][PATCH v2 5/5] signal: Don't allow accessing signal_struct by old threads after exec Eric W. Biederman 2017-04-02 22:57 ` Eric W. Biederman 2017-04-05 16:18 ` Oleg Nesterov 2017-04-05 16:18 ` Oleg Nesterov 2017-04-05 18:16 ` Eric W. Biederman 2017-04-05 18:16 ` Eric W. Biederman 2017-04-06 15:48 ` Oleg Nesterov 2017-04-06 15:48 ` Oleg Nesterov 2017-04-02 16:15 ` [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped Oleg Nesterov 2017-04-02 16:15 ` Oleg Nesterov 2017-04-02 21:07 ` Eric W. Biederman [this message] 2017-04-02 21:07 ` Eric W. Biederman 2017-04-03 18:37 ` Oleg Nesterov 2017-04-03 18:37 ` Oleg Nesterov 2017-04-03 22:49 ` Eric W. Biederman 2017-04-03 22:49 ` Eric W. Biederman 2017-04-03 22:49 ` scope of cred_guard_mutex Eric W. Biederman 2017-04-03 22:49 ` Eric W. Biederman 2017-04-05 16:08 ` Oleg Nesterov 2017-04-05 16:11 ` Kees Cook 2017-04-05 17:53 ` Eric W. Biederman 2017-04-05 18:15 ` Oleg Nesterov 2017-04-06 15:55 ` Oleg Nesterov 2017-04-06 15:55 ` Oleg Nesterov 2017-04-07 22:07 ` Kees Cook 2017-04-07 22:07 ` Kees Cook 2017-09-04 3:19 ` [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped Robert O'Callahan 2017-09-04 3:19 ` Robert O'Callahan 2017-03-04 16:54 ` [PATCH 0/2] fix the traced mt-exec deadlock Oleg Nesterov 2017-03-04 16:54 ` Oleg Nesterov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=87inmmbjsq.fsf@xmission.com \ --to=ebiederm@xmission.com \ --cc=afazekas@redhat.com \ --cc=akpm@linux-foundation.org \ --cc=asarai@suse.com \ --cc=jann@thejh.net \ --cc=keescook@chromium.org \ --cc=linux-api@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=luto@amacapital.net \ --cc=mhocko@kernel.org \ --cc=oleg@redhat.com \ --cc=uobergfe@redhat.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.