From mboxrd@z Thu Jan 1 00:00:00 1970
From: ebiederm@xmission.com (Eric W. Biederman)
To: Oleg Nesterov
Cc: Andrew Morton, Aleksa Sarai, Andy Lutomirski, Attila Fazekas,
	Jann Horn, Kees Cook, Michal Hocko, Ulrich Obergfell,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org
Subject: Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.
Date: Sun, 02 Apr 2017 16:07:17 -0500
Message-ID: <87inmmbjsq.fsf@xmission.com>
In-Reply-To: <20170402161518.GC12637@redhat.com> (Oleg Nesterov's message of "Sun, 2 Apr 2017 18:15:18 +0200")
References: <20170213141452.GA30203@redhat.com> <20170224160354.GA845@redhat.com>
	<87shmv6ufl.fsf@xmission.com> <20170303173326.GA17899@redhat.com>
	<87tw7axlr0.fsf@xmission.com> <87d1dyw5iw.fsf@xmission.com>
	<87tw7aunuh.fsf@xmission.com> <87lgsmunmj.fsf_-_@xmission.com>
	<20170304170312.GB13131@redhat.com> <8760ir192p.fsf@xmission.com>
	<20170402161518.GC12637@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Oleg Nesterov writes:

> Perhaps I am wrong, but I think you underestimate the problems, and it is
> not clear to me if we really want this.

I worked through quite a bit of it and I realized a few fundamental
issues.

The task struct must remain visible until it is reaped, and we use
siglock in __unhash_process to protect that reaping. Further,
tsk->sighand == NULL winds up being a flag used to tell if release_task
has been called.

To get a usable count on sighand_struct, all that needed to be done was
to change its reference counting to count processes and not threads,
which is what I wound up posting.

> Anyway, Eric, even if we can and want to do this, why we can't do this on
> top of my fix?

Because your reduction in scope of cred_guard_mutex is fundamentally
broken and unnecessary.

> I simply fail to understand why you dislike it that much. Yes it is not
> pretty, I said this many times, but it is safe in that it doesn't really
> change the current behaviour.

No, it is not safe. And it promotes wrong thinking, which is even more
dangerous.

I reviewed the code and cred_guard_mutex needs to cover what it covers.

> I am much more worried about 2/2 you didn't argue with, this patch _can_
> break something and this is obviously not good even if PTRACE_EVENT_EXIT
> was always broken.

I don't know who actually uses PTRACE_O_TRACEEXIT so I don't actually
know what the implications of changing it are. Let's see...

gdb - no
upstart - no
lldb - yes
strace - no

It looks like lldb is worth testing with your PTRACE_EVENT_EXIT change
to see if anything breaks. I think we can get away with changing the
exec case, but it does look worth testing.

I hadn't realized you hadn't looked to see what was using
PTRACE_O_TRACEEXIT to see if any part of userspace cares.

Hmm. This is interesting. From the strace documentation:

> Tracer cannot assume that ptrace-stopped tracee exists. There are many
> scenarios when tracee may die while stopped (such as SIGKILL).
> Therefore, tracer must always be prepared to handle ESRCH error on any
> ptrace operation. Unfortunately, the same error is returned if tracee
> exists but is not ptrace-stopped (for commands which require stopped
> tracee), or if it is not traced by process which issued ptrace call.
> Tracer needs to keep track of stopped/running state, and interpret
> ESRCH as "tracee died unexpectedly" only if it knows that tracee has
> been observed to enter ptrace-stop. Note that there is no guarantee
> that waitpid(WNOHANG) will reliably report tracee's death status if
> ptrace operation returned ESRCH.
> waitpid(WNOHANG) may return 0 instead.
> IOW: tracee may be "not yet fully dead" but already refusing ptrace
> ops.

If delivering a second SIGKILL to a ptrace-stopped process will make it
continue, we have a very interesting out.

When we stop in ptrace_stop we stop in
	TASK_TRACED == (TASK_WAKEKILL|__TASK_TRACED)
Delivery of a SIGKILL to that task queues SIGKILL and calls
	signal_wake_up_state(t, TASK_WAKEKILL)
which becomes
	wake_up_state(t, TASK_INTERRUPTIBLE | TASK_WAKEKILL)
which wakes up the process.

So userspace can absolutely kill a process in PTRACE_EVENT_EXIT before
the tracers find it. Therefore we are only talking about a quality of
implementation issue if we actually stop and wait for the tracer or not.

Which brings us to your PTRACE_EVENT_EXIT patch. I think
may_ptrace_stop is tested in the wrong place, and is probably buggy.

- We should send the signal in all cases except when the ptracing
  parent does not exist, aka (!current->ptrace). The siginfo contains
  enough information to understand what happened if anyone is listening.
- Then we should send the group stop.
- Then if we don't want to wait we should:
  __set_current_state(TASK_RUNNING)
- Then we should drop the locks and only call freezable_schedule if we
  want to wait.

That way userspace thinks someone else just sent a SIGKILL and killed
the thread before it had a chance to look (which is effectively what we
are doing).

That sounds ideal for both core-dumps and this case.

Eric