From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757497Ab1E3Qo1 (ORCPT ); Mon, 30 May 2011 12:44:27 -0400 Received: from mx1.redhat.com ([209.132.183.28]:36643 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757336Ab1E3Qo0 (ORCPT ); Mon, 30 May 2011 12:44:26 -0400 Date: Mon, 30 May 2011 18:42:52 +0200 From: Oleg Nesterov To: Denys Vlasenko Cc: Tejun Heo , jan.kratochvil@redhat.com, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, akpm@linux-foundation.org, indan@nul.nu Subject: Re: execve-under-ptrace API bug (was Re: Ptrace documentation, draft #3) Message-ID: <20110530164252.GB11325@redhat.com> References: <20110525143250.GJ10146@htj.dyndns.org> <201105300528.17384.vda.linux@googlemail.com> <20110530084906.GA11773@htj.dyndns.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05/30, Denys Vlasenko wrote: > > On Mon, May 30, 2011 at 1:40 PM, Denys Vlasenko > wrote: > > > > Which is fine. Can we make the death from this "internal SIGKILL" > > visible to the tracer of killed tracees? > > Ok, let's take a deeper look at API needs. What we need to report, and when? OK. but I'm afraid I am a bit confused ;) > We have three kinds of threads at execve: > 1. execve'ing thread, > 2. leader, two cases: (2a) leader is still alive, (2b) leader has exited by now. > 3. other threads. > > (3) is the most simple: API should report death of these threads. > There is no need to ensure these death notifications are reported > before execve syscall exit is reported. I guess you mean PTRACE_EVENT_EXIT? Probably yes, > They can be consumed > by tracer later. by wait(WEXITED), OK. > (1) execve'ing thread is obviously alive. current kernel already > reports its execve success. The only thing we need to add is > a way to retrieve its former pid, so that tracer can drop > former pid's data, and also to cater for the "two execve's" case. This is only needed if strace doesn't track the tracee's tgids, right? > PTRACE_EVENT_EXEC seems to be a good place to do it. > Say, using GETEVENTMSG? Yes, Tejun suggested the same. Ignoring the pid_ns issues, this is trivial. If the tracer runs in the parent namespace it is not, we can't simply record the old tid. Lets ignore the problems with namespaces for now... > (2) is the most problematic. If leader is still alive, should > we report its death? This makes sense since if we do, > and if we ensure its death is always reported before > PTRACE_EVENT_EXEC, Note that we simply can't report this after PTRACE_EVENT_EXEC because its tid was already re-used by the new group leader. And it is not trivial to report this before. Even if we forget about the technical problems, please recall that wait() can't work in this case. Forget about de_thread/exec, suppose that the group leader simply exits before other threads. Yes, we are going to change this somehow. But I am not sure it really makes sense to report the death of the old leader. Why? We know for sure it is already dead at PTRACE_EVENT_EXEC time, but at the same time it is better to pretend that it is not dead, it is the execve'ing thread who should be considered dead in some sense. IOW. Two threads, L is the leader with tid == tgid == 100, and T with tid = 101. T does execve(). After that we have the process with the same tgid and its new leader has tid == 100 as well. If we forget about the actual implementation, it is T who silently disappears, not L. OTOH, there is a problem: we should trace them both. Otherwise, if we only trace L, even GETEVENTMSG can't help. And this means we can only rely on PTRACE_EVENT_EXIT currently. Which needs fixes ;) We could add another trap, but why it would be better? In short: I do not think we can make what you want (assuming I understand your suggestion correctly). Consider the simple example: we are tracing the single thread and it is the group leader, another (untraced) thread execs. I do not think we should change de_thread() so that the execing thread should sleep waiting for waitpid(traced_leader_pid, WEXITED) from the tracer before it reuses its pid. And in any case, even if we do this, we should solve another problem with the dead group leader first. > We definitely must ensure, though, that if leader races with > execve'ing thread and enters exit(2), its death is never reported > *after* PTRACE_EVENT_EXEC Yes... but this is not possible? Oleg.