From: Denys Vlasenko
To: Oleg Nesterov
Subject: Re: execve-under-ptrace API bug (was Re: Ptrace documentation, draft #3)
Date: Tue, 31 May 2011 01:43:12 +0200
Cc: Tejun Heo, jan.kratochvil@redhat.com, linux-kernel@vger.kernel.org,
    torvalds@linux-foundation.org, akpm@linux-foundation.org, indan@nul.nu
References: <20110530164252.GB11325@redhat.com>
In-Reply-To: <20110530164252.GB11325@redhat.com>
Message-Id: <201105310143.12280.vda.linux@googlemail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Monday 30 May 2011 18:42, Oleg Nesterov wrote:
> On 05/30, Denys Vlasenko wrote:
> >
> > On Mon, May 30, 2011 at 1:40 PM, Denys Vlasenko wrote:
> > >
> > > Which is fine. Can we make the death from this "internal SIGKILL"
> > > visible to the tracer of killed tracees?
> >
> > Ok, let's take a deeper look at API needs. What we need to report, and when?
>
> OK. but I'm afraid I am a bit confused ;)

I am trying to write up the ptrace API (in this particular thread, wrt execve).
Basically, I try to sync up your / Jan's / Tejun's knowledge about
the following:

* how current kernels are supposed to work, both:
  - what we promise, and
  - what we DON'T promise (such as "don't expect ptrace ops to always
    succeed, you may get ESRCH any time", or "wait(WNOHANG) may return
    spurious 0"...)
* what actually does work (modulo unknown bugs),
* what is known to be "slightly" broken, but likely to be fixed,
  and finally,
* what is broken so hopelessly that some API changes/additions
  will be needed.

While working on this document, and thanks to your request to run
an actual test with multi-threaded execve, we just discovered that our
idea of how the API works now doesn't match reality: other threads
do not die silently. They do emit death notifications.
Only the execve'ing thread itself "disappears".

Let's decide how we want the ptrace API to work in this area.

The behavior I observed with the test program:

6797: thread 0 (leader): sleeps in pause()
6798: thread 1: sleeps in pause()
6799: thread 2: execve("/proc/self/exe")

Tracer sees the following:

6798: status:0006057f WIFSTOPPED sig:5 (TRAP) event:EXIT
6797: status:0006057f WIFSTOPPED sig:5 (TRAP) event:EXIT
6798: status:00000000 WIFEXITED exitcode:0
6797: status:0004057f WIFSTOPPED sig:5 (TRAP) event:EXEC

(I tested it with 10 threads and the pattern seems to be the same.)

Every thread including the leader, but excluding the execve'ing one,
reports EVENT_EXIT.
Then every thread, excluding the leader and excluding the execve'ing one,
reports WIFEXITED.

(question to you, Oleg:)
??? do we guarantee that EVENT_EXIT happens?
    Do we guarantee that WIFEXITED happens?
    (If not, do you think we can fix it, or are we better off
    not including such a guarantee in the API?)
    Do we guarantee the order between them?

Note: WIFEXITED of thread 1 can happen before EVENT_EXIT of thread 0.
IOW: there is no ordering *between* threads for these ptrace-stops.
(I saw reordering with more threads.)

Then we get EVENT_EXEC with pid of the leader.
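
For readers who want to check the status words in the trace by hand, here is a minimal sketch of the decoding. It assumes the usual Linux wait-status layout (low 7 bits for the terminating signal, 0x7f in the low byte for a stop, stop signal in bits 8-15, ptrace event number in bits 16 and up) and the event numbers from <linux/ptrace.h>. The function name describe_status is made up for illustration; it is not what the test program uses.

```python
# Ptrace event numbers as defined in <linux/ptrace.h>.
PTRACE_EVENTS = {
    1: "FORK",
    2: "VFORK",
    3: "CLONE",
    4: "EXEC",
    5: "VFORK_DONE",
    6: "EXIT",
}


def describe_status(status):
    """Decode a wait status roughly the way the trace prints it.

    Assumes the Linux encoding of the status word; describe_status is
    a hypothetical helper, not part of any real tool.
    """
    if status & 0x7F == 0:
        # WIFEXITED: exit code lives in bits 8-15.
        return "WIFEXITED exitcode:%d" % ((status >> 8) & 0xFF)
    if status & 0xFF == 0x7F:
        # WIFSTOPPED: stop signal in bits 8-15, ptrace event above that.
        sig = (status >> 8) & 0xFF
        event = status >> 16
        text = "WIFSTOPPED sig:%d" % sig
        if event:
            text += " event:%s" % PTRACE_EVENTS.get(event, str(event))
        return text
    # Anything else is death by signal (WIFSIGNALED).
    return "WIFSIGNALED sig:%d" % (status & 0x7F)


if __name__ == "__main__":
    for word in (0x0006057F, 0x00000000, 0x0004057F):
        print("status:%08x %s" % (word, describe_status(word)))
```

So 0006057f is a SIGTRAP stop carrying event 6 (EXIT), and 0004057f is a SIGTRAP stop carrying event 4 (EXEC).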
execve'ing thread's pid is no longer usable by tracer after this.

??? do we guarantee that this happens after all EVENT_EXITs and WIFEXITEDs?

> > (1) execve'ing thread is obviously alive. current kernel already
> > reports its execve success. The only thing we need to add is
> > a way to retrieve its former pid, so that tracer can drop
> > former pid's data, and also to cater for the "two execve's" case.
>
> This is only needed if strace doesn't track the tracee's tgids, right?
>
> > PTRACE_EVENT_EXEC seems to be a good place to do it.
> > Say, using GETEVENTMSG?
>
> Yes, Tejun suggested the same. Ignoring the pid_ns issues, this is trivial.
> If the tracer runs in the parent namespace it is not, we can't simply
> record the old tid. Lets ignore the problems with namespaces for now...

Yes, this would make tracer's life much easier if we'd tell it
what was the pid of the tracee which exec'ed, and therefore this pid is gone.

> OTOH, there is a problem: we should trace them both. Otherwise, if we
> only trace L, even GETEVENTMSG can't help.

In practice, people do this more rarely than tracing every thread.
But anyway, I have an idea...

> And this means we can only
> rely on PTRACE_EVENT_EXIT currently. Which needs fixes ;)

What is broken?

> In short: I do not think we can make what you want (assuming I understand
> your suggestion correctly). Consider the simple example: we are tracing
> the single thread and it is the group leader, another (untraced) thread
> execs. I do not know what would be the right behavior in this case.

It depends whether we consider "tracedness" to be attached to a pid
or to a thread of execution.

I think the better (more general) question is "what if both threads
are traced by _different_ tracers?".
Possible answers:

If we think "tracedness" is attached to pid:

tracer 0 (traces leader) sees:
status:0006057f WIFSTOPPED sig:5 (TRAP) event:EXIT
status:0004057f WIFSTOPPED sig:5 (TRAP) event:EXEC
tracer 1 (traces execve'ing thread) sees:
(nothing)

What is bad about it:
* tracer 1 has no idea whatsoever that its tracee is gone.

If we think "tracedness" is attached to thread (task struct):

tracer 0 (traces leader) sees:
status:0006057f WIFSTOPPED sig:5 (TRAP) event:EXIT
tracer 1 (traces execve'ing thread) sees:
status:0004057f WIFSTOPPED sig:5 (TRAP) event:EXEC, and pid has changed!

What is bad about it:
* tracer 0 expects yet another notification,
  "status:00000000 WIFEXITED exitcode:0" or similar,
  but it will never come.
* tracer 1 can be rather confused by getting EVENT_EXEC from a tracee
  it knows nothing about (since the pid has changed!). If it has more than
  one tracee, it can't guess which one did that. (Yes, it can resort
  to ugly racy hacks...)

I think the second case is "less broken". What API changes can make
it better for userspace?

First, returning the old pid via GETEVENTMSG helps with the second badness:
tracer 1 can fetch it, and understand which of its tracees changed pid
just now.

And second, if we'd return the "status:00000000 WIFEXITED exitcode:0" thing
on execve _for leader too_, then tracer 0 will be happy (it will see
a consistent sequence of events).

If it's hard to do, then alternatively, we can add this information
to EVENT_EXIT somehow. Normally, GETEVENTMSG returns exit status.
Can we hijack a bit there to say "don't expect WIFEXITED on me"?

Final touch may be to make the "I exited because some other thread exec'ed"
notification different from "I exited because of _exit(0)".
It would let strace say what _actually_ happened, which is a good thing.
Silly ideas department proposes returning WIFSIGNALED, WTERMSIG = 0 ;)

--
vda
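
P.S. To make the GETEVENTMSG idea concrete, here is a rough sketch of the bookkeeping a tracer could do at EVENT_EXEC. It assumes the proposal above is implemented, i.e. that the tracer can learn the former pid of the exec'ing thread at that stop, and it assumes "tracedness" follows the thread. The function name handle_exec_event and the dict-based tracee table are hypothetical, for illustration only. It models the tracer's table and makes no ptrace calls.

```python
def handle_exec_event(tracees, reported_pid, former_pid):
    """Update a tracer's pid -> state table after an EVENT_EXEC stop.

    tracees:      dict mapping pid to per-tracee state (hypothetical layout).
    reported_pid: pid the EXEC stop was reported with (the leader's pid).
    former_pid:   pid the exec'ing thread had before execve, as the
                  proposed GETEVENTMSG result would give it (assumption).
    """
    # The exec'ing thread keeps its state but loses its old pid.
    state = tracees.pop(former_pid, None)
    if state is None:
        # This tracer was not following the exec'ing thread.
        state = {}
    # Whatever was recorded for the old leader is stale now:
    # the exec'ing thread has taken over the leader's pid.
    tracees[reported_pid] = state
    return state


if __name__ == "__main__":
    table = {
        6797: {"role": "leader"},
        6798: {"role": "thread 1"},
        6799: {"role": "thread 2"},
    }
    handle_exec_event(table, reported_pid=6797, former_pid=6799)
    print(table)
```

In this sketch thread 1's entry stays in the table until its own WIFEXITED arrives. The old leader's entry is simply overwritten, which is exactly the place where tracer 0 would otherwise be left waiting for a notification that never comes.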