From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758798AbbKSQt5 (ORCPT ); Thu, 19 Nov 2015 11:49:57 -0500 Received: from mx1.redhat.com ([209.132.183.28]:44261 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756882AbbKSQtz (ORCPT ); Thu, 19 Nov 2015 11:49:55 -0500 Message-ID: <564DFDAF.3000402@redhat.com> Date: Thu, 19 Nov 2015 16:49:51 +0000 From: Pedro Alves User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Oleg Nesterov , Tejun Heo CC: Jan Kratochvil , Andrey Ryabinin , Roland McGrath , LKML Subject: Re: ptrace() hangs on attempt to seize/attach stopped & frozen task References: <5640B7F2.70406@virtuozzo.com> <20151109185506.GA22744@redhat.com> <20151109180207.GA28507@mtj.duckdns.org> <20151110202017.GA2976@redhat.com> <20151116184516.GJ18894@mtj.duckdns.org> <20151117193419.GA9993@redhat.com> In-Reply-To: <20151117193419.GA9993@redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/17/2015 07:34 PM, Oleg Nesterov wrote: > On 11/16, Tejun Heo wrote: >>> And perhaps we can simply remove this logic? I forgot why do we hide this >>> STOPPED -> RUNNING -> TRACED transition from the attaching thread. But the >>> vague feeling tells me that we discussed this before and perhaps it was me >>> who suggested to avoid the user-visible change when you introduced this >>> transition... >> >> Heh, it was too long ago for me to remember much. :) > > Same here... > >>> Anyway, now I do not understand why do we want to hide it. Lets consider >>> the following "test-case", >>> >>> void test(int pid) >>> { >>> kill(pid, SIGSTOP); >>> waitpid(pid, NULL, WSTOPPED); >>> >>> ptrace(PTRACE_ATTACH-OR-PTRACE_SEIZE, pid, 0,0); >>> >>> assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0); >>> } >>> >>> Yes, it will fail if we remove JOBCTL_TRAPPING. But it can equally fail >>> if SIGCONT comes before ATTACH, so perhaps we do not really care? >>> >>> Jan, Pedro, do you think the patch below can break gdb somehow? With this >>> patch you can never assume that waitpid(WNOHANG) or ptrace(WHATEVER) will >>> succeed right after PTRACE_ATTACH/PTRACE_SEIZE, even if you know that the >>> tracee was TASK_STOPPED before attach. Not sure, because I don't think I fully understand that proposed change. Both GDB and gdbserver have special processing for attaching to already-stopped processes. (and neither use PTRACE_SEIZE yet.) Here's the gdbserver version: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/gdbserver/linux-low.c;h=41ab510fa4ac5654f101f08efb68e26b5bc5dbd7;hb=HEAD#l903 Copied here for convenience: 907 linux_attach_lwp (ptid_t ptid) 908 { 909 struct lwp_info *new_lwp; 910 int lwpid = ptid_get_lwp (ptid); 911 912 if (ptrace (PTRACE_ATTACH, lwpid, (PTRACE_TYPE_ARG3) 0, (PTRACE_TYPE_ARG4) 0) 913 != 0) 914 return errno; 915 916 new_lwp = add_lwp (ptid); 917 918 /* We need to wait for SIGSTOP before being able to make the next 919 ptrace call on this LWP. */ 920 new_lwp->must_set_ptrace_flags = 1; 921 922 if (linux_proc_pid_is_stopped (lwpid)) 923 { 924 if (debug_threads) 925 debug_printf ("Attached to a stopped process\n"); 926 927 /* The process is definitely stopped. It is in a job control 928 stop, unless the kernel predates the TASK_STOPPED / 929 TASK_TRACED distinction, in which case it might be in a 930 ptrace stop. Make sure it is in a ptrace stop; from there we 931 can kill it, signal it, et cetera. 932 933 First make sure there is a pending SIGSTOP. Since we are 934 already attached, the process can not transition from stopped 935 to running without a PTRACE_CONT; so we know this signal will 936 go into the queue. The SIGSTOP generated by PTRACE_ATTACH is 937 probably already in the queue (unless this kernel is old 938 enough to use TASK_STOPPED for ptrace stops); but since 939 SIGSTOP is not an RT signal, it can only be queued once. */ 940 kill_lwp (lwpid, SIGSTOP); 941 942 /* Finally, resume the stopped process. This will deliver the 943 SIGSTOP (or a higher priority signal, just like normal 944 PTRACE_ATTACH), which we'll catch later on. */ 945 ptrace (PTRACE_CONT, lwpid, (PTRACE_TYPE_ARG3) 0, (PTRACE_TYPE_ARG4) 0); 946 } 947 948 /* The next time we wait for this LWP we'll see a SIGSTOP as PTRACE_ATTACH 949 brings it to a halt. 950 linux_proc_pid_is_stopped checks whether the state in /proc/pid/status is "T (stopped)". Here's the equivalent in gdb: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/linux-nat.c;h=841ec3949c37438dfba924d8db6b37ffc416dd29;hb=HEAD#l974 This queuing of a SIGSTOP + PTRACE_CONT was necessary because otherwise when gdb attaches to a job stopped process, gdb would hang in the waitpid after PTRACE_ATTACH, waiting for the initial SIGSTOP which would never arrive. If the proposed change makes it so that a new intermediate state can be observed right after PTRACE_ATTACH, and so linux_proc_pid_is_stopped can return false, then there's potential for breakage. But maybe not, if we're sure that that when that happens, waitpid returns for the initial PTRACE_ATTACH-induced SIGSTOP. Thanks, Pedro Alves