ptrace() hangs on attempt to seize/attach stopped & frozen task

All of lore.kernel.org
 help / color / mirror / Atom feed

* ptrace() hangs on attempt to seize/attach stopped & frozen task
@ 2015-11-09 15:12 Andrey Ryabinin
  2015-11-09 18:55 ` Oleg Nesterov
  2015-11-19 18:47 ` [PATCH 0/2] (Was: ptrace() hangs on attempt to seize/attach stopped & frozen task) Oleg Nesterov
  0 siblings, 2 replies; 16+ messages in thread
From: Andrey Ryabinin @ 2015-11-09 15:12 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Roland McGrath, Tejun Heo, LKML

Hi,

So, the ptrace() hangs if we try to attach to stopped task from freezing cgroup.
It seems this was introduced by 5d8f72b55c2756("freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()").
See below for the exact scenario and small script to reproduce this.


Tracee:                                                                 Tracer:
static bool do_signal_stop(int signr)
	__set_current_state(TASK_STOPPED);
	freezable_schedule();
		freezer_do_not_count();
		schedule(); /* waiting for wake up */

									ptrace_attach()
										if (task_is_stopped(task) &&
										    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
											signal_wake_up_state(task, __TASK_STOPPED);

		/* woken up by ptrace_attach() */
		freezer_count();
			__refrigerator()
										/* And here we will hang, because tracee is now frozen in __refrigerator() */
										wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT,
												TASK_UNINTERRUPTIBLE);



Reproduecer:
-----------
#!/bin/bash                                                                                                                                                                                                          

sleep 100 &
pid=$!
kill -SIGSTOP $pid
mkdir -p /sys/fs/cgroup/freezer/test
echo $pid > /sys/fs/cgroup/freezer/test/tasks
echo FROZEN > /sys/fs/cgroup/freezer/test/freezer.state

echo '#include <stdlib.h>                                                                                                                                                                                            
#include <sys/ptrace.h>                                                                                                                                                                                              
                                                                                                                                                                                                                     
int main(int argc, char  **argv)                                                                                                                                                                                     
{                                                                                                                                                                                                                    
  int pid = atoi(argv[1]);                                                                                                                                                                                           
  return ptrace(PTRACE_SEIZE, pid, NULL, 0);                                                                                                                                                                         
}                                                                                                                                                                                                                    
' | gcc -x c -
./a.out $pid



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ptrace() hangs on attempt to seize/attach stopped & frozen task
  2015-11-09 18:55 ` Oleg Nesterov
@ 2015-11-09 18:02   ` Tejun Heo
  2015-11-10 20:20     ` Oleg Nesterov
  2015-11-10 20:20   ` Oleg Nesterov
  1 sibling, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2015-11-09 18:02 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Andrey Ryabinin, Roland McGrath, LKML

Hello, Oleg.

> Thanks. All I can say I never liked this wait_on_bit() ;)
> 
> I need to think, but *at first glance* we can move this wait-for-stopped-
> traced-transition into do_wait() path, and this way clear_jobctl_trapping()
> can use __wake_up_parent(). Perhaps we just need to modify task_stopped_code()
> to take JOBCTL_TRAPPING into account...
> 
> Sure, debugger will block in sys_wait() after PTRACE_ATTACH/SEIZE. But this
> does not really differ from the case when the tracee was already frozen;
> SIGSTOP sent by ATTACH or PTRACE_INTERRUPT, so debugger will equally block
> in do_wait() until the tracee is unfrozen.

We simply need to reimplement cgroup freezer so that its userland
visible state is well defined (most likely jobctl stop).  Right now,
it's allowing userland to trigger "stuck somewhere in the kernel"
condition, so interactions with frozen tasks are naturally broken.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ptrace() hangs on attempt to seize/attach stopped & frozen task
  2015-11-09 15:12 ptrace() hangs on attempt to seize/attach stopped & frozen task Andrey Ryabinin
@ 2015-11-09 18:55 ` Oleg Nesterov
  2015-11-09 18:02   ` Tejun Heo
  2015-11-10 20:20   ` Oleg Nesterov
  2015-11-19 18:47 ` [PATCH 0/2] (Was: ptrace() hangs on attempt to seize/attach stopped & frozen task) Oleg Nesterov
  1 sibling, 2 replies; 16+ messages in thread
From: Oleg Nesterov @ 2015-11-09 18:55 UTC (permalink / raw)
  To: Andrey Ryabinin; +Cc: Roland McGrath, Tejun Heo, LKML

Hi,

On 11/09, Andrey Ryabinin wrote:
>
> Hi,
>
> So, the ptrace() hangs if we try to attach to stopped task from freezing cgroup.
> It seems this was introduced by 5d8f72b55c2756("freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()").

quite possible...

> See below for the exact scenario and small script to reproduce this.
>
>
> Tracee:                                                                 Tracer:
> static bool do_signal_stop(int signr)
> 	__set_current_state(TASK_STOPPED);
> 	freezable_schedule();
> 		freezer_do_not_count();
> 		schedule(); /* waiting for wake up */
>
> 									ptrace_attach()
> 										if (task_is_stopped(task) &&
> 										    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
> 											signal_wake_up_state(task, __TASK_STOPPED);
>
> 		/* woken up by ptrace_attach() */
> 		freezer_count();
> 			__refrigerator()
> 										/* And here we will hang, because tracee is now frozen in __refrigerator() */
> 										wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT,
> 												TASK_UNINTERRUPTIBLE);

Thanks. All I can say I never liked this wait_on_bit() ;)

I need to think, but *at first glance* we can move this wait-for-stopped-
traced-transition into do_wait() path, and this way clear_jobctl_trapping()
can use __wake_up_parent(). Perhaps we just need to modify task_stopped_code()
to take JOBCTL_TRAPPING into account...

Sure, debugger will block in sys_wait() after PTRACE_ATTACH/SEIZE. But this
does not really differ from the case when the tracee was already frozen;
SIGSTOP sent by ATTACH or PTRACE_INTERRUPT, so debugger will equally block
in do_wait() until the tracee is unfrozen.

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ptrace() hangs on attempt to seize/attach stopped & frozen task
  2015-11-09 18:55 ` Oleg Nesterov
  2015-11-09 18:02   ` Tejun Heo
@ 2015-11-10 20:20   ` Oleg Nesterov
  1 sibling, 0 replies; 16+ messages in thread
From: Oleg Nesterov @ 2015-11-10 20:20 UTC (permalink / raw)
  To: Andrey Ryabinin; +Cc: Roland McGrath, Tejun Heo, LKML

On 11/09, Oleg Nesterov wrote:
>
> Hi,
>
> On 11/09, Andrey Ryabinin wrote:
> >
> > Hi,
> >
> > So, the ptrace() hangs if we try to attach to stopped task from freezing cgroup.
> > It seems this was introduced by 5d8f72b55c2756("freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()").
>
> quite possible...

Not really, this was broken before that commit. Note that try_to_freeze()
was called under "relock:" label, so we had the same problem: the tracee
doesn't do do_jobctl_trap() after do_signal_stop() without try_to_freeze()
in between. Not that it matters, but still.

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ptrace() hangs on attempt to seize/attach stopped & frozen task
  2015-11-09 18:02   ` Tejun Heo
@ 2015-11-10 20:20     ` Oleg Nesterov
  2015-11-16 18:45       ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2015-11-10 20:20 UTC (permalink / raw)
  To: Tejun Heo, Jan Kratochvil, Pedro Alves
  Cc: Andrey Ryabinin, Roland McGrath, LKML

Hi Tejun,

On 11/09, Tejun Heo wrote:
>
> Hello, Oleg.
>
> > Thanks. All I can say I never liked this wait_on_bit() ;)
> >
> > I need to think, but *at first glance* we can move this wait-for-stopped-
> > traced-transition into do_wait() path, and this way clear_jobctl_trapping()
> > can use __wake_up_parent(). Perhaps we just need to modify task_stopped_code()
> > to take JOBCTL_TRAPPING into account...

It seems that we can simply kill JOBCTL_TRAPPING, no other changes are
needed.

But afaics task_stopped_code()->task_is_stopped_or_traced() looks confusing,
it should not see a TASK_STOPPED task if ptrace, task_is_traced() makes more
sense. This is off-topic.

> > Sure, debugger will block in sys_wait() after PTRACE_ATTACH/SEIZE. But this
> > does not really differ from the case when the tracee was already frozen;
> > SIGSTOP sent by ATTACH or PTRACE_INTERRUPT, so debugger will equally block
> > in do_wait() until the tracee is unfrozen.
>
> We simply need to reimplement cgroup freezer so that its userland
> visible state is well defined (most likely jobctl stop).  Right now,
> it's allowing userland to trigger "stuck somewhere in the kernel"
> condition, so interactions with frozen tasks are naturally broken.

I agree, the freezer is not perfect, and it needs changes.

Still I think this needs a fix in ptrace code. At least we should not
wait in TASK_UNINTERRUPTIBLE state.

And perhaps we can simply remove this logic? I forgot why do we hide this
STOPPED -> RUNNING -> TRACED transition from the attaching thread. But the
vague feeling tells me that we discussed this before and perhaps it was me
who suggested to avoid the user-visible change when you introduced this
transition...

Anyway, now I do not understand why do we want to hide it. Lets consider
the following "test-case",

	void test(int pid)
	{
		kill(pid, SIGSTOP);
		waitpid(pid, NULL, WSTOPPED);

		ptrace(PTRACE_ATTACH-OR-PTRACE_SEIZE, pid, 0,0);

		assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
	}

Yes, it will fail if we remove JOBCTL_TRAPPING. But it can equally fail
if SIGCONT comes before ATTACH, so perhaps we do not really care?

Jan, Pedro, do you think the patch below can break gdb somehow? With this
patch you can never assume that waitpid(WNOHANG) or ptrace(WHATEVER) will
succeed right after PTRACE_ATTACH/PTRACE_SEIZE, even if you know that the
tracee was TASK_STOPPED before attach.

Tejun, do you see any reason to keep JOBCTL_TRAPPING?

Oleg.

--- x/kernel/ptrace.c
+++ x/kernel/ptrace.c
@@ -338,21 +338,13 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * If the task is already STOPPED, set JOBCTL_TRAP_STOP and
 	 * TRAPPING, and kick it so that it transits to TRACED.  TRAPPING
 	 * will be cleared if the child completes the transition or any
-	 * event which clears the group stop states happens.  We'll wait
-	 * for the transition to complete before returning from this
-	 * function.
-	 *
-	 * This hides STOPPED -> RUNNING -> TRACED transition from the
-	 * attaching thread but a different thread in the same group can
-	 * still observe the transient RUNNING state.  IOW, if another
-	 * thread's WNOHANG wait(2) on the stopped tracee races against
-	 * ATTACH, the wait(2) may fail due to the transient RUNNING.
+	 * event which clears the group stop states happens.
 	 *
 	 * The following task_is_stopped() test is safe as both transitions
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP))
 		signal_wake_up_state(task, __TASK_STOPPED);

 	spin_unlock(&task->sighand->siglock);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ptrace() hangs on attempt to seize/attach stopped & frozen task
  2015-11-10 20:20     ` Oleg Nesterov
@ 2015-11-16 18:45       ` Tejun Heo
  2015-11-17 19:34         ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2015-11-16 18:45 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jan Kratochvil, Pedro Alves, Andrey Ryabinin, Roland McGrath, LKML

Hello, Oleg.

Sorry about the delay.

On Tue, Nov 10, 2015 at 09:20:17PM +0100, Oleg Nesterov wrote:
> > We simply need to reimplement cgroup freezer so that its userland
> > visible state is well defined (most likely jobctl stop).  Right now,
> > it's allowing userland to trigger "stuck somewhere in the kernel"
> > condition, so interactions with frozen tasks are naturally broken.
> 
> I agree, the freezer is not perfect, and it needs changes.
> 
> Still I think this needs a fix in ptrace code. At least we should not
> wait in TASK_UNINTERRUPTIBLE state.
> 
> And perhaps we can simply remove this logic? I forgot why do we hide this
> STOPPED -> RUNNING -> TRACED transition from the attaching thread. But the
> vague feeling tells me that we discussed this before and perhaps it was me
> who suggested to avoid the user-visible change when you introduced this
> transition...

Heh, it was too long ago for me to remember much. :)

> Anyway, now I do not understand why do we want to hide it. Lets consider
> the following "test-case",
> 
> 	void test(int pid)
> 	{
> 		kill(pid, SIGSTOP);
> 		waitpid(pid, NULL, WSTOPPED);
> 
> 		ptrace(PTRACE_ATTACH-OR-PTRACE_SEIZE, pid, 0,0);
> 
> 		assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
> 	}
> 
> Yes, it will fail if we remove JOBCTL_TRAPPING. But it can equally fail
> if SIGCONT comes before ATTACH, so perhaps we do not really care?
> 
> Jan, Pedro, do you think the patch below can break gdb somehow? With this
> patch you can never assume that waitpid(WNOHANG) or ptrace(WHATEVER) will
> succeed right after PTRACE_ATTACH/PTRACE_SEIZE, even if you know that the
> tracee was TASK_STOPPED before attach.
> 
> Tejun, do you see any reason to keep JOBCTL_TRAPPING?

Hmmm... It's nasty tho.  We're breaking a guaranteed userland behavior
to mask a deficiency (IMHO it's an outright bug) in a different
subsystem.  The problem here is that cgroup-frozen threads become
un-runnable on a running system and it doesn't make sense to me to
work around that from all the affected places rather than fixing it at
the source especially if that involves breaking a known supported
userland behavior.  This isn't different from the frozen processes
failing to respond to SIGKILL.  I'd be a lot more comfortable stating
that cgroup freezer is currently broken rather than diddling with
subtle ptrace semantics.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ptrace() hangs on attempt to seize/attach stopped & frozen task
  2015-11-17 19:34         ` Oleg Nesterov
@ 2015-11-17 18:57           ` Tejun Heo
  2015-11-19 16:49           ` Pedro Alves
  1 sibling, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2015-11-17 18:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jan Kratochvil, Pedro Alves, Andrey Ryabinin, Roland McGrath, LKML

Hey, Oleg.

On Tue, Nov 17, 2015 at 08:34:19PM +0100, Oleg Nesterov wrote:
> On 11/16, Tejun Heo wrote:
> >
> > *** WARNING: THE ATTACHED DOCUMENT(S) CONTAIN MACROS ***
> > *** MACROS MAY CONTAIN MALICIOUS CODE ***
> > *** Open only if you can verify and trust the sender ***
> > *** Please contact infosec@redhat.com if you have questions or concerns **
> 
> Hmm, infosec@redhat.com doesn't like you. But I dared to open and nothing
> happened so far. although perhaps you already own my machine.

lol no idea what's going on there but dude you gotta clean up the
browsing history.

> > Hmmm... It's nasty tho.  We're breaking a guaranteed userland behavior
> 
> Perhaps you are right, but I am wondering if it was ever guaranteed.
> 
> What actually annoys me is that now I am almost sure that it was me
> who asked you to hide this from user-space, and today I see no reason
> for this hack.
> 
> > I'd be a lot more comfortable stating
> > that cgroup freezer is currently broken rather than diddling with
> > subtle ptrace semantics.
> 
> OK, lets keep this JOBCTL_TRAPPING_BIT.
> 
> But still I would like to know what Pedro thinks...
> 
> Anyway, wait_on_bit(TASK_UNINTERRUPTIBLE) doesn't look good. Do you
> see any problem with the change below? Yes, the comment is not clear,
> it should be updated, the tracee can clear this bit too.
> 
> And perhaps we can change get_task_state() until freezer gets another state,
> 
> 	--- x/fs/proc/array.c
> 	+++ x/fs/proc/array.c
> 	@@ -126,6 +126,9 @@ static inline const char *get_task_state
> 	 {
> 		unsigned int state = (tsk->state | tsk->exit_state) & TASK_REPORT;
> 	 
> 	+	if (tsk->flags & PF_FROZEN)
> 	+		return "D (frozen)";
> 	+
> 		BUILD_BUG_ON(1 + ilog2(TASK_REPORT) != ARRAY_SIZE(task_state_array)-1);
> 	 
> 		return task_state_array[fls(state)];

Hmm... the only nit is that we'll eventually want to share "T
(stopped)" or do "T (frozen)" and switching down the road could be a
bit confusing.  It shouldn't be a big deal tho.  I think I'm mostly
reluctant to accomodate the broken behavior of cgroup freezer.

> --- x/kernel/ptrace.c
> +++ x/kernel/ptrace.c
> @@ -364,8 +364,13 @@ unlock_creds:
>  	mutex_unlock(&task->signal->cred_guard_mutex);
>  out:
>  	if (!retval) {
> -		wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT,
> -			    TASK_UNINTERRUPTIBLE);
> +		if (wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT,
> +				TASK_KILLABLE))
> +			/*
> +			 * We will clear JOBCTL_TRAPPING in __ptrace_unlink(),
> +			 * until then nobody can trace this task anyway.
> +			 */
> +			retval = -EINTR;

Yeah, this looks good to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ptrace() hangs on attempt to seize/attach stopped & frozen task
  2015-11-16 18:45       ` Tejun Heo
@ 2015-11-17 19:34         ` Oleg Nesterov
  2015-11-17 18:57           ` Tejun Heo
  2015-11-19 16:49           ` Pedro Alves
  0 siblings, 2 replies; 16+ messages in thread
From: Oleg Nesterov @ 2015-11-17 19:34 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jan Kratochvil, Pedro Alves, Andrey Ryabinin, Roland McGrath, LKML

On 11/16, Tejun Heo wrote:
>
> *** WARNING: THE ATTACHED DOCUMENT(S) CONTAIN MACROS ***
> *** MACROS MAY CONTAIN MALICIOUS CODE ***
> *** Open only if you can verify and trust the sender ***
> *** Please contact infosec@redhat.com if you have questions or concerns **

Hmm, infosec@redhat.com doesn't like you. But I dared to open and nothing
happened so far. although perhaps you already own my machine.

> > And perhaps we can simply remove this logic? I forgot why do we hide this
> > STOPPED -> RUNNING -> TRACED transition from the attaching thread. But the
> > vague feeling tells me that we discussed this before and perhaps it was me
> > who suggested to avoid the user-visible change when you introduced this
> > transition...
>
> Heh, it was too long ago for me to remember much. :)

Same here...

> > Anyway, now I do not understand why do we want to hide it. Lets consider
> > the following "test-case",
> >
> > 	void test(int pid)
> > 	{
> > 		kill(pid, SIGSTOP);
> > 		waitpid(pid, NULL, WSTOPPED);
> >
> > 		ptrace(PTRACE_ATTACH-OR-PTRACE_SEIZE, pid, 0,0);
> >
> > 		assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
> > 	}
> >
> > Yes, it will fail if we remove JOBCTL_TRAPPING. But it can equally fail
> > if SIGCONT comes before ATTACH, so perhaps we do not really care?
> >
> > Jan, Pedro, do you think the patch below can break gdb somehow? With this
> > patch you can never assume that waitpid(WNOHANG) or ptrace(WHATEVER) will
> > succeed right after PTRACE_ATTACH/PTRACE_SEIZE, even if you know that the
> > tracee was TASK_STOPPED before attach.
> >
> > Tejun, do you see any reason to keep JOBCTL_TRAPPING?
>
> Hmmm... It's nasty tho.  We're breaking a guaranteed userland behavior

Perhaps you are right, but I am wondering if it was ever guaranteed.

What actually annoys me is that now I am almost sure that it was me
who asked you to hide this from user-space, and today I see no reason
for this hack.

> I'd be a lot more comfortable stating
> that cgroup freezer is currently broken rather than diddling with
> subtle ptrace semantics.

OK, lets keep this JOBCTL_TRAPPING_BIT.

But still I would like to know what Pedro thinks...

Anyway, wait_on_bit(TASK_UNINTERRUPTIBLE) doesn't look good. Do you
see any problem with the change below? Yes, the comment is not clear,
it should be updated, the tracee can clear this bit too.

And perhaps we can change get_task_state() until freezer gets another state,

	--- x/fs/proc/array.c
	+++ x/fs/proc/array.c
	@@ -126,6 +126,9 @@ static inline const char *get_task_state
	 {
		unsigned int state = (tsk->state | tsk->exit_state) & TASK_REPORT;
	 
	+	if (tsk->flags & PF_FROZEN)
	+		return "D (frozen)";
	+
		BUILD_BUG_ON(1 + ilog2(TASK_REPORT) != ARRAY_SIZE(task_state_array)-1);
	 
		return task_state_array[fls(state)];

?

Oleg.

--- x/kernel/ptrace.c
+++ x/kernel/ptrace.c
@@ -364,8 +364,13 @@ unlock_creds:
 	mutex_unlock(&task->signal->cred_guard_mutex);
 out:
 	if (!retval) {
-		wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT,
-			    TASK_UNINTERRUPTIBLE);
+		if (wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT,
+				TASK_KILLABLE))
+			/*
+			 * We will clear JOBCTL_TRAPPING in __ptrace_unlink(),
+			 * until then nobody can trace this task anyway.
+			 */
+			retval = -EINTR;
 		proc_ptrace_connector(task, PTRACE_ATTACH);
 	}
 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ptrace() hangs on attempt to seize/attach stopped & frozen task
  2015-11-17 19:34         ` Oleg Nesterov
  2015-11-17 18:57           ` Tejun Heo
@ 2015-11-19 16:49           ` Pedro Alves
  2015-11-19 17:47             ` Oleg Nesterov
  1 sibling, 1 reply; 16+ messages in thread
From: Pedro Alves @ 2015-11-19 16:49 UTC (permalink / raw)
  To: Oleg Nesterov, Tejun Heo
  Cc: Jan Kratochvil, Andrey Ryabinin, Roland McGrath, LKML

On 11/17/2015 07:34 PM, Oleg Nesterov wrote:
> On 11/16, Tejun Heo wrote:

>>> And perhaps we can simply remove this logic? I forgot why do we hide this
>>> STOPPED -> RUNNING -> TRACED transition from the attaching thread. But the
>>> vague feeling tells me that we discussed this before and perhaps it was me
>>> who suggested to avoid the user-visible change when you introduced this
>>> transition...
>>
>> Heh, it was too long ago for me to remember much. :)
> 
> Same here...
> 
>>> Anyway, now I do not understand why do we want to hide it. Lets consider
>>> the following "test-case",
>>>
>>> 	void test(int pid)
>>> 	{
>>> 		kill(pid, SIGSTOP);
>>> 		waitpid(pid, NULL, WSTOPPED);
>>>
>>> 		ptrace(PTRACE_ATTACH-OR-PTRACE_SEIZE, pid, 0,0);
>>>
>>> 		assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
>>> 	}
>>>
>>> Yes, it will fail if we remove JOBCTL_TRAPPING. But it can equally fail
>>> if SIGCONT comes before ATTACH, so perhaps we do not really care?
>>>
>>> Jan, Pedro, do you think the patch below can break gdb somehow? With this
>>> patch you can never assume that waitpid(WNOHANG) or ptrace(WHATEVER) will
>>> succeed right after PTRACE_ATTACH/PTRACE_SEIZE, even if you know that the
>>> tracee was TASK_STOPPED before attach.

Not sure, because I don't think I fully understand that proposed change.

Both GDB and gdbserver have special processing for attaching to already-stopped
processes.  (and neither use PTRACE_SEIZE yet.)

Here's the gdbserver version:

 https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/gdbserver/linux-low.c;h=41ab510fa4ac5654f101f08efb68e26b5bc5dbd7;hb=HEAD#l903

Copied here for convenience:

 907 linux_attach_lwp (ptid_t ptid)
 908 {
 909   struct lwp_info *new_lwp;
 910   int lwpid = ptid_get_lwp (ptid);
 911
 912   if (ptrace (PTRACE_ATTACH, lwpid, (PTRACE_TYPE_ARG3) 0, (PTRACE_TYPE_ARG4) 0)
 913       != 0)
 914     return errno;
 915
 916   new_lwp = add_lwp (ptid);
 917
 918   /* We need to wait for SIGSTOP before being able to make the next
 919      ptrace call on this LWP.  */
 920   new_lwp->must_set_ptrace_flags = 1;
 921
 922   if (linux_proc_pid_is_stopped (lwpid))
 923     {
 924       if (debug_threads)
 925         debug_printf ("Attached to a stopped process\n");
 926
 927       /* The process is definitely stopped.  It is in a job control
 928          stop, unless the kernel predates the TASK_STOPPED /
 929          TASK_TRACED distinction, in which case it might be in a
 930          ptrace stop.  Make sure it is in a ptrace stop; from there we
 931          can kill it, signal it, et cetera.
 932
 933          First make sure there is a pending SIGSTOP.  Since we are
 934          already attached, the process can not transition from stopped
 935          to running without a PTRACE_CONT; so we know this signal will
 936          go into the queue.  The SIGSTOP generated by PTRACE_ATTACH is
 937          probably already in the queue (unless this kernel is old
 938          enough to use TASK_STOPPED for ptrace stops); but since
 939          SIGSTOP is not an RT signal, it can only be queued once.  */
 940       kill_lwp (lwpid, SIGSTOP);
 941
 942       /* Finally, resume the stopped process.  This will deliver the
 943          SIGSTOP (or a higher priority signal, just like normal
 944          PTRACE_ATTACH), which we'll catch later on.  */
 945       ptrace (PTRACE_CONT, lwpid, (PTRACE_TYPE_ARG3) 0, (PTRACE_TYPE_ARG4) 0);
 946     }
 947
 948   /* The next time we wait for this LWP we'll see a SIGSTOP as PTRACE_ATTACH
 949      brings it to a halt.
 950

linux_proc_pid_is_stopped checks whether the state in /proc/pid/status is "T (stopped)".

Here's the equivalent in gdb:

  https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/linux-nat.c;h=841ec3949c37438dfba924d8db6b37ffc416dd29;hb=HEAD#l974

This queuing of a SIGSTOP + PTRACE_CONT was necessary because
otherwise when gdb attaches to a job stopped process, gdb would hang in the waitpid
after PTRACE_ATTACH, waiting for the initial SIGSTOP which would never arrive.

If the proposed change makes it so that a new intermediate state can be observed
right after PTRACE_ATTACH, and so linux_proc_pid_is_stopped can return false,
then there's potential for breakage.  But maybe not, if we're sure that
that when that happens, waitpid returns for the initial
PTRACE_ATTACH-induced SIGSTOP.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ptrace() hangs on attempt to seize/attach stopped & frozen task
  2015-11-19 16:49           ` Pedro Alves
@ 2015-11-19 17:47             ` Oleg Nesterov
  2015-11-19 18:08               ` Pedro Alves
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2015-11-19 17:47 UTC (permalink / raw)
  To: Pedro Alves
  Cc: Tejun Heo, Jan Kratochvil, Andrey Ryabinin, Roland McGrath, LKML

Thanks Pedro for your email,

I'll recheck tomorrow, but at first glance:

On 11/19, Pedro Alves wrote:
>
> Both GDB and gdbserver have special processing for attaching to already-stopped
> processes.

Yes, I am starting to recall that I have looked at this code years ago ;)

>  907 linux_attach_lwp (ptid_t ptid)
>  908 {
>  909   struct lwp_info *new_lwp;
>  910   int lwpid = ptid_get_lwp (ptid);
>  911
>  912   if (ptrace (PTRACE_ATTACH, lwpid, (PTRACE_TYPE_ARG3) 0, (PTRACE_TYPE_ARG4) 0)
>  913       != 0)
>  914     return errno;
>  915
>  916   new_lwp = add_lwp (ptid);
>  917
>  918   /* We need to wait for SIGSTOP before being able to make the next
>  919      ptrace call on this LWP.  */
>  920   new_lwp->must_set_ptrace_flags = 1;
>  921
>  922   if (linux_proc_pid_is_stopped (lwpid))

This can't happen today. Starting from v3.0 at least.

> This queuing of a SIGSTOP + PTRACE_CONT was necessary because
> otherwise when gdb attaches to a job stopped process, gdb would hang in the waitpid
> after PTRACE_ATTACH, waiting for the initial SIGSTOP which would never arrive.

Yes, because its exit code could be already cleared iirc. This was fixed
even before.

> If the proposed change makes it so that a new intermediate state can be observed
> right after PTRACE_ATTACH, and so linux_proc_pid_is_stopped can return false,
> then there's potential for breakage.

See above,

> But maybe not, if we're sure that
> that when that happens, waitpid returns for the initial
> PTRACE_ATTACH-induced SIGSTOP.

Yes. Just you can't assume that watpid(WNOHANG) will succeed. Is it OK?

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ptrace() hangs on attempt to seize/attach stopped & frozen task
  2015-11-19 17:47             ` Oleg Nesterov
@ 2015-11-19 18:08               ` Pedro Alves
  0 siblings, 0 replies; 16+ messages in thread
From: Pedro Alves @ 2015-11-19 18:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tejun Heo, Jan Kratochvil, Andrey Ryabinin, Roland McGrath, LKML

On 11/19/2015 05:47 PM, Oleg Nesterov wrote:
> Thanks Pedro for your email,
> 

>>  918   /* We need to wait for SIGSTOP before being able to make the next
>>  919      ptrace call on this LWP.  */
>>  920   new_lwp->must_set_ptrace_flags = 1;
>>  921
>>  922   if (linux_proc_pid_is_stopped (lwpid))
> 
> This can't happen today. Starting from v3.0 at least.

Eh, interesting.  So right after PTRACE_ATTACH, we either observe
"running" or "ptrace-stopped", but never "job stopped".  Correct?

I've actually just now tried this:

diff --git c/gdb/linux-nat.c w/gdb/linux-nat.c
index 841ec39..42f2b0d 100644
--- c/gdb/linux-nat.c
+++ w/gdb/linux-nat.c
@@ -981,6 +981,7 @@ linux_nat_post_attach_wait (ptid_t ptid, int first, int *cloned,
   pid_t new_pid, pid = ptid_get_lwp (ptid);
   int status;

+#if 0
   if (linux_proc_pid_is_stopped (pid))
     {
       if (debug_linux_nat)
@@ -1006,6 +1007,7 @@ linux_nat_post_attach_wait (ptid_t ptid, int first, int *cloned,
         (or a higher priority signal, just like normal PTRACE_ATTACH).  */
       ptrace (PTRACE_CONT, pid, 0, 0);
     }
+#endif

   /* Make sure the initial process is stopped.  The user-level threads
      layer might want to poke around in the inferior, and that won't

and sure enough, gdb's test that covers that use case still
passes, on Fedora 20 (Linux 3.19.8).

And given that my Thunderbird crashed while writing this, I had sufficient
time to be sure that a full test run passes cleanly too.  :-P  :-)

>> But maybe not, if we're sure that
>> that when that happens, waitpid returns for the initial
>> PTRACE_ATTACH-induced SIGSTOP.
> 
> Yes. Just you can't assume that watpid(WNOHANG) will succeed. Is it OK?

Yes, assuming the ptracer is guaranteed to get a SIGCHLD to wake it up.

Thanks,
Pedro Alves

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 0/2] (Was: ptrace() hangs on attempt to seize/attach stopped & frozen task)
  2015-11-09 15:12 ptrace() hangs on attempt to seize/attach stopped & frozen task Andrey Ryabinin
  2015-11-09 18:55 ` Oleg Nesterov
@ 2015-11-19 18:47 ` Oleg Nesterov
  2015-11-19 18:47   ` [PATCH 1/2] ptrace: make wait_on_bit(JOBCTL_TRAPPING_BIT) in ptrace_attach() killable Oleg Nesterov
  2015-11-19 18:47   ` [PATCH 2/2] ptrace: task_stopped_code(ptrace => true) can't see TASK_STOPPED task Oleg Nesterov
  1 sibling, 2 replies; 16+ messages in thread
From: Oleg Nesterov @ 2015-11-19 18:47 UTC (permalink / raw)
  To: Andrew Morton, Andrey Ryabinin
  Cc: Roland McGrath, Tejun Heo, LKML, Pedro Alves, Jan Kratochvil

On 11/09, Andrey Ryabinin wrote:
>
> So, the ptrace() hangs if we try to attach to stopped task from freezing cgroup.

OK, lets make it killable at least, TASK_KILLABLE looks better in
any case.

I am starting to think again that we should remove JOBCTL_TRAPPING,
but this needs a bit more discussion and probably testing.

Anyway I agree with Tejun, the freezer doesn't look nice too.

Hmm. I need to re-check but it seems we have another problem...
ptrace_stop() does task_participate_group_stop() and this (in particular)
clears JOBCTL_STOP_PENDING. This is only correct if the task is still traced,
but this is not necessarily true and we can in fact clear JOBCTL_STOP_PENDING
set by __ptrace_unlink() called by the exiting tracer.

Oleg.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2] ptrace: make wait_on_bit(JOBCTL_TRAPPING_BIT) in ptrace_attach() killable
  2015-11-19 18:47 ` [PATCH 0/2] (Was: ptrace() hangs on attempt to seize/attach stopped & frozen task) Oleg Nesterov
@ 2015-11-19 18:47   ` Oleg Nesterov
  2015-11-23 23:05     ` Tejun Heo
  2015-11-19 18:47   ` [PATCH 2/2] ptrace: task_stopped_code(ptrace => true) can't see TASK_STOPPED task Oleg Nesterov
  1 sibling, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2015-11-19 18:47 UTC (permalink / raw)
  To: Andrew Morton, Andrey Ryabinin
  Cc: Roland McGrath, Tejun Heo, LKML, Pedro Alves, Jan Kratochvil

ptrace_attach() can hang waiting for STOPPED -> TRACED transition if the
tracee gets frozen in between, change wait_on_bit() to use TASK_KILLABLE.

This doesn't really solve the problem(s) and we probably need to fix the
freezer. In particular, note that this means that pm freezer will fail if
it races attach-to-stopped-task.

And otoh perhaps we can just remove JOBCTL_TRAPPING_BIT altogether, it is
not clear if we really need to hide this transition from debugger, WNOHANG
after PTRACE_ATTACH can fail anyway if it races with SIGCONT.

Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/ptrace.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c8e0e05..80b3604 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -364,8 +364,14 @@ unlock_creds:
 	mutex_unlock(&task->signal->cred_guard_mutex);
 out:
 	if (!retval) {
-		wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT,
-			    TASK_UNINTERRUPTIBLE);
+		/*
+		 * We do not bother to change retval or clear JOBCTL_TRAPPING
+		 * if wait_on_bit() was interrupted by SIGKILL. The tracer will
+		 * not return to user-mode, it will exit and clear this bit in
+		 * __ptrace_unlink() if it wasn't already cleared by the tracee;
+		 * and until then nobody can ptrace this task.
+		 */
+		wait_on_bit(&task->jobctl, JOBCTL_TRAPPING_BIT, TASK_KILLABLE);
 		proc_ptrace_connector(task, PTRACE_ATTACH);
 	}
 
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/2] ptrace: task_stopped_code(ptrace => true) can't see TASK_STOPPED task
  2015-11-19 18:47 ` [PATCH 0/2] (Was: ptrace() hangs on attempt to seize/attach stopped & frozen task) Oleg Nesterov
  2015-11-19 18:47   ` [PATCH 1/2] ptrace: make wait_on_bit(JOBCTL_TRAPPING_BIT) in ptrace_attach() killable Oleg Nesterov
@ 2015-11-19 18:47   ` Oleg Nesterov
  2015-11-23 23:15     ` Tejun Heo
  1 sibling, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2015-11-19 18:47 UTC (permalink / raw)
  To: Andrew Morton, Andrey Ryabinin
  Cc: Roland McGrath, Tejun Heo, LKML, Pedro Alves, Jan Kratochvil

task_stopped_code()->task_is_stopped_or_traced() doesn't look right,
the traced task must never be TASK_STOPPED.

We can not add WARN_ON(task_is_stopped(p)), but this is only because
do_wait() can race with PTRACE_ATTACH from another thread.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/exit.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index c090738..fc7b4ec 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1127,7 +1127,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
 static int *task_stopped_code(struct task_struct *p, bool ptrace)
 {
 	if (ptrace) {
-		if (task_is_stopped_or_traced(p) &&
+		if (task_is_traced(p) &&
 		    !(p->jobctl & JOBCTL_LISTENING))
 			return &p->exit_code;
 	} else {
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] ptrace: make wait_on_bit(JOBCTL_TRAPPING_BIT) in ptrace_attach() killable
  2015-11-19 18:47   ` [PATCH 1/2] ptrace: make wait_on_bit(JOBCTL_TRAPPING_BIT) in ptrace_attach() killable Oleg Nesterov
@ 2015-11-23 23:05     ` Tejun Heo
  0 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2015-11-23 23:05 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Andrey Ryabinin, Roland McGrath, LKML,
	Pedro Alves, Jan Kratochvil

On Thu, Nov 19, 2015 at 07:47:32PM +0100, Oleg Nesterov wrote:
> ptrace_attach() can hang waiting for STOPPED -> TRACED transition if the
> tracee gets frozen in between, change wait_on_bit() to use TASK_KILLABLE.
> 
> This doesn't really solve the problem(s) and we probably need to fix the
> freezer. In particular, note that this means that pm freezer will fail if
> it races attach-to-stopped-task.
> 
> And otoh perhaps we can just remove JOBCTL_TRAPPING_BIT altogether, it is
> not clear if we really need to hide this transition from debugger, WNOHANG
> after PTRACE_ATTACH can fail anyway if it races with SIGCONT.
> 
> Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] ptrace: task_stopped_code(ptrace => true) can't see TASK_STOPPED task
  2015-11-19 18:47   ` [PATCH 2/2] ptrace: task_stopped_code(ptrace => true) can't see TASK_STOPPED task Oleg Nesterov
@ 2015-11-23 23:15     ` Tejun Heo
  0 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2015-11-23 23:15 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Andrey Ryabinin, Roland McGrath, LKML,
	Pedro Alves, Jan Kratochvil

On Thu, Nov 19, 2015 at 07:47:52PM +0100, Oleg Nesterov wrote:
> task_stopped_code()->task_is_stopped_or_traced() doesn't look right,
> the traced task must never be TASK_STOPPED.
> 
> We can not add WARN_ON(task_is_stopped(p)), but this is only because
> do_wait() can race with PTRACE_ATTACH from another thread.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

I think it's a left over from before.  We used to leave STOPPED tasks
alone during ptrace attach.

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-11-23 23:15 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-09 15:12 ptrace() hangs on attempt to seize/attach stopped & frozen task Andrey Ryabinin
2015-11-09 18:55 ` Oleg Nesterov
2015-11-09 18:02   ` Tejun Heo
2015-11-10 20:20     ` Oleg Nesterov
2015-11-16 18:45       ` Tejun Heo
2015-11-17 19:34         ` Oleg Nesterov
2015-11-17 18:57           ` Tejun Heo
2015-11-19 16:49           ` Pedro Alves
2015-11-19 17:47             ` Oleg Nesterov
2015-11-19 18:08               ` Pedro Alves
2015-11-10 20:20   ` Oleg Nesterov
2015-11-19 18:47 ` [PATCH 0/2] (Was: ptrace() hangs on attempt to seize/attach stopped & frozen task) Oleg Nesterov
2015-11-19 18:47   ` [PATCH 1/2] ptrace: make wait_on_bit(JOBCTL_TRAPPING_BIT) in ptrace_attach() killable Oleg Nesterov
2015-11-23 23:05     ` Tejun Heo
2015-11-19 18:47   ` [PATCH 2/2] ptrace: task_stopped_code(ptrace => true) can't see TASK_STOPPED task Oleg Nesterov
2015-11-23 23:15     ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.