All of lore.kernel.org
 help / color / mirror / Atom feed
* Race in ptrace.
@ 2010-02-08 22:16 Salman Qazi
       [not found] ` <20100208143231.6d804590.akpm@linux-foundation.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Salman Qazi @ 2010-02-08 22:16 UTC (permalink / raw)
  To: akpm, linux-kernel

Greetings,

A race in ptrace was pointed to us by a fellow Google engineer, Tavis
Ormandy.  The race involves interaction between a tracer, a tracee and 
an antagonist.  The tracer is tracing the tracee with PTRACE_SYSCALL and
waits on the tracee.  In the mean time, an antagonist blasts the tracee
with SIGCONTs.

The observed issue is that sometimes when the tracer attempts to continue the tracee
with PTRACE_SYSCALL, it gets a return value of -ESRCH, indicating that the tracee is already
running (or not being traced).  It turns out that a SIGCONT wakes up the
tracee in kernel mode, and for a moment the tracee's state is TASK_RUNNING
then in ptrace_stop we hit the condition where the tracee is found to be running (and thus
not traced).  If the syscall is repeated, the second time it usually 
succeeds (because by that time, the tracee has been put into TASK_TRACED).

Below is a quick and dirty fix for the one instance that I did
figure out.  Note that this doesn't completely close the race on 
2.6.33-rc6.  But on 2.6.26 it appears to be sufficient.  I suspect there 
are other code paths with similar issues:

    Fix a race in ptrace.
    
    Race description:
    
    The traced process is running for a small duration
    of time between when it is sent a SIGCONT and when it realizes that it needs
    to be asleep in order to get traced.  If during this time the tracer calls
    ptrace with PTRACE_SYSCALL, it recieves an errno value of -ESRCH.
    
    Solution:
    
    We add a new bit to the ptrace field of task_struct.  We call this PT_WAKING.
    When the process is being awoken for a SIGCONT signal, we set this bit before
    state changes to TASK_RUNNING.  When the process is about to go to sleep, we
    reset this bit after we change the state to TASK_TRACED.

Signed-off-by: Salman Qazi <sqazi@google.com>

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 56f2d63..6c6771a 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -67,8 +67,9 @@
 #define PT_TRACE_EXEC	0x00000080
 #define PT_TRACE_VFORK_DONE	0x00000100
 #define PT_TRACE_EXIT	0x00000200
+#define PT_WAKING	0x00000400
 
-#define PT_TRACE_MASK	0x000003f4
+#define PT_TRACE_MASK	0x000007f4
 
 /* single stepping state bits (used on ARM and PA-RISC) */
 #define PT_SINGLESTEP_BIT	31
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 23bd09c..32157f8 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -104,7 +104,8 @@ int ptrace_check_attach(struct task_struct *child, int kill)
 		spin_lock_irq(&child->sighand->siglock);
 		if (task_is_stopped(child))
 			child->state = TASK_TRACED;
-		else if (!task_is_traced(child) && !kill)
+		else if (!task_is_traced(child) && !kill &&
+				(!(child->ptrace & PT_WAKING)))
 			ret = -ESRCH;
 		spin_unlock_irq(&child->sighand->siglock);
 	}
diff --git a/kernel/signal.c b/kernel/signal.c
index 934ae5e..095507e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -697,6 +697,10 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		 * and wake all threads.
 		 */
 		rm_from_queue(SIG_KERNEL_STOP_MASK, &signal->shared_pending);
+		if (p->ptrace & PT_PTRACED) {
+			p->ptrace |= PT_WAKING;
+			mb();
+		}
 		t = p;
 		do {
 			unsigned int state;
@@ -1626,6 +1630,10 @@ static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
 
 	/* Let the debugger run.  */
 	__set_current_state(TASK_TRACED);
+	if (current->ptrace & PT_PTRACED) {
+		mb();
+		current->ptrace &= ~PT_WAKING;
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
 	if (may_ptrace_stop()) {

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Race in ptrace.
       [not found] ` <20100208143231.6d804590.akpm@linux-foundation.org>
@ 2010-02-09 11:27   ` Oleg Nesterov
  2010-02-10 13:35     ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2010-02-09 11:27 UTC (permalink / raw)
  To: Salman Qazi; +Cc: Roland Dreier, Andrew Morton, Roland McGrath, linux-kernel

Salman Qazi wrote:
>
> A race in ptrace was pointed to us by a fellow Google engineer, Tavis
> Ormandy.  The race involves interaction between a tracer, a tracee and
> an antagonist.  The tracer is tracing the tracee with PTRACE_SYSCALL and
> waits on the tracee.  In the mean time, an antagonist blasts the tracee
> with SIGCONTs.

Could you please explain how did observe this race? Do you have a
test-case, or could you explain how we can reproduce it?

Because,

> It turns out that a SIGCONT wakes up the
> tracee in kernel mode,

SIGCONT must not wake up a TASK_TRACED task.


Yes, there are some know problems with SIGCONT && ptrace, but I don't
see anthing related to the described problem.

Thanks,

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Race in ptrace.
  2010-02-09 11:27   ` Oleg Nesterov
@ 2010-02-10 13:35     ` Oleg Nesterov
  2010-02-10 18:38       ` Salman Qazi
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2010-02-10 13:35 UTC (permalink / raw)
  To: Salman Qazi; +Cc: Roland Dreier, Andrew Morton, Roland McGrath, linux-kernel

On 02/09, Oleg Nesterov wrote:
>
> Salman Qazi wrote:
> >
> > A race in ptrace was pointed to us by a fellow Google engineer, Tavis
> > Ormandy.  The race involves interaction between a tracer, a tracee and
> > an antagonist.  The tracer is tracing the tracee with PTRACE_SYSCALL and
> > waits on the tracee.  In the mean time, an antagonist blasts the tracee
> > with SIGCONTs.
>
> Could you please explain how did observe this race? Do you have a
> test-case, or could you explain how we can reproduce it?
>
> Because,
>
> > It turns out that a SIGCONT wakes up the
> > tracee in kernel mode,
>
> SIGCONT must not wake up a TASK_TRACED task.

In case I wasn't clear...

--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -697,6 +697,10 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		 * and wake all threads.
 		 */
 		rm_from_queue(SIG_KERNEL_STOP_MASK, &signal->shared_pending);
+		if (p->ptrace & PT_PTRACED) {
+			p->ptrace |= PT_WAKING;
+			mb();
+		}

Please note that we are going to do wake_up_state(state), and
this state can never have __TASK_TRACED bit set.

And we can't change ->ptrace here, we can race with the tracer.

There are other problems with this patch, but the main problem
is that I can't understand what this patch tries to fix.

IOW, please provide more info ;)

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Race in ptrace.
  2010-02-10 13:35     ` Oleg Nesterov
@ 2010-02-10 18:38       ` Salman Qazi
  2010-02-11 12:56         ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Salman Qazi @ 2010-02-10 18:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: taviso, Roland Dreier, Andrew Morton, Roland McGrath, linux-kernel

[+tavis]

Sorry for the delayed response.  I was out sick yesterday.  I have
made a simpler version of tavis's test case:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sched.h>
#include <errno.h>
#include <sys/ptrace.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>

int child_pid;
int ant_pid;

void child(void)
{
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        asm("int3");
        while(1);
}

void antagonist(void)
{
        while (1) {
                kill(child_pid, SIGSTOP);
                usleep(2);
                kill(child_pid, SIGCONT);
                usleep(2);
        }
}

int do_fork(void (*callback)(void))
{
        int pid;
        pid = fork();
        if (pid)
                return pid;
        callback();

        exit(EXIT_SUCCESS);
}

int main(int argc, char **argv)
{
        int status;
        assert((child_pid = do_fork(child)) > 0);
        assert((ant_pid = do_fork(antagonist)) > 0);
        waitpid(child_pid, &status, 0);
        ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL);
        while(1) {
                if (waitpid(child_pid, &status, 0) <= 0) {
                        printf("Errno %d\n", errno);
                        exit(EXIT_FAILURE);
                }
                if (WIFSTOPPED(status)) {
                        printf("stopped: %d\n", WSTOPSIG(status));

                        /* This should work, but sometimes it doesn't */
                        if (ptrace(PTRACE_SYSCALL, child_pid,
                                                NULL, WSTOPSIG(status)) < 0) {
                                /* Oddly it works the second time! */
                                assert (ptrace(PTRACE_SYSCALL,
                                        child_pid, NULL, WSTOPSIG(status)) < 0);
                        }
                }
        }
}


On Wed, Feb 10, 2010 at 5:35 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 02/09, Oleg Nesterov wrote:
>>
>> Salman Qazi wrote:
>> >
>> > A race in ptrace was pointed to us by a fellow Google engineer, Tavis
>> > Ormandy.  The race involves interaction between a tracer, a tracee and
>> > an antagonist.  The tracer is tracing the tracee with PTRACE_SYSCALL and
>> > waits on the tracee.  In the mean time, an antagonist blasts the tracee
>> > with SIGCONTs.
>>
>> Could you please explain how did observe this race? Do you have a
>> test-case, or could you explain how we can reproduce it?
>>
>> Because,
>>
>> > It turns out that a SIGCONT wakes up the
>> > tracee in kernel mode,
>>
>> SIGCONT must not wake up a TASK_TRACED task.
>
> In case I wasn't clear...
>
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -697,6 +697,10 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
>                 * and wake all threads.
>                 */
>                rm_from_queue(SIG_KERNEL_STOP_MASK, &signal->shared_pending);
> +               if (p->ptrace & PT_PTRACED) {
> +                       p->ptrace |= PT_WAKING;
> +                       mb();
> +               }
>
> Please note that we are going to do wake_up_state(state), and
> this state can never have __TASK_TRACED bit set.
>
> And we can't change ->ptrace here, we can race with the tracer.
>
> There are other problems with this patch, but the main problem
> is that I can't understand what this patch tries to fix.
>
> IOW, please provide more info ;)
>
> Oleg.
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Race in ptrace.
  2010-02-10 18:38       ` Salman Qazi
@ 2010-02-11 12:56         ` Oleg Nesterov
  2010-02-11 16:32           ` Salman Qazi
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2010-02-11 12:56 UTC (permalink / raw)
  To: Salman Qazi
  Cc: taviso, Roland Dreier, Andrew Morton, Roland McGrath, linux-kernel

On 02/10, Salman Qazi wrote:
>
> I have
> made a simpler version of tavis's test case:

Thanks, now I see what you mean.

But this all is correct, you can't expect PTRACE_SYSCALL can succeed
is the tracee is running, it must be stopped or traced.

The tracee is running because it was TASK_STOPPED and antagonist()
sends SIGCONT.

The tracee was TASK_STOPPED because the tracer passes sig = SIGSTOP
via ptrace(PTRACE_SYSCALL, WSTOPSIG(status).

Where do you see the bug?

OK, let me simplify the test-case even more:

	int main(void)
	{
		int stat, ret;
		int pid = fork();

		if (!pid) {
			ptrace(PTRACE_TRACEME, 0, NULL, NULL);
			for (;;)
				;
		}

		sleep(1);	// wait for PTRACE_TRACEME
		kill(pid, SIGSTOP);

		// the child reports SIGSTOP, it is TASK_TRACED
		assert(pid == wait(&stat) && WIFSTOPPED(stat));

		// the tracee should stop, we pass sig = SIGSTOP
		assert(ptrace(PTRACE_SYSCALL, pid, 0, WSTOPSIG(stat)) == 0);

		// the child reports the group stop, it is TASK_STOPPED
		assert(pid == wait(&stat) && WIFSTOPPED(stat));

		// the tracee is STOPPED as requested, not TRACED,
		// SIGCONT wakes it up
		kill(pid, SIGCONT);

		// now the tracee is _running_, and PTRACE_SYSCALL must fail
		ret = ptrace(PTRACE_SYSCALL, pid, 0, WSTOPSIG(stat));
		printf("should fail: ret=%d %m\n", ret);

		return 0;
	}

PTRACE_SYSCALL fails, and this is absolutely correct.

Now, let's look at your test-case

> int main(int argc, char **argv)
> {
>         int status;
>         assert((child_pid = do_fork(child)) > 0);
>         assert((ant_pid = do_fork(antagonist)) > 0);
>         waitpid(child_pid, &status, 0);
>         ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL);
>         while(1) {
>                 if (waitpid(child_pid, &status, 0) <= 0) {
>                         printf("Errno %d\n", errno);
>                         exit(EXIT_FAILURE);
>                 }
>                 if (WIFSTOPPED(status)) {

WSTOPSIG() should be either SIGCONT or SIGSTOP

>                         printf("stopped: %d\n", WSTOPSIG(status));
>
>                         /* This should work, but sometimes it doesn't */
>                         if (ptrace(PTRACE_SYSCALL, child_pid,
>                                                 NULL, WSTOPSIG(status)) < 0) {

This should not work if the tracee reported the group stop (not the
fact it dequeued SIGSTOP) and antagonist() sends SIGCONT in between.

>                                 /* Oddly it works the second time! */
>                                 assert (ptrace(PTRACE_SYSCALL,
>                                         child_pid, NULL, WSTOPSIG(status)) < 0);

Of couse, it _can_ work the second time, antagonist() sends a signal
(SIGCONT or SIGSTOP), the tracee dequeues the signal, and stops to
report this signal.

See?

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Race in ptrace.
  2010-02-11 12:56         ` Oleg Nesterov
@ 2010-02-11 16:32           ` Salman Qazi
  2010-02-11 16:50             ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Salman Qazi @ 2010-02-11 16:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: taviso, Roland Dreier, Andrew Morton, Roland McGrath, linux-kernel

On Thu, Feb 11, 2010 at 4:56 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 02/10, Salman Qazi wrote:
>>
>> I have
>> made a simpler version of tavis's test case:
>
> Thanks, now I see what you mean.
>
> But this all is correct, you can't expect PTRACE_SYSCALL can succeed
> is the tracee is running, it must be stopped or traced.
>
> The tracee is running because it was TASK_STOPPED and antagonist()
> sends SIGCONT.
>
> The tracee was TASK_STOPPED because the tracer passes sig = SIGSTOP
> via ptrace(PTRACE_SYSCALL, WSTOPSIG(status).
>
> Where do you see the bug?

Shouldn't ptrace(PTRACE_SYSCALL, WSTOPSIG(status)...), cause any
future signals to the child be intercepted by the parent?

>
> OK, let me simplify the test-case even more:
>
>        int main(void)
>        {
>                int stat, ret;
>                int pid = fork();
>
>                if (!pid) {
>                        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
>                        for (;;)
>                                ;
>                }
>
>                sleep(1);       // wait for PTRACE_TRACEME
>                kill(pid, SIGSTOP);
>
>                // the child reports SIGSTOP, it is TASK_TRACED
>                assert(pid == wait(&stat) && WIFSTOPPED(stat));
>
>                // the tracee should stop, we pass sig = SIGSTOP
>                assert(ptrace(PTRACE_SYSCALL, pid, 0, WSTOPSIG(stat)) == 0);
>
>                // the child reports the group stop, it is TASK_STOPPED
>                assert(pid == wait(&stat) && WIFSTOPPED(stat));
>
>                // the tracee is STOPPED as requested, not TRACED,
>                // SIGCONT wakes it up
>                kill(pid, SIGCONT);

                   According to the man page, any signals to the
process are supposed to be intercepted by the parent and that is how
one is supposed to be able to control which signals make it to the
child.  I am not sure if it makes any difference if the signal
originates at the parent.  But in our test case, it doesn't.   So, why
doesn't the parent get a notification first?

>
>                // now the tracee is _running_, and PTRACE_SYSCALL must fail
>                ret = ptrace(PTRACE_SYSCALL, pid, 0, WSTOPSIG(stat));
>                printf("should fail: ret=%d %m\n", ret);
>
>                return 0;
>        }
>
> PTRACE_SYSCALL fails, and this is absolutely correct.
>
> Now, let's look at your test-case
>
>> int main(int argc, char **argv)
>> {
>>         int status;
>>         assert((child_pid = do_fork(child)) > 0);
>>         assert((ant_pid = do_fork(antagonist)) > 0);
>>         waitpid(child_pid, &status, 0);
>>         ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL);
>>         while(1) {
>>                 if (waitpid(child_pid, &status, 0) <= 0) {
>>                         printf("Errno %d\n", errno);
>>                         exit(EXIT_FAILURE);
>>                 }
>>                 if (WIFSTOPPED(status)) {
>
> WSTOPSIG() should be either SIGCONT or SIGSTOP
>
>>                         printf("stopped: %d\n", WSTOPSIG(status));
>>
>>                         /* This should work, but sometimes it doesn't */
>>                         if (ptrace(PTRACE_SYSCALL, child_pid,
>>                                                 NULL, WSTOPSIG(status)) < 0) {
>
> This should not work if the tracee reported the group stop (not the
> fact it dequeued SIGSTOP) and antagonist() sends SIGCONT in between.
>
>>                                 /* Oddly it works the second time! */
>>                                 assert (ptrace(PTRACE_SYSCALL,
>>                                         child_pid, NULL, WSTOPSIG(status)) < 0);
>
> Of couse, it _can_ work the second time, antagonist() sends a signal
> (SIGCONT or SIGSTOP), the tracee dequeues the signal, and stops to
> report this signal.
>
> See?
>
> Oleg.
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Race in ptrace.
  2010-02-11 16:32           ` Salman Qazi
@ 2010-02-11 16:50             ` Oleg Nesterov
  2010-02-11 18:43               ` Salman Qazi
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2010-02-11 16:50 UTC (permalink / raw)
  To: Salman Qazi
  Cc: taviso, Roland Dreier, Andrew Morton, Roland McGrath, linux-kernel

On 02/11, Salman Qazi wrote:
>
> On Thu, Feb 11, 2010 at 4:56 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > But this all is correct, you can't expect PTRACE_SYSCALL can succeed
> > is the tracee is running, it must be stopped or traced.
> >
> > The tracee is running because it was TASK_STOPPED and antagonist()
> > sends SIGCONT.
> >
> > The tracee was TASK_STOPPED because the tracer passes sig = SIGSTOP
> > via ptrace(PTRACE_SYSCALL, WSTOPSIG(status).
> >
> > Where do you see the bug?
>
> Shouldn't ptrace(PTRACE_SYSCALL, WSTOPSIG(status)...), cause any
> future signals to the child be intercepted by the parent?

Not sure I understand your question. Of course the tracee will report any
future signals signals, after it has a chance to dequeue a signal.

But if you mean that after, say, ptrace(PTRACE_SYSCALL, SIGTERM) the
tracee should report _this_ SIGTERM to the tracer - then no. Well,
actually "this depends", but if PTRACE_SYSCALL (or any other req)
is called after the tracee reported the signal - no. The signal was
already reported.

> >        int main(void)
> >        {
> >                int stat, ret;
> >                int pid = fork();
> >
> >                if (!pid) {
> >                        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
> >                        for (;;)
> >                                ;
> >                }
> >
> >                sleep(1);       // wait for PTRACE_TRACEME
> >                kill(pid, SIGSTOP);
> >
> >                // the child reports SIGSTOP, it is TASK_TRACED
> >                assert(pid == wait(&stat) && WIFSTOPPED(stat));
> >
> >                // the tracee should stop, we pass sig = SIGSTOP
> >                assert(ptrace(PTRACE_SYSCALL, pid, 0, WSTOPSIG(stat)) == 0);
> >
> >                // the child reports the group stop, it is TASK_STOPPED
> >                assert(pid == wait(&stat) && WIFSTOPPED(stat));
> >
> >                // the tracee is STOPPED as requested, not TRACED,
> >                // SIGCONT wakes it up
> >                kill(pid, SIGCONT);
>
>                    According to the man page, any signals to the
> process are supposed to be intercepted by the parent and that is how
> one is supposed to be able to control which signals make it to the
> child.  I am not sure if it makes any difference if the signal
> originates at the parent.  But in our test case, it doesn't.   So, why
> doesn't the parent get a notification first?

It does. You can insert another wait() (or just sleep(1)) between
kill(SIGCONT) and PTRACE_SYSCALL below, the tracee will stop to report
SIGCONT and the tracer will be notified. In this case the following
PTRACE_SYSCALL should succeed.

Perhaps I should have mentioned that the code above is racy. It is,
I only did it to simplify the explanations.

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Race in ptrace.
  2010-02-11 16:50             ` Oleg Nesterov
@ 2010-02-11 18:43               ` Salman Qazi
  2010-02-11 18:55                 ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Salman Qazi @ 2010-02-11 18:43 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: taviso, Roland Dreier, Andrew Morton, Roland McGrath, linux-kernel

On Thu, Feb 11, 2010 at 8:50 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 02/11, Salman Qazi wrote:
>>
>> On Thu, Feb 11, 2010 at 4:56 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> >
>> > But this all is correct, you can't expect PTRACE_SYSCALL can succeed
>> > is the tracee is running, it must be stopped or traced.
>> >
>> > The tracee is running because it was TASK_STOPPED and antagonist()
>> > sends SIGCONT.
>> >
>> > The tracee was TASK_STOPPED because the tracer passes sig = SIGSTOP
>> > via ptrace(PTRACE_SYSCALL, WSTOPSIG(status).
>> >
>> > Where do you see the bug?
>>
>> Shouldn't ptrace(PTRACE_SYSCALL, WSTOPSIG(status)...), cause any
>> future signals to the child be intercepted by the parent?
>
> Not sure I understand your question. Of course the tracee will report any
> future signals signals, after it has a chance to dequeue a signal.
>
> But if you mean that after, say, ptrace(PTRACE_SYSCALL, SIGTERM) the
> tracee should report _this_ SIGTERM to the tracer - then no. Well,
> actually "this depends", but if PTRACE_SYSCALL (or any other req)
> is called after the tracee reported the signal - no. The signal was
> already reported.
>
>> >        int main(void)
>> >        {
>> >                int stat, ret;
>> >                int pid = fork();
>> >
>> >                if (!pid) {
>> >                        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
>> >                        for (;;)
>> >                                ;
>> >                }
>> >
>> >                sleep(1);       // wait for PTRACE_TRACEME
>> >                kill(pid, SIGSTOP);
>> >
>> >                // the child reports SIGSTOP, it is TASK_TRACED
>> >                assert(pid == wait(&stat) && WIFSTOPPED(stat));
>> >
>> >                // the tracee should stop, we pass sig = SIGSTOP
>> >                assert(ptrace(PTRACE_SYSCALL, pid, 0, WSTOPSIG(stat)) == 0);
>> >
>> >                // the child reports the group stop, it is TASK_STOPPED
>> >                assert(pid == wait(&stat) && WIFSTOPPED(stat));
>> >
>> >                // the tracee is STOPPED as requested, not TRACED,
>> >                // SIGCONT wakes it up
>> >                kill(pid, SIGCONT);

I am still missing something.  There's probably a gap in my
understanding, so let's try to clarify it.  The last "kill" call,
sends a SIGCONT.  But, shouldn't this SIGCONT be intercepted by the
tracer before the tracee sees it?

>>
>>                    According to the man page, any signals to the
>> process are supposed to be intercepted by the parent and that is how
>> one is supposed to be able to control which signals make it to the
>> child.  I am not sure if it makes any difference if the signal
>> originates at the parent.  But in our test case, it doesn't.   So, why
>> doesn't the parent get a notification first?
>
> It does. You can insert another wait() (or just sleep(1)) between
> kill(SIGCONT) and PTRACE_SYSCALL below, the tracee will stop to report
> SIGCONT and the tracer will be notified. In this case the following
> PTRACE_SYSCALL should succeed.
>
> Perhaps I should have mentioned that the code above is racy. It is,
> I only did it to simplify the explanations.
>
> Oleg.
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Race in ptrace.
  2010-02-11 18:43               ` Salman Qazi
@ 2010-02-11 18:55                 ` Oleg Nesterov
  2010-02-11 19:08                   ` Salman Qazi
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2010-02-11 18:55 UTC (permalink / raw)
  To: Salman Qazi
  Cc: taviso, Roland Dreier, Andrew Morton, Roland McGrath, linux-kernel

On 02/11, Salman Qazi wrote:
>
> >> >                // the tracee is STOPPED as requested, not TRACED,
> >> >                // SIGCONT wakes it up
> >> >                kill(pid, SIGCONT);
>
> I am still missing something.  There's probably a gap in my
> understanding, so let's try to clarify it.  The last "kill" call,
> sends a SIGCONT.  But, shouldn't this SIGCONT be intercepted by the
> tracer before the tracee sees it?

No. The tracee resumes (again: because it was STOPPED, not TRACED),
dequeues SIGCONT, reports the signal and stops in TASK_TRACED,
see ptrace_signal(). Meanwhile, until it calls ptrace_stop(), it is
TASK_RUNNING and ptrace() fails.

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Race in ptrace.
  2010-02-11 18:55                 ` Oleg Nesterov
@ 2010-02-11 19:08                   ` Salman Qazi
  2010-02-11 20:10                     ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Salman Qazi @ 2010-02-11 19:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: taviso, Roland Dreier, Andrew Morton, Roland McGrath, linux-kernel

I understand what it does.  But, why is it the right thing to do?
>From the user's perspective, why should the task become untraced if we
use ptrace to deliver the signal?  Doesn't this make it impossible to
intercept and control which signals are sent to a traced task?

On Thu, Feb 11, 2010 at 10:55 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 02/11, Salman Qazi wrote:
>>
>> >> >                // the tracee is STOPPED as requested, not TRACED,
>> >> >                // SIGCONT wakes it up
>> >> >                kill(pid, SIGCONT);
>>
>> I am still missing something.  There's probably a gap in my
>> understanding, so let's try to clarify it.  The last "kill" call,
>> sends a SIGCONT.  But, shouldn't this SIGCONT be intercepted by the
>> tracer before the tracee sees it?
>
> No. The tracee resumes (again: because it was STOPPED, not TRACED),
> dequeues SIGCONT, reports the signal and stops in TASK_TRACED,
> see ptrace_signal(). Meanwhile, until it calls ptrace_stop(), it is
> TASK_RUNNING and ptrace() fails.
>
> Oleg.
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Race in ptrace.
  2010-02-11 19:08                   ` Salman Qazi
@ 2010-02-11 20:10                     ` Oleg Nesterov
  2010-02-11 20:39                       ` Salman Qazi
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2010-02-11 20:10 UTC (permalink / raw)
  To: Salman Qazi
  Cc: taviso, Roland Dreier, Andrew Morton, Roland McGrath, linux-kernel

On 02/11, Salman Qazi wrote:
>
> I understand what it does.  But, why is it the right thing to do?

Oh. Let's not discuss the current API. Nobody thinks it is great,
but we can't change it.

But,

> From the user's perspective, why should the task become untraced if we
> use ptrace to deliver the signal?

The task does not become untraced. The tracer (in your test-case)
explicitly asks the tracee to respect SIGSTOP and stop.

> Doesn't this make it impossible to
> intercept and control which signals are sent to a traced task?

Why? The tracee reports all signals. If the tracer does
ptrace(PTRACE_WHATEVER, SIGXXX) surely it knows SIGXXX is sent to
tracee.




In any case. This is how ptrace currently works, there is no race
and the patch is not needed (in fact it is very wrong, but this
soesn't matter).

Do you agree?

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Race in ptrace.
  2010-02-11 20:10                     ` Oleg Nesterov
@ 2010-02-11 20:39                       ` Salman Qazi
  2010-02-11 20:55                         ` Roland McGrath
  2010-02-11 20:59                         ` Oleg Nesterov
  0 siblings, 2 replies; 16+ messages in thread
From: Salman Qazi @ 2010-02-11 20:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: taviso, Roland Dreier, Andrew Morton, Roland McGrath, linux-kernel

On Thu, Feb 11, 2010 at 12:10 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 02/11, Salman Qazi wrote:
>>
>> I understand what it does.  But, why is it the right thing to do?
>
> Oh. Let's not discuss the current API. Nobody thinks it is great,
> but we can't change it.
>
> But,
>
>> From the user's perspective, why should the task become untraced if we
>> use ptrace to deliver the signal?
>
> The task does not become untraced. The tracer (in your test-case)
> explicitly asks the tracee to respect SIGSTOP and stop.
>
>> Doesn't this make it impossible to
>> intercept and control which signals are sent to a traced task?
>
> Why? The tracee reports all signals. If the tracer does
> ptrace(PTRACE_WHATEVER, SIGXXX) surely it knows SIGXXX is sent to
> tracee.
>
>

The ptrace syscall fails, as the child is running and so we are unable
to restart the child.  I suppose it is not accurate to say "impossible
to intercept" if it eventually works.  But, it's an unpleasant
behaviour.  How do you distinguish between this race and the child
suddenly becoming untraced or dying?

What I don't agree with is that when we send SIGCONT later with
"kill", we wake up the child at all.  It may make sense to someone who
has access to the kernel source code, but from a user's point of view
this is a surprise.  The signal is intercepted and should not have an
effect on the child.

>
>
> In any case. This is how ptrace currently works, there is no race
> and the patch is not needed (in fact it is very wrong, but this
> soesn't matter).
>
> Do you agree?

I agree that the patch is wrong because of the reasons you mentioned
earlier.  But I think there is an issue here.  It's hard to say what
it is supposed to do, but I can certainly see it being more useful
this behaviour wasn't there.  Thanks for your time and attention on
this matter.

>
> Oleg.
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Race in ptrace.
  2010-02-11 20:39                       ` Salman Qazi
@ 2010-02-11 20:55                         ` Roland McGrath
  2010-02-11 21:05                           ` Salman Qazi
  2010-02-11 20:59                         ` Oleg Nesterov
  1 sibling, 1 reply; 16+ messages in thread
From: Roland McGrath @ 2010-02-11 20:55 UTC (permalink / raw)
  To: Salman Qazi
  Cc: Oleg Nesterov, taviso, Roland Dreier, Andrew Morton, linux-kernel

> What I don't agree with is that when we send SIGCONT later with
> "kill", we wake up the child at all.  It may make sense to someone who
> has access to the kernel source code, but from a user's point of view
> this is a surprise.  The signal is intercepted and should not have an
> effect on the child.

This is the behavior of SIGCONT and doesn't really have anything to do with
ptrace.  Once you have let the SIGSTOP through, the process is in job
control stop just like if you'd sent a SIGSTOP without using ptrace at all.

The distinction that is confusing you is that *generating* SIGCONT is what
resumes the process, not *delivering* it.  Another example is that if your
process has SIGCONT blocked or ignored, SIGCONT still wakes it up.  Another
example is that SIGCONT wakes up all the threads in a process, before one
of those threads delivers the SIGCONT (i.e. runs a handler).


Thanks,
Roland

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Race in ptrace.
  2010-02-11 20:39                       ` Salman Qazi
  2010-02-11 20:55                         ` Roland McGrath
@ 2010-02-11 20:59                         ` Oleg Nesterov
  1 sibling, 0 replies; 16+ messages in thread
From: Oleg Nesterov @ 2010-02-11 20:59 UTC (permalink / raw)
  To: Salman Qazi
  Cc: taviso, Roland Dreier, Andrew Morton, Roland McGrath, linux-kernel

On 02/11, Salman Qazi wrote:
>
> On Thu, Feb 11, 2010 at 12:10 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> >>
> > Why? The tracee reports all signals. If the tracer does
> > ptrace(PTRACE_WHATEVER, SIGXXX) surely it knows SIGXXX is sent to
> > tracee.
>
> The ptrace syscall fails, as the child is running and so we are unable
> to restart the child.  I suppose it is not accurate to say "impossible
> to intercept" if it eventually works.  But, it's an unpleasant
> behaviour.  How do you distinguish between this race and the child
> suddenly becoming untraced or dying?

The child can't become untraced unless the tracer detaches. If the
tracee dies the tracer can notice this via wait(). And please note
again, this particular case is not possible when the tracee is
TASK_TRACED. The tracer explicitly instructed the tracee to stop in
TASK_STOPPED, it should take care of SIGCONT case.

But don't get me wrong, see below,

> > In any case. This is how ptrace currently works, there is no race
> > and the patch is not needed (in fact it is very wrong, but this
> > soesn't matter).
> >
> > Do you agree?
>
> I agree that the patch is wrong because of the reasons you mentioned
> earlier.  But I think there is an issue here.  It's hard to say what
> it is supposed to do, but I can certainly see it being more useful
> this behaviour wasn't there.

Ha. let me repeat, nobody thinks the current ptrace API is nice.


OK. Thanks Salman for your report and discussion.

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Race in ptrace.
  2010-02-11 20:55                         ` Roland McGrath
@ 2010-02-11 21:05                           ` Salman Qazi
  0 siblings, 0 replies; 16+ messages in thread
From: Salman Qazi @ 2010-02-11 21:05 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Oleg Nesterov, taviso, Roland Dreier, Andrew Morton, linux-kernel

On Thu, Feb 11, 2010 at 12:55 PM, Roland McGrath <roland@redhat.com> wrote:
>> What I don't agree with is that when we send SIGCONT later with
>> "kill", we wake up the child at all.  It may make sense to someone who
>> has access to the kernel source code, but from a user's point of view
>> this is a surprise.  The signal is intercepted and should not have an
>> effect on the child.
>
> This is the behavior of SIGCONT and doesn't really have anything to do with
> ptrace.  Once you have let the SIGSTOP through, the process is in job
> control stop just like if you'd sent a SIGSTOP without using ptrace at all.
>
> The distinction that is confusing you is that *generating* SIGCONT is what
> resumes the process, not *delivering* it.  Another example is that if your
> process has SIGCONT blocked or ignored, SIGCONT still wakes it up.  Another
> example is that SIGCONT wakes up all the threads in a process, before one
> of those threads delivers the SIGCONT (i.e. runs a handler).

Thanks for the clarification.  This is exactly the bit of information
I was missing.

>
>
> Thanks,
> Roland
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Race in ptrace.
@ 2010-02-08 22:04 Salman Qazi
  0 siblings, 0 replies; 16+ messages in thread
From: Salman Qazi @ 2010-02-08 22:04 UTC (permalink / raw)


Greetings,

A race in ptrace was pointed to us by a fellow Google engineer, Tavis
Ormandy.  The race involves interaction between a tracer, a tracee and 
an antagonist.  The tracer is tracing the tracee with PTRACE_SYSCALL and
waits on the tracee.  In the mean time, an antagonist blasts the tracee
with SIGCONTs.

The observed issue is that sometimes when the tracer attempts to continue the tracee
with PTRACE_SYSCALL, it gets a return value of -ESRCH, indicating that the tracee is already
running (or not being traced).  It turns out that a SIGCONT wakes up the
tracee in kernel mode, and for a moment the tracee's state is TASK_RUNNING
then in ptrace_stop we hit the condition where the tracee is found to be running (and thus
not traced).  If the syscall is repeated, the second time it usually 
succeeds (because by that time, the tracee has been put into TASK_TRACED).

Below is a quick and dirty fix for the one instance that I did
figure out.  Note that this doesn't completely close the race on 
2.6.33-rc6.  But on 2.6.26 it appears to be sufficient.  I suspect there 
are other code paths with similar issues:

    Fix a race in ptrace.
    
    Race description:
    
    The traced process is running for a small duration
    of time between when it is sent a SIGCONT and when it realizes that it needs
    to be asleep in order to get traced.  If during this time the tracer calls
    ptrace with PTRACE_SYSCALL, it recieves an errno value of -ESRCH.
    
    Solution:
    
    We add a new bit to the ptrace field of task_struct.  We call this PT_WAKING.
    When the process is being awoken for a SIGCONT signal, we set this bit before
    state changes to TASK_RUNNING.  When the process is about to go to sleep, we
    reset this bit after we change the state to TASK_TRACED.

Signed-off-by: Salman Qazi <sqazi@google.com>

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 56f2d63..6c6771a 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -67,8 +67,9 @@
 #define PT_TRACE_EXEC	0x00000080
 #define PT_TRACE_VFORK_DONE	0x00000100
 #define PT_TRACE_EXIT	0x00000200
+#define PT_WAKING	0x00000400
 
-#define PT_TRACE_MASK	0x000003f4
+#define PT_TRACE_MASK	0x000007f4
 
 /* single stepping state bits (used on ARM and PA-RISC) */
 #define PT_SINGLESTEP_BIT	31
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 23bd09c..32157f8 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -104,7 +104,8 @@ int ptrace_check_attach(struct task_struct *child, int kill)
 		spin_lock_irq(&child->sighand->siglock);
 		if (task_is_stopped(child))
 			child->state = TASK_TRACED;
-		else if (!task_is_traced(child) && !kill)
+		else if (!task_is_traced(child) && !kill &&
+				(!(child->ptrace & PT_WAKING)))
 			ret = -ESRCH;
 		spin_unlock_irq(&child->sighand->siglock);
 	}
diff --git a/kernel/signal.c b/kernel/signal.c
index 934ae5e..095507e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -697,6 +697,10 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		 * and wake all threads.
 		 */
 		rm_from_queue(SIG_KERNEL_STOP_MASK, &signal->shared_pending);
+		if (p->ptrace & PT_PTRACED) {
+			p->ptrace |= PT_WAKING;
+			mb();
+		}
 		t = p;
 		do {
 			unsigned int state;
@@ -1626,6 +1630,10 @@ static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
 
 	/* Let the debugger run.  */
 	__set_current_state(TASK_TRACED);
+	if (current->ptrace & PT_PTRACED) {
+		mb();
+		current->ptrace &= ~PT_WAKING;
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
 	if (may_ptrace_stop()) {

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2010-02-11 21:05 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-02-08 22:16 Race in ptrace Salman Qazi
     [not found] ` <20100208143231.6d804590.akpm@linux-foundation.org>
2010-02-09 11:27   ` Oleg Nesterov
2010-02-10 13:35     ` Oleg Nesterov
2010-02-10 18:38       ` Salman Qazi
2010-02-11 12:56         ` Oleg Nesterov
2010-02-11 16:32           ` Salman Qazi
2010-02-11 16:50             ` Oleg Nesterov
2010-02-11 18:43               ` Salman Qazi
2010-02-11 18:55                 ` Oleg Nesterov
2010-02-11 19:08                   ` Salman Qazi
2010-02-11 20:10                     ` Oleg Nesterov
2010-02-11 20:39                       ` Salman Qazi
2010-02-11 20:55                         ` Roland McGrath
2010-02-11 21:05                           ` Salman Qazi
2010-02-11 20:59                         ` Oleg Nesterov
  -- strict thread matches above, loose matches on Subject: below --
2010-02-08 22:04 Salman Qazi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.