All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v2] Fix ptrace events across pid namespaces
       [not found]   ` <CAF52+S5i7oqJnJ1NN0bk5Vg=CiYrussw0AunteE72kMMcWkeJA@mail.gmail.com>
@ 2014-04-01 18:52     ` Oleg Nesterov
  2014-04-01 20:44       ` Matthew Dempsky
  0 siblings, 1 reply; 23+ messages in thread
From: Oleg Nesterov @ 2014-04-01 18:52 UTC (permalink / raw)
  To: Matthew Dempsky
  Cc: Kees Cook, Julien Tinnes, Roland McGrath, Jan Kratochvil, linux-kernel

Sorry for slow response,

On 03/31, Matthew Dempsky wrote:
>
> On Mon, Mar 31, 2014 at 11:16 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> >> --- a/kernel/fork.c
> >> +++ b/kernel/fork.c
> >> @@ -1621,12 +1621,18 @@ long do_fork(unsigned long clone_flags,
> >>               wake_up_new_task(p);
> >>
> >>               /* forking complete and child started to run, tell ptracer */
> >> -             if (unlikely(trace))
> >> -                     ptrace_event(trace, nr);
> >> +             if (unlikely(trace)) {
> >> +                     pid_t pid = task_pid_nr_ns(p,
> >> +                             task_active_pid_ns(current->parent));
> >> +                     ptrace_event(trace, pid);
> >
> > This is unsafe, both "p" or ->parent can go away. In fact "p" can be
> > already dead/freed/etc.
>
> True.  I suspect the problem of "p" going away could be solved by
> calculating the pid prior to calling wake_up_new_task(p)?

Yes. Or we can do get_pid(p) (unconditional get_task_struct() can't
help, task_pid_nr_ns(tsk) returns zero if tsk has exited), although
personaly I'd prefer to avoid this.

But see below.

> But how can I guard against current->parent disappearing?  Do I need
> to grab a read lock on the tasklist and rcu?

rcu_read_lock() should be enough.

The main question is how we can ensure that we use the correct namespace,
iow, how we can ensure that current->parent won't change after this.

Suppose that debugger detaches right after we calculated pid_t we were
going to report.

Now suppose that another task from the different namespace does
ptrace(PTRACE_SEIZE, PTRACE_O_TRACEFORK). Or PTRACE_O_TRACEVFORKDONE.

In this case ptrace_event() will still report the wrong pid. I simply do
not see how we can prevent this race without the really ugly complications.

But perhaps we do not really care? The race is very unlikely, and nothing
really bad can happen. So perhaps your simple patch makes sense anyway,
just the problem should be documented. Currently "nr" is always wrong if
the tracer is from another ns, this change will obviously makes the things
better in any case.

> >> --- a/kernel/ptrace.c
> >> +++ b/kernel/ptrace.c
> >> @@ -80,6 +80,7 @@ void __ptrace_unlink(struct task_struct *child)
> >>       BUG_ON(!child->ptrace);
> >>
> >>       child->ptrace = 0;
> >> +     child->ptrace_message = 0;
> >
> > It is too late for me to try to understand the reason for this change ;)
>
> Basically, if process A is being ptraced by process B and we record an
> event for it, I don't want process B to detach and then process C to
> attach and be able to still view the event.

Aha, so I got it right. But see above, this won't help anyway, C can
attach before ptrace_event() sets ->ptrace_message.

Perhaps this change makes sense anyway. I think it needs a separate patch
but I won't insist.

> > Probably this is related to detach/attach, but then it would be better
> > to do this in PTRACE_ATTACH. Because ->ptrace_message can be non-zero
> > if, say, a traced task forks.
>
> I think zeroing out ptrace_message at attach time works too, and I can
> do it that way if you prefer.

I do not really mind. But once again, unless we clear it in PTRACE_ATTACH,
PTRACE_GETEVENTMSG still can return non-zero right after PTRACE_ATTACH.
Of course we can change copy_process() to clear child->ptrace_message,
but I don't think this would be better.

Matthew, I just noticed we discuss this off-list, I added lkml.

Oleg.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] Fix ptrace events across pid namespaces
  2014-04-01 18:52     ` [PATCH v2] Fix ptrace events across pid namespaces Oleg Nesterov
@ 2014-04-01 20:44       ` Matthew Dempsky
  2014-04-01 22:29         ` [PATCH v3] ptrace: Fix fork event messages " Matthew Dempsky
  0 siblings, 1 reply; 23+ messages in thread
From: Matthew Dempsky @ 2014-04-01 20:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Kees Cook, Julien Tinnes, Roland McGrath, Jan Kratochvil, linux-kernel

On Tue, Apr 1, 2014 at 11:52 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> Sorry for slow response,

No worries!

>> But how can I guard against current->parent disappearing?  Do I need
>> to grab a read lock on the tasklist and rcu?
>
> rcu_read_lock() should be enough.

Noted.

> The main question is how we can ensure that we use the correct namespace,
> iow, how we can ensure that current->parent won't change after this.
>
> Suppose that debugger detaches right after we calculated pid_t we were
> going to report.
>
> Now suppose that another task from the different namespace does
> ptrace(PTRACE_SEIZE, PTRACE_O_TRACEFORK). Or PTRACE_O_TRACEVFORKDONE.
>
> In this case ptrace_event() will still report the wrong pid. I simply do
> not see how we can prevent this race without the really ugly complications.

Yeah, the 'best' alternative I can think of is to save the "struct pid
*" instead of a pid_t in the ptrace_message field, and then have
PTRACE_GETEVENTMSG handle translating it into a pid_t (similar to what
SCM_CREDENTIALS does)... but that seems very messy and intrusive for
such an unlikely situation (because we need extra bits to distinguish
the contents of ptrace_message and to free up references).

> But perhaps we do not really care? The race is very unlikely, and nothing
> really bad can happen. So perhaps your simple patch makes sense anyway,
> just the problem should be documented. Currently "nr" is always wrong if
> the tracer is from another ns, this change will obviously makes the things
> better in any case.

Yeah, I'm leaning towards just accepting that the ptrace event message
might be wrong in this case, and documenting that.  It's at least no
worse than the status quo where it's wrong in all cases.  I'll
followup with a v3 patch for this in a bit.

>> Basically, if process A is being ptraced by process B and we record an
>> event for it, I don't want process B to detach and then process C to
>> attach and be able to still view the event.
>
> Aha, so I got it right. But see above, this won't help anyway, C can
> attach before ptrace_event() sets ->ptrace_message.
>
> Perhaps this change makes sense anyway. I think it needs a separate patch
> but I won't insist.
>
>> > Probably this is related to detach/attach, but then it would be better
>> > to do this in PTRACE_ATTACH. Because ->ptrace_message can be non-zero
>> > if, say, a traced task forks.
>>
>> I think zeroing out ptrace_message at attach time works too, and I can
>> do it that way if you prefer.
>
> I do not really mind. But once again, unless we clear it in PTRACE_ATTACH,
> PTRACE_GETEVENTMSG still can return non-zero right after PTRACE_ATTACH.
> Of course we can change copy_process() to clear child->ptrace_message,
> but I don't think this would be better.

I think I'll punt on zero'ing ptrace_message for now and propose that
as a followup patch then.  I think it's a really narrow and unlikely
issue (like the detach/attach race above), and I don't want to
distract from the more pressing matter of getting debugging working.
:)

> Matthew, I just noticed we discuss this off-list, I added lkml.

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3] ptrace: Fix fork event messages across pid namespaces
  2014-04-01 20:44       ` Matthew Dempsky
@ 2014-04-01 22:29         ` Matthew Dempsky
  2014-04-02  0:39           ` Matthew Dempsky
                             ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Matthew Dempsky @ 2014-04-01 22:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matthew Dempsky, Kees Cook, Julien Tinnes, Roland McGrath,
	Jan Kratochvil, linux-kernel

Revised patch below.  Unfortunately, I just realized I don't have my
test machine handy, so I won't be able to test it until tomorrow.

v3:
        - Respond to Oleg feedback about p possibly already exiting
          and adding proper locking
        - Add comment warning that race condition still exists
        - Removed selftest to instead be included with other ptrace tests
        - Removed ptrace_message zero'ing; to be handled in followup patch

8>--------------------------------------------------------------------<8

When tracing a thread in another pid namespace, it's important for
fork event messages to contain the child's pid as seen from the
tracer's pid namespace, not the parent's.  Otherwise, the tracer won't
be able to correlate the fork event with later SIGTRAP signals it
receives from the child.

We still (at least theoretically) risk sending a bogus fork event
message if a debugger from pid namespace A detaches and then another
debugger from pid namespace B attaches right away.  But this is still
an improvement from the status quo where we always send bogus fork
event messages when the debugger is in a different pid namespace than
the parent.

Signed-off-by: Matthew Dempsky <mdempsky@chromium.org>
---
 kernel/fork.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 332688e..d928761 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1605,10 +1605,12 @@ long do_fork(unsigned long clone_flags,
 	 */
 	if (!IS_ERR(p)) {
 		struct completion vfork;
+		struct pid *pid;
 
 		trace_sched_process_fork(current, p);
 
-		nr = task_pid_vnr(p);
+		pid = get_task_pid(p, PIDTYPE_PID);
+		nr = pid_vnr(pid);
 
 		if (clone_flags & CLONE_PARENT_SETTID)
 			put_user(nr, parent_tidptr);
@@ -1622,13 +1624,35 @@ long do_fork(unsigned long clone_flags,
 		wake_up_new_task(p);
 
 		/* forking complete and child started to run, tell ptracer */
-		if (unlikely(trace))
-			ptrace_event(trace, nr);
+		if (unlikely(trace)) {
+			/*
+			 * We want to report the child's pid as seen from the
+			 * tracer's pid namespace.
+			 * FIXME: We still risk sending a bogus event message if
+			 * debuggers from different pid namespaces detach and
+			 * reattach between rcu_read_unlock() and ptrace_stop().
+			 */
+			unsigned long message;
+			rcu_read_lock();
+			message = pid_nr_ns(pid,
+				task_active_pid_ns(current->parent));
+			rcu_read_unlock();
+			ptrace_event(trace, message);
+		}
 
 		if (clone_flags & CLONE_VFORK) {
-			if (!wait_for_vfork_done(p, &vfork))
-				ptrace_event(PTRACE_EVENT_VFORK_DONE, nr);
+			if (!wait_for_vfork_done(p, &vfork)) {
+				/* See comment above about pid namespaces. */
+				unsigned long message;
+				rcu_read_lock();
+				message = pid_nr_ns(pid,
+					task_active_pid_ns(current->parent));
+				rcu_read_unlock();
+				ptrace_event(PTRACE_EVENT_VFORK_DONE, message);
+			}
 		}
+
+		put_pid(pid);
 	} else {
 		nr = PTR_ERR(p);
 	}
-- 
1.9.1.423.g4596e3a

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3] ptrace: Fix fork event messages across pid namespaces
  2014-04-01 22:29         ` [PATCH v3] ptrace: Fix fork event messages " Matthew Dempsky
@ 2014-04-02  0:39           ` Matthew Dempsky
  2014-04-02 14:58           ` Oleg Nesterov
  2014-04-03  2:26           ` [PATCH v4] " Matthew Dempsky
  2 siblings, 0 replies; 23+ messages in thread
From: Matthew Dempsky @ 2014-04-02  0:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matthew Dempsky, Kees Cook, Julien Tinnes, Roland McGrath,
	Jan Kratochvil, linux-kernel

On Tue, Apr 1, 2014 at 3:29 PM, Matthew Dempsky <mdempsky@chromium.org> wrote:
> Revised patch below.  Unfortunately, I just realized I don't have my
> test machine handy, so I won't be able to test it until tomorrow.

Quick update: Julien helped me setup a VM to test this, and the patch
still passes my test case without any obvious issues.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3] ptrace: Fix fork event messages across pid namespaces
  2014-04-01 22:29         ` [PATCH v3] ptrace: Fix fork event messages " Matthew Dempsky
  2014-04-02  0:39           ` Matthew Dempsky
@ 2014-04-02 14:58           ` Oleg Nesterov
  2014-04-02 15:44             ` [PATCH 0/1] pid_namespace: pidns_get() should check task_active_pid_ns() != NULL Oleg Nesterov
  2014-04-02 21:58             ` [PATCH v3] ptrace: Fix fork event messages across pid namespaces Matthew Dempsky
  2014-04-03  2:26           ` [PATCH v4] " Matthew Dempsky
  2 siblings, 2 replies; 23+ messages in thread
From: Oleg Nesterov @ 2014-04-02 14:58 UTC (permalink / raw)
  To: Matthew Dempsky
  Cc: Kees Cook, Julien Tinnes, Roland McGrath, Jan Kratochvil, linux-kernel

On 04/01, Matthew Dempsky wrote:
>
> @@ -1605,10 +1605,12 @@ long do_fork(unsigned long clone_flags,
>  	 */
>  	if (!IS_ERR(p)) {
>  		struct completion vfork;
> +		struct pid *pid;
>
>  		trace_sched_process_fork(current, p);
>
> -		nr = task_pid_vnr(p);
> +		pid = get_task_pid(p, PIDTYPE_PID);

So you decided to use get_pid/put_pid ;) Honestly, I'd prefer to just
calculate "pid_t trace_pid" before wake_up_new_task(), but I won't
argue. Plus this way the race window becomes really small, OK.

> +		if (unlikely(trace)) {
> +			/*
> +			 * We want to report the child's pid as seen from the
> +			 * tracer's pid namespace.
> +			 * FIXME: We still risk sending a bogus event message if
> +			 * debuggers from different pid namespaces detach and
> +			 * reattach between rcu_read_unlock() and ptrace_stop().
> +			 */
> +			unsigned long message;
> +			rcu_read_lock();
> +			message = pid_nr_ns(pid,
> +				task_active_pid_ns(current->parent));
> +			rcu_read_unlock();
> +			ptrace_event(trace, message);
> +		}
>
>  		if (clone_flags & CLONE_VFORK) {
> -			if (!wait_for_vfork_done(p, &vfork))
> -				ptrace_event(PTRACE_EVENT_VFORK_DONE, nr);
> +			if (!wait_for_vfork_done(p, &vfork)) {
> +				/* See comment above about pid namespaces. */
> +				unsigned long message;
> +				rcu_read_lock();
> +				message = pid_nr_ns(pid,
> +					task_active_pid_ns(current->parent));
> +				rcu_read_unlock();
> +				ptrace_event(PTRACE_EVENT_VFORK_DONE, message);
> +			}

OK, but may I suggest you to make a helper? Note that the code under
"if (trace)" and "if (CLONE_VFORK)" is the same. Even the comment above
equally applies to the CLONE_VFORK branch.

Especially because this code needs a fix. Yes, rcu_read_lock() should
be enough to ensure that ->parent and its namespace (if !NULL) can not
go away, but task_active_pid_ns() can return NULL release_task(->parent)
was already (although this race is pure theoretical). So this helper
should also check it is !NULL under rcu_read_lock(), afaics.

(Hmm... off-topic, but get_pidns looks buggy by the same reason, I'll
 send a fix).

And I forgot to mention, please send v5 to akpm. We usually route ptrace
patches via -mm tree.

Oleg.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 0/1] pid_namespace: pidns_get() should check task_active_pid_ns() != NULL
  2014-04-02 14:58           ` Oleg Nesterov
@ 2014-04-02 15:44             ` Oleg Nesterov
  2014-04-02 15:45               ` [PATCH 1/1] " Oleg Nesterov
  2014-04-02 21:58             ` [PATCH v3] ptrace: Fix fork event messages across pid namespaces Matthew Dempsky
  1 sibling, 1 reply; 23+ messages in thread
From: Oleg Nesterov @ 2014-04-02 15:44 UTC (permalink / raw)
  To: Matthew Dempsky, Andrew Morton, Eric W. Biederman
  Cc: Kees Cook, Julien Tinnes, Roland McGrath, Jan Kratochvil, linux-kernel

On 04/02, Oleg Nesterov wrote:
>
> Especially because this code needs a fix. Yes, rcu_read_lock() should
> be enough to ensure that ->parent and its namespace (if !NULL) can not
> go away, but task_active_pid_ns() can return NULL release_task(->parent)
> was already (although this race is pure theoretical). So this helper
> should also check it is !NULL under rcu_read_lock(), afaics.
>
> (Hmm... off-topic, but get_pidns looks buggy by the same reason, I'll
>  send a fix).

Eric, I have to admit that I din't bother to actually verify this, but
pidns_get() looks "obviously buggy". Could you ack/nack ?

Oleg.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/1] pid_namespace: pidns_get() should check task_active_pid_ns() != NULL
  2014-04-02 15:44             ` [PATCH 0/1] pid_namespace: pidns_get() should check task_active_pid_ns() != NULL Oleg Nesterov
@ 2014-04-02 15:45               ` Oleg Nesterov
  2014-04-02 16:53                 ` Eric W. Biederman
  0 siblings, 1 reply; 23+ messages in thread
From: Oleg Nesterov @ 2014-04-02 15:45 UTC (permalink / raw)
  To: Matthew Dempsky, Andrew Morton, Eric W. Biederman
  Cc: Kees Cook, Julien Tinnes, Roland McGrath, Jan Kratochvil, linux-kernel

pidns_get()->get_pid_ns() can hit ns == NULL. This task_struct can't
go away, but task_active_pid_ns(task) is NULL if release_task(task)
was already called. Alternatively we could change get_pid_ns(ns) to
check ns != NULL, but it seems that other callers are fine.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>

--- x/kernel/pid_namespace.c
+++ x/kernel/pid_namespace.c
@@ -318,7 +318,9 @@ static void *pidns_get(struct task_struct *task)
 	struct pid_namespace *ns;
 
 	rcu_read_lock();
-	ns = get_pid_ns(task_active_pid_ns(task));
+	ns = task_active_pid_ns(task);
+	if (ns)
+		get_pid_ns(ns);
 	rcu_read_unlock();
 
 	return ns;


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] pid_namespace: pidns_get() should check task_active_pid_ns() != NULL
  2014-04-02 16:53                 ` Eric W. Biederman
@ 2014-04-02 15:58                   ` Oleg Nesterov
  2014-04-02 22:01                     ` Eric W. Biederman
  0 siblings, 1 reply; 23+ messages in thread
From: Oleg Nesterov @ 2014-04-02 15:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Matthew Dempsky, Andrew Morton, Kees Cook, Julien Tinnes,
	Roland McGrath, Jan Kratochvil, linux-kernel

On 04/02, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > pidns_get()->get_pid_ns() can hit ns == NULL. This task_struct can't
> > go away, but task_active_pid_ns(task) is NULL if release_task(task)
> > was already called. Alternatively we could change get_pid_ns(ns) to
> > check ns != NULL, but it seems that other callers are fine.
>
> get_pid_ns already has a ns == NULL check, so no code changes should be
> needed.

Hmm...

	static inline struct pid_namespace *get_pid_ns(struct pid_namespace *ns)
	{
		if (ns != &init_pid_ns)
			kref_get(&ns->kref);
		return ns;
	}

?

Oleg.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] pid_namespace: pidns_get() should check task_active_pid_ns() != NULL
  2014-04-02 15:45               ` [PATCH 1/1] " Oleg Nesterov
@ 2014-04-02 16:53                 ` Eric W. Biederman
  2014-04-02 15:58                   ` Oleg Nesterov
  0 siblings, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2014-04-02 16:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matthew Dempsky, Andrew Morton, Kees Cook, Julien Tinnes,
	Roland McGrath, Jan Kratochvil, linux-kernel

Oleg Nesterov <oleg@redhat.com> writes:

> pidns_get()->get_pid_ns() can hit ns == NULL. This task_struct can't
> go away, but task_active_pid_ns(task) is NULL if release_task(task)
> was already called. Alternatively we could change get_pid_ns(ns) to
> check ns != NULL, but it seems that other callers are fine.

get_pid_ns already has a ns == NULL check, so no code changes should be
needed.

Eric

>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>
> --- x/kernel/pid_namespace.c
> +++ x/kernel/pid_namespace.c
> @@ -318,7 +318,9 @@ static void *pidns_get(struct task_struct *task)
>  	struct pid_namespace *ns;
>  
>  	rcu_read_lock();
> -	ns = get_pid_ns(task_active_pid_ns(task));
> +	ns = task_active_pid_ns(task);
> +	if (ns)
> +		get_pid_ns(ns);
>  	rcu_read_unlock();
>  
>  	return ns;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3] ptrace: Fix fork event messages across pid namespaces
  2014-04-02 14:58           ` Oleg Nesterov
  2014-04-02 15:44             ` [PATCH 0/1] pid_namespace: pidns_get() should check task_active_pid_ns() != NULL Oleg Nesterov
@ 2014-04-02 21:58             ` Matthew Dempsky
  2014-04-02 22:37               ` Matthew Dempsky
  1 sibling, 1 reply; 23+ messages in thread
From: Matthew Dempsky @ 2014-04-02 21:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Kees Cook, Julien Tinnes, Roland McGrath, Jan Kratochvil, linux-kernel

On Wed, Apr 2, 2014 at 7:58 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 04/01, Matthew Dempsky wrote:
>>
>> @@ -1605,10 +1605,12 @@ long do_fork(unsigned long clone_flags,
>>        */
>>       if (!IS_ERR(p)) {
>>               struct completion vfork;
>> +             struct pid *pid;
>>
>>               trace_sched_process_fork(current, p);
>>
>> -             nr = task_pid_vnr(p);
>> +             pid = get_task_pid(p, PIDTYPE_PID);
>
> So you decided to use get_pid/put_pid ;) Honestly, I'd prefer to just
> calculate "pid_t trace_pid" before wake_up_new_task(), but I won't
> argue. Plus this way the race window becomes really small, OK.

I was leaning towards that, but then the conditions for trying to
avoid computing the pid_t became complex and I was worried that
waiting for the vfork child to finish could make the race window
arbitrarily large.  Holding a struct pid reference for the duration of
fork seemed like the easiest fix to both of those.

>> +             if (unlikely(trace)) {
>> +                     /*
>> +                      * We want to report the child's pid as seen from the
>> +                      * tracer's pid namespace.
>> +                      * FIXME: We still risk sending a bogus event message if
>> +                      * debuggers from different pid namespaces detach and
>> +                      * reattach between rcu_read_unlock() and ptrace_stop().
>> +                      */
>> +                     unsigned long message;
>> +                     rcu_read_lock();
>> +                     message = pid_nr_ns(pid,
>> +                             task_active_pid_ns(current->parent));
>> +                     rcu_read_unlock();
>> +                     ptrace_event(trace, message);
>> +             }
>>
>>               if (clone_flags & CLONE_VFORK) {
>> -                     if (!wait_for_vfork_done(p, &vfork))
>> -                             ptrace_event(PTRACE_EVENT_VFORK_DONE, nr);
>> +                     if (!wait_for_vfork_done(p, &vfork)) {
>> +                             /* See comment above about pid namespaces. */
>> +                             unsigned long message;
>> +                             rcu_read_lock();
>> +                             message = pid_nr_ns(pid,
>> +                                     task_active_pid_ns(current->parent));
>> +                             rcu_read_unlock();
>> +                             ptrace_event(PTRACE_EVENT_VFORK_DONE, message);
>> +                     }
>
> OK, but may I suggest you to make a helper? Note that the code under
> "if (trace)" and "if (CLONE_VFORK)" is the same. Even the comment above
> equally applies to the CLONE_VFORK branch.

Sure.

> Especially because this code needs a fix. Yes, rcu_read_lock() should
> be enough to ensure that ->parent and its namespace (if !NULL) can not
> go away, but task_active_pid_ns() can return NULL release_task(->parent)
> was already (although this race is pure theoretical). So this helper
> should also check it is !NULL under rcu_read_lock(), afaics.

Does this look right?

    static inline void ptrace_event_pid(int event, struct pid *pid)
    {
        unsigned long message = -1;
        struct pid_namespace *ns;

        rcu_read_lock();
        ns = task_active_pid_ns(rcu_dereference(current->parent));
        if (ns)
            message = pid_nr_ns(pid, ns);
        rcu_read_unlock();

        ptrace_event(event, message);
    }

I'm unsure if the rcu_dereference() is appropriate.  It seems like it
is based on my reading of the RCU documentation and that parent and
real_parent have been marked __rcu since 2011, but they prevailingly
seem to be accessed/mutated without the RCU APIs.

Also, to ensure I understand the race: the issue is that if the parent
were to call do_exit() concurrently with the above RCU critical
section, that parent's call to forget_original_parent() might not yet
be visible when the above code evaluates "current->parent", but a
later call to release_task() (e.g., if autoreap is true in
exit_notify) could detach the task's pids without any intervening
synchronize_rcu() call?

If so, why isn't the fix to have forget_original_parent() call
synchronize_rcu() before returning?  (And probably to use
rcu_assign_pointer() to updater t->real_parent and t->parent.)

Otherwise, it looks like (e.g.) the attempts to get the parent's pid
in fill_prstatus() and tomoyo_sys_getppid() are also theoretical races
of the same kind?

> And I forgot to mention, please send v5 to akpm. We usually route ptrace
> patches via -mm tree.

Will do.

Thanks for being patient with my locking questions! :)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] pid_namespace: pidns_get() should check task_active_pid_ns() != NULL
  2014-04-02 15:58                   ` Oleg Nesterov
@ 2014-04-02 22:01                     ` Eric W. Biederman
  0 siblings, 0 replies; 23+ messages in thread
From: Eric W. Biederman @ 2014-04-02 22:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matthew Dempsky, Andrew Morton, Kees Cook, Julien Tinnes,
	Roland McGrath, Jan Kratochvil, linux-kernel

Oleg Nesterov <oleg@redhat.com> writes:

2> On 04/02, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@redhat.com> writes:
>>
>> > pidns_get()->get_pid_ns() can hit ns == NULL. This task_struct can't
>> > go away, but task_active_pid_ns(task) is NULL if release_task(task)
>> > was already called. Alternatively we could change get_pid_ns(ns) to
>> > check ns != NULL, but it seems that other callers are fine.
>>
>> get_pid_ns already has a ns == NULL check, so no code changes should be
>> needed.
>
> Hmm...
>
> 	static inline struct pid_namespace *get_pid_ns(struct pid_namespace *ns)
> 	{
> 		if (ns != &init_pid_ns)
> 			kref_get(&ns->kref);
> 		return ns;
> 	}
>
> ?

Apparently I could not read this morning, and somehow thought
ns != &init_pid_ns was ns != NULL.


Eric

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3] ptrace: Fix fork event messages across pid namespaces
  2014-04-02 21:58             ` [PATCH v3] ptrace: Fix fork event messages across pid namespaces Matthew Dempsky
@ 2014-04-02 22:37               ` Matthew Dempsky
  2014-04-07 19:24                 ` Oleg Nesterov
  0 siblings, 1 reply; 23+ messages in thread
From: Matthew Dempsky @ 2014-04-02 22:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Kees Cook, Julien Tinnes, Roland McGrath, Jan Kratochvil, linux-kernel

On Wed, Apr 2, 2014 at 2:58 PM, Matthew Dempsky <mdempsky@google.com> wrote:
> Otherwise, it looks like (e.g.) the attempts to get the parent's pid
> in fill_prstatus() and tomoyo_sys_getppid() are also theoretical races
> of the same kind?

Doh, nevermind, those functions would potentially be use null "struct
pid" pointers, not "struct pid_namespace" pointers; and pid_nr_ns()
guards against pid being null.

Okay, I think I understand now why guarding against the
task_parent_pid_ns() here is consistent with other RCU code.  I'll
send a revised patch that does that.

Also, it looks like "0" is the conventional value for unrepresentable
pid_t values, so I'll change to do that too.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4] ptrace: Fix fork event messages across pid namespaces
  2014-04-01 22:29         ` [PATCH v3] ptrace: Fix fork event messages " Matthew Dempsky
  2014-04-02  0:39           ` Matthew Dempsky
  2014-04-02 14:58           ` Oleg Nesterov
@ 2014-04-03  2:26           ` Matthew Dempsky
  2014-04-03 15:44             ` Oleg Nesterov
  2014-04-29 20:20             ` [RESEND PATCH " Matthew Dempsky
  2 siblings, 2 replies; 23+ messages in thread
From: Matthew Dempsky @ 2014-04-03  2:26 UTC (permalink / raw)
  To: Oleg Nesterov, Andrew Morton
  Cc: Matthew Dempsky, Kees Cook, Julien Tinnes, Roland McGrath,
	Jan Kratochvil, linux-kernel

v4:
	- Refactor out ptrace_event_pid() to dedup FIXME code
	- Handle task_active_pid_ns() returning NULL
	- Use rcu_dereference() for accessing current->parent

v3:
        - Respond to Oleg feedback about p possibly already exiting
          and adding proper locking
        - Add comment warning that race condition still exists
        - Removed selftest to instead be included with other ptrace tests
        - Removed ptrace_message zero'ing; to be handled in followup patch

v2:
        - Moved selftests/ptrace-pidns into selftests/ptrace as pidns-events
          per feedback from Kees.

8>--------------------------------------------------------------------<8

When tracing a process in another pid namespace, it's important for
fork event messages to contain the child's pid as seen from the
tracer's pid namespace, not the parent's.  Otherwise, the tracer won't
be able to correlate the fork event with later SIGTRAP signals it
receives from the child.

We still risk a race condition if a ptracer from a different pid
namespace attaches after we compute the pid_t value.  However, sending
a bogus fork event message in this unlikely scenario is still a vast
improvement over the status quo where we always send bogus fork event
messages to debuggers in a different pid namespace than the forking
process.

Signed-off-by: Matthew Dempsky <mdempsky@chromium.org>
---
 include/linux/ptrace.h | 31 +++++++++++++++++++++++++++++++
 kernel/fork.c          | 10 +++++++---
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 07d0df6..219c52c 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -5,6 +5,7 @@
 #include <linux/sched.h>		/* For struct task_struct.  */
 #include <linux/err.h>			/* for IS_ERR_VALUE */
 #include <linux/bug.h>			/* For BUG_ON.  */
+#include <linux/pid_namespace.h>	/* For task_active_pid_ns.  */
 #include <uapi/linux/ptrace.h>
 
 /*
@@ -129,6 +130,36 @@ static inline void ptrace_event(int event, unsigned long message)
 }
 
 /**
+ * ptrace_event_pid - possibly stop for a ptrace event notification
+ * @event:	%PTRACE_EVENT_* value to report
+ * @pid:	process identifier for %PTRACE_GETEVENTMSG to return
+ *
+ * Check whether @event is enabled and, if so, report @event and @pid
+ * to the ptrace parent.  @pid is reported as the pid_t seen from the
+ * the ptrace parent's pid namespace.
+ *
+ * Called without locks.
+ */
+static inline void ptrace_event_pid(int event, struct pid *pid)
+{
+	/*
+	 * FIXME: There's a potential race if a ptracer in a different pid
+	 * namespace than parent attaches between computing message below and
+	 * and when we acquire tasklist_lock in ptrace_stop().
+	 */
+	unsigned long message = 0;
+	struct pid_namespace *ns;
+
+	rcu_read_lock();
+	ns = task_active_pid_ns(rcu_dereference(current->parent));
+	if (ns)
+		message = pid_nr_ns(pid, ns);
+	rcu_read_unlock();
+
+	ptrace_event(event, message);
+}
+
+/**
  * ptrace_init_task - initialize ptrace state for a new child
  * @child:		new child task
  * @ptrace:		true if child should be ptrace'd by parent's tracer
diff --git a/kernel/fork.c b/kernel/fork.c
index 332688e..881cfe8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1605,10 +1605,12 @@ long do_fork(unsigned long clone_flags,
 	 */
 	if (!IS_ERR(p)) {
 		struct completion vfork;
+		struct pid *pid;
 
 		trace_sched_process_fork(current, p);
 
-		nr = task_pid_vnr(p);
+		pid = get_task_pid(p, PIDTYPE_PID);
+		nr = pid_vnr(pid);
 
 		if (clone_flags & CLONE_PARENT_SETTID)
 			put_user(nr, parent_tidptr);
@@ -1623,12 +1625,14 @@ long do_fork(unsigned long clone_flags,
 
 		/* forking complete and child started to run, tell ptracer */
 		if (unlikely(trace))
-			ptrace_event(trace, nr);
+			ptrace_event_pid(trace, pid);
 
 		if (clone_flags & CLONE_VFORK) {
 			if (!wait_for_vfork_done(p, &vfork))
-				ptrace_event(PTRACE_EVENT_VFORK_DONE, nr);
+				ptrace_event_pid(PTRACE_EVENT_VFORK_DONE, pid);
 		}
+
+		put_pid(pid);
 	} else {
 		nr = PTR_ERR(p);
 	}
-- 
1.9.1.423.g4596e3a

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v4] ptrace: Fix fork event messages across pid namespaces
  2014-04-03  2:26           ` [PATCH v4] " Matthew Dempsky
@ 2014-04-03 15:44             ` Oleg Nesterov
  2014-04-03 16:13               ` Oleg Nesterov
  2014-04-03 18:07               ` Matthew Dempsky
  2014-04-29 20:20             ` [RESEND PATCH " Matthew Dempsky
  1 sibling, 2 replies; 23+ messages in thread
From: Oleg Nesterov @ 2014-04-03 15:44 UTC (permalink / raw)
  To: Matthew Dempsky
  Cc: Andrew Morton, Kees Cook, Julien Tinnes, Roland McGrath,
	Jan Kratochvil, linux-kernel

On 04/02, Matthew Dempsky wrote:
>
> When tracing a process in another pid namespace, it's important for
> fork event messages to contain the child's pid as seen from the
> tracer's pid namespace, not the parent's.  Otherwise, the tracer won't
> be able to correlate the fork event with later SIGTRAP signals it
> receives from the child.
>
> We still risk a race condition if a ptracer from a different pid
> namespace attaches after we compute the pid_t value.  However, sending
> a bogus fork event message in this unlikely scenario is still a vast
> improvement over the status quo where we always send bogus fork event
> messages to debuggers in a different pid namespace than the forking
> process.
>
> Signed-off-by: Matthew Dempsky <mdempsky@chromium.org>

Thanks,

Acked-by: Oleg Nesterov <oleg@redhat.com>



Some notes for potential future changes...

	- I do not not see any potential user of ptrace_event_pid() outside
	  of fork.c, so perhaps this helper should not be exported.

	  In fact I wouldn't mind if you send v5 which moves it into fork.c ;)

	- The usage of "trace" doesn't look very consistent after this patch...
	  OK, probably I'll try to cleanup this later.		

	- OTOH, calculating pid_nr in the namespace of ->parent can probably
	  go into another simple (exported) helper. do_notify_parent_*() and
	  exec_binprm() could use it, even they do not have the problem with
	  task_active_pid_ns(parent) == NULL. Not sure.

	- I am thinking about another approach... Suppose that we change
	  ptrace_attach() to nullify ->ptrace_message, as we already discussed
	  this probably makes sense anyway.

	  Now (ignoring CLONE_VFORK to simplify the discussion), do_fork() can
	  do something like the following:

		...

		if (trace)
			current->ptrace_message = calc_its_pid_in_parent_ns();

		wake_up_new_task(p);
		
		if (trace && current->ptrace_message && ptrace_event_enabled(trace))
			ptrace_notify(...);

	 so that if we race with detach + attach we should likely see
	 ->ptrace_message == 0 and report nothing as if the new debugger
	 attached after do_fork().

	 Not sure this is really possible (without additional complications),
	 and of course we still can't close the race completely.

	 But even if this can work and will not look too ugly, it would be
	 better to do this on top of your simple patch.


Matthew, I see other emails from you, will try to reply tomorrow.

Oleg.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4] ptrace: Fix fork event messages across pid namespaces
  2014-04-03 15:44             ` Oleg Nesterov
@ 2014-04-03 16:13               ` Oleg Nesterov
  2014-04-03 18:07               ` Matthew Dempsky
  1 sibling, 0 replies; 23+ messages in thread
From: Oleg Nesterov @ 2014-04-03 16:13 UTC (permalink / raw)
  To: Matthew Dempsky
  Cc: Andrew Morton, Kees Cook, Julien Tinnes, Roland McGrath,
	Jan Kratochvil, linux-kernel

On 04/03, Oleg Nesterov wrote:
>
> Some notes for potential future changes...
>
> 	- I do not not see any potential user of ptrace_event_pid() outside
> 	  of fork.c, so perhaps this helper should not be exported.
>
> 	  In fact I wouldn't mind if you send v5 which moves it into fork.c ;)

OTOH, exec_binprm() can use it... although in this case it needs get/put_pid
too. So please ignore.

Oleg.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4] ptrace: Fix fork event messages across pid namespaces
  2014-04-03 15:44             ` Oleg Nesterov
  2014-04-03 16:13               ` Oleg Nesterov
@ 2014-04-03 18:07               ` Matthew Dempsky
  2014-04-07 19:06                 ` Oleg Nesterov
  1 sibling, 1 reply; 23+ messages in thread
From: Matthew Dempsky @ 2014-04-03 18:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Kees Cook, Julien Tinnes, Roland McGrath,
	Jan Kratochvil, linux-kernel

On Thu, Apr 3, 2014 at 8:44 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 04/02, Matthew Dempsky wrote:
>>
>> When tracing a process in another pid namespace, it's important for
>> fork event messages to contain the child's pid as seen from the
>> tracer's pid namespace, not the parent's.  Otherwise, the tracer won't
>> be able to correlate the fork event with later SIGTRAP signals it
>> receives from the child.
>>
>> We still risk a race condition if a ptracer from a different pid
>> namespace attaches after we compute the pid_t value.  However, sending
>> a bogus fork event message in this unlikely scenario is still a vast
>> improvement over the status quo where we always send bogus fork event
>> messages to debuggers in a different pid namespace than the forking
>> process.
>>
>> Signed-off-by: Matthew Dempsky <mdempsky@chromium.org>
>
> Thanks,
>
> Acked-by: Oleg Nesterov <oleg@redhat.com>

Thanks!

> Some notes for potential future changes...
>
>         - I do not not see any potential user of ptrace_event_pid() outside
>           of fork.c, so perhaps this helper should not be exported.
>
>           In fact I wouldn't mind if you send v5 which moves it into fork.c ;)

Like you mentioned, it's potentially used by fs/exec.c too, which I
was intending to send a followup patch for.

>         - The usage of "trace" doesn't look very consistent after this patch...
>           OK, probably I'll try to cleanup this later.
>
>         - OTOH, calculating pid_nr in the namespace of ->parent can probably
>           go into another simple (exported) helper. do_notify_parent_*() and
>           exec_binprm() could use it, even they do not have the problem with
>           task_active_pid_ns(parent) == NULL. Not sure.

I think do_notify_parent_*() are safe from task_active_pid_ns(parent)
== NULL because they're under tasklist_lock, but it looks like
exec_binprm() is theoretically racy (but would be fixed by converting
it to use ptrace_event_pid()).

>         - I am thinking about another approach... Suppose that we change
>           ptrace_attach() to nullify ->ptrace_message, as we already discussed
>           this probably makes sense anyway.
>
>           Now (ignoring CLONE_VFORK to simplify the discussion), do_fork() can
>           do something like the following:
>
>                 ...
>
>                 if (trace)
>                         current->ptrace_message = calc_its_pid_in_parent_ns();
>
>                 wake_up_new_task(p);
>
>                 if (trace && current->ptrace_message && ptrace_event_enabled(trace))
>                         ptrace_notify(...);
>
>          so that if we race with detach + attach we should likely see
>          ->ptrace_message == 0 and report nothing as if the new debugger
>          attached after do_fork().
>
>          Not sure this is really possible (without additional complications),
>          and of course we still can't close the race completely.
>
>          But even if this can work and will not look too ugly, it would be
>          better to do this on top of your simple patch.

Hm, that's an interesting idea.  I'll think about it.

> Matthew, I see other emails from you, will try to reply tomorrow.

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4] ptrace: Fix fork event messages across pid namespaces
  2014-04-03 18:07               ` Matthew Dempsky
@ 2014-04-07 19:06                 ` Oleg Nesterov
  0 siblings, 0 replies; 23+ messages in thread
From: Oleg Nesterov @ 2014-04-07 19:06 UTC (permalink / raw)
  To: Matthew Dempsky
  Cc: Andrew Morton, Kees Cook, Julien Tinnes, Roland McGrath,
	Jan Kratochvil, linux-kernel

On 04/03, Matthew Dempsky wrote:
>
> On Thu, Apr 3, 2014 at 8:44 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> > Some notes for potential future changes...
> >
> >         - I do not not see any potential user of ptrace_event_pid() outside
> >           of fork.c, so perhaps this helper should not be exported.
> >
> >           In fact I wouldn't mind if you send v5 which moves it into fork.c ;)
>
> Like you mentioned, it's potentially used by fs/exec.c too, which I
> was intending to send a followup patch for.

OK, agreed. Probably we can tolerate the extra get/put_pid() but make
this code look better.

> >         - OTOH, calculating pid_nr in the namespace of ->parent can probably
> >           go into another simple (exported) helper. do_notify_parent_*() and
> >           exec_binprm() could use it, even they do not have the problem with
> >           task_active_pid_ns(parent) == NULL. Not sure.
>
> I think do_notify_parent_*() are safe from task_active_pid_ns(parent)
> == NULL because they're under tasklist_lock,

Yes, they are fine correctness-wise, just this task_pid_nr_ns(...) doesn't
look readable. OK, please forget, from_kuid_munged() doesn't look better.

> but it looks like
> exec_binprm() is theoretically racy

No (if you meant task_pid_nr_ns() == NULL). Note that __task_pid_nr_ns()
checks ns != NULL. In fact this is bad, this just reminds that we have
too many helpers with the subtle differences ;)

But if you meant that it can report the wrong pid then yes, of course,
it can race with detach/attach too.

Oleg.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3] ptrace: Fix fork event messages across pid namespaces
  2014-04-02 22:37               ` Matthew Dempsky
@ 2014-04-07 19:24                 ` Oleg Nesterov
  0 siblings, 0 replies; 23+ messages in thread
From: Oleg Nesterov @ 2014-04-07 19:24 UTC (permalink / raw)
  To: Matthew Dempsky
  Cc: Kees Cook, Julien Tinnes, Roland McGrath, Jan Kratochvil, linux-kernel

On 04/02, Matthew Dempsky wrote:
>
> On Wed, Apr 2, 2014 at 2:58 PM, Matthew Dempsky <mdempsky@google.com> wrote:
> > Otherwise, it looks like (e.g.) the attempts to get the parent's pid
> > in fill_prstatus() and tomoyo_sys_getppid() are also theoretical races
> > of the same kind?
>
> Doh, nevermind, those functions would potentially be use null "struct
> pid" pointers, not "struct pid_namespace" pointers; and pid_nr_ns()
> guards against pid being null.

Yes, and they want the parent's pid, not ours pid in parent's namespace.

Hmm. In fact I am replying only because I am not sure fill_psinfo() is
correct, perhaps it confuses tid with tgid. I dunno.

Oleg.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RESEND PATCH v4] ptrace: Fix fork event messages across pid namespaces
  2014-04-03  2:26           ` [PATCH v4] " Matthew Dempsky
  2014-04-03 15:44             ` Oleg Nesterov
@ 2014-04-29 20:20             ` Matthew Dempsky
  2014-04-29 22:11               ` Andrew Morton
  2014-04-30  0:34               ` [PATCH v5] " Matthew Dempsky
  1 sibling, 2 replies; 23+ messages in thread
From: Matthew Dempsky @ 2014-04-29 20:20 UTC (permalink / raw)
  To: Oleg Nesterov, Andrew Morton
  Cc: Matthew Dempsky, Kees Cook, Julien Tinnes, Roland McGrath,
	Jan Kratochvil, linux-kernel

v4 resend:
	- Ran "git pull --rebase"; no code changes needed.

v4:
	- Refactor out ptrace_event_pid() to dedup FIXME code
	- Handle task_active_pid_ns() returning NULL
	- Use rcu_dereference() for accessing current->parent

v3:
        - Respond to Oleg feedback about p possibly already exiting
          and adding proper locking
        - Add comment warning that race condition still exists
        - Removed selftest to instead be included with other ptrace tests
        - Removed ptrace_message zero'ing; to be handled in followup patch

v2:
        - Moved selftests/ptrace-pidns into selftests/ptrace as pidns-events
          per feedback from Kees.

8>--------------------------------------------------------------------<8

When tracing a process in another pid namespace, it's important for
fork event messages to contain the child's pid as seen from the
tracer's pid namespace, not the parent's.  Otherwise, the tracer won't
be able to correlate the fork event with later SIGTRAP signals it
receives from the child.

We still risk a race condition if a ptracer from a different pid
namespace attaches after we compute the pid_t value.  However, sending
a bogus fork event message in this unlikely scenario is still a vast
improvement over the status quo where we always send bogus fork event
messages to debuggers in a different pid namespace than the forking
process.

Signed-off-by: Matthew Dempsky <mdempsky@chromium.org>
---
 include/linux/ptrace.h | 31 +++++++++++++++++++++++++++++++
 kernel/fork.c          | 10 +++++++---
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 07d0df6..219c52c 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -5,6 +5,7 @@
 #include <linux/sched.h>		/* For struct task_struct.  */
 #include <linux/err.h>			/* for IS_ERR_VALUE */
 #include <linux/bug.h>			/* For BUG_ON.  */
+#include <linux/pid_namespace.h>	/* For task_active_pid_ns.  */
 #include <uapi/linux/ptrace.h>
 
 /*
@@ -129,6 +130,36 @@ static inline void ptrace_event(int event, unsigned long message)
 }
 
 /**
+ * ptrace_event_pid - possibly stop for a ptrace event notification
+ * @event:	%PTRACE_EVENT_* value to report
+ * @pid:	process identifier for %PTRACE_GETEVENTMSG to return
+ *
+ * Check whether @event is enabled and, if so, report @event and @pid
+ * to the ptrace parent.  @pid is reported as the pid_t seen from the
+ * the ptrace parent's pid namespace.
+ *
+ * Called without locks.
+ */
+static inline void ptrace_event_pid(int event, struct pid *pid)
+{
+	/*
+	 * FIXME: There's a potential race if a ptracer in a different pid
+	 * namespace than parent attaches between computing message below and
+	 * and when we acquire tasklist_lock in ptrace_stop().
+	 */
+	unsigned long message = 0;
+	struct pid_namespace *ns;
+
+	rcu_read_lock();
+	ns = task_active_pid_ns(rcu_dereference(current->parent));
+	if (ns)
+		message = pid_nr_ns(pid, ns);
+	rcu_read_unlock();
+
+	ptrace_event(event, message);
+}
+
+/**
  * ptrace_init_task - initialize ptrace state for a new child
  * @child:		new child task
  * @ptrace:		true if child should be ptrace'd by parent's tracer
diff --git a/kernel/fork.c b/kernel/fork.c
index 54a8d26..1429043 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1606,10 +1606,12 @@ long do_fork(unsigned long clone_flags,
 	 */
 	if (!IS_ERR(p)) {
 		struct completion vfork;
+		struct pid *pid;
 
 		trace_sched_process_fork(current, p);
 
-		nr = task_pid_vnr(p);
+		pid = get_task_pid(p, PIDTYPE_PID);
+		nr = pid_vnr(pid);
 
 		if (clone_flags & CLONE_PARENT_SETTID)
 			put_user(nr, parent_tidptr);
@@ -1624,12 +1626,14 @@ long do_fork(unsigned long clone_flags,
 
 		/* forking complete and child started to run, tell ptracer */
 		if (unlikely(trace))
-			ptrace_event(trace, nr);
+			ptrace_event_pid(trace, pid);
 
 		if (clone_flags & CLONE_VFORK) {
 			if (!wait_for_vfork_done(p, &vfork))
-				ptrace_event(PTRACE_EVENT_VFORK_DONE, nr);
+				ptrace_event_pid(PTRACE_EVENT_VFORK_DONE, pid);
 		}
+
+		put_pid(pid);
 	} else {
 		nr = PTR_ERR(p);
 	}
-- 
1.9.1.423.g4596e3a

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RESEND PATCH v4] ptrace: Fix fork event messages across pid namespaces
  2014-04-29 20:20             ` [RESEND PATCH " Matthew Dempsky
@ 2014-04-29 22:11               ` Andrew Morton
  2014-04-30  0:34               ` [PATCH v5] " Matthew Dempsky
  1 sibling, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2014-04-29 22:11 UTC (permalink / raw)
  To: Matthew Dempsky
  Cc: Oleg Nesterov, Kees Cook, Julien Tinnes, Roland McGrath,
	Jan Kratochvil, linux-kernel

On Tue, 29 Apr 2014 13:20:58 -0700 Matthew Dempsky <mdempsky@chromium.org> wrote:

> When tracing a process in another pid namespace, it's important for
> fork event messages to contain the child's pid as seen from the
> tracer's pid namespace, not the parent's.  Otherwise, the tracer won't
> be able to correlate the fork event with later SIGTRAP signals it
> receives from the child.
> 
> We still risk a race condition if a ptracer from a different pid
> namespace attaches after we compute the pid_t value.  However, sending
> a bogus fork event message in this unlikely scenario is still a vast
> improvement over the status quo where we always send bogus fork event
> messages to debuggers in a different pid namespace than the forking
> process.

More Oleg review would be nice, please ;)

> @@ -129,6 +130,36 @@ static inline void ptrace_event(int event, unsigned long message)
>  }
>  
>  /**
> + * ptrace_event_pid - possibly stop for a ptrace event notification
> + * @event:	%PTRACE_EVENT_* value to report
> + * @pid:	process identifier for %PTRACE_GETEVENTMSG to return
> + *
> + * Check whether @event is enabled and, if so, report @event and @pid
> + * to the ptrace parent.  @pid is reported as the pid_t seen from the
> + * the ptrace parent's pid namespace.
> + *
> + * Called without locks.
> + */
> +static inline void ptrace_event_pid(int event, struct pid *pid)
> +{
> +	/*
> +	 * FIXME: There's a potential race if a ptracer in a different pid
> +	 * namespace than parent attaches between computing message below and
> +	 * and when we acquire tasklist_lock in ptrace_stop().
> +	 */

Well that's a scary comment.  If we're going to leave the code in this
state then please carefully describe (within this comment) the
*consequences* of the race.  Does the kernel crash?  Give away your ssh
keys?  If not then what.  

And how would userspace recognize and/or recover from the race?

> +	unsigned long message = 0;
> +	struct pid_namespace *ns;
> +
> +	rcu_read_lock();
> +	ns = task_active_pid_ns(rcu_dereference(current->parent));
> +	if (ns)
> +		message = pid_nr_ns(pid, ns);
> +	rcu_read_unlock();
> +
> +	ptrace_event(event, message);
> +}
>
> ...
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v5] ptrace: Fix fork event messages across pid namespaces
  2014-04-29 20:20             ` [RESEND PATCH " Matthew Dempsky
  2014-04-29 22:11               ` Andrew Morton
@ 2014-04-30  0:34               ` Matthew Dempsky
  2014-04-30 11:51                 ` Oleg Nesterov
  2014-04-30 20:16                 ` Andrew Morton
  1 sibling, 2 replies; 23+ messages in thread
From: Matthew Dempsky @ 2014-04-30  0:34 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov
  Cc: Matthew Dempsky, Kees Cook, Julien Tinnes, Roland McGrath,
	Jan Kratochvil, linux-kernel

On Tue, Apr 29, 2014 at 3:11 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> More Oleg review would be nice, please ;)

FWIW, Oleg "acked" v4 earlier in the thread.  Are you asking for
further review from him beyond that?

> Well that's a scary comment.  If we're going to leave the code in this
> state then please carefully describe (within this comment) the
> *consequences* of the race.  Does the kernel crash?  Give away your ssh
> keys?  If not then what.

Sorry, I can see how that comment could be scary without proper
context.  I added another sentence explaining the consequences are
limited to the ptracer receiving a bogus pid_t value from
PTRACE_GETEVENTMSG.

> And how would userspace recognize and/or recover from the race?

If the ptracer attaches via PTRACE_ATTACH, then there shouldn't be a
race: the ptracer can't use PTRACE_SETOPTIONS to request fork events
until after the child has already stopped.  So any SIGTRAP fork events
that it receives before using PTRACE_SETOPTIONS it should disregard,
because it hasn't asked the kernel to send them yet.

If the ptracer attaches via PTRACE_SEIZE and also requests fork events
at the same time, then it would need to discard the first SIGTRAP it
receives for the child if:

  1. it's a fork event;
  2. the ptracer can't otherwise prove the fork happened after the
     PTRACE_SEIZE rather than concurrently; and
  3. the ptracer is concerned a ptracer from a different pid namespace
     may have just detached.

--

v5:
	- Clarify race condition comment to be less scary.

v4 resend:
	- Ran "git pull --rebase"; no code changes needed.

v4:
	- Refactor out ptrace_event_pid() to dedup FIXME code
	- Handle task_active_pid_ns() returning NULL
	- Use rcu_dereference() for accessing current->parent

v3:
        - Respond to Oleg feedback about p possibly already exiting
          and adding proper locking
        - Add comment warning that race condition still exists
        - Removed selftest to instead be included with other ptrace tests
        - Removed ptrace_message zero'ing; to be handled in followup patch

v2:
        - Moved selftests/ptrace-pidns into selftests/ptrace as pidns-events
          per feedback from Kees.

8>--------------------------------------------------------------------<8

When tracing a process in another pid namespace, it's important for
fork event messages to contain the child's pid as seen from the
tracer's pid namespace, not the parent's.  Otherwise, the tracer won't
be able to correlate the fork event with later SIGTRAP signals it
receives from the child.

We still risk a race condition if a ptracer from a different pid
namespace attaches after we compute the pid_t value.  However, sending
a bogus fork event message in this unlikely scenario is still a vast
improvement over the status quo where we always send bogus fork event
messages to debuggers in a different pid namespace than the forking
process.

Signed-off-by: Matthew Dempsky <mdempsky@chromium.org>
---
 include/linux/ptrace.h | 32 ++++++++++++++++++++++++++++++++
 kernel/fork.c          | 10 +++++++---
 2 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 07d0df6..077904c 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -5,6 +5,7 @@
 #include <linux/sched.h>		/* For struct task_struct.  */
 #include <linux/err.h>			/* for IS_ERR_VALUE */
 #include <linux/bug.h>			/* For BUG_ON.  */
+#include <linux/pid_namespace.h>	/* For task_active_pid_ns.  */
 #include <uapi/linux/ptrace.h>
 
 /*
@@ -129,6 +130,37 @@ static inline void ptrace_event(int event, unsigned long message)
 }
 
 /**
+ * ptrace_event_pid - possibly stop for a ptrace event notification
+ * @event:	%PTRACE_EVENT_* value to report
+ * @pid:	process identifier for %PTRACE_GETEVENTMSG to return
+ *
+ * Check whether @event is enabled and, if so, report @event and @pid
+ * to the ptrace parent.  @pid is reported as the pid_t seen from the
+ * the ptrace parent's pid namespace.
+ *
+ * Called without locks.
+ */
+static inline void ptrace_event_pid(int event, struct pid *pid)
+{
+	/*
+	 * FIXME: There's a potential race if a ptracer in a different pid
+	 * namespace than parent attaches between computing message below and
+	 * when we acquire tasklist_lock in ptrace_stop().  If this happens,
+	 * the ptracer will get a bogus pid from PTRACE_GETEVENTMSG.
+	 */
+	unsigned long message = 0;
+	struct pid_namespace *ns;
+
+	rcu_read_lock();
+	ns = task_active_pid_ns(rcu_dereference(current->parent));
+	if (ns)
+		message = pid_nr_ns(pid, ns);
+	rcu_read_unlock();
+
+	ptrace_event(event, message);
+}
+
+/**
  * ptrace_init_task - initialize ptrace state for a new child
  * @child:		new child task
  * @ptrace:		true if child should be ptrace'd by parent's tracer
diff --git a/kernel/fork.c b/kernel/fork.c
index 54a8d26..1429043 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1606,10 +1606,12 @@ long do_fork(unsigned long clone_flags,
 	 */
 	if (!IS_ERR(p)) {
 		struct completion vfork;
+		struct pid *pid;
 
 		trace_sched_process_fork(current, p);
 
-		nr = task_pid_vnr(p);
+		pid = get_task_pid(p, PIDTYPE_PID);
+		nr = pid_vnr(pid);
 
 		if (clone_flags & CLONE_PARENT_SETTID)
 			put_user(nr, parent_tidptr);
@@ -1624,12 +1626,14 @@ long do_fork(unsigned long clone_flags,
 
 		/* forking complete and child started to run, tell ptracer */
 		if (unlikely(trace))
-			ptrace_event(trace, nr);
+			ptrace_event_pid(trace, pid);
 
 		if (clone_flags & CLONE_VFORK) {
 			if (!wait_for_vfork_done(p, &vfork))
-				ptrace_event(PTRACE_EVENT_VFORK_DONE, nr);
+				ptrace_event_pid(PTRACE_EVENT_VFORK_DONE, pid);
 		}
+
+		put_pid(pid);
 	} else {
 		nr = PTR_ERR(p);
 	}
-- 
1.9.1.423.g4596e3a

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v5] ptrace: Fix fork event messages across pid namespaces
  2014-04-30  0:34               ` [PATCH v5] " Matthew Dempsky
@ 2014-04-30 11:51                 ` Oleg Nesterov
  2014-04-30 20:16                 ` Andrew Morton
  1 sibling, 0 replies; 23+ messages in thread
From: Oleg Nesterov @ 2014-04-30 11:51 UTC (permalink / raw)
  To: Matthew Dempsky
  Cc: Andrew Morton, Kees Cook, Julien Tinnes, Roland McGrath,
	Jan Kratochvil, linux-kernel

On 04/29, Matthew Dempsky wrote:
>
> On Tue, Apr 29, 2014 at 3:11 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> > More Oleg review would be nice, please ;)
>
> FWIW, Oleg "acked" v4 earlier in the thread.  Are you asking for
> further review from him beyond that?

Yes, still/again

Acked-by: Oleg Nesterov <oleg@redhat.com>



> > Well that's a scary comment.  If we're going to leave the code in this
> > state then please carefully describe (within this comment) the
> > *consequences* of the race.  Does the kernel crash?  Give away your ssh
> > keys?  If not then what.
>
> Sorry, I can see how that comment could be scary without proper
> context.  I added another sentence explaining the consequences are
> limited to the ptracer receiving a bogus pid_t value from
> PTRACE_GETEVENTMSG.
>
> > And how would userspace recognize and/or recover from the race?
>
> If the ptracer attaches via PTRACE_ATTACH, then there shouldn't be a
> race: the ptracer can't use PTRACE_SETOPTIONS to request fork events
> until after the child has already stopped.  So any SIGTRAP fork events
> that it receives before using PTRACE_SETOPTIONS it should disregard,
> because it hasn't asked the kernel to send them yet.
>
> If the ptracer attaches via PTRACE_SEIZE and also requests fork events
> at the same time, then it would need to discard the first SIGTRAP it
> receives for the child if:
>
>   1. it's a fork event;
>   2. the ptracer can't otherwise prove the fork happened after the
>      PTRACE_SEIZE rather than concurrently; and
>   3. the ptracer is concerned a ptracer from a different pid namespace
>      may have just detached.

And I think we should just ignore this very unlikely and harmless race.

We do not see a simple way to close it and in fact this ptrace_event() is
inherently racy anyway. Even without namespaces, if we race with DETACH +
ATTACH, the new tracer gets the correct child's pid, but the child can be
already untraced.

_Perhaps_ we can do something better later (to remind, we can setup
->ptrace_message beforehand and change ATTACH to clear it), but this is
more subtle and needs more changes.

This patch is straightforward, and it fixes the old/known problem: currently
this pid_t is always wrong unless the tracer is from the root namespace.

Oleg.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v5] ptrace: Fix fork event messages across pid namespaces
  2014-04-30  0:34               ` [PATCH v5] " Matthew Dempsky
  2014-04-30 11:51                 ` Oleg Nesterov
@ 2014-04-30 20:16                 ` Andrew Morton
  1 sibling, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2014-04-30 20:16 UTC (permalink / raw)
  To: Matthew Dempsky
  Cc: Oleg Nesterov, Kees Cook, Julien Tinnes, Roland McGrath,
	Jan Kratochvil, linux-kernel

On Tue, 29 Apr 2014 17:34:37 -0700 Matthew Dempsky <mdempsky@chromium.org> wrote:

> On Tue, Apr 29, 2014 at 3:11 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> > More Oleg review would be nice, please ;)
> 
> FWIW, Oleg "acked" v4 earlier in the thread.  Are you asking for
> further review from him beyond that?

It's conventional to retain acked-by's and reviewed-by's in the
changelog.  Sometimes they can be invalidated by subsequent changes -
that's a judgement call one has to make.



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-04-30 20:16 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1396288478-1314-1-git-send-email-mdempsky@chromium.org>
     [not found] ` <20140331181651.GA27686@redhat.com>
     [not found]   ` <CAF52+S5i7oqJnJ1NN0bk5Vg=CiYrussw0AunteE72kMMcWkeJA@mail.gmail.com>
2014-04-01 18:52     ` [PATCH v2] Fix ptrace events across pid namespaces Oleg Nesterov
2014-04-01 20:44       ` Matthew Dempsky
2014-04-01 22:29         ` [PATCH v3] ptrace: Fix fork event messages " Matthew Dempsky
2014-04-02  0:39           ` Matthew Dempsky
2014-04-02 14:58           ` Oleg Nesterov
2014-04-02 15:44             ` [PATCH 0/1] pid_namespace: pidns_get() should check task_active_pid_ns() != NULL Oleg Nesterov
2014-04-02 15:45               ` [PATCH 1/1] " Oleg Nesterov
2014-04-02 16:53                 ` Eric W. Biederman
2014-04-02 15:58                   ` Oleg Nesterov
2014-04-02 22:01                     ` Eric W. Biederman
2014-04-02 21:58             ` [PATCH v3] ptrace: Fix fork event messages across pid namespaces Matthew Dempsky
2014-04-02 22:37               ` Matthew Dempsky
2014-04-07 19:24                 ` Oleg Nesterov
2014-04-03  2:26           ` [PATCH v4] " Matthew Dempsky
2014-04-03 15:44             ` Oleg Nesterov
2014-04-03 16:13               ` Oleg Nesterov
2014-04-03 18:07               ` Matthew Dempsky
2014-04-07 19:06                 ` Oleg Nesterov
2014-04-29 20:20             ` [RESEND PATCH " Matthew Dempsky
2014-04-29 22:11               ` Andrew Morton
2014-04-30  0:34               ` [PATCH v5] " Matthew Dempsky
2014-04-30 11:51                 ` Oleg Nesterov
2014-04-30 20:16                 ` Andrew Morton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.