linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* CLONE_PARENT after setns(CLONE_NEWPID)
@ 2013-11-06 18:02 Serge Hallyn
  2013-11-06 19:33 ` Oleg Nesterov
  0 siblings, 1 reply; 15+ messages in thread
From: Serge Hallyn @ 2013-11-06 18:02 UTC (permalink / raw)
  To: Oleg Nesterov, Christian Seiler
  Cc: lkml, Andy Whitcroft, Eric W. Biederman, Lxc development list

Hi Oleg,

commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
"fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
breaks lxc-attach in 3.12.  That code forks a child which does
setns() and then does a clone(CLONE_PARENT).  That way the
grandchild can be in the right namespaces (which the child was
not) and be a child of the original task, which is the monitor.

lxc-attach in 3.11 was working fine with no side effects that I
could see.  Is there a real danger in allowing CLONE_PARENT
when current->nsproxy->pidns_for_children is not our pidns,
or was this done out of an "over-abundance of caution"?  Can we
safely revert that new extra check?

thanks,
-serge

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: CLONE_PARENT after setns(CLONE_NEWPID)
  2013-11-06 18:02 CLONE_PARENT after setns(CLONE_NEWPID) Serge Hallyn
@ 2013-11-06 19:33 ` Oleg Nesterov
  2013-11-06 19:50   ` Andy Lutomirski
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Oleg Nesterov @ 2013-11-06 19:33 UTC (permalink / raw)
  To: Serge Hallyn, Andy Lutomirski, Brad Spengler
  Cc: Christian Seiler, lkml, Andy Whitcroft, Eric W. Biederman,
	Lxc development list

Hi Serge,

On 11/06, Serge Hallyn wrote:
>
> Hi Oleg,
>
> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
> breaks lxc-attach in 3.12.  That code forks a child which does
> setns() and then does a clone(CLONE_PARENT).  That way the
> grandchild can be in the right namespaces (which the child was
> not) and be a child of the original task, which is the monitor.

Thanks...

Yes, this is what 40a0d32d1ea explicitly tries to disallow.

> Is there a real danger in allowing CLONE_PARENT
> when current->nsproxy->pidns_for_children is not our pidns,
> or was this done out of an "over-abundance of caution"?

I am not sure... This all was based on the long discussion, and
it was decided that the CLONE_PARENT check should be consistent
wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns().

> Can we
> safely revert that new extra check?

Well, usually we do not break user-space, but I am not sure about
this case...

Eric, Andy, what do you think?

And if we allow CLONE_PARENT when ->pidns_for_children was changed,
should we also allow, say, CLONE_NEWPID && CLONE_PARENT ?

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: CLONE_PARENT after setns(CLONE_NEWPID)
  2013-11-06 19:33 ` Oleg Nesterov
@ 2013-11-06 19:50   ` Andy Lutomirski
  2013-11-06 20:06     ` Oleg Nesterov
  2013-11-06 22:50   ` Eric W. Biederman
  2013-11-06 22:53   ` Serge Hallyn
  2 siblings, 1 reply; 15+ messages in thread
From: Andy Lutomirski @ 2013-11-06 19:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Serge Hallyn, Brad Spengler, Christian Seiler, lkml,
	Andy Whitcroft, Eric W. Biederman, Lxc development list

On Wed, Nov 6, 2013 at 11:33 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> Hi Serge,
>
> On 11/06, Serge Hallyn wrote:
>>
>> Hi Oleg,
>>
>> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
>> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
>> breaks lxc-attach in 3.12.  That code forks a child which does
>> setns() and then does a clone(CLONE_PARENT).  That way the
>> grandchild can be in the right namespaces (which the child was
>> not) and be a child of the original task, which is the monitor.
>
> Thanks...
>
> Yes, this is what 40a0d32d1ea explicitly tries to disallow.
>
>> Is there a real danger in allowing CLONE_PARENT
>> when current->nsproxy->pidns_for_children is not our pidns,
>> or was this done out of an "over-abundance of caution"?
>
> I am not sure... This all was based on the long discussion, and
> it was decided that the CLONE_PARENT check should be consistent
> wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns().
>
>> Can we
>> safely revert that new extra check?
>
> Well, usually we do not break user-space, but I am not sure about
> this case...

Presumably if we allow this, then we should also allow
clone(CLONE_NEWPID | CLONE_PARENT).  This seems a little odd, but off
the top of my head it doesn't seem obviously dangerous.

(Why were we worried about this in the first place?  The comment says
that we don't want signal handlers or thread groups to span
namespaces, but CLONE_PARENT has nothing to do with that.)

I feel like I'm rehashing something ancient, but shouldn't that code just be:

if (clone_flags & CLONE_VM) {
  // check for unsharing namespaces

with an update to the comment that CLONE_THREAD and CLONE_SIGHAND both
require CLONE_VM.

--Andy

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: CLONE_PARENT after setns(CLONE_NEWPID)
  2013-11-06 19:50   ` Andy Lutomirski
@ 2013-11-06 20:06     ` Oleg Nesterov
  2013-11-06 20:21       ` Andy Lutomirski
  0 siblings, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2013-11-06 20:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Serge Hallyn, Brad Spengler, Christian Seiler, lkml,
	Andy Whitcroft, Eric W. Biederman, Lxc development list

On 11/06, Andy Lutomirski wrote:
>
> On Wed, Nov 6, 2013 at 11:33 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > Hi Serge,
> >
> > On 11/06, Serge Hallyn wrote:
> >>
> >> Hi Oleg,
> >>
> >> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
> >> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
> >> breaks lxc-attach in 3.12.  That code forks a child which does
> >> setns() and then does a clone(CLONE_PARENT).  That way the
> >> grandchild can be in the right namespaces (which the child was
> >> not) and be a child of the original task, which is the monitor.
> >
> > Thanks...
> >
> > Yes, this is what 40a0d32d1ea explicitly tries to disallow.
> >
> >> Is there a real danger in allowing CLONE_PARENT
> >> when current->nsproxy->pidns_for_children is not our pidns,
> >> or was this done out of an "over-abundance of caution"?
> >
> > I am not sure... This all was based on the long discussion, and
> > it was decided that the CLONE_PARENT check should be consistent
> > wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns().
> >
> >> Can we
> >> safely revert that new extra check?
> >
> > Well, usually we do not break user-space, but I am not sure about
> > this case...
>
> Presumably if we allow this, then we should also allow
> clone(CLONE_NEWPID | CLONE_PARENT).

Yes, agreed. but this means another change, this was forbidden even
before this commit.

> This seems a little odd, but off
> the top of my head it doesn't seem obviously dangerous.

I do not see any "strong" reason too. At least right now... But I would
say that it would be better to not allow to abuse ->real_parent, it
doesn't event know about the new child (if CLONE_PARENT).

> (Why were we worried about this in the first place?  The comment says
> that we don't want signal handlers or thread groups to span
> namespaces, but CLONE_PARENT has nothing to do with that.)

it also says "or parent" ;)

> I feel like I'm rehashing something ancient, but shouldn't that code just be:
>
> if (clone_flags & CLONE_VM) {
>   // check for unsharing namespaces

No, this will break vfork().

And note that CLONE_SIGHAND was disallowed "just in case" and because
do_fork() had a similar check. Sharing the signal handlers is fine afaics.

>From e79f525e:

	We could probably even drop CLONE_SIGHAND and use CLONE_THREAD, but it
	would be safer to not do this.  The current check denies CLONE_SIGHAND
	implicitely and there is no reason to change this.

And I disagree with

	Eric said "CLONE_SIGHAND is fine.  CLONE_THREAD would be even better.
	Having shared signal handling between two different pid namespaces is
	the case that we are fundamentally guarding against."

added during the merging ;) Or perhaps I misunderstood the text above. But this
all is off-topic.

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: CLONE_PARENT after setns(CLONE_NEWPID)
  2013-11-06 20:06     ` Oleg Nesterov
@ 2013-11-06 20:21       ` Andy Lutomirski
  0 siblings, 0 replies; 15+ messages in thread
From: Andy Lutomirski @ 2013-11-06 20:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Serge Hallyn, Brad Spengler, Christian Seiler, lkml,
	Andy Whitcroft, Eric W. Biederman, Lxc development list

On Wed, Nov 6, 2013 at 12:06 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 11/06, Andy Lutomirski wrote:
>>
>> On Wed, Nov 6, 2013 at 11:33 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> > Hi Serge,
>> >
>> > On 11/06, Serge Hallyn wrote:
>> >>
>> >> Hi Oleg,
>> >>
>> >> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
>> >> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
>> >> breaks lxc-attach in 3.12.  That code forks a child which does
>> >> setns() and then does a clone(CLONE_PARENT).  That way the
>> >> grandchild can be in the right namespaces (which the child was
>> >> not) and be a child of the original task, which is the monitor.
>> >
>> > Thanks...
>> >
>> > Yes, this is what 40a0d32d1ea explicitly tries to disallow.
>> >
>> >> Is there a real danger in allowing CLONE_PARENT
>> >> when current->nsproxy->pidns_for_children is not our pidns,
>> >> or was this done out of an "over-abundance of caution"?
>> >
>> > I am not sure... This all was based on the long discussion, and
>> > it was decided that the CLONE_PARENT check should be consistent
>> > wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns().
>> >
>> >> Can we
>> >> safely revert that new extra check?
>> >
>> > Well, usually we do not break user-space, but I am not sure about
>> > this case...
>>
>> Presumably if we allow this, then we should also allow
>> clone(CLONE_NEWPID | CLONE_PARENT).
>
> Yes, agreed. but this means another change, this was forbidden even
> before this commit.

I'm really not a fan of allowing things by one path but disallowing
them by another.  That way lurk hidden bugs and
incomprehensibilities...

>
>> This seems a little odd, but off
>> the top of my head it doesn't seem obviously dangerous.
>
> I do not see any "strong" reason too. At least right now... But I would
> say that it would be better to not allow to abuse ->real_parent, it
> doesn't event know about the new child (if CLONE_PARENT).

I'm not sure what namespaces have to do with this, though.  The
grandparent may be surprised to acquire a new child, but I don't see
why it would care about the security context of that child.

I admit that this stuff is complicated, but I don't see the problem
that any of this is trying to prevent.

>
>> (Why were we worried about this in the first place?  The comment says
>> that we don't want signal handlers or thread groups to span
>> namespaces, but CLONE_PARENT has nothing to do with that.)
>
> it also says "or parent" ;)
>
>> I feel like I'm rehashing something ancient, but shouldn't that code just be:
>>
>> if (clone_flags & CLONE_VM) {
>>   // check for unsharing namespaces
>
> No, this will break vfork().
>
> And note that CLONE_SIGHAND was disallowed "just in case" and because
> do_fork() had a similar check. Sharing the signal handlers is fine afaics.
>
> From e79f525e:
>
>         We could probably even drop CLONE_SIGHAND and use CLONE_THREAD, but it
>         would be safer to not do this.  The current check denies CLONE_SIGHAND
>         implicitely and there is no reason to change this.
>
> And I disagree with
>
>         Eric said "CLONE_SIGHAND is fine.  CLONE_THREAD would be even better.
>         Having shared signal handling between two different pid namespaces is
>         the case that we are fundamentally guarding against."
>
> added during the merging ;) Or perhaps I misunderstood the text above. But this
> all is off-topic.
>
> Oleg.
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: CLONE_PARENT after setns(CLONE_NEWPID)
  2013-11-06 19:33 ` Oleg Nesterov
  2013-11-06 19:50   ` Andy Lutomirski
@ 2013-11-06 22:50   ` Eric W. Biederman
  2013-11-06 22:56     ` Andy Lutomirski
                       ` (4 more replies)
  2013-11-06 22:53   ` Serge Hallyn
  2 siblings, 5 replies; 15+ messages in thread
From: Eric W. Biederman @ 2013-11-06 22:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Serge Hallyn, Andy Lutomirski, Brad Spengler, Christian Seiler,
	lkml, Andy Whitcroft, Lxc development list

Oleg Nesterov <oleg@redhat.com> writes:

> Hi Serge,
>
> On 11/06, Serge Hallyn wrote:
>>
>> Hi Oleg,
>>
>> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
>> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
>> breaks lxc-attach in 3.12.  That code forks a child which does
>> setns() and then does a clone(CLONE_PARENT).  That way the
>> grandchild can be in the right namespaces (which the child was
>> not) and be a child of the original task, which is the monitor.

Serge that is a clever trick to get around the limitation that we can
not change the pid namespace of our current process.  Given the
challenging relaying of signals etc I can see why you would use this.

At the same time it makes me a little sad to see new users of
CLONE_PARENT.  With CLONE_THREAD in existence the original reasons for
CLONE_PARENT are gone now.

Having used bash as an init process I know it can handle unexpeted
children.  However using CLONE_PARENT in this way still seems a little
dodgy.  Or am I misunderstanding why you are using CLONE_PARENT?

That trick sounds like it might be worth adding to nsenter in util-linux
just to simplify the code.

> Thanks...
>
> Yes, this is what 40a0d32d1ea explicitly tries to disallow.
>
>> Is there a real danger in allowing CLONE_PARENT
>> when current->nsproxy->pidns_for_children is not our pidns,
>> or was this done out of an "over-abundance of caution"?
>
> I am not sure... This all was based on the long discussion, and
> it was decided that the CLONE_PARENT check should be consistent
> wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns().
>
>> Can we
>> safely revert that new extra check?
>
> Well, usually we do not break user-space, but I am not sure about
> this case...
>
> Eric, Andy, what do you think?
>
> And if we allow CLONE_PARENT when ->pidns_for_children was changed,
> should we also allow, say, CLONE_NEWPID && CLONE_PARENT ?

The two fundamental things I know we can not allow are:
- A shared signal queue aka CLONE_THREAD.  Because we compute the pid
  and uid of the signal when we place it in the queue.

- Changing the pid and by extention pid_namespace of an existing
  process.

>From a parents perspective there is nothing special about the pid
namespace, to deny CLONE_PARENT, because the parent simply won't know or
care.

>From the childs perspective all that is special really are shared signal
queues.

User mode threading with CLONE_PARENT|CLONE_VM|CLONE_SIGHAND and tasks
in different pid namespaces is almost certainly going to break because
it is complicated.  But shared signal handlers can look at per thread
information to know which pid namespace a process is in, so I don't know
of any reason not to support CLONE_PARENT|CLONE_VM|CLONE_SIGHAND threads
at the kernel level.  It would be absolutely stupid to implement but
that is a different thing.

So hmm.

Because it can do no harm, and because it is a regression let's remove
the CLONE_PARENT check and send it stable.

diff --git a/kernel/fork.c b/kernel/fork.c
index 086fe73..c447fbc 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1174,7 +1174,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
         * do not allow it to share a thread group or signal handlers or
         * parent with the forking task.
         */
-       if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) {
+       if (clone_flags & (CLONE_SIGHAND)) {
                if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
                    (task_active_pid_ns(current) !=
                                current->nsproxy->pid_ns_for_children))


I don't know if there are shells that CLONE_PARENT can confuse but if
there are lxcattach and nsenter using this functionality should be
enough to slowly get that confusion fixed.

Changing the CLONE_SIGHAND into CLONE_THREAD will need to happen in a
separate patch.  It isn't stable material, and so far there is no
compelling use case for it.

Eric

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: CLONE_PARENT after setns(CLONE_NEWPID)
  2013-11-06 22:53   ` Serge Hallyn
@ 2013-11-06 22:53     ` Eric W. Biederman
  0 siblings, 0 replies; 15+ messages in thread
From: Eric W. Biederman @ 2013-11-06 22:53 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Oleg Nesterov, Andy Lutomirski, Brad Spengler, Christian Seiler,
	lkml, Andy Whitcroft, Lxc development list

Serge Hallyn <serge.hallyn@ubuntu.com> writes:

> So apart from peers seeing the new task as having pid 0, and
> sigchild going to the grandparent, are there any other side
> effects?  Is ptrace an issue?  (I took a quick look but it
> doesn't seem like it)

There is nothing new the pid namespace adds to the pid namespace case.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: CLONE_PARENT after setns(CLONE_NEWPID)
  2013-11-06 19:33 ` Oleg Nesterov
  2013-11-06 19:50   ` Andy Lutomirski
  2013-11-06 22:50   ` Eric W. Biederman
@ 2013-11-06 22:53   ` Serge Hallyn
  2013-11-06 22:53     ` Eric W. Biederman
  2 siblings, 1 reply; 15+ messages in thread
From: Serge Hallyn @ 2013-11-06 22:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, Brad Spengler, Christian Seiler, lkml,
	Andy Whitcroft, Eric W. Biederman, Lxc development list

Quoting Oleg Nesterov (oleg@redhat.com):
> Hi Serge,
> 
> On 11/06, Serge Hallyn wrote:
> >
> > Hi Oleg,
> >
> > commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
> > "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
> > breaks lxc-attach in 3.12.  That code forks a child which does
> > setns() and then does a clone(CLONE_PARENT).  That way the
> > grandchild can be in the right namespaces (which the child was
> > not) and be a child of the original task, which is the monitor.
> 
> Thanks...
> 
> Yes, this is what 40a0d32d1ea explicitly tries to disallow.
> 
> > Is there a real danger in allowing CLONE_PARENT
> > when current->nsproxy->pidns_for_children is not our pidns,
> > or was this done out of an "over-abundance of caution"?
> 
> I am not sure... This all was based on the long discussion, and
> it was decided that the CLONE_PARENT check should be consistent
> wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns().

So apart from peers seeing the new task as having pid 0, and
sigchild going to the grandparent, are there any other side
effects?  Is ptrace an issue?  (I took a quick look but it
doesn't seem like it)

If not, then I very much think we should continue to allow this.

-serge

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: CLONE_PARENT after setns(CLONE_NEWPID)
  2013-11-06 22:50   ` Eric W. Biederman
@ 2013-11-06 22:56     ` Andy Lutomirski
  2013-11-06 23:17       ` Serge Hallyn
  2013-11-06 23:12     ` Serge Hallyn
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Andy Lutomirski @ 2013-11-06 22:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, Serge Hallyn, Brad Spengler, Christian Seiler,
	lkml, Andy Whitcroft, Lxc development list

On Wed, Nov 6, 2013 at 2:50 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Oleg Nesterov <oleg@redhat.com> writes:
>
>> Hi Serge,
>>
>> On 11/06, Serge Hallyn wrote:
>>>
>>> Hi Oleg,
>>>
>>> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
>>> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
>>> breaks lxc-attach in 3.12.  That code forks a child which does
>>> setns() and then does a clone(CLONE_PARENT).  That way the
>>> grandchild can be in the right namespaces (which the child was
>>> not) and be a child of the original task, which is the monitor.
>
> Serge that is a clever trick to get around the limitation that we can
> not change the pid namespace of our current process.  Given the
> challenging relaying of signals etc I can see why you would use this.
>
> At the same time it makes me a little sad to see new users of
> CLONE_PARENT.  With CLONE_THREAD in existence the original reasons for
> CLONE_PARENT are gone now.
>
> Having used bash as an init process I know it can handle unexpeted
> children.  However using CLONE_PARENT in this way still seems a little
> dodgy.  Or am I misunderstanding why you are using CLONE_PARENT?
>
> That trick sounds like it might be worth adding to nsenter in util-linux
> just to simplify the code.
>
>> Thanks...
>>
>> Yes, this is what 40a0d32d1ea explicitly tries to disallow.
>>
>>> Is there a real danger in allowing CLONE_PARENT
>>> when current->nsproxy->pidns_for_children is not our pidns,
>>> or was this done out of an "over-abundance of caution"?
>>
>> I am not sure... This all was based on the long discussion, and
>> it was decided that the CLONE_PARENT check should be consistent
>> wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns().
>>
>>> Can we
>>> safely revert that new extra check?
>>
>> Well, usually we do not break user-space, but I am not sure about
>> this case...
>>
>> Eric, Andy, what do you think?
>>
>> And if we allow CLONE_PARENT when ->pidns_for_children was changed,
>> should we also allow, say, CLONE_NEWPID && CLONE_PARENT ?
>
> The two fundamental things I know we can not allow are:
> - A shared signal queue aka CLONE_THREAD.  Because we compute the pid
>   and uid of the signal when we place it in the queue.
>
> - Changing the pid and by extention pid_namespace of an existing
>   process.
>
> From a parents perspective there is nothing special about the pid
> namespace, to deny CLONE_PARENT, because the parent simply won't know or
> care.
>
> From the childs perspective all that is special really are shared signal
> queues.
>
> User mode threading with CLONE_PARENT|CLONE_VM|CLONE_SIGHAND and tasks
> in different pid namespaces is almost certainly going to break because
> it is complicated.  But shared signal handlers can look at per thread
> information to know which pid namespace a process is in, so I don't know
> of any reason not to support CLONE_PARENT|CLONE_VM|CLONE_SIGHAND threads
> at the kernel level.  It would be absolutely stupid to implement but
> that is a different thing.
>
> So hmm.
>
> Because it can do no harm, and because it is a regression let's remove
> the CLONE_PARENT check and send it stable.
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 086fe73..c447fbc 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1174,7 +1174,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>          * do not allow it to share a thread group or signal handlers or
>          * parent with the forking task.
>          */
> -       if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) {
> +       if (clone_flags & (CLONE_SIGHAND)) {
>                 if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
>                     (task_active_pid_ns(current) !=
>                                 current->nsproxy->pid_ns_for_children))
>

Acked-by: Andy Lutomirski <luto@amacapital.net>

>
> I don't know if there are shells that CLONE_PARENT can confuse but if
> there are lxcattach and nsenter using this functionality should be
> enough to slowly get that confusion fixed.
>
> Changing the CLONE_SIGHAND into CLONE_THREAD will need to happen in a
> separate patch.  It isn't stable material, and so far there is no
> compelling use case for it.
>
> Eric



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: CLONE_PARENT after setns(CLONE_NEWPID)
  2013-11-06 22:50   ` Eric W. Biederman
  2013-11-06 22:56     ` Andy Lutomirski
@ 2013-11-06 23:12     ` Serge Hallyn
  2013-11-06 23:31     ` Christian Seiler
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Serge Hallyn @ 2013-11-06 23:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, Andy Lutomirski, Brad Spengler, Christian Seiler,
	lkml, Andy Whitcroft, Lxc development list

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Oleg Nesterov <oleg@redhat.com> writes:
> 
> > Hi Serge,
> >
> > On 11/06, Serge Hallyn wrote:
> >>
> >> Hi Oleg,
> >>
> >> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
> >> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
> >> breaks lxc-attach in 3.12.  That code forks a child which does
> >> setns() and then does a clone(CLONE_PARENT).  That way the
> >> grandchild can be in the right namespaces (which the child was
> >> not) and be a child of the original task, which is the monitor.
> 
> Serge that is a clever trick to get around the limitation that we can
> not change the pid namespace of our current process.  Given the
> challenging relaying of signals etc I can see why you would use this.
> 
> At the same time it makes me a little sad to see new users of
> CLONE_PARENT.  With CLONE_THREAD in existence the original reasons for
> CLONE_PARENT are gone now.
> 
> Having used bash as an init process I know it can handle unexpeted
> children.  However using CLONE_PARENT in this way still seems a little
> dodgy.  Or am I misunderstanding why you are using CLONE_PARENT?

FWIW Christian (cc:d from the start) was the author of that code, so he
can correct me if i mis-speak, but IIUC the design is:

1. pid X is the first process running lxc-attach.  It will be a monitor
for the process which is entered into the container

2. pid X forks pid Y, which does setns().  Now if it is setns()ing into
a pidns, it won't itself be in the new pidns, which is not satisfactory.
So

3. pid Y clones pid Z with CLONE_PARENT.  Y exists.  Z continues, as a
full member of the container, and a child of the monitor process.

So yes, as you said it's exactly to work around the fact that pid Y
can't change its own pidns.

-serge

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: CLONE_PARENT after setns(CLONE_NEWPID)
  2013-11-06 22:56     ` Andy Lutomirski
@ 2013-11-06 23:17       ` Serge Hallyn
  0 siblings, 0 replies; 15+ messages in thread
From: Serge Hallyn @ 2013-11-06 23:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eric W. Biederman, Oleg Nesterov, Brad Spengler,
	Christian Seiler, lkml, Andy Whitcroft, Lxc development list

Quoting Andy Lutomirski (luto@amacapital.net):
> On Wed, Nov 6, 2013 at 2:50 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> > Oleg Nesterov <oleg@redhat.com> writes:
> >
> >> Hi Serge,
> >>
> >> On 11/06, Serge Hallyn wrote:
> >>>
> >>> Hi Oleg,
> >>>
> >>> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
> >>> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
> >>> breaks lxc-attach in 3.12.  That code forks a child which does
> >>> setns() and then does a clone(CLONE_PARENT).  That way the
> >>> grandchild can be in the right namespaces (which the child was
> >>> not) and be a child of the original task, which is the monitor.
> >
> > Serge that is a clever trick to get around the limitation that we can
> > not change the pid namespace of our current process.  Given the
> > challenging relaying of signals etc I can see why you would use this.
> >
> > At the same time it makes me a little sad to see new users of
> > CLONE_PARENT.  With CLONE_THREAD in existence the original reasons for
> > CLONE_PARENT are gone now.
> >
> > Having used bash as an init process I know it can handle unexpeted
> > children.  However using CLONE_PARENT in this way still seems a little
> > dodgy.  Or am I misunderstanding why you are using CLONE_PARENT?
> >
> > That trick sounds like it might be worth adding to nsenter in util-linux
> > just to simplify the code.
> >
> >> Thanks...
> >>
> >> Yes, this is what 40a0d32d1ea explicitly tries to disallow.
> >>
> >>> Is there a real danger in allowing CLONE_PARENT
> >>> when current->nsproxy->pidns_for_children is not our pidns,
> >>> or was this done out of an "over-abundance of caution"?
> >>
> >> I am not sure... This all was based on the long discussion, and
> >> it was decided that the CLONE_PARENT check should be consistent
> >> wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns().
> >>
> >>> Can we
> >>> safely revert that new extra check?
> >>
> >> Well, usually we do not break user-space, but I am not sure about
> >> this case...
> >>
> >> Eric, Andy, what do you think?
> >>
> >> And if we allow CLONE_PARENT when ->pidns_for_children was changed,
> >> should we also allow, say, CLONE_NEWPID && CLONE_PARENT ?
> >
> > The two fundamental things I know we can not allow are:
> > - A shared signal queue aka CLONE_THREAD.  Because we compute the pid
> >   and uid of the signal when we place it in the queue.
> >
> > - Changing the pid and by extention pid_namespace of an existing
> >   process.
> >
> > From a parents perspective there is nothing special about the pid
> > namespace, to deny CLONE_PARENT, because the parent simply won't know or
> > care.
> >
> > From the childs perspective all that is special really are shared signal
> > queues.
> >
> > User mode threading with CLONE_PARENT|CLONE_VM|CLONE_SIGHAND and tasks
> > in different pid namespaces is almost certainly going to break because
> > it is complicated.  But shared signal handlers can look at per thread
> > information to know which pid namespace a process is in, so I don't know
> > of any reason not to support CLONE_PARENT|CLONE_VM|CLONE_SIGHAND threads
> > at the kernel level.  It would be absolutely stupid to implement but
> > that is a different thing.
> >
> > So hmm.
> >
> > Because it can do no harm, and because it is a regression let's remove
> > the CLONE_PARENT check and send it stable.
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 086fe73..c447fbc 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -1174,7 +1174,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> >          * do not allow it to share a thread group or signal handlers or
> >          * parent with the forking task.
> >          */
> > -       if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) {
> > +       if (clone_flags & (CLONE_SIGHAND)) {
> >                 if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
> >                     (task_active_pid_ns(current) !=
> >                                 current->nsproxy->pid_ns_for_children))
> >
> 
> Acked-by: Andy Lutomirski <luto@amacapital.net>

Also (obviously)

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: CLONE_PARENT after setns(CLONE_NEWPID)
  2013-11-06 22:50   ` Eric W. Biederman
  2013-11-06 22:56     ` Andy Lutomirski
  2013-11-06 23:12     ` Serge Hallyn
@ 2013-11-06 23:31     ` Christian Seiler
  2013-11-08 17:22     ` Oleg Nesterov
  2014-01-15 21:11     ` Christian Seiler
  4 siblings, 0 replies; 15+ messages in thread
From: Christian Seiler @ 2013-11-06 23:31 UTC (permalink / raw)
  To: Eric W. Biederman, Oleg Nesterov
  Cc: Serge Hallyn, Andy Lutomirski, Brad Spengler, lkml,
	Andy Whitcroft, Lxc development list

Hi there,

> Having used bash as an init process I know it can handle unexpeted
> children.  However using CLONE_PARENT in this way still seems a little
> dodgy.  Or am I misunderstanding why you are using CLONE_PARENT?

Since I (re)wrote that part of LXC, I should perhaps clarify how that is
used: In case of LXC, the grandparent is lxc-attach itself. The logic
goes as follows:

  - user calls lxc-attach -n $container -- /bin/command/to/execute
  - lxc-attach does a fork()
  - child process does setns()
  - child process does clone(CLONE_PARENT)
  - child process exits
  - new process is now in all of the correct namespaces
  - new process does some IPC (socketpair() from before fork/clone) to
    tell original lxc-attach process to finish initialization
    (mainly: add new process to the proper cgroups)
  - new process exec()s to /bin/command/to/execute
  - original lxc-attach process waitpid()s for the attached process
    to exit

So the only process that needs to handle a new child is going to be
lxc-attach itself, but that is designed in such a way that it expects
the new child.

(The initial fork is necessary because once setns(userns, mntns) has
occurred, the cgroup tree may not be writable anymore (depening on
further circumstances), so it would be impossible to just do setns() and
then fork() if one then wants to add the new process to the proper cgroups.)

> That trick sounds like it might be worth adding to nsenter in util-linux
> just to simplify the code.

I think nsenter currently only does setns() and then fork(), which is
simpler than lxc-attach - mainly because there's no need to attach the
process to cgroups etc. lxc-attach's approach does not eliminate the
need for the original process wait()ing on the attached process, the
CLONE_PARENT is really just used internally to simplify the process
hierarchy and also the IPC required.

Also, re: general point in this thread: I don't see how CLONE_PARENT
could be harmful in any way when used after setns() moreso than it might
already be harmful without setns(). I could always write a program that
just does clone(CLONE_PARENT) (and nothing else) and then the calling
process would also get an unexpected child - I don't see how the pid
namespace status of that child would change anything here. So I'd
definitely be in favor of allowing CLONE_PARENT after setns().

Christian


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: CLONE_PARENT after setns(CLONE_NEWPID)
  2013-11-06 22:50   ` Eric W. Biederman
                       ` (2 preceding siblings ...)
  2013-11-06 23:31     ` Christian Seiler
@ 2013-11-08 17:22     ` Oleg Nesterov
  2014-01-15 21:11     ` Christian Seiler
  4 siblings, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2013-11-08 17:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge Hallyn, Andy Lutomirski, Brad Spengler, Christian Seiler,
	lkml, Andy Whitcroft, Lxc development list

On 11/06, Eric W. Biederman wrote:
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1174,7 +1174,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>          * do not allow it to share a thread group or signal handlers or
>          * parent with the forking task.
>          */
> -       if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) {
> +       if (clone_flags & (CLONE_SIGHAND)) {
>                 if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
>                     (task_active_pid_ns(current) !=
>                                 current->nsproxy->pid_ns_for_children))

OK, agreed. I failed to find any problem with CLONE_PARENT with
CLONE_NEWUSER or after setns. And the main point of 40a0d32d1eaf
was "make them consistent", not "tighten up".

Besides, this doesn't differ too much from setns + fork() && exit(),
the grandchild will have the new namespace and reparented.

Acked-by: Oleg Nesterov <oleg@redhat.com>

> Changing the CLONE_SIGHAND into CLONE_THREAD will need to happen in a
> separate patch.  It isn't stable material, and so far there is no
> compelling use case for it.

Yes. Again, 40a0d32d1eaf chose CLONE_SIGHAND to unify CLONE_NEWUSER/setns
cases, copy_process() used this check. And in fact I voted for CLONE_THREAD
from the very beginning, it was you who suggested to use CLONE_SIGHAND
instead ;) OTOH, it was probably right to not relax the restrictions we
already had.

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: CLONE_PARENT after setns(CLONE_NEWPID)
  2013-11-06 22:50   ` Eric W. Biederman
                       ` (3 preceding siblings ...)
  2013-11-08 17:22     ` Oleg Nesterov
@ 2014-01-15 21:11     ` Christian Seiler
  2014-01-16  4:46       ` Serge Hallyn
  4 siblings, 1 reply; 15+ messages in thread
From: Christian Seiler @ 2014-01-15 21:11 UTC (permalink / raw)
  To: Eric W. Biederman, Oleg Nesterov
  Cc: Serge Hallyn, Andy Lutomirski, Brad Spengler, lkml,
	Andy Whitcroft, Lxc development list

Eric W. Biederman writes:
> So hmm.
>
> Because it can do no harm, and because it is a regression let's remove
> the CLONE_PARENT check and send it stable.
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 086fe73..c447fbc 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1174,7 +1174,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>           * do not allow it to share a thread group or signal handlers or
>           * parent with the forking task.
>           */
> -       if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) {
> +       if (clone_flags & (CLONE_SIGHAND)) {
>                  if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
>                      (task_active_pid_ns(current) !=
>                                  current->nsproxy->pid_ns_for_children))

Just a short question, what happened to this patch? As far as I can
tell, 3.13rc8 doesn't include it, neither does the current 3.12.7. This
means that lxc-attach currently still doesn't work on 3.12 and probably
won't work on 3.13 either... (3.11 is fine, see the previous mails in
this thread.)

-- Christian


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: CLONE_PARENT after setns(CLONE_NEWPID)
  2014-01-15 21:11     ` Christian Seiler
@ 2014-01-16  4:46       ` Serge Hallyn
  0 siblings, 0 replies; 15+ messages in thread
From: Serge Hallyn @ 2014-01-16  4:46 UTC (permalink / raw)
  To: Christian Seiler
  Cc: Eric W. Biederman, Oleg Nesterov, Andy Lutomirski, Brad Spengler,
	lkml, Andy Whitcroft, Lxc development list

Quoting Christian Seiler (christian@iwakd.de):
> Eric W. Biederman writes:
> >So hmm.
> >
> >Because it can do no harm, and because it is a regression let's remove
> >the CLONE_PARENT check and send it stable.
> >
> >diff --git a/kernel/fork.c b/kernel/fork.c
> >index 086fe73..c447fbc 100644
> >--- a/kernel/fork.c
> >+++ b/kernel/fork.c
> >@@ -1174,7 +1174,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> >          * do not allow it to share a thread group or signal handlers or
> >          * parent with the forking task.
> >          */
> >-       if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) {
> >+       if (clone_flags & (CLONE_SIGHAND)) {
> >                 if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
> >                     (task_active_pid_ns(current) !=
> >                                 current->nsproxy->pid_ns_for_children))
> 
> Just a short question, what happened to this patch? As far as I can
> tell, 3.13rc8 doesn't include it, neither does the current 3.12.7. This
> means that lxc-attach currently still doesn't work on 3.12 and probably
> won't work on 3.13 either... (3.11 is fine, see the previous mails in
> this thread.)

So, hm.  I didn't realize it hadn't hit upstream, because it's in the
ubuntu kernel (unfortunately wrongly attributed).

However it is in linux-next since Nov 27.

-serge

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-01-16  4:46 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-06 18:02 CLONE_PARENT after setns(CLONE_NEWPID) Serge Hallyn
2013-11-06 19:33 ` Oleg Nesterov
2013-11-06 19:50   ` Andy Lutomirski
2013-11-06 20:06     ` Oleg Nesterov
2013-11-06 20:21       ` Andy Lutomirski
2013-11-06 22:50   ` Eric W. Biederman
2013-11-06 22:56     ` Andy Lutomirski
2013-11-06 23:17       ` Serge Hallyn
2013-11-06 23:12     ` Serge Hallyn
2013-11-06 23:31     ` Christian Seiler
2013-11-08 17:22     ` Oleg Nesterov
2014-01-15 21:11     ` Christian Seiler
2014-01-16  4:46       ` Serge Hallyn
2013-11-06 22:53   ` Serge Hallyn
2013-11-06 22:53     ` Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).