All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] introduce user_ns inheritance in user-sched
@ 2009-03-19 21:16 Serge E. Hallyn
  2009-03-19 23:55 ` Matt Helsley
  2009-03-20  8:24 ` Peter Zijlstra
  0 siblings, 2 replies; 9+ messages in thread
From: Serge E. Hallyn @ 2009-03-19 21:16 UTC (permalink / raw)
  To: lkml; +Cc: Dhaval Giani, mingo, Bharata B Rao, peterz, Linux Containers

In a kernel compiled with CONFIG_USER_SCHED=y, cpu shares are
allocated according to uid.  Shares are specifiable under
/sys/kernel/uids/<uid>/

In a kernel compiled with CONFIG_USER_NS=y, clone(2) with the
CLONE_NEWUSER flag creates a new user namespace, and the newly
cloned task will belong to uid 0 in the new user namespace.

Without this patch,  if uid 500 calls clone(CLONE_NEWUSER) (which
is possible using a program with the cap_sys_admin,cap_setuid,cap_setgid=pe
file capabilities), then the new task will get the cpu shares of
uid 0.

After this patch, if uid 500 calls clone(CLONE_NEWUSER), then even
though it is uid 0 in the new user namespace, it will be restricted to
the cpu shares of uid 500.

Currently there is no way to set shares for uids in user namespaces
other than the initial one.  That will be trivial to add when
sysfs tagging (or its functional equivalent, also needed to
expose network devices in network namespaces other than init)
becomes available.

Until cross-user-namespace file accesses are enforced, nothing
stops uid 0 in a child namespace from simply writing new values
into /sys/kernel/uids/500.

Here are results of some testing with and without the patch.

Cpu shares are initialized as follows::
	user root:   2048
	user hallyn: 1024
	user serge:  512

Results are the 'real' part of time make -j4 > o 2>&1,
each time after a make clean.

=================================================================
UNPATCHED
User 1: user serge creates a child user_ns and runs as user root
User 2: hallyn runs as user hallyn
=================================================================
           User 1          User 2
run 1:   2m58.834s        3m0.609s
run 2:   2m59.248s        2m59.457s

=============================================================
PATCHED
User 1: user serge
User 2: user hallyn
=============================================================

           User 1          User 2
run 1:   3m6.337s        2m22.681s
run 2:   3m6.323s        2m21.855s

=============================================================
PATCHED
User 1: user serge setuid to user root
User 2: hallyn
=============================================================

           User 1          User 2
run 1:   2m17.782s       3m3.947s
run 2:   2m18.497s       3m7.961s

==========================================================
PATCHED
User 1: user root inside userns created by userid serge
User 2: hallyn
==========================================================

           User 1          User 2
run 1:   3m9.876s        2m8.428s
run 2:   3m8.539s        2m6.356s

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: mingo@elte.hu
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: peterz@infradead.org
---
 kernel/user.c           |   12 +++++++++---
 kernel/user_namespace.c |    2 +-
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/user.c b/kernel/user.c
index 850e0ba..53aeea2 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -101,7 +101,12 @@ static int sched_create_user(struct user_struct *up)
 {
 	int rc = 0;
 
-	up->tg = sched_create_group(&root_task_group);
+	struct task_group *parent = &root_task_group;
+
+	if (up->user_ns != &init_user_ns)
+		parent = up->user_ns->creator->tg;
+
+	up->tg = sched_create_group(parent);
 	if (IS_ERR(up->tg))
 		rc = -ENOMEM;
 
@@ -434,11 +439,11 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
 		new->uid = uid;
 		atomic_set(&new->__count, 1);
 
+		new->user_ns = get_user_ns(ns);
+
 		if (sched_create_user(new) < 0)
 			goto out_free_user;
 
-		new->user_ns = get_user_ns(ns);
-
 		if (uids_user_create(new))
 			goto out_destoy_sched;
 
@@ -472,6 +477,7 @@ out_destoy_sched:
 	sched_destroy_user(new);
 	put_user_ns(new->user_ns);
 out_free_user:
+	put_user_ns(new->user_ns);
 	kmem_cache_free(uid_cachep, new);
 out_unlock:
 	uids_mutex_unlock();
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 076c7c8..a99d3c7 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -35,6 +35,7 @@ int create_user_ns(struct cred *new)
 		INIT_HLIST_HEAD(ns->uidhash_table + n);
 
 	/* Alloc new root user.  */
+	ns->creator = new->user;
 	root_user = alloc_uid(ns, 0);
 	if (!root_user) {
 		kfree(ns);
@@ -42,7 +43,6 @@ int create_user_ns(struct cred *new)
 	}
 
 	/* set the new root user in the credentials under preparation */
-	ns->creator = new->user;
 	new->user = root_user;
 	new->uid = new->euid = new->suid = new->fsuid = 0;
 	new->gid = new->egid = new->sgid = new->fsgid = 0;
-- 
1.5.4.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] introduce user_ns inheritance in user-sched
  2009-03-19 21:16 [PATCH 1/1] introduce user_ns inheritance in user-sched Serge E. Hallyn
@ 2009-03-19 23:55 ` Matt Helsley
  2009-03-20  0:07   ` Serge E. Hallyn
  2009-03-20  0:23     ` Serge E. Hallyn
  2009-03-20  8:24 ` Peter Zijlstra
  1 sibling, 2 replies; 9+ messages in thread
From: Matt Helsley @ 2009-03-19 23:55 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: lkml, Dhaval Giani, mingo, Bharata B Rao, peterz, Linux Containers

On Thu, Mar 19, 2009 at 04:16:15PM -0500, Serge E. Hallyn wrote:
> In a kernel compiled with CONFIG_USER_SCHED=y, cpu shares are
> allocated according to uid.  Shares are specifiable under
> /sys/kernel/uids/<uid>/
> 
> In a kernel compiled with CONFIG_USER_NS=y, clone(2) with the
> CLONE_NEWUSER flag creates a new user namespace, and the newly
> cloned task will belong to uid 0 in the new user namespace.
> 
> Without this patch,  if uid 500 calls clone(CLONE_NEWUSER) (which
> is possible using a program with the cap_sys_admin,cap_setuid,cap_setgid=pe
> file capabilities), then the new task will get the cpu shares of
> uid 0.
> 
> After this patch, if uid 500 calls clone(CLONE_NEWUSER), then even
> though it is uid 0 in the new user namespace, it will be restricted to
> the cpu shares of uid 500.
> 
> Currently there is no way to set shares for uids in user namespaces
> other than the initial one.  That will be trivial to add when
> sysfs tagging (or its functional equivalent, also needed to
> expose network devices in network namespaces other than init)
> becomes available.
> 
> Until cross-user-namespace file accesses are enforced, nothing
> stops uid 0 in a child namespace from simply writing new values
> into /sys/kernel/uids/500.
> 
> Here are results of some testing with and without the patch.
> 
> Cpu shares are initialized as follows::
> 	user root:   2048
> 	user hallyn: 1024
> 	user serge:  512
> 
> Results are the 'real' part of time make -j4 > o 2>&1,
> each time after a make clean.
> 
> =================================================================
> UNPATCHED
> User 1: user serge creates a child user_ns and runs as user root
> User 2: hallyn runs as user hallyn
> =================================================================
>            User 1          User 2
> run 1:   2m58.834s        3m0.609s
> run 2:   2m59.248s        2m59.457s
> 
> =============================================================
> PATCHED
> User 1: user serge
> User 2: user hallyn
> =============================================================
> 
>            User 1          User 2
> run 1:   3m6.337s        2m22.681s
> run 2:   3m6.323s        2m21.855s
> 
> =============================================================
> PATCHED
> User 1: user serge setuid to user root
> User 2: hallyn
> =============================================================
> 
>            User 1          User 2
> run 1:   2m17.782s       3m3.947s
> run 2:   2m18.497s       3m7.961s
> 
> ==========================================================
> PATCHED
> User 1: user root inside userns created by userid serge
> User 2: hallyn
> ==========================================================
> 
>            User 1          User 2
> run 1:   3m9.876s        2m8.428s
> run 2:   3m8.539s        2m6.356s
> 
> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
> Cc: mingo@elte.hu
> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Cc: peterz@infradead.org
> ---
>  kernel/user.c           |   12 +++++++++---
>  kernel/user_namespace.c |    2 +-
>  2 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/user.c b/kernel/user.c
> index 850e0ba..53aeea2 100644
> --- a/kernel/user.c
> +++ b/kernel/user.c
> @@ -101,7 +101,12 @@ static int sched_create_user(struct user_struct *up)
>  {
>  	int rc = 0;
> 
> -	up->tg = sched_create_group(&root_task_group);
> +	struct task_group *parent = &root_task_group;
> +
> +	if (up->user_ns != &init_user_ns)
> +		parent = up->user_ns->creator->tg;
> +
> +	up->tg = sched_create_group(parent);
>  	if (IS_ERR(up->tg))
>  		rc = -ENOMEM;
> 
> @@ -434,11 +439,11 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
>  		new->uid = uid;
>  		atomic_set(&new->__count, 1);
> 
> +		new->user_ns = get_user_ns(ns);
> +
>  		if (sched_create_user(new) < 0)
>  			goto out_free_user;
> 
> -		new->user_ns = get_user_ns(ns);
> -
>  		if (uids_user_create(new))
>  			goto out_destoy_sched;
> 
> @@ -472,6 +477,7 @@ out_destoy_sched:
>  	sched_destroy_user(new);
>  	put_user_ns(new->user_ns);

Shouldn't this put_user_ns(new->user_ns) be removed? It looks like two 
references to new->user_ns are being dropped if anything fails 
after sched_create_user(new) succeeds yet as far as I can tell the
patch does not introduce any new references to new->user_ns.

Otherwise looks good to me.

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] introduce user_ns inheritance in user-sched
  2009-03-19 23:55 ` Matt Helsley
@ 2009-03-20  0:07   ` Serge E. Hallyn
  2009-03-20  0:23     ` Serge E. Hallyn
  1 sibling, 0 replies; 9+ messages in thread
From: Serge E. Hallyn @ 2009-03-20  0:07 UTC (permalink / raw)
  To: Matt Helsley
  Cc: lkml, Dhaval Giani, mingo, Bharata B Rao, peterz, Linux Containers

Quoting Matt Helsley (matthltc@us.ibm.com):
> On Thu, Mar 19, 2009 at 04:16:15PM -0500, Serge E. Hallyn wrote:
> > In a kernel compiled with CONFIG_USER_SCHED=y, cpu shares are
> > allocated according to uid.  Shares are specifiable under
> > /sys/kernel/uids/<uid>/
> > 
> > In a kernel compiled with CONFIG_USER_NS=y, clone(2) with the
> > CLONE_NEWUSER flag creates a new user namespace, and the newly
> > cloned task will belong to uid 0 in the new user namespace.
> > 
> > Without this patch,  if uid 500 calls clone(CLONE_NEWUSER) (which
> > is possible using a program with the cap_sys_admin,cap_setuid,cap_setgid=pe
> > file capabilities), then the new task will get the cpu shares of
> > uid 0.
> > 
> > After this patch, if uid 500 calls clone(CLONE_NEWUSER), then even
> > though it is uid 0 in the new user namespace, it will be restricted to
> > the cpu shares of uid 500.
> > 
> > Currently there is no way to set shares for uids in user namespaces
> > other than the initial one.  That will be trivial to add when
> > sysfs tagging (or its functional equivalent, also needed to
> > expose network devices in network namespaces other than init)
> > becomes available.
> > 
> > Until cross-user-namespace file accesses are enforced, nothing
> > stops uid 0 in a child namespace from simply writing new values
> > into /sys/kernel/uids/500.
> > 
> > Here are results of some testing with and without the patch.
> > 
> > Cpu shares are initialized as follows::
> > 	user root:   2048
> > 	user hallyn: 1024
> > 	user serge:  512
> > 
> > Results are the 'real' part of time make -j4 > o 2>&1,
> > each time after a make clean.
> > 
> > =================================================================
> > UNPATCHED
> > User 1: user serge creates a child user_ns and runs as user root
> > User 2: hallyn runs as user hallyn
> > =================================================================
> >            User 1          User 2
> > run 1:   2m58.834s        3m0.609s
> > run 2:   2m59.248s        2m59.457s
> > 
> > =============================================================
> > PATCHED
> > User 1: user serge
> > User 2: user hallyn
> > =============================================================
> > 
> >            User 1          User 2
> > run 1:   3m6.337s        2m22.681s
> > run 2:   3m6.323s        2m21.855s
> > 
> > =============================================================
> > PATCHED
> > User 1: user serge setuid to user root
> > User 2: hallyn
> > =============================================================
> > 
> >            User 1          User 2
> > run 1:   2m17.782s       3m3.947s
> > run 2:   2m18.497s       3m7.961s
> > 
> > ==========================================================
> > PATCHED
> > User 1: user root inside userns created by userid serge
> > User 2: hallyn
> > ==========================================================
> > 
> >            User 1          User 2
> > run 1:   3m9.876s        2m8.428s
> > run 2:   3m8.539s        2m6.356s
> > 
> > Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
> > Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
> > Cc: mingo@elte.hu
> > Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > Cc: peterz@infradead.org
> > ---
> >  kernel/user.c           |   12 +++++++++---
> >  kernel/user_namespace.c |    2 +-
> >  2 files changed, 10 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/user.c b/kernel/user.c
> > index 850e0ba..53aeea2 100644
> > --- a/kernel/user.c
> > +++ b/kernel/user.c
> > @@ -101,7 +101,12 @@ static int sched_create_user(struct user_struct *up)
> >  {
> >  	int rc = 0;
> > 
> > -	up->tg = sched_create_group(&root_task_group);
> > +	struct task_group *parent = &root_task_group;
> > +
> > +	if (up->user_ns != &init_user_ns)
> > +		parent = up->user_ns->creator->tg;
> > +
> > +	up->tg = sched_create_group(parent);
> >  	if (IS_ERR(up->tg))
> >  		rc = -ENOMEM;
> > 
> > @@ -434,11 +439,11 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
> >  		new->uid = uid;
> >  		atomic_set(&new->__count, 1);
> > 
> > +		new->user_ns = get_user_ns(ns);
> > +
> >  		if (sched_create_user(new) < 0)
> >  			goto out_free_user;
> > 
> > -		new->user_ns = get_user_ns(ns);
> > -
> >  		if (uids_user_create(new))
> >  			goto out_destoy_sched;
> > 
> > @@ -472,6 +477,7 @@ out_destoy_sched:
> >  	sched_destroy_user(new);
> >  	put_user_ns(new->user_ns);
> 
> Shouldn't this put_user_ns(new->user_ns) be removed? It looks like two 
> references to new->user_ns are being dropped if anything fails 
> after sched_create_user(new) succeeds yet as far as I can tell the
> patch does not introduce any new references to new->user_ns.

Ouch, yeah, thought I'd done that...

Thanks for catching that!  Will resend.

> Otherwise looks good to me.
> 
> Cheers,
> 	-Matt Helsley

thanks,
-serge

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] introduce user_ns inheritance in user-sched
  2009-03-19 23:55 ` Matt Helsley
@ 2009-03-20  0:23     ` Serge E. Hallyn
  2009-03-20  0:23     ` Serge E. Hallyn
  1 sibling, 0 replies; 9+ messages in thread
From: Serge E. Hallyn @ 2009-03-20  0:23 UTC (permalink / raw)
  To: Matt Helsley
  Cc: lkml, Dhaval Giani, mingo, Bharata B Rao, peterz, Linux Containers

Quoting Matt Helsley (matthltc@us.ibm.com):
> Shouldn't this put_user_ns(new->user_ns) be removed? It looks like two 
> references to new->user_ns are being dropped if anything fails 
> after sched_create_user(new) succeeds yet as far as I can tell the
> patch does not introduce any new references to new->user_ns.
> 
> Otherwise looks good to me.

Here is the new version.  Thanks again.

From 55c264b27cb1f6f91007ae2aeda2d4f6067bb2eb Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn <serue@us.ibm.com>
Date: Wed, 18 Mar 2009 13:29:32 -0700
Subject: [PATCH] introduce user_ns inheritance in user-sched (v2)

In a kernel compiled with CONFIG_USER_SCHED=y, cpu shares are
allocated according to uid.  Shares are specifiable under
/sys/kernel/uids/

In a kernel compiled with CONFIG_USER_NS=y, clone(2) with the
CLONE_NEWUSER flag creates a new user namespace, and the newly
cloned task will belong to uid 0 in the new user namespace.

Without this patch,  if uid 500 calls clone(CLONE_NEWUSER) (which
is possible using a program with the cap_sys_admin,cap_setuid,cap_setgid=pe
file capabilities), then the new task will get the cpu shares of
uid 0.

After this patch, if uid 500 calls clone(CLONE_NEWUSER), then even
though it is uid 0 in the new user namespace, it will be restricted to
the cpu shares of uid 500.

Currently there is no way to set shares for uids in user namespaces
other than the initial one.  That will be trivial to add when
sysfs tagging (or its functional equivalent, also needed to
expose network devices in network namespaces other than init)
becomes available.

Until cross-user-namespace file accesses are enforced, nothing
stops uid 0 in a child namespace from simply writing new values
into /sys/kernel/uids/500.

Here are results of some testing with and without the patch.

Cpu shares are initialized as follows::
	user root:   2048
	user hallyn: 1024
	user serge:  512

Results are the 'real' part of time make -j4 > o 2>&1,
each time after a make clean.

=================================================================
UNPATCHED
User 1: user serge creates a child user_ns and runs as user root
User 2: hallyn runs as user hallyn
=================================================================
           User 1          User 2
run 1:   2m58.834s        3m0.609s
run 2:   2m59.248s        2m59.457s

=============================================================
PATCHED
User 1: user serge
User 2: user hallyn
=============================================================

           User 1          User 2
run 1:   3m6.337s        2m22.681s
run 2:   3m6.323s        2m21.855s

=============================================================
PATCHED
User 1: user serge setuid to user root
User 2: hallyn
=============================================================

           User 1          User 2
run 1:   2m17.782s       3m3.947s
run 2:   2m18.497s       3m7.961s

==========================================================
PATCHED
User 1: user root inside userns created by userid serge
User 2: hallyn
==========================================================

           User 1          User 2
run 1:   3m9.876s        2m8.428s
run 2:   3m8.539s        2m6.356s

Changelog:
	Mar 19: Matt Helsley pointed out there were
		two calls to put_user_ns() in alloc_uid()
		error path.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: mingo@elte.hu
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: peterz@infradead.org
---
 kernel/user.c           |   13 +++++++++----
 kernel/user_namespace.c |    2 +-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/user.c b/kernel/user.c
index 850e0ba..8ae4bf8 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -101,7 +101,12 @@ static int sched_create_user(struct user_struct *up)
 {
 	int rc = 0;
 
-	up->tg = sched_create_group(&root_task_group);
+	struct task_group *parent = &root_task_group;
+
+	if (up->user_ns != &init_user_ns)
+		parent = up->user_ns->creator->tg;
+
+	up->tg = sched_create_group(parent);
 	if (IS_ERR(up->tg))
 		rc = -ENOMEM;
 
@@ -434,11 +439,11 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
 		new->uid = uid;
 		atomic_set(&new->__count, 1);
 
+		new->user_ns = get_user_ns(ns);
+
 		if (sched_create_user(new) < 0)
 			goto out_free_user;
 
-		new->user_ns = get_user_ns(ns);
-
 		if (uids_user_create(new))
 			goto out_destoy_sched;
 
@@ -470,8 +475,8 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
 
 out_destoy_sched:
 	sched_destroy_user(new);
-	put_user_ns(new->user_ns);
 out_free_user:
+	put_user_ns(new->user_ns);
 	kmem_cache_free(uid_cachep, new);
 out_unlock:
 	uids_mutex_unlock();
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 076c7c8..a99d3c7 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -35,6 +35,7 @@ int create_user_ns(struct cred *new)
 		INIT_HLIST_HEAD(ns->uidhash_table + n);
 
 	/* Alloc new root user.  */
+	ns->creator = new->user;
 	root_user = alloc_uid(ns, 0);
 	if (!root_user) {
 		kfree(ns);
@@ -42,7 +43,6 @@ int create_user_ns(struct cred *new)
 	}
 
 	/* set the new root user in the credentials under preparation */
-	ns->creator = new->user;
 	new->user = root_user;
 	new->uid = new->euid = new->suid = new->fsuid = 0;
 	new->gid = new->egid = new->sgid = new->fsgid = 0;
-- 
1.5.4.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] introduce user_ns inheritance in user-sched
@ 2009-03-20  0:23     ` Serge E. Hallyn
  0 siblings, 0 replies; 9+ messages in thread
From: Serge E. Hallyn @ 2009-03-20  0:23 UTC (permalink / raw)
  To: Matt Helsley
  Cc: lkml, Dhaval Giani, mingo, Bharata B Rao, peterz, Linux Containers

Quoting Matt Helsley (matthltc@us.ibm.com):
> Shouldn't this put_user_ns(new->user_ns) be removed? It looks like two 
> references to new->user_ns are being dropped if anything fails 
> after sched_create_user(new) succeeds yet as far as I can tell the
> patch does not introduce any new references to new->user_ns.
> 
> Otherwise looks good to me.

Here is the new version.  Thanks again.

>From 55c264b27cb1f6f91007ae2aeda2d4f6067bb2eb Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn <serue@us.ibm.com>
Date: Wed, 18 Mar 2009 13:29:32 -0700
Subject: [PATCH] introduce user_ns inheritance in user-sched (v2)

In a kernel compiled with CONFIG_USER_SCHED=y, cpu shares are
allocated according to uid.  Shares are specifiable under
/sys/kernel/uids/

In a kernel compiled with CONFIG_USER_NS=y, clone(2) with the
CLONE_NEWUSER flag creates a new user namespace, and the newly
cloned task will belong to uid 0 in the new user namespace.

Without this patch,  if uid 500 calls clone(CLONE_NEWUSER) (which
is possible using a program with the cap_sys_admin,cap_setuid,cap_setgid=pe
file capabilities), then the new task will get the cpu shares of
uid 0.

After this patch, if uid 500 calls clone(CLONE_NEWUSER), then even
though it is uid 0 in the new user namespace, it will be restricted to
the cpu shares of uid 500.

Currently there is no way to set shares for uids in user namespaces
other than the initial one.  That will be trivial to add when
sysfs tagging (or its functional equivalent, also needed to
expose network devices in network namespaces other than init)
becomes available.

Until cross-user-namespace file accesses are enforced, nothing
stops uid 0 in a child namespace from simply writing new values
into /sys/kernel/uids/500.

Here are results of some testing with and without the patch.

Cpu shares are initialized as follows::
	user root:   2048
	user hallyn: 1024
	user serge:  512

Results are the 'real' part of time make -j4 > o 2>&1,
each time after a make clean.

=================================================================
UNPATCHED
User 1: user serge creates a child user_ns and runs as user root
User 2: hallyn runs as user hallyn
=================================================================
           User 1          User 2
run 1:   2m58.834s        3m0.609s
run 2:   2m59.248s        2m59.457s

=============================================================
PATCHED
User 1: user serge
User 2: user hallyn
=============================================================

           User 1          User 2
run 1:   3m6.337s        2m22.681s
run 2:   3m6.323s        2m21.855s

=============================================================
PATCHED
User 1: user serge setuid to user root
User 2: hallyn
=============================================================

           User 1          User 2
run 1:   2m17.782s       3m3.947s
run 2:   2m18.497s       3m7.961s

==========================================================
PATCHED
User 1: user root inside userns created by userid serge
User 2: hallyn
==========================================================

           User 1          User 2
run 1:   3m9.876s        2m8.428s
run 2:   3m8.539s        2m6.356s

Changelog:
	Mar 19: Matt Helsley pointed out there were
		two calls to put_user_ns() in alloc_uid()
		error path.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: mingo@elte.hu
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: peterz@infradead.org
---
 kernel/user.c           |   13 +++++++++----
 kernel/user_namespace.c |    2 +-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/user.c b/kernel/user.c
index 850e0ba..8ae4bf8 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -101,7 +101,12 @@ static int sched_create_user(struct user_struct *up)
 {
 	int rc = 0;
 
-	up->tg = sched_create_group(&root_task_group);
+	struct task_group *parent = &root_task_group;
+
+	if (up->user_ns != &init_user_ns)
+		parent = up->user_ns->creator->tg;
+
+	up->tg = sched_create_group(parent);
 	if (IS_ERR(up->tg))
 		rc = -ENOMEM;
 
@@ -434,11 +439,11 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
 		new->uid = uid;
 		atomic_set(&new->__count, 1);
 
+		new->user_ns = get_user_ns(ns);
+
 		if (sched_create_user(new) < 0)
 			goto out_free_user;
 
-		new->user_ns = get_user_ns(ns);
-
 		if (uids_user_create(new))
 			goto out_destoy_sched;
 
@@ -470,8 +475,8 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
 
 out_destoy_sched:
 	sched_destroy_user(new);
-	put_user_ns(new->user_ns);
 out_free_user:
+	put_user_ns(new->user_ns);
 	kmem_cache_free(uid_cachep, new);
 out_unlock:
 	uids_mutex_unlock();
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 076c7c8..a99d3c7 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -35,6 +35,7 @@ int create_user_ns(struct cred *new)
 		INIT_HLIST_HEAD(ns->uidhash_table + n);
 
 	/* Alloc new root user.  */
+	ns->creator = new->user;
 	root_user = alloc_uid(ns, 0);
 	if (!root_user) {
 		kfree(ns);
@@ -42,7 +43,6 @@ int create_user_ns(struct cred *new)
 	}
 
 	/* set the new root user in the credentials under preparation */
-	ns->creator = new->user;
 	new->user = root_user;
 	new->uid = new->euid = new->suid = new->fsuid = 0;
 	new->gid = new->egid = new->sgid = new->fsgid = 0;
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] introduce user_ns inheritance in user-sched
  2009-03-19 21:16 [PATCH 1/1] introduce user_ns inheritance in user-sched Serge E. Hallyn
  2009-03-19 23:55 ` Matt Helsley
@ 2009-03-20  8:24 ` Peter Zijlstra
  2009-03-21  2:46   ` Serge E. Hallyn
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2009-03-20  8:24 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: lkml, Dhaval Giani, mingo, Bharata B Rao, Linux Containers

On Thu, 2009-03-19 at 16:16 -0500, Serge E. Hallyn wrote:
> In a kernel compiled with CONFIG_USER_SCHED=y, cpu shares are
> allocated according to uid.  Shares are specifiable under
> /sys/kernel/uids/<uid>/
> 
> In a kernel compiled with CONFIG_USER_NS=y, clone(2) with the
> CLONE_NEWUSER flag creates a new user namespace, and the newly
> cloned task will belong to uid 0 in the new user namespace.

We seem to be adding more and more stuff for USER_SCHED, is anybody
actually using that cruft?

How far along with cgroups are we to fully simulate that behaviour?

I think if we have a capable cgroup based replacement for USER_SCHED we
should axe it from the kernel, would save lots of code...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] introduce user_ns inheritance in user-sched
  2009-03-20  8:24 ` Peter Zijlstra
@ 2009-03-21  2:46   ` Serge E. Hallyn
  2009-03-21 11:36     ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Serge E. Hallyn @ 2009-03-21  2:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Serge E. Hallyn, Linux Containers, mingo, Bharata B Rao, lkml,
	Dhaval Giani

Quoting Peter Zijlstra (peterz@infradead.org):
> On Thu, 2009-03-19 at 16:16 -0500, Serge E. Hallyn wrote:
> > In a kernel compiled with CONFIG_USER_SCHED=y, cpu shares are
> > allocated according to uid.  Shares are specifiable under
> > /sys/kernel/uids/<uid>/
> > 
> > In a kernel compiled with CONFIG_USER_NS=y, clone(2) with the
> > CLONE_NEWUSER flag creates a new user namespace, and the newly
> > cloned task will belong to uid 0 in the new user namespace.
> 
> We seem to be adding more and more stuff for USER_SCHED, is anybody
> actually using that cruft?
> 
> How far along with cgroups are we to fully simulate that behaviour?
> 
> I think if we have a capable cgroup based replacement for USER_SCHED we
> should axe it from the kernel, would save lots of code...

I didn't realize that was the plan.  Using PAM to move users
around cgroups?  If so, then yeah that would simplify quite a bit
of code.   Won't catch all setuid()s of course - I don't know
who uses USER_SCHED and if that would matter.

-serge

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] introduce user_ns inheritance in user-sched
  2009-03-21  2:46   ` Serge E. Hallyn
@ 2009-03-21 11:36     ` Peter Zijlstra
  2009-03-22  8:00       ` Dhaval Giani
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2009-03-21 11:36 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Serge E. Hallyn, Linux Containers, mingo, Bharata B Rao, lkml,
	Dhaval Giani

On Fri, 2009-03-20 at 21:46 -0500, Serge E. Hallyn wrote:
> Quoting Peter Zijlstra (peterz@infradead.org):
> > On Thu, 2009-03-19 at 16:16 -0500, Serge E. Hallyn wrote:
> > > In a kernel compiled with CONFIG_USER_SCHED=y, cpu shares are
> > > allocated according to uid.  Shares are specifiable under
> > > /sys/kernel/uids/<uid>/
> > > 
> > > In a kernel compiled with CONFIG_USER_NS=y, clone(2) with the
> > > CLONE_NEWUSER flag creates a new user namespace, and the newly
> > > cloned task will belong to uid 0 in the new user namespace.
> > 
> > We seem to be adding more and more stuff for USER_SCHED, is anybody
> > actually using that cruft?
> > 
> > How far along with cgroups are we to fully simulate that behaviour?
> > 
> > I think if we have a capable cgroup based replacement for USER_SCHED we
> > should axe it from the kernel, would save lots of code...
> 
> I didn't realize that was the plan.  Using PAM to move users
> around cgroups? 

Right, thing is, distro's will all want cgroup enabled, since that's the
latest fad :-), so this user sched thing will only be for people who
build their own kernels -- but I suspect most of those simply disable
all this group scheduling.

>  If so, then yeah that would simplify quite a bit
> of code.   Won't catch all setuid()s of course 

Right, so if we could somehow get a setuid notification hooked into
cgroups,.. not sure that would be worth the trouble though.

> - I don't know who uses USER_SCHED and if that would matter.

Right, me neither... I would just love to be able to cut all that code
out :-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] introduce user_ns inheritance in user-sched
  2009-03-21 11:36     ` Peter Zijlstra
@ 2009-03-22  8:00       ` Dhaval Giani
  0 siblings, 0 replies; 9+ messages in thread
From: Dhaval Giani @ 2009-03-22  8:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Serge E. Hallyn, Serge E. Hallyn, Linux Containers, mingo,
	Bharata B Rao, lkml

On Sat, Mar 21, 2009 at 12:36:49PM +0100, Peter Zijlstra wrote:
> On Fri, 2009-03-20 at 21:46 -0500, Serge E. Hallyn wrote:
> > Quoting Peter Zijlstra (peterz@infradead.org):
> > > On Thu, 2009-03-19 at 16:16 -0500, Serge E. Hallyn wrote:
> > > > In a kernel compiled with CONFIG_USER_SCHED=y, cpu shares are
> > > > allocated according to uid.  Shares are specifiable under
> > > > /sys/kernel/uids/<uid>/
> > > > 
> > > > In a kernel compiled with CONFIG_USER_NS=y, clone(2) with the
> > > > CLONE_NEWUSER flag creates a new user namespace, and the newly
> > > > cloned task will belong to uid 0 in the new user namespace.
> > > 
> > > We seem to be adding more and more stuff for USER_SCHED, is anybody
> > > actually using that cruft?
> > > 
> > > How far along with cgroups are we to fully simulate that behaviour?
> > > 
> > > I think if we have a capable cgroup based replacement for USER_SCHED we
> > > should axe it from the kernel, would save lots of code...
> > 
> > I didn't realize that was the plan.  Using PAM to move users
> > around cgroups? 
> 
> Right, thing is, distro's will all want cgroup enabled, since that's the
> latest fad :-), so this user sched thing will only be for people who
> build their own kernels -- but I suspect most of those simply disable
> all this group scheduling.
> 

But if they do not, then the behavior is wrong now, and I would prefer
it to be fixed, which is why this patch.

> >  If so, then yeah that would simplify quite a bit
> > of code.   Won't catch all setuid()s of course 
> 
> Right, so if we could somehow get a setuid notification hooked into
> cgroups,.. not sure that would be worth the trouble though.
> 

Does anyone really care about uid based grouping? (I do realize the
userspace daemon classifies on the basis of uids as of now, but still,
how many use cases really want uid based grouping as opposed to process
type (as in browser, compiler..) type of grouping)

> > - I don't know who uses USER_SCHED and if that would matter.
> 
> Right, me neither... I would just love to be able to cut all that code
> out :-)
> 

me too :)

-- 
regards,
Dhaval

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-03-22  8:00 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-19 21:16 [PATCH 1/1] introduce user_ns inheritance in user-sched Serge E. Hallyn
2009-03-19 23:55 ` Matt Helsley
2009-03-20  0:07   ` Serge E. Hallyn
2009-03-20  0:23   ` Serge E. Hallyn
2009-03-20  0:23     ` Serge E. Hallyn
2009-03-20  8:24 ` Peter Zijlstra
2009-03-21  2:46   ` Serge E. Hallyn
2009-03-21 11:36     ` Peter Zijlstra
2009-03-22  8:00       ` Dhaval Giani

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.