linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [REVIEW][PATCH 0/5] A couple of lingering namespace patches
@ 2013-08-29 23:52 Eric W. Biederman
  2013-08-29 23:53 ` [REVIEW][PATCH 1/5] namespaces: Simplify copy_namespaces so it is clear what is going on Eric W. Biederman
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Eric W. Biederman @ 2013-08-29 23:52 UTC (permalink / raw)
  To: Linux Containers; +Cc: Serge E. Hallyn, linux-kernel


There are a couple of long overdue namespace patches, simple cleanups
and permision grants that have been sitting in my development tree
for far too long.  If anyone objects to these please let me know.

Eric W. Biederman (4):
      namespaces: Simplify copy_namespaces so it is clear what is going on.
      userns: Allow PR_CAPBSET_DROP in a user namespace.
      pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
      userns:  Kill nsown_capable it makes the wrong thing easy

Serge Hallyn (1):
      capabilities: allow nice if we are privileged

 fs/namespace.c             |    4 ++--
 fs/open.c                  |    2 +-
 include/linux/capability.h |    1 -
 ipc/namespace.c            |    2 +-
 kernel/capability.c        |   12 ------------
 kernel/fork.c              |    5 -----
 kernel/groups.c            |    2 +-
 kernel/nsproxy.c           |   35 +++++++++++------------------------
 kernel/pid_namespace.c     |    2 +-
 kernel/sys.c               |   20 ++++++++++----------
 kernel/uid16.c             |    2 +-
 kernel/utsname.c           |    2 +-
 net/core/net_namespace.c   |    2 +-
 net/core/scm.c             |    4 ++--
 security/commoncap.c       |   10 +++++-----
 15 files changed, 37 insertions(+), 68 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [REVIEW][PATCH 1/5] namespaces: Simplify copy_namespaces so it is clear what is going on.
  2013-08-29 23:52 [REVIEW][PATCH 0/5] A couple of lingering namespace patches Eric W. Biederman
@ 2013-08-29 23:53 ` Eric W. Biederman
  2013-08-30 16:10   ` Serge E. Hallyn
  2013-08-29 23:54 ` [REVIEW][PATCH 2/5] userns: Allow PR_CAPBSET_DROP in a user namespace Eric W. Biederman
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2013-08-29 23:53 UTC (permalink / raw)
  To: Linux Containers; +Cc: Serge E. Hallyn, linux-kernel


Remove the test for the impossible case where tsk->nsproxy == NULL.  Fork
will never be called with tsk->nsproxy == NULL.

Only call get_nsproxy when we don't need to generate a new_nsproxy,
and mark the case where we don't generate a new nsproxy as likely.

Remove the code to drop an unnecessarily acquired nsproxy value.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/nsproxy.c |   35 +++++++++++------------------------
 1 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index d9afd2563..a1ed011 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -125,22 +125,16 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 	struct nsproxy *old_ns = tsk->nsproxy;
 	struct user_namespace *user_ns = task_cred_xxx(tsk, user_ns);
 	struct nsproxy *new_ns;
-	int err = 0;
 
-	if (!old_ns)
+	if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
+			      CLONE_NEWPID | CLONE_NEWNET)))) {
+		get_nsproxy(old_ns);
 		return 0;
-
-	get_nsproxy(old_ns);
-
-	if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-				CLONE_NEWPID | CLONE_NEWNET)))
-		return 0;
-
-	if (!ns_capable(user_ns, CAP_SYS_ADMIN)) {
-		err = -EPERM;
-		goto out;
 	}
 
+	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
+		return -EPERM;
+
 	/*
 	 * CLONE_NEWIPC must detach from the undolist: after switching
 	 * to a new ipc namespace, the semaphore arrays from the old
@@ -149,22 +143,15 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 	 * it along with CLONE_NEWIPC.
 	 */
 	if ((flags & (CLONE_NEWIPC | CLONE_SYSVSEM)) ==
-		(CLONE_NEWIPC | CLONE_SYSVSEM)) {
-		err = -EINVAL;
-		goto out;
-	}
+		(CLONE_NEWIPC | CLONE_SYSVSEM)) 
+		return -EINVAL;
 
 	new_ns = create_new_namespaces(flags, tsk, user_ns, tsk->fs);
-	if (IS_ERR(new_ns)) {
-		err = PTR_ERR(new_ns);
-		goto out;
-	}
+	if (IS_ERR(new_ns))
+		return  PTR_ERR(new_ns);
 
 	tsk->nsproxy = new_ns;
-
-out:
-	put_nsproxy(old_ns);
-	return err;
+	return 0;
 }
 
 void free_nsproxy(struct nsproxy *ns)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [REVIEW][PATCH 2/5] userns: Allow PR_CAPBSET_DROP in a user namespace.
  2013-08-29 23:52 [REVIEW][PATCH 0/5] A couple of lingering namespace patches Eric W. Biederman
  2013-08-29 23:53 ` [REVIEW][PATCH 1/5] namespaces: Simplify copy_namespaces so it is clear what is going on Eric W. Biederman
@ 2013-08-29 23:54 ` Eric W. Biederman
  2013-08-30  1:15   ` Serge E. Hallyn
  2013-08-29 23:55 ` [REVIEW][PATCH 3/5] pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD Eric W. Biederman
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2013-08-29 23:54 UTC (permalink / raw)
  To: Linux Containers; +Cc: Serge E. Hallyn, linux-kernel, Richard Weinberger


As the capabilites and capability bounding set are per user namespace
properties it is safe to allow changing them with just CAP_SETPCAP
permission in the user namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tested-by: Richard Weinberger <richard@nod.at>
---
 security/commoncap.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index c44b6fe..9fccf71 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -824,7 +824,7 @@ int cap_task_setnice(struct task_struct *p, int nice)
  */
 static long cap_prctl_drop(struct cred *new, unsigned long cap)
 {
-	if (!capable(CAP_SETPCAP))
+	if (!ns_capable(current_user_ns(), CAP_SETPCAP))
 		return -EPERM;
 	if (!cap_valid(cap))
 		return -EINVAL;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [REVIEW][PATCH 3/5] pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
  2013-08-29 23:52 [REVIEW][PATCH 0/5] A couple of lingering namespace patches Eric W. Biederman
  2013-08-29 23:53 ` [REVIEW][PATCH 1/5] namespaces: Simplify copy_namespaces so it is clear what is going on Eric W. Biederman
  2013-08-29 23:54 ` [REVIEW][PATCH 2/5] userns: Allow PR_CAPBSET_DROP in a user namespace Eric W. Biederman
@ 2013-08-29 23:55 ` Eric W. Biederman
  2013-08-30 16:38   ` Serge E. Hallyn
  2013-09-08 17:00   ` Oleg Nesterov
  2013-08-29 23:55 ` [REVIEW][PATCH 4/5] capabilities: allow nice if we are privileged Eric W. Biederman
  2013-08-29 23:56 ` [REVIEW][PATCH 5/5] userns: Kill nsown_capable it makes the wrong thing easy Eric W. Biederman
  4 siblings, 2 replies; 13+ messages in thread
From: Eric W. Biederman @ 2013-08-29 23:55 UTC (permalink / raw)
  To: Linux Containers; +Cc: Serge E. Hallyn, linux-kernel, Oleg Nesterov


I goofed when I made unshare(CLONE_NEWPID) only work in a
single-threaded process.  There is no need for that requirement and in
fact I analyzied things right for setns.  The hard requirement
is for tasks that share a VM to all be in the pid namespace and
we properly prevent that in do_fork.

Just to be certain I took a look through do_wait and
forget_original_parent and there are no cases that make it any harder
for children to be in the multiple pid namespaces than it is for
children to be in the same pid namespace.  I also performed a check to
see if there were in uses of task->nsproxy_pid_ns I was not familiar
with, but it is only used when allocating a new pid for a new task,
and in checks to prevent craziness from happening.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/fork.c |    5 -----
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 66635c8..eb45f1d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1818,11 +1818,6 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
 	if (unshare_flags & CLONE_NEWUSER)
 		unshare_flags |= CLONE_THREAD | CLONE_FS;
 	/*
-	 * If unsharing a pid namespace must also unshare the thread.
-	 */
-	if (unshare_flags & CLONE_NEWPID)
-		unshare_flags |= CLONE_THREAD;
-	/*
 	 * If unsharing a thread from a thread group, must also unshare vm.
 	 */
 	if (unshare_flags & CLONE_THREAD)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [REVIEW][PATCH 4/5] capabilities: allow nice if we are privileged
  2013-08-29 23:52 [REVIEW][PATCH 0/5] A couple of lingering namespace patches Eric W. Biederman
                   ` (2 preceding siblings ...)
  2013-08-29 23:55 ` [REVIEW][PATCH 3/5] pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD Eric W. Biederman
@ 2013-08-29 23:55 ` Eric W. Biederman
  2013-08-29 23:56 ` [REVIEW][PATCH 5/5] userns: Kill nsown_capable it makes the wrong thing easy Eric W. Biederman
  4 siblings, 0 replies; 13+ messages in thread
From: Eric W. Biederman @ 2013-08-29 23:55 UTC (permalink / raw)
  To: Linux Containers; +Cc: Serge E. Hallyn, linux-kernel


We allow task A to change B's nice level if it has a supserset of
B's privileges, or of it has CAP_SYS_NICE.  Also allow it if A has
CAP_SYS_NICE with respect to B - meaning it is root in the same
namespace, or it created B's namespace.

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 security/commoncap.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index 9fccf71..b9d613e 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -768,16 +768,16 @@ int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags)
  */
 static int cap_safe_nice(struct task_struct *p)
 {
-	int is_subset;
+	int is_subset, ret = 0;
 
 	rcu_read_lock();
 	is_subset = cap_issubset(__task_cred(p)->cap_permitted,
 				 current_cred()->cap_permitted);
+	if (!is_subset && !ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE))
+		ret = -EPERM;
 	rcu_read_unlock();
 
-	if (!is_subset && !capable(CAP_SYS_NICE))
-		return -EPERM;
-	return 0;
+	return ret;
 }
 
 /**
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [REVIEW][PATCH 5/5] userns:  Kill nsown_capable it makes the wrong thing easy
  2013-08-29 23:52 [REVIEW][PATCH 0/5] A couple of lingering namespace patches Eric W. Biederman
                   ` (3 preceding siblings ...)
  2013-08-29 23:55 ` [REVIEW][PATCH 4/5] capabilities: allow nice if we are privileged Eric W. Biederman
@ 2013-08-29 23:56 ` Eric W. Biederman
  2013-08-30  1:14   ` Serge E. Hallyn
  4 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2013-08-29 23:56 UTC (permalink / raw)
  To: Linux Containers; +Cc: Serge E. Hallyn, linux-kernel


nsown_capable is a special case of ns_capable essentially for just CAP_SETUID and
CAP_SETGID.  For the existing users it doesn't noticably simplify things and
from the suggested patches I have seen it encourages people to do the wrong
thing.  So remove nsown_capable.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/namespace.c             |    4 ++--
 fs/open.c                  |    2 +-
 include/linux/capability.h |    1 -
 ipc/namespace.c            |    2 +-
 kernel/capability.c        |   12 ------------
 kernel/groups.c            |    2 +-
 kernel/pid_namespace.c     |    2 +-
 kernel/sys.c               |   20 ++++++++++----------
 kernel/uid16.c             |    2 +-
 kernel/utsname.c           |    2 +-
 net/core/net_namespace.c   |    2 +-
 net/core/scm.c             |    4 ++--
 12 files changed, 21 insertions(+), 34 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 877e427..dc519a1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2929,8 +2929,8 @@ static int mntns_install(struct nsproxy *nsproxy, void *ns)
 	struct path root;
 
 	if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
-	    !nsown_capable(CAP_SYS_CHROOT) ||
-	    !nsown_capable(CAP_SYS_ADMIN))
+	    !ns_capable(current_user_ns(), CAP_SYS_CHROOT) ||
+	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
 	if (fs->users != 1)
diff --git a/fs/open.c b/fs/open.c
index 9156cb0..2a57580 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -443,7 +443,7 @@ retry:
 		goto dput_and_out;
 
 	error = -EPERM;
-	if (!nsown_capable(CAP_SYS_CHROOT))
+	if (!ns_capable(current_user_ns(), CAP_SYS_CHROOT))
 		goto dput_and_out;
 	error = security_path_chroot(&path);
 	if (error)
diff --git a/include/linux/capability.h b/include/linux/capability.h
index d9a4f7f..a6ee1f9 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -210,7 +210,6 @@ extern bool has_ns_capability_noaudit(struct task_struct *t,
 				      struct user_namespace *ns, int cap);
 extern bool capable(int cap);
 extern bool ns_capable(struct user_namespace *ns, int cap);
-extern bool nsown_capable(int cap);
 extern bool inode_capable(const struct inode *inode, int cap);
 extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
 
diff --git a/ipc/namespace.c b/ipc/namespace.c
index 7ee61bf..4be6581 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -171,7 +171,7 @@ static int ipcns_install(struct nsproxy *nsproxy, void *new)
 {
 	struct ipc_namespace *ns = new;
 	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
-	    !nsown_capable(CAP_SYS_ADMIN))
+	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
 	/* Ditch state from the old ipc namespace */
diff --git a/kernel/capability.c b/kernel/capability.c
index f6c2ce5..6fc1c8a 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -433,18 +433,6 @@ bool capable(int cap)
 EXPORT_SYMBOL(capable);
 
 /**
- * nsown_capable - Check superior capability to one's own user_ns
- * @cap: The capability in question
- *
- * Return true if the current task has the given superior capability
- * targeted at its own user namespace.
- */
-bool nsown_capable(int cap)
-{
-	return ns_capable(current_user_ns(), cap);
-}
-
-/**
  * inode_capable - Check superior capability over inode
  * @inode: The inode in question
  * @cap: The capability in question
diff --git a/kernel/groups.c b/kernel/groups.c
index 6b2588d..90cf1c3 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -233,7 +233,7 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
 	struct group_info *group_info;
 	int retval;
 
-	if (!nsown_capable(CAP_SETGID))
+	if (!ns_capable(current_user_ns(), CAP_SETGID))
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
 		return -EINVAL;
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 6917e8e..ee1f6bb 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -329,7 +329,7 @@ static int pidns_install(struct nsproxy *nsproxy, void *ns)
 	struct pid_namespace *ancestor, *new = ns;
 
 	if (!ns_capable(new->user_ns, CAP_SYS_ADMIN) ||
-	    !nsown_capable(CAP_SYS_ADMIN))
+	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
 	/*
diff --git a/kernel/sys.c b/kernel/sys.c
index 771129b..c18ecca 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -337,7 +337,7 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
 	if (rgid != (gid_t) -1) {
 		if (gid_eq(old->gid, krgid) ||
 		    gid_eq(old->egid, krgid) ||
-		    nsown_capable(CAP_SETGID))
+		    ns_capable(old->user_ns, CAP_SETGID))
 			new->gid = krgid;
 		else
 			goto error;
@@ -346,7 +346,7 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
 		if (gid_eq(old->gid, kegid) ||
 		    gid_eq(old->egid, kegid) ||
 		    gid_eq(old->sgid, kegid) ||
-		    nsown_capable(CAP_SETGID))
+		    ns_capable(old->user_ns, CAP_SETGID))
 			new->egid = kegid;
 		else
 			goto error;
@@ -387,7 +387,7 @@ SYSCALL_DEFINE1(setgid, gid_t, gid)
 	old = current_cred();
 
 	retval = -EPERM;
-	if (nsown_capable(CAP_SETGID))
+	if (ns_capable(old->user_ns, CAP_SETGID))
 		new->gid = new->egid = new->sgid = new->fsgid = kgid;
 	else if (gid_eq(kgid, old->gid) || gid_eq(kgid, old->sgid))
 		new->egid = new->fsgid = kgid;
@@ -471,7 +471,7 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
 		new->uid = kruid;
 		if (!uid_eq(old->uid, kruid) &&
 		    !uid_eq(old->euid, kruid) &&
-		    !nsown_capable(CAP_SETUID))
+		    !ns_capable(old->user_ns, CAP_SETUID))
 			goto error;
 	}
 
@@ -480,7 +480,7 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
 		if (!uid_eq(old->uid, keuid) &&
 		    !uid_eq(old->euid, keuid) &&
 		    !uid_eq(old->suid, keuid) &&
-		    !nsown_capable(CAP_SETUID))
+		    !ns_capable(old->user_ns, CAP_SETUID))
 			goto error;
 	}
 
@@ -534,7 +534,7 @@ SYSCALL_DEFINE1(setuid, uid_t, uid)
 	old = current_cred();
 
 	retval = -EPERM;
-	if (nsown_capable(CAP_SETUID)) {
+	if (ns_capable(old->user_ns, CAP_SETUID)) {
 		new->suid = new->uid = kuid;
 		if (!uid_eq(kuid, old->uid)) {
 			retval = set_user(new);
@@ -591,7 +591,7 @@ SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
 	old = current_cred();
 
 	retval = -EPERM;
-	if (!nsown_capable(CAP_SETUID)) {
+	if (!ns_capable(old->user_ns, CAP_SETUID)) {
 		if (ruid != (uid_t) -1        && !uid_eq(kruid, old->uid) &&
 		    !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
 			goto error;
@@ -673,7 +673,7 @@ SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
 	old = current_cred();
 
 	retval = -EPERM;
-	if (!nsown_capable(CAP_SETGID)) {
+	if (!ns_capable(old->user_ns, CAP_SETGID)) {
 		if (rgid != (gid_t) -1        && !gid_eq(krgid, old->gid) &&
 		    !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid))
 			goto error;
@@ -744,7 +744,7 @@ SYSCALL_DEFINE1(setfsuid, uid_t, uid)
 
 	if (uid_eq(kuid, old->uid)  || uid_eq(kuid, old->euid)  ||
 	    uid_eq(kuid, old->suid) || uid_eq(kuid, old->fsuid) ||
-	    nsown_capable(CAP_SETUID)) {
+	    ns_capable(old->user_ns, CAP_SETUID)) {
 		if (!uid_eq(kuid, old->fsuid)) {
 			new->fsuid = kuid;
 			if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0)
@@ -783,7 +783,7 @@ SYSCALL_DEFINE1(setfsgid, gid_t, gid)
 
 	if (gid_eq(kgid, old->gid)  || gid_eq(kgid, old->egid)  ||
 	    gid_eq(kgid, old->sgid) || gid_eq(kgid, old->fsgid) ||
-	    nsown_capable(CAP_SETGID)) {
+	    ns_capable(old->user_ns, CAP_SETGID)) {
 		if (!gid_eq(kgid, old->fsgid)) {
 			new->fsgid = kgid;
 			goto change_okay;
diff --git a/kernel/uid16.c b/kernel/uid16.c
index f6c83d7..602e5bb 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -176,7 +176,7 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
 	struct group_info *group_info;
 	int retval;
 
-	if (!nsown_capable(CAP_SETGID))
+	if (!ns_capable(current_user_ns(), CAP_SETGID))
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
 		return -EINVAL;
diff --git a/kernel/utsname.c b/kernel/utsname.c
index 2fc8576..fd39312 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -114,7 +114,7 @@ static int utsns_install(struct nsproxy *nsproxy, void *new)
 	struct uts_namespace *ns = new;
 
 	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
-	    !nsown_capable(CAP_SYS_ADMIN))
+	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
 	get_uts_ns(ns);
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f9765203..81d3a9a 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -651,7 +651,7 @@ static int netns_install(struct nsproxy *nsproxy, void *ns)
 	struct net *net = ns;
 
 	if (!ns_capable(net->user_ns, CAP_SYS_ADMIN) ||
-	    !nsown_capable(CAP_SYS_ADMIN))
+	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
 	put_net(nsproxy->net_ns);
diff --git a/net/core/scm.c b/net/core/scm.c
index 03795d0..c346f58 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -56,9 +56,9 @@ static __inline__ int scm_check_creds(struct ucred *creds)
 	if ((creds->pid == task_tgid_vnr(current) ||
 	     ns_capable(current->nsproxy->pid_ns->user_ns, CAP_SYS_ADMIN)) &&
 	    ((uid_eq(uid, cred->uid)   || uid_eq(uid, cred->euid) ||
-	      uid_eq(uid, cred->suid)) || nsown_capable(CAP_SETUID)) &&
+	      uid_eq(uid, cred->suid)) || ns_capable(cred->user_ns, CAP_SETUID)) &&
 	    ((gid_eq(gid, cred->gid)   || gid_eq(gid, cred->egid) ||
-	      gid_eq(gid, cred->sgid)) || nsown_capable(CAP_SETGID))) {
+	      gid_eq(gid, cred->sgid)) || ns_capable(cred->user_ns, CAP_SETGID))) {
 	       return 0;
 	}
 	return -EPERM;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [REVIEW][PATCH 5/5] userns:  Kill nsown_capable it makes the wrong thing easy
  2013-08-29 23:56 ` [REVIEW][PATCH 5/5] userns: Kill nsown_capable it makes the wrong thing easy Eric W. Biederman
@ 2013-08-30  1:14   ` Serge E. Hallyn
  0 siblings, 0 replies; 13+ messages in thread
From: Serge E. Hallyn @ 2013-08-30  1:14 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux Containers, Serge E. Hallyn, linux-kernel

Quoting Eric W. Biederman (ebiederm@xmission.com):
> 
> nsown_capable is a special case of ns_capable essentially for just CAP_SETUID and
> CAP_SETGID.  For the existing users it doesn't noticably simplify things and
> from the suggested patches I have seen it encourages people to do the wrong
> thing.  So remove nsown_capable.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Yeah I've had the same thought before.  You rarely want nsown_capable, and
it wants to be mis-used.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  fs/namespace.c             |    4 ++--
>  fs/open.c                  |    2 +-
>  include/linux/capability.h |    1 -
>  ipc/namespace.c            |    2 +-
>  kernel/capability.c        |   12 ------------
>  kernel/groups.c            |    2 +-
>  kernel/pid_namespace.c     |    2 +-
>  kernel/sys.c               |   20 ++++++++++----------
>  kernel/uid16.c             |    2 +-
>  kernel/utsname.c           |    2 +-
>  net/core/net_namespace.c   |    2 +-
>  net/core/scm.c             |    4 ++--
>  12 files changed, 21 insertions(+), 34 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 877e427..dc519a1 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2929,8 +2929,8 @@ static int mntns_install(struct nsproxy *nsproxy, void *ns)
>  	struct path root;
>  
>  	if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
> -	    !nsown_capable(CAP_SYS_CHROOT) ||
> -	    !nsown_capable(CAP_SYS_ADMIN))
> +	    !ns_capable(current_user_ns(), CAP_SYS_CHROOT) ||
> +	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
>  		return -EPERM;
>  
>  	if (fs->users != 1)
> diff --git a/fs/open.c b/fs/open.c
> index 9156cb0..2a57580 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -443,7 +443,7 @@ retry:
>  		goto dput_and_out;
>  
>  	error = -EPERM;
> -	if (!nsown_capable(CAP_SYS_CHROOT))
> +	if (!ns_capable(current_user_ns(), CAP_SYS_CHROOT))
>  		goto dput_and_out;
>  	error = security_path_chroot(&path);
>  	if (error)
> diff --git a/include/linux/capability.h b/include/linux/capability.h
> index d9a4f7f..a6ee1f9 100644
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -210,7 +210,6 @@ extern bool has_ns_capability_noaudit(struct task_struct *t,
>  				      struct user_namespace *ns, int cap);
>  extern bool capable(int cap);
>  extern bool ns_capable(struct user_namespace *ns, int cap);
> -extern bool nsown_capable(int cap);
>  extern bool inode_capable(const struct inode *inode, int cap);
>  extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>  
> diff --git a/ipc/namespace.c b/ipc/namespace.c
> index 7ee61bf..4be6581 100644
> --- a/ipc/namespace.c
> +++ b/ipc/namespace.c
> @@ -171,7 +171,7 @@ static int ipcns_install(struct nsproxy *nsproxy, void *new)
>  {
>  	struct ipc_namespace *ns = new;
>  	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
> -	    !nsown_capable(CAP_SYS_ADMIN))
> +	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
>  		return -EPERM;
>  
>  	/* Ditch state from the old ipc namespace */
> diff --git a/kernel/capability.c b/kernel/capability.c
> index f6c2ce5..6fc1c8a 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -433,18 +433,6 @@ bool capable(int cap)
>  EXPORT_SYMBOL(capable);
>  
>  /**
> - * nsown_capable - Check superior capability to one's own user_ns
> - * @cap: The capability in question
> - *
> - * Return true if the current task has the given superior capability
> - * targeted at its own user namespace.
> - */
> -bool nsown_capable(int cap)
> -{
> -	return ns_capable(current_user_ns(), cap);
> -}
> -
> -/**
>   * inode_capable - Check superior capability over inode
>   * @inode: The inode in question
>   * @cap: The capability in question
> diff --git a/kernel/groups.c b/kernel/groups.c
> index 6b2588d..90cf1c3 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -233,7 +233,7 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
>  	struct group_info *group_info;
>  	int retval;
>  
> -	if (!nsown_capable(CAP_SETGID))
> +	if (!ns_capable(current_user_ns(), CAP_SETGID))
>  		return -EPERM;
>  	if ((unsigned)gidsetsize > NGROUPS_MAX)
>  		return -EINVAL;
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 6917e8e..ee1f6bb 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -329,7 +329,7 @@ static int pidns_install(struct nsproxy *nsproxy, void *ns)
>  	struct pid_namespace *ancestor, *new = ns;
>  
>  	if (!ns_capable(new->user_ns, CAP_SYS_ADMIN) ||
> -	    !nsown_capable(CAP_SYS_ADMIN))
> +	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
>  		return -EPERM;
>  
>  	/*
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 771129b..c18ecca 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -337,7 +337,7 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
>  	if (rgid != (gid_t) -1) {
>  		if (gid_eq(old->gid, krgid) ||
>  		    gid_eq(old->egid, krgid) ||
> -		    nsown_capable(CAP_SETGID))
> +		    ns_capable(old->user_ns, CAP_SETGID))
>  			new->gid = krgid;
>  		else
>  			goto error;
> @@ -346,7 +346,7 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
>  		if (gid_eq(old->gid, kegid) ||
>  		    gid_eq(old->egid, kegid) ||
>  		    gid_eq(old->sgid, kegid) ||
> -		    nsown_capable(CAP_SETGID))
> +		    ns_capable(old->user_ns, CAP_SETGID))
>  			new->egid = kegid;
>  		else
>  			goto error;
> @@ -387,7 +387,7 @@ SYSCALL_DEFINE1(setgid, gid_t, gid)
>  	old = current_cred();
>  
>  	retval = -EPERM;
> -	if (nsown_capable(CAP_SETGID))
> +	if (ns_capable(old->user_ns, CAP_SETGID))
>  		new->gid = new->egid = new->sgid = new->fsgid = kgid;
>  	else if (gid_eq(kgid, old->gid) || gid_eq(kgid, old->sgid))
>  		new->egid = new->fsgid = kgid;
> @@ -471,7 +471,7 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
>  		new->uid = kruid;
>  		if (!uid_eq(old->uid, kruid) &&
>  		    !uid_eq(old->euid, kruid) &&
> -		    !nsown_capable(CAP_SETUID))
> +		    !ns_capable(old->user_ns, CAP_SETUID))
>  			goto error;
>  	}
>  
> @@ -480,7 +480,7 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
>  		if (!uid_eq(old->uid, keuid) &&
>  		    !uid_eq(old->euid, keuid) &&
>  		    !uid_eq(old->suid, keuid) &&
> -		    !nsown_capable(CAP_SETUID))
> +		    !ns_capable(old->user_ns, CAP_SETUID))
>  			goto error;
>  	}
>  
> @@ -534,7 +534,7 @@ SYSCALL_DEFINE1(setuid, uid_t, uid)
>  	old = current_cred();
>  
>  	retval = -EPERM;
> -	if (nsown_capable(CAP_SETUID)) {
> +	if (ns_capable(old->user_ns, CAP_SETUID)) {
>  		new->suid = new->uid = kuid;
>  		if (!uid_eq(kuid, old->uid)) {
>  			retval = set_user(new);
> @@ -591,7 +591,7 @@ SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
>  	old = current_cred();
>  
>  	retval = -EPERM;
> -	if (!nsown_capable(CAP_SETUID)) {
> +	if (!ns_capable(old->user_ns, CAP_SETUID)) {
>  		if (ruid != (uid_t) -1        && !uid_eq(kruid, old->uid) &&
>  		    !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
>  			goto error;
> @@ -673,7 +673,7 @@ SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
>  	old = current_cred();
>  
>  	retval = -EPERM;
> -	if (!nsown_capable(CAP_SETGID)) {
> +	if (!ns_capable(old->user_ns, CAP_SETGID)) {
>  		if (rgid != (gid_t) -1        && !gid_eq(krgid, old->gid) &&
>  		    !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid))
>  			goto error;
> @@ -744,7 +744,7 @@ SYSCALL_DEFINE1(setfsuid, uid_t, uid)
>  
>  	if (uid_eq(kuid, old->uid)  || uid_eq(kuid, old->euid)  ||
>  	    uid_eq(kuid, old->suid) || uid_eq(kuid, old->fsuid) ||
> -	    nsown_capable(CAP_SETUID)) {
> +	    ns_capable(old->user_ns, CAP_SETUID)) {
>  		if (!uid_eq(kuid, old->fsuid)) {
>  			new->fsuid = kuid;
>  			if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0)
> @@ -783,7 +783,7 @@ SYSCALL_DEFINE1(setfsgid, gid_t, gid)
>  
>  	if (gid_eq(kgid, old->gid)  || gid_eq(kgid, old->egid)  ||
>  	    gid_eq(kgid, old->sgid) || gid_eq(kgid, old->fsgid) ||
> -	    nsown_capable(CAP_SETGID)) {
> +	    ns_capable(old->user_ns, CAP_SETGID)) {
>  		if (!gid_eq(kgid, old->fsgid)) {
>  			new->fsgid = kgid;
>  			goto change_okay;
> diff --git a/kernel/uid16.c b/kernel/uid16.c
> index f6c83d7..602e5bb 100644
> --- a/kernel/uid16.c
> +++ b/kernel/uid16.c
> @@ -176,7 +176,7 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
>  	struct group_info *group_info;
>  	int retval;
>  
> -	if (!nsown_capable(CAP_SETGID))
> +	if (!ns_capable(current_user_ns(), CAP_SETGID))
>  		return -EPERM;
>  	if ((unsigned)gidsetsize > NGROUPS_MAX)
>  		return -EINVAL;
> diff --git a/kernel/utsname.c b/kernel/utsname.c
> index 2fc8576..fd39312 100644
> --- a/kernel/utsname.c
> +++ b/kernel/utsname.c
> @@ -114,7 +114,7 @@ static int utsns_install(struct nsproxy *nsproxy, void *new)
>  	struct uts_namespace *ns = new;
>  
>  	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
> -	    !nsown_capable(CAP_SYS_ADMIN))
> +	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
>  		return -EPERM;
>  
>  	get_uts_ns(ns);
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index f9765203..81d3a9a 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -651,7 +651,7 @@ static int netns_install(struct nsproxy *nsproxy, void *ns)
>  	struct net *net = ns;
>  
>  	if (!ns_capable(net->user_ns, CAP_SYS_ADMIN) ||
> -	    !nsown_capable(CAP_SYS_ADMIN))
> +	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
>  		return -EPERM;
>  
>  	put_net(nsproxy->net_ns);
> diff --git a/net/core/scm.c b/net/core/scm.c
> index 03795d0..c346f58 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -56,9 +56,9 @@ static __inline__ int scm_check_creds(struct ucred *creds)
>  	if ((creds->pid == task_tgid_vnr(current) ||
>  	     ns_capable(current->nsproxy->pid_ns->user_ns, CAP_SYS_ADMIN)) &&
>  	    ((uid_eq(uid, cred->uid)   || uid_eq(uid, cred->euid) ||
> -	      uid_eq(uid, cred->suid)) || nsown_capable(CAP_SETUID)) &&
> +	      uid_eq(uid, cred->suid)) || ns_capable(cred->user_ns, CAP_SETUID)) &&
>  	    ((gid_eq(gid, cred->gid)   || gid_eq(gid, cred->egid) ||
> -	      gid_eq(gid, cred->sgid)) || nsown_capable(CAP_SETGID))) {
> +	      gid_eq(gid, cred->sgid)) || ns_capable(cred->user_ns, CAP_SETGID))) {
>  	       return 0;
>  	}
>  	return -EPERM;
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [REVIEW][PATCH 2/5] userns: Allow PR_CAPBSET_DROP in a user namespace.
  2013-08-29 23:54 ` [REVIEW][PATCH 2/5] userns: Allow PR_CAPBSET_DROP in a user namespace Eric W. Biederman
@ 2013-08-30  1:15   ` Serge E. Hallyn
  0 siblings, 0 replies; 13+ messages in thread
From: Serge E. Hallyn @ 2013-08-30  1:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Serge E. Hallyn, linux-kernel, Richard Weinberger

Quoting Eric W. Biederman (ebiederm@xmission.com):
> 
> As the capabilites and capability bounding set are per user namespace
> properties it is safe to allow changing them with just CAP_SETPCAP
> permission in the user namespace.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> Tested-by: Richard Weinberger <richard@nod.at>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  security/commoncap.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/security/commoncap.c b/security/commoncap.c
> index c44b6fe..9fccf71 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -824,7 +824,7 @@ int cap_task_setnice(struct task_struct *p, int nice)
>   */
>  static long cap_prctl_drop(struct cred *new, unsigned long cap)
>  {
> -	if (!capable(CAP_SETPCAP))
> +	if (!ns_capable(current_user_ns(), CAP_SETPCAP))
>  		return -EPERM;
>  	if (!cap_valid(cap))
>  		return -EINVAL;
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [REVIEW][PATCH 1/5] namespaces: Simplify copy_namespaces so it is clear what is going on.
  2013-08-29 23:53 ` [REVIEW][PATCH 1/5] namespaces: Simplify copy_namespaces so it is clear what is going on Eric W. Biederman
@ 2013-08-30 16:10   ` Serge E. Hallyn
  0 siblings, 0 replies; 13+ messages in thread
From: Serge E. Hallyn @ 2013-08-30 16:10 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux Containers, Serge E. Hallyn, linux-kernel

Quoting Eric W. Biederman (ebiederm@xmission.com):
> 
> Remove the test for the impossible case where tsk->nsproxy == NULL.  Fork
> will never be called with tsk->nsproxy == NULL.
> 
> Only call get_nsproxy when we don't need to generate a new_nsproxy,
> and mark the case where we don't generate a new nsproxy as likely.
> 
> Remove the code to drop an unnecessarily acquired nsproxy value.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  kernel/nsproxy.c |   35 +++++++++++------------------------
>  1 files changed, 11 insertions(+), 24 deletions(-)
> 
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index d9afd2563..a1ed011 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -125,22 +125,16 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
>  	struct nsproxy *old_ns = tsk->nsproxy;
>  	struct user_namespace *user_ns = task_cred_xxx(tsk, user_ns);
>  	struct nsproxy *new_ns;
> -	int err = 0;
>  
> -	if (!old_ns)
> +	if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
> +			      CLONE_NEWPID | CLONE_NEWNET)))) {
> +		get_nsproxy(old_ns);
>  		return 0;
> -
> -	get_nsproxy(old_ns);
> -
> -	if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
> -				CLONE_NEWPID | CLONE_NEWNET)))
> -		return 0;
> -
> -	if (!ns_capable(user_ns, CAP_SYS_ADMIN)) {
> -		err = -EPERM;
> -		goto out;
>  	}
>  
> +	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
> +		return -EPERM;
> +
>  	/*
>  	 * CLONE_NEWIPC must detach from the undolist: after switching
>  	 * to a new ipc namespace, the semaphore arrays from the old
> @@ -149,22 +143,15 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
>  	 * it along with CLONE_NEWIPC.
>  	 */
>  	if ((flags & (CLONE_NEWIPC | CLONE_SYSVSEM)) ==
> -		(CLONE_NEWIPC | CLONE_SYSVSEM)) {
> -		err = -EINVAL;
> -		goto out;
> -	}
> +		(CLONE_NEWIPC | CLONE_SYSVSEM)) 
> +		return -EINVAL;
>  
>  	new_ns = create_new_namespaces(flags, tsk, user_ns, tsk->fs);
> -	if (IS_ERR(new_ns)) {
> -		err = PTR_ERR(new_ns);
> -		goto out;
> -	}
> +	if (IS_ERR(new_ns))
> +		return  PTR_ERR(new_ns);
>  
>  	tsk->nsproxy = new_ns;
> -
> -out:
> -	put_nsproxy(old_ns);
> -	return err;
> +	return 0;
>  }
>  
>  void free_nsproxy(struct nsproxy *ns)
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [REVIEW][PATCH 3/5] pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
  2013-08-29 23:55 ` [REVIEW][PATCH 3/5] pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD Eric W. Biederman
@ 2013-08-30 16:38   ` Serge E. Hallyn
  2013-08-30 23:49     ` Eric W. Biederman
  2013-09-08 17:00   ` Oleg Nesterov
  1 sibling, 1 reply; 13+ messages in thread
From: Serge E. Hallyn @ 2013-08-30 16:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Serge E. Hallyn, linux-kernel, Oleg Nesterov

Quoting Eric W. Biederman (ebiederm@xmission.com):
> 
> I goofed when I made unshare(CLONE_NEWPID) only work in a
> single-threaded process.  There is no need for that requirement and in
> fact I analyzied things right for setns.  The hard requirement
> is for tasks that share a VM to all be in the pid namespace and
> we properly prevent that in do_fork.

I don't understand though - copy_process does have the right test:

   1176          * If the new process will be in a different pid namespace
   1177          * don't allow the creation of threads.
   1178          */
   1179         if ((clone_flags & (CLONE_VM|CLONE_NEWPID)) &&
   1180             (task_active_pid_ns(current) != current->nsproxy->pid_ns))
   1181                 return ERR_PTR(-EINVAL);

but why is it ok for sys_unshare not to do that?  Note that
in order for check_unshare_flags() to bail on &current->mm->mm_users > 1
you do have to set CLONE_VM (for inverse interpretation).

So it seems to me this isn't safe as is, and we need to at least
set CLONE_VM if CLONE_PID is set.

> Just to be certain I took a look through do_wait and
> forget_original_parent and there are no cases that make it any harder
> for children to be in the multiple pid namespaces than it is for
> children to be in the same pid namespace.  I also performed a check to
> see if there were in uses of task->nsproxy_pid_ns I was not familiar
> with, but it is only used when allocating a new pid for a new task,
> and in checks to prevent craziness from happening.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/fork.c |    5 -----
>  1 files changed, 0 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 66635c8..eb45f1d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1818,11 +1818,6 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
>  	if (unshare_flags & CLONE_NEWUSER)
>  		unshare_flags |= CLONE_THREAD | CLONE_FS;
>  	/*
> -	 * If unsharing a pid namespace must also unshare the thread.
> -	 */
> -	if (unshare_flags & CLONE_NEWPID)
> -		unshare_flags |= CLONE_THREAD;
> -	/*
>  	 * If unsharing a thread from a thread group, must also unshare vm.
>  	 */
>  	if (unshare_flags & CLONE_THREAD)
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [REVIEW][PATCH 3/5] pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
  2013-08-30 16:38   ` Serge E. Hallyn
@ 2013-08-30 23:49     ` Eric W. Biederman
  2013-08-31  5:31       ` Serge E. Hallyn
  0 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2013-08-30 23:49 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Linux Containers, linux-kernel, Oleg Nesterov

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> 
>> I goofed when I made unshare(CLONE_NEWPID) only work in a
>> single-threaded process.  There is no need for that requirement and in
>> fact I analyzied things right for setns.  The hard requirement
>> is for tasks that share a VM to all be in the pid namespace and
>> we properly prevent that in do_fork.
>
> I don't understand though - copy_process does have the right test:
>
>    1176          * If the new process will be in a different pid namespace
>    1177          * don't allow the creation of threads.
>    1178          */
>    1179         if ((clone_flags & (CLONE_VM|CLONE_NEWPID)) &&
>    1180             (task_active_pid_ns(current) != current->nsproxy->pid_ns))
>    1181                 return ERR_PTR(-EINVAL);
>
> but why is it ok for sys_unshare not to do that?  Note that
> in order for check_unshare_flags() to bail on &current->mm->mm_users > 1
> you do have to set CLONE_VM (for inverse interpretation).
>
> So it seems to me this isn't safe as is, and we need to at least
> set CLONE_VM if CLONE_PID is set.

Partly this is the difference in the meaning of the flags between
unshare and clone.

Basically in unshare all othat gets changed is
current->nsproxy->pid_ns_for_children (the rename is in the net tree).

So because unshare of the pid namespace does not actually effect the
current processes, just the pid namespace the children of the current
thread will be in this is safe.

And frankly having the checks be obviously different is a good thing
because it means that people will ask why in the world this is so and
realize the difference in meaning.

Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [REVIEW][PATCH 3/5] pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
  2013-08-30 23:49     ` Eric W. Biederman
@ 2013-08-31  5:31       ` Serge E. Hallyn
  0 siblings, 0 replies; 13+ messages in thread
From: Serge E. Hallyn @ 2013-08-31  5:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Linux Containers, linux-kernel, Oleg Nesterov

Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serge@hallyn.com> writes:
> 
> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> >> 
> >> I goofed when I made unshare(CLONE_NEWPID) only work in a
> >> single-threaded process.  There is no need for that requirement and in
> >> fact I analyzied things right for setns.  The hard requirement
> >> is for tasks that share a VM to all be in the pid namespace and
> >> we properly prevent that in do_fork.
> >
> > I don't understand though - copy_process does have the right test:
> >
> >    1176          * If the new process will be in a different pid namespace
> >    1177          * don't allow the creation of threads.
> >    1178          */
> >    1179         if ((clone_flags & (CLONE_VM|CLONE_NEWPID)) &&
> >    1180             (task_active_pid_ns(current) != current->nsproxy->pid_ns))
> >    1181                 return ERR_PTR(-EINVAL);
> >
> > but why is it ok for sys_unshare not to do that?  Note that
> > in order for check_unshare_flags() to bail on &current->mm->mm_users > 1
> > you do have to set CLONE_VM (for inverse interpretation).
> >
> > So it seems to me this isn't safe as is, and we need to at least
> > set CLONE_VM if CLONE_PID is set.
> 
> Partly this is the difference in the meaning of the flags between
> unshare and clone.
> 
> Basically in unshare all othat gets changed is
> current->nsproxy->pid_ns_for_children (the rename is in the net tree).

D'oh, right.  Thanks!

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>


> So because unshare of the pid namespace does not actually effect the
> current processes, just the pid namespace the children of the current
> thread will be in this is safe.
> 
> And frankly having the checks be obviously different is a good thing
> because it means that people will ask why in the world this is so and
> realize the difference in meaning.
> 
> Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [REVIEW][PATCH 3/5] pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
  2013-08-29 23:55 ` [REVIEW][PATCH 3/5] pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD Eric W. Biederman
  2013-08-30 16:38   ` Serge E. Hallyn
@ 2013-09-08 17:00   ` Oleg Nesterov
  1 sibling, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2013-09-08 17:00 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux Containers, Serge E. Hallyn, linux-kernel

Sorry for delay, vacation.

On 08/29, Eric W. Biederman wrote:
>
> I goofed when I made unshare(CLONE_NEWPID) only work in a
> single-threaded process.  There is no need for that requirement and in
> fact I analyzied things right for setns.  The hard requirement
> is for tasks that share a VM to all be in the pid namespace and
> we properly prevent that in do_fork.

Yes, agreed, with the current meaning of ->pid_ns unshare(NEWPID)
looks safe even if the caller is multi-threaded... and this matches
pidns_install() which doesn't require single-threaded.

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-09-08 17:06 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-29 23:52 [REVIEW][PATCH 0/5] A couple of lingering namespace patches Eric W. Biederman
2013-08-29 23:53 ` [REVIEW][PATCH 1/5] namespaces: Simplify copy_namespaces so it is clear what is going on Eric W. Biederman
2013-08-30 16:10   ` Serge E. Hallyn
2013-08-29 23:54 ` [REVIEW][PATCH 2/5] userns: Allow PR_CAPBSET_DROP in a user namespace Eric W. Biederman
2013-08-30  1:15   ` Serge E. Hallyn
2013-08-29 23:55 ` [REVIEW][PATCH 3/5] pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD Eric W. Biederman
2013-08-30 16:38   ` Serge E. Hallyn
2013-08-30 23:49     ` Eric W. Biederman
2013-08-31  5:31       ` Serge E. Hallyn
2013-09-08 17:00   ` Oleg Nesterov
2013-08-29 23:55 ` [REVIEW][PATCH 4/5] capabilities: allow nice if we are privileged Eric W. Biederman
2013-08-29 23:56 ` [REVIEW][PATCH 5/5] userns: Kill nsown_capable it makes the wrong thing easy Eric W. Biederman
2013-08-30  1:14   ` Serge E. Hallyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).