[PATCH 0/14] user namespaces v2: continue targetting capabilities

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/14] user namespaces v2: continue targetting capabilities
@ 2011-07-26 18:58 Serge Hallyn
  2011-07-26 18:58 ` [PATCH 01/14] add Documentation/namespaces/user_namespace.txt Serge Hallyn
                   ` (13 more replies)
  0 siblings, 14 replies; 30+ messages in thread
From: Serge Hallyn @ 2011-07-26 18:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: dhowells, ebiederm, containers, netdev, akpm

Hi,

here is a set of patches to continue targetting capabilities
where appropriate.  This set goes about as far as is possible
without making the VFS user namespace aware, meaning that the
VFS can provide a namespaced view of userids, i.e init_user_ns
sees file owner 500, while child user ns sees file owner 0 or
1000.  (There are a few other things, like siginfos, which can
be addressed before we address the VFS).

With this set applied, you can create and configure veth netdevs
if your user namespace owns your network namespace (and you are
privileged), but not otherwise.

Some simple testcases can be found at
https://code.launchpad.net/~serge-hallyn/+junk/usernstests with
packages at https://launchpad.net/~serge-hallyn/+archive/userns-natty

Feedback very much appreciated.

Changes since v1:
    documentation: incorporate feedback on user_namespaces.txt
    netlink_capable: use sock_net() instead of ifdefs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 01/14] add Documentation/namespaces/user_namespace.txt
  2011-07-26 18:58 [PATCH 0/14] user namespaces v2: continue targetting capabilities Serge Hallyn
@ 2011-07-26 18:58 ` Serge Hallyn
  2011-07-26 20:22   ` Randy Dunlap
  2011-07-26 20:29   ` David Howells
  2011-07-26 18:58 ` [PATCH 02/14] allow root in container to copy namespaces Serge Hallyn
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 30+ messages in thread
From: Serge Hallyn @ 2011-07-26 18:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: dhowells, ebiederm, containers, netdev, akpm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

This will hold some info about the design.  Currently it contains
future todos, issues and questions.

Changelog:
   jul 26: incorporate feed back from David Howells.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
---
 Documentation/namespaces/user_namespace.txt |  107 +++++++++++++++++++++++++++
 1 files changed, 107 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/namespaces/user_namespace.txt

diff --git a/Documentation/namespaces/user_namespace.txt b/Documentation/namespaces/user_namespace.txt
new file mode 100644
index 0000000..7e50517
--- /dev/null
+++ b/Documentation/namespaces/user_namespace.txt
@@ -0,0 +1,107 @@
+Description
+===========
+
+Traditionally, each task is owned by a user ID (UID) and belongs to one or more
+groups (GID).  Both are simple numeric IDs, though userspace usually translates
+them to names.  The user namespace allows tasks to have different views of the
+UIDs and GIDs associated with tasks and other resources.  (See 'UID mapping'
+below for more)
+
+The user namespace is a simple hierarchical one.  The system starts with all
+tasks belonging to the initial user namespace.  A task creates a new user
+namespace by passing the CLONE_NEWUSER flag to clone(2).  This requires the
+creating task to have the CAP_SETUID, CAP_SETGID, and CAP_CHOWN capabilities,
+but it does not need to be running as root.  The clone(2) call will result in a
+new task which to itself appears to be running as UID and GID 0, but to its
+creator seems to have the creator's credentials.
+
+Any task in or resource belonging to the initial user namespace will, to this
+new task, appear to belong to UID and GID -1 - which is usually known as
+'nobody'.  Permission to open such files will be granted according to world
+access permissions.  UID comparisons and group membership checks will return
+false, and privilege will be denied.
+
+When a task belonging to (for example) userid 500 in the initial user namespace
+creates a new user namespace, even though the new task will see itself as
+belonging to UID 0, any task in the initial user namespace will see it as
+belonging to UID 500.  Therefore, UID 500 in the initial user namespace will be
+able to kill the new task.  Files created by the new user will (eventually) be
+seen by tasks in its own user namespace as belonging to UID 0, but to tasks in
+the initial user namespace as belonging to UID 500.
+
+Note that this userid mapping for the VFS is not yet implemented, though the
+lkml and containers mailing list archives will show several previous
+prototypes.  In the end, those got hung up waiting on the concept of targeted
+capabilities to be developed, which, thanks to the insight of Eric Biederman,
+they finally did.
+
+Relationship between the User namespace and other namespaces
+============================================================
+
+Other namespaces, such as UTS and network, are owned by a user namespace.  When
+such a namespace is created, it is assigned to the user namespace of the task
+by which it was created.  Therefore, attempts to exercise privilege to
+resources in, for instance, a particular network namespace, can be properly
+validated by checking whether the caller has the needed privilege (i.e.
+CAP_NET_ADMIN) targeted to the user namespace which owns the network namespace.
+This is done using the ns_capable() function.
+
+As an example, if a new task is cloned with a private user namespace but
+no private network namespace, then the task's network namespace is owned
+by the parent user namespace.  The new task has no privilege to the
+parent user namespace, so it will not be able to create or configure
+network devices.  If, instead, the task were cloned with both private
+user and network namespaces, then the private network namespace is owned
+by the private user namespace, and so root in the new user namespace
+will have privilege targeted to the network namespace.  It will be able
+to create and configure network devices.
+
+UID Mapping
+===========
+The current plan (see 'flexible UID mapping' at
+https://wiki.ubuntu.com/UserNamespace) is:
+
+The UID/GID stored on disk will be that in the init_user_ns.  Most likely
+UID/GID in other namespaces will be stored in xattrs.  But Eric was advocating
+(a few years ago) leaving the details up to filesystems while providing a lib/
+stock implementation.  See the thread around here
+http://www.mail-archive.com/devel@openvz.org/msg09331.html
+
+
+Working notes
+=============
+Capability checks for actions related to syslog must be against the
+init_user_ns until syslog is containerized.
+
+Same is true for reboot and power, control groups, devices, and time.
+
+Perf actions (kernel/event/core.c for instance) will always be constrained to
+init_user_ns.
+
+Q:
+Is accounting considered properly containerized wrt pidns?  (it appears to be).
+If so, then we can change the capable() check in kernel/acct.c to
+'ns_capable(current_pid_ns()->user_ns, CAP_PACCT)'
+
+Q:
+For things like nice and schedaffinity, we could allow root in a container to
+control those, and leave only cgroups to constrain the container.  I'm not sure
+whether that is right, or whether it violates admin expectations.
+
+I deferred some of commoncap.c.  I'm punting on xattr stuff as they take
+dentries, not inodes.
+
+For drivers/tty/tty_io.c and drivers/tty/vt/vt.c, we'll want to (for some of
+them) target the capability checks at the user_ns owning the tty.  That will
+have to wait until we get userns owning files straightened out.
+
+We need to figure out how to label devices.  Should we just toss a user_ns
+right into struct device?
+
+capable(CAP_MAC_ADMIN) checks are always to be against init_user_ns, unless
+some day LSMs were to be containerized, near zero chance.
+
+inode_owner_or_capable() should probably take an optional ns and cap parameter.
+If cap is 0, then CAP_FOWNER is checked.  If ns is NULL, we derive the ns from
+inode.  But if ns is provided, then callers who need to derive
+inode_userns(inode) anyway can save a few cycles.
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 02/14] allow root in container to copy namespaces
  2011-07-26 18:58 [PATCH 0/14] user namespaces v2: continue targetting capabilities Serge Hallyn
  2011-07-26 18:58 ` [PATCH 01/14] add Documentation/namespaces/user_namespace.txt Serge Hallyn
@ 2011-07-26 18:58 ` Serge Hallyn
  2011-07-27 23:14   ` Eric W. Biederman
  2011-07-26 18:58 ` [PATCH 03/14] keyctl: check capabilities against key's user_ns Serge Hallyn
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Serge Hallyn @ 2011-07-26 18:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: dhowells, ebiederm, containers, netdev, akpm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

Othewise nested containers with user namespaces won't be possible.

It's true that user namespaces are not yet fully isolated, but for
that same reason there are far worse things that root in a child
user ns can do.  Spawning a child user ns is not in itself bad.

This patch also allows setns for root in a container:
@Eric Biederman: are there gotchas in allowing setns from child
userns?

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 kernel/fork.c    |    4 ++--
 kernel/nsproxy.c |    6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 17bf7c8..22d0cf0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1473,8 +1473,8 @@ long do_fork(unsigned long clone_flags,
 		/* hopefully this check will go away when userns support is
 		 * complete
 		 */
-		if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
-				!capable(CAP_SETGID))
+		if (!nsown_capable(CAP_SYS_ADMIN) || !nsown_capable(CAP_SETUID) ||
+				!nsown_capable(CAP_SETGID))
 			return -EPERM;
 	}
 
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 9aeab4b..f50542d 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -134,7 +134,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 				CLONE_NEWPID | CLONE_NEWNET)))
 		return 0;
 
-	if (!capable(CAP_SYS_ADMIN)) {
+	if (!nsown_capable(CAP_SYS_ADMIN)) {
 		err = -EPERM;
 		goto out;
 	}
@@ -191,7 +191,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
 			       CLONE_NEWNET)))
 		return 0;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!nsown_capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
 	*new_nsp = create_new_namespaces(unshare_flags, current,
@@ -241,7 +241,7 @@ SYSCALL_DEFINE2(setns, int, fd, int, nstype)
 	struct file *file;
 	int err;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!nsown_capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
 	file = proc_ns_fget(fd);
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 03/14] keyctl: check capabilities against key's user_ns
  2011-07-26 18:58 [PATCH 0/14] user namespaces v2: continue targetting capabilities Serge Hallyn
  2011-07-26 18:58 ` [PATCH 01/14] add Documentation/namespaces/user_namespace.txt Serge Hallyn
  2011-07-26 18:58 ` [PATCH 02/14] allow root in container to copy namespaces Serge Hallyn
@ 2011-07-26 18:58 ` Serge Hallyn
  2011-07-26 18:58 ` [PATCH 04/14] user_ns: convert fs/attr.c to targeted capabilities Serge Hallyn
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Serge Hallyn @ 2011-07-26 18:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: dhowells, ebiederm, containers, netdev, akpm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

ATM, task should only be able to get his own user_ns's keys
anyway, so nsown_capable should also work, but there is no
advantage to doing that, while using key's user_ns is clearer.

changelog: jun 6:
	compile fix: keyctl.c (key_user, not key has user_ns)

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 security/keys/keyctl.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index eca5191..fa7d420 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -745,7 +745,7 @@ long keyctl_chown_key(key_serial_t id, uid_t uid, gid_t gid)
 	ret = -EACCES;
 	down_write(&key->sem);
 
-	if (!capable(CAP_SYS_ADMIN)) {
+	if (!ns_capable(key->user->user_ns, CAP_SYS_ADMIN)) {
 		/* only the sysadmin can chown a key to some other UID */
 		if (uid != (uid_t) -1 && key->uid != uid)
 			goto error_put;
@@ -852,7 +852,8 @@ long keyctl_setperm_key(key_serial_t id, key_perm_t perm)
 	down_write(&key->sem);
 
 	/* if we're not the sysadmin, we can only change a key that we own */
-	if (capable(CAP_SYS_ADMIN) || key->uid == current_fsuid()) {
+	if (ns_capable(key->user->user_ns, CAP_SYS_ADMIN) ||
+	    key->uid == current_fsuid()) {
 		key->perm = perm;
 		ret = 0;
 	}
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 04/14] user_ns: convert fs/attr.c to targeted capabilities
  2011-07-26 18:58 [PATCH 0/14] user namespaces v2: continue targetting capabilities Serge Hallyn
                   ` (2 preceding siblings ...)
  2011-07-26 18:58 ` [PATCH 03/14] keyctl: check capabilities against key's user_ns Serge Hallyn
@ 2011-07-26 18:58 ` Serge Hallyn
  2011-07-26 18:58 ` [PATCH 05/14] userns: clamp down users of cap_raised Serge Hallyn
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Serge Hallyn @ 2011-07-26 18:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: dhowells, ebiederm, containers, netdev, akpm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/attr.c |   20 +++++++++++++-------
 1 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index 538e279..e0cf46a 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -29,6 +29,7 @@
 int inode_change_ok(const struct inode *inode, struct iattr *attr)
 {
 	unsigned int ia_valid = attr->ia_valid;
+	struct user_namespace *ns;
 
 	/*
 	 * First check size constraints.  These can't be overriden using
@@ -44,26 +45,28 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)
 	if (ia_valid & ATTR_FORCE)
 		return 0;
 
+	ns = inode_userns(inode);
 	/* Make sure a caller can chown. */
 	if ((ia_valid & ATTR_UID) &&
-	    (current_fsuid() != inode->i_uid ||
-	     attr->ia_uid != inode->i_uid) && !capable(CAP_CHOWN))
+	    (ns != current_user_ns() || current_fsuid() != inode->i_uid ||
+	     attr->ia_uid != inode->i_uid) && !ns_capable(ns, CAP_CHOWN))
 		return -EPERM;
 
 	/* Make sure caller can chgrp. */
 	if ((ia_valid & ATTR_GID) &&
-	    (current_fsuid() != inode->i_uid ||
+	    (ns != current_user_ns() || current_fsuid() != inode->i_uid ||
 	    (!in_group_p(attr->ia_gid) && attr->ia_gid != inode->i_gid)) &&
-	    !capable(CAP_CHOWN))
+	    !ns_capable(ns, CAP_CHOWN))
 		return -EPERM;
 
 	/* Make sure a caller can chmod. */
 	if (ia_valid & ATTR_MODE) {
+		gid_t gid = (ia_valid & ATTR_GID) ? attr->ia_gid : inode->i_gid;
 		if (!inode_owner_or_capable(inode))
 			return -EPERM;
 		/* Also check the setgid bit! */
-		if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
-				inode->i_gid) && !capable(CAP_FSETID))
+		if ((ns != current_user_ns() || !in_group_p(gid)) &&
+		    !ns_capable(ns, CAP_FSETID))
 			attr->ia_mode &= ~S_ISGID;
 	}
 
@@ -154,9 +157,12 @@ void setattr_copy(struct inode *inode, const struct iattr *attr)
 						inode->i_sb->s_time_gran);
 	if (ia_valid & ATTR_MODE) {
 		umode_t mode = attr->ia_mode;
+		struct user_namespace *ns = inode_userns(inode);
 
-		if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
+		if ((ns != current_user_ns() || !in_group_p(inode->i_gid)) &&
+		    !ns_capable(ns, CAP_FSETID))
 			mode &= ~S_ISGID;
+
 		inode->i_mode = mode;
 	}
 }
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 05/14] userns: clamp down users of cap_raised
  2011-07-26 18:58 [PATCH 0/14] user namespaces v2: continue targetting capabilities Serge Hallyn
                   ` (3 preceding siblings ...)
  2011-07-26 18:58 ` [PATCH 04/14] user_ns: convert fs/attr.c to targeted capabilities Serge Hallyn
@ 2011-07-26 18:58 ` Serge Hallyn
  2011-07-28 23:23   ` Vasiliy Kulikov
  2011-07-26 18:58 ` [PATCH 06/14] user namespace: make each net (net_ns) belong to a user_ns Serge Hallyn
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Serge Hallyn @ 2011-07-26 18:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: dhowells, ebiederm, containers, netdev, akpm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

A few modules are using cap_raised(current_cap(), cap) to authorize
actions, but the privilege should be applicable against the initial
user namespace.  Refuse privilege if the caller is not in init_user_ns.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/block/drbd/drbd_nl.c           |    5 +++++
 drivers/md/dm-log-userspace-transfer.c |    3 +++
 drivers/staging/pohmelfs/config.c      |    3 +++
 drivers/video/uvesafb.c                |    3 +++
 4 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 515bcd9..7717f8a 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -2297,6 +2297,11 @@ static void drbd_connector_callback(struct cn_msg *req, struct netlink_skb_parms
 		return;
 	}
 
+	if (current_user_ns() != &init_user_ns) {
+		retcode = ERR_PERM;
+		goto fail;
+	}
+
 	if (!cap_raised(current_cap(), CAP_SYS_ADMIN)) {
 		retcode = ERR_PERM;
 		goto fail;
diff --git a/drivers/md/dm-log-userspace-transfer.c b/drivers/md/dm-log-userspace-transfer.c
index 1f23e04..140ca81 100644
--- a/drivers/md/dm-log-userspace-transfer.c
+++ b/drivers/md/dm-log-userspace-transfer.c
@@ -134,6 +134,9 @@ static void cn_ulog_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
 {
 	struct dm_ulog_request *tfr = (struct dm_ulog_request *)(msg + 1);
 
+	if (current_user_ns() != &init_user_ns)
+		return;
+
 	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
 		return;
 
diff --git a/drivers/staging/pohmelfs/config.c b/drivers/staging/pohmelfs/config.c
index b6c42cb..cd259d0 100644
--- a/drivers/staging/pohmelfs/config.c
+++ b/drivers/staging/pohmelfs/config.c
@@ -525,6 +525,9 @@ static void pohmelfs_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *n
 {
 	int err;
 
+	if (current_user_ns() != &init_user_ns)
+		return;
+
 	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
 		return;
 
diff --git a/drivers/video/uvesafb.c b/drivers/video/uvesafb.c
index 7f8472c..71dab8e 100644
--- a/drivers/video/uvesafb.c
+++ b/drivers/video/uvesafb.c
@@ -73,6 +73,9 @@ static void uvesafb_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *ns
 	struct uvesafb_task *utask;
 	struct uvesafb_ktask *task;
 
+	if (current_user_ns() != &init_user_ns)
+		return;
+
 	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
 		return;
 
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 06/14] user namespace: make each net (net_ns) belong to a user_ns
  2011-07-26 18:58 [PATCH 0/14] user namespaces v2: continue targetting capabilities Serge Hallyn
                   ` (4 preceding siblings ...)
  2011-07-26 18:58 ` [PATCH 05/14] userns: clamp down users of cap_raised Serge Hallyn
@ 2011-07-26 18:58 ` Serge Hallyn
  2011-07-26 18:58 ` [PATCH 07/14] user namespace: use net->user_ns for some capable calls under net/ Serge Hallyn
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Serge Hallyn @ 2011-07-26 18:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: dhowells, ebiederm, containers, netdev, akpm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

This way we can target capabilites at the user_ns which created the
net ns.

Changelog:
   jul 8: nsproxy: don't assign netns->userns if not cloning.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 include/net/net_namespace.h |    2 ++
 kernel/nsproxy.c            |    2 ++
 net/core/net_namespace.c    |    3 +++
 3 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 1ab1aec..38a5154 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -29,6 +29,7 @@ struct ctl_table_header;
 struct net_generic;
 struct sock;
 struct netns_ipvs;
+struct user_namespace;
 
 
 #define NETDEV_HASHBITS    8
@@ -101,6 +102,7 @@ struct net {
 	struct netns_xfrm	xfrm;
 #endif
 	struct netns_ipvs	*ipvs;
+	struct user_namespace	*user_ns;
 };
 
 
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index f50542d..e616904 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -95,6 +95,8 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
 		err = PTR_ERR(new_nsp->net_ns);
 		goto out_net;
 	}
+	if (flags & CLONE_NEWNET)
+		new_nsp->net_ns->user_ns = get_user_ns(task_cred_xxx(tsk, user_ns));
 
 	return new_nsp;
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 5bbdbf0..791c19c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -10,6 +10,7 @@
 #include <linux/nsproxy.h>
 #include <linux/proc_fs.h>
 #include <linux/file.h>
+#include <linux/user_namespace.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
@@ -209,6 +210,7 @@ static void net_free(struct net *net)
 	}
 #endif
 	kfree(net->gen);
+	put_user_ns(net->user_ns);
 	kmem_cache_free(net_cachep, net);
 }
 
@@ -389,6 +391,7 @@ static int __init net_ns_init(void)
 	rcu_assign_pointer(init_net.gen, ng);
 
 	mutex_lock(&net_mutex);
+	init_net.user_ns = &init_user_ns;
 	if (setup_net(&init_net))
 		panic("Could not setup the initial network namespace");
 
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 07/14] user namespace: use net->user_ns for some capable calls under net/
  2011-07-26 18:58 [PATCH 0/14] user namespaces v2: continue targetting capabilities Serge Hallyn
                   ` (5 preceding siblings ...)
  2011-07-26 18:58 ` [PATCH 06/14] user namespace: make each net (net_ns) belong to a user_ns Serge Hallyn
@ 2011-07-26 18:58 ` Serge Hallyn
  2011-07-26 18:58 ` [PATCH 08/14] af_netlink.c: make netlink_capable userns-aware Serge Hallyn
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Serge Hallyn @ 2011-07-26 18:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: dhowells, ebiederm, containers, netdev, akpm, Serge Hallyn

From: Serge Hallyn <serge.hallyn@ubuntu.com>

Just a partial conversion to show how the previous patch is expected to
be used.

Changelog:
  6/28/11: fix typo in net/core/sock.c
  7/08/11: don't target capability which authorizes module loading

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/dev.c  |    4 ++--
 net/core/sock.c |   14 ++++++++------
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 9444c5c..cee43eb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5014,7 +5014,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCGMIIPHY:
 	case SIOCGMIIREG:
 	case SIOCSIFNAME:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		dev_load(net, ifr.ifr_name);
 		rtnl_lock();
@@ -5053,7 +5053,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCBRADDIF:
 	case SIOCBRDELIF:
 	case SIOCSHWTSTAMP:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		/* fall through */
 	case SIOCBONDSLAVEINFOQUERY:
diff --git a/net/core/sock.c b/net/core/sock.c
index bc745d0..0f31675 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -420,7 +420,7 @@ static int sock_bindtodevice(struct sock *sk, char __user *optval, int optlen)
 
 	/* Sorry... */
 	ret = -EPERM;
-	if (!capable(CAP_NET_RAW))
+	if (!ns_capable(net->user_ns, CAP_NET_RAW))
 		goto out;
 
 	ret = -EINVAL;
@@ -488,6 +488,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 	int valbool;
 	struct linger ling;
 	int ret = 0;
+	struct net *net = sock_net(sk);
 
 	/*
 	 *	Options without arguments
@@ -508,7 +509,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 
 	switch (optname) {
 	case SO_DEBUG:
-		if (val && !capable(CAP_NET_ADMIN))
+		if (val && !ns_capable(net->user_ns, CAP_NET_ADMIN))
 			ret = -EACCES;
 		else
 			sock_valbool_flag(sk, SOCK_DBG, valbool);
@@ -551,7 +552,7 @@ set_sndbuf:
 		break;
 
 	case SO_SNDBUFFORCE:
-		if (!capable(CAP_NET_ADMIN)) {
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) {
 			ret = -EPERM;
 			break;
 		}
@@ -589,7 +590,7 @@ set_rcvbuf:
 		break;
 
 	case SO_RCVBUFFORCE:
-		if (!capable(CAP_NET_ADMIN)) {
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) {
 			ret = -EPERM;
 			break;
 		}
@@ -612,7 +613,8 @@ set_rcvbuf:
 		break;
 
 	case SO_PRIORITY:
-		if ((val >= 0 && val <= 6) || capable(CAP_NET_ADMIN))
+		if ((val >= 0 && val <= 6) ||
+		     ns_capable(net->user_ns, CAP_NET_ADMIN))
 			sk->sk_priority = val;
 		else
 			ret = -EPERM;
@@ -729,7 +731,7 @@ set_rcvbuf:
 			clear_bit(SOCK_PASSSEC, &sock->flags);
 		break;
 	case SO_MARK:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			ret = -EPERM;
 		else
 			sk->sk_mark = val;
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 08/14] af_netlink.c: make netlink_capable userns-aware
  2011-07-26 18:58 [PATCH 0/14] user namespaces v2: continue targetting capabilities Serge Hallyn
                   ` (6 preceding siblings ...)
  2011-07-26 18:58 ` [PATCH 07/14] user namespace: use net->user_ns for some capable calls under net/ Serge Hallyn
@ 2011-07-26 18:58 ` Serge Hallyn
  2011-07-26 18:58 ` [PATCH 09/14] user ns: convert ipv6 to targeted capabilities Serge Hallyn
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Serge Hallyn @ 2011-07-26 18:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: dhowells, ebiederm, containers, netdev, akpm, Serge E. Hallyn,
	Eric Dumazet

From: Serge E. Hallyn <serge.hallyn@canonical.com>

netlink_capable should check for permissions against the user
namespace owning the socket in question.

Changelog:
  Per Eric Dumazet advice, use sock_net(sk) instead of #ifdef.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/netlink/af_netlink.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 0a4db02..3cc0bbe 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -580,8 +580,9 @@ retry:
 
 static inline int netlink_capable(struct socket *sock, unsigned int flag)
 {
-	return (nl_table[sock->sk->sk_protocol].nl_nonroot & flag) ||
-	       capable(CAP_NET_ADMIN);
+	if (nl_table[sock->sk->sk_protocol].nl_nonroot & flag)
+		return 1;
+	return ns_capable(sock_net(sock->sk)->user_ns, CAP_NET_ADMIN);
 }
 
 static void
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 09/14] user ns: convert ipv6 to targeted capabilities
  2011-07-26 18:58 [PATCH 0/14] user namespaces v2: continue targetting capabilities Serge Hallyn
                   ` (7 preceding siblings ...)
  2011-07-26 18:58 ` [PATCH 08/14] af_netlink.c: make netlink_capable userns-aware Serge Hallyn
@ 2011-07-26 18:58 ` Serge Hallyn
  2011-07-26 18:58 ` [PATCH 10/14] net/core/scm.c: target capable() calls to user_ns owning the net_ns Serge Hallyn
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Serge Hallyn @ 2011-07-26 18:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: dhowells, ebiederm, containers, netdev, akpm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/ipv6/addrconf.c             |    4 ++--
 net/ipv6/af_inet6.c             |    6 ++++--
 net/ipv6/datagram.c             |    6 +++---
 net/ipv6/ip6_flowlabel.c        |   24 ++++++++++++++----------
 net/ipv6/ip6_tunnel.c           |    4 ++--
 net/ipv6/ip6mr.c                |    2 +-
 net/ipv6/ipv6_sockglue.c        |    7 ++++---
 net/ipv6/netfilter/ip6_tables.c |    8 ++++----
 net/ipv6/route.c                |    2 +-
 net/ipv6/sit.c                  |   10 +++++-----
 10 files changed, 40 insertions(+), 33 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index a06c53c..6b93e3b 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2228,7 +2228,7 @@ int addrconf_add_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
@@ -2247,7 +2247,7 @@ int addrconf_del_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 3b5669a..1854ffe 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -160,7 +160,8 @@ lookup_protocol:
 	}
 
 	err = -EPERM;
-	if (sock->type == SOCK_RAW && !kern && !capable(CAP_NET_RAW))
+	if (sock->type == SOCK_RAW && !kern &&
+	    !ns_capable(net->user_ns, CAP_NET_RAW))
 		goto out_rcu_unlock;
 
 	sock->ops = answer->ops;
@@ -281,7 +282,8 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		return -EINVAL;
 
 	snum = ntohs(addr->sin6_port);
-	if (snum && snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
+	if (snum && snum < PROT_SOCK &&
+	    !ns_capable(sock_net(sk)->user_ns, CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
 	lock_sock(sk);
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 1656033..7e38d8f 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -694,7 +694,7 @@ int datagram_send_ctl(struct net *net,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!capable(CAP_NET_RAW)) {
+			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -714,7 +714,7 @@ int datagram_send_ctl(struct net *net,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!capable(CAP_NET_RAW)) {
+			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -739,7 +739,7 @@ int datagram_send_ctl(struct net *net,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!capable(CAP_NET_RAW)) {
+			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index f3caf1b..4726c02 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -294,21 +294,22 @@ struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions * opt_space,
 	return opt_space;
 }
 
-static unsigned long check_linger(unsigned long ttl)
+static unsigned long check_linger(unsigned long ttl, struct user_namespace *ns)
 {
 	if (ttl < FL_MIN_LINGER)
 		return FL_MIN_LINGER*HZ;
-	if (ttl > FL_MAX_LINGER && !capable(CAP_NET_ADMIN))
+	if (ttl > FL_MAX_LINGER && !ns_capable(ns, CAP_NET_ADMIN))
 		return 0;
 	return ttl*HZ;
 }
 
-static int fl6_renew(struct ip6_flowlabel *fl, unsigned long linger, unsigned long expires)
+static int fl6_renew(struct ip6_flowlabel *fl, unsigned long linger,
+		     unsigned long expires, struct user_namespace *ns)
 {
-	linger = check_linger(linger);
+	linger = check_linger(linger, ns);
 	if (!linger)
 		return -EPERM;
-	expires = check_linger(expires);
+	expires = check_linger(expires, ns);
 	if (!expires)
 		return -EPERM;
 	fl->lastuse = jiffies;
@@ -375,7 +376,7 @@ fl_create(struct net *net, struct in6_flowlabel_req *freq, char __user *optval,
 
 	fl->fl_net = hold_net(net);
 	fl->expires = jiffies;
-	err = fl6_renew(fl, freq->flr_linger, freq->flr_expires);
+	err = fl6_renew(fl, freq->flr_linger, freq->flr_expires, net->user_ns);
 	if (err)
 		goto done;
 	fl->share = freq->flr_share;
@@ -425,7 +426,7 @@ static int mem_check(struct sock *sk)
 	if (room <= 0 ||
 	    ((count >= FL_MAX_PER_SOCK ||
 	      (count > 0 && room < FL_MAX_SIZE/2) || room < FL_MAX_SIZE/4) &&
-	     !capable(CAP_NET_ADMIN)))
+	     !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)))
 		return -ENOBUFS;
 
 	return 0;
@@ -507,17 +508,20 @@ int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen)
 		read_lock_bh(&ip6_sk_fl_lock);
 		for (sfl = np->ipv6_fl_list; sfl; sfl = sfl->next) {
 			if (sfl->fl->label == freq.flr_label) {
-				err = fl6_renew(sfl->fl, freq.flr_linger, freq.flr_expires);
+				err = fl6_renew(sfl->fl, freq.flr_linger, freq.flr_expires,
+						net->user_ns);
 				read_unlock_bh(&ip6_sk_fl_lock);
 				return err;
 			}
 		}
 		read_unlock_bh(&ip6_sk_fl_lock);
 
-		if (freq.flr_share == IPV6_FL_S_NONE && capable(CAP_NET_ADMIN)) {
+		if (freq.flr_share == IPV6_FL_S_NONE &&
+		    ns_capable(net->user_ns, CAP_NET_ADMIN)) {
 			fl = fl_lookup(net, freq.flr_label);
 			if (fl) {
-				err = fl6_renew(fl, freq.flr_linger, freq.flr_expires);
+				err = fl6_renew(fl, freq.flr_linger, freq.flr_expires,
+						net->user_ns);
 				fl_release(fl);
 				return err;
 			}
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 36c2842..1a98c23 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1269,7 +1269,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			break;
 		err = -EFAULT;
 		if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof (p)))
@@ -1304,7 +1304,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		break;
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			break;
 
 		if (dev == ip6n->fb_tnl_dev) {
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 705c828..1649ccd 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1582,7 +1582,7 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
 		return -ENOENT;
 
 	if (optname != MRT6_INIT) {
-		if (sk != mrt->mroute6_sk && !capable(CAP_NET_ADMIN))
+		if (sk != mrt->mroute6_sk && !ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EACCES;
 	}
 
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 9cb191e..196b099 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -343,7 +343,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		break;
 
 	case IPV6_TRANSPARENT:
-		if (!capable(CAP_NET_ADMIN)) {
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) {
 			retv = -EPERM;
 			break;
 		}
@@ -381,7 +381,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 
 		/* hop-by-hop / destination options are privileged option */
 		retv = -EPERM;
-		if (optname != IPV6_RTHDR && !capable(CAP_NET_RAW))
+		if (optname != IPV6_RTHDR &&
+		    !ns_capable(net->user_ns, CAP_NET_RAW))
 			break;
 
 		opt = ipv6_renew_options(sk, np->opt, optname,
@@ -725,7 +726,7 @@ done:
 	case IPV6_IPSEC_POLICY:
 	case IPV6_XFRM_POLICY:
 		retv = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		retv = xfrm_user_policy(sk, optname, optval, optlen);
 		break;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 94874b0..7fce7d8 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1869,7 +1869,7 @@ compat_do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1984,7 +1984,7 @@ compat_do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2006,7 +2006,7 @@ do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2031,7 +2031,7 @@ do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index e8987da..2a64c67 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1929,7 +1929,7 @@ int ipv6_route_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch(cmd) {
 	case SIOCADDRT:		/* Add a route */
 	case SIOCDELRT:		/* Delete a route */
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		err = copy_from_user(&rtmsg, arg,
 				     sizeof(struct in6_rtmsg));
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 07bf108..402e2a5 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -308,7 +308,7 @@ static int ipip6_tunnel_get_prl(struct ip_tunnel *t,
 	/* For simple GET or for root users,
 	 * we try harder to allocate.
 	 */
-	kp = (cmax <= 1 || capable(CAP_NET_ADMIN)) ?
+	kp = (cmax <= 1 || ns_capable(dev_net(t->dev)->user_ns, CAP_NET_ADMIN)) ?
 		kcalloc(cmax, sizeof(*kp), GFP_KERNEL) :
 		NULL;
 
@@ -926,7 +926,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -985,7 +985,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == sitn->fb_tunnel_dev) {
@@ -1018,7 +1018,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCDELPRL:
 	case SIOCCHGPRL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			goto done;
 		err = -EINVAL;
 		if (dev == sitn->fb_tunnel_dev)
@@ -1047,7 +1047,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCCHG6RD:
 	case SIOCDEL6RD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 10/14] net/core/scm.c: target capable() calls to user_ns owning the net_ns
  2011-07-26 18:58 [PATCH 0/14] user namespaces v2: continue targetting capabilities Serge Hallyn
                   ` (8 preceding siblings ...)
  2011-07-26 18:58 ` [PATCH 09/14] user ns: convert ipv6 to targeted capabilities Serge Hallyn
@ 2011-07-26 18:58 ` Serge Hallyn
  2011-08-04 22:06   ` Serge E. Hallyn
  2011-07-26 18:58 ` [PATCH 11/14] userns: make some net-sysfs capable calls targeted Serge Hallyn
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Serge Hallyn @ 2011-07-26 18:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: dhowells, ebiederm, containers, netdev, akpm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

The uid/gid comparisons don't have to be pulled out.  This just seemed
more easily proved correct.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/scm.c |   41 ++++++++++++++++++++++++++++++++++-------
 1 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/net/core/scm.c b/net/core/scm.c
index 4c1ef02..21b5d0b 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -43,17 +43,44 @@
  *	setu(g)id.
  */
 
-static __inline__ int scm_check_creds(struct ucred *creds)
+static __inline__ bool uidequiv(struct cred *src, struct ucred *tgt,
+			       struct user_namespace *ns)
+{
+	if (src->user_ns != ns)
+		goto check_capable;
+	if (src->uid == tgt->uid || src->euid == tgt->uid ||
+	    src->suid == tgt->uid)
+		return true;
+check_capable:
+	if (ns_capable(ns, CAP_SETUID))
+		return true;
+	return false;
+}
+
+static __inline__ bool gidequiv(struct cred *src, struct ucred *tgt,
+			       struct user_namespace *ns)
+{
+	if (src->user_ns != ns)
+		goto check_capable;
+	if (src->gid == tgt->gid || src->egid == tgt->gid ||
+	    src->sgid == tgt->gid)
+		return true;
+check_capable:
+	if (ns_capable(ns, CAP_SETGID))
+		return true;
+	return false;
+}
+
+static __inline__ int scm_check_creds(struct ucred *creds, struct socket *sock)
 {
 	const struct cred *cred = current_cred();
+	struct user_namespace *ns = sock_net(sock->sk)->user_ns;
 
-	if ((creds->pid == task_tgid_vnr(current) || capable(CAP_SYS_ADMIN)) &&
-	    ((creds->uid == cred->uid   || creds->uid == cred->euid ||
-	      creds->uid == cred->suid) || capable(CAP_SETUID)) &&
-	    ((creds->gid == cred->gid   || creds->gid == cred->egid ||
-	      creds->gid == cred->sgid) || capable(CAP_SETGID))) {
+	if ((creds->pid == task_tgid_vnr(current) || ns_capable(ns, CAP_SYS_ADMIN)) &&
+	     uidequiv(cred, creds, ns) && gidequiv(cred, creds, ns)) {
 	       return 0;
 	}
+
 	return -EPERM;
 }
 
@@ -169,7 +196,7 @@ int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *p)
 			if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct ucred)))
 				goto error;
 			memcpy(&p->creds, CMSG_DATA(cmsg), sizeof(struct ucred));
-			err = scm_check_creds(&p->creds);
+			err = scm_check_creds(&p->creds, sock);
 			if (err)
 				goto error;
 
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 11/14] userns: make some net-sysfs capable calls targeted
  2011-07-26 18:58 [PATCH 0/14] user namespaces v2: continue targetting capabilities Serge Hallyn
                   ` (9 preceding siblings ...)
  2011-07-26 18:58 ` [PATCH 10/14] net/core/scm.c: target capable() calls to user_ns owning the net_ns Serge Hallyn
@ 2011-07-26 18:58 ` Serge Hallyn
  2011-07-26 18:58 ` [PATCH 12/14] user_ns: target af_key capability check Serge Hallyn
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Serge Hallyn @ 2011-07-26 18:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: dhowells, ebiederm, containers, netdev, akpm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

Changelog: jul 1: fix compilation errors (net_device != net)

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/net-sysfs.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 1683e5d..876915b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -76,7 +76,7 @@ static ssize_t netdev_store(struct device *dev, struct device_attribute *attr,
 	unsigned long new;
 	int ret = -EINVAL;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(net)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	new = simple_strtoul(buf, &endp, 0);
@@ -261,7 +261,7 @@ static ssize_t store_ifalias(struct device *dev, struct device_attribute *attr,
 	size_t count = len;
 	ssize_t ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(netdev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* ignore trailing newline */
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 12/14] user_ns: target af_key capability check
  2011-07-26 18:58 [PATCH 0/14] user namespaces v2: continue targetting capabilities Serge Hallyn
                   ` (10 preceding siblings ...)
  2011-07-26 18:58 ` [PATCH 11/14] userns: make some net-sysfs capable calls targeted Serge Hallyn
@ 2011-07-26 18:58 ` Serge Hallyn
  2011-07-26 18:58 ` [PATCH 13/14] userns: net: make many network capable calls targeted Serge Hallyn
  2011-07-26 18:58 ` [PATCH 14/14] net: pass user_ns to cap_netlink_recv() Serge Hallyn
  13 siblings, 0 replies; 30+ messages in thread
From: Serge Hallyn @ 2011-07-26 18:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: dhowells, ebiederm, containers, netdev, akpm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

This presumes that it really is complete wrt network namespaces.  Looking
at the code it appears to be.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/key/af_key.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 1e733e9..1f90f4e 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -141,7 +141,7 @@ static int pfkey_create(struct net *net, struct socket *sock, int protocol,
 	struct sock *sk;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 13/14] userns: net: make many network capable calls targeted
  2011-07-26 18:58 [PATCH 0/14] user namespaces v2: continue targetting capabilities Serge Hallyn
                   ` (11 preceding siblings ...)
  2011-07-26 18:58 ` [PATCH 12/14] user_ns: target af_key capability check Serge Hallyn
@ 2011-07-26 18:58 ` Serge Hallyn
  2011-07-26 18:58 ` [PATCH 14/14] net: pass user_ns to cap_netlink_recv() Serge Hallyn
  13 siblings, 0 replies; 30+ messages in thread
From: Serge Hallyn @ 2011-07-26 18:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: dhowells, ebiederm, containers, netdev, akpm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

When privilege is protected a namespaced network resource, then having
the required privilege targed toward the user namespace which owns the
resource suffices.

As with other patches, a big concern here is that we be cleanly separating
the cases where privilege protects a network resource from cases where
privilege can lead to laxer constraints on input and, subsequently,
the ability to corrupt, crash, or own the host kernel.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/8021q/vlan.c                  |   12 ++++++------
 net/bridge/br_ioctl.c             |   22 +++++++++++-----------
 net/bridge/br_sysfs_br.c          |    8 ++++----
 net/bridge/br_sysfs_if.c          |    2 +-
 net/bridge/netfilter/ebtables.c   |    8 ++++----
 net/core/ethtool.c                |    2 +-
 net/ipv4/arp.c                    |    2 +-
 net/ipv4/devinet.c                |    4 ++--
 net/ipv4/fib_frontend.c           |    2 +-
 net/ipv4/ip_options.c             |    6 +++---
 net/ipv4/ip_sockglue.c            |    4 ++--
 net/ipv4/ipip.c                   |    4 ++--
 net/ipv4/ipmr.c                   |    2 +-
 net/ipv4/netfilter/arp_tables.c   |    8 ++++----
 net/ipv4/netfilter/ip_tables.c    |    8 ++++----
 net/netfilter/ipset/ip_set_core.c |    2 +-
 net/netfilter/ipvs/ip_vs_ctl.c    |    4 ++--
 net/packet/af_packet.c            |    2 +-
 18 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 8970ba1..7d12f63 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -558,7 +558,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 	switch (args.cmd) {
 	case SET_VLAN_INGRESS_PRIORITY_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		vlan_dev_set_ingress_priority(dev,
 					      args.u.skb_priority,
@@ -568,7 +568,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_EGRESS_PRIORITY_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		err = vlan_dev_set_egress_priority(dev,
 						   args.u.skb_priority,
@@ -577,7 +577,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_FLAG_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		err = vlan_dev_change_flags(dev,
 					    args.vlan_qos ? args.u.flag : 0,
@@ -586,7 +586,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_NAME_TYPE_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		if ((args.u.name_type >= 0) &&
 		    (args.u.name_type < VLAN_NAME_TYPE_HIGHEST)) {
@@ -602,14 +602,14 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case ADD_VLAN_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		err = register_vlan_device(dev, args.u.VID);
 		break;
 
 	case DEL_VLAN_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		unregister_vlan_dev(dev, NULL);
 		err = 0;
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index 7222fe1..c82f9cb 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -88,7 +88,7 @@ static int add_del_if(struct net_bridge *br, int ifindex, int isadd)
 	struct net_device *dev;
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	dev = __dev_get_by_index(dev_net(br->dev), ifindex);
@@ -178,25 +178,25 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	}
 
 	case BRCTL_SET_BRIDGE_FORWARD_DELAY:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		return br_set_forward_delay(br, args[1]);
 
 	case BRCTL_SET_BRIDGE_HELLO_TIME:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		return br_set_hello_time(br, args[1]);
 
 	case BRCTL_SET_BRIDGE_MAX_AGE:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		return br_set_max_age(br, args[1]);
 
 	case BRCTL_SET_AGEING_TIME:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		br->ageing_time = clock_t_to_jiffies(args[1]);
@@ -236,14 +236,14 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	}
 
 	case BRCTL_SET_BRIDGE_STP_STATE:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		br_stp_set_enabled(br, args[1]);
 		return 0;
 
 	case BRCTL_SET_BRIDGE_PRIORITY:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		spin_lock_bh(&br->lock);
@@ -256,7 +256,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 		struct net_bridge_port *p;
 		int ret;
 
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		spin_lock_bh(&br->lock);
@@ -273,7 +273,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 		struct net_bridge_port *p;
 		int ret;
 
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		spin_lock_bh(&br->lock);
@@ -330,7 +330,7 @@ static int old_deviceless(struct net *net, void __user *uarg)
 	{
 		char buf[IFNAMSIZ];
 
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(buf, (void __user *)args[1], IFNAMSIZ))
@@ -360,7 +360,7 @@ int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, void __user *uar
 	{
 		char buf[IFNAMSIZ];
 
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(buf, uarg, IFNAMSIZ))
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 68b893e..7f4fa3a 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -36,7 +36,7 @@ static ssize_t store_bridge_parm(struct device *d,
 	unsigned long val;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	val = simple_strtoul(buf, &endp, 0);
@@ -132,7 +132,7 @@ static ssize_t store_stp_state(struct device *d,
 	char *endp;
 	unsigned long val;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	val = simple_strtoul(buf, &endp, 0);
@@ -267,7 +267,7 @@ static ssize_t store_group_addr(struct device *d,
 	unsigned new_addr[6];
 	int i;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (sscanf(buf, "%x:%x:%x:%x:%x:%x",
@@ -304,7 +304,7 @@ static ssize_t store_flush(struct device *d,
 {
 	struct net_bridge *br = to_bridge(d);
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	br_fdb_flush(br);
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 6229b62..9cb4d2e 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -209,7 +209,7 @@ static ssize_t brport_store(struct kobject * kobj,
 	char *endp;
 	unsigned long val;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(p->br->dev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	val = simple_strtoul(buf, &endp, 0);
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 2b5ca1a..c403c45 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1462,7 +1462,7 @@ static int do_ebt_set_ctl(struct sock *sk,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch(cmd) {
@@ -1484,7 +1484,7 @@ static int do_ebt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 	struct ebt_replace tmp;
 	struct ebt_table *t;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&tmp, user, sizeof(tmp)))
@@ -2275,7 +2275,7 @@ static int compat_do_ebt_set_ctl(struct sock *sk,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2298,7 +2298,7 @@ static int compat_do_ebt_get_ctl(struct sock *sk, int cmd,
 	struct compat_ebt_replace tmp;
 	struct ebt_table *t;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* try real handler in case userland supplied needed padding */
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 6cdba5f..56878bf 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1676,7 +1676,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_GFEATURES:
 		break;
 	default:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 	}
 
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 96a164a..023ad24 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1175,7 +1175,7 @@ int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCDARP:
 	case SIOCSARP:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 	case SIOCGARP:
 		err = copy_from_user(&r, arg, sizeof(struct arpreq));
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 37b3c18..3683d37 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -728,7 +728,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 
 	case SIOCSIFFLAGS:
 		ret = -EACCES;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto out;
 		break;
 	case SIOCSIFADDR:	/* Set interface address (and family) */
@@ -736,7 +736,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCSIFDSTADDR:	/* Set the destination address */
 	case SIOCSIFNETMASK: 	/* Set the netmask for the interface */
 		ret = -EACCES;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto out;
 		ret = -EINVAL;
 		if (sin->sin_family != AF_INET)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 92fc5f6..8f34a07 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -437,7 +437,7 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCADDRT:		/* Add a route */
 	case SIOCDELRT:		/* Delete a route */
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(&rt, arg, sizeof(rt)))
diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index ec93335..21df700 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -396,7 +396,7 @@ int ip_options_compile(struct net *net,
 					optptr[2] += 8;
 					break;
 				      default:
-					if (!skb && !capable(CAP_NET_RAW)) {
+					if (!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) {
 						pp_ptr = optptr + 3;
 						goto error;
 					}
@@ -432,7 +432,7 @@ int ip_options_compile(struct net *net,
 				opt->router_alert = optptr - iph;
 			break;
 		      case IPOPT_CIPSO:
-			if ((!skb && !capable(CAP_NET_RAW)) || opt->cipso) {
+			if ((!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) || opt->cipso) {
 				pp_ptr = optptr;
 				goto error;
 			}
@@ -445,7 +445,7 @@ int ip_options_compile(struct net *net,
 		      case IPOPT_SEC:
 		      case IPOPT_SID:
 		      default:
-			if (!skb && !capable(CAP_NET_RAW)) {
+			if (!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) {
 				pp_ptr = optptr;
 				goto error;
 			}
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index ab0c9ef..972c65f 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -955,13 +955,13 @@ mc_msf_out:
 	case IP_IPSEC_POLICY:
 	case IP_XFRM_POLICY:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 			break;
 		err = xfrm_user_policy(sk, optname, optval, optlen);
 		break;
 
 	case IP_TRANSPARENT:
-		if (!capable(CAP_NET_ADMIN)) {
+		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
 			err = -EPERM;
 			break;
 		}
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 378b20b..6725832 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -629,7 +629,7 @@ ipip_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -689,7 +689,7 @@ ipip_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == ipn->fb_tunnel_dev) {
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 58e8791..309aa0c 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1204,7 +1204,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
 
 	if (optname != MRT_INIT) {
 		if (sk != rcu_dereference_raw(mrt->mroute_sk) &&
-		    !capable(CAP_NET_ADMIN))
+		    !ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EACCES;
 	}
 
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index fd7a3f6..acc908f 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -1534,7 +1534,7 @@ static int compat_do_arpt_set_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1678,7 +1678,7 @@ static int compat_do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1699,7 +1699,7 @@ static int do_arpt_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1723,7 +1723,7 @@ static int do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 24e556e..72f2cde 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1847,7 +1847,7 @@ compat_do_ipt_set_ctl(struct sock *sk,	int cmd, void __user *user,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1962,7 +1962,7 @@ compat_do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1984,7 +1984,7 @@ do_ipt_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2009,7 +2009,7 @@ do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index d7e86ef..38d69a5 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1596,7 +1596,7 @@ ip_set_sockfn_get(struct sock *sk, int optval, void __user *user, int *len)
 	void *data;
 	int copylen = *len, ret = 0;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (optval != SO_IP_SET)
 		return -EBADF;
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index be43fd8..7eda0eb 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2284,7 +2284,7 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 	struct ip_vs_dest_user *udest_compat;
 	struct ip_vs_dest_user_kern udest;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_SET_MAX)
@@ -2566,7 +2566,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
 	BUG_ON(!net);
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_GET_MAX)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c698cec..c2e6bb6 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1793,7 +1793,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
 	__be16 proto = (__force __be16)protocol; /* weird, but documented */
 	int err;
 
-	if (!capable(CAP_NET_RAW))
+	if (!ns_capable(net->user_ns, CAP_NET_RAW))
 		return -EPERM;
 	if (sock->type != SOCK_DGRAM && sock->type != SOCK_RAW &&
 	    sock->type != SOCK_PACKET)
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 14/14] net: pass user_ns to cap_netlink_recv()
  2011-07-26 18:58 [PATCH 0/14] user namespaces v2: continue targetting capabilities Serge Hallyn
                   ` (12 preceding siblings ...)
  2011-07-26 18:58 ` [PATCH 13/14] userns: net: make many network capable calls targeted Serge Hallyn
@ 2011-07-26 18:58 ` Serge Hallyn
  13 siblings, 0 replies; 30+ messages in thread
From: Serge Hallyn @ 2011-07-26 18:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: dhowells, ebiederm, containers, netdev, akpm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

and make cap_netlink_recv() userns-aware

cap_netlink_recv() was granting privilege if a capability is in
current_cap(), regardless of the user namespace.  Fix that by
targeting the capability check against the user namespace which
owns the skb.

Because sock_net is static inline defined in net/sock.h, which we
don't want to #include at the cap_netlink_recv function (commoncap.h).

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/scsi/scsi_netlink.c     |    3 ++-
 include/linux/security.h        |   14 +++++++++-----
 kernel/audit.c                  |    6 ++++--
 net/core/rtnetlink.c            |    3 ++-
 net/decnet/netfilter/dn_rtmsg.c |    3 ++-
 net/ipv4/netfilter/ip_queue.c   |    3 ++-
 net/ipv6/netfilter/ip6_queue.c  |    3 ++-
 net/netfilter/nfnetlink.c       |    2 +-
 net/netlink/genetlink.c         |    2 +-
 net/xfrm/xfrm_user.c            |    2 +-
 security/commoncap.c            |    6 ++----
 security/security.c             |    4 ++--
 security/selinux/hooks.c        |    5 +++--
 13 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c
index 26a8a45..0aa2e57 100644
--- a/drivers/scsi/scsi_netlink.c
+++ b/drivers/scsi/scsi_netlink.c
@@ -111,7 +111,8 @@ scsi_nl_rcv_msg(struct sk_buff *skb)
 			goto next_msg;
 		}
 
-		if (security_netlink_recv(skb, CAP_SYS_ADMIN)) {
+		if (security_netlink_recv(skb, CAP_SYS_ADMIN,
+					  sock_net(skb->sk)->user_ns)) {
 			err = -EPERM;
 			goto next_msg;
 		}
diff --git a/include/linux/security.h b/include/linux/security.h
index ebd2a53..cfa1f47 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -95,7 +95,8 @@ struct xfrm_user_sec_ctx;
 struct seq_file;
 
 extern int cap_netlink_send(struct sock *sk, struct sk_buff *skb);
-extern int cap_netlink_recv(struct sk_buff *skb, int cap);
+extern int cap_netlink_recv(struct sk_buff *skb, int cap,
+			    struct user_namespace *ns);
 
 void reset_security_ops(void);
 
@@ -797,6 +798,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
  *	@skb.
  *	@skb contains the sk_buff structure for the netlink message.
  *	@cap indicates the capability required
+ *	@ns is the user namespace which owns skb
  *	Return 0 if permission is granted.
  *
  * Security hooks for Unix domain networking.
@@ -1557,7 +1559,8 @@ struct security_operations {
 			  struct sembuf *sops, unsigned nsops, int alter);
 
 	int (*netlink_send) (struct sock *sk, struct sk_buff *skb);
-	int (*netlink_recv) (struct sk_buff *skb, int cap);
+	int (*netlink_recv) (struct sk_buff *skb, int cap,
+			     struct user_namespace *ns);
 
 	void (*d_instantiate) (struct dentry *dentry, struct inode *inode);
 
@@ -1806,7 +1809,7 @@ void security_d_instantiate(struct dentry *dentry, struct inode *inode);
 int security_getprocattr(struct task_struct *p, char *name, char **value);
 int security_setprocattr(struct task_struct *p, char *name, void *value, size_t size);
 int security_netlink_send(struct sock *sk, struct sk_buff *skb);
-int security_netlink_recv(struct sk_buff *skb, int cap);
+int security_netlink_recv(struct sk_buff *skb, int cap, struct user_namespace *ns);
 int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen);
 int security_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid);
 void security_release_secctx(char *secdata, u32 seclen);
@@ -2498,9 +2501,10 @@ static inline int security_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return cap_netlink_send(sk, skb);
 }
 
-static inline int security_netlink_recv(struct sk_buff *skb, int cap)
+static inline int security_netlink_recv(struct sk_buff *skb, int cap,
+					struct user_namespace *ns)
 {
-	return cap_netlink_recv(skb, cap);
+	return cap_netlink_recv(skb, cap, ns);
 }
 
 static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
diff --git a/kernel/audit.c b/kernel/audit.c
index 52501b5..bed1c50 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -601,13 +601,15 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
 	case AUDIT_TTY_SET:
 	case AUDIT_TRIM:
 	case AUDIT_MAKE_EQUIV:
-		if (security_netlink_recv(skb, CAP_AUDIT_CONTROL))
+		if (security_netlink_recv(skb, CAP_AUDIT_CONTROL,
+					  sock_net(skb->sk)->user_ns))
 			err = -EPERM;
 		break;
 	case AUDIT_USER:
 	case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
 	case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
-		if (security_netlink_recv(skb, CAP_AUDIT_WRITE))
+		if (security_netlink_recv(skb, CAP_AUDIT_WRITE,
+					  sock_net(skb->sk)->user_ns))
 			err = -EPERM;
 		break;
 	default:  /* bad msg */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 99d9e95..4a444de 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1931,7 +1931,8 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	sz_idx = type>>2;
 	kind = type&3;
 
-	if (kind != 2 && security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (kind != 2 && security_netlink_recv(skb, CAP_NET_ADMIN,
+					       net->user_ns))
 		return -EPERM;
 
 	if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) {
diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c
index 69975e0..2d052ab 100644
--- a/net/decnet/netfilter/dn_rtmsg.c
+++ b/net/decnet/netfilter/dn_rtmsg.c
@@ -108,7 +108,8 @@ static inline void dnrmg_receive_user_skb(struct sk_buff *skb)
 	if (nlh->nlmsg_len < sizeof(*nlh) || skb->len < nlh->nlmsg_len)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN,
+	    sock_net(skb->sk)->user_ns))
 		RCV_SKB_FAIL(-EPERM);
 
 	/* Eventually we might send routing messages too */
diff --git a/net/ipv4/netfilter/ip_queue.c b/net/ipv4/netfilter/ip_queue.c
index 5c9b9d9..51d7c52 100644
--- a/net/ipv4/netfilter/ip_queue.c
+++ b/net/ipv4/netfilter/ip_queue.c
@@ -432,7 +432,8 @@ __ipq_rcv_skb(struct sk_buff *skb)
 	if (type <= IPQM_BASE)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN,
+				  sock_net(skb->sk)->user_ns))
 		RCV_SKB_FAIL(-EPERM);
 
 	spin_lock_bh(&queue_lock);
diff --git a/net/ipv6/netfilter/ip6_queue.c b/net/ipv6/netfilter/ip6_queue.c
index 2493948..8206bf3 100644
--- a/net/ipv6/netfilter/ip6_queue.c
+++ b/net/ipv6/netfilter/ip6_queue.c
@@ -433,7 +433,8 @@ __ipq_rcv_skb(struct sk_buff *skb)
 	if (type <= IPQM_BASE)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN,
+				  sock_net(skb->sk)->user_ns))
 		RCV_SKB_FAIL(-EPERM);
 
 	spin_lock_bh(&queue_lock);
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 1905976..bcaff9d 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -130,7 +130,7 @@ static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	const struct nfnetlink_subsystem *ss;
 	int type, err;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN, net->user_ns))
 		return -EPERM;
 
 	/* All the messages must at least contain nfgenmsg */
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 482fa57..00a101c 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -516,7 +516,7 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		return -EOPNOTSUPP;
 
 	if ((ops->flags & GENL_ADMIN_PERM) &&
-	    security_netlink_recv(skb, CAP_NET_ADMIN))
+	    security_netlink_recv(skb, CAP_NET_ADMIN, net->user_ns))
 		return -EPERM;
 
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 0256b8a..1808e1e 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2290,7 +2290,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	link = &xfrm_dispatch[type];
 
 	/* All operations require privileges, even GET */
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN, net->user_ns))
 		return -EPERM;
 
 	if ((type == (XFRM_MSG_GETSA - XFRM_MSG_BASE) ||
diff --git a/security/commoncap.c b/security/commoncap.c
index a93b3b7..1e48e6a 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -56,11 +56,9 @@ int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return 0;
 }
 
-int cap_netlink_recv(struct sk_buff *skb, int cap)
+int cap_netlink_recv(struct sk_buff *skb, int cap, struct user_namespace *ns)
 {
-	if (!cap_raised(current_cap(), cap))
-		return -EPERM;
-	return 0;
+	return security_capable(ns, current_cred(), cap);
 }
 EXPORT_SYMBOL(cap_netlink_recv);
 
diff --git a/security/security.c b/security/security.c
index 0e4fccf..0a1453e 100644
--- a/security/security.c
+++ b/security/security.c
@@ -941,9 +941,9 @@ int security_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return security_ops->netlink_send(sk, skb);
 }
 
-int security_netlink_recv(struct sk_buff *skb, int cap)
+int security_netlink_recv(struct sk_buff *skb, int cap, struct user_namespace *ns)
 {
-	return security_ops->netlink_recv(skb, cap);
+	return security_ops->netlink_recv(skb, cap, ns);
 }
 EXPORT_SYMBOL(security_netlink_recv);
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 9f4c77d..c80a063 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4723,13 +4723,14 @@ static int selinux_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return selinux_nlmsg_perm(sk, skb);
 }
 
-static int selinux_netlink_recv(struct sk_buff *skb, int capability)
+static int selinux_netlink_recv(struct sk_buff *skb, int capability,
+				struct user_namespace *ns)
 {
 	int err;
 	struct common_audit_data ad;
 	u32 sid;
 
-	err = cap_netlink_recv(skb, capability);
+	err = cap_netlink_recv(skb, capability, ns);
 	if (err)
 		return err;
 
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 01/14] add Documentation/namespaces/user_namespace.txt
  2011-07-26 18:58 ` [PATCH 01/14] add Documentation/namespaces/user_namespace.txt Serge Hallyn
@ 2011-07-26 20:22   ` Randy Dunlap
  2011-07-27 15:38     ` Serge E. Hallyn
  2011-07-26 20:29   ` David Howells
  1 sibling, 1 reply; 30+ messages in thread
From: Randy Dunlap @ 2011-07-26 20:22 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-kernel, dhowells, ebiederm, containers, netdev, akpm,
	Serge E. Hallyn

On Tue, 26 Jul 2011 18:58:24 +0000 Serge Hallyn wrote:

> From: Serge E. Hallyn <serge.hallyn@canonical.com>
> 
> This will hold some info about the design.  Currently it contains
> future todos, issues and questions.
> 
> Changelog:
>    jul 26: incorporate feed back from David Howells.
> 
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: David Howells <dhowells@redhat.com>
> ---
>  Documentation/namespaces/user_namespace.txt |  107 +++++++++++++++++++++++++++
>  1 files changed, 107 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/namespaces/user_namespace.txt
> 
> diff --git a/Documentation/namespaces/user_namespace.txt b/Documentation/namespaces/user_namespace.txt
> new file mode 100644
> index 0000000..7e50517
> --- /dev/null
> +++ b/Documentation/namespaces/user_namespace.txt
> @@ -0,0 +1,107 @@
> +Description
> +===========
> +
> +Traditionally, each task is owned by a user ID (UID) and belongs to one or more
> +groups (GID).  Both are simple numeric IDs, though userspace usually translates
> +them to names.  The user namespace allows tasks to have different views of the
> +UIDs and GIDs associated with tasks and other resources.  (See 'UID mapping'
> +below for more)

         for more.)

> +
> +The user namespace is a simple hierarchical one.  The system starts with all
> +tasks belonging to the initial user namespace.  A task creates a new user
> +namespace by passing the CLONE_NEWUSER flag to clone(2).  This requires the
> +creating task to have the CAP_SETUID, CAP_SETGID, and CAP_CHOWN capabilities,
> +but it does not need to be running as root.  The clone(2) call will result in a
> +new task which to itself appears to be running as UID and GID 0, but to its
> +creator seems to have the creator's credentials.
> +
> +Any task in or resource belonging to the initial user namespace will, to this
> +new task, appear to belong to UID and GID -1 - which is usually known as

that extra hyphen is confusing.  how about:

                              to UID and GID -1, which is

> +'nobody'.  Permission to open such files will be granted according to world
> +access permissions.  UID comparisons and group membership checks will return
> +false, and privilege will be denied.
> +
> +When a task belonging to (for example) userid 500 in the initial user namespace
> +creates a new user namespace, even though the new task will see itself as
> +belonging to UID 0, any task in the initial user namespace will see it as
> +belonging to UID 500.  Therefore, UID 500 in the initial user namespace will be
> +able to kill the new task.  Files created by the new user will (eventually) be
> +seen by tasks in its own user namespace as belonging to UID 0, but to tasks in
> +the initial user namespace as belonging to UID 500.
> +
> +Note that this userid mapping for the VFS is not yet implemented, though the
> +lkml and containers mailing list archives will show several previous
> +prototypes.  In the end, those got hung up waiting on the concept of targeted
> +capabilities to be developed, which, thanks to the insight of Eric Biederman,
> +they finally did.
> +
> +Relationship between the User namespace and other namespaces
> +============================================================
> +
> +Other namespaces, such as UTS and network, are owned by a user namespace.  When
> +such a namespace is created, it is assigned to the user namespace of the task
> +by which it was created.  Therefore, attempts to exercise privilege to
> +resources in, for instance, a particular network namespace, can be properly
> +validated by checking whether the caller has the needed privilege (i.e.
> +CAP_NET_ADMIN) targeted to the user namespace which owns the network namespace.
> +This is done using the ns_capable() function.
> +
> +As an example, if a new task is cloned with a private user namespace but
> +no private network namespace, then the task's network namespace is owned
> +by the parent user namespace.  The new task has no privilege to the
> +parent user namespace, so it will not be able to create or configure
> +network devices.  If, instead, the task were cloned with both private
> +user and network namespaces, then the private network namespace is owned
> +by the private user namespace, and so root in the new user namespace
> +will have privilege targeted to the network namespace.  It will be able
> +to create and configure network devices.
> +
> +UID Mapping
> +===========
> +The current plan (see 'flexible UID mapping' at
> +https://wiki.ubuntu.com/UserNamespace) is:
> +
> +The UID/GID stored on disk will be that in the init_user_ns.  Most likely
> +UID/GID in other namespaces will be stored in xattrs.  But Eric was advocating
> +(a few years ago) leaving the details up to filesystems while providing a lib/
> +stock implementation.  See the thread around here

                                                here:

> +http://www.mail-archive.com/devel@openvz.org/msg09331.html
> +
> +
> +Working notes
> +=============

A lot of this file is working notes and will need to be updated...

> +Capability checks for actions related to syslog must be against the
> +init_user_ns until syslog is containerized.
> +
> +Same is true for reboot and power, control groups, devices, and time.
> +
> +Perf actions (kernel/event/core.c for instance) will always be constrained to
> +init_user_ns.
> +
> +Q:
> +Is accounting considered properly containerized wrt pidns?  (it appears to be).

s/wrt/with respect to/

> +If so, then we can change the capable() check in kernel/acct.c to
> +'ns_capable(current_pid_ns()->user_ns, CAP_PACCT)'
> +
> +Q:
> +For things like nice and schedaffinity, we could allow root in a container to
> +control those, and leave only cgroups to constrain the container.  I'm not sure
> +whether that is right, or whether it violates admin expectations.
> +
> +I deferred some of commoncap.c.  I'm punting on xattr stuff as they take
> +dentries, not inodes.
> +
> +For drivers/tty/tty_io.c and drivers/tty/vt/vt.c, we'll want to (for some of
> +them) target the capability checks at the user_ns owning the tty.  That will
> +have to wait until we get userns owning files straightened out.
> +
> +We need to figure out how to label devices.  Should we just toss a user_ns
> +right into struct device?
> +
> +capable(CAP_MAC_ADMIN) checks are always to be against init_user_ns, unless
> +some day LSMs were to be containerized, near zero chance.
> +
> +inode_owner_or_capable() should probably take an optional ns and cap parameter.
> +If cap is 0, then CAP_FOWNER is checked.  If ns is NULL, we derive the ns from
> +inode.  But if ns is provided, then callers who need to derive
> +inode_userns(inode) anyway can save a few cycles.
> -- 


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 01/14] add Documentation/namespaces/user_namespace.txt
  2011-07-26 18:58 ` [PATCH 01/14] add Documentation/namespaces/user_namespace.txt Serge Hallyn
  2011-07-26 20:22   ` Randy Dunlap
@ 2011-07-26 20:29   ` David Howells
  2011-07-29 17:25     ` [PATCH 01/14] add Documentation/namespaces/user_namespace.txt (v3) Serge E. Hallyn
  1 sibling, 1 reply; 30+ messages in thread
From: David Howells @ 2011-07-26 20:29 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: dhowells, Serge Hallyn, linux-kernel, ebiederm, containers,
	netdev, akpm, Serge E. Hallyn

Randy Dunlap <rdunlap@xenotime.net> wrote:

> > +Any task in or resource belonging to the initial user namespace will, to this
> > +new task, appear to belong to UID and GID -1 - which is usually known as
> 
> that extra hyphen is confusing.  how about:
> 
>                               to UID and GID -1, which is

'which are'.

David

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 01/14] add Documentation/namespaces/user_namespace.txt
  2011-07-26 20:22   ` Randy Dunlap
@ 2011-07-27 15:38     ` Serge E. Hallyn
  2011-07-27 16:02       ` Randy Dunlap
  0 siblings, 1 reply; 30+ messages in thread
From: Serge E. Hallyn @ 2011-07-27 15:38 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-kernel, dhowells, ebiederm, containers, netdev, akpm,
	Serge E. Hallyn

Quoting Randy Dunlap (rdunlap@xenotime.net):
> On Tue, 26 Jul 2011 18:58:24 +0000 Serge Hallyn wrote:
> 
> > From: Serge E. Hallyn <serge.hallyn@canonical.com>
> > 
> > This will hold some info about the design.  Currently it contains
> > future todos, issues and questions.
> > 
> > Changelog:
> >    jul 26: incorporate feed back from David Howells.
> > 
> > Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > Cc: David Howells <dhowells@redhat.com>
> > ---
> >  Documentation/namespaces/user_namespace.txt |  107 +++++++++++++++++++++++++++
> >  1 files changed, 107 insertions(+), 0 deletions(-)
> >  create mode 100644 Documentation/namespaces/user_namespace.txt
> > 
> > diff --git a/Documentation/namespaces/user_namespace.txt b/Documentation/namespaces/user_namespace.txt
> > new file mode 100644
> > index 0000000..7e50517
> > --- /dev/null
> > +++ b/Documentation/namespaces/user_namespace.txt
> > @@ -0,0 +1,107 @@
> > +Description
> > +===========
> > +
> > +Traditionally, each task is owned by a user ID (UID) and belongs to one or more
> > +groups (GID).  Both are simple numeric IDs, though userspace usually translates
> > +them to names.  The user namespace allows tasks to have different views of the
> > +UIDs and GIDs associated with tasks and other resources.  (See 'UID mapping'
> > +below for more)
> 
>          for more.)

Thanks for reviewing, Randy.

> > +
> > +The user namespace is a simple hierarchical one.  The system starts with all
> > +tasks belonging to the initial user namespace.  A task creates a new user
> > +namespace by passing the CLONE_NEWUSER flag to clone(2).  This requires the
> > +creating task to have the CAP_SETUID, CAP_SETGID, and CAP_CHOWN capabilities,
> > +but it does not need to be running as root.  The clone(2) call will result in a
> > +new task which to itself appears to be running as UID and GID 0, but to its
> > +creator seems to have the creator's credentials.
> > +
> > +Any task in or resource belonging to the initial user namespace will, to this
> > +new task, appear to belong to UID and GID -1 - which is usually known as
> 
> that extra hyphen is confusing.  how about:
> 
>                               to UID and GID -1, which is
> 
> > +'nobody'.  Permission to open such files will be granted according to world

As I'd been asked to switch from comma, I'll restructure, something like:

"To this new task, any resource belonging to the initial user namespace will
appear to belong to user 'nobody', which has UID and GID -1."

> > +access permissions.  UID comparisons and group membership checks will return
> > +false, and privilege will be denied.
> > +
> > +When a task belonging to (for example) userid 500 in the initial user namespace
> > +creates a new user namespace, even though the new task will see itself as
> > +belonging to UID 0, any task in the initial user namespace will see it as
> > +belonging to UID 500.  Therefore, UID 500 in the initial user namespace will be
> > +able to kill the new task.  Files created by the new user will (eventually) be
> > +seen by tasks in its own user namespace as belonging to UID 0, but to tasks in
> > +the initial user namespace as belonging to UID 500.
> > +
> > +Note that this userid mapping for the VFS is not yet implemented, though the
> > +lkml and containers mailing list archives will show several previous
> > +prototypes.  In the end, those got hung up waiting on the concept of targeted
> > +capabilities to be developed, which, thanks to the insight of Eric Biederman,
> > +they finally did.
> > +
> > +Relationship between the User namespace and other namespaces
> > +============================================================
> > +
> > +Other namespaces, such as UTS and network, are owned by a user namespace.  When
> > +such a namespace is created, it is assigned to the user namespace of the task
> > +by which it was created.  Therefore, attempts to exercise privilege to
> > +resources in, for instance, a particular network namespace, can be properly
> > +validated by checking whether the caller has the needed privilege (i.e.
> > +CAP_NET_ADMIN) targeted to the user namespace which owns the network namespace.
> > +This is done using the ns_capable() function.
> > +
> > +As an example, if a new task is cloned with a private user namespace but
> > +no private network namespace, then the task's network namespace is owned
> > +by the parent user namespace.  The new task has no privilege to the
> > +parent user namespace, so it will not be able to create or configure
> > +network devices.  If, instead, the task were cloned with both private
> > +user and network namespaces, then the private network namespace is owned
> > +by the private user namespace, and so root in the new user namespace
> > +will have privilege targeted to the network namespace.  It will be able
> > +to create and configure network devices.
> > +
> > +UID Mapping
> > +===========
> > +The current plan (see 'flexible UID mapping' at
> > +https://wiki.ubuntu.com/UserNamespace) is:
> > +
> > +The UID/GID stored on disk will be that in the init_user_ns.  Most likely
> > +UID/GID in other namespaces will be stored in xattrs.  But Eric was advocating
> > +(a few years ago) leaving the details up to filesystems while providing a lib/
> > +stock implementation.  See the thread around here
> 
>                                                 here:
> 
> > +http://www.mail-archive.com/devel@openvz.org/msg09331.html
> > +
> > +
> > +Working notes
> > +=============
> 
> A lot of this file is working notes and will need to be updated...

Yup.  I can leave it out of this file and keep it on the wiki instead, if
that is preferred.

> > +Capability checks for actions related to syslog must be against the
> > +init_user_ns until syslog is containerized.
> > +
> > +Same is true for reboot and power, control groups, devices, and time.
> > +
> > +Perf actions (kernel/event/core.c for instance) will always be constrained to
> > +init_user_ns.
> > +
> > +Q:
> > +Is accounting considered properly containerized wrt pidns?  (it appears to be).
> 
> s/wrt/with respect to/
> 
> > +If so, then we can change the capable() check in kernel/acct.c to
> > +'ns_capable(current_pid_ns()->user_ns, CAP_PACCT)'
> > +
> > +Q:
> > +For things like nice and schedaffinity, we could allow root in a container to
> > +control those, and leave only cgroups to constrain the container.  I'm not sure
> > +whether that is right, or whether it violates admin expectations.
> > +
> > +I deferred some of commoncap.c.  I'm punting on xattr stuff as they take
> > +dentries, not inodes.
> > +
> > +For drivers/tty/tty_io.c and drivers/tty/vt/vt.c, we'll want to (for some of
> > +them) target the capability checks at the user_ns owning the tty.  That will
> > +have to wait until we get userns owning files straightened out.
> > +
> > +We need to figure out how to label devices.  Should we just toss a user_ns
> > +right into struct device?
> > +
> > +capable(CAP_MAC_ADMIN) checks are always to be against init_user_ns, unless
> > +some day LSMs were to be containerized, near zero chance.
> > +
> > +inode_owner_or_capable() should probably take an optional ns and cap parameter.
> > +If cap is 0, then CAP_FOWNER is checked.  If ns is NULL, we derive the ns from
> > +inode.  But if ns is provided, then callers who need to derive
> > +inode_userns(inode) anyway can save a few cycles.
> > -- 
> 
> 
> ---
> ~Randy
> *** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 01/14] add Documentation/namespaces/user_namespace.txt
  2011-07-27 15:38     ` Serge E. Hallyn
@ 2011-07-27 16:02       ` Randy Dunlap
  0 siblings, 0 replies; 30+ messages in thread
From: Randy Dunlap @ 2011-07-27 16:02 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-kernel, dhowells, ebiederm, containers, netdev, akpm,
	Serge E. Hallyn

On Wed, 27 Jul 2011 15:38:48 +0000 Serge E. Hallyn wrote:

> > > +Working notes
> > > +=============
> > 
> > A lot of this file is working notes and will need to be updated...
> 
> Yup.  I can leave it out of this file and keep it on the wiki instead, if
> that is preferred.

Either place is OK with me, as long as you continue to update it
and don't let it go stale.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 02/14] allow root in container to copy namespaces
  2011-07-26 18:58 ` [PATCH 02/14] allow root in container to copy namespaces Serge Hallyn
@ 2011-07-27 23:14   ` Eric W. Biederman
  2011-07-28  2:13     ` Serge E. Hallyn
  2011-07-29 17:27     ` [PATCH 02/14] allow root in container to copy namespaces (v3) Serge E. Hallyn
  0 siblings, 2 replies; 30+ messages in thread
From: Eric W. Biederman @ 2011-07-27 23:14 UTC (permalink / raw)
  To: Serge Hallyn; +Cc: linux-kernel, netdev, containers, dhowells

Serge Hallyn <serge@hallyn.com> writes:

> From: Serge E. Hallyn <serge.hallyn@canonical.com>
>
> Othewise nested containers with user namespaces won't be possible.
>
> It's true that user namespaces are not yet fully isolated, but for
> that same reason there are far worse things that root in a child
> user ns can do.  Spawning a child user ns is not in itself bad.
>
> This patch also allows setns for root in a container:
> @Eric Biederman: are there gotchas in allowing setns from child
> userns?

Yes.  We need to ensure that the target namespaces are namespaces
that have been created in from user_namespace or from a child of this
user_namespace.

Aka we need to ensure that we have CAP_SYS_ADMIN for the new namespace.

Eric

> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  kernel/fork.c    |    4 ++--
>  kernel/nsproxy.c |    6 +++---
>  2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 17bf7c8..22d0cf0 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1473,8 +1473,8 @@ long do_fork(unsigned long clone_flags,
>  		/* hopefully this check will go away when userns support is
>  		 * complete
>  		 */
> -		if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
> -				!capable(CAP_SETGID))
> +		if (!nsown_capable(CAP_SYS_ADMIN) || !nsown_capable(CAP_SETUID) ||
> +				!nsown_capable(CAP_SETGID))
>  			return -EPERM;
>  	}
>  
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index 9aeab4b..f50542d 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -134,7 +134,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
>  				CLONE_NEWPID | CLONE_NEWNET)))
>  		return 0;
>  
> -	if (!capable(CAP_SYS_ADMIN)) {
> +	if (!nsown_capable(CAP_SYS_ADMIN)) {
>  		err = -EPERM;
>  		goto out;
>  	}
> @@ -191,7 +191,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
>  			       CLONE_NEWNET)))
>  		return 0;
>  
> -	if (!capable(CAP_SYS_ADMIN))
> +	if (!nsown_capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
>  	*new_nsp = create_new_namespaces(unshare_flags, current,
> @@ -241,7 +241,7 @@ SYSCALL_DEFINE2(setns, int, fd, int, nstype)
>  	struct file *file;
>  	int err;
>  
> -	if (!capable(CAP_SYS_ADMIN))
> +	if (!nsown_capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
>  	file = proc_ns_fget(fd);

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 02/14] allow root in container to copy namespaces
  2011-07-27 23:14   ` Eric W. Biederman
@ 2011-07-28  2:13     ` Serge E. Hallyn
  2011-07-29 17:27     ` [PATCH 02/14] allow root in container to copy namespaces (v3) Serge E. Hallyn
  1 sibling, 0 replies; 30+ messages in thread
From: Serge E. Hallyn @ 2011-07-28  2:13 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, netdev, containers, dhowells

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Serge Hallyn <serge@hallyn.com> writes:
> 
> > From: Serge E. Hallyn <serge.hallyn@canonical.com>
> >
> > Othewise nested containers with user namespaces won't be possible.
> >
> > It's true that user namespaces are not yet fully isolated, but for
> > that same reason there are far worse things that root in a child
> > user ns can do.  Spawning a child user ns is not in itself bad.
> >
> > This patch also allows setns for root in a container:
> > @Eric Biederman: are there gotchas in allowing setns from child
> > userns?
> 
> Yes.  We need to ensure that the target namespaces are namespaces
> that have been created in from user_namespace or from a child of this
> user_namespace.
> 
> Aka we need to ensure that we have CAP_SYS_ADMIN for the new namespace.

Thanks - so the last hunk in this patch is wrong.

> Eric
> 
> > Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > ---
> >  kernel/fork.c    |    4 ++--
> >  kernel/nsproxy.c |    6 +++---
> >  2 files changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 17bf7c8..22d0cf0 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -1473,8 +1473,8 @@ long do_fork(unsigned long clone_flags,
> >  		/* hopefully this check will go away when userns support is
> >  		 * complete
> >  		 */
> > -		if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
> > -				!capable(CAP_SETGID))
> > +		if (!nsown_capable(CAP_SYS_ADMIN) || !nsown_capable(CAP_SETUID) ||
> > +				!nsown_capable(CAP_SETGID))
> >  			return -EPERM;
> >  	}
> >  
> > diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> > index 9aeab4b..f50542d 100644
> > --- a/kernel/nsproxy.c
> > +++ b/kernel/nsproxy.c
> > @@ -134,7 +134,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
> >  				CLONE_NEWPID | CLONE_NEWNET)))
> >  		return 0;
> >  
> > -	if (!capable(CAP_SYS_ADMIN)) {
> > +	if (!nsown_capable(CAP_SYS_ADMIN)) {
> >  		err = -EPERM;
> >  		goto out;
> >  	}
> > @@ -191,7 +191,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
> >  			       CLONE_NEWNET)))
> >  		return 0;
> >  
> > -	if (!capable(CAP_SYS_ADMIN))
> > +	if (!nsown_capable(CAP_SYS_ADMIN))
> >  		return -EPERM;
> >  
> >  	*new_nsp = create_new_namespaces(unshare_flags, current,
> > @@ -241,7 +241,7 @@ SYSCALL_DEFINE2(setns, int, fd, int, nstype)
> >  	struct file *file;
> >  	int err;
> >  
> > -	if (!capable(CAP_SYS_ADMIN))
> > +	if (!nsown_capable(CAP_SYS_ADMIN))
> >  		return -EPERM;
> >  
> >  	file = proc_ns_fget(fd);

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 05/14] userns: clamp down users of cap_raised
  2011-07-26 18:58 ` [PATCH 05/14] userns: clamp down users of cap_raised Serge Hallyn
@ 2011-07-28 23:23   ` Vasiliy Kulikov
  2011-07-28 23:51     ` Serge E. Hallyn
  0 siblings, 1 reply; 30+ messages in thread
From: Vasiliy Kulikov @ 2011-07-28 23:23 UTC (permalink / raw)
  To: Serge Hallyn; +Cc: linux-kernel, netdev, containers, dhowells, ebiederm

On Tue, Jul 26, 2011 at 18:58 +0000, Serge Hallyn wrote:
> From: Serge E. Hallyn <serge.hallyn@canonical.com>
> 
> A few modules are using cap_raised(current_cap(), cap) to authorize
> actions, but the privilege should be applicable against the initial
> user namespace.  Refuse privilege if the caller is not in init_user_ns.
> 
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  drivers/block/drbd/drbd_nl.c           |    5 +++++
>  drivers/md/dm-log-userspace-transfer.c |    3 +++
>  drivers/staging/pohmelfs/config.c      |    3 +++
>  drivers/video/uvesafb.c                |    3 +++
>  4 files changed, 14 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
> index 515bcd9..7717f8a 100644
> --- a/drivers/block/drbd/drbd_nl.c
> +++ b/drivers/block/drbd/drbd_nl.c
> @@ -2297,6 +2297,11 @@ static void drbd_connector_callback(struct cn_msg *req, struct netlink_skb_parms
>  		return;
>  	}
>  
> +	if (current_user_ns() != &init_user_ns) {
[...]
>  	if (!cap_raised(current_cap(), CAP_SYS_ADMIN)) {
[...]

Looks like it is an often pattern.  Maybe move both checks to a
function?


Thanks,

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 05/14] userns: clamp down users of cap_raised
  2011-07-28 23:23   ` Vasiliy Kulikov
@ 2011-07-28 23:51     ` Serge E. Hallyn
  0 siblings, 0 replies; 30+ messages in thread
From: Serge E. Hallyn @ 2011-07-28 23:51 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: Serge Hallyn, dhowells, netdev, containers, linux-kernel, ebiederm

Quoting Vasiliy Kulikov (segooon@gmail.com):
> On Tue, Jul 26, 2011 at 18:58 +0000, Serge Hallyn wrote:
> > From: Serge E. Hallyn <serge.hallyn@canonical.com>
> > 
> > A few modules are using cap_raised(current_cap(), cap) to authorize
> > actions, but the privilege should be applicable against the initial
> > user namespace.  Refuse privilege if the caller is not in init_user_ns.
> > 
> > Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > ---
> >  drivers/block/drbd/drbd_nl.c           |    5 +++++
> >  drivers/md/dm-log-userspace-transfer.c |    3 +++
> >  drivers/staging/pohmelfs/config.c      |    3 +++
> >  drivers/video/uvesafb.c                |    3 +++
> >  4 files changed, 14 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
> > index 515bcd9..7717f8a 100644
> > --- a/drivers/block/drbd/drbd_nl.c
> > +++ b/drivers/block/drbd/drbd_nl.c
> > @@ -2297,6 +2297,11 @@ static void drbd_connector_callback(struct cn_msg *req, struct netlink_skb_parms
> >  		return;
> >  	}
> >  
> > +	if (current_user_ns() != &init_user_ns) {
> [...]
> >  	if (!cap_raised(current_cap(), CAP_SYS_ADMIN)) {
> [...]
> 
> Looks like it is an often pattern.  Maybe move both checks to a
> function?

This pattern is used 4 times (IIRC).  The reason I didn't break it out is
that it's very close to just 'capable(CAP_SYS_ADMIN)', which also checks
for CAP_SYS_ADMIN to the init_user_ns.  But the above, rightly or wrongly,
does not set the PF_SUPERPRIV task flag.  I don't want to advocate usage
of the above, and creating a helper for the above would both further
pollute the capability-related function namespace, and make the above
look more legitimate than I think it is.

Imo 'cap-raised(current_cap(), X)' should not be used at all.  But I
didn't want to deal with that here, just make it user-ns safe.

-serge

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 01/14] add Documentation/namespaces/user_namespace.txt (v3)
  2011-07-26 20:29   ` David Howells
@ 2011-07-29 17:25     ` Serge E. Hallyn
  0 siblings, 0 replies; 30+ messages in thread
From: Serge E. Hallyn @ 2011-07-29 17:25 UTC (permalink / raw)
  To: David Howells
  Cc: Randy Dunlap, linux-kernel, ebiederm, containers, netdev, akpm,
	Serge E. Hallyn

Quoting David Howells (dhowells@redhat.com):
> Randy Dunlap <rdunlap@xenotime.net> wrote:
> 
> > > +Any task in or resource belonging to the initial user namespace will, to this
> > > +new task, appear to belong to UID and GID -1 - which is usually known as
> > 
> > that extra hyphen is confusing.  how about:
> > 
> >                               to UID and GID -1, which is
> 
> 'which are'.
> 
> David

This will hold some info about the design.  Currently it contains
future todos, issues and questions.

Changelog:
   jul 26: incorporate feedback from David Howells.
   jul 29: incorporate feedback from Randy Dunlap.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
---
 Documentation/namespaces/user_namespace.txt |  107 +++++++++++++++++++++++++++
 1 files changed, 107 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/namespaces/user_namespace.txt

diff --git a/Documentation/namespaces/user_namespace.txt b/Documentation/namespaces/user_namespace.txt
new file mode 100644
index 0000000..b0bc480
--- /dev/null
+++ b/Documentation/namespaces/user_namespace.txt
@@ -0,0 +1,107 @@
+Description
+===========
+
+Traditionally, each task is owned by a user ID (UID) and belongs to one or more
+groups (GID).  Both are simple numeric IDs, though userspace usually translates
+them to names.  The user namespace allows tasks to have different views of the
+UIDs and GIDs associated with tasks and other resources.  (See 'UID mapping'
+below for more.)
+
+The user namespace is a simple hierarchical one.  The system starts with all
+tasks belonging to the initial user namespace.  A task creates a new user
+namespace by passing the CLONE_NEWUSER flag to clone(2).  This requires the
+creating task to have the CAP_SETUID, CAP_SETGID, and CAP_CHOWN capabilities,
+but it does not need to be running as root.  The clone(2) call will result in a
+new task which to itself appears to be running as UID and GID 0, but to its
+creator seems to have the creator's credentials.
+
+To this new task, any resource belonging to the initial user namespace will
+appear to belong to user and group 'nobody', which are UID and GID -1.
+Permission to open such files will be granted according to world access
+permissions.  UID comparisons and group membership checks will return false,
+and privilege will be denied.
+
+When a task belonging to (for example) userid 500 in the initial user namespace
+creates a new user namespace, even though the new task will see itself as
+belonging to UID 0, any task in the initial user namespace will see it as
+belonging to UID 500.  Therefore, UID 500 in the initial user namespace will be
+able to kill the new task.  Files created by the new user will (eventually) be
+seen by tasks in its own user namespace as belonging to UID 0, but to tasks in
+the initial user namespace as belonging to UID 500.
+
+Note that this userid mapping for the VFS is not yet implemented, though the
+lkml and containers mailing list archives will show several previous
+prototypes.  In the end, those got hung up waiting on the concept of targeted
+capabilities to be developed, which, thanks to the insight of Eric Biederman,
+they finally did.
+
+Relationship between the User namespace and other namespaces
+============================================================
+
+Other namespaces, such as UTS and network, are owned by a user namespace.  When
+such a namespace is created, it is assigned to the user namespace of the task
+by which it was created.  Therefore, attempts to exercise privilege to
+resources in, for instance, a particular network namespace, can be properly
+validated by checking whether the caller has the needed privilege (i.e.
+CAP_NET_ADMIN) targeted to the user namespace which owns the network namespace.
+This is done using the ns_capable() function.
+
+As an example, if a new task is cloned with a private user namespace but
+no private network namespace, then the task's network namespace is owned
+by the parent user namespace.  The new task has no privilege to the
+parent user namespace, so it will not be able to create or configure
+network devices.  If, instead, the task were cloned with both private
+user and network namespaces, then the private network namespace is owned
+by the private user namespace, and so root in the new user namespace
+will have privilege targeted to the network namespace.  It will be able
+to create and configure network devices.
+
+UID Mapping
+===========
+The current plan (see 'flexible UID mapping' at
+https://wiki.ubuntu.com/UserNamespace) is:
+
+The UID/GID stored on disk will be that in the init_user_ns.  Most likely
+UID/GID in other namespaces will be stored in xattrs.  But Eric was advocating
+(a few years ago) leaving the details up to filesystems while providing a lib/
+stock implementation.  See the thread around here:
+http://www.mail-archive.com/devel@openvz.org/msg09331.html
+
+
+Working notes
+=============
+Capability checks for actions related to syslog must be against the
+init_user_ns until syslog is containerized.
+
+Same is true for reboot and power, control groups, devices, and time.
+
+Perf actions (kernel/event/core.c for instance) will always be constrained to
+init_user_ns.
+
+Q:
+Is accounting considered properly containerized with respect to pidns?  (it
+appears to be).  If so, then we can change the capable() check in
+kernel/acct.c to 'ns_capable(current_pid_ns()->user_ns, CAP_PACCT)'
+
+Q:
+For things like nice and schedaffinity, we could allow root in a container to
+control those, and leave only cgroups to constrain the container.  I'm not sure
+whether that is right, or whether it violates admin expectations.
+
+I deferred some of commoncap.c.  I'm punting on xattr stuff as they take
+dentries, not inodes.
+
+For drivers/tty/tty_io.c and drivers/tty/vt/vt.c, we'll want to (for some of
+them) target the capability checks at the user_ns owning the tty.  That will
+have to wait until we get userns owning files straightened out.
+
+We need to figure out how to label devices.  Should we just toss a user_ns
+right into struct device?
+
+capable(CAP_MAC_ADMIN) checks are always to be against init_user_ns, unless
+some day LSMs were to be containerized, near zero chance.
+
+inode_owner_or_capable() should probably take an optional ns and cap parameter.
+If cap is 0, then CAP_FOWNER is checked.  If ns is NULL, we derive the ns from
+inode.  But if ns is provided, then callers who need to derive
+inode_userns(inode) anyway can save a few cycles.
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 02/14] allow root in container to copy namespaces (v3)
  2011-07-27 23:14   ` Eric W. Biederman
  2011-07-28  2:13     ` Serge E. Hallyn
@ 2011-07-29 17:27     ` Serge E. Hallyn
  2011-08-01 22:25       ` Eric W. Biederman
  1 sibling, 1 reply; 30+ messages in thread
From: Serge E. Hallyn @ 2011-07-29 17:27 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, netdev, containers, dhowells

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Serge Hallyn <serge@hallyn.com> writes:
> 
> > From: Serge E. Hallyn <serge.hallyn@canonical.com>
> >
> > Othewise nested containers with user namespaces won't be possible.
> >
> > It's true that user namespaces are not yet fully isolated, but for
> > that same reason there are far worse things that root in a child
> > user ns can do.  Spawning a child user ns is not in itself bad.
> >
> > This patch also allows setns for root in a container:
> > @Eric Biederman: are there gotchas in allowing setns from child
> > userns?
> 
> Yes.  We need to ensure that the target namespaces are namespaces
> that have been created in from user_namespace or from a child of this
> user_namespace.
> 
> Aka we need to ensure that we have CAP_SYS_ADMIN for the new namespace.

[New patch below]

Othewise nested containers with user namespaces won't be possible.

It's true that user namespaces are not yet fully isolated, but for
that same reason there are far worse things that root in a child
user ns can do.  Spawning a child user ns is not in itself bad.

This patch also allows setns for root in a container:
@Eric Biederman: are there gotchas in allowing setns from child
userns?

Changelog:
  Jul 29: setns: target capability check for setns
          When changing to another namespace, make sure that we have
          the CAP_SYS_ADMIN capability targeted at the user namespace
          owning the new ns.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 ipc/namespace.c          |    3 +++
 kernel/fork.c            |    4 ++--
 kernel/nsproxy.c         |    7 ++-----
 kernel/utsname.c         |    3 +++
 net/core/net_namespace.c |    3 +++
 5 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/ipc/namespace.c b/ipc/namespace.c
index ce0a647..f527e49 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -163,6 +163,9 @@ static void ipcns_put(void *ns)
 
 static int ipcns_install(struct nsproxy *nsproxy, void *ns)
 {
+	struct ipc_namespace *newns = ns;
+	if (!ns_capable(newns->user_ns, CAP_SYS_ADMIN))
+		return -1;
 	/* Ditch state from the old ipc namespace */
 	exit_sem(current);
 	put_ipc_ns(nsproxy->ipc_ns);
diff --git a/kernel/fork.c b/kernel/fork.c
index e7ceaca..f9fac70 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1488,8 +1488,8 @@ long do_fork(unsigned long clone_flags,
 		/* hopefully this check will go away when userns support is
 		 * complete
 		 */
-		if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
-				!capable(CAP_SETGID))
+		if (!nsown_capable(CAP_SYS_ADMIN) || !nsown_capable(CAP_SETUID) ||
+				!nsown_capable(CAP_SETGID))
 			return -EPERM;
 	}
 
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 9aeab4b..cadcee0 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -134,7 +134,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 				CLONE_NEWPID | CLONE_NEWNET)))
 		return 0;
 
-	if (!capable(CAP_SYS_ADMIN)) {
+	if (!nsown_capable(CAP_SYS_ADMIN)) {
 		err = -EPERM;
 		goto out;
 	}
@@ -191,7 +191,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
 			       CLONE_NEWNET)))
 		return 0;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!nsown_capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
 	*new_nsp = create_new_namespaces(unshare_flags, current,
@@ -241,9 +241,6 @@ SYSCALL_DEFINE2(setns, int, fd, int, nstype)
 	struct file *file;
 	int err;
 
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
 	file = proc_ns_fget(fd);
 	if (IS_ERR(file))
 		return PTR_ERR(file);
diff --git a/kernel/utsname.c b/kernel/utsname.c
index bff131b..8f648cc 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -104,6 +104,9 @@ static void utsns_put(void *ns)
 
 static int utsns_install(struct nsproxy *nsproxy, void *ns)
 {
+	struct uts_namespace *newns = ns;
+	if (!ns_capable(newns->user_ns, CAP_SYS_ADMIN))
+		return -1;
 	get_uts_ns(ns);
 	put_uts_ns(nsproxy->uts_ns);
 	nsproxy->uts_ns = ns;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 5bbdbf0..90c97f6 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -620,6 +620,9 @@ static void netns_put(void *ns)
 
 static int netns_install(struct nsproxy *nsproxy, void *ns)
 {
+	struct net *net = ns;
+	if (!ns_capable(net->user_ns, CAP_SYS_ADMIN))
+		return -1;
 	put_net(nsproxy->net_ns);
 	nsproxy->net_ns = get_net(ns);
 	return 0;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 02/14] allow root in container to copy namespaces (v3)
  2011-07-29 17:27     ` [PATCH 02/14] allow root in container to copy namespaces (v3) Serge E. Hallyn
@ 2011-08-01 22:25       ` Eric W. Biederman
       [not found]         ` <m1ei146a6t.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2011-08-01 22:25 UTC (permalink / raw)
  To: Serge E. Hallyn, Serge E. Hallyn
  Cc: dhowells, netdev, containers, linux-kernel

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> Serge Hallyn <serge@hallyn.com> writes:
>> 
>> > From: Serge E. Hallyn <serge.hallyn@canonical.com>
>> >
>> > Othewise nested containers with user namespaces won't be possible.
>> >
>> > It's true that user namespaces are not yet fully isolated, but for
>> > that same reason there are far worse things that root in a child
>> > user ns can do.  Spawning a child user ns is not in itself bad.
>> >
>> > This patch also allows setns for root in a container:
>> > @Eric Biederman: are there gotchas in allowing setns from child
>> > userns?
>> 
>> Yes.  We need to ensure that the target namespaces are namespaces
>> that have been created in from user_namespace or from a child of this
>> user_namespace.
>> 
>> Aka we need to ensure that we have CAP_SYS_ADMIN for the new namespace.
>
> [New patch below]
>
> Othewise nested containers with user namespaces won't be possible.
>
> It's true that user namespaces are not yet fully isolated, but for
> that same reason there are far worse things that root in a child
> user ns can do.  Spawning a child user ns is not in itself bad.
>
> This patch also allows setns for root in a container:
> @Eric Biederman: are there gotchas in allowing setns from child
> userns?

The dangers of changing the namespace of a process remain the same,
confused suid programs.  I don't believe there are any unique new
dangers. 

Not allowing joining namespaces you already have a copy of is just
a matter of making it hard to get things wrong.

I would feel more a bit more comfortable if the way we did this was
to move all of the capable calls into the per namespace methods
and then changed them one namespace at a time.  I don't think
there are any fundmanetal dangers of allowing unshare without
the global CAP_SYS_ADMIN, but it would be good to be able to audit
and make or revoke the decision one namespace at a time.

Eric


> Changelog:
>   Jul 29: setns: target capability check for setns
>           When changing to another namespace, make sure that we have
>           the CAP_SYS_ADMIN capability targeted at the user namespace
>           owning the new ns.
>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  ipc/namespace.c          |    3 +++
>  kernel/fork.c            |    4 ++--
>  kernel/nsproxy.c         |    7 ++-----
>  kernel/utsname.c         |    3 +++
>  net/core/net_namespace.c |    3 +++
>  5 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/ipc/namespace.c b/ipc/namespace.c
> index ce0a647..f527e49 100644
> --- a/ipc/namespace.c
> +++ b/ipc/namespace.c
> @@ -163,6 +163,9 @@ static void ipcns_put(void *ns)
>  
>  static int ipcns_install(struct nsproxy *nsproxy, void *ns)
>  {
> +	struct ipc_namespace *newns = ns;
> +	if (!ns_capable(newns->user_ns, CAP_SYS_ADMIN))
> +		return -1;
>  	/* Ditch state from the old ipc namespace */
>  	exit_sem(current);
>  	put_ipc_ns(nsproxy->ipc_ns);
> diff --git a/kernel/fork.c b/kernel/fork.c
> index e7ceaca..f9fac70 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1488,8 +1488,8 @@ long do_fork(unsigned long clone_flags,
>  		/* hopefully this check will go away when userns support is
>  		 * complete
>  		 */
> -		if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
> -				!capable(CAP_SETGID))
> +		if (!nsown_capable(CAP_SYS_ADMIN) || !nsown_capable(CAP_SETUID) ||
> +				!nsown_capable(CAP_SETGID))
>  			return -EPERM;
>  	}
>  
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index 9aeab4b..cadcee0 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -134,7 +134,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
>  				CLONE_NEWPID | CLONE_NEWNET)))
>  		return 0;
>  
> -	if (!capable(CAP_SYS_ADMIN)) {
> +	if (!nsown_capable(CAP_SYS_ADMIN)) {
>  		err = -EPERM;
>  		goto out;
>  	}
> @@ -191,7 +191,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
>  			       CLONE_NEWNET)))
>  		return 0;
>  
> -	if (!capable(CAP_SYS_ADMIN))
> +	if (!nsown_capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
>  	*new_nsp = create_new_namespaces(unshare_flags, current,
> @@ -241,9 +241,6 @@ SYSCALL_DEFINE2(setns, int, fd, int, nstype)
>  	struct file *file;
>  	int err;
>  
> -	if (!capable(CAP_SYS_ADMIN))
> -		return -EPERM;
> -
>  	file = proc_ns_fget(fd);
>  	if (IS_ERR(file))
>  		return PTR_ERR(file);
> diff --git a/kernel/utsname.c b/kernel/utsname.c
> index bff131b..8f648cc 100644
> --- a/kernel/utsname.c
> +++ b/kernel/utsname.c
> @@ -104,6 +104,9 @@ static void utsns_put(void *ns)
>  
>  static int utsns_install(struct nsproxy *nsproxy, void *ns)
>  {
> +	struct uts_namespace *newns = ns;
> +	if (!ns_capable(newns->user_ns, CAP_SYS_ADMIN))
> +		return -1;
>  	get_uts_ns(ns);
>  	put_uts_ns(nsproxy->uts_ns);
>  	nsproxy->uts_ns = ns;
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 5bbdbf0..90c97f6 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -620,6 +620,9 @@ static void netns_put(void *ns)
>  
>  static int netns_install(struct nsproxy *nsproxy, void *ns)
>  {
> +	struct net *net = ns;
> +	if (!ns_capable(net->user_ns, CAP_SYS_ADMIN))
> +		return -1;
>  	put_net(nsproxy->net_ns);
>  	nsproxy->net_ns = get_net(ns);
>  	return 0;

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 02/14] allow root in container to copy namespaces (v3)
       [not found]         ` <m1ei146a6t.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2011-08-02 14:08           ` Serge E. Hallyn
  2011-08-02 22:03             ` Eric W. Biederman
  0 siblings, 1 reply; 30+ messages in thread
From: Serge E. Hallyn @ 2011-08-02 14:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: dhowells-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> The dangers of changing the namespace of a process remain the same,
> confused suid programs.  I don't believe there are any unique new
> dangers. 
> 
> Not allowing joining namespaces you already have a copy of is just
> a matter of making it hard to get things wrong.
> 
> I would feel more a bit more comfortable if the way we did this was
> to move all of the capable calls into the per namespace methods
> and then changed them one namespace at a time.  I don't think

The patch belows moves them into the per namespace methods, for
what it's worth.  If you like I can change them, for now, to
'capable(CAP_SYS_ADMIN)' targeted at init_user_ns, but if we're
targetting at the userns owning the destination namespace, it
seems this must be sufficient...

> there are any fundmanetal dangers of allowing unshare without
> the global CAP_SYS_ADMIN, but it would be good to be able to audit

If you have suspicions that there may in fact be dangers, then
perhaps this whole patch should be delayed, and copy_namespaces()
and unshare_nsproxy_namespaces() should continue to check global
CAP_SYS_ADMIN?  The only part which would remain would be the
moving of the setns capable check into the per-ns ->install
method, but it would check the global CAP_SYS_ADMIN?

> and make or revoke the decision one namespace at a time.
> 
> Eric
> 
> 
> > Changelog:
> >   Jul 29: setns: target capability check for setns
> >           When changing to another namespace, make sure that we have
> >           the CAP_SYS_ADMIN capability targeted at the user namespace
> >           owning the new ns.
> >
> > Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> > Cc: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> > ---
> >  ipc/namespace.c          |    3 +++
> >  kernel/fork.c            |    4 ++--
> >  kernel/nsproxy.c         |    7 ++-----
> >  kernel/utsname.c         |    3 +++
> >  net/core/net_namespace.c |    3 +++
> >  5 files changed, 13 insertions(+), 7 deletions(-)
> >
> > diff --git a/ipc/namespace.c b/ipc/namespace.c
> > index ce0a647..f527e49 100644
> > --- a/ipc/namespace.c
> > +++ b/ipc/namespace.c
> > @@ -163,6 +163,9 @@ static void ipcns_put(void *ns)
> >  
> >  static int ipcns_install(struct nsproxy *nsproxy, void *ns)
> >  {
> > +	struct ipc_namespace *newns = ns;
> > +	if (!ns_capable(newns->user_ns, CAP_SYS_ADMIN))
> > +		return -1;
> >  	/* Ditch state from the old ipc namespace */
> >  	exit_sem(current);
> >  	put_ipc_ns(nsproxy->ipc_ns);
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index e7ceaca..f9fac70 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -1488,8 +1488,8 @@ long do_fork(unsigned long clone_flags,
> >  		/* hopefully this check will go away when userns support is
> >  		 * complete
> >  		 */
> > -		if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
> > -				!capable(CAP_SETGID))
> > +		if (!nsown_capable(CAP_SYS_ADMIN) || !nsown_capable(CAP_SETUID) ||
> > +				!nsown_capable(CAP_SETGID))
> >  			return -EPERM;
> >  	}
> >  
> > diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> > index 9aeab4b..cadcee0 100644
> > --- a/kernel/nsproxy.c
> > +++ b/kernel/nsproxy.c
> > @@ -134,7 +134,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
> >  				CLONE_NEWPID | CLONE_NEWNET)))
> >  		return 0;
> >  
> > -	if (!capable(CAP_SYS_ADMIN)) {
> > +	if (!nsown_capable(CAP_SYS_ADMIN)) {
> >  		err = -EPERM;
> >  		goto out;
> >  	}
> > @@ -191,7 +191,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
> >  			       CLONE_NEWNET)))
> >  		return 0;
> >  
> > -	if (!capable(CAP_SYS_ADMIN))
> > +	if (!nsown_capable(CAP_SYS_ADMIN))
> >  		return -EPERM;
> >  
> >  	*new_nsp = create_new_namespaces(unshare_flags, current,
> > @@ -241,9 +241,6 @@ SYSCALL_DEFINE2(setns, int, fd, int, nstype)
> >  	struct file *file;
> >  	int err;
> >  
> > -	if (!capable(CAP_SYS_ADMIN))
> > -		return -EPERM;
> > -
> >  	file = proc_ns_fget(fd);
> >  	if (IS_ERR(file))
> >  		return PTR_ERR(file);
> > diff --git a/kernel/utsname.c b/kernel/utsname.c
> > index bff131b..8f648cc 100644
> > --- a/kernel/utsname.c
> > +++ b/kernel/utsname.c
> > @@ -104,6 +104,9 @@ static void utsns_put(void *ns)
> >  
> >  static int utsns_install(struct nsproxy *nsproxy, void *ns)
> >  {
> > +	struct uts_namespace *newns = ns;
> > +	if (!ns_capable(newns->user_ns, CAP_SYS_ADMIN))
> > +		return -1;
> >  	get_uts_ns(ns);
> >  	put_uts_ns(nsproxy->uts_ns);
> >  	nsproxy->uts_ns = ns;
> > diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> > index 5bbdbf0..90c97f6 100644
> > --- a/net/core/net_namespace.c
> > +++ b/net/core/net_namespace.c
> > @@ -620,6 +620,9 @@ static void netns_put(void *ns)
> >  
> >  static int netns_install(struct nsproxy *nsproxy, void *ns)
> >  {
> > +	struct net *net = ns;
> > +	if (!ns_capable(net->user_ns, CAP_SYS_ADMIN))
> > +		return -1;
> >  	put_net(nsproxy->net_ns);
> >  	nsproxy->net_ns = get_net(ns);
> >  	return 0;

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 02/14] allow root in container to copy namespaces (v3)
  2011-08-02 14:08           ` Serge E. Hallyn
@ 2011-08-02 22:03             ` Eric W. Biederman
  2011-08-04 22:01               ` Serge E. Hallyn
  0 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2011-08-02 22:03 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Serge E. Hallyn, dhowells, netdev, containers, linux-kernel

"Serge E. Hallyn" <serge.hallyn@canonical.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> The dangers of changing the namespace of a process remain the same,
>> confused suid programs.  I don't believe there are any unique new
>> dangers. 
>> 
>> Not allowing joining namespaces you already have a copy of is just
>> a matter of making it hard to get things wrong.
>> 
>> I would feel more a bit more comfortable if the way we did this was
>> to move all of the capable calls into the per namespace methods
>> and then changed them one namespace at a time.  I don't think
>
> The patch belows moves them into the per namespace methods, for
> what it's worth.  If you like I can change them, for now, to
> 'capable(CAP_SYS_ADMIN)' targeted at init_user_ns, but if we're
> targetting at the userns owning the destination namespace, it
> seems this must be sufficient...

I like the was this was done.  I was mostly thinking of the non
setns case when I was talking about moving the calls.

>> there are any fundmanetal dangers of allowing unshare without
>> the global CAP_SYS_ADMIN, but it would be good to be able to audit
>
> If you have suspicions that there may in fact be dangers, then
> perhaps this whole patch should be delayed, and copy_namespaces()
> and unshare_nsproxy_namespaces() should continue to check global
> CAP_SYS_ADMIN?  The only part which would remain would be the
> moving of the setns capable check into the per-ns ->install
> method, but it would check the global CAP_SYS_ADMIN?

Yes.  I am in favor of delaying this and making the changes one
namespace at a time.  I don't think there are real dangers but I do
think we should try and think through the possible dangers.

There should not be any dangers but at the same time it is easy to think
only root can do X, so who cares if the code isn't quite perfect.

So when we drop the only root can do X to be responsible we should make
an effort to review the code.

Eric

>> and make or revoke the decision one namespace at a time.
>> 
>> Eric
>> 
>> 
>> > Changelog:
>> >   Jul 29: setns: target capability check for setns
>> >           When changing to another namespace, make sure that we have
>> >           the CAP_SYS_ADMIN capability targeted at the user namespace
>> >           owning the new ns.
>> >
>> > Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
>> > Cc: Eric W. Biederman <ebiederm@xmission.com>
>> > ---
>> >  ipc/namespace.c          |    3 +++
>> >  kernel/fork.c            |    4 ++--
>> >  kernel/nsproxy.c         |    7 ++-----
>> >  kernel/utsname.c         |    3 +++
>> >  net/core/net_namespace.c |    3 +++
>> >  5 files changed, 13 insertions(+), 7 deletions(-)
>> >
>> > diff --git a/ipc/namespace.c b/ipc/namespace.c
>> > index ce0a647..f527e49 100644
>> > --- a/ipc/namespace.c
>> > +++ b/ipc/namespace.c
>> > @@ -163,6 +163,9 @@ static void ipcns_put(void *ns)
>> >  
>> >  static int ipcns_install(struct nsproxy *nsproxy, void *ns)
>> >  {
>> > +	struct ipc_namespace *newns = ns;
>> > +	if (!ns_capable(newns->user_ns, CAP_SYS_ADMIN))
>> > +		return -1;
>> >  	/* Ditch state from the old ipc namespace */
>> >  	exit_sem(current);
>> >  	put_ipc_ns(nsproxy->ipc_ns);
>> > diff --git a/kernel/fork.c b/kernel/fork.c
>> > index e7ceaca..f9fac70 100644
>> > --- a/kernel/fork.c
>> > +++ b/kernel/fork.c
>> > @@ -1488,8 +1488,8 @@ long do_fork(unsigned long clone_flags,
>> >  		/* hopefully this check will go away when userns support is
>> >  		 * complete
>> >  		 */
>> > -		if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
>> > -				!capable(CAP_SETGID))
>> > +		if (!nsown_capable(CAP_SYS_ADMIN) || !nsown_capable(CAP_SETUID) ||
>> > +				!nsown_capable(CAP_SETGID))
>> >  			return -EPERM;
>> >  	}
>> >  
>> > diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
>> > index 9aeab4b..cadcee0 100644
>> > --- a/kernel/nsproxy.c
>> > +++ b/kernel/nsproxy.c
>> > @@ -134,7 +134,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
>> >  				CLONE_NEWPID | CLONE_NEWNET)))
>> >  		return 0;
>> >  
>> > -	if (!capable(CAP_SYS_ADMIN)) {
>> > +	if (!nsown_capable(CAP_SYS_ADMIN)) {
>> >  		err = -EPERM;
>> >  		goto out;
>> >  	}
>> > @@ -191,7 +191,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
>> >  			       CLONE_NEWNET)))
>> >  		return 0;
>> >  
>> > -	if (!capable(CAP_SYS_ADMIN))
>> > +	if (!nsown_capable(CAP_SYS_ADMIN))
>> >  		return -EPERM;
>> >  
>> >  	*new_nsp = create_new_namespaces(unshare_flags, current,
>> > @@ -241,9 +241,6 @@ SYSCALL_DEFINE2(setns, int, fd, int, nstype)
>> >  	struct file *file;
>> >  	int err;
>> >  
>> > -	if (!capable(CAP_SYS_ADMIN))
>> > -		return -EPERM;
>> > -
>> >  	file = proc_ns_fget(fd);
>> >  	if (IS_ERR(file))
>> >  		return PTR_ERR(file);
>> > diff --git a/kernel/utsname.c b/kernel/utsname.c
>> > index bff131b..8f648cc 100644
>> > --- a/kernel/utsname.c
>> > +++ b/kernel/utsname.c
>> > @@ -104,6 +104,9 @@ static void utsns_put(void *ns)
>> >  
>> >  static int utsns_install(struct nsproxy *nsproxy, void *ns)
>> >  {
>> > +	struct uts_namespace *newns = ns;
>> > +	if (!ns_capable(newns->user_ns, CAP_SYS_ADMIN))
>> > +		return -1;
>> >  	get_uts_ns(ns);
>> >  	put_uts_ns(nsproxy->uts_ns);
>> >  	nsproxy->uts_ns = ns;
>> > diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>> > index 5bbdbf0..90c97f6 100644
>> > --- a/net/core/net_namespace.c
>> > +++ b/net/core/net_namespace.c
>> > @@ -620,6 +620,9 @@ static void netns_put(void *ns)
>> >  
>> >  static int netns_install(struct nsproxy *nsproxy, void *ns)
>> >  {
>> > +	struct net *net = ns;
>> > +	if (!ns_capable(net->user_ns, CAP_SYS_ADMIN))
>> > +		return -1;
>> >  	put_net(nsproxy->net_ns);
>> >  	nsproxy->net_ns = get_net(ns);
>> >  	return 0;

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 02/14] allow root in container to copy namespaces (v3)
  2011-08-02 22:03             ` Eric W. Biederman
@ 2011-08-04 22:01               ` Serge E. Hallyn
  0 siblings, 0 replies; 30+ messages in thread
From: Serge E. Hallyn @ 2011-08-04 22:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, dhowells, netdev, containers, linux-kernel

Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serge.hallyn@canonical.com> writes:
> 
> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> >> The dangers of changing the namespace of a process remain the same,
> >> confused suid programs.  I don't believe there are any unique new
> >> dangers. 
> >> 
> >> Not allowing joining namespaces you already have a copy of is just
> >> a matter of making it hard to get things wrong.
> >> 
> >> I would feel more a bit more comfortable if the way we did this was
> >> to move all of the capable calls into the per namespace methods
> >> and then changed them one namespace at a time.  I don't think
> >
> > The patch belows moves them into the per namespace methods, for
> > what it's worth.  If you like I can change them, for now, to
> > 'capable(CAP_SYS_ADMIN)' targeted at init_user_ns, but if we're
> > targetting at the userns owning the destination namespace, it
> > seems this must be sufficient...
> 
> I like the was this was done.  I was mostly thinking of the non
> setns case when I was talking about moving the calls.

Oh, you mean unshare and copy namespaces?

(The flow on those paths is scary to touch :)

> >> there are any fundmanetal dangers of allowing unshare without
> >> the global CAP_SYS_ADMIN, but it would be good to be able to audit
> >
> > If you have suspicions that there may in fact be dangers, then
> > perhaps this whole patch should be delayed, and copy_namespaces()
> > and unshare_nsproxy_namespaces() should continue to check global
> > CAP_SYS_ADMIN?  The only part which would remain would be the
> > moving of the setns capable check into the per-ns ->install
> > method, but it would check the global CAP_SYS_ADMIN?
> 
> Yes.  I am in favor of delaying this and making the changes one
> namespace at a time.  I don't think there are real dangers but I do
> think we should try and think through the possible dangers.

Ok, so for now here is a patch to fold into the previous one
which I think sets us at a reasonable point.

>From 78e1a4efa464086e8df95fc3ffd35c385e363957 Mon Sep 17 00:00:00 2001
From: Serge Hallyn <serge.hallyn@canonical.com>
Date: Thu, 4 Aug 2011 22:10:12 +0100
Subject: [PATCH 1/2] fold up - dont yet target the capable checks for
 namespace manipulation

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 ipc/namespace.c          |    4 ++++
 kernel/fork.c            |    5 +++++
 kernel/nsproxy.c         |    8 ++++++++
 kernel/utsname.c         |    4 ++++
 net/core/net_namespace.c |    4 ++++
 5 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/ipc/namespace.c b/ipc/namespace.c
index f527e49..a0a7609 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -163,8 +163,12 @@ static void ipcns_put(void *ns)
 
 static int ipcns_install(struct nsproxy *nsproxy, void *ns)
 {
+#if 0
 	struct ipc_namespace *newns = ns;
 	if (!ns_capable(newns->user_ns, CAP_SYS_ADMIN))
+#else
+	if (!capable(CAP_SYS_ADMIN))
+#endif
 		return -1;
 	/* Ditch state from the old ipc namespace */
 	exit_sem(current);
diff --git a/kernel/fork.c b/kernel/fork.c
index f9fac70..a25343c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1488,8 +1488,13 @@ long do_fork(unsigned long clone_flags,
 		/* hopefully this check will go away when userns support is
 		 * complete
 		 */
+#if 0
 		if (!nsown_capable(CAP_SYS_ADMIN) || !nsown_capable(CAP_SETUID) ||
 				!nsown_capable(CAP_SETGID))
+#else
+		if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
+				!capable(CAP_SETGID))
+#endif
 			return -EPERM;
 	}
 
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 62a995d..752b477 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -136,7 +136,11 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 				CLONE_NEWPID | CLONE_NEWNET)))
 		return 0;
 
+#if 0
 	if (!nsown_capable(CAP_SYS_ADMIN)) {
+#else
+	if (!capable(CAP_SYS_ADMIN)) {
+#endif
 		err = -EPERM;
 		goto out;
 	}
@@ -193,7 +197,11 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
 			       CLONE_NEWNET)))
 		return 0;
 
+#if 0
 	if (!nsown_capable(CAP_SYS_ADMIN))
+#else
+	if (!capable(CAP_SYS_ADMIN))
+#endif
 		return -EPERM;
 
 	*new_nsp = create_new_namespaces(unshare_flags, current,
diff --git a/kernel/utsname.c b/kernel/utsname.c
index 8f648cc..4638a54 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -104,8 +104,12 @@ static void utsns_put(void *ns)
 
 static int utsns_install(struct nsproxy *nsproxy, void *ns)
 {
+#if 0
 	struct uts_namespace *newns = ns;
 	if (!ns_capable(newns->user_ns, CAP_SYS_ADMIN))
+#else
+	if (!capable(CAP_SYS_ADMIN))
+#endif
 		return -1;
 	get_uts_ns(ns);
 	put_uts_ns(nsproxy->uts_ns);
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 8778a0a..5ca95cc 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -623,8 +623,12 @@ static void netns_put(void *ns)
 
 static int netns_install(struct nsproxy *nsproxy, void *ns)
 {
+#if 0
 	struct net *net = ns;
 	if (!ns_capable(net->user_ns, CAP_SYS_ADMIN))
+#else
+	if (capable(CAP_SYS_ADMIN))
+#endif
 		return -1;
 	put_net(nsproxy->net_ns);
 	nsproxy->net_ns = get_net(ns);
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 10/14] net/core/scm.c: target capable() calls to user_ns owning the net_ns
  2011-07-26 18:58 ` [PATCH 10/14] net/core/scm.c: target capable() calls to user_ns owning the net_ns Serge Hallyn
@ 2011-08-04 22:06   ` Serge E. Hallyn
  0 siblings, 0 replies; 30+ messages in thread
From: Serge E. Hallyn @ 2011-08-04 22:06 UTC (permalink / raw)
  To: Serge Hallyn; +Cc: linux-kernel, dhowells, ebiederm, containers, netdev, akpm

Quoting Serge Hallyn (serge@hallyn.com):
> From: Serge E. Hallyn <serge.hallyn@canonical.com>
> 
> The uid/gid comparisons don't have to be pulled out.  This just seemed
> more easily proved correct.

The following needs to be folded into this patch:

From: Serge Hallyn <serge.hallyn@canonical.com>
Date: Thu, 4 Aug 2011 21:48:13 +0000
Subject: [PATCH 2/2] fold up - net/core/scm.c: cred is const

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 net/core/scm.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/scm.c b/net/core/scm.c
index 21b5d0b..528fa36 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -43,7 +43,7 @@
  *	setu(g)id.
  */
 
-static __inline__ bool uidequiv(struct cred *src, struct ucred *tgt,
+static __inline__ bool uidequiv(const struct cred *src, struct ucred *tgt,
 			       struct user_namespace *ns)
 {
 	if (src->user_ns != ns)
@@ -57,7 +57,7 @@ check_capable:
 	return false;
 }
 
-static __inline__ bool gidequiv(struct cred *src, struct ucred *tgt,
+static __inline__ bool gidequiv(const struct cred *src, struct ucred *tgt,
 			       struct user_namespace *ns)
 {
 	if (src->user_ns != ns)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2011-08-04 22:06 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-26 18:58 [PATCH 0/14] user namespaces v2: continue targetting capabilities Serge Hallyn
2011-07-26 18:58 ` [PATCH 01/14] add Documentation/namespaces/user_namespace.txt Serge Hallyn
2011-07-26 20:22   ` Randy Dunlap
2011-07-27 15:38     ` Serge E. Hallyn
2011-07-27 16:02       ` Randy Dunlap
2011-07-26 20:29   ` David Howells
2011-07-29 17:25     ` [PATCH 01/14] add Documentation/namespaces/user_namespace.txt (v3) Serge E. Hallyn
2011-07-26 18:58 ` [PATCH 02/14] allow root in container to copy namespaces Serge Hallyn
2011-07-27 23:14   ` Eric W. Biederman
2011-07-28  2:13     ` Serge E. Hallyn
2011-07-29 17:27     ` [PATCH 02/14] allow root in container to copy namespaces (v3) Serge E. Hallyn
2011-08-01 22:25       ` Eric W. Biederman
     [not found]         ` <m1ei146a6t.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2011-08-02 14:08           ` Serge E. Hallyn
2011-08-02 22:03             ` Eric W. Biederman
2011-08-04 22:01               ` Serge E. Hallyn
2011-07-26 18:58 ` [PATCH 03/14] keyctl: check capabilities against key's user_ns Serge Hallyn
2011-07-26 18:58 ` [PATCH 04/14] user_ns: convert fs/attr.c to targeted capabilities Serge Hallyn
2011-07-26 18:58 ` [PATCH 05/14] userns: clamp down users of cap_raised Serge Hallyn
2011-07-28 23:23   ` Vasiliy Kulikov
2011-07-28 23:51     ` Serge E. Hallyn
2011-07-26 18:58 ` [PATCH 06/14] user namespace: make each net (net_ns) belong to a user_ns Serge Hallyn
2011-07-26 18:58 ` [PATCH 07/14] user namespace: use net->user_ns for some capable calls under net/ Serge Hallyn
2011-07-26 18:58 ` [PATCH 08/14] af_netlink.c: make netlink_capable userns-aware Serge Hallyn
2011-07-26 18:58 ` [PATCH 09/14] user ns: convert ipv6 to targeted capabilities Serge Hallyn
2011-07-26 18:58 ` [PATCH 10/14] net/core/scm.c: target capable() calls to user_ns owning the net_ns Serge Hallyn
2011-08-04 22:06   ` Serge E. Hallyn
2011-07-26 18:58 ` [PATCH 11/14] userns: make some net-sysfs capable calls targeted Serge Hallyn
2011-07-26 18:58 ` [PATCH 12/14] user_ns: target af_key capability check Serge Hallyn
2011-07-26 18:58 ` [PATCH 13/14] userns: net: make many network capable calls targeted Serge Hallyn
2011-07-26 18:58 ` [PATCH 14/14] net: pass user_ns to cap_netlink_recv() Serge Hallyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).