[PATCH 0/5 RFC] Add an interface to discover relationships between namespaces


* [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
@ 2016-07-14 18:20 ` Andrey Vagin
  0 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-14 18:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-api, containers, criu, linux-fsdevel, Andrey Vagin,
	Eric W. Biederman, James Bottomley, Michael Kerrisk (man-pages),
	W. Trevor King, Alexander Viro, Serge Hallyn

Each namespace has an owning user namespace, but there is currently no way
to discover these relationships.

Pid and user namespaces are hierarchical. There is no way to discover
parent-child relationships either.

Why might we want to know the relationships between namespaces?

One use would be visualization, in order to understand the running system.
Another would be to answer the question: what capability does process X have to
perform operations on a resource governed by namespace Y?

One more use case (usually called abnormal) is checkpoint/restart.
In CRIU we are going to dump and restore nested namespaces.

There was a discussion [1] about which interface to choose to determine
relationships between namespaces.

Eric suggested adding two ioctls [2]:
> Grumble, Grumble.  I think this may actually a case for creating ioctls
> for these two cases.  Now that random nsfs file descriptors are bind
> mountable the original reason for using proc files is not as pressing.
>
> One ioctl for the user namespace that owns a file descriptor.
> One ioctl for the parent namespace of a namespace file descriptor.

Here is an implementation of these ioctls.
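
For illustration only, here is a minimal userspace sketch of how the two
ioctls might be called. The request names and numbers (NSIO, NS_GET_USERNS,
NS_GET_PARENT) are an assumption taken from the mainline <linux/nsfs.h>; this
cover letter does not state them, and the helper names are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Assumption: names and numbers as in mainline <linux/nsfs.h>;
 * the cover letter itself does not state them.
 */
#ifndef NS_GET_USERNS
#define NSIO		0xb7
#define NS_GET_USERNS	_IO(NSIO, 0x1)
#define NS_GET_PARENT	_IO(NSIO, 0x2)
#endif

/* Both helpers return a new namespace fd, or -1 with errno set. */
int ns_get_owner(int ns_fd)
{
	return ioctl(ns_fd, NS_GET_USERNS);
}

int ns_get_parent(int ns_fd)
{
	return ioctl(ns_fd, NS_GET_PARENT);
}

/* Hypothetical helper: print the inode of the user namespace owning @path. */
int print_owner(const char *path)
{
	struct stat st;
	int fd, owner, ret = -1;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	owner = ns_get_owner(fd);
	if (owner >= 0) {
		/* The returned fd refers to an nsfs inode, like /proc/PID/ns/user. */
		if (fstat(owner, &st) == 0) {
			printf("%s is owned by user:[%lu]\n", path,
			       (unsigned long)st.st_ino);
			ret = 0;
		}
		close(owner);
	}
	close(fd);
	return ret;
}
```

A caller could pass a path such as /proc/self/ns/pid to print_owner(). On a
kernel without these ioctls the call is expected to fail and the helper
returns -1.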

[1] https://lkml.org/lkml/2016/7/6/158
[2] https://lkml.org/lkml/2016/7/9/101

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: "W. Trevor King" <wking@tremily.us>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Serge Hallyn <serge.hallyn@canonical.com>

--
2.5.5

* [PATCH 1/5] namespaces: move user_ns into ns_common
       [not found] ` <1468520419-28220-1-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2016-07-14 18:20   ` Andrey Vagin
  2016-07-14 18:20     ` Andrey Vagin
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-14 18:20 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: criu-GEFAQzZX7r8dnm+yROfE0A, linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrey Vagin

Every namespace has a pointer to the user namespace where it was created,
but these pointers are all privately embedded in the individual
namespace-specific structures.

Now we are going to add a user-space interface to get the owning user
namespace, so it looks reasonable to move the pointer into ns_common.

Originally this idea was suggested by James Bottomley.
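
As a rough illustration of the layout this patch aims for, the sketch below
uses simplified stand-in structures, not the real kernel definitions. It
shows that once user_ns lives in ns_common, one generic accessor (the
hypothetical ns_owner()) works for every namespace type, and that the union
in struct user_namespace makes ->parent and ->ns.user_ns the same storage.
The field selection and the static assertion are assumptions made for the
example.

```c
#include <assert.h>
#include <stddef.h>

struct user_namespace;

/* Simplified stand-in for struct ns_common after this patch. */
struct ns_common {
	struct user_namespace *user_ns;	/* owning user namespace */
	unsigned int inum;
};

/* Any other namespace type just embeds ns_common. */
struct uts_namespace {
	char nodename[65];
	struct ns_common ns;
};

/* For a user namespace the owner is its parent, so the two fields alias. */
struct user_namespace {
	int level;
	union {
		struct user_namespace *parent;
		struct ns_common ns;
	};
};

/* Compile-time check that the aliasing really holds in this model. */
_Static_assert(offsetof(struct user_namespace, parent) ==
	       offsetof(struct user_namespace, ns.user_ns),
	       "->parent and ->ns.user_ns must alias");

/* Hypothetical generic accessor: works on any embedded ns_common. */
struct user_namespace *ns_owner(const struct ns_common *ns)
{
	return ns->user_ns;
}
```

With this layout, generic code that only sees a struct ns_common pointer can
find the owning user namespace without knowing the concrete namespace type.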

Signed-off-by: Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
---
 drivers/net/bonding/bond_main.c         |  2 +-
 drivers/net/tun.c                       |  4 ++--
 fs/mount.h                              |  1 -
 fs/namespace.c                          | 14 +++++++-------
 fs/pnode.c                              |  4 ++--
 fs/proc/root.c                          |  2 +-
 include/linux/cgroup.h                  |  1 -
 include/linux/ipc_namespace.h           |  3 ---
 include/linux/ns_common.h               |  1 +
 include/linux/pid_namespace.h           |  1 -
 include/linux/user_namespace.h          |  8 ++++++--
 include/linux/utsname.h                 |  1 -
 include/net/net_namespace.h             |  1 -
 init/version.c                          |  2 +-
 ipc/mqueue.c                            |  2 +-
 ipc/msgutil.c                           |  2 +-
 ipc/namespace.c                         |  6 +++---
 ipc/shm.c                               |  2 +-
 ipc/util.c                              |  4 ++--
 kernel/cgroup.c                         | 12 ++++++------
 kernel/pid.c                            |  2 +-
 kernel/pid_namespace.c                  |  8 ++++----
 kernel/reboot.c                         |  2 +-
 kernel/sys.c                            |  4 ++--
 kernel/user_namespace.c                 |  4 ++++
 kernel/utsname.c                        |  6 +++---
 net/8021q/vlan.c                        | 12 ++++++------
 net/bridge/br_ioctl.c                   | 22 +++++++++++-----------
 net/bridge/br_sysfs_br.c                |  4 ++--
 net/bridge/br_sysfs_if.c                |  2 +-
 net/bridge/netfilter/ebtables.c         |  8 ++++----
 net/core/dev_ioctl.c                    |  4 ++--
 net/core/ethtool.c                      |  2 +-
 net/core/neighbour.c                    |  2 +-
 net/core/net-sysfs.c                    |  6 +++---
 net/core/net_namespace.c                |  6 +++---
 net/core/rtnetlink.c                    |  6 +++---
 net/core/scm.c                          |  2 +-
 net/core/sock.c                         | 10 +++++-----
 net/core/sock_diag.c                    |  2 +-
 net/core/sysctl_net_core.c              |  2 +-
 net/ieee802154/6lowpan/reassembly.c     |  2 +-
 net/ieee802154/socket.c                 |  8 ++++----
 net/ipv4/af_inet.c                      |  4 ++--
 net/ipv4/arp.c                          |  2 +-
 net/ipv4/devinet.c                      |  4 ++--
 net/ipv4/fib_frontend.c                 |  2 +-
 net/ipv4/ip_options.c                   |  6 +++---
 net/ipv4/ip_sockglue.c                  |  6 +++---
 net/ipv4/ip_tunnel.c                    |  4 ++--
 net/ipv4/ipmr.c                         |  2 +-
 net/ipv4/netfilter/arp_tables.c         |  8 ++++----
 net/ipv4/netfilter/ip_tables.c          |  8 ++++----
 net/ipv4/route.c                        |  2 +-
 net/ipv4/tcp.c                          |  2 +-
 net/ipv4/tcp_cong.c                     |  2 +-
 net/ipv6/addrconf.c                     |  4 ++--
 net/ipv6/af_inet6.c                     |  4 ++--
 net/ipv6/anycast.c                      |  2 +-
 net/ipv6/datagram.c                     |  6 +++---
 net/ipv6/ip6_flowlabel.c                |  2 +-
 net/ipv6/ip6_gre.c                      |  4 ++--
 net/ipv6/ip6_tunnel.c                   |  4 ++--
 net/ipv6/ip6_vti.c                      |  4 ++--
 net/ipv6/ip6mr.c                        |  2 +-
 net/ipv6/ipv6_sockglue.c                |  8 ++++----
 net/ipv6/netfilter/ip6_tables.c         |  8 ++++----
 net/ipv6/reassembly.c                   |  2 +-
 net/ipv6/route.c                        |  4 ++--
 net/ipv6/sit.c                          |  8 ++++----
 net/key/af_key.c                        |  2 +-
 net/llc/af_llc.c                        |  2 +-
 net/netfilter/ipset/ip_set_core.c       |  2 +-
 net/netfilter/ipvs/ip_vs_ctl.c          |  6 +++---
 net/netfilter/ipvs/ip_vs_lblc.c         |  2 +-
 net/netfilter/ipvs/ip_vs_lblcr.c        |  2 +-
 net/netfilter/nf_conntrack_acct.c       |  2 +-
 net/netfilter/nf_conntrack_ecache.c     |  2 +-
 net/netfilter/nf_conntrack_expect.c     |  4 ++--
 net/netfilter/nf_conntrack_helper.c     |  2 +-
 net/netfilter/nf_conntrack_proto_dccp.c |  2 +-
 net/netfilter/nf_conntrack_standalone.c |  6 +++---
 net/netfilter/nf_conntrack_timestamp.c  |  2 +-
 net/netfilter/nfnetlink_log.c           |  4 ++--
 net/netfilter/x_tables.c                |  4 ++--
 net/netlink/af_netlink.c                |  8 ++++----
 net/netlink/genetlink.c                 |  2 +-
 net/packet/af_packet.c                  |  2 +-
 net/sched/cls_api.c                     |  2 +-
 net/sched/sch_api.c                     |  6 +++---
 net/sctp/socket.c                       |  6 +++---
 net/sysctl_net.c                        |  6 +++---
 net/unix/sysctl_net_unix.c              |  2 +-
 net/xfrm/xfrm_sysctl.c                  |  2 +-
 94 files changed, 197 insertions(+), 196 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a2afa3b..5ebe22a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3425,7 +3425,7 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd
 
 	net = dev_net(bond_dev);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	slave_dev = __dev_get_by_name(net, ifr->ifr_slave);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e16487c..2730608 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -487,7 +487,7 @@ static inline bool tun_not_capable(struct tun_struct *tun)
 
 	return ((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
 		  (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
-		!ns_capable(net->user_ns, CAP_NET_ADMIN);
+		!ns_capable(net->ns.user_ns, CAP_NET_ADMIN);
 }
 
 static void tun_set_real_num_queues(struct tun_struct *tun)
@@ -1737,7 +1737,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		int queues = ifr->ifr_flags & IFF_MULTI_QUEUE ?
 			     MAX_TAP_QUEUES : 1;
 
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		err = security_tun_dev_create();
 		if (err < 0)
diff --git a/fs/mount.h b/fs/mount.h
index 14db05d..532dd92 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -9,7 +9,6 @@ struct mnt_namespace {
 	struct ns_common	ns;
 	struct mount *	root;
 	struct list_head	list;
-	struct user_namespace	*user_ns;
 	u64			seq;	/* Sequence number to prevent loops */
 	wait_queue_head_t poll;
 	u64 event;
diff --git a/fs/namespace.c b/fs/namespace.c
index 419f746..22b0dbc 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1582,7 +1582,7 @@ out_unlock:
  */
 static inline bool may_mount(void)
 {
-	return ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN);
+	return ns_capable(current->nsproxy->mnt_ns->ns.user_ns, CAP_SYS_ADMIN);
 }
 
 static inline bool may_mandlock(void)
@@ -2187,7 +2187,7 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	if ((mnt->mnt.mnt_flags & MNT_LOCK_NODEV) &&
 	    !(mnt_flags & MNT_NODEV)) {
 		/* Was the nodev implicitly added in mount? */
-		if ((mnt->mnt_ns->user_ns != &init_user_ns) &&
+		if ((mnt->mnt_ns->ns.user_ns != &init_user_ns) &&
 		    !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
 			mnt_flags |= MNT_NODEV;
 		} else {
@@ -2386,7 +2386,7 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 			int mnt_flags, const char *name, void *data)
 {
 	struct file_system_type *type;
-	struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns;
+	struct user_namespace *user_ns = current->nsproxy->mnt_ns->ns.user_ns;
 	struct vfsmount *mnt;
 	int err;
 
@@ -2744,7 +2744,7 @@ dput_out:
 static void free_mnt_ns(struct mnt_namespace *ns)
 {
 	ns_free_inum(&ns->ns);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	kfree(ns);
 }
 
@@ -2777,7 +2777,7 @@ static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns)
 	INIT_LIST_HEAD(&new_ns->list);
 	init_waitqueue_head(&new_ns->poll);
 	new_ns->event = 0;
-	new_ns->user_ns = get_user_ns(user_ns);
+	new_ns->ns.user_ns = get_user_ns(user_ns);
 	return new_ns;
 }
 
@@ -2807,7 +2807,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 	namespace_lock();
 	/* First pass: copy the tree topology */
 	copy_flags = CL_COPY_UNBINDABLE | CL_EXPIRE;
-	if (user_ns != ns->user_ns)
+	if (user_ns != ns->ns.user_ns)
 		copy_flags |= CL_SHARED_TO_SLAVE | CL_UNPRIVILEGED;
 	new = copy_tree(old, old->mnt.mnt_root, copy_flags);
 	if (IS_ERR(new)) {
@@ -3326,7 +3326,7 @@ static int mntns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	struct mnt_namespace *mnt_ns = to_mnt_ns(ns);
 	struct path root;
 
-	if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(mnt_ns->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_CHROOT) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
diff --git a/fs/pnode.c b/fs/pnode.c
index 9989970..e051f11 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -244,7 +244,7 @@ static int propagate_one(struct mount *m)
 	}
 		
 	/* Notice when we are propagating across user namespaces */
-	if (m->mnt_ns->user_ns != user_ns)
+	if (m->mnt_ns->ns.user_ns != user_ns)
 		type |= CL_UNPRIVILEGED;
 	child = copy_tree(last_source, last_source->mnt.mnt_root, type);
 	if (IS_ERR(child))
@@ -286,7 +286,7 @@ int propagate_mnt(struct mount *dest_mnt, struct mountpoint *dest_mp,
 	 * propagate_one(); everything is serialized by namespace_sem,
 	 * so globals will do just fine.
 	 */
-	user_ns = current->nsproxy->mnt_ns->user_ns;
+	user_ns = current->nsproxy->mnt_ns->ns.user_ns;
 	last_dest = dest_mnt;
 	first_source = source_mnt;
 	last_source = source_mnt;
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 0670278..aae5104 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -113,7 +113,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 		options = data;
 
 		/* Does the mounter have privilege over the pid namespace? */
-		if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+		if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN))
 			return ERR_PTR(-EPERM);
 	}
 
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index a20320c..f531cc5 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -619,7 +619,6 @@ static inline void cgroup_sk_free(struct sock_cgroup_data *skcd) {}
 struct cgroup_namespace {
 	atomic_t		count;
 	struct ns_common	ns;
-	struct user_namespace	*user_ns;
 	struct css_set          *root_cset;
 };
 
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 1eee6bc..0f9d806 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -56,9 +56,6 @@ struct ipc_namespace {
 	unsigned int    mq_msg_default;
 	unsigned int    mq_msgsize_default;
 
-	/* user_ns which owns the ipc ns */
-	struct user_namespace *user_ns;
-
 	struct ns_common ns;
 };
 
diff --git a/include/linux/ns_common.h b/include/linux/ns_common.h
index 85a5c8c..af2f30d 100644
--- a/include/linux/ns_common.h
+++ b/include/linux/ns_common.h
@@ -4,6 +4,7 @@
 struct proc_ns_operations;
 
 struct ns_common {
+	struct user_namespace *user_ns; /* Owning user namespace */
 	atomic_long_t stashed;
 	const struct proc_ns_operations *ops;
 	unsigned int inum;
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 918b117..b1802c6 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -39,7 +39,6 @@ struct pid_namespace {
 #ifdef CONFIG_BSD_PROCESS_ACCT
 	struct fs_pin *bacct;
 #endif
-	struct user_namespace *user_ns;
 	struct work_struct proc_work;
 	kgid_t pid_gid;
 	int hide_pid;
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 8297e5b..a941b44 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -27,11 +27,15 @@ struct user_namespace {
 	struct uid_gid_map	gid_map;
 	struct uid_gid_map	projid_map;
 	atomic_t		count;
-	struct user_namespace	*parent;
 	int			level;
 	kuid_t			owner;
 	kgid_t			group;
-	struct ns_common	ns;
+
+	/* ->ns.user_ns and ->parent are synonyms */
+	union {
+		struct user_namespace	*parent;
+		struct ns_common	ns;
+	};
 	unsigned long		flags;
 
 	/* Register of per-UID persistent keyrings for this namespace */
diff --git a/include/linux/utsname.h b/include/linux/utsname.h
index 5093f58..78c9ef8 100644
--- a/include/linux/utsname.h
+++ b/include/linux/utsname.h
@@ -23,7 +23,6 @@ extern struct user_namespace init_user_ns;
 struct uts_namespace {
 	struct kref kref;
 	struct new_utsname name;
-	struct user_namespace *user_ns;
 	struct ns_common ns;
 };
 extern struct uts_namespace init_uts_ns;
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..acb714e 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -59,7 +59,6 @@ struct net {
 	struct list_head	cleanup_list;	/* namespaces on death row */
 	struct list_head	exit_list;	/* Use only net_mutex */
 
-	struct user_namespace   *user_ns;	/* Owning user namespace */
 	spinlock_t		nsid_lock;
 	struct idr		netns_ids;
 
diff --git a/init/version.c b/init/version.c
index fe41a63..51ac701 100644
--- a/init/version.c
+++ b/init/version.c
@@ -34,7 +34,7 @@ struct uts_namespace init_uts_ns = {
 		.machine	= UTS_MACHINE,
 		.domainname	= UTS_DOMAINNAME,
 	},
-	.user_ns = &init_user_ns,
+	.ns.user_ns = &init_user_ns,
 	.ns.inum = PROC_UTS_INIT_INO,
 #ifdef CONFIG_UTS_NS
 	.ns.ops = &utsns_operations,
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index ade739f..378cec6 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -331,7 +331,7 @@ static struct dentry *mqueue_mount(struct file_system_type *fs_type,
 		/* Don't allow mounting unless the caller has CAP_SYS_ADMIN
 		 * over the ipc namespace.
 		 */
-		if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+		if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN))
 			return ERR_PTR(-EPERM);
 
 		data = ns;
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index ed81aaf..b2e570c 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -30,7 +30,7 @@ DEFINE_SPINLOCK(mq_lock);
  */
 struct ipc_namespace init_ipc_ns = {
 	.count		= ATOMIC_INIT(1),
-	.user_ns = &init_user_ns,
+	.ns.user_ns = &init_user_ns,
 	.ns.inum = PROC_IPC_INIT_INO,
 #ifdef CONFIG_IPC_NS
 	.ns.ops = &ipcns_operations,
diff --git a/ipc/namespace.c b/ipc/namespace.c
index 068caf1..d9f663b8 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -46,7 +46,7 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	msg_init_ns(ns);
 	shm_init_ns(ns);
 
-	ns->user_ns = get_user_ns(user_ns);
+	ns->ns.user_ns = get_user_ns(user_ns);
 
 	return ns;
 }
@@ -97,7 +97,7 @@ static void free_ipc_ns(struct ipc_namespace *ns)
 	shm_exit_ns(ns);
 	atomic_dec(&nr_ipc_ns);
 
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	ns_free_inum(&ns->ns);
 	kfree(ns);
 }
@@ -155,7 +155,7 @@ static void ipcns_put(struct ns_common *ns)
 static int ipcns_install(struct nsproxy *nsproxy, struct ns_common *new)
 {
 	struct ipc_namespace *ns = to_ipc_ns(new);
-	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/ipc/shm.c b/ipc/shm.c
index 1328251..20546f1 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1024,7 +1024,7 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
 			goto out_unlock0;
 		}
 
-		if (!ns_capable(ns->user_ns, CAP_IPC_LOCK)) {
+		if (!ns_capable(ns->ns.user_ns, CAP_IPC_LOCK)) {
 			kuid_t euid = current_euid();
 			if (!uid_eq(euid, shp->shm_perm.uid) &&
 			    !uid_eq(euid, shp->shm_perm.cuid)) {
diff --git a/ipc/util.c b/ipc/util.c
index 798cad1..2a1a700 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -491,7 +491,7 @@ int ipcperms(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp, short flag)
 		granted_mode >>= 3;
 	/* is there some bit set in requested_mode but not in granted_mode? */
 	if ((requested_mode & ~granted_mode & 0007) &&
-	    !ns_capable(ns->user_ns, CAP_IPC_OWNER))
+	    !ns_capable(ns->ns.user_ns, CAP_IPC_OWNER))
 		return -1;
 
 	return security_ipc_permission(ipcp, flag);
@@ -700,7 +700,7 @@ struct kern_ipc_perm *ipcctl_pre_down_nolock(struct ipc_namespace *ns,
 
 	euid = current_euid();
 	if (uid_eq(euid, ipcp->cuid) || uid_eq(euid, ipcp->uid)  ||
-	    ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+	    ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN))
 		return ipcp; /* successful lookup */
 err:
 	return ERR_PTR(err);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 75c0ff0..3635600 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -221,7 +221,7 @@ static u16 have_free_callback __read_mostly;
 /* cgroup namespace for init task */
 struct cgroup_namespace init_cgroup_ns = {
 	.count		= { .counter = 2, },
-	.user_ns	= &init_user_ns,
+	.ns.user_ns	= &init_user_ns,
 	.ns.ops		= &cgroupns_operations,
 	.ns.inum	= PROC_CGROUP_INIT_INO,
 	.root_cset	= &init_css_set,
@@ -2094,7 +2094,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 	get_cgroup_ns(ns);
 
 	/* Check if the caller has permission to mount. */
-	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN)) {
+	if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN)) {
 		put_cgroup_ns(ns);
 		return ERR_PTR(-EPERM);
 	}
@@ -5609,7 +5609,7 @@ int __init cgroup_init(void)
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_dfl_base_files));
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_legacy_base_files));
 
-	get_user_ns(init_cgroup_ns.user_ns);
+	get_user_ns(init_cgroup_ns.ns.user_ns);
 
 	mutex_lock(&cgroup_mutex);
 
@@ -6285,7 +6285,7 @@ static struct cgroup_namespace *alloc_cgroup_ns(void)
 void free_cgroup_ns(struct cgroup_namespace *ns)
 {
 	put_css_set(ns->root_cset);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	ns_free_inum(&ns->ns);
 	kfree(ns);
 }
@@ -6324,7 +6324,7 @@ struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
 		return new_ns;
 	}
 
-	new_ns->user_ns = get_user_ns(user_ns);
+	new_ns->ns.user_ns = get_user_ns(user_ns);
 	new_ns->root_cset = cset;
 
 	return new_ns;
@@ -6340,7 +6340,7 @@ static int cgroupns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	struct cgroup_namespace *cgroup_ns = to_cg_ns(ns);
 
 	if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN) ||
-	    !ns_capable(cgroup_ns->user_ns, CAP_SYS_ADMIN))
+	    !ns_capable(cgroup_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	/* Don't need to do anything if we are attaching to our own cgroupns. */
diff --git a/kernel/pid.c b/kernel/pid.c
index f66162f..c63f992d 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -78,7 +78,7 @@ struct pid_namespace init_pid_ns = {
 	.nr_hashed = PIDNS_HASH_ADDING,
 	.level = 0,
 	.child_reaper = &init_task,
-	.user_ns = &init_user_ns,
+	.ns.user_ns = &init_user_ns,
 	.ns.inum = PROC_PID_INIT_INO,
 #ifdef CONFIG_PID_NS
 	.ns.ops = &pidns_operations,
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index a65ba13..3529a03 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -113,7 +113,7 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
 	kref_init(&ns->kref);
 	ns->level = level;
 	ns->parent = get_pid_ns(parent_pid_ns);
-	ns->user_ns = get_user_ns(user_ns);
+	ns->ns.user_ns = get_user_ns(user_ns);
 	ns->nr_hashed = PIDNS_HASH_ADDING;
 	INIT_WORK(&ns->proc_work, proc_cleanup_work);
 
@@ -146,7 +146,7 @@ static void destroy_pid_namespace(struct pid_namespace *ns)
 	ns_free_inum(&ns->ns);
 	for (i = 0; i < PIDMAP_ENTRIES; i++)
 		kfree(ns->pidmap[i].page);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	call_rcu(&ns->rcu, delayed_free_pidns);
 }
 
@@ -276,7 +276,7 @@ static int pid_ns_ctl_handler(struct ctl_table *table, int write,
 	struct pid_namespace *pid_ns = task_active_pid_ns(current);
 	struct ctl_table tmp = *table;
 
-	if (write && !ns_capable(pid_ns->user_ns, CAP_SYS_ADMIN))
+	if (write && !ns_capable(pid_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	/*
@@ -362,7 +362,7 @@ static int pidns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	struct pid_namespace *active = task_active_pid_ns(current);
 	struct pid_namespace *ancestor, *new = to_pid_ns(ns);
 
-	if (!ns_capable(new->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(new->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/kernel/reboot.c b/kernel/reboot.c
index bd30a97..38f81a6 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -285,7 +285,7 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
 	int ret = 0;
 
 	/* We only trust the superuser with rebooting the system. */
-	if (!ns_capable(pid_ns->user_ns, CAP_SYS_BOOT))
+	if (!ns_capable(pid_ns->ns.user_ns, CAP_SYS_BOOT))
 		return -EPERM;
 
 	/* For safety, we require "magic" arguments. */
diff --git a/kernel/sys.c b/kernel/sys.c
index 89d5be4..9db5647 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1217,7 +1217,7 @@ SYSCALL_DEFINE2(sethostname, char __user *, name, int, len)
 	int errno;
 	char tmp[__NEW_UTS_LEN];
 
-	if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
+	if (!ns_capable(current->nsproxy->uts_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	if (len < 0 || len > __NEW_UTS_LEN)
@@ -1268,7 +1268,7 @@ SYSCALL_DEFINE2(setdomainname, char __user *, name, int, len)
 	int errno;
 	char tmp[__NEW_UTS_LEN];
 
-	if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
+	if (!ns_capable(current->nsproxy->uts_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 	if (len < 0 || len > __NEW_UTS_LEN)
 		return -EINVAL;
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 9bafc21..a5bc78c 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -96,6 +96,10 @@ int create_user_ns(struct cred *new)
 	ns->ns.ops = &userns_operations;
 
 	atomic_set(&ns->count, 1);
+
+	/* ->ns.user_ns and ->parent are synonyms. */
+	BUILD_BUG_ON(&ns->ns.user_ns != &ns->parent);
+
 	/* Leave the new->user_ns reference with the new user namespace. */
 	ns->parent = parent_ns;
 	ns->level = parent_ns->level + 1;
diff --git a/kernel/utsname.c b/kernel/utsname.c
index 831ea71..40a119a 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -52,7 +52,7 @@ static struct uts_namespace *clone_uts_ns(struct user_namespace *user_ns,
 
 	down_read(&uts_sem);
 	memcpy(&ns->name, &old_ns->name, sizeof(ns->name));
-	ns->user_ns = get_user_ns(user_ns);
+	ns->ns.user_ns = get_user_ns(user_ns);
 	up_read(&uts_sem);
 	return ns;
 }
@@ -85,7 +85,7 @@ void free_uts_ns(struct kref *kref)
 	struct uts_namespace *ns;
 
 	ns = container_of(kref, struct uts_namespace, kref);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	ns_free_inum(&ns->ns);
 	kfree(ns);
 }
@@ -120,7 +120,7 @@ static int utsns_install(struct nsproxy *nsproxy, struct ns_common *new)
 {
 	struct uts_namespace *ns = to_uts_ns(new);
 
-	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 82a116b..6c46a80 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -541,7 +541,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 	switch (args.cmd) {
 	case SET_VLAN_INGRESS_PRIORITY_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		vlan_dev_set_ingress_priority(dev,
 					      args.u.skb_priority,
@@ -551,7 +551,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_EGRESS_PRIORITY_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = vlan_dev_set_egress_priority(dev,
 						   args.u.skb_priority,
@@ -560,7 +560,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_FLAG_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = vlan_dev_change_flags(dev,
 					    args.vlan_qos ? args.u.flag : 0,
@@ -569,7 +569,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_NAME_TYPE_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		if ((args.u.name_type >= 0) &&
 		    (args.u.name_type < VLAN_NAME_TYPE_HIGHEST)) {
@@ -585,14 +585,14 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case ADD_VLAN_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = register_vlan_device(dev, args.u.VID);
 		break;
 
 	case DEL_VLAN_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		unregister_vlan_dev(dev, NULL);
 		err = 0;
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index d99b200..2fdea4f 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -90,7 +90,7 @@ static int add_del_if(struct net_bridge *br, int ifindex, int isadd)
 	struct net_device *dev;
 	int ret;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	dev = __dev_get_by_index(net, ifindex);
@@ -182,28 +182,28 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	}
 
 	case BRCTL_SET_BRIDGE_FORWARD_DELAY:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_forward_delay(br, args[1]);
 		break;
 
 	case BRCTL_SET_BRIDGE_HELLO_TIME:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_hello_time(br, args[1]);
 		break;
 
 	case BRCTL_SET_BRIDGE_MAX_AGE:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_max_age(br, args[1]);
 		break;
 
 	case BRCTL_SET_AGEING_TIME:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_ageing_time(br, args[1]);
@@ -243,7 +243,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	}
 
 	case BRCTL_SET_BRIDGE_STP_STATE:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		br_stp_set_enabled(br, args[1]);
@@ -251,7 +251,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 		break;
 
 	case BRCTL_SET_BRIDGE_PRIORITY:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		br_stp_set_bridge_priority(br, args[1]);
@@ -260,7 +260,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 
 	case BRCTL_SET_PORT_PRIORITY:
 	{
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		spin_lock_bh(&br->lock);
@@ -274,7 +274,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 
 	case BRCTL_SET_PATH_COST:
 	{
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		spin_lock_bh(&br->lock);
@@ -337,7 +337,7 @@ static int old_deviceless(struct net *net, void __user *uarg)
 	{
 		char buf[IFNAMSIZ];
 
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(buf, (void __user *)args[1], IFNAMSIZ))
@@ -367,7 +367,7 @@ int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, void __user *uar
 	{
 		char buf[IFNAMSIZ];
 
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(buf, uarg, IFNAMSIZ))
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index beb4707..06d417e 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -36,7 +36,7 @@ static ssize_t store_bridge_parm(struct device *d,
 	unsigned long val;
 	int err;
 
-	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	val = simple_strtoul(buf, &endp, 0);
@@ -285,7 +285,7 @@ static ssize_t group_addr_store(struct device *d,
 	u8 new_addr[6];
 	int i;
 
-	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (sscanf(buf, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 1e04d4d..e7ceab1 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -241,7 +241,7 @@ static ssize_t brport_store(struct kobject *kobj,
 	char *endp;
 	unsigned long val;
 
-	if (!ns_capable(dev_net(p->dev)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(p->dev)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	val = simple_strtoul(buf, &endp, 0);
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 5a61f35..dab0cc2 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1496,7 +1496,7 @@ static int do_ebt_set_ctl(struct sock *sk,
 	int ret;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1519,7 +1519,7 @@ static int do_ebt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 	struct ebt_table *t;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&tmp, user, sizeof(tmp)))
@@ -2303,7 +2303,7 @@ static int compat_do_ebt_set_ctl(struct sock *sk,
 	int ret;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2327,7 +2327,7 @@ static int compat_do_ebt_get_ctl(struct sock *sk, int cmd,
 	struct ebt_table *t;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* try real handler in case userland supplied needed padding */
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index b94b1d2..a705922 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -474,7 +474,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCGMIIPHY:
 	case SIOCGMIIREG:
 	case SIOCSIFNAME:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		dev_load(net, ifr.ifr_name);
 		rtnl_lock();
@@ -522,7 +522,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCBRADDIF:
 	case SIOCBRDELIF:
 	case SIOCSHWTSTAMP:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		/* fall through */
 	case SIOCBONDSLAVEINFOQUERY:
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f403481..27a3085 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2480,7 +2480,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_GTUNABLE:
 		break;
 	default:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 	}
 
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 510cd62..8df69fd 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -3169,7 +3169,7 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
 	}
 
 	/* Don't export sysctls to unprivileged users */
-	if (neigh_parms_net(p)->user_ns != &init_user_ns)
+	if (neigh_parms_net(p)->ns.user_ns != &init_user_ns)
 		t->neigh_vars[0].procname = NULL;
 
 	switch (neigh_parms_family(p)) {
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 7a0b616..eb20bc7 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -85,7 +85,7 @@ static ssize_t netdev_store(struct device *dev, struct device_attribute *attr,
 	unsigned long new;
 	int ret = -EINVAL;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	ret = kstrtoul(buf, 0, &new);
@@ -362,7 +362,7 @@ static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr,
 	size_t count = len;
 	ssize_t ret;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* ignore trailing newline */
@@ -1390,7 +1390,7 @@ static bool net_current_may_mount(void)
 {
 	struct net *net = current->nsproxy->net_ns;
 
-	return ns_capable(net->user_ns, CAP_SYS_ADMIN);
+	return ns_capable(net->ns.user_ns, CAP_SYS_ADMIN);
 }
 
 static void *net_grab_current_ns(void)
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2c2eb1b..3433f0c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -279,7 +279,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	atomic_set(&net->count, 1);
 	atomic_set(&net->passive, 1);
 	net->dev_base_seq = 1;
-	net->user_ns = user_ns;
+	net->ns.user_ns = user_ns;
 	idr_init(&net->netns_ids);
 	spin_lock_init(&net->nsid_lock);
 
@@ -444,7 +444,7 @@ static void cleanup_net(struct work_struct *work)
 	/* Finally it is safe to free my network namespace structure */
 	list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {
 		list_del_init(&net->exit_list);
-		put_user_ns(net->user_ns);
+		put_user_ns(net->ns.user_ns);
 		net_drop_ns(net);
 	}
 }
@@ -987,7 +987,7 @@ static int netns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 {
 	struct net *net = to_net_ns(ns);
 
-	if (!ns_capable(net->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(net->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d69c464..ea7ba06 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1785,7 +1785,7 @@ static int do_setlink(const struct sk_buff *skb,
 			err = PTR_ERR(net);
 			goto errout;
 		}
-		if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) {
+		if (!netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN)) {
 			put_net(net);
 			err = -EPERM;
 			goto errout;
@@ -2430,7 +2430,7 @@ replay:
 			return PTR_ERR(dest_net);
 
 		err = -EPERM;
-		if (!netlink_ns_capable(skb, dest_net->user_ns, CAP_NET_ADMIN))
+		if (!netlink_ns_capable(skb, dest_net->ns.user_ns, CAP_NET_ADMIN))
 			goto out;
 
 		if (tb[IFLA_LINK_NETNSID]) {
@@ -2442,7 +2442,7 @@ replay:
 				goto out;
 			}
 			err = -EPERM;
-			if (!netlink_ns_capable(skb, link_net->user_ns, CAP_NET_ADMIN))
+			if (!netlink_ns_capable(skb, link_net->ns.user_ns, CAP_NET_ADMIN))
 				goto out;
 		}
 
diff --git a/net/core/scm.c b/net/core/scm.c
index 2696aef..1a2301a 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -54,7 +54,7 @@ static __inline__ int scm_check_creds(struct ucred *creds)
 		return -EINVAL;
 
 	if ((creds->pid == task_tgid_vnr(current) ||
-	     ns_capable(task_active_pid_ns(current)->user_ns, CAP_SYS_ADMIN)) &&
+	     ns_capable(task_active_pid_ns(current)->ns.user_ns, CAP_SYS_ADMIN)) &&
 	    ((uid_eq(uid, cred->uid)   || uid_eq(uid, cred->euid) ||
 	      uid_eq(uid, cred->suid)) || ns_capable(cred->user_ns, CAP_SETUID)) &&
 	    ((gid_eq(gid, cred->gid)   || gid_eq(gid, cred->egid) ||
diff --git a/net/core/sock.c b/net/core/sock.c
index 08bf97e..321ca3c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -191,7 +191,7 @@ EXPORT_SYMBOL(sk_capable);
  */
 bool sk_net_capable(const struct sock *sk, int cap)
 {
-	return sk_ns_capable(sk, sock_net(sk)->user_ns, cap);
+	return sk_ns_capable(sk, sock_net(sk)->ns.user_ns, cap);
 }
 EXPORT_SYMBOL(sk_net_capable);
 
@@ -534,7 +534,7 @@ static int sock_setbindtodevice(struct sock *sk, char __user *optval,
 
 	/* Sorry... */
 	ret = -EPERM;
-	if (!ns_capable(net->user_ns, CAP_NET_RAW))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		goto out;
 
 	ret = -EINVAL;
@@ -778,7 +778,7 @@ set_rcvbuf:
 
 	case SO_PRIORITY:
 		if ((val >= 0 && val <= 6) ||
-		    ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		    ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			sk->sk_priority = val;
 		else
 			ret = -EPERM;
@@ -945,7 +945,7 @@ set_rcvbuf:
 			clear_bit(SOCK_PASSSEC, &sock->flags);
 		break;
 	case SO_MARK:
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			ret = -EPERM;
 		else
 			sk->sk_mark = val;
@@ -1921,7 +1921,7 @@ int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
 
 	switch (cmsg->cmsg_type) {
 	case SO_MARK:
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
 			return -EINVAL;
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 6b10573..7151b43 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -303,7 +303,7 @@ static int sock_diag_bind(struct net *net, int group)
 
 int sock_diag_destroy(struct sock *sk, int err)
 {
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (!sk->sk_prot->diag_destroy)
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 0df2aa6..6f6749d 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -441,7 +441,7 @@ static __net_init int sysctl_core_net_init(struct net *net)
 		tbl[0].data = &net->core.sysctl_somaxconn;
 
 		/* Don't export any sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns) {
+		if (net->ns.user_ns != &init_user_ns) {
 			tbl[0].procname = NULL;
 		}
 	}
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index 30d875d..9d002f4 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -512,7 +512,7 @@ static int __net_init lowpan_frags_ns_sysctl_register(struct net *net)
 		table[2].data = &ieee802154_lowpan->frags.timeout;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			table[0].procname = NULL;
 	}
 
diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
index e0bd013..6353184 100644
--- a/net/ieee802154/socket.c
+++ b/net/ieee802154/socket.c
@@ -895,8 +895,8 @@ static int dgram_setsockopt(struct sock *sk, int level, int optname,
 		ro->want_ack = !!val;
 		break;
 	case WPAN_SECURITY:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN) &&
-		    !ns_capable(net->user_ns, CAP_NET_RAW)) {
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN) &&
+		    !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 			err = -EPERM;
 			break;
 		}
@@ -919,8 +919,8 @@ static int dgram_setsockopt(struct sock *sk, int level, int optname,
 		}
 		break;
 	case WPAN_SECURITY_LEVEL:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN) &&
-		    !ns_capable(net->user_ns, CAP_NET_RAW)) {
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN) &&
+		    !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 			err = -EPERM;
 			break;
 		}
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index d39e9e4..bec3946 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -309,7 +309,7 @@ lookup_protocol:
 
 	err = -EPERM;
 	if (sock->type == SOCK_RAW && !kern &&
-	    !ns_capable(net->user_ns, CAP_NET_RAW))
+	    !ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		goto out_rcu_unlock;
 
 	sock->ops = answer->ops;
@@ -475,7 +475,7 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	snum = ntohs(addr->sin_port);
 	err = -EACCES;
 	if (snum && snum < PROT_SOCK &&
-	    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
+	    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE))
 		goto out;
 
 	/*      We keep a pair of addresses. rcv_saddr is the one
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 89a8cac4..22517fb 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1140,7 +1140,7 @@ int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCDARP:
 	case SIOCSARP:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 	case SIOCGARP:
 		err = copy_from_user(&r, arg, sizeof(struct arpreq));
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index e333bc8..fc8f1f2 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -961,7 +961,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 
 	case SIOCSIFFLAGS:
 		ret = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto out;
 		break;
 	case SIOCSIFADDR:	/* Set interface address (and family) */
@@ -969,7 +969,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCSIFDSTADDR:	/* Set the destination address */
 	case SIOCSIFNETMASK: 	/* Set the netmask for the interface */
 		ret = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto out;
 		ret = -EINVAL;
 		if (sin->sin_family != AF_INET)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index ef2ebeb..fbc7311 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -581,7 +581,7 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCADDRT:		/* Add a route */
 	case SIOCDELRT:		/* Delete a route */
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(&rt, arg, sizeof(rt)))
diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index 4d158ff..dda262e 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -407,7 +407,7 @@ int ip_options_compile(struct net *net,
 					optptr[2] += 8;
 					break;
 				default:
-					if (!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) {
+					if (!skb && !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 						pp_ptr = optptr + 3;
 						goto error;
 					}
@@ -442,7 +442,7 @@ int ip_options_compile(struct net *net,
 				opt->router_alert = optptr - iph;
 			break;
 		case IPOPT_CIPSO:
-			if ((!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) || opt->cipso) {
+			if ((!skb && !ns_capable(net->ns.user_ns, CAP_NET_RAW)) || opt->cipso) {
 				pp_ptr = optptr;
 				goto error;
 			}
@@ -455,7 +455,7 @@ int ip_options_compile(struct net *net,
 		case IPOPT_SEC:
 		case IPOPT_SID:
 		default:
-			if (!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!skb && !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				pp_ptr = optptr;
 				goto error;
 			}
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 71a52f4d..474af75 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1138,14 +1138,14 @@ mc_msf_out:
 	case IP_IPSEC_POLICY:
 	case IP_XFRM_POLICY:
 		err = -EPERM;
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = xfrm_user_policy(sk, optname, optval, optlen);
 		break;
 
 	case IP_TRANSPARENT:
-		if (!!val && !ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) &&
-		    !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
+		if (!!val && !ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_RAW) &&
+		    !ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN)) {
 			err = -EPERM;
 			break;
 		}
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index d8f5e0a..4ddc520 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -765,7 +765,7 @@ int ip_tunnel_ioctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 		if (p->iph.ttl)
 			p->iph.frag_off |= htons(IP_DF);
@@ -821,7 +821,7 @@ int ip_tunnel_ioctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == itn->fb_tunnel_dev) {
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 5ad48ec..df292fa 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1272,7 +1272,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval,
 	}
 	if (optname != MRT_INIT) {
 		if (sk != rcu_access_pointer(mrt->mroute_sk) &&
-		    !ns_capable(net->user_ns, CAP_NET_ADMIN)) {
+		    !ns_capable(net->ns.user_ns, CAP_NET_ADMIN)) {
 			ret = -EACCES;
 			goto out_unlock;
 		}
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 2033f92..e123093 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -1300,7 +1300,7 @@ static int compat_do_arpt_set_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1434,7 +1434,7 @@ static int compat_do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1455,7 +1455,7 @@ static int do_arpt_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1478,7 +1478,7 @@ static int do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 54906e0..b29238a 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1554,7 +1554,7 @@ compat_do_ipt_set_ctl(struct sock *sk,	int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1656,7 +1656,7 @@ compat_do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1678,7 +1678,7 @@ do_ipt_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1702,7 +1702,7 @@ do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index a1f2830..ddb0003 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2787,7 +2787,7 @@ static __net_init int sysctl_route_net_init(struct net *net)
 			goto err_dup;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			tbl[0].procname = NULL;
 	}
 	tbl[0].extra1 = net;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5c7ed14..467b6cc 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2273,7 +2273,7 @@ EXPORT_SYMBOL(tcp_disconnect);
 
 static inline bool tcp_can_repair_sock(const struct sock *sk)
 {
-	return ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) &&
+	return ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN) &&
 		((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_ESTABLISHED));
 }
 
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 882caa4..385d0f4 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -354,7 +354,7 @@ int tcp_set_congestion_control(struct sock *sk, const char *name)
 	if (!ca)
 		err = -ENOENT;
 	else if (!((ca->flags & TCP_CONG_NON_RESTRICTED) ||
-		   ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)))
+		   ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN)))
 		err = -EPERM;
 	else if (!try_module_get(ca->owner))
 		err = -EBUSY;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 47f837a..9aaabf8 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2781,7 +2781,7 @@ int addrconf_add_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
@@ -2800,7 +2800,7 @@ int addrconf_del_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index bfa86f0..1491cbd 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -161,7 +161,7 @@ lookup_protocol:
 
 	err = -EPERM;
 	if (sock->type == SOCK_RAW && !kern &&
-	    !ns_capable(net->user_ns, CAP_NET_RAW))
+	    !ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		goto out_rcu_unlock;
 
 	sock->ops = answer->ops;
@@ -286,7 +286,7 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		return -EINVAL;
 
 	snum = ntohs(addr->sin6_port);
-	if (snum && snum < PROT_SOCK && !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
+	if (snum && snum < PROT_SOCK && !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
 	lock_sock(sk);
diff --git a/net/ipv6/anycast.c b/net/ipv6/anycast.c
index 514ac25..e168ca3 100644
--- a/net/ipv6/anycast.c
+++ b/net/ipv6/anycast.c
@@ -62,7 +62,7 @@ int ipv6_sock_ac_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
 
 	ASSERT_RTNL();
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (ipv6_addr_is_multicast(addr))
 		return -EINVAL;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 37874e2..92204ba 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -837,7 +837,7 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -857,7 +857,7 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -882,7 +882,7 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index b912f0d..c07e37e 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -569,7 +569,7 @@ int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen)
 		rcu_read_unlock_bh();
 
 		if (freq.flr_share == IPV6_FL_S_NONE &&
-		    ns_capable(net->user_ns, CAP_NET_ADMIN)) {
+		    ns_capable(net->ns.user_ns, CAP_NET_ADMIN)) {
 			fl = fl_lookup(net, freq.flr_label);
 			if (fl) {
 				err = fl6_renew(fl, freq.flr_linger, freq.flr_expires);
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 776d145..7f23d34 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -852,7 +852,7 @@ static int ip6gre_tunnel_ioctl(struct net_device *dev,
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -901,7 +901,7 @@ static int ip6gre_tunnel_ioctl(struct net_device *dev,
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == ign->fb_tunnel_dev) {
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7b0481e..fa9443c 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1484,7 +1484,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = -EFAULT;
 		if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
@@ -1520,7 +1520,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		break;
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 
 		if (dev == ip6n->fb_tnl_dev) {
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d90a11f..ece8758 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -743,7 +743,7 @@ vti6_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = -EFAULT;
 		if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
@@ -775,7 +775,7 @@ vti6_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		break;
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 
 		if (dev == ip6n->fb_tnl_dev) {
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 487ef3b..87a6a20 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1669,7 +1669,7 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
 		return -ENOENT;
 
 	if (optname != MRT6_INIT) {
-		if (sk != mrt->mroute6_sk && !ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (sk != mrt->mroute6_sk && !ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EACCES;
 	}
 
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index a9895e1..d5dc2aa 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -365,8 +365,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		break;
 
 	case IPV6_TRANSPARENT:
-		if (valbool && !ns_capable(net->user_ns, CAP_NET_ADMIN) &&
-		    !ns_capable(net->user_ns, CAP_NET_RAW)) {
+		if (valbool && !ns_capable(net->ns.user_ns, CAP_NET_ADMIN) &&
+		    !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 			retv = -EPERM;
 			break;
 		}
@@ -404,7 +404,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 
 		/* hop-by-hop / destination options are privileged option */
 		retv = -EPERM;
-		if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
+		if (optname != IPV6_RTHDR && !ns_capable(net->ns.user_ns, CAP_NET_RAW))
 			break;
 
 		opt = rcu_dereference_protected(np->opt,
@@ -785,7 +785,7 @@ done:
 	case IPV6_IPSEC_POLICY:
 	case IPV6_XFRM_POLICY:
 		retv = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		retv = xfrm_user_policy(sk, optname, optval, optlen);
 		break;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 63e06c3..0f92561 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1573,7 +1573,7 @@ compat_do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1675,7 +1675,7 @@ compat_do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1697,7 +1697,7 @@ do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1721,7 +1721,7 @@ do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 2160d5d..4efbd91 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -645,7 +645,7 @@ static int __net_init ip6_frags_ns_sysctl_register(struct net *net)
 		table[2].data = &net->ipv6.frags.timeout;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			table[0].procname = NULL;
 	}
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 520b788..938a7aa 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2468,7 +2468,7 @@ int ipv6_route_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCADDRT:		/* Add a route */
 	case SIOCDELRT:		/* Delete a route */
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		err = copy_from_user(&rtmsg, arg,
 				     sizeof(struct in6_rtmsg));
@@ -3594,7 +3594,7 @@ struct ctl_table * __net_init ipv6_route_sysctl_init(struct net *net)
 		table[9].data = &net->ipv6.sysctl.ip6_rt_gc_min_interval;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			table[0].procname = NULL;
 	}
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0619ac7..196f476 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1181,7 +1181,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -1229,7 +1229,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == sitn->fb_tunnel_dev) {
@@ -1260,7 +1260,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCDELPRL:
 	case SIOCCHGPRL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 		err = -EINVAL;
 		if (dev == sitn->fb_tunnel_dev)
@@ -1287,7 +1287,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCCHG6RD:
 	case SIOCDEL6RD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
diff --git a/net/key/af_key.c b/net/key/af_key.c
index f9c9ecb..47183e9 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -141,7 +141,7 @@ static int pfkey_create(struct net *net, struct socket *sock, int protocol,
 	struct sock *sk;
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 8ae3ed9..41c3da3 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -160,7 +160,7 @@ static int llc_ui_create(struct net *net, struct socket *sock, int protocol,
 	struct sock *sk;
 	int rc = -ESOCKTNOSUPPORT;
 
-	if (!ns_capable(net->user_ns, CAP_NET_RAW))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		return -EPERM;
 
 	if (!net_eq(net, &init_net))
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index a748b0c..46745a7 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1901,7 +1901,7 @@ ip_set_sockfn_get(struct sock *sk, int optval, void __user *user, int *len)
 	struct net *net = sock_net(sk);
 	struct ip_set_net *inst = ip_set_pernet(net);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (optval != SO_IP_SET)
 		return -EBADF;
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index c3c809b..a02b3b3 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2360,7 +2360,7 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
 	BUILD_BUG_ON(sizeof(arg) > 255);
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_SET_MAX)
@@ -2678,7 +2678,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 
 	BUG_ON(!net);
 	BUILD_BUG_ON(sizeof(arg) > 255);
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_GET_MAX)
@@ -3906,7 +3906,7 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs)
 			return -ENOMEM;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			tbl[0].procname = NULL;
 	} else
 		tbl = vs_vars;
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index cccf4d6..23a3ec3 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -564,7 +564,7 @@ static int __net_init __ip_vs_lblc_init(struct net *net)
 			return -ENOMEM;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			ipvs->lblc_ctl_table[0].procname = NULL;
 
 	} else
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 796d70e..704ad5c 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -750,7 +750,7 @@ static int __net_init __ip_vs_lblcr_init(struct net *net)
 			return -ENOMEM;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			ipvs->lblcr_ctl_table[0].procname = NULL;
 	} else
 		ipvs->lblcr_ctl_table = vs_vars_table;
diff --git a/net/netfilter/nf_conntrack_acct.c b/net/netfilter/nf_conntrack_acct.c
index 45da11a..9303901 100644
--- a/net/netfilter/nf_conntrack_acct.c
+++ b/net/netfilter/nf_conntrack_acct.c
@@ -74,7 +74,7 @@ static int nf_conntrack_acct_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_acct;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.acct_sysctl_header = register_net_sysctl(net, "net/netfilter",
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index d28011b..22411e5 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -358,7 +358,7 @@ static int nf_conntrack_event_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_events;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.event_sysctl_header =
diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index 9e36931..c1e6242 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -618,8 +618,8 @@ static int exp_proc_init(struct net *net)
 	if (!proc)
 		return -ENOMEM;
 
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 	if (uid_valid(root_uid) && gid_valid(root_gid))
 		proc_set_user(proc, root_uid, root_gid);
 #endif /* CONFIG_NF_CONNTRACK_PROCFS */
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 196cb39..4cff85b 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -67,7 +67,7 @@ static int nf_conntrack_helper_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_auto_assign_helper;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.helper_sysctl_header =
diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c
index 399a38f..766dbee 100644
--- a/net/netfilter/nf_conntrack_proto_dccp.c
+++ b/net/netfilter/nf_conntrack_proto_dccp.c
@@ -841,7 +841,7 @@ static int dccp_kmemdup_sysctl_table(struct net *net, struct nf_proto_net *pn,
 	pn->ctl_table[7].data = &dn->dccp_loose;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		pn->ctl_table[0].procname = NULL;
 #endif
 	return 0;
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index c026c47..8796e36 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -397,8 +397,8 @@ static int nf_conntrack_standalone_init_proc(struct net *net)
 	if (!pde)
 		goto out_nf_conntrack;
 
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 	if (uid_valid(root_uid) && gid_valid(root_gid))
 		proc_set_user(pde, root_uid, root_gid);
 
@@ -512,7 +512,7 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
 	table[4].data = &net->ct.sysctl_log_invalid;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.sysctl_header = register_net_sysctl(net, "net/netfilter", table);
diff --git a/net/netfilter/nf_conntrack_timestamp.c b/net/netfilter/nf_conntrack_timestamp.c
index 7a394df..43bd240 100644
--- a/net/netfilter/nf_conntrack_timestamp.c
+++ b/net/netfilter/nf_conntrack_timestamp.c
@@ -52,7 +52,7 @@ static int nf_conntrack_tstamp_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_tstamp;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.tstamp_sysctl_header = register_net_sysctl(net,	"net/netfilter",
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 11f81c8..5428b8e 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -1072,8 +1072,8 @@ static int __net_init nfnl_log_net_init(struct net *net)
 	if (!proc)
 		return -ENOMEM;
 
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 	if (uid_valid(root_uid) && gid_valid(root_gid))
 		proc_set_user(proc, root_uid, root_gid);
 #endif
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 2675d58..d840aa6 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1493,8 +1493,8 @@ int xt_proto_init(struct net *net, u_int8_t af)
 
 
 #ifdef CONFIG_PROC_FS
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 
 	strlcpy(buf, xt_prefix[af], sizeof(buf));
 	strlcat(buf, FORMAT_TABLES, sizeof(buf));
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 627f898..070e24d 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -828,14 +828,14 @@ EXPORT_SYMBOL(netlink_capable);
  */
 bool netlink_net_capable(const struct sk_buff *skb, int cap)
 {
-	return netlink_ns_capable(skb, sock_net(skb->sk)->user_ns, cap);
+	return netlink_ns_capable(skb, sock_net(skb->sk)->ns.user_ns, cap);
 }
 EXPORT_SYMBOL(netlink_net_capable);
 
 static inline int netlink_allowed(const struct socket *sock, unsigned int flag)
 {
 	return (nl_table[sock->sk->sk_protocol].flags & flag) ||
-		ns_capable(sock_net(sock->sk)->user_ns, CAP_NET_ADMIN);
+		ns_capable(sock_net(sock->sk)->ns.user_ns, CAP_NET_ADMIN);
 }
 
 static void
@@ -1323,7 +1323,7 @@ static void do_one_broadcast(struct sock *sk,
 		if (!peernet_has_id(sock_net(sk), p->net))
 			return;
 
-		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
+		if (!file_ns_capable(sk->sk_socket->file, p->net->ns.user_ns,
 				     CAP_NET_BROADCAST))
 			return;
 	}
@@ -1586,7 +1586,7 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 		err = 0;
 		break;
 	case NETLINK_LISTEN_ALL_NSID:
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_BROADCAST))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_BROADCAST))
 			return -EPERM;
 
 		if (val)
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index a09132a..831e863 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -561,7 +561,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
 		return -EPERM;
 
 	if ((ops->flags & GENL_UNS_ADMIN_PERM) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) {
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 9f0983f..8172443 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3208,7 +3208,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
 	__be16 proto = (__force __be16)protocol; /* weird, but documented */
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_RAW))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		return -EPERM;
 	if (sock->type != SOCK_DGRAM && sock->type != SOCK_RAW &&
 	    sock->type != SOCK_PACKET)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a75864d..249a340 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -140,7 +140,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
 	int tp_created = 0;
 
 	if ((n->nlmsg_type != RTM_GETTFILTER) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 replay:
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ddf047d..783f495 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1123,7 +1123,7 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n)
 	int err;
 
 	if ((n->nlmsg_type != RTM_GETQDISC) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	err = nlmsg_parse(n, sizeof(*tcm), tca, TCA_MAX, NULL);
@@ -1190,7 +1190,7 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n)
 	struct Qdisc *q, *p;
 	int err;
 
-	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	if (!netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 replay:
@@ -1539,7 +1539,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n)
 	int err;
 
 	if ((n->nlmsg_type != RTM_GETTCLASS) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	err = nlmsg_parse(n, sizeof(*tcm), tca, TCA_MAX, NULL);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 67154b8..bb65b08 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -361,7 +361,7 @@ static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
 	}
 
 	if (snum && snum < PROT_SOCK &&
-	    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
+	    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
 	/* See if the address matches any of the addresses we may have
@@ -1153,7 +1153,7 @@ static int __sctp_connect(struct sock *sk,
 				 * be permitted to open new associations.
 				 */
 				if (ep->base.bind_addr.port < PROT_SOCK &&
-				    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) {
+				    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE)) {
 					err = -EACCES;
 					goto out_free;
 				}
@@ -1815,7 +1815,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
 			 * associations.
 			 */
 			if (ep->base.bind_addr.port < PROT_SOCK &&
-			    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) {
+			    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE)) {
 				err = -EACCES;
 				goto out_unlock;
 			}
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index ed98c1f..cb46bc9 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -42,11 +42,11 @@ static int net_ctl_permissions(struct ctl_table_header *head,
 			       struct ctl_table *table)
 {
 	struct net *net = container_of(head->set, struct net, sysctls);
-	kuid_t root_uid = make_kuid(net->user_ns, 0);
-	kgid_t root_gid = make_kgid(net->user_ns, 0);
+	kuid_t root_uid = make_kuid(net->ns.user_ns, 0);
+	kgid_t root_gid = make_kgid(net->ns.user_ns, 0);
 
 	/* Allow network administrator to have same access as root. */
-	if (ns_capable(net->user_ns, CAP_NET_ADMIN) ||
+	if (ns_capable(net->ns.user_ns, CAP_NET_ADMIN) ||
 	    uid_eq(root_uid, current_euid())) {
 		int mode = (table->mode >> 6) & 7;
 		return (mode << 6) | (mode << 3) | mode;
diff --git a/net/unix/sysctl_net_unix.c b/net/unix/sysctl_net_unix.c
index b3d5150..b5aec8a 100644
--- a/net/unix/sysctl_net_unix.c
+++ b/net/unix/sysctl_net_unix.c
@@ -35,7 +35,7 @@ int __net_init unix_sysctl_register(struct net *net)
 		goto err_alloc;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	table[0].data = &net->unx.sysctl_max_dgram_qlen;
diff --git a/net/xfrm/xfrm_sysctl.c b/net/xfrm/xfrm_sysctl.c
index 05a6e3d..8d4b41f 100644
--- a/net/xfrm/xfrm_sysctl.c
+++ b/net/xfrm/xfrm_sysctl.c
@@ -55,7 +55,7 @@ int __net_init xfrm_sysctl_init(struct net *net)
 	table[3].data = &net->xfrm.sysctl_acq_expires;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->xfrm.sysctl_hdr = register_net_sysctl(net, "net/core", table);
-- 
2.5.5


* [PATCH 1/5] namespaces: move user_ns into ns_common
  2016-07-14 18:20 ` Andrey Vagin
  (?)
@ 2016-07-14 18:20 ` Andrey Vagin
       [not found]   ` <1468520419-28220-2-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
  -1 siblings, 1 reply; 142+ messages in thread
From: Andrey Vagin @ 2016-07-14 18:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-api, containers, criu, linux-fsdevel, Andrey Vagin

Every namespace has a pointer to the user namespace in which it was
created, but these pointers are all privately embedded in the individual
namespace-specific structures.

We are going to add a user-space interface for getting the owning user
namespace, so it looks reasonable to move the pointer into ns_common.

Originally this idea was suggested by James Bottomley.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 drivers/net/bonding/bond_main.c         |  2 +-
 drivers/net/tun.c                       |  4 ++--
 fs/mount.h                              |  1 -
 fs/namespace.c                          | 14 +++++++-------
 fs/pnode.c                              |  4 ++--
 fs/proc/root.c                          |  2 +-
 include/linux/cgroup.h                  |  1 -
 include/linux/ipc_namespace.h           |  3 ---
 include/linux/ns_common.h               |  1 +
 include/linux/pid_namespace.h           |  1 -
 include/linux/user_namespace.h          |  8 ++++++--
 include/linux/utsname.h                 |  1 -
 include/net/net_namespace.h             |  1 -
 init/version.c                          |  2 +-
 ipc/mqueue.c                            |  2 +-
 ipc/msgutil.c                           |  2 +-
 ipc/namespace.c                         |  6 +++---
 ipc/shm.c                               |  2 +-
 ipc/util.c                              |  4 ++--
 kernel/cgroup.c                         | 12 ++++++------
 kernel/pid.c                            |  2 +-
 kernel/pid_namespace.c                  |  8 ++++----
 kernel/reboot.c                         |  2 +-
 kernel/sys.c                            |  4 ++--
 kernel/user_namespace.c                 |  4 ++++
 kernel/utsname.c                        |  6 +++---
 net/8021q/vlan.c                        | 12 ++++++------
 net/bridge/br_ioctl.c                   | 22 +++++++++++-----------
 net/bridge/br_sysfs_br.c                |  4 ++--
 net/bridge/br_sysfs_if.c                |  2 +-
 net/bridge/netfilter/ebtables.c         |  8 ++++----
 net/core/dev_ioctl.c                    |  4 ++--
 net/core/ethtool.c                      |  2 +-
 net/core/neighbour.c                    |  2 +-
 net/core/net-sysfs.c                    |  6 +++---
 net/core/net_namespace.c                |  6 +++---
 net/core/rtnetlink.c                    |  6 +++---
 net/core/scm.c                          |  2 +-
 net/core/sock.c                         | 10 +++++-----
 net/core/sock_diag.c                    |  2 +-
 net/core/sysctl_net_core.c              |  2 +-
 net/ieee802154/6lowpan/reassembly.c     |  2 +-
 net/ieee802154/socket.c                 |  8 ++++----
 net/ipv4/af_inet.c                      |  4 ++--
 net/ipv4/arp.c                          |  2 +-
 net/ipv4/devinet.c                      |  4 ++--
 net/ipv4/fib_frontend.c                 |  2 +-
 net/ipv4/ip_options.c                   |  6 +++---
 net/ipv4/ip_sockglue.c                  |  6 +++---
 net/ipv4/ip_tunnel.c                    |  4 ++--
 net/ipv4/ipmr.c                         |  2 +-
 net/ipv4/netfilter/arp_tables.c         |  8 ++++----
 net/ipv4/netfilter/ip_tables.c          |  8 ++++----
 net/ipv4/route.c                        |  2 +-
 net/ipv4/tcp.c                          |  2 +-
 net/ipv4/tcp_cong.c                     |  2 +-
 net/ipv6/addrconf.c                     |  4 ++--
 net/ipv6/af_inet6.c                     |  4 ++--
 net/ipv6/anycast.c                      |  2 +-
 net/ipv6/datagram.c                     |  6 +++---
 net/ipv6/ip6_flowlabel.c                |  2 +-
 net/ipv6/ip6_gre.c                      |  4 ++--
 net/ipv6/ip6_tunnel.c                   |  4 ++--
 net/ipv6/ip6_vti.c                      |  4 ++--
 net/ipv6/ip6mr.c                        |  2 +-
 net/ipv6/ipv6_sockglue.c                |  8 ++++----
 net/ipv6/netfilter/ip6_tables.c         |  8 ++++----
 net/ipv6/reassembly.c                   |  2 +-
 net/ipv6/route.c                        |  4 ++--
 net/ipv6/sit.c                          |  8 ++++----
 net/key/af_key.c                        |  2 +-
 net/llc/af_llc.c                        |  2 +-
 net/netfilter/ipset/ip_set_core.c       |  2 +-
 net/netfilter/ipvs/ip_vs_ctl.c          |  6 +++---
 net/netfilter/ipvs/ip_vs_lblc.c         |  2 +-
 net/netfilter/ipvs/ip_vs_lblcr.c        |  2 +-
 net/netfilter/nf_conntrack_acct.c       |  2 +-
 net/netfilter/nf_conntrack_ecache.c     |  2 +-
 net/netfilter/nf_conntrack_expect.c     |  4 ++--
 net/netfilter/nf_conntrack_helper.c     |  2 +-
 net/netfilter/nf_conntrack_proto_dccp.c |  2 +-
 net/netfilter/nf_conntrack_standalone.c |  6 +++---
 net/netfilter/nf_conntrack_timestamp.c  |  2 +-
 net/netfilter/nfnetlink_log.c           |  4 ++--
 net/netfilter/x_tables.c                |  4 ++--
 net/netlink/af_netlink.c                |  8 ++++----
 net/netlink/genetlink.c                 |  2 +-
 net/packet/af_packet.c                  |  2 +-
 net/sched/cls_api.c                     |  2 +-
 net/sched/sch_api.c                     |  6 +++---
 net/sctp/socket.c                       |  6 +++---
 net/sysctl_net.c                        |  6 +++---
 net/unix/sysctl_net_unix.c              |  2 +-
 net/xfrm/xfrm_sysctl.c                  |  2 +-
 94 files changed, 197 insertions(+), 196 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a2afa3b..5ebe22a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3425,7 +3425,7 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd
 
 	net = dev_net(bond_dev);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	slave_dev = __dev_get_by_name(net, ifr->ifr_slave);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e16487c..2730608 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -487,7 +487,7 @@ static inline bool tun_not_capable(struct tun_struct *tun)
 
 	return ((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
 		  (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
-		!ns_capable(net->user_ns, CAP_NET_ADMIN);
+		!ns_capable(net->ns.user_ns, CAP_NET_ADMIN);
 }
 
 static void tun_set_real_num_queues(struct tun_struct *tun)
@@ -1737,7 +1737,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		int queues = ifr->ifr_flags & IFF_MULTI_QUEUE ?
 			     MAX_TAP_QUEUES : 1;
 
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		err = security_tun_dev_create();
 		if (err < 0)
diff --git a/fs/mount.h b/fs/mount.h
index 14db05d..532dd92 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -9,7 +9,6 @@ struct mnt_namespace {
 	struct ns_common	ns;
 	struct mount *	root;
 	struct list_head	list;
-	struct user_namespace	*user_ns;
 	u64			seq;	/* Sequence number to prevent loops */
 	wait_queue_head_t poll;
 	u64 event;
diff --git a/fs/namespace.c b/fs/namespace.c
index 419f746..22b0dbc 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1582,7 +1582,7 @@ out_unlock:
  */
 static inline bool may_mount(void)
 {
-	return ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN);
+	return ns_capable(current->nsproxy->mnt_ns->ns.user_ns, CAP_SYS_ADMIN);
 }
 
 static inline bool may_mandlock(void)
@@ -2187,7 +2187,7 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	if ((mnt->mnt.mnt_flags & MNT_LOCK_NODEV) &&
 	    !(mnt_flags & MNT_NODEV)) {
 		/* Was the nodev implicitly added in mount? */
-		if ((mnt->mnt_ns->user_ns != &init_user_ns) &&
+		if ((mnt->mnt_ns->ns.user_ns != &init_user_ns) &&
 		    !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
 			mnt_flags |= MNT_NODEV;
 		} else {
@@ -2386,7 +2386,7 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 			int mnt_flags, const char *name, void *data)
 {
 	struct file_system_type *type;
-	struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns;
+	struct user_namespace *user_ns = current->nsproxy->mnt_ns->ns.user_ns;
 	struct vfsmount *mnt;
 	int err;
 
@@ -2744,7 +2744,7 @@ dput_out:
 static void free_mnt_ns(struct mnt_namespace *ns)
 {
 	ns_free_inum(&ns->ns);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	kfree(ns);
 }
 
@@ -2777,7 +2777,7 @@ static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns)
 	INIT_LIST_HEAD(&new_ns->list);
 	init_waitqueue_head(&new_ns->poll);
 	new_ns->event = 0;
-	new_ns->user_ns = get_user_ns(user_ns);
+	new_ns->ns.user_ns = get_user_ns(user_ns);
 	return new_ns;
 }
 
@@ -2807,7 +2807,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 	namespace_lock();
 	/* First pass: copy the tree topology */
 	copy_flags = CL_COPY_UNBINDABLE | CL_EXPIRE;
-	if (user_ns != ns->user_ns)
+	if (user_ns != ns->ns.user_ns)
 		copy_flags |= CL_SHARED_TO_SLAVE | CL_UNPRIVILEGED;
 	new = copy_tree(old, old->mnt.mnt_root, copy_flags);
 	if (IS_ERR(new)) {
@@ -3326,7 +3326,7 @@ static int mntns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	struct mnt_namespace *mnt_ns = to_mnt_ns(ns);
 	struct path root;
 
-	if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(mnt_ns->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_CHROOT) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
diff --git a/fs/pnode.c b/fs/pnode.c
index 9989970..e051f11 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -244,7 +244,7 @@ static int propagate_one(struct mount *m)
 	}
 		
 	/* Notice when we are propagating across user namespaces */
-	if (m->mnt_ns->user_ns != user_ns)
+	if (m->mnt_ns->ns.user_ns != user_ns)
 		type |= CL_UNPRIVILEGED;
 	child = copy_tree(last_source, last_source->mnt.mnt_root, type);
 	if (IS_ERR(child))
@@ -286,7 +286,7 @@ int propagate_mnt(struct mount *dest_mnt, struct mountpoint *dest_mp,
 	 * propagate_one(); everything is serialized by namespace_sem,
 	 * so globals will do just fine.
 	 */
-	user_ns = current->nsproxy->mnt_ns->user_ns;
+	user_ns = current->nsproxy->mnt_ns->ns.user_ns;
 	last_dest = dest_mnt;
 	first_source = source_mnt;
 	last_source = source_mnt;
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 0670278..aae5104 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -113,7 +113,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 		options = data;
 
 		/* Does the mounter have privilege over the pid namespace? */
-		if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+		if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN))
 			return ERR_PTR(-EPERM);
 	}
 
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index a20320c..f531cc5 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -619,7 +619,6 @@ static inline void cgroup_sk_free(struct sock_cgroup_data *skcd) {}
 struct cgroup_namespace {
 	atomic_t		count;
 	struct ns_common	ns;
-	struct user_namespace	*user_ns;
 	struct css_set          *root_cset;
 };
 
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 1eee6bc..0f9d806 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -56,9 +56,6 @@ struct ipc_namespace {
 	unsigned int    mq_msg_default;
 	unsigned int    mq_msgsize_default;
 
-	/* user_ns which owns the ipc ns */
-	struct user_namespace *user_ns;
-
 	struct ns_common ns;
 };
 
diff --git a/include/linux/ns_common.h b/include/linux/ns_common.h
index 85a5c8c..af2f30d 100644
--- a/include/linux/ns_common.h
+++ b/include/linux/ns_common.h
@@ -4,6 +4,7 @@
 struct proc_ns_operations;
 
 struct ns_common {
+	struct user_namespace *user_ns; /* Owning user namespace */
 	atomic_long_t stashed;
 	const struct proc_ns_operations *ops;
 	unsigned int inum;
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 918b117..b1802c6 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -39,7 +39,6 @@ struct pid_namespace {
 #ifdef CONFIG_BSD_PROCESS_ACCT
 	struct fs_pin *bacct;
 #endif
-	struct user_namespace *user_ns;
 	struct work_struct proc_work;
 	kgid_t pid_gid;
 	int hide_pid;
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 8297e5b..a941b44 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -27,11 +27,15 @@ struct user_namespace {
 	struct uid_gid_map	gid_map;
 	struct uid_gid_map	projid_map;
 	atomic_t		count;
-	struct user_namespace	*parent;
 	int			level;
 	kuid_t			owner;
 	kgid_t			group;
-	struct ns_common	ns;
+
+	/* ->ns.user_ns and ->parent are synonyms */
+	union {
+		struct user_namespace	*parent;
+		struct ns_common	ns;
+	};
 	unsigned long		flags;
 
 	/* Register of per-UID persistent keyrings for this namespace */
diff --git a/include/linux/utsname.h b/include/linux/utsname.h
index 5093f58..78c9ef8 100644
--- a/include/linux/utsname.h
+++ b/include/linux/utsname.h
@@ -23,7 +23,6 @@ extern struct user_namespace init_user_ns;
 struct uts_namespace {
 	struct kref kref;
 	struct new_utsname name;
-	struct user_namespace *user_ns;
 	struct ns_common ns;
 };
 extern struct uts_namespace init_uts_ns;
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..acb714e 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -59,7 +59,6 @@ struct net {
 	struct list_head	cleanup_list;	/* namespaces on death row */
 	struct list_head	exit_list;	/* Use only net_mutex */
 
-	struct user_namespace   *user_ns;	/* Owning user namespace */
 	spinlock_t		nsid_lock;
 	struct idr		netns_ids;
 
diff --git a/init/version.c b/init/version.c
index fe41a63..51ac701 100644
--- a/init/version.c
+++ b/init/version.c
@@ -34,7 +34,7 @@ struct uts_namespace init_uts_ns = {
 		.machine	= UTS_MACHINE,
 		.domainname	= UTS_DOMAINNAME,
 	},
-	.user_ns = &init_user_ns,
+	.ns.user_ns = &init_user_ns,
 	.ns.inum = PROC_UTS_INIT_INO,
 #ifdef CONFIG_UTS_NS
 	.ns.ops = &utsns_operations,
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index ade739f..378cec6 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -331,7 +331,7 @@ static struct dentry *mqueue_mount(struct file_system_type *fs_type,
 		/* Don't allow mounting unless the caller has CAP_SYS_ADMIN
 		 * over the ipc namespace.
 		 */
-		if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+		if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN))
 			return ERR_PTR(-EPERM);
 
 		data = ns;
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index ed81aaf..b2e570c 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -30,7 +30,7 @@ DEFINE_SPINLOCK(mq_lock);
  */
 struct ipc_namespace init_ipc_ns = {
 	.count		= ATOMIC_INIT(1),
-	.user_ns = &init_user_ns,
+	.ns.user_ns = &init_user_ns,
 	.ns.inum = PROC_IPC_INIT_INO,
 #ifdef CONFIG_IPC_NS
 	.ns.ops = &ipcns_operations,
diff --git a/ipc/namespace.c b/ipc/namespace.c
index 068caf1..d9f663b8 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -46,7 +46,7 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	msg_init_ns(ns);
 	shm_init_ns(ns);
 
-	ns->user_ns = get_user_ns(user_ns);
+	ns->ns.user_ns = get_user_ns(user_ns);
 
 	return ns;
 }
@@ -97,7 +97,7 @@ static void free_ipc_ns(struct ipc_namespace *ns)
 	shm_exit_ns(ns);
 	atomic_dec(&nr_ipc_ns);
 
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	ns_free_inum(&ns->ns);
 	kfree(ns);
 }
@@ -155,7 +155,7 @@ static void ipcns_put(struct ns_common *ns)
 static int ipcns_install(struct nsproxy *nsproxy, struct ns_common *new)
 {
 	struct ipc_namespace *ns = to_ipc_ns(new);
-	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/ipc/shm.c b/ipc/shm.c
index 1328251..20546f1 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1024,7 +1024,7 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
 			goto out_unlock0;
 		}
 
-		if (!ns_capable(ns->user_ns, CAP_IPC_LOCK)) {
+		if (!ns_capable(ns->ns.user_ns, CAP_IPC_LOCK)) {
 			kuid_t euid = current_euid();
 			if (!uid_eq(euid, shp->shm_perm.uid) &&
 			    !uid_eq(euid, shp->shm_perm.cuid)) {
diff --git a/ipc/util.c b/ipc/util.c
index 798cad1..2a1a700 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -491,7 +491,7 @@ int ipcperms(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp, short flag)
 		granted_mode >>= 3;
 	/* is there some bit set in requested_mode but not in granted_mode? */
 	if ((requested_mode & ~granted_mode & 0007) &&
-	    !ns_capable(ns->user_ns, CAP_IPC_OWNER))
+	    !ns_capable(ns->ns.user_ns, CAP_IPC_OWNER))
 		return -1;
 
 	return security_ipc_permission(ipcp, flag);
@@ -700,7 +700,7 @@ struct kern_ipc_perm *ipcctl_pre_down_nolock(struct ipc_namespace *ns,
 
 	euid = current_euid();
 	if (uid_eq(euid, ipcp->cuid) || uid_eq(euid, ipcp->uid)  ||
-	    ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+	    ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN))
 		return ipcp; /* successful lookup */
 err:
 	return ERR_PTR(err);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 75c0ff0..3635600 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -221,7 +221,7 @@ static u16 have_free_callback __read_mostly;
 /* cgroup namespace for init task */
 struct cgroup_namespace init_cgroup_ns = {
 	.count		= { .counter = 2, },
-	.user_ns	= &init_user_ns,
+	.ns.user_ns	= &init_user_ns,
 	.ns.ops		= &cgroupns_operations,
 	.ns.inum	= PROC_CGROUP_INIT_INO,
 	.root_cset	= &init_css_set,
@@ -2094,7 +2094,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 	get_cgroup_ns(ns);
 
 	/* Check if the caller has permission to mount. */
-	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN)) {
+	if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN)) {
 		put_cgroup_ns(ns);
 		return ERR_PTR(-EPERM);
 	}
@@ -5609,7 +5609,7 @@ int __init cgroup_init(void)
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_dfl_base_files));
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_legacy_base_files));
 
-	get_user_ns(init_cgroup_ns.user_ns);
+	get_user_ns(init_cgroup_ns.ns.user_ns);
 
 	mutex_lock(&cgroup_mutex);
 
@@ -6285,7 +6285,7 @@ static struct cgroup_namespace *alloc_cgroup_ns(void)
 void free_cgroup_ns(struct cgroup_namespace *ns)
 {
 	put_css_set(ns->root_cset);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	ns_free_inum(&ns->ns);
 	kfree(ns);
 }
@@ -6324,7 +6324,7 @@ struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
 		return new_ns;
 	}
 
-	new_ns->user_ns = get_user_ns(user_ns);
+	new_ns->ns.user_ns = get_user_ns(user_ns);
 	new_ns->root_cset = cset;
 
 	return new_ns;
@@ -6340,7 +6340,7 @@ static int cgroupns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	struct cgroup_namespace *cgroup_ns = to_cg_ns(ns);
 
 	if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN) ||
-	    !ns_capable(cgroup_ns->user_ns, CAP_SYS_ADMIN))
+	    !ns_capable(cgroup_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	/* Don't need to do anything if we are attaching to our own cgroupns. */
diff --git a/kernel/pid.c b/kernel/pid.c
index f66162f..c63f992d 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -78,7 +78,7 @@ struct pid_namespace init_pid_ns = {
 	.nr_hashed = PIDNS_HASH_ADDING,
 	.level = 0,
 	.child_reaper = &init_task,
-	.user_ns = &init_user_ns,
+	.ns.user_ns = &init_user_ns,
 	.ns.inum = PROC_PID_INIT_INO,
 #ifdef CONFIG_PID_NS
 	.ns.ops = &pidns_operations,
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index a65ba13..3529a03 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -113,7 +113,7 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
 	kref_init(&ns->kref);
 	ns->level = level;
 	ns->parent = get_pid_ns(parent_pid_ns);
-	ns->user_ns = get_user_ns(user_ns);
+	ns->ns.user_ns = get_user_ns(user_ns);
 	ns->nr_hashed = PIDNS_HASH_ADDING;
 	INIT_WORK(&ns->proc_work, proc_cleanup_work);
 
@@ -146,7 +146,7 @@ static void destroy_pid_namespace(struct pid_namespace *ns)
 	ns_free_inum(&ns->ns);
 	for (i = 0; i < PIDMAP_ENTRIES; i++)
 		kfree(ns->pidmap[i].page);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	call_rcu(&ns->rcu, delayed_free_pidns);
 }
 
@@ -276,7 +276,7 @@ static int pid_ns_ctl_handler(struct ctl_table *table, int write,
 	struct pid_namespace *pid_ns = task_active_pid_ns(current);
 	struct ctl_table tmp = *table;
 
-	if (write && !ns_capable(pid_ns->user_ns, CAP_SYS_ADMIN))
+	if (write && !ns_capable(pid_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	/*
@@ -362,7 +362,7 @@ static int pidns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	struct pid_namespace *active = task_active_pid_ns(current);
 	struct pid_namespace *ancestor, *new = to_pid_ns(ns);
 
-	if (!ns_capable(new->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(new->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/kernel/reboot.c b/kernel/reboot.c
index bd30a97..38f81a6 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -285,7 +285,7 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
 	int ret = 0;
 
 	/* We only trust the superuser with rebooting the system. */
-	if (!ns_capable(pid_ns->user_ns, CAP_SYS_BOOT))
+	if (!ns_capable(pid_ns->ns.user_ns, CAP_SYS_BOOT))
 		return -EPERM;
 
 	/* For safety, we require "magic" arguments. */
diff --git a/kernel/sys.c b/kernel/sys.c
index 89d5be4..9db5647 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1217,7 +1217,7 @@ SYSCALL_DEFINE2(sethostname, char __user *, name, int, len)
 	int errno;
 	char tmp[__NEW_UTS_LEN];
 
-	if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
+	if (!ns_capable(current->nsproxy->uts_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	if (len < 0 || len > __NEW_UTS_LEN)
@@ -1268,7 +1268,7 @@ SYSCALL_DEFINE2(setdomainname, char __user *, name, int, len)
 	int errno;
 	char tmp[__NEW_UTS_LEN];
 
-	if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
+	if (!ns_capable(current->nsproxy->uts_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 	if (len < 0 || len > __NEW_UTS_LEN)
 		return -EINVAL;
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 9bafc21..a5bc78c 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -96,6 +96,11 @@ int create_user_ns(struct cred *new)
 	ns->ns.ops = &userns_operations;
 
 	atomic_set(&ns->count, 1);
+
+	/* ->ns.user_ns and ->parent are synonyms. */
+	BUILD_BUG_ON(offsetof(struct user_namespace, ns.user_ns) !=
+		     offsetof(struct user_namespace, parent));
+
 	/* Leave the new->user_ns reference with the new user namespace. */
 	ns->parent = parent_ns;
 	ns->level = parent_ns->level + 1;
diff --git a/kernel/utsname.c b/kernel/utsname.c
index 831ea71..40a119a 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -52,7 +52,7 @@ static struct uts_namespace *clone_uts_ns(struct user_namespace *user_ns,
 
 	down_read(&uts_sem);
 	memcpy(&ns->name, &old_ns->name, sizeof(ns->name));
-	ns->user_ns = get_user_ns(user_ns);
+	ns->ns.user_ns = get_user_ns(user_ns);
 	up_read(&uts_sem);
 	return ns;
 }
@@ -85,7 +85,7 @@ void free_uts_ns(struct kref *kref)
 	struct uts_namespace *ns;
 
 	ns = container_of(kref, struct uts_namespace, kref);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	ns_free_inum(&ns->ns);
 	kfree(ns);
 }
@@ -120,7 +120,7 @@ static int utsns_install(struct nsproxy *nsproxy, struct ns_common *new)
 {
 	struct uts_namespace *ns = to_uts_ns(new);
 
-	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 82a116b..6c46a80 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -541,7 +541,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 	switch (args.cmd) {
 	case SET_VLAN_INGRESS_PRIORITY_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		vlan_dev_set_ingress_priority(dev,
 					      args.u.skb_priority,
@@ -551,7 +551,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_EGRESS_PRIORITY_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = vlan_dev_set_egress_priority(dev,
 						   args.u.skb_priority,
@@ -560,7 +560,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_FLAG_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = vlan_dev_change_flags(dev,
 					    args.vlan_qos ? args.u.flag : 0,
@@ -569,7 +569,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_NAME_TYPE_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		if ((args.u.name_type >= 0) &&
 		    (args.u.name_type < VLAN_NAME_TYPE_HIGHEST)) {
@@ -585,14 +585,14 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case ADD_VLAN_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = register_vlan_device(dev, args.u.VID);
 		break;
 
 	case DEL_VLAN_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		unregister_vlan_dev(dev, NULL);
 		err = 0;
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index d99b200..2fdea4f 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -90,7 +90,7 @@ static int add_del_if(struct net_bridge *br, int ifindex, int isadd)
 	struct net_device *dev;
 	int ret;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	dev = __dev_get_by_index(net, ifindex);
@@ -182,28 +182,28 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	}
 
 	case BRCTL_SET_BRIDGE_FORWARD_DELAY:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_forward_delay(br, args[1]);
 		break;
 
 	case BRCTL_SET_BRIDGE_HELLO_TIME:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_hello_time(br, args[1]);
 		break;
 
 	case BRCTL_SET_BRIDGE_MAX_AGE:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_max_age(br, args[1]);
 		break;
 
 	case BRCTL_SET_AGEING_TIME:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_ageing_time(br, args[1]);
@@ -243,7 +243,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	}
 
 	case BRCTL_SET_BRIDGE_STP_STATE:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		br_stp_set_enabled(br, args[1]);
@@ -251,7 +251,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 		break;
 
 	case BRCTL_SET_BRIDGE_PRIORITY:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		br_stp_set_bridge_priority(br, args[1]);
@@ -260,7 +260,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 
 	case BRCTL_SET_PORT_PRIORITY:
 	{
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		spin_lock_bh(&br->lock);
@@ -274,7 +274,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 
 	case BRCTL_SET_PATH_COST:
 	{
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		spin_lock_bh(&br->lock);
@@ -337,7 +337,7 @@ static int old_deviceless(struct net *net, void __user *uarg)
 	{
 		char buf[IFNAMSIZ];
 
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(buf, (void __user *)args[1], IFNAMSIZ))
@@ -367,7 +367,7 @@ int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, void __user *uar
 	{
 		char buf[IFNAMSIZ];
 
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(buf, uarg, IFNAMSIZ))
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index beb4707..06d417e 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -36,7 +36,7 @@ static ssize_t store_bridge_parm(struct device *d,
 	unsigned long val;
 	int err;
 
-	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	val = simple_strtoul(buf, &endp, 0);
@@ -285,7 +285,7 @@ static ssize_t group_addr_store(struct device *d,
 	u8 new_addr[6];
 	int i;
 
-	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (sscanf(buf, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 1e04d4d..e7ceab1 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -241,7 +241,7 @@ static ssize_t brport_store(struct kobject *kobj,
 	char *endp;
 	unsigned long val;
 
-	if (!ns_capable(dev_net(p->dev)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(p->dev)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	val = simple_strtoul(buf, &endp, 0);
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 5a61f35..dab0cc2 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1496,7 +1496,7 @@ static int do_ebt_set_ctl(struct sock *sk,
 	int ret;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1519,7 +1519,7 @@ static int do_ebt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 	struct ebt_table *t;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&tmp, user, sizeof(tmp)))
@@ -2303,7 +2303,7 @@ static int compat_do_ebt_set_ctl(struct sock *sk,
 	int ret;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2327,7 +2327,7 @@ static int compat_do_ebt_get_ctl(struct sock *sk, int cmd,
 	struct ebt_table *t;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* try real handler in case userland supplied needed padding */
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index b94b1d2..a705922 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -474,7 +474,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCGMIIPHY:
 	case SIOCGMIIREG:
 	case SIOCSIFNAME:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		dev_load(net, ifr.ifr_name);
 		rtnl_lock();
@@ -522,7 +522,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCBRADDIF:
 	case SIOCBRDELIF:
 	case SIOCSHWTSTAMP:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		/* fall through */
 	case SIOCBONDSLAVEINFOQUERY:
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f403481..27a3085 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2480,7 +2480,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_GTUNABLE:
 		break;
 	default:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 	}
 
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 510cd62..8df69fd 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -3169,7 +3169,7 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
 	}
 
 	/* Don't export sysctls to unprivileged users */
-	if (neigh_parms_net(p)->user_ns != &init_user_ns)
+	if (neigh_parms_net(p)->ns.user_ns != &init_user_ns)
 		t->neigh_vars[0].procname = NULL;
 
 	switch (neigh_parms_family(p)) {
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 7a0b616..eb20bc7 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -85,7 +85,7 @@ static ssize_t netdev_store(struct device *dev, struct device_attribute *attr,
 	unsigned long new;
 	int ret = -EINVAL;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	ret = kstrtoul(buf, 0, &new);
@@ -362,7 +362,7 @@ static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr,
 	size_t count = len;
 	ssize_t ret;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* ignore trailing newline */
@@ -1390,7 +1390,7 @@ static bool net_current_may_mount(void)
 {
 	struct net *net = current->nsproxy->net_ns;
 
-	return ns_capable(net->user_ns, CAP_SYS_ADMIN);
+	return ns_capable(net->ns.user_ns, CAP_SYS_ADMIN);
 }
 
 static void *net_grab_current_ns(void)
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2c2eb1b..3433f0c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -279,7 +279,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	atomic_set(&net->count, 1);
 	atomic_set(&net->passive, 1);
 	net->dev_base_seq = 1;
-	net->user_ns = user_ns;
+	net->ns.user_ns = user_ns;
 	idr_init(&net->netns_ids);
 	spin_lock_init(&net->nsid_lock);
 
@@ -444,7 +444,7 @@ static void cleanup_net(struct work_struct *work)
 	/* Finally it is safe to free my network namespace structure */
 	list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {
 		list_del_init(&net->exit_list);
-		put_user_ns(net->user_ns);
+		put_user_ns(net->ns.user_ns);
 		net_drop_ns(net);
 	}
 }
@@ -987,7 +987,7 @@ static int netns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 {
 	struct net *net = to_net_ns(ns);
 
-	if (!ns_capable(net->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(net->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d69c464..ea7ba06 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1785,7 +1785,7 @@ static int do_setlink(const struct sk_buff *skb,
 			err = PTR_ERR(net);
 			goto errout;
 		}
-		if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) {
+		if (!netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN)) {
 			put_net(net);
 			err = -EPERM;
 			goto errout;
@@ -2430,7 +2430,7 @@ replay:
 			return PTR_ERR(dest_net);
 
 		err = -EPERM;
-		if (!netlink_ns_capable(skb, dest_net->user_ns, CAP_NET_ADMIN))
+		if (!netlink_ns_capable(skb, dest_net->ns.user_ns, CAP_NET_ADMIN))
 			goto out;
 
 		if (tb[IFLA_LINK_NETNSID]) {
@@ -2442,7 +2442,7 @@ replay:
 				goto out;
 			}
 			err = -EPERM;
-			if (!netlink_ns_capable(skb, link_net->user_ns, CAP_NET_ADMIN))
+			if (!netlink_ns_capable(skb, link_net->ns.user_ns, CAP_NET_ADMIN))
 				goto out;
 		}
 
diff --git a/net/core/scm.c b/net/core/scm.c
index 2696aef..1a2301a 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -54,7 +54,7 @@ static __inline__ int scm_check_creds(struct ucred *creds)
 		return -EINVAL;
 
 	if ((creds->pid == task_tgid_vnr(current) ||
-	     ns_capable(task_active_pid_ns(current)->user_ns, CAP_SYS_ADMIN)) &&
+	     ns_capable(task_active_pid_ns(current)->ns.user_ns, CAP_SYS_ADMIN)) &&
 	    ((uid_eq(uid, cred->uid)   || uid_eq(uid, cred->euid) ||
 	      uid_eq(uid, cred->suid)) || ns_capable(cred->user_ns, CAP_SETUID)) &&
 	    ((gid_eq(gid, cred->gid)   || gid_eq(gid, cred->egid) ||
diff --git a/net/core/sock.c b/net/core/sock.c
index 08bf97e..321ca3c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -191,7 +191,7 @@ EXPORT_SYMBOL(sk_capable);
  */
 bool sk_net_capable(const struct sock *sk, int cap)
 {
-	return sk_ns_capable(sk, sock_net(sk)->user_ns, cap);
+	return sk_ns_capable(sk, sock_net(sk)->ns.user_ns, cap);
 }
 EXPORT_SYMBOL(sk_net_capable);
 
@@ -534,7 +534,7 @@ static int sock_setbindtodevice(struct sock *sk, char __user *optval,
 
 	/* Sorry... */
 	ret = -EPERM;
-	if (!ns_capable(net->user_ns, CAP_NET_RAW))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		goto out;
 
 	ret = -EINVAL;
@@ -778,7 +778,7 @@ set_rcvbuf:
 
 	case SO_PRIORITY:
 		if ((val >= 0 && val <= 6) ||
-		    ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		    ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			sk->sk_priority = val;
 		else
 			ret = -EPERM;
@@ -945,7 +945,7 @@ set_rcvbuf:
 			clear_bit(SOCK_PASSSEC, &sock->flags);
 		break;
 	case SO_MARK:
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			ret = -EPERM;
 		else
 			sk->sk_mark = val;
@@ -1921,7 +1921,7 @@ int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
 
 	switch (cmsg->cmsg_type) {
 	case SO_MARK:
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
 			return -EINVAL;
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 6b10573..7151b43 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -303,7 +303,7 @@ static int sock_diag_bind(struct net *net, int group)
 
 int sock_diag_destroy(struct sock *sk, int err)
 {
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (!sk->sk_prot->diag_destroy)
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 0df2aa6..6f6749d 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -441,7 +441,7 @@ static __net_init int sysctl_core_net_init(struct net *net)
 		tbl[0].data = &net->core.sysctl_somaxconn;
 
 		/* Don't export any sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns) {
+		if (net->ns.user_ns != &init_user_ns) {
 			tbl[0].procname = NULL;
 		}
 	}
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index 30d875d..9d002f4 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -512,7 +512,7 @@ static int __net_init lowpan_frags_ns_sysctl_register(struct net *net)
 		table[2].data = &ieee802154_lowpan->frags.timeout;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			table[0].procname = NULL;
 	}
 
diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
index e0bd013..6353184 100644
--- a/net/ieee802154/socket.c
+++ b/net/ieee802154/socket.c
@@ -895,8 +895,8 @@ static int dgram_setsockopt(struct sock *sk, int level, int optname,
 		ro->want_ack = !!val;
 		break;
 	case WPAN_SECURITY:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN) &&
-		    !ns_capable(net->user_ns, CAP_NET_RAW)) {
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN) &&
+		    !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 			err = -EPERM;
 			break;
 		}
@@ -919,8 +919,8 @@ static int dgram_setsockopt(struct sock *sk, int level, int optname,
 		}
 		break;
 	case WPAN_SECURITY_LEVEL:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN) &&
-		    !ns_capable(net->user_ns, CAP_NET_RAW)) {
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN) &&
+		    !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 			err = -EPERM;
 			break;
 		}
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index d39e9e4..bec3946 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -309,7 +309,7 @@ lookup_protocol:
 
 	err = -EPERM;
 	if (sock->type == SOCK_RAW && !kern &&
-	    !ns_capable(net->user_ns, CAP_NET_RAW))
+	    !ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		goto out_rcu_unlock;
 
 	sock->ops = answer->ops;
@@ -475,7 +475,7 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	snum = ntohs(addr->sin_port);
 	err = -EACCES;
 	if (snum && snum < PROT_SOCK &&
-	    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
+	    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE))
 		goto out;
 
 	/*      We keep a pair of addresses. rcv_saddr is the one
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 89a8cac4..22517fb 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1140,7 +1140,7 @@ int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCDARP:
 	case SIOCSARP:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 	case SIOCGARP:
 		err = copy_from_user(&r, arg, sizeof(struct arpreq));
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index e333bc8..fc8f1f2 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -961,7 +961,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 
 	case SIOCSIFFLAGS:
 		ret = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto out;
 		break;
 	case SIOCSIFADDR:	/* Set interface address (and family) */
@@ -969,7 +969,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCSIFDSTADDR:	/* Set the destination address */
 	case SIOCSIFNETMASK: 	/* Set the netmask for the interface */
 		ret = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto out;
 		ret = -EINVAL;
 		if (sin->sin_family != AF_INET)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index ef2ebeb..fbc7311 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -581,7 +581,7 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCADDRT:		/* Add a route */
 	case SIOCDELRT:		/* Delete a route */
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(&rt, arg, sizeof(rt)))
diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index 4d158ff..dda262e 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -407,7 +407,7 @@ int ip_options_compile(struct net *net,
 					optptr[2] += 8;
 					break;
 				default:
-					if (!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) {
+					if (!skb && !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 						pp_ptr = optptr + 3;
 						goto error;
 					}
@@ -442,7 +442,7 @@ int ip_options_compile(struct net *net,
 				opt->router_alert = optptr - iph;
 			break;
 		case IPOPT_CIPSO:
-			if ((!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) || opt->cipso) {
+			if ((!skb && !ns_capable(net->ns.user_ns, CAP_NET_RAW)) || opt->cipso) {
 				pp_ptr = optptr;
 				goto error;
 			}
@@ -455,7 +455,7 @@ int ip_options_compile(struct net *net,
 		case IPOPT_SEC:
 		case IPOPT_SID:
 		default:
-			if (!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!skb && !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				pp_ptr = optptr;
 				goto error;
 			}
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 71a52f4d..474af75 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1138,14 +1138,14 @@ mc_msf_out:
 	case IP_IPSEC_POLICY:
 	case IP_XFRM_POLICY:
 		err = -EPERM;
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = xfrm_user_policy(sk, optname, optval, optlen);
 		break;
 
 	case IP_TRANSPARENT:
-		if (!!val && !ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) &&
-		    !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
+		if (!!val && !ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_RAW) &&
+		    !ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN)) {
 			err = -EPERM;
 			break;
 		}
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index d8f5e0a..4ddc520 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -765,7 +765,7 @@ int ip_tunnel_ioctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 		if (p->iph.ttl)
 			p->iph.frag_off |= htons(IP_DF);
@@ -821,7 +821,7 @@ int ip_tunnel_ioctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == itn->fb_tunnel_dev) {
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 5ad48ec..df292fa 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1272,7 +1272,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval,
 	}
 	if (optname != MRT_INIT) {
 		if (sk != rcu_access_pointer(mrt->mroute_sk) &&
-		    !ns_capable(net->user_ns, CAP_NET_ADMIN)) {
+		    !ns_capable(net->ns.user_ns, CAP_NET_ADMIN)) {
 			ret = -EACCES;
 			goto out_unlock;
 		}
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 2033f92..e123093 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -1300,7 +1300,7 @@ static int compat_do_arpt_set_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1434,7 +1434,7 @@ static int compat_do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1455,7 +1455,7 @@ static int do_arpt_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1478,7 +1478,7 @@ static int do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 54906e0..b29238a 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1554,7 +1554,7 @@ compat_do_ipt_set_ctl(struct sock *sk,	int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1656,7 +1656,7 @@ compat_do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1678,7 +1678,7 @@ do_ipt_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1702,7 +1702,7 @@ do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index a1f2830..ddb0003 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2787,7 +2787,7 @@ static __net_init int sysctl_route_net_init(struct net *net)
 			goto err_dup;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			tbl[0].procname = NULL;
 	}
 	tbl[0].extra1 = net;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5c7ed14..467b6cc 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2273,7 +2273,7 @@ EXPORT_SYMBOL(tcp_disconnect);
 
 static inline bool tcp_can_repair_sock(const struct sock *sk)
 {
-	return ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) &&
+	return ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN) &&
 		((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_ESTABLISHED));
 }
 
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 882caa4..385d0f4 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -354,7 +354,7 @@ int tcp_set_congestion_control(struct sock *sk, const char *name)
 	if (!ca)
 		err = -ENOENT;
 	else if (!((ca->flags & TCP_CONG_NON_RESTRICTED) ||
-		   ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)))
+		   ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN)))
 		err = -EPERM;
 	else if (!try_module_get(ca->owner))
 		err = -EBUSY;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 47f837a..9aaabf8 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2781,7 +2781,7 @@ int addrconf_add_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
@@ -2800,7 +2800,7 @@ int addrconf_del_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index bfa86f0..1491cbd 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -161,7 +161,7 @@ lookup_protocol:
 
 	err = -EPERM;
 	if (sock->type == SOCK_RAW && !kern &&
-	    !ns_capable(net->user_ns, CAP_NET_RAW))
+	    !ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		goto out_rcu_unlock;
 
 	sock->ops = answer->ops;
@@ -286,7 +286,7 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		return -EINVAL;
 
 	snum = ntohs(addr->sin6_port);
-	if (snum && snum < PROT_SOCK && !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
+	if (snum && snum < PROT_SOCK && !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
 	lock_sock(sk);
diff --git a/net/ipv6/anycast.c b/net/ipv6/anycast.c
index 514ac25..e168ca3 100644
--- a/net/ipv6/anycast.c
+++ b/net/ipv6/anycast.c
@@ -62,7 +62,7 @@ int ipv6_sock_ac_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
 
 	ASSERT_RTNL();
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (ipv6_addr_is_multicast(addr))
 		return -EINVAL;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 37874e2..92204ba 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -837,7 +837,7 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -857,7 +857,7 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -882,7 +882,7 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index b912f0d..c07e37e 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -569,7 +569,7 @@ int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen)
 		rcu_read_unlock_bh();
 
 		if (freq.flr_share == IPV6_FL_S_NONE &&
-		    ns_capable(net->user_ns, CAP_NET_ADMIN)) {
+		    ns_capable(net->ns.user_ns, CAP_NET_ADMIN)) {
 			fl = fl_lookup(net, freq.flr_label);
 			if (fl) {
 				err = fl6_renew(fl, freq.flr_linger, freq.flr_expires);
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 776d145..7f23d34 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -852,7 +852,7 @@ static int ip6gre_tunnel_ioctl(struct net_device *dev,
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -901,7 +901,7 @@ static int ip6gre_tunnel_ioctl(struct net_device *dev,
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == ign->fb_tunnel_dev) {
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7b0481e..fa9443c 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1484,7 +1484,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = -EFAULT;
 		if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
@@ -1520,7 +1520,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		break;
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 
 		if (dev == ip6n->fb_tnl_dev) {
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d90a11f..ece8758 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -743,7 +743,7 @@ vti6_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = -EFAULT;
 		if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
@@ -775,7 +775,7 @@ vti6_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		break;
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 
 		if (dev == ip6n->fb_tnl_dev) {
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 487ef3b..87a6a20 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1669,7 +1669,7 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
 		return -ENOENT;
 
 	if (optname != MRT6_INIT) {
-		if (sk != mrt->mroute6_sk && !ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (sk != mrt->mroute6_sk && !ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EACCES;
 	}
 
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index a9895e1..d5dc2aa 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -365,8 +365,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		break;
 
 	case IPV6_TRANSPARENT:
-		if (valbool && !ns_capable(net->user_ns, CAP_NET_ADMIN) &&
-		    !ns_capable(net->user_ns, CAP_NET_RAW)) {
+		if (valbool && !ns_capable(net->ns.user_ns, CAP_NET_ADMIN) &&
+		    !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 			retv = -EPERM;
 			break;
 		}
@@ -404,7 +404,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 
 		/* hop-by-hop / destination options are privileged option */
 		retv = -EPERM;
-		if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
+		if (optname != IPV6_RTHDR && !ns_capable(net->ns.user_ns, CAP_NET_RAW))
 			break;
 
 		opt = rcu_dereference_protected(np->opt,
@@ -785,7 +785,7 @@ done:
 	case IPV6_IPSEC_POLICY:
 	case IPV6_XFRM_POLICY:
 		retv = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		retv = xfrm_user_policy(sk, optname, optval, optlen);
 		break;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 63e06c3..0f92561 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1573,7 +1573,7 @@ compat_do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1675,7 +1675,7 @@ compat_do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1697,7 +1697,7 @@ do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1721,7 +1721,7 @@ do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 2160d5d..4efbd91 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -645,7 +645,7 @@ static int __net_init ip6_frags_ns_sysctl_register(struct net *net)
 		table[2].data = &net->ipv6.frags.timeout;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			table[0].procname = NULL;
 	}
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 520b788..938a7aa 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2468,7 +2468,7 @@ int ipv6_route_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCADDRT:		/* Add a route */
 	case SIOCDELRT:		/* Delete a route */
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		err = copy_from_user(&rtmsg, arg,
 				     sizeof(struct in6_rtmsg));
@@ -3594,7 +3594,7 @@ struct ctl_table * __net_init ipv6_route_sysctl_init(struct net *net)
 		table[9].data = &net->ipv6.sysctl.ip6_rt_gc_min_interval;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			table[0].procname = NULL;
 	}
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0619ac7..196f476 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1181,7 +1181,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -1229,7 +1229,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == sitn->fb_tunnel_dev) {
@@ -1260,7 +1260,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCDELPRL:
 	case SIOCCHGPRL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 		err = -EINVAL;
 		if (dev == sitn->fb_tunnel_dev)
@@ -1287,7 +1287,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCCHG6RD:
 	case SIOCDEL6RD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
diff --git a/net/key/af_key.c b/net/key/af_key.c
index f9c9ecb..47183e9 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -141,7 +141,7 @@ static int pfkey_create(struct net *net, struct socket *sock, int protocol,
 	struct sock *sk;
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 8ae3ed9..41c3da3 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -160,7 +160,7 @@ static int llc_ui_create(struct net *net, struct socket *sock, int protocol,
 	struct sock *sk;
 	int rc = -ESOCKTNOSUPPORT;
 
-	if (!ns_capable(net->user_ns, CAP_NET_RAW))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		return -EPERM;
 
 	if (!net_eq(net, &init_net))
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index a748b0c..46745a7 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1901,7 +1901,7 @@ ip_set_sockfn_get(struct sock *sk, int optval, void __user *user, int *len)
 	struct net *net = sock_net(sk);
 	struct ip_set_net *inst = ip_set_pernet(net);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (optval != SO_IP_SET)
 		return -EBADF;
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index c3c809b..a02b3b3 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2360,7 +2360,7 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
 	BUILD_BUG_ON(sizeof(arg) > 255);
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_SET_MAX)
@@ -2678,7 +2678,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 
 	BUG_ON(!net);
 	BUILD_BUG_ON(sizeof(arg) > 255);
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_GET_MAX)
@@ -3906,7 +3906,7 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs)
 			return -ENOMEM;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			tbl[0].procname = NULL;
 	} else
 		tbl = vs_vars;
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index cccf4d6..23a3ec3 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -564,7 +564,7 @@ static int __net_init __ip_vs_lblc_init(struct net *net)
 			return -ENOMEM;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			ipvs->lblc_ctl_table[0].procname = NULL;
 
 	} else
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 796d70e..704ad5c 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -750,7 +750,7 @@ static int __net_init __ip_vs_lblcr_init(struct net *net)
 			return -ENOMEM;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			ipvs->lblcr_ctl_table[0].procname = NULL;
 	} else
 		ipvs->lblcr_ctl_table = vs_vars_table;
diff --git a/net/netfilter/nf_conntrack_acct.c b/net/netfilter/nf_conntrack_acct.c
index 45da11a..9303901 100644
--- a/net/netfilter/nf_conntrack_acct.c
+++ b/net/netfilter/nf_conntrack_acct.c
@@ -74,7 +74,7 @@ static int nf_conntrack_acct_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_acct;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.acct_sysctl_header = register_net_sysctl(net, "net/netfilter",
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index d28011b..22411e5 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -358,7 +358,7 @@ static int nf_conntrack_event_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_events;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.event_sysctl_header =
diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index 9e36931..c1e6242 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -618,8 +618,8 @@ static int exp_proc_init(struct net *net)
 	if (!proc)
 		return -ENOMEM;
 
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 	if (uid_valid(root_uid) && gid_valid(root_gid))
 		proc_set_user(proc, root_uid, root_gid);
 #endif /* CONFIG_NF_CONNTRACK_PROCFS */
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 196cb39..4cff85b 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -67,7 +67,7 @@ static int nf_conntrack_helper_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_auto_assign_helper;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.helper_sysctl_header =
diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c
index 399a38f..766dbee 100644
--- a/net/netfilter/nf_conntrack_proto_dccp.c
+++ b/net/netfilter/nf_conntrack_proto_dccp.c
@@ -841,7 +841,7 @@ static int dccp_kmemdup_sysctl_table(struct net *net, struct nf_proto_net *pn,
 	pn->ctl_table[7].data = &dn->dccp_loose;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		pn->ctl_table[0].procname = NULL;
 #endif
 	return 0;
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index c026c47..8796e36 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -397,8 +397,8 @@ static int nf_conntrack_standalone_init_proc(struct net *net)
 	if (!pde)
 		goto out_nf_conntrack;
 
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 	if (uid_valid(root_uid) && gid_valid(root_gid))
 		proc_set_user(pde, root_uid, root_gid);
 
@@ -512,7 +512,7 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
 	table[4].data = &net->ct.sysctl_log_invalid;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.sysctl_header = register_net_sysctl(net, "net/netfilter", table);
diff --git a/net/netfilter/nf_conntrack_timestamp.c b/net/netfilter/nf_conntrack_timestamp.c
index 7a394df..43bd240 100644
--- a/net/netfilter/nf_conntrack_timestamp.c
+++ b/net/netfilter/nf_conntrack_timestamp.c
@@ -52,7 +52,7 @@ static int nf_conntrack_tstamp_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_tstamp;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.tstamp_sysctl_header = register_net_sysctl(net,	"net/netfilter",
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 11f81c8..5428b8e 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -1072,8 +1072,8 @@ static int __net_init nfnl_log_net_init(struct net *net)
 	if (!proc)
 		return -ENOMEM;
 
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 	if (uid_valid(root_uid) && gid_valid(root_gid))
 		proc_set_user(proc, root_uid, root_gid);
 #endif
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 2675d58..d840aa6 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1493,8 +1493,8 @@ int xt_proto_init(struct net *net, u_int8_t af)
 
 
 #ifdef CONFIG_PROC_FS
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 
 	strlcpy(buf, xt_prefix[af], sizeof(buf));
 	strlcat(buf, FORMAT_TABLES, sizeof(buf));
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 627f898..070e24d 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -828,14 +828,14 @@ EXPORT_SYMBOL(netlink_capable);
  */
 bool netlink_net_capable(const struct sk_buff *skb, int cap)
 {
-	return netlink_ns_capable(skb, sock_net(skb->sk)->user_ns, cap);
+	return netlink_ns_capable(skb, sock_net(skb->sk)->ns.user_ns, cap);
 }
 EXPORT_SYMBOL(netlink_net_capable);
 
 static inline int netlink_allowed(const struct socket *sock, unsigned int flag)
 {
 	return (nl_table[sock->sk->sk_protocol].flags & flag) ||
-		ns_capable(sock_net(sock->sk)->user_ns, CAP_NET_ADMIN);
+		ns_capable(sock_net(sock->sk)->ns.user_ns, CAP_NET_ADMIN);
 }
 
 static void
@@ -1323,7 +1323,7 @@ static void do_one_broadcast(struct sock *sk,
 		if (!peernet_has_id(sock_net(sk), p->net))
 			return;
 
-		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
+		if (!file_ns_capable(sk->sk_socket->file, p->net->ns.user_ns,
 				     CAP_NET_BROADCAST))
 			return;
 	}
@@ -1586,7 +1586,7 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 		err = 0;
 		break;
 	case NETLINK_LISTEN_ALL_NSID:
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_BROADCAST))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_BROADCAST))
 			return -EPERM;
 
 		if (val)
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index a09132a..831e863 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -561,7 +561,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
 		return -EPERM;
 
 	if ((ops->flags & GENL_UNS_ADMIN_PERM) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) {
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 9f0983f..8172443 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3208,7 +3208,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
 	__be16 proto = (__force __be16)protocol; /* weird, but documented */
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_RAW))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		return -EPERM;
 	if (sock->type != SOCK_DGRAM && sock->type != SOCK_RAW &&
 	    sock->type != SOCK_PACKET)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a75864d..249a340 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -140,7 +140,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
 	int tp_created = 0;
 
 	if ((n->nlmsg_type != RTM_GETTFILTER) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 replay:
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ddf047d..783f495 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1123,7 +1123,7 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n)
 	int err;
 
 	if ((n->nlmsg_type != RTM_GETQDISC) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	err = nlmsg_parse(n, sizeof(*tcm), tca, TCA_MAX, NULL);
@@ -1190,7 +1190,7 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n)
 	struct Qdisc *q, *p;
 	int err;
 
-	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	if (!netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 replay:
@@ -1539,7 +1539,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n)
 	int err;
 
 	if ((n->nlmsg_type != RTM_GETTCLASS) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	err = nlmsg_parse(n, sizeof(*tcm), tca, TCA_MAX, NULL);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 67154b8..bb65b08 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -361,7 +361,7 @@ static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
 	}
 
 	if (snum && snum < PROT_SOCK &&
-	    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
+	    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
 	/* See if the address matches any of the addresses we may have
@@ -1153,7 +1153,7 @@ static int __sctp_connect(struct sock *sk,
 				 * be permitted to open new associations.
 				 */
 				if (ep->base.bind_addr.port < PROT_SOCK &&
-				    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) {
+				    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE)) {
 					err = -EACCES;
 					goto out_free;
 				}
@@ -1815,7 +1815,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
 			 * associations.
 			 */
 			if (ep->base.bind_addr.port < PROT_SOCK &&
-			    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) {
+			    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE)) {
 				err = -EACCES;
 				goto out_unlock;
 			}
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index ed98c1f..cb46bc9 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -42,11 +42,11 @@ static int net_ctl_permissions(struct ctl_table_header *head,
 			       struct ctl_table *table)
 {
 	struct net *net = container_of(head->set, struct net, sysctls);
-	kuid_t root_uid = make_kuid(net->user_ns, 0);
-	kgid_t root_gid = make_kgid(net->user_ns, 0);
+	kuid_t root_uid = make_kuid(net->ns.user_ns, 0);
+	kgid_t root_gid = make_kgid(net->ns.user_ns, 0);
 
 	/* Allow network administrator to have same access as root. */
-	if (ns_capable(net->user_ns, CAP_NET_ADMIN) ||
+	if (ns_capable(net->ns.user_ns, CAP_NET_ADMIN) ||
 	    uid_eq(root_uid, current_euid())) {
 		int mode = (table->mode >> 6) & 7;
 		return (mode << 6) | (mode << 3) | mode;
diff --git a/net/unix/sysctl_net_unix.c b/net/unix/sysctl_net_unix.c
index b3d5150..b5aec8a 100644
--- a/net/unix/sysctl_net_unix.c
+++ b/net/unix/sysctl_net_unix.c
@@ -35,7 +35,7 @@ int __net_init unix_sysctl_register(struct net *net)
 		goto err_alloc;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	table[0].data = &net->unx.sysctl_max_dgram_qlen;
diff --git a/net/xfrm/xfrm_sysctl.c b/net/xfrm/xfrm_sysctl.c
index 05a6e3d..8d4b41f 100644
--- a/net/xfrm/xfrm_sysctl.c
+++ b/net/xfrm/xfrm_sysctl.c
@@ -55,7 +55,7 @@ int __net_init xfrm_sysctl_init(struct net *net)
 	table[3].data = &net->xfrm.sysctl_acq_expires;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->xfrm.sysctl_hdr = register_net_sysctl(net, "net/core", table);
-- 
2.5.5


* [PATCH 2/5] kernel: add a helper to get an owning user namespace for a namespace
  2016-07-14 18:20 ` Andrey Vagin
@ 2016-07-14 18:20     ` Andrey Vagin
  -1 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-14 18:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: criu, linux-api, containers, linux-fsdevel, Andrey Vagin

Add ns_get_owner(), which returns the user namespace that owns a given
namespace. It returns -EPERM if the owning user namespace is outside of
the calling process's current user namespace.

Signed-off-by: Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
---
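Not part of the patch: a minimal userspace sketch of the ancestor walk
that ns_get_owner() performs, assuming a simplified stand-in type.
struct model_userns and model_owner_visible() are hypothetical names
used only for illustration. The kernel code operates on
struct user_namespace and current_cred()->user_ns, and it stops at
init_user_ns rather than at a NULL parent.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct user_namespace: only the parent link. */
struct model_userns {
	struct model_userns *parent;	/* NULL for the initial namespace */
};

/*
 * Model of the visibility check: walk from the owning namespace towards
 * the root and succeed only if the caller's namespace is the owner itself
 * or one of its ancestors.
 *
 * Returns 0 if the owner is visible to the caller, -EPERM otherwise.
 */
int model_owner_visible(const struct model_userns *owner,
			const struct model_userns *caller)
{
	const struct model_userns *p;

	assert(owner != NULL && caller != NULL);

	for (p = owner; p != NULL; p = p->parent) {
		if (p == caller)
			return 0;
	}

	/* Reached the root without meeting the caller's namespace. */
	return -EPERM;
}
```

With a chain init -> child -> grandchild, a caller in child is allowed
to see an owner that is grandchild or child, but gets -EPERM when the
owner is init or a sibling of child.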
 include/linux/user_namespace.h |  7 +++++++
 kernel/user_namespace.c        | 24 ++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index a941b44..e416b76 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -76,6 +76,8 @@ extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t,
 extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
 extern int proc_setgroups_show(struct seq_file *m, void *v);
 extern bool userns_may_setgroups(const struct user_namespace *ns);
+
+struct ns_common *ns_get_owner(struct ns_common *ns);
 #else
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -104,6 +106,11 @@ static inline bool userns_may_setgroups(const struct user_namespace *ns)
 {
 	return true;
 }
+
+static inline struct ns_common *ns_get_owner(struct ns_common *ns)
+{
+	return ERR_PTR(-ENOENT);
+}
 #endif
 
 #endif /* _LINUX_USER_H */
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index a5bc78c..6382e5e 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -994,6 +994,30 @@ static int userns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	return commit_creds(cred);
 }
 
+struct ns_common *ns_get_owner(struct ns_common *ns)
+{
+	const struct cred *cred = current_cred();
+	struct user_namespace *user_ns, *p;
+
+	user_ns = p = ns->user_ns;
+	if (user_ns == NULL) { /* ns is init_user_ns */
+		/* Unprivileged user should not know that it's init_user_ns. */
+		if (capable(CAP_SYS_ADMIN))
+			return ERR_PTR(-ENOENT);
+		return ERR_PTR(-EPERM);
+	}
+
+	for (;;) {
+		if (p == cred->user_ns)
+			break;
+		if (p == &init_user_ns)
+			return ERR_PTR(-EPERM);
+		p = p->parent;
+	}
+
+	return &get_user_ns(user_ns)->ns;
+}
+
 const struct proc_ns_operations userns_operations = {
 	.name		= "user",
 	.type		= CLONE_NEWUSER,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH 2/5] kernel: add a helper to get an owning user namespace for a namespace
@ 2016-07-14 18:20     ` Andrey Vagin
  0 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-14 18:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-api, containers, criu, linux-fsdevel, Andrey Vagin

Return -EPERM if the owning user namespace is outside of the current
process's user namespace.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 include/linux/user_namespace.h |  7 +++++++
 kernel/user_namespace.c        | 24 ++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index a941b44..e416b76 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -76,6 +76,8 @@ extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t,
 extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
 extern int proc_setgroups_show(struct seq_file *m, void *v);
 extern bool userns_may_setgroups(const struct user_namespace *ns);
+
+struct ns_common *ns_get_owner(struct ns_common *ns);
 #else
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -104,6 +106,11 @@ static inline bool userns_may_setgroups(const struct user_namespace *ns)
 {
 	return true;
 }
+
+static inline struct ns_common *ns_get_owner(struct ns_common *ns)
+{
+	return ERR_PTR(-ENOENT);
+}
 #endif
 
 #endif /* _LINUX_USER_H */
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index a5bc78c..6382e5e 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -994,6 +994,30 @@ static int userns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	return commit_creds(cred);
 }
 
+struct ns_common *ns_get_owner(struct ns_common *ns)
+{
+	const struct cred *cred = current_cred();
+	struct user_namespace *user_ns, *p;
+
+	user_ns = p = ns->user_ns;
+	if (user_ns == NULL) { /* ns is init_user_ns */
+		/* Unprivileged user should not know that it's init_user_ns. */
+		if (capable(CAP_SYS_ADMIN))
+			return ERR_PTR(-ENOENT);
+		return ERR_PTR(-EPERM);
+	}
+
+	for (;;) {
+		if (p == cred->user_ns)
+			break;
+		if (p == &init_user_ns)
+			return ERR_PTR(-EPERM);
+		p = p->parent;
+	}
+
+	return &get_user_ns(user_ns)->ns;
+}
+
 const struct proc_ns_operations userns_operations = {
 	.name		= "user",
 	.type		= CLONE_NEWUSER,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH 3/5] nsfs: add ioctl to get an owning user namespace for ns file descriptor
       [not found] ` <1468520419-28220-1-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
  2016-07-14 18:20   ` [PATCH 1/5] namespaces: move user_ns into ns_common Andrey Vagin
  2016-07-14 18:20     ` Andrey Vagin
@ 2016-07-14 18:20   ` Andrey Vagin
  2016-07-14 18:20     ` Andrey Vagin
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-14 18:20 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: criu-GEFAQzZX7r8dnm+yROfE0A, linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrey Vagin

Each namespace has an owning user namespace, but there is currently no
way to discover these relationships.

Understanding namespace relationships makes it possible to answer the
question: what capability does process X have to perform operations on
a resource governed by namespace Y?

After a long discussion, Eric W. Biederman proposed using ioctls for
this purpose.

The NS_GET_USERNS ioctl returns a file descriptor referring to the
owning user namespace. It fails with EPERM if the owning user namespace
is outside of the caller's current user namespace.

Link: https://lkml.org/lkml/2016/7/6/158
Signed-off-by: Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
---
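Note (not part of the commit message): from userspace, the new ioctl
could be used roughly as sketched below. The NSIO and NS_GET_USERNS
values are copied from the uapi header added by this patch;
open_owning_userns() is a hypothetical helper name, and the sketch
assumes a kernel with this series applied.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef NS_GET_USERNS
#define NSIO		0xb7
#define NS_GET_USERNS	_IO(NSIO, 0x1)
#endif

/*
 * Hypothetical helper: open the namespace file at ns_path (for example
 * /proc/<pid>/ns/uts) and ask for its owning user namespace.
 * Returns a new file descriptor on success or a negative errno value.
 */
static int open_owning_userns(const char *ns_path)
{
	int ns_fd, userns_fd, saved_errno;

	ns_fd = open(ns_path, O_RDONLY);
	if (ns_fd < 0)
		return -errno;

	userns_fd = ioctl(ns_fd, NS_GET_USERNS);
	saved_errno = errno;	/* close() below may overwrite errno */
	close(ns_fd);

	return userns_fd < 0 ? -saved_errno : userns_fd;
}
```

A caller would treat -EPERM as "the owner is not visible from here"
and, on kernels without this series, should expect -ENOTTY.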
 fs/nsfs.c                 | 94 ++++++++++++++++++++++++++++++++++++++++-------
 include/uapi/linux/nsfs.h |  9 +++++
 2 files changed, 90 insertions(+), 13 deletions(-)
 create mode 100644 include/uapi/linux/nsfs.h

diff --git a/fs/nsfs.c b/fs/nsfs.c
index 8f20d60..1e5d2d0 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -5,11 +5,16 @@
 #include <linux/magic.h>
 #include <linux/ktime.h>
 #include <linux/seq_file.h>
+#include <linux/user_namespace.h>
+#include <linux/nsfs.h>
 
 static struct vfsmount *nsfs_mnt;
 
+static long ns_ioctl(struct file *filp, unsigned int ioctl,
+			unsigned long arg);
 static const struct file_operations ns_file_operations = {
 	.llseek		= no_llseek,
+	.unlocked_ioctl = ns_ioctl,
 };
 
 static char *ns_dname(struct dentry *dentry, char *buffer, int buflen)
@@ -44,22 +49,14 @@ static void nsfs_evict(struct inode *inode)
 	ns->ops->put(ns);
 }
 
-void *ns_get_path(struct path *path, struct task_struct *task,
-			const struct proc_ns_operations *ns_ops)
+static void *__ns_get_path(struct path *path, struct ns_common *ns)
 {
 	struct vfsmount *mnt = mntget(nsfs_mnt);
 	struct qstr qname = { .name = "", };
 	struct dentry *dentry;
 	struct inode *inode;
-	struct ns_common *ns;
 	unsigned long d;
 
-again:
-	ns = ns_ops->get(task);
-	if (!ns) {
-		mntput(mnt);
-		return ERR_PTR(-ENOENT);
-	}
 	rcu_read_lock();
 	d = atomic_long_read(&ns->stashed);
 	if (!d)
@@ -68,7 +65,7 @@ again:
 	if (!lockref_get_not_dead(&dentry->d_lockref))
 		goto slow;
 	rcu_read_unlock();
-	ns_ops->put(ns);
+	ns->ops->put(ns);
 got_it:
 	path->mnt = mnt;
 	path->dentry = dentry;
@@ -77,7 +74,7 @@ slow:
 	rcu_read_unlock();
 	inode = new_inode_pseudo(mnt->mnt_sb);
 	if (!inode) {
-		ns_ops->put(ns);
+		ns->ops->put(ns);
 		mntput(mnt);
 		return ERR_PTR(-ENOMEM);
 	}
@@ -95,17 +92,88 @@ slow:
 		return ERR_PTR(-ENOMEM);
 	}
 	d_instantiate(dentry, inode);
-	dentry->d_fsdata = (void *)ns_ops;
+	dentry->d_fsdata = (void *)ns->ops;
 	d = atomic_long_cmpxchg(&ns->stashed, 0, (unsigned long)dentry);
 	if (d) {
 		d_delete(dentry);	/* make sure ->d_prune() does nothing */
 		dput(dentry);
 		cpu_relax();
-		goto again;
+		return ERR_PTR(-EAGAIN);
 	}
 	goto got_it;
 }
 
+void *ns_get_path(struct path *path, struct task_struct *task,
+			const struct proc_ns_operations *ns_ops)
+{
+	struct ns_common *ns;
+	void *ret;
+
+again:
+	ns = ns_ops->get(task);
+	if (!ns)
+		return ERR_PTR(-ENOENT);
+
+	ret = __ns_get_path(path, ns);
+	if (IS_ERR(ret) && PTR_ERR(ret) == -EAGAIN)
+		goto again;
+	return ret;
+}
+
+int open_related_ns(struct ns_common *ns,
+		   struct ns_common *(*get_ns)(struct ns_common *ns))
+{
+	struct path path = {};
+	struct file *f;
+	void *err;
+	int fd;
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0)
+		return fd;
+
+	while (1) {
+		struct ns_common *parent;
+
+		parent = get_ns(ns);
+		if (IS_ERR(parent)) {
+			put_unused_fd(fd);
+			return PTR_ERR(parent);
+		}
+
+		err = __ns_get_path(&path, parent);
+		if (IS_ERR(err) && PTR_ERR(err) == -EAGAIN)
+			continue;
+		break;
+	}
+	if (IS_ERR(err)) {
+		put_unused_fd(fd);
+		return PTR_ERR(err);
+	}
+
+	f = dentry_open(&path, O_RDONLY, current_cred());
+	path_put(&path);
+	if (IS_ERR(f)) {
+		put_unused_fd(fd);
+		fd = PTR_ERR(f);
+	} else
+		fd_install(fd, f);
+	return fd;
+}
+
+static long ns_ioctl(struct file *filp, unsigned int ioctl,
+			unsigned long arg)
+{
+	struct ns_common *ns = get_proc_ns(file_inode(filp));
+
+	switch (ioctl) {
+	case NS_GET_USERNS:
+		return open_related_ns(ns, ns_get_owner);
+	default:
+		return -ENOTTY;
+	}
+}
+
 int ns_get_name(char *buf, size_t size, struct task_struct *task,
 			const struct proc_ns_operations *ns_ops)
 {
diff --git a/include/uapi/linux/nsfs.h b/include/uapi/linux/nsfs.h
new file mode 100644
index 0000000..7a09ede
--- /dev/null
+++ b/include/uapi/linux/nsfs.h
@@ -0,0 +1,9 @@
+#ifndef __LINUX_NSFS_H
+#define __LINUX_NSFS_H
+
+#include <linux/ioctl.h>
+
+#define NSIO	0xb7
+#define NS_GET_USERNS	_IO(NSIO, 0x1)
+
+#endif /* __LINUX_NSFS_H */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH 3/5] nsfs: add ioctl to get an owning user namespace for ns file descriptor
  2016-07-14 18:20 ` Andrey Vagin
  (?)
  (?)
@ 2016-07-14 18:20 ` Andrey Vagin
  2016-07-14 18:48     ` W. Trevor King
       [not found]   ` <1468520419-28220-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
  -1 siblings, 2 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-14 18:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-api, containers, criu, linux-fsdevel, Andrey Vagin

Each namespace has an owning user namespace, but there is currently no
way to discover these relationships.

Understanding namespace relationships makes it possible to answer the
question: what capability does process X have to perform operations on
a resource governed by namespace Y?

After a long discussion, Eric W. Biederman proposed using ioctls for
this purpose.

The NS_GET_USERNS ioctl returns a file descriptor referring to the
owning user namespace. It fails with EPERM if the owning user namespace
is outside of the caller's current user namespace.

Link: https://lkml.org/lkml/2016/7/6/158
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/nsfs.c                 | 94 ++++++++++++++++++++++++++++++++++++++++-------
 include/uapi/linux/nsfs.h |  9 +++++
 2 files changed, 90 insertions(+), 13 deletions(-)
 create mode 100644 include/uapi/linux/nsfs.h

diff --git a/fs/nsfs.c b/fs/nsfs.c
index 8f20d60..1e5d2d0 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -5,11 +5,16 @@
 #include <linux/magic.h>
 #include <linux/ktime.h>
 #include <linux/seq_file.h>
+#include <linux/user_namespace.h>
+#include <linux/nsfs.h>
 
 static struct vfsmount *nsfs_mnt;
 
+static long ns_ioctl(struct file *filp, unsigned int ioctl,
+			unsigned long arg);
 static const struct file_operations ns_file_operations = {
 	.llseek		= no_llseek,
+	.unlocked_ioctl = ns_ioctl,
 };
 
 static char *ns_dname(struct dentry *dentry, char *buffer, int buflen)
@@ -44,22 +49,14 @@ static void nsfs_evict(struct inode *inode)
 	ns->ops->put(ns);
 }
 
-void *ns_get_path(struct path *path, struct task_struct *task,
-			const struct proc_ns_operations *ns_ops)
+static void *__ns_get_path(struct path *path, struct ns_common *ns)
 {
 	struct vfsmount *mnt = mntget(nsfs_mnt);
 	struct qstr qname = { .name = "", };
 	struct dentry *dentry;
 	struct inode *inode;
-	struct ns_common *ns;
 	unsigned long d;
 
-again:
-	ns = ns_ops->get(task);
-	if (!ns) {
-		mntput(mnt);
-		return ERR_PTR(-ENOENT);
-	}
 	rcu_read_lock();
 	d = atomic_long_read(&ns->stashed);
 	if (!d)
@@ -68,7 +65,7 @@ again:
 	if (!lockref_get_not_dead(&dentry->d_lockref))
 		goto slow;
 	rcu_read_unlock();
-	ns_ops->put(ns);
+	ns->ops->put(ns);
 got_it:
 	path->mnt = mnt;
 	path->dentry = dentry;
@@ -77,7 +74,7 @@ slow:
 	rcu_read_unlock();
 	inode = new_inode_pseudo(mnt->mnt_sb);
 	if (!inode) {
-		ns_ops->put(ns);
+		ns->ops->put(ns);
 		mntput(mnt);
 		return ERR_PTR(-ENOMEM);
 	}
@@ -95,17 +92,88 @@ slow:
 		return ERR_PTR(-ENOMEM);
 	}
 	d_instantiate(dentry, inode);
-	dentry->d_fsdata = (void *)ns_ops;
+	dentry->d_fsdata = (void *)ns->ops;
 	d = atomic_long_cmpxchg(&ns->stashed, 0, (unsigned long)dentry);
 	if (d) {
 		d_delete(dentry);	/* make sure ->d_prune() does nothing */
 		dput(dentry);
 		cpu_relax();
-		goto again;
+		return ERR_PTR(-EAGAIN);
 	}
 	goto got_it;
 }
 
+void *ns_get_path(struct path *path, struct task_struct *task,
+			const struct proc_ns_operations *ns_ops)
+{
+	struct ns_common *ns;
+	void *ret;
+
+again:
+	ns = ns_ops->get(task);
+	if (!ns)
+		return ERR_PTR(-ENOENT);
+
+	ret = __ns_get_path(path, ns);
+	if (IS_ERR(ret) && PTR_ERR(ret) == -EAGAIN)
+		goto again;
+	return ret;
+}
+
+int open_related_ns(struct ns_common *ns,
+		   struct ns_common *(*get_ns)(struct ns_common *ns))
+{
+	struct path path = {};
+	struct file *f;
+	void *err;
+	int fd;
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0)
+		return fd;
+
+	while (1) {
+		struct ns_common *parent;
+
+		parent = get_ns(ns);
+		if (IS_ERR(parent)) {
+			put_unused_fd(fd);
+			return PTR_ERR(parent);
+		}
+
+		err = __ns_get_path(&path, parent);
+		if (IS_ERR(err) && PTR_ERR(err) == -EAGAIN)
+			continue;
+		break;
+	}
+	if (IS_ERR(err)) {
+		put_unused_fd(fd);
+		return PTR_ERR(err);
+	}
+
+	f = dentry_open(&path, O_RDONLY, current_cred());
+	path_put(&path);
+	if (IS_ERR(f)) {
+		put_unused_fd(fd);
+		fd = PTR_ERR(f);
+	} else
+		fd_install(fd, f);
+	return fd;
+}
+
+static long ns_ioctl(struct file *filp, unsigned int ioctl,
+			unsigned long arg)
+{
+	struct ns_common *ns = get_proc_ns(file_inode(filp));
+
+	switch (ioctl) {
+	case NS_GET_USERNS:
+		return open_related_ns(ns, ns_get_owner);
+	default:
+		return -ENOTTY;
+	}
+}
+
 int ns_get_name(char *buf, size_t size, struct task_struct *task,
 			const struct proc_ns_operations *ns_ops)
 {
diff --git a/include/uapi/linux/nsfs.h b/include/uapi/linux/nsfs.h
new file mode 100644
index 0000000..7a09ede
--- /dev/null
+++ b/include/uapi/linux/nsfs.h
@@ -0,0 +1,9 @@
+#ifndef __LINUX_NSFS_H
+#define __LINUX_NSFS_H
+
+#include <linux/ioctl.h>
+
+#define NSIO	0xb7
+#define NS_GET_USERNS	_IO(NSIO, 0x1)
+
+#endif /* __LINUX_NSFS_H */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH 4/5] nsfs: add ioctl to get a parent namespace
  2016-07-14 18:20 ` Andrey Vagin
@ 2016-07-14 18:20     ` Andrey Vagin
  -1 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-14 18:20 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: criu-GEFAQzZX7r8dnm+yROfE0A, linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrey Vagin

PID and user namespaces are hierarchical, but there is no way to
discover their parent-child relationships.

In the future we will use this interface to dump and restore nested
namespaces.

Signed-off-by: Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
---
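Note (not part of the commit message): a tool that wants to dump a
namespace hierarchy could walk upwards with NS_GET_PARENT until the
ioctl fails, roughly as sketched below. count_visible_ancestors() is a
hypothetical helper name; the ioctl number is copied from this patch,
and the sketch assumes a kernel with this series applied.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef NS_GET_PARENT
#define NSIO		0xb7
#define NS_GET_PARENT	_IO(NSIO, 0x2)
#endif

/*
 * Hypothetical helper: count how many ancestors of the namespace at
 * ns_path can be opened with NS_GET_PARENT. Returns the count, or a
 * negative errno value if ns_path itself cannot be opened. The errno
 * that ended the walk is stored in *stop_errno.
 */
static int count_visible_ancestors(const char *ns_path, int *stop_errno)
{
	int fd, parent, depth = 0;

	fd = open(ns_path, O_RDONLY);
	if (fd < 0)
		return -errno;

	for (;;) {
		parent = ioctl(fd, NS_GET_PARENT);
		if (parent < 0) {
			*stop_errno = errno;
			break;
		}
		close(fd);
		fd = parent;	/* continue the walk from the parent */
		depth++;
	}
	close(fd);
	return depth;
}
```

Going by the code in this patch, the walk would be expected to stop
with EPERM when the next parent is outside the caller's namespace,
ENOENT when a privileged caller reaches the initial namespace, and
EINVAL for namespace types that are not hierarchical.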
 fs/nsfs.c                 |  4 ++++
 include/linux/proc_ns.h   |  1 +
 include/uapi/linux/nsfs.h |  1 +
 kernel/pid_namespace.c    | 26 ++++++++++++++++++++++++++
 kernel/user_namespace.c   |  1 +
 5 files changed, 33 insertions(+)

diff --git a/fs/nsfs.c b/fs/nsfs.c
index 1e5d2d0..b607a42 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -169,6 +169,10 @@ static long ns_ioctl(struct file *filp, unsigned int ioctl,
 	switch (ioctl) {
 	case NS_GET_USERNS:
 		return open_related_ns(ns, ns_get_owner);
+	case NS_GET_PARENT:
+		if (!ns->ops->get_parent)
+			return -EINVAL;
+		return open_related_ns(ns, ns->ops->get_parent);
 	default:
 		return -ENOTTY;
 	}
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index de0e771..1c9f720 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -18,6 +18,7 @@ struct proc_ns_operations {
 	struct ns_common *(*get)(struct task_struct *task);
 	void (*put)(struct ns_common *ns);
 	int (*install)(struct nsproxy *nsproxy, struct ns_common *ns);
+	struct ns_common *(*get_parent)(struct ns_common *ns);
 };
 
 extern const struct proc_ns_operations netns_operations;
diff --git a/include/uapi/linux/nsfs.h b/include/uapi/linux/nsfs.h
index 7a09ede..88098ea 100644
--- a/include/uapi/linux/nsfs.h
+++ b/include/uapi/linux/nsfs.h
@@ -5,5 +5,6 @@
 
 #define NSIO	0xb7
 #define NS_GET_USERNS	_IO(NSIO, 0x1)
+#define NS_GET_PARENT	_IO(NSIO, 0x2)
 
 #endif /* __LINUX_NSFS_H */
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 3529a03..a63adfb 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -388,12 +388,38 @@ static int pidns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	return 0;
 }
 
+static struct ns_common *pidns_get_parent(struct ns_common *ns)
+{
+	struct pid_namespace *active = task_active_pid_ns(current);
+	struct pid_namespace *pid_ns, *p;
+
+	pid_ns = to_pid_ns(ns);
+	if (pid_ns == &init_pid_ns) {
+		if (capable(CAP_SYS_ADMIN))
+			return ERR_PTR(-ENOENT);
+		return ERR_PTR(-EPERM);
+	}
+
+	pid_ns = p = pid_ns->parent;
+
+	for (;;) {
+		if (p == active)
+			break;
+		if (p == &init_pid_ns)
+			return ERR_PTR(-EPERM);
+		p = p->parent;
+	}
+
+	return &get_pid_ns(pid_ns)->ns;
+}
+
 const struct proc_ns_operations pidns_operations = {
 	.name		= "pid",
 	.type		= CLONE_NEWPID,
 	.get		= pidns_get,
 	.put		= pidns_put,
 	.install	= pidns_install,
+	.get_parent	= pidns_get_parent,
 };
 
 static __init int pid_namespaces_init(void)
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 6382e5e..d6ba0b8 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -1024,6 +1024,7 @@ const struct proc_ns_operations userns_operations = {
 	.get		= userns_get,
 	.put		= userns_put,
 	.install	= userns_install,
+	.get_parent	= ns_get_owner,
 };
 
 static __init int user_namespaces_init(void)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH 4/5] nsfs: add ioctl to get a parent namespace
@ 2016-07-14 18:20     ` Andrey Vagin
  0 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-14 18:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-api, containers, criu, linux-fsdevel, Andrey Vagin

PID and user namespaces are hierarchical, but there is no way to
discover their parent-child relationships.

In the future we will use this interface to dump and restore nested
namespaces.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/nsfs.c                 |  4 ++++
 include/linux/proc_ns.h   |  1 +
 include/uapi/linux/nsfs.h |  1 +
 kernel/pid_namespace.c    | 26 ++++++++++++++++++++++++++
 kernel/user_namespace.c   |  1 +
 5 files changed, 33 insertions(+)

diff --git a/fs/nsfs.c b/fs/nsfs.c
index 1e5d2d0..b607a42 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -169,6 +169,10 @@ static long ns_ioctl(struct file *filp, unsigned int ioctl,
 	switch (ioctl) {
 	case NS_GET_USERNS:
 		return open_related_ns(ns, ns_get_owner);
+	case NS_GET_PARENT:
+		if (!ns->ops->get_parent)
+			return -EINVAL;
+		return open_related_ns(ns, ns->ops->get_parent);
 	default:
 		return -ENOTTY;
 	}
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index de0e771..1c9f720 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -18,6 +18,7 @@ struct proc_ns_operations {
 	struct ns_common *(*get)(struct task_struct *task);
 	void (*put)(struct ns_common *ns);
 	int (*install)(struct nsproxy *nsproxy, struct ns_common *ns);
+	struct ns_common *(*get_parent)(struct ns_common *ns);
 };
 
 extern const struct proc_ns_operations netns_operations;
diff --git a/include/uapi/linux/nsfs.h b/include/uapi/linux/nsfs.h
index 7a09ede..88098ea 100644
--- a/include/uapi/linux/nsfs.h
+++ b/include/uapi/linux/nsfs.h
@@ -5,5 +5,6 @@
 
 #define NSIO	0xb7
 #define NS_GET_USERNS	_IO(NSIO, 0x1)
+#define NS_GET_PARENT	_IO(NSIO, 0x2)
 
 #endif /* __LINUX_NSFS_H */
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 3529a03..a63adfb 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -388,12 +388,38 @@ static int pidns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	return 0;
 }
 
+static struct ns_common *pidns_get_parent(struct ns_common *ns)
+{
+	struct pid_namespace *active = task_active_pid_ns(current);
+	struct pid_namespace *pid_ns, *p;
+
+	pid_ns = to_pid_ns(ns);
+	if (pid_ns == &init_pid_ns) {
+		if (capable(CAP_SYS_ADMIN))
+			return ERR_PTR(-ENOENT);
+		return ERR_PTR(-EPERM);
+	}
+
+	pid_ns = p = pid_ns->parent;
+
+	for (;;) {
+		if (p == active)
+			break;
+		if (p == &init_pid_ns)
+			return ERR_PTR(-EPERM);
+		p = p->parent;
+	}
+
+	return &get_pid_ns(pid_ns)->ns;
+}
+
 const struct proc_ns_operations pidns_operations = {
 	.name		= "pid",
 	.type		= CLONE_NEWPID,
 	.get		= pidns_get,
 	.put		= pidns_put,
 	.install	= pidns_install,
+	.get_parent	= pidns_get_parent,
 };
 
 static __init int pid_namespaces_init(void)
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 6382e5e..d6ba0b8 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -1024,6 +1024,7 @@ const struct proc_ns_operations userns_operations = {
 	.get		= userns_get,
 	.put		= userns_put,
 	.install	= userns_install,
+	.get_parent	= ns_get_owner,
 };
 
 static __init int user_namespaces_init(void)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH 5/5] tools/testing: add a test to check nsfs ioctl-s
  2016-07-14 18:20 ` Andrey Vagin
@ 2016-07-14 18:20     ` Andrey Vagin
  -1 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-14 18:20 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: criu-GEFAQzZX7r8dnm+yROfE0A, linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrey Vagin

There are two new ioctls:
One ioctl for the user namespace that owns a file descriptor.
One ioctl for the parent namespace of a namespace file descriptor.

The test checks that these ioctls work and that they handle the case
where a target namespace is outside of the current process's namespace.

Signed-off-by: Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
---
 tools/testing/selftests/Makefile      |  1 +
 tools/testing/selftests/nsfs/Makefile | 12 +++++
 tools/testing/selftests/nsfs/owner.c  | 91 +++++++++++++++++++++++++++++++++++
 tools/testing/selftests/nsfs/pidns.c  | 74 ++++++++++++++++++++++++++++
 4 files changed, 178 insertions(+)
 create mode 100644 tools/testing/selftests/nsfs/Makefile
 create mode 100644 tools/testing/selftests/nsfs/owner.c
 create mode 100644 tools/testing/selftests/nsfs/pidns.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index ff9e5f2..f770dba 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -15,6 +15,7 @@ TARGETS += memory-hotplug
 TARGETS += mount
 TARGETS += mqueue
 TARGETS += net
+TARGETS += nsfs
 TARGETS += powerpc
 TARGETS += pstore
 TARGETS += ptrace
diff --git a/tools/testing/selftests/nsfs/Makefile b/tools/testing/selftests/nsfs/Makefile
new file mode 100644
index 0000000..2306054
--- /dev/null
+++ b/tools/testing/selftests/nsfs/Makefile
@@ -0,0 +1,12 @@
+TEST_PROGS := owner pidns
+
+CFLAGS := -Wall -Werror
+
+all: owner pidns
+owner: owner.c
+pidns: pidns.c
+
+clean:
+	$(RM) owner pidns
+
+include ../lib.mk
diff --git a/tools/testing/selftests/nsfs/owner.c b/tools/testing/selftests/nsfs/owner.c
new file mode 100644
index 0000000..c97aa50
--- /dev/null
+++ b/tools/testing/selftests/nsfs/owner.c
@@ -0,0 +1,91 @@
+#define _GNU_SOURCE
+#include <sched.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+
+#define NSIO    0xb7
+#define NS_GET_USERNS   _IO(NSIO, 0x1)
+
+#define pr_err(fmt, ...) \
+		({ \
+			fprintf(stderr, "%s:%d:" fmt ": %m\n", \
+				__func__, __LINE__, ##__VA_ARGS__); \
+			1; \
+		})
+
+int main(int argc, char *argv[])
+{
+	int pfd[2], ns, uns, init_uns;
+	struct stat st1, st2;
+	char path[128];
+	pid_t pid;
+	char c;
+
+	if (pipe(pfd))
+		return 1;
+
+	pid = fork();
+	if (pid < 0)
+		return pr_err("fork");
+	if (pid == 0) {
+		prctl(PR_SET_PDEATHSIG, SIGKILL);
+		if (unshare(CLONE_NEWUTS | CLONE_NEWUSER))
+			return pr_err("unshare");
+		close(pfd[0]);
+		close(pfd[1]);
+		while (1)
+			sleep(1);
+		return 0;
+	}
+	close(pfd[1]);
+	if (read(pfd[0], &c, 1) != 0)
+		return pr_err("Unable to read from pipe");
+	close(pfd[0]);
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/uts", pid);
+	ns = open(path, O_RDONLY);
+	if (ns < 0)
+		return pr_err("Unable to open %s", path);
+
+	uns = ioctl(ns, NS_GET_USERNS);
+	if (uns < 0)
+		return pr_err("Unable to get an owning user namespace");
+
+	if (fstat(uns, &st1))
+		return pr_err("fstat");
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/user", pid);
+	if (stat(path, &st2))
+		return pr_err("stat");
+
+	if (st1.st_ino != st2.st_ino)
+		return pr_err("NS_GET_USERNS returned a wrong namespace");
+
+	init_uns = ioctl(uns, NS_GET_USERNS);
+	if (init_uns < 0)
+		return pr_err("Unable to get an owning user namespace");
+
+	if (ioctl(init_uns, NS_GET_USERNS) >= 0 || errno != ENOENT)
+		return pr_err("Don't get ENOENT");
+
+	if (unshare(CLONE_NEWUSER))
+		return pr_err("unshare");
+
+	if (ioctl(ns, NS_GET_USERNS) >= 0 || errno != EPERM)
+		return pr_err("Don't get EPERM");
+	if (ioctl(init_uns, NS_GET_USERNS) >= 0 || errno != EPERM)
+		return pr_err("Don't get EPERM");
+
+	kill(pid, SIGKILL);
+	wait(NULL);
+	return 0;
+}
diff --git a/tools/testing/selftests/nsfs/pidns.c b/tools/testing/selftests/nsfs/pidns.c
new file mode 100644
index 0000000..99b1131
--- /dev/null
+++ b/tools/testing/selftests/nsfs/pidns.c
@@ -0,0 +1,74 @@
+#define _GNU_SOURCE
+#include <sched.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+
+#define pr_err(fmt, ...) \
+		({ \
+			fprintf(stderr, "%s:%d:" fmt ": %m\n", \
+				__func__, __LINE__, ##__VA_ARGS__); \
+			1; \
+		})
+
+#define NSIO	0xb7
+#define NS_GET_USERNS   _IO(NSIO, 0x1)
+#define NS_GET_PARENT   _IO(NSIO, 0x2)
+
+#define __stack_aligned__	__attribute__((aligned(16)))
+struct cr_clone_arg {
+	char stack[128] __stack_aligned__;
+	char stack_ptr[0];
+};
+
+static int child(void *args)
+{
+	prctl(PR_SET_PDEATHSIG, SIGKILL);
+	while (1)
+		sleep(1);
+	exit(0);
+}
+
+int main(int argc, char *argv[])
+{
+	char path[] = "/proc/0123456789/ns/pid";
+	struct cr_clone_arg ca;
+	struct stat st1, st2;
+	int ns, pns;
+	pid_t pid;
+
+	pid = clone(child, ca.stack_ptr, CLONE_NEWPID | SIGCLD, NULL);
+	if (pid < 0)
+		return pr_err("clone");
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/pid", pid);
+	ns = open(path, O_RDONLY);
+	if (ns < 0)
+		return pr_err("Unable to open %s", path);
+
+	pns = ioctl(ns, NS_GET_PARENT);
+	if (pns < 0)
+		return pr_err("Unable to get a parent pidns");
+
+	if (stat("/proc/self/ns/pid", &st2))
+		return pr_err("Unable to stat /proc/self/ns/pid");
+	if (fstat(pns, &st1))
+		return pr_err("Unable to stat the parent pidns");
+	if (st1.st_ino != st2.st_ino)
+		return pr_err("NS_GET_PARENT returned a wrong namespace");
+
+	if (ioctl(pns, NS_GET_PARENT) >= 0 || errno != ENOENT)
+		return pr_err("Don't get ENOENT");
+
+	kill(pid, SIGKILL);
+	wait(NULL);
+	return 0;
+}
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH 5/5] tools/testing: add a test to check nsfs ioctl-s
@ 2016-07-14 18:20     ` Andrey Vagin
  0 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-14 18:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-api, containers, criu, linux-fsdevel, Andrey Vagin

There are two new ioctls:
One ioctl for the user namespace that owns a file descriptor.
One ioctl for the parent namespace of a namespace file descriptor.

The test checks that these ioctls work and that they handle the case
where a target namespace is outside of the current process's namespace.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 tools/testing/selftests/Makefile      |  1 +
 tools/testing/selftests/nsfs/Makefile | 12 +++++
 tools/testing/selftests/nsfs/owner.c  | 91 +++++++++++++++++++++++++++++++++++
 tools/testing/selftests/nsfs/pidns.c  | 74 ++++++++++++++++++++++++++++
 4 files changed, 178 insertions(+)
 create mode 100644 tools/testing/selftests/nsfs/Makefile
 create mode 100644 tools/testing/selftests/nsfs/owner.c
 create mode 100644 tools/testing/selftests/nsfs/pidns.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index ff9e5f2..f770dba 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -15,6 +15,7 @@ TARGETS += memory-hotplug
 TARGETS += mount
 TARGETS += mqueue
 TARGETS += net
+TARGETS += nsfs
 TARGETS += powerpc
 TARGETS += pstore
 TARGETS += ptrace
diff --git a/tools/testing/selftests/nsfs/Makefile b/tools/testing/selftests/nsfs/Makefile
new file mode 100644
index 0000000..2306054
--- /dev/null
+++ b/tools/testing/selftests/nsfs/Makefile
@@ -0,0 +1,12 @@
+TEST_PROGS := owner pidns
+
+CFLAGS := -Wall -Werror
+
+all: owner pidns
+owner: owner.c
+pidns: pidns.c
+
+clean:
+	$(RM) owner pidns
+
+include ../lib.mk
diff --git a/tools/testing/selftests/nsfs/owner.c b/tools/testing/selftests/nsfs/owner.c
new file mode 100644
index 0000000..c97aa50
--- /dev/null
+++ b/tools/testing/selftests/nsfs/owner.c
@@ -0,0 +1,91 @@
+#define _GNU_SOURCE
+#include <sched.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+
+#define NSIO    0xb7
+#define NS_GET_USERNS   _IO(NSIO, 0x1)
+
+#define pr_err(fmt, ...) \
+		({ \
+			fprintf(stderr, "%s:%d:" fmt ": %m\n", \
+				__func__, __LINE__, ##__VA_ARGS__); \
+			1; \
+		})
+
+int main(int argc, char *argv[])
+{
+	int pfd[2], ns, uns, init_uns;
+	struct stat st1, st2;
+	char path[128];
+	pid_t pid;
+	char c;
+
+	if (pipe(pfd))
+		return 1;
+
+	pid = fork();
+	if (pid < 0)
+		return pr_err("fork");
+	if (pid == 0) {
+		prctl(PR_SET_PDEATHSIG, SIGKILL);
+		if (unshare(CLONE_NEWUTS | CLONE_NEWUSER))
+			return pr_err("unshare");
+		close(pfd[0]);
+		close(pfd[1]);
+		while (1)
+			sleep(1);
+		return 0;
+	}
+	close(pfd[1]);
+	if (read(pfd[0], &c, 1) != 0)
+		return pr_err("Unable to read from pipe");
+	close(pfd[0]);
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/uts", pid);
+	ns = open(path, O_RDONLY);
+	if (ns < 0)
+		return pr_err("Unable to open %s", path);
+
+	uns = ioctl(ns, NS_GET_USERNS);
+	if (uns < 0)
+		return pr_err("Unable to get an owning user namespace");
+
+	if (fstat(uns, &st1))
+		return pr_err("fstat");
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/user", pid);
+	if (stat(path, &st2))
+		return pr_err("stat");
+
+	if (st1.st_ino != st2.st_ino)
+		return pr_err("NS_GET_USERNS returned a wrong namespace");
+
+	init_uns = ioctl(uns, NS_GET_USERNS);
+	if (init_uns < 0)
+		return pr_err("Unable to get an owning user namespace");
+
+	if (ioctl(init_uns, NS_GET_USERNS) >= 0 || errno != ENOENT)
+		return pr_err("Did not get ENOENT");
+
+	if (unshare(CLONE_NEWUSER))
+		return pr_err("unshare");
+
+	if (ioctl(ns, NS_GET_USERNS) >= 0 || errno != EPERM)
+		return pr_err("Did not get EPERM");
+	if (ioctl(init_uns, NS_GET_USERNS) >= 0 || errno != EPERM)
+		return pr_err("Did not get EPERM");
+
+	kill(pid, SIGKILL);
+	wait(NULL);
+	return 0;
+}
diff --git a/tools/testing/selftests/nsfs/pidns.c b/tools/testing/selftests/nsfs/pidns.c
new file mode 100644
index 0000000..99b1131
--- /dev/null
+++ b/tools/testing/selftests/nsfs/pidns.c
@@ -0,0 +1,74 @@
+#define _GNU_SOURCE
+#include <sched.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+
+#define pr_err(fmt, ...) \
+		({ \
+			fprintf(stderr, "%s:%d:" fmt ": %m\n", \
+				__func__, __LINE__, ##__VA_ARGS__); \
+			1; \
+		})
+
+#define NSIO	0xb7
+#define NS_GET_USERNS   _IO(NSIO, 0x1)
+#define NS_GET_PARENT   _IO(NSIO, 0x2)
+
+#define __stack_aligned__	__attribute__((aligned(16)))
+struct cr_clone_arg {
+	char stack[128] __stack_aligned__;
+	char stack_ptr[0];
+};
+
+static int child(void *args)
+{
+	prctl(PR_SET_PDEATHSIG, SIGKILL);
+	while (1)
+		sleep(1);
+	exit(0);
+}
+
+int main(int argc, char *argv[])
+{
+	char path[] = "/proc/0123456789/ns/pid";
+	struct cr_clone_arg ca;
+	struct stat st1, st2;
+	int ns, pns;
+	pid_t pid;
+
+	pid = clone(child, ca.stack_ptr, CLONE_NEWPID | SIGCLD, NULL);
+	if (pid < 0)
+		return pr_err("clone");
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/pid", pid);
+	ns = open(path, O_RDONLY);
+	if (ns < 0)
+		return pr_err("Unable to open %s", path);
+
+	pns = ioctl(ns, NS_GET_PARENT);
+	if (pns < 0)
+		return pr_err("Unable to get a parent pidns");
+
+	if (stat("/proc/self/ns/pid", &st2))
+		return pr_err("Unable to stat /proc/self/ns/pid");
+	if (fstat(pns, &st1))
+		return pr_err("Unable to stat the parent pidns");
+	if (st1.st_ino != st2.st_ino)
+		return pr_err("NS_GET_PARENT returned a wrong namespace");
+
+	if (ioctl(pns, NS_GET_PARENT) >= 0 || errno != ENOENT)
+		return pr_err("Did not get ENOENT");
+
+	kill(pid, SIGKILL);
+	wait(NULL);
+	return 0;
+}
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* Re: [PATCH 3/5] nsfs: add ioctl to get an owning user namespace for ns file descriptor
       [not found]   ` <1468520419-28220-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2016-07-14 18:48     ` W. Trevor King
  0 siblings, 0 replies; 142+ messages in thread
From: W. Trevor King @ 2016-07-14 18:48 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: criu, linux-api, containers, linux-fsdevel, linux-kernel


[-- Attachment #1.1: Type: text/plain, Size: 830 bytes --]


On Thu, Jul 14, 2016 at 11:20:17AM -0700, Andrey Vagin wrote:
> +int open_related_ns(struct ns_common *ns,
> +		   struct ns_common *(*get_ns)(struct ns_common *ns))
> +{
> +	struct path path = {};
> +	struct file *f;
> +	void *err;
> +	int fd;
> +
> +	fd = get_unused_fd_flags(O_CLOEXEC);
> +	if (fd < 0)
> +		return fd;
> +
> +	while (1) {
> +		struct ns_common *parent;

I think you want to rename this variable to ‘relative’ or some other
more-generic term [1] to echo ‘related’ in the function name.

Cheers,
Trevor

[1]: https://github.com/avagin/linux-task-diag/commit/7fad8ff3fc4110bebf0920cec2388390b3bd2238#commitcomment-18223391

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 2/5] kernel: add a helper to get an owning user namespace for a namespace
  2016-07-14 18:20     ` Andrey Vagin
@ 2016-07-14 19:07         ` W. Trevor King
  -1 siblings, 0 replies; 142+ messages in thread
From: W. Trevor King @ 2016-07-14 19:07 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: criu, linux-api, containers, linux-fsdevel, linux-kernel


[-- Attachment #1.1: Type: text/plain, Size: 2203 bytes --]

On Thu, Jul 14, 2016 at 11:20:16AM -0700, Andrey Vagin wrote:
> +struct ns_common *ns_get_owner(struct ns_common *ns)
> +{
> +	const struct cred *cred = current_cred();
> +	struct user_namespace *user_ns, *p;
> +
> +	user_ns = p = ns->user_ns;
> +	if (user_ns == NULL) { /* ns is init_user_ns */
> +		/* Unprivileged user should not know that it's init_user_ns. */
> +		if (capable(CAP_SYS_ADMIN))
> +			return ERR_PTR(-ENOENT);
> +		return ERR_PTR(-EPERM);
> +	}
> +
> +	for (;;) {
> +		if (p == cred->user_ns)
> +			break;
> +		if (p == &init_user_ns)
> +			return ERR_PTR(-EPERM);
> +		p = p->parent;
> +	}
> +
> +	return &get_user_ns(user_ns)->ns;
> +}

I'm still not sure we need the CAP_SYS_ADMIN check [1].  Maybe “you
have an open file descriptor for the namespace” means you've already
been authorized to access the parent information (e.g. via POSIX
permissions on /proc/<pid>/ns/… or the bind-mounted namespace).
Whether you can get the parent information probably depends on whether
you can use setns to join the parent namespace (I haven't looked up
the backing code for that).

But whichever way we go there, I think we do want to be consistent
between init_user_ns and other namespaces.  So we should have a
CAP_SYS_ADMIN check for init_user_ns if and only if we also have a
CAP_SYS_ADMIN check for the returned parent in the non-init_user_ns
case as well:

  user_ns = p = ns->user_ns;
  if (user_ns == NULL) { /* ns is init_user_ns */
    /* Unprivileged user should not know that it's init_user_ns. */
    if (capable(CAP_SYS_ADMIN))
      return ERR_PTR(-ENOENT);
    return ERR_PTR(-EPERM);
  } else if (! capable_in(user_ns, CAP_SYS_ADMIN)) {
    /* Unprivileged user should not know about the owning user ns. */
    return ERR_PTR(-ENOENT);
  }

Although I'm not sure what the real name for capable_in is, or even if
it exists.
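
To make the two checks easier to compare, here is a small user-space
model of the lookup quoted above. It is only a sketch: struct
user_ns_model, owner_lookup() and the caller_is_admin flag are
hypothetical stand-ins for struct user_namespace, ns_get_owner() and
capable(CAP_SYS_ADMIN), and none of it is kernel code. For the
per-namespace check, ns_capable(), which patch 1/5 already calls as
ns_capable(net->ns.user_ns, CAP_NET_ADMIN), may be the helper meant by
capable_in.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct user_namespace: only the parent link. */
struct user_ns_model {
	struct user_ns_model *parent;	/* NULL for the initial namespace */
};

/*
 * Model of the quoted ns_get_owner() logic.
 *   owner           - user namespace owning the queried namespace
 *                     (NULL when the queried namespace is the initial user ns)
 *   caller_ns       - user namespace of the calling task
 *   caller_is_admin - stands in for capable(CAP_SYS_ADMIN)
 * Returns 0 when the owner may be revealed, or a negative errno.
 */
int owner_lookup(const struct user_ns_model *owner,
		 const struct user_ns_model *caller_ns,
		 int caller_is_admin)
{
	const struct user_ns_model *p;

	/* Only this branch consults the capability flag. */
	if (owner == NULL)
		return caller_is_admin ? -ENOENT : -EPERM;

	/* The owner is visible if it is caller_ns or one of its descendants. */
	for (p = owner; p != NULL; p = p->parent)
		if (p == caller_ns)
			return 0;

	return -EPERM;
}
```

With an initial namespace and one child, a caller in the initial
namespace can see the child as an owner, while a caller in the child gets
-EPERM for anything owned by the initial namespace, whatever its
capabilities. The capability flag only changes the result in the
NULL-owner branch.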

Cheers,
Trevor

[1]: https://github.com/avagin/linux-task-diag/commit/2663bc803d324785e328261f3c07a0fef37d2088#commitcomment-18223327

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
       [not found] ` <1468520419-28220-1-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
                     ` (4 preceding siblings ...)
  2016-07-14 18:20     ` Andrey Vagin
@ 2016-07-14 22:02   ` Andrey Vagin
  2016-07-21 14:41   ` Michael Kerrisk (man-pages)
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-14 22:02 UTC (permalink / raw)
  To: LKML
  Cc: James Bottomley, Andrey Vagin, Serge Hallyn, Linux API,
	Linux Containers, Alexander Viro, criu, Eric W. Biederman,
	linux-fsdevel, Michael Kerrisk (man-pages)

Hello,

I forgot to add --cc-cover for git send-email, so everyone who is in
Cc got only the cover letter. All of the messages were sent to the
mailing lists.

Sorry for the inconvenience.

On Thu, Jul 14, 2016 at 11:20 AM, Andrey Vagin <avagin@openvz.org> wrote:
> Each namespace has an owning user namespace, and currently there is no
> way to discover these relationships.
>
> PID and user namespaces are hierarchical. There is no way to discover
> parent-child relationships either.
>
> Why might we want to know the relationships between namespaces?
>
> One use would be visualization, in order to understand the running system.
> Another would be to answer the question: what capability does process X have to
> perform operations on a resource governed by namespace Y?
>
> One more use case (which is usually called abnormal) is checkpoint/restart.
> In CRIU we are going to dump and restore nested namespaces.
>
> There [1] was a discussion about which interface to choose to determine
> relationships between namespaces.
>
> Eric suggested adding two ioctls [2]:
>> Grumble, Grumble.  I think this may actually a case for creating ioctls
>> for these two cases.  Now that random nsfs file descriptors are bind
>> mountable the original reason for using proc files is not as pressing.
>>
>> One ioctl for the user namespace that owns a file descriptor.
>> One ioctl for the parent namespace of a namespace file descriptor.
>
> Here is an implementation of these ioctls.
>
> [1] https://lkml.org/lkml/2016/7/6/158
> [2] https://lkml.org/lkml/2016/7/9/101
>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
> Cc: "W. Trevor King" <wking@tremily.us>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Serge Hallyn <serge.hallyn@canonical.com>
>
> --
> 2.5.5
>
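
For readers who only received the cover letter, the following sketch
shows how user space might call the two proposed ioctls. It assumes the
request numbers used by the selftests in this series (NSIO 0xb7 with
commands 0x1 and 0x2). related_ns_inode() is a hypothetical helper name,
and since the series is an RFC the interface may still change.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Values copied from the selftests in this series; not assumed final. */
#define NSIO		0xb7
#define NS_GET_USERNS	_IO(NSIO, 0x1)
#define NS_GET_PARENT	_IO(NSIO, 0x2)

/*
 * Open the namespace file at ns_path, ask the kernel for a related
 * namespace with the given request, and store that namespace's inode
 * number in *ino.  Returns 0 on success or a negative errno.
 */
int related_ns_inode(const char *ns_path, unsigned long request, ino_t *ino)
{
	struct stat st;
	int ns_fd, rel_fd, ret = 0;

	ns_fd = open(ns_path, O_RDONLY);
	if (ns_fd < 0)
		return -errno;

	/* On success the ioctl returns a new file descriptor. */
	rel_fd = ioctl(ns_fd, request);
	if (rel_fd < 0) {
		ret = -errno;	/* save errno before close() can change it */
		close(ns_fd);
		return ret;
	}

	if (fstat(rel_fd, &st) < 0)
		ret = -errno;
	else
		*ino = st.st_ino;

	close(rel_fd);
	close(ns_fd);
	return ret;
}
```

Comparing the returned inode number with st_ino of /proc/<pid>/ns/user,
as the owner selftest does, tells whether two file descriptors refer to
the same namespace. On a kernel without these patches the call is
expected to fail with a negative errno rather than return an inode.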

^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH 1/5] namespaces: move user_ns into ns_common
  2016-07-14 22:02   ` Andrey Vagin
@ 2016-07-15  2:12       ` Andrey Vagin
  -1 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-15  2:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: James Bottomley, Andrey Vagin, Serge Hallyn, linux-api,
	containers, Alexander Viro, criu, Eric W. Biederman,
	linux-fsdevel, Michael Kerrisk (man-pages)

Every namespace has a pointer to the user namespace where it was created,
but these pointers are all privately embedded in the individual
namespace-specific structures.

We are now going to add a user-space interface to get the owning user
namespace, so it looks reasonable to move the pointer into ns_common.

Originally this idea was suggested by James Bottomley.
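
A reduced user-space model may help show why the move is useful. The
*_model types and ns_owner() below are hypothetical stand-ins, not the
kernel definitions. The only detail taken from this patch is that
user_ns becomes a member of struct ns_common.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced models of the kernel structures. */
struct user_namespace_model {
	int level;
};

struct ns_common_model {
	struct user_namespace_model *user_ns;	/* owning user namespace */
	unsigned int inum;
};

struct uts_namespace_model {
	struct ns_common_model ns;	/* the owner pointer now lives in here */
	char nodename[65];
};

struct pid_namespace_model {
	unsigned int level;
	struct ns_common_model ns;	/* position inside the struct is free */
};

/* One generic accessor works for every namespace type. */
struct user_namespace_model *ns_owner(const struct ns_common_model *ns)
{
	return ns->user_ns;
}
```

Code that only holds a pointer to the common part can reach the owner
through ns_owner(&uts.ns) or ns_owner(&pid.ns) without knowing the
concrete namespace type, which is what a generic ioctl handler would
need.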

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 drivers/net/bonding/bond_main.c         |  2 +-
 drivers/net/tun.c                       |  4 ++--
 fs/mount.h                              |  1 -
 fs/namespace.c                          | 14 +++++++-------
 fs/pnode.c                              |  4 ++--
 fs/proc/root.c                          |  2 +-
 include/linux/cgroup.h                  |  1 -
 include/linux/ipc_namespace.h           |  3 ---
 include/linux/ns_common.h               |  1 +
 include/linux/pid_namespace.h           |  1 -
 include/linux/user_namespace.h          |  8 ++++++--
 include/linux/utsname.h                 |  1 -
 include/net/net_namespace.h             |  1 -
 init/version.c                          |  2 +-
 ipc/mqueue.c                            |  2 +-
 ipc/msgutil.c                           |  2 +-
 ipc/namespace.c                         |  6 +++---
 ipc/shm.c                               |  2 +-
 ipc/util.c                              |  4 ++--
 kernel/cgroup.c                         | 12 ++++++------
 kernel/pid.c                            |  2 +-
 kernel/pid_namespace.c                  |  8 ++++----
 kernel/reboot.c                         |  2 +-
 kernel/sys.c                            |  4 ++--
 kernel/user_namespace.c                 |  4 ++++
 kernel/utsname.c                        |  6 +++---
 net/8021q/vlan.c                        | 12 ++++++------
 net/bridge/br_ioctl.c                   | 22 +++++++++++-----------
 net/bridge/br_sysfs_br.c                |  4 ++--
 net/bridge/br_sysfs_if.c                |  2 +-
 net/bridge/netfilter/ebtables.c         |  8 ++++----
 net/core/dev_ioctl.c                    |  4 ++--
 net/core/ethtool.c                      |  2 +-
 net/core/neighbour.c                    |  2 +-
 net/core/net-sysfs.c                    |  6 +++---
 net/core/net_namespace.c                |  6 +++---
 net/core/rtnetlink.c                    |  6 +++---
 net/core/scm.c                          |  2 +-
 net/core/sock.c                         | 10 +++++-----
 net/core/sock_diag.c                    |  2 +-
 net/core/sysctl_net_core.c              |  2 +-
 net/ieee802154/6lowpan/reassembly.c     |  2 +-
 net/ieee802154/socket.c                 |  8 ++++----
 net/ipv4/af_inet.c                      |  4 ++--
 net/ipv4/arp.c                          |  2 +-
 net/ipv4/devinet.c                      |  4 ++--
 net/ipv4/fib_frontend.c                 |  2 +-
 net/ipv4/ip_options.c                   |  6 +++---
 net/ipv4/ip_sockglue.c                  |  6 +++---
 net/ipv4/ip_tunnel.c                    |  4 ++--
 net/ipv4/ipmr.c                         |  2 +-
 net/ipv4/netfilter/arp_tables.c         |  8 ++++----
 net/ipv4/netfilter/ip_tables.c          |  8 ++++----
 net/ipv4/route.c                        |  2 +-
 net/ipv4/tcp.c                          |  2 +-
 net/ipv4/tcp_cong.c                     |  2 +-
 net/ipv6/addrconf.c                     |  4 ++--
 net/ipv6/af_inet6.c                     |  4 ++--
 net/ipv6/anycast.c                      |  2 +-
 net/ipv6/datagram.c                     |  6 +++---
 net/ipv6/ip6_flowlabel.c                |  2 +-
 net/ipv6/ip6_gre.c                      |  4 ++--
 net/ipv6/ip6_tunnel.c                   |  4 ++--
 net/ipv6/ip6_vti.c                      |  4 ++--
 net/ipv6/ip6mr.c                        |  2 +-
 net/ipv6/ipv6_sockglue.c                |  8 ++++----
 net/ipv6/netfilter/ip6_tables.c         |  8 ++++----
 net/ipv6/reassembly.c                   |  2 +-
 net/ipv6/route.c                        |  4 ++--
 net/ipv6/sit.c                          |  8 ++++----
 net/key/af_key.c                        |  2 +-
 net/llc/af_llc.c                        |  2 +-
 net/netfilter/ipset/ip_set_core.c       |  2 +-
 net/netfilter/ipvs/ip_vs_ctl.c          |  6 +++---
 net/netfilter/ipvs/ip_vs_lblc.c         |  2 +-
 net/netfilter/ipvs/ip_vs_lblcr.c        |  2 +-
 net/netfilter/nf_conntrack_acct.c       |  2 +-
 net/netfilter/nf_conntrack_ecache.c     |  2 +-
 net/netfilter/nf_conntrack_expect.c     |  4 ++--
 net/netfilter/nf_conntrack_helper.c     |  2 +-
 net/netfilter/nf_conntrack_proto_dccp.c |  2 +-
 net/netfilter/nf_conntrack_standalone.c |  6 +++---
 net/netfilter/nf_conntrack_timestamp.c  |  2 +-
 net/netfilter/nfnetlink_log.c           |  4 ++--
 net/netfilter/x_tables.c                |  4 ++--
 net/netlink/af_netlink.c                |  8 ++++----
 net/netlink/genetlink.c                 |  2 +-
 net/packet/af_packet.c                  |  2 +-
 net/sched/cls_api.c                     |  2 +-
 net/sched/sch_api.c                     |  6 +++---
 net/sctp/socket.c                       |  6 +++---
 net/sysctl_net.c                        |  6 +++---
 net/unix/sysctl_net_unix.c              |  2 +-
 net/xfrm/xfrm_sysctl.c                  |  2 +-
 94 files changed, 197 insertions(+), 196 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a2afa3b..5ebe22a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3425,7 +3425,7 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd
 
 	net = dev_net(bond_dev);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	slave_dev = __dev_get_by_name(net, ifr->ifr_slave);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e16487c..2730608 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -487,7 +487,7 @@ static inline bool tun_not_capable(struct tun_struct *tun)
 
 	return ((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
 		  (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
-		!ns_capable(net->user_ns, CAP_NET_ADMIN);
+		!ns_capable(net->ns.user_ns, CAP_NET_ADMIN);
 }
 
 static void tun_set_real_num_queues(struct tun_struct *tun)
@@ -1737,7 +1737,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		int queues = ifr->ifr_flags & IFF_MULTI_QUEUE ?
 			     MAX_TAP_QUEUES : 1;
 
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		err = security_tun_dev_create();
 		if (err < 0)
diff --git a/fs/mount.h b/fs/mount.h
index 14db05d..532dd92 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -9,7 +9,6 @@ struct mnt_namespace {
 	struct ns_common	ns;
 	struct mount *	root;
 	struct list_head	list;
-	struct user_namespace	*user_ns;
 	u64			seq;	/* Sequence number to prevent loops */
 	wait_queue_head_t poll;
 	u64 event;
diff --git a/fs/namespace.c b/fs/namespace.c
index 419f746..22b0dbc 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1582,7 +1582,7 @@ out_unlock:
  */
 static inline bool may_mount(void)
 {
-	return ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN);
+	return ns_capable(current->nsproxy->mnt_ns->ns.user_ns, CAP_SYS_ADMIN);
 }
 
 static inline bool may_mandlock(void)
@@ -2187,7 +2187,7 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	if ((mnt->mnt.mnt_flags & MNT_LOCK_NODEV) &&
 	    !(mnt_flags & MNT_NODEV)) {
 		/* Was the nodev implicitly added in mount? */
-		if ((mnt->mnt_ns->user_ns != &init_user_ns) &&
+		if ((mnt->mnt_ns->ns.user_ns != &init_user_ns) &&
 		    !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
 			mnt_flags |= MNT_NODEV;
 		} else {
@@ -2386,7 +2386,7 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 			int mnt_flags, const char *name, void *data)
 {
 	struct file_system_type *type;
-	struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns;
+	struct user_namespace *user_ns = current->nsproxy->mnt_ns->ns.user_ns;
 	struct vfsmount *mnt;
 	int err;
 
@@ -2744,7 +2744,7 @@ dput_out:
 static void free_mnt_ns(struct mnt_namespace *ns)
 {
 	ns_free_inum(&ns->ns);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	kfree(ns);
 }
 
@@ -2777,7 +2777,7 @@ static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns)
 	INIT_LIST_HEAD(&new_ns->list);
 	init_waitqueue_head(&new_ns->poll);
 	new_ns->event = 0;
-	new_ns->user_ns = get_user_ns(user_ns);
+	new_ns->ns.user_ns = get_user_ns(user_ns);
 	return new_ns;
 }
 
@@ -2807,7 +2807,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 	namespace_lock();
 	/* First pass: copy the tree topology */
 	copy_flags = CL_COPY_UNBINDABLE | CL_EXPIRE;
-	if (user_ns != ns->user_ns)
+	if (user_ns != ns->ns.user_ns)
 		copy_flags |= CL_SHARED_TO_SLAVE | CL_UNPRIVILEGED;
 	new = copy_tree(old, old->mnt.mnt_root, copy_flags);
 	if (IS_ERR(new)) {
@@ -3326,7 +3326,7 @@ static int mntns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	struct mnt_namespace *mnt_ns = to_mnt_ns(ns);
 	struct path root;
 
-	if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(mnt_ns->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_CHROOT) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
diff --git a/fs/pnode.c b/fs/pnode.c
index 9989970..e051f11 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -244,7 +244,7 @@ static int propagate_one(struct mount *m)
 	}
 		
 	/* Notice when we are propagating across user namespaces */
-	if (m->mnt_ns->user_ns != user_ns)
+	if (m->mnt_ns->ns.user_ns != user_ns)
 		type |= CL_UNPRIVILEGED;
 	child = copy_tree(last_source, last_source->mnt.mnt_root, type);
 	if (IS_ERR(child))
@@ -286,7 +286,7 @@ int propagate_mnt(struct mount *dest_mnt, struct mountpoint *dest_mp,
 	 * propagate_one(); everything is serialized by namespace_sem,
 	 * so globals will do just fine.
 	 */
-	user_ns = current->nsproxy->mnt_ns->user_ns;
+	user_ns = current->nsproxy->mnt_ns->ns.user_ns;
 	last_dest = dest_mnt;
 	first_source = source_mnt;
 	last_source = source_mnt;
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 0670278..aae5104 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -113,7 +113,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 		options = data;
 
 		/* Does the mounter have privilege over the pid namespace? */
-		if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+		if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN))
 			return ERR_PTR(-EPERM);
 	}
 
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index a20320c..f531cc5 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -619,7 +619,6 @@ static inline void cgroup_sk_free(struct sock_cgroup_data *skcd) {}
 struct cgroup_namespace {
 	atomic_t		count;
 	struct ns_common	ns;
-	struct user_namespace	*user_ns;
 	struct css_set          *root_cset;
 };
 
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 1eee6bc..0f9d806 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -56,9 +56,6 @@ struct ipc_namespace {
 	unsigned int    mq_msg_default;
 	unsigned int    mq_msgsize_default;
 
-	/* user_ns which owns the ipc ns */
-	struct user_namespace *user_ns;
-
 	struct ns_common ns;
 };
 
diff --git a/include/linux/ns_common.h b/include/linux/ns_common.h
index 85a5c8c..af2f30d 100644
--- a/include/linux/ns_common.h
+++ b/include/linux/ns_common.h
@@ -4,6 +4,7 @@
 struct proc_ns_operations;
 
 struct ns_common {
+	struct user_namespace *user_ns; /* Owning user namespace */
 	atomic_long_t stashed;
 	const struct proc_ns_operations *ops;
 	unsigned int inum;
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 918b117..b1802c6 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -39,7 +39,6 @@ struct pid_namespace {
 #ifdef CONFIG_BSD_PROCESS_ACCT
 	struct fs_pin *bacct;
 #endif
-	struct user_namespace *user_ns;
 	struct work_struct proc_work;
 	kgid_t pid_gid;
 	int hide_pid;
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 8297e5b..a941b44 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -27,11 +27,15 @@ struct user_namespace {
 	struct uid_gid_map	gid_map;
 	struct uid_gid_map	projid_map;
 	atomic_t		count;
-	struct user_namespace	*parent;
 	int			level;
 	kuid_t			owner;
 	kgid_t			group;
-	struct ns_common	ns;
+
+	/* ->ns.user_ns and ->parent are synonyms */
+	union {
+		struct user_namespace	*parent;
+		struct ns_common	ns;
+	};
 	unsigned long		flags;
 
 	/* Register of per-UID persistent keyrings for this namespace */
diff --git a/include/linux/utsname.h b/include/linux/utsname.h
index 5093f58..78c9ef8 100644
--- a/include/linux/utsname.h
+++ b/include/linux/utsname.h
@@ -23,7 +23,6 @@ extern struct user_namespace init_user_ns;
 struct uts_namespace {
 	struct kref kref;
 	struct new_utsname name;
-	struct user_namespace *user_ns;
 	struct ns_common ns;
 };
 extern struct uts_namespace init_uts_ns;
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..acb714e 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -59,7 +59,6 @@ struct net {
 	struct list_head	cleanup_list;	/* namespaces on death row */
 	struct list_head	exit_list;	/* Use only net_mutex */
 
-	struct user_namespace   *user_ns;	/* Owning user namespace */
 	spinlock_t		nsid_lock;
 	struct idr		netns_ids;
 
diff --git a/init/version.c b/init/version.c
index fe41a63..51ac701 100644
--- a/init/version.c
+++ b/init/version.c
@@ -34,7 +34,7 @@ struct uts_namespace init_uts_ns = {
 		.machine	= UTS_MACHINE,
 		.domainname	= UTS_DOMAINNAME,
 	},
-	.user_ns = &init_user_ns,
+	.ns.user_ns = &init_user_ns,
 	.ns.inum = PROC_UTS_INIT_INO,
 #ifdef CONFIG_UTS_NS
 	.ns.ops = &utsns_operations,
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index ade739f..378cec6 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -331,7 +331,7 @@ static struct dentry *mqueue_mount(struct file_system_type *fs_type,
 		/* Don't allow mounting unless the caller has CAP_SYS_ADMIN
 		 * over the ipc namespace.
 		 */
-		if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+		if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN))
 			return ERR_PTR(-EPERM);
 
 		data = ns;
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index ed81aaf..b2e570c 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -30,7 +30,7 @@ DEFINE_SPINLOCK(mq_lock);
  */
 struct ipc_namespace init_ipc_ns = {
 	.count		= ATOMIC_INIT(1),
-	.user_ns = &init_user_ns,
+	.ns.user_ns = &init_user_ns,
 	.ns.inum = PROC_IPC_INIT_INO,
 #ifdef CONFIG_IPC_NS
 	.ns.ops = &ipcns_operations,
diff --git a/ipc/namespace.c b/ipc/namespace.c
index 068caf1..d9f663b8 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -46,7 +46,7 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	msg_init_ns(ns);
 	shm_init_ns(ns);
 
-	ns->user_ns = get_user_ns(user_ns);
+	ns->ns.user_ns = get_user_ns(user_ns);
 
 	return ns;
 }
@@ -97,7 +97,7 @@ static void free_ipc_ns(struct ipc_namespace *ns)
 	shm_exit_ns(ns);
 	atomic_dec(&nr_ipc_ns);
 
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	ns_free_inum(&ns->ns);
 	kfree(ns);
 }
@@ -155,7 +155,7 @@ static void ipcns_put(struct ns_common *ns)
 static int ipcns_install(struct nsproxy *nsproxy, struct ns_common *new)
 {
 	struct ipc_namespace *ns = to_ipc_ns(new);
-	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/ipc/shm.c b/ipc/shm.c
index 1328251..20546f1 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1024,7 +1024,7 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
 			goto out_unlock0;
 		}
 
-		if (!ns_capable(ns->user_ns, CAP_IPC_LOCK)) {
+		if (!ns_capable(ns->ns.user_ns, CAP_IPC_LOCK)) {
 			kuid_t euid = current_euid();
 			if (!uid_eq(euid, shp->shm_perm.uid) &&
 			    !uid_eq(euid, shp->shm_perm.cuid)) {
diff --git a/ipc/util.c b/ipc/util.c
index 798cad1..2a1a700 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -491,7 +491,7 @@ int ipcperms(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp, short flag)
 		granted_mode >>= 3;
 	/* is there some bit set in requested_mode but not in granted_mode? */
 	if ((requested_mode & ~granted_mode & 0007) &&
-	    !ns_capable(ns->user_ns, CAP_IPC_OWNER))
+	    !ns_capable(ns->ns.user_ns, CAP_IPC_OWNER))
 		return -1;
 
 	return security_ipc_permission(ipcp, flag);
@@ -700,7 +700,7 @@ struct kern_ipc_perm *ipcctl_pre_down_nolock(struct ipc_namespace *ns,
 
 	euid = current_euid();
 	if (uid_eq(euid, ipcp->cuid) || uid_eq(euid, ipcp->uid)  ||
-	    ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+	    ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN))
 		return ipcp; /* successful lookup */
 err:
 	return ERR_PTR(err);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 75c0ff0..3635600 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -221,7 +221,7 @@ static u16 have_free_callback __read_mostly;
 /* cgroup namespace for init task */
 struct cgroup_namespace init_cgroup_ns = {
 	.count		= { .counter = 2, },
-	.user_ns	= &init_user_ns,
+	.ns.user_ns	= &init_user_ns,
 	.ns.ops		= &cgroupns_operations,
 	.ns.inum	= PROC_CGROUP_INIT_INO,
 	.root_cset	= &init_css_set,
@@ -2094,7 +2094,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 	get_cgroup_ns(ns);
 
 	/* Check if the caller has permission to mount. */
-	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN)) {
+	if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN)) {
 		put_cgroup_ns(ns);
 		return ERR_PTR(-EPERM);
 	}
@@ -5609,7 +5609,7 @@ int __init cgroup_init(void)
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_dfl_base_files));
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_legacy_base_files));
 
-	get_user_ns(init_cgroup_ns.user_ns);
+	get_user_ns(init_cgroup_ns.ns.user_ns);
 
 	mutex_lock(&cgroup_mutex);
 
@@ -6285,7 +6285,7 @@ static struct cgroup_namespace *alloc_cgroup_ns(void)
 void free_cgroup_ns(struct cgroup_namespace *ns)
 {
 	put_css_set(ns->root_cset);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	ns_free_inum(&ns->ns);
 	kfree(ns);
 }
@@ -6324,7 +6324,7 @@ struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
 		return new_ns;
 	}
 
-	new_ns->user_ns = get_user_ns(user_ns);
+	new_ns->ns.user_ns = get_user_ns(user_ns);
 	new_ns->root_cset = cset;
 
 	return new_ns;
@@ -6340,7 +6340,7 @@ static int cgroupns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	struct cgroup_namespace *cgroup_ns = to_cg_ns(ns);
 
 	if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN) ||
-	    !ns_capable(cgroup_ns->user_ns, CAP_SYS_ADMIN))
+	    !ns_capable(cgroup_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	/* Don't need to do anything if we are attaching to our own cgroupns. */
diff --git a/kernel/pid.c b/kernel/pid.c
index f66162f..c63f992d 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -78,7 +78,7 @@ struct pid_namespace init_pid_ns = {
 	.nr_hashed = PIDNS_HASH_ADDING,
 	.level = 0,
 	.child_reaper = &init_task,
-	.user_ns = &init_user_ns,
+	.ns.user_ns = &init_user_ns,
 	.ns.inum = PROC_PID_INIT_INO,
 #ifdef CONFIG_PID_NS
 	.ns.ops = &pidns_operations,
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index a65ba13..3529a03 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -113,7 +113,7 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
 	kref_init(&ns->kref);
 	ns->level = level;
 	ns->parent = get_pid_ns(parent_pid_ns);
-	ns->user_ns = get_user_ns(user_ns);
+	ns->ns.user_ns = get_user_ns(user_ns);
 	ns->nr_hashed = PIDNS_HASH_ADDING;
 	INIT_WORK(&ns->proc_work, proc_cleanup_work);
 
@@ -146,7 +146,7 @@ static void destroy_pid_namespace(struct pid_namespace *ns)
 	ns_free_inum(&ns->ns);
 	for (i = 0; i < PIDMAP_ENTRIES; i++)
 		kfree(ns->pidmap[i].page);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	call_rcu(&ns->rcu, delayed_free_pidns);
 }
 
@@ -276,7 +276,7 @@ static int pid_ns_ctl_handler(struct ctl_table *table, int write,
 	struct pid_namespace *pid_ns = task_active_pid_ns(current);
 	struct ctl_table tmp = *table;
 
-	if (write && !ns_capable(pid_ns->user_ns, CAP_SYS_ADMIN))
+	if (write && !ns_capable(pid_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	/*
@@ -362,7 +362,7 @@ static int pidns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	struct pid_namespace *active = task_active_pid_ns(current);
 	struct pid_namespace *ancestor, *new = to_pid_ns(ns);
 
-	if (!ns_capable(new->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(new->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/kernel/reboot.c b/kernel/reboot.c
index bd30a97..38f81a6 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -285,7 +285,7 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
 	int ret = 0;
 
 	/* We only trust the superuser with rebooting the system. */
-	if (!ns_capable(pid_ns->user_ns, CAP_SYS_BOOT))
+	if (!ns_capable(pid_ns->ns.user_ns, CAP_SYS_BOOT))
 		return -EPERM;
 
 	/* For safety, we require "magic" arguments. */
diff --git a/kernel/sys.c b/kernel/sys.c
index 89d5be4..9db5647 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1217,7 +1217,7 @@ SYSCALL_DEFINE2(sethostname, char __user *, name, int, len)
 	int errno;
 	char tmp[__NEW_UTS_LEN];
 
-	if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
+	if (!ns_capable(current->nsproxy->uts_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	if (len < 0 || len > __NEW_UTS_LEN)
@@ -1268,7 +1268,7 @@ SYSCALL_DEFINE2(setdomainname, char __user *, name, int, len)
 	int errno;
 	char tmp[__NEW_UTS_LEN];
 
-	if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
+	if (!ns_capable(current->nsproxy->uts_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 	if (len < 0 || len > __NEW_UTS_LEN)
 		return -EINVAL;
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 9bafc21..a5bc78c 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -96,6 +96,10 @@ int create_user_ns(struct cred *new)
 	ns->ns.ops = &userns_operations;
 
 	atomic_set(&ns->count, 1);
+
+	/* ->ns.user_ns and ->parent are synonyms. */
+	BUILD_BUG_ON(&ns->ns.user_ns != &ns->parent);
+
 	/* Leave the new->user_ns reference with the new user namespace. */
 	ns->parent = parent_ns;
 	ns->level = parent_ns->level + 1;
diff --git a/kernel/utsname.c b/kernel/utsname.c
index 831ea71..40a119a 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -52,7 +52,7 @@ static struct uts_namespace *clone_uts_ns(struct user_namespace *user_ns,
 
 	down_read(&uts_sem);
 	memcpy(&ns->name, &old_ns->name, sizeof(ns->name));
-	ns->user_ns = get_user_ns(user_ns);
+	ns->ns.user_ns = get_user_ns(user_ns);
 	up_read(&uts_sem);
 	return ns;
 }
@@ -85,7 +85,7 @@ void free_uts_ns(struct kref *kref)
 	struct uts_namespace *ns;
 
 	ns = container_of(kref, struct uts_namespace, kref);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	ns_free_inum(&ns->ns);
 	kfree(ns);
 }
@@ -120,7 +120,7 @@ static int utsns_install(struct nsproxy *nsproxy, struct ns_common *new)
 {
 	struct uts_namespace *ns = to_uts_ns(new);
 
-	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 82a116b..6c46a80 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -541,7 +541,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 	switch (args.cmd) {
 	case SET_VLAN_INGRESS_PRIORITY_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		vlan_dev_set_ingress_priority(dev,
 					      args.u.skb_priority,
@@ -551,7 +551,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_EGRESS_PRIORITY_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = vlan_dev_set_egress_priority(dev,
 						   args.u.skb_priority,
@@ -560,7 +560,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_FLAG_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = vlan_dev_change_flags(dev,
 					    args.vlan_qos ? args.u.flag : 0,
@@ -569,7 +569,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_NAME_TYPE_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		if ((args.u.name_type >= 0) &&
 		    (args.u.name_type < VLAN_NAME_TYPE_HIGHEST)) {
@@ -585,14 +585,14 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case ADD_VLAN_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = register_vlan_device(dev, args.u.VID);
 		break;
 
 	case DEL_VLAN_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		unregister_vlan_dev(dev, NULL);
 		err = 0;
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index d99b200..2fdea4f 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -90,7 +90,7 @@ static int add_del_if(struct net_bridge *br, int ifindex, int isadd)
 	struct net_device *dev;
 	int ret;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	dev = __dev_get_by_index(net, ifindex);
@@ -182,28 +182,28 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	}
 
 	case BRCTL_SET_BRIDGE_FORWARD_DELAY:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_forward_delay(br, args[1]);
 		break;
 
 	case BRCTL_SET_BRIDGE_HELLO_TIME:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_hello_time(br, args[1]);
 		break;
 
 	case BRCTL_SET_BRIDGE_MAX_AGE:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_max_age(br, args[1]);
 		break;
 
 	case BRCTL_SET_AGEING_TIME:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_ageing_time(br, args[1]);
@@ -243,7 +243,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	}
 
 	case BRCTL_SET_BRIDGE_STP_STATE:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		br_stp_set_enabled(br, args[1]);
@@ -251,7 +251,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 		break;
 
 	case BRCTL_SET_BRIDGE_PRIORITY:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		br_stp_set_bridge_priority(br, args[1]);
@@ -260,7 +260,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 
 	case BRCTL_SET_PORT_PRIORITY:
 	{
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		spin_lock_bh(&br->lock);
@@ -274,7 +274,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 
 	case BRCTL_SET_PATH_COST:
 	{
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		spin_lock_bh(&br->lock);
@@ -337,7 +337,7 @@ static int old_deviceless(struct net *net, void __user *uarg)
 	{
 		char buf[IFNAMSIZ];
 
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(buf, (void __user *)args[1], IFNAMSIZ))
@@ -367,7 +367,7 @@ int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, void __user *uar
 	{
 		char buf[IFNAMSIZ];
 
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(buf, uarg, IFNAMSIZ))
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index beb4707..06d417e 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -36,7 +36,7 @@ static ssize_t store_bridge_parm(struct device *d,
 	unsigned long val;
 	int err;
 
-	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	val = simple_strtoul(buf, &endp, 0);
@@ -285,7 +285,7 @@ static ssize_t group_addr_store(struct device *d,
 	u8 new_addr[6];
 	int i;
 
-	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (sscanf(buf, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 1e04d4d..e7ceab1 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -241,7 +241,7 @@ static ssize_t brport_store(struct kobject *kobj,
 	char *endp;
 	unsigned long val;
 
-	if (!ns_capable(dev_net(p->dev)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(p->dev)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	val = simple_strtoul(buf, &endp, 0);
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 5a61f35..dab0cc2 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1496,7 +1496,7 @@ static int do_ebt_set_ctl(struct sock *sk,
 	int ret;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1519,7 +1519,7 @@ static int do_ebt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 	struct ebt_table *t;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&tmp, user, sizeof(tmp)))
@@ -2303,7 +2303,7 @@ static int compat_do_ebt_set_ctl(struct sock *sk,
 	int ret;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2327,7 +2327,7 @@ static int compat_do_ebt_get_ctl(struct sock *sk, int cmd,
 	struct ebt_table *t;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* try real handler in case userland supplied needed padding */
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index b94b1d2..a705922 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -474,7 +474,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCGMIIPHY:
 	case SIOCGMIIREG:
 	case SIOCSIFNAME:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		dev_load(net, ifr.ifr_name);
 		rtnl_lock();
@@ -522,7 +522,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCBRADDIF:
 	case SIOCBRDELIF:
 	case SIOCSHWTSTAMP:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		/* fall through */
 	case SIOCBONDSLAVEINFOQUERY:
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f403481..27a3085 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2480,7 +2480,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_GTUNABLE:
 		break;
 	default:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 	}
 
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 510cd62..8df69fd 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -3169,7 +3169,7 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
 	}
 
 	/* Don't export sysctls to unprivileged users */
-	if (neigh_parms_net(p)->user_ns != &init_user_ns)
+	if (neigh_parms_net(p)->ns.user_ns != &init_user_ns)
 		t->neigh_vars[0].procname = NULL;
 
 	switch (neigh_parms_family(p)) {
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 7a0b616..eb20bc7 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -85,7 +85,7 @@ static ssize_t netdev_store(struct device *dev, struct device_attribute *attr,
 	unsigned long new;
 	int ret = -EINVAL;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	ret = kstrtoul(buf, 0, &new);
@@ -362,7 +362,7 @@ static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr,
 	size_t count = len;
 	ssize_t ret;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* ignore trailing newline */
@@ -1390,7 +1390,7 @@ static bool net_current_may_mount(void)
 {
 	struct net *net = current->nsproxy->net_ns;
 
-	return ns_capable(net->user_ns, CAP_SYS_ADMIN);
+	return ns_capable(net->ns.user_ns, CAP_SYS_ADMIN);
 }
 
 static void *net_grab_current_ns(void)
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2c2eb1b..3433f0c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -279,7 +279,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	atomic_set(&net->count, 1);
 	atomic_set(&net->passive, 1);
 	net->dev_base_seq = 1;
-	net->user_ns = user_ns;
+	net->ns.user_ns = user_ns;
 	idr_init(&net->netns_ids);
 	spin_lock_init(&net->nsid_lock);
 
@@ -444,7 +444,7 @@ static void cleanup_net(struct work_struct *work)
 	/* Finally it is safe to free my network namespace structure */
 	list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {
 		list_del_init(&net->exit_list);
-		put_user_ns(net->user_ns);
+		put_user_ns(net->ns.user_ns);
 		net_drop_ns(net);
 	}
 }
@@ -987,7 +987,7 @@ static int netns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 {
 	struct net *net = to_net_ns(ns);
 
-	if (!ns_capable(net->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(net->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d69c464..ea7ba06 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1785,7 +1785,7 @@ static int do_setlink(const struct sk_buff *skb,
 			err = PTR_ERR(net);
 			goto errout;
 		}
-		if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) {
+		if (!netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN)) {
 			put_net(net);
 			err = -EPERM;
 			goto errout;
@@ -2430,7 +2430,7 @@ replay:
 			return PTR_ERR(dest_net);
 
 		err = -EPERM;
-		if (!netlink_ns_capable(skb, dest_net->user_ns, CAP_NET_ADMIN))
+		if (!netlink_ns_capable(skb, dest_net->ns.user_ns, CAP_NET_ADMIN))
 			goto out;
 
 		if (tb[IFLA_LINK_NETNSID]) {
@@ -2442,7 +2442,7 @@ replay:
 				goto out;
 			}
 			err = -EPERM;
-			if (!netlink_ns_capable(skb, link_net->user_ns, CAP_NET_ADMIN))
+			if (!netlink_ns_capable(skb, link_net->ns.user_ns, CAP_NET_ADMIN))
 				goto out;
 		}
 
diff --git a/net/core/scm.c b/net/core/scm.c
index 2696aef..1a2301a 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -54,7 +54,7 @@ static __inline__ int scm_check_creds(struct ucred *creds)
 		return -EINVAL;
 
 	if ((creds->pid == task_tgid_vnr(current) ||
-	     ns_capable(task_active_pid_ns(current)->user_ns, CAP_SYS_ADMIN)) &&
+	     ns_capable(task_active_pid_ns(current)->ns.user_ns, CAP_SYS_ADMIN)) &&
 	    ((uid_eq(uid, cred->uid)   || uid_eq(uid, cred->euid) ||
 	      uid_eq(uid, cred->suid)) || ns_capable(cred->user_ns, CAP_SETUID)) &&
 	    ((gid_eq(gid, cred->gid)   || gid_eq(gid, cred->egid) ||
diff --git a/net/core/sock.c b/net/core/sock.c
index 08bf97e..321ca3c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -191,7 +191,7 @@ EXPORT_SYMBOL(sk_capable);
  */
 bool sk_net_capable(const struct sock *sk, int cap)
 {
-	return sk_ns_capable(sk, sock_net(sk)->user_ns, cap);
+	return sk_ns_capable(sk, sock_net(sk)->ns.user_ns, cap);
 }
 EXPORT_SYMBOL(sk_net_capable);
 
@@ -534,7 +534,7 @@ static int sock_setbindtodevice(struct sock *sk, char __user *optval,
 
 	/* Sorry... */
 	ret = -EPERM;
-	if (!ns_capable(net->user_ns, CAP_NET_RAW))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		goto out;
 
 	ret = -EINVAL;
@@ -778,7 +778,7 @@ set_rcvbuf:
 
 	case SO_PRIORITY:
 		if ((val >= 0 && val <= 6) ||
-		    ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		    ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			sk->sk_priority = val;
 		else
 			ret = -EPERM;
@@ -945,7 +945,7 @@ set_rcvbuf:
 			clear_bit(SOCK_PASSSEC, &sock->flags);
 		break;
 	case SO_MARK:
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			ret = -EPERM;
 		else
 			sk->sk_mark = val;
@@ -1921,7 +1921,7 @@ int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
 
 	switch (cmsg->cmsg_type) {
 	case SO_MARK:
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
 			return -EINVAL;
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 6b10573..7151b43 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -303,7 +303,7 @@ static int sock_diag_bind(struct net *net, int group)
 
 int sock_diag_destroy(struct sock *sk, int err)
 {
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (!sk->sk_prot->diag_destroy)
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 0df2aa6..6f6749d 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -441,7 +441,7 @@ static __net_init int sysctl_core_net_init(struct net *net)
 		tbl[0].data = &net->core.sysctl_somaxconn;
 
 		/* Don't export any sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns) {
+		if (net->ns.user_ns != &init_user_ns) {
 			tbl[0].procname = NULL;
 		}
 	}
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index 30d875d..9d002f4 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -512,7 +512,7 @@ static int __net_init lowpan_frags_ns_sysctl_register(struct net *net)
 		table[2].data = &ieee802154_lowpan->frags.timeout;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			table[0].procname = NULL;
 	}
 
diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
index e0bd013..6353184 100644
--- a/net/ieee802154/socket.c
+++ b/net/ieee802154/socket.c
@@ -895,8 +895,8 @@ static int dgram_setsockopt(struct sock *sk, int level, int optname,
 		ro->want_ack = !!val;
 		break;
 	case WPAN_SECURITY:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN) &&
-		    !ns_capable(net->user_ns, CAP_NET_RAW)) {
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN) &&
+		    !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 			err = -EPERM;
 			break;
 		}
@@ -919,8 +919,8 @@ static int dgram_setsockopt(struct sock *sk, int level, int optname,
 		}
 		break;
 	case WPAN_SECURITY_LEVEL:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN) &&
-		    !ns_capable(net->user_ns, CAP_NET_RAW)) {
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN) &&
+		    !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 			err = -EPERM;
 			break;
 		}
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index d39e9e4..bec3946 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -309,7 +309,7 @@ lookup_protocol:
 
 	err = -EPERM;
 	if (sock->type == SOCK_RAW && !kern &&
-	    !ns_capable(net->user_ns, CAP_NET_RAW))
+	    !ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		goto out_rcu_unlock;
 
 	sock->ops = answer->ops;
@@ -475,7 +475,7 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	snum = ntohs(addr->sin_port);
 	err = -EACCES;
 	if (snum && snum < PROT_SOCK &&
-	    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
+	    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE))
 		goto out;
 
 	/*      We keep a pair of addresses. rcv_saddr is the one
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 89a8cac4..22517fb 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1140,7 +1140,7 @@ int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCDARP:
 	case SIOCSARP:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 	case SIOCGARP:
 		err = copy_from_user(&r, arg, sizeof(struct arpreq));
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index e333bc8..fc8f1f2 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -961,7 +961,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 
 	case SIOCSIFFLAGS:
 		ret = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto out;
 		break;
 	case SIOCSIFADDR:	/* Set interface address (and family) */
@@ -969,7 +969,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCSIFDSTADDR:	/* Set the destination address */
 	case SIOCSIFNETMASK: 	/* Set the netmask for the interface */
 		ret = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto out;
 		ret = -EINVAL;
 		if (sin->sin_family != AF_INET)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index ef2ebeb..fbc7311 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -581,7 +581,7 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCADDRT:		/* Add a route */
 	case SIOCDELRT:		/* Delete a route */
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(&rt, arg, sizeof(rt)))
diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index 4d158ff..dda262e 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -407,7 +407,7 @@ int ip_options_compile(struct net *net,
 					optptr[2] += 8;
 					break;
 				default:
-					if (!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) {
+					if (!skb && !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 						pp_ptr = optptr + 3;
 						goto error;
 					}
@@ -442,7 +442,7 @@ int ip_options_compile(struct net *net,
 				opt->router_alert = optptr - iph;
 			break;
 		case IPOPT_CIPSO:
-			if ((!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) || opt->cipso) {
+			if ((!skb && !ns_capable(net->ns.user_ns, CAP_NET_RAW)) || opt->cipso) {
 				pp_ptr = optptr;
 				goto error;
 			}
@@ -455,7 +455,7 @@ int ip_options_compile(struct net *net,
 		case IPOPT_SEC:
 		case IPOPT_SID:
 		default:
-			if (!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!skb && !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				pp_ptr = optptr;
 				goto error;
 			}
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 71a52f4d..474af75 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1138,14 +1138,14 @@ mc_msf_out:
 	case IP_IPSEC_POLICY:
 	case IP_XFRM_POLICY:
 		err = -EPERM;
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = xfrm_user_policy(sk, optname, optval, optlen);
 		break;
 
 	case IP_TRANSPARENT:
-		if (!!val && !ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) &&
-		    !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
+		if (!!val && !ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_RAW) &&
+		    !ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN)) {
 			err = -EPERM;
 			break;
 		}
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index d8f5e0a..4ddc520 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -765,7 +765,7 @@ int ip_tunnel_ioctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 		if (p->iph.ttl)
 			p->iph.frag_off |= htons(IP_DF);
@@ -821,7 +821,7 @@ int ip_tunnel_ioctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == itn->fb_tunnel_dev) {
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 5ad48ec..df292fa 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1272,7 +1272,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval,
 	}
 	if (optname != MRT_INIT) {
 		if (sk != rcu_access_pointer(mrt->mroute_sk) &&
-		    !ns_capable(net->user_ns, CAP_NET_ADMIN)) {
+		    !ns_capable(net->ns.user_ns, CAP_NET_ADMIN)) {
 			ret = -EACCES;
 			goto out_unlock;
 		}
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 2033f92..e123093 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -1300,7 +1300,7 @@ static int compat_do_arpt_set_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1434,7 +1434,7 @@ static int compat_do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1455,7 +1455,7 @@ static int do_arpt_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1478,7 +1478,7 @@ static int do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 54906e0..b29238a 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1554,7 +1554,7 @@ compat_do_ipt_set_ctl(struct sock *sk,	int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1656,7 +1656,7 @@ compat_do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1678,7 +1678,7 @@ do_ipt_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1702,7 +1702,7 @@ do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index a1f2830..ddb0003 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2787,7 +2787,7 @@ static __net_init int sysctl_route_net_init(struct net *net)
 			goto err_dup;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			tbl[0].procname = NULL;
 	}
 	tbl[0].extra1 = net;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5c7ed14..467b6cc 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2273,7 +2273,7 @@ EXPORT_SYMBOL(tcp_disconnect);
 
 static inline bool tcp_can_repair_sock(const struct sock *sk)
 {
-	return ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) &&
+	return ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN) &&
 		((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_ESTABLISHED));
 }
 
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 882caa4..385d0f4 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -354,7 +354,7 @@ int tcp_set_congestion_control(struct sock *sk, const char *name)
 	if (!ca)
 		err = -ENOENT;
 	else if (!((ca->flags & TCP_CONG_NON_RESTRICTED) ||
-		   ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)))
+		   ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN)))
 		err = -EPERM;
 	else if (!try_module_get(ca->owner))
 		err = -EBUSY;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 47f837a..9aaabf8 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2781,7 +2781,7 @@ int addrconf_add_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
@@ -2800,7 +2800,7 @@ int addrconf_del_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index bfa86f0..1491cbd 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -161,7 +161,7 @@ lookup_protocol:
 
 	err = -EPERM;
 	if (sock->type == SOCK_RAW && !kern &&
-	    !ns_capable(net->user_ns, CAP_NET_RAW))
+	    !ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		goto out_rcu_unlock;
 
 	sock->ops = answer->ops;
@@ -286,7 +286,7 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		return -EINVAL;
 
 	snum = ntohs(addr->sin6_port);
-	if (snum && snum < PROT_SOCK && !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
+	if (snum && snum < PROT_SOCK && !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
 	lock_sock(sk);
diff --git a/net/ipv6/anycast.c b/net/ipv6/anycast.c
index 514ac25..e168ca3 100644
--- a/net/ipv6/anycast.c
+++ b/net/ipv6/anycast.c
@@ -62,7 +62,7 @@ int ipv6_sock_ac_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
 
 	ASSERT_RTNL();
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (ipv6_addr_is_multicast(addr))
 		return -EINVAL;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 37874e2..92204ba 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -837,7 +837,7 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -857,7 +857,7 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -882,7 +882,7 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index b912f0d..c07e37e 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -569,7 +569,7 @@ int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen)
 		rcu_read_unlock_bh();
 
 		if (freq.flr_share == IPV6_FL_S_NONE &&
-		    ns_capable(net->user_ns, CAP_NET_ADMIN)) {
+		    ns_capable(net->ns.user_ns, CAP_NET_ADMIN)) {
 			fl = fl_lookup(net, freq.flr_label);
 			if (fl) {
 				err = fl6_renew(fl, freq.flr_linger, freq.flr_expires);
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 776d145..7f23d34 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -852,7 +852,7 @@ static int ip6gre_tunnel_ioctl(struct net_device *dev,
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -901,7 +901,7 @@ static int ip6gre_tunnel_ioctl(struct net_device *dev,
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == ign->fb_tunnel_dev) {
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7b0481e..fa9443c 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1484,7 +1484,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = -EFAULT;
 		if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
@@ -1520,7 +1520,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		break;
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 
 		if (dev == ip6n->fb_tnl_dev) {
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d90a11f..ece8758 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -743,7 +743,7 @@ vti6_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = -EFAULT;
 		if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
@@ -775,7 +775,7 @@ vti6_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		break;
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 
 		if (dev == ip6n->fb_tnl_dev) {
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 487ef3b..87a6a20 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1669,7 +1669,7 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
 		return -ENOENT;
 
 	if (optname != MRT6_INIT) {
-		if (sk != mrt->mroute6_sk && !ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (sk != mrt->mroute6_sk && !ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EACCES;
 	}
 
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index a9895e1..d5dc2aa 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -365,8 +365,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		break;
 
 	case IPV6_TRANSPARENT:
-		if (valbool && !ns_capable(net->user_ns, CAP_NET_ADMIN) &&
-		    !ns_capable(net->user_ns, CAP_NET_RAW)) {
+		if (valbool && !ns_capable(net->ns.user_ns, CAP_NET_ADMIN) &&
+		    !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 			retv = -EPERM;
 			break;
 		}
@@ -404,7 +404,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 
 		/* hop-by-hop / destination options are privileged option */
 		retv = -EPERM;
-		if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
+		if (optname != IPV6_RTHDR && !ns_capable(net->ns.user_ns, CAP_NET_RAW))
 			break;
 
 		opt = rcu_dereference_protected(np->opt,
@@ -785,7 +785,7 @@ done:
 	case IPV6_IPSEC_POLICY:
 	case IPV6_XFRM_POLICY:
 		retv = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		retv = xfrm_user_policy(sk, optname, optval, optlen);
 		break;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 63e06c3..0f92561 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1573,7 +1573,7 @@ compat_do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1675,7 +1675,7 @@ compat_do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1697,7 +1697,7 @@ do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1721,7 +1721,7 @@ do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 2160d5d..4efbd91 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -645,7 +645,7 @@ static int __net_init ip6_frags_ns_sysctl_register(struct net *net)
 		table[2].data = &net->ipv6.frags.timeout;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			table[0].procname = NULL;
 	}
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 520b788..938a7aa 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2468,7 +2468,7 @@ int ipv6_route_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCADDRT:		/* Add a route */
 	case SIOCDELRT:		/* Delete a route */
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		err = copy_from_user(&rtmsg, arg,
 				     sizeof(struct in6_rtmsg));
@@ -3594,7 +3594,7 @@ struct ctl_table * __net_init ipv6_route_sysctl_init(struct net *net)
 		table[9].data = &net->ipv6.sysctl.ip6_rt_gc_min_interval;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			table[0].procname = NULL;
 	}
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0619ac7..196f476 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1181,7 +1181,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -1229,7 +1229,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == sitn->fb_tunnel_dev) {
@@ -1260,7 +1260,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCDELPRL:
 	case SIOCCHGPRL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 		err = -EINVAL;
 		if (dev == sitn->fb_tunnel_dev)
@@ -1287,7 +1287,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCCHG6RD:
 	case SIOCDEL6RD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
diff --git a/net/key/af_key.c b/net/key/af_key.c
index f9c9ecb..47183e9 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -141,7 +141,7 @@ static int pfkey_create(struct net *net, struct socket *sock, int protocol,
 	struct sock *sk;
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 8ae3ed9..41c3da3 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -160,7 +160,7 @@ static int llc_ui_create(struct net *net, struct socket *sock, int protocol,
 	struct sock *sk;
 	int rc = -ESOCKTNOSUPPORT;
 
-	if (!ns_capable(net->user_ns, CAP_NET_RAW))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		return -EPERM;
 
 	if (!net_eq(net, &init_net))
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index a748b0c..46745a7 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1901,7 +1901,7 @@ ip_set_sockfn_get(struct sock *sk, int optval, void __user *user, int *len)
 	struct net *net = sock_net(sk);
 	struct ip_set_net *inst = ip_set_pernet(net);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (optval != SO_IP_SET)
 		return -EBADF;
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index c3c809b..a02b3b3 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2360,7 +2360,7 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
 	BUILD_BUG_ON(sizeof(arg) > 255);
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_SET_MAX)
@@ -2678,7 +2678,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 
 	BUG_ON(!net);
 	BUILD_BUG_ON(sizeof(arg) > 255);
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_GET_MAX)
@@ -3906,7 +3906,7 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs)
 			return -ENOMEM;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			tbl[0].procname = NULL;
 	} else
 		tbl = vs_vars;
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index cccf4d6..23a3ec3 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -564,7 +564,7 @@ static int __net_init __ip_vs_lblc_init(struct net *net)
 			return -ENOMEM;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			ipvs->lblc_ctl_table[0].procname = NULL;
 
 	} else
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 796d70e..704ad5c 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -750,7 +750,7 @@ static int __net_init __ip_vs_lblcr_init(struct net *net)
 			return -ENOMEM;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			ipvs->lblcr_ctl_table[0].procname = NULL;
 	} else
 		ipvs->lblcr_ctl_table = vs_vars_table;
diff --git a/net/netfilter/nf_conntrack_acct.c b/net/netfilter/nf_conntrack_acct.c
index 45da11a..9303901 100644
--- a/net/netfilter/nf_conntrack_acct.c
+++ b/net/netfilter/nf_conntrack_acct.c
@@ -74,7 +74,7 @@ static int nf_conntrack_acct_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_acct;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.acct_sysctl_header = register_net_sysctl(net, "net/netfilter",
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index d28011b..22411e5 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -358,7 +358,7 @@ static int nf_conntrack_event_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_events;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.event_sysctl_header =
diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index 9e36931..c1e6242 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -618,8 +618,8 @@ static int exp_proc_init(struct net *net)
 	if (!proc)
 		return -ENOMEM;
 
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 	if (uid_valid(root_uid) && gid_valid(root_gid))
 		proc_set_user(proc, root_uid, root_gid);
 #endif /* CONFIG_NF_CONNTRACK_PROCFS */
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 196cb39..4cff85b 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -67,7 +67,7 @@ static int nf_conntrack_helper_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_auto_assign_helper;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.helper_sysctl_header =
diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c
index 399a38f..766dbee 100644
--- a/net/netfilter/nf_conntrack_proto_dccp.c
+++ b/net/netfilter/nf_conntrack_proto_dccp.c
@@ -841,7 +841,7 @@ static int dccp_kmemdup_sysctl_table(struct net *net, struct nf_proto_net *pn,
 	pn->ctl_table[7].data = &dn->dccp_loose;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		pn->ctl_table[0].procname = NULL;
 #endif
 	return 0;
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index c026c47..8796e36 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -397,8 +397,8 @@ static int nf_conntrack_standalone_init_proc(struct net *net)
 	if (!pde)
 		goto out_nf_conntrack;
 
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 	if (uid_valid(root_uid) && gid_valid(root_gid))
 		proc_set_user(pde, root_uid, root_gid);
 
@@ -512,7 +512,7 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
 	table[4].data = &net->ct.sysctl_log_invalid;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.sysctl_header = register_net_sysctl(net, "net/netfilter", table);
diff --git a/net/netfilter/nf_conntrack_timestamp.c b/net/netfilter/nf_conntrack_timestamp.c
index 7a394df..43bd240 100644
--- a/net/netfilter/nf_conntrack_timestamp.c
+++ b/net/netfilter/nf_conntrack_timestamp.c
@@ -52,7 +52,7 @@ static int nf_conntrack_tstamp_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_tstamp;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.tstamp_sysctl_header = register_net_sysctl(net,	"net/netfilter",
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 11f81c8..5428b8e 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -1072,8 +1072,8 @@ static int __net_init nfnl_log_net_init(struct net *net)
 	if (!proc)
 		return -ENOMEM;
 
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 	if (uid_valid(root_uid) && gid_valid(root_gid))
 		proc_set_user(proc, root_uid, root_gid);
 #endif
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 2675d58..d840aa6 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1493,8 +1493,8 @@ int xt_proto_init(struct net *net, u_int8_t af)
 
 
 #ifdef CONFIG_PROC_FS
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 
 	strlcpy(buf, xt_prefix[af], sizeof(buf));
 	strlcat(buf, FORMAT_TABLES, sizeof(buf));
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 627f898..070e24d 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -828,14 +828,14 @@ EXPORT_SYMBOL(netlink_capable);
  */
 bool netlink_net_capable(const struct sk_buff *skb, int cap)
 {
-	return netlink_ns_capable(skb, sock_net(skb->sk)->user_ns, cap);
+	return netlink_ns_capable(skb, sock_net(skb->sk)->ns.user_ns, cap);
 }
 EXPORT_SYMBOL(netlink_net_capable);
 
 static inline int netlink_allowed(const struct socket *sock, unsigned int flag)
 {
 	return (nl_table[sock->sk->sk_protocol].flags & flag) ||
-		ns_capable(sock_net(sock->sk)->user_ns, CAP_NET_ADMIN);
+		ns_capable(sock_net(sock->sk)->ns.user_ns, CAP_NET_ADMIN);
 }
 
 static void
@@ -1323,7 +1323,7 @@ static void do_one_broadcast(struct sock *sk,
 		if (!peernet_has_id(sock_net(sk), p->net))
 			return;
 
-		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
+		if (!file_ns_capable(sk->sk_socket->file, p->net->ns.user_ns,
 				     CAP_NET_BROADCAST))
 			return;
 	}
@@ -1586,7 +1586,7 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 		err = 0;
 		break;
 	case NETLINK_LISTEN_ALL_NSID:
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_BROADCAST))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_BROADCAST))
 			return -EPERM;
 
 		if (val)
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index a09132a..831e863 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -561,7 +561,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
 		return -EPERM;
 
 	if ((ops->flags & GENL_UNS_ADMIN_PERM) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) {
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 9f0983f..8172443 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3208,7 +3208,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
 	__be16 proto = (__force __be16)protocol; /* weird, but documented */
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_RAW))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		return -EPERM;
 	if (sock->type != SOCK_DGRAM && sock->type != SOCK_RAW &&
 	    sock->type != SOCK_PACKET)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a75864d..249a340 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -140,7 +140,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
 	int tp_created = 0;
 
 	if ((n->nlmsg_type != RTM_GETTFILTER) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 replay:
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ddf047d..783f495 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1123,7 +1123,7 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n)
 	int err;
 
 	if ((n->nlmsg_type != RTM_GETQDISC) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	err = nlmsg_parse(n, sizeof(*tcm), tca, TCA_MAX, NULL);
@@ -1190,7 +1190,7 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n)
 	struct Qdisc *q, *p;
 	int err;
 
-	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	if (!netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 replay:
@@ -1539,7 +1539,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n)
 	int err;
 
 	if ((n->nlmsg_type != RTM_GETTCLASS) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	err = nlmsg_parse(n, sizeof(*tcm), tca, TCA_MAX, NULL);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 67154b8..bb65b08 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -361,7 +361,7 @@ static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
 	}
 
 	if (snum && snum < PROT_SOCK &&
-	    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
+	    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
 	/* See if the address matches any of the addresses we may have
@@ -1153,7 +1153,7 @@ static int __sctp_connect(struct sock *sk,
 				 * be permitted to open new associations.
 				 */
 				if (ep->base.bind_addr.port < PROT_SOCK &&
-				    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) {
+				    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE)) {
 					err = -EACCES;
 					goto out_free;
 				}
@@ -1815,7 +1815,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
 			 * associations.
 			 */
 			if (ep->base.bind_addr.port < PROT_SOCK &&
-			    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) {
+			    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE)) {
 				err = -EACCES;
 				goto out_unlock;
 			}
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index ed98c1f..cb46bc9 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -42,11 +42,11 @@ static int net_ctl_permissions(struct ctl_table_header *head,
 			       struct ctl_table *table)
 {
 	struct net *net = container_of(head->set, struct net, sysctls);
-	kuid_t root_uid = make_kuid(net->user_ns, 0);
-	kgid_t root_gid = make_kgid(net->user_ns, 0);
+	kuid_t root_uid = make_kuid(net->ns.user_ns, 0);
+	kgid_t root_gid = make_kgid(net->ns.user_ns, 0);
 
 	/* Allow network administrator to have same access as root. */
-	if (ns_capable(net->user_ns, CAP_NET_ADMIN) ||
+	if (ns_capable(net->ns.user_ns, CAP_NET_ADMIN) ||
 	    uid_eq(root_uid, current_euid())) {
 		int mode = (table->mode >> 6) & 7;
 		return (mode << 6) | (mode << 3) | mode;
diff --git a/net/unix/sysctl_net_unix.c b/net/unix/sysctl_net_unix.c
index b3d5150..b5aec8a 100644
--- a/net/unix/sysctl_net_unix.c
+++ b/net/unix/sysctl_net_unix.c
@@ -35,7 +35,7 @@ int __net_init unix_sysctl_register(struct net *net)
 		goto err_alloc;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	table[0].data = &net->unx.sysctl_max_dgram_qlen;
diff --git a/net/xfrm/xfrm_sysctl.c b/net/xfrm/xfrm_sysctl.c
index 05a6e3d..8d4b41f 100644
--- a/net/xfrm/xfrm_sysctl.c
+++ b/net/xfrm/xfrm_sysctl.c
@@ -55,7 +55,7 @@ int __net_init xfrm_sysctl_init(struct net *net)
 	table[3].data = &net->xfrm.sysctl_acq_expires;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->xfrm.sysctl_hdr = register_net_sysctl(net, "net/core", table);
-- 
2.5.5


* [PATCH 1/5] namespaces: move user_ns into ns_common
@ 2016-07-15  2:12       ` Andrey Vagin
  0 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-15  2:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-api, containers, criu, linux-fsdevel, Eric W. Biederman,
	James Bottomley, Michael Kerrisk (man-pages),
	W. Trevor King, Alexander Viro, Serge Hallyn, Andrey Vagin

Every namespace has a pointer to the user namespace in which it was
created, but these pointers are all privately embedded in the individual
namespace-specific structures.

We are going to add a user-space interface to get the owning user
namespace, so it looks reasonable to move the pointer into ns_common.

This idea was originally suggested by James Bottomley.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 drivers/net/bonding/bond_main.c         |  2 +-
 drivers/net/tun.c                       |  4 ++--
 fs/mount.h                              |  1 -
 fs/namespace.c                          | 14 +++++++-------
 fs/pnode.c                              |  4 ++--
 fs/proc/root.c                          |  2 +-
 include/linux/cgroup.h                  |  1 -
 include/linux/ipc_namespace.h           |  3 ---
 include/linux/ns_common.h               |  1 +
 include/linux/pid_namespace.h           |  1 -
 include/linux/user_namespace.h          |  8 ++++++--
 include/linux/utsname.h                 |  1 -
 include/net/net_namespace.h             |  1 -
 init/version.c                          |  2 +-
 ipc/mqueue.c                            |  2 +-
 ipc/msgutil.c                           |  2 +-
 ipc/namespace.c                         |  6 +++---
 ipc/shm.c                               |  2 +-
 ipc/util.c                              |  4 ++--
 kernel/cgroup.c                         | 12 ++++++------
 kernel/pid.c                            |  2 +-
 kernel/pid_namespace.c                  |  8 ++++----
 kernel/reboot.c                         |  2 +-
 kernel/sys.c                            |  4 ++--
 kernel/user_namespace.c                 |  4 ++++
 kernel/utsname.c                        |  6 +++---
 net/8021q/vlan.c                        | 12 ++++++------
 net/bridge/br_ioctl.c                   | 22 +++++++++++-----------
 net/bridge/br_sysfs_br.c                |  4 ++--
 net/bridge/br_sysfs_if.c                |  2 +-
 net/bridge/netfilter/ebtables.c         |  8 ++++----
 net/core/dev_ioctl.c                    |  4 ++--
 net/core/ethtool.c                      |  2 +-
 net/core/neighbour.c                    |  2 +-
 net/core/net-sysfs.c                    |  6 +++---
 net/core/net_namespace.c                |  6 +++---
 net/core/rtnetlink.c                    |  6 +++---
 net/core/scm.c                          |  2 +-
 net/core/sock.c                         | 10 +++++-----
 net/core/sock_diag.c                    |  2 +-
 net/core/sysctl_net_core.c              |  2 +-
 net/ieee802154/6lowpan/reassembly.c     |  2 +-
 net/ieee802154/socket.c                 |  8 ++++----
 net/ipv4/af_inet.c                      |  4 ++--
 net/ipv4/arp.c                          |  2 +-
 net/ipv4/devinet.c                      |  4 ++--
 net/ipv4/fib_frontend.c                 |  2 +-
 net/ipv4/ip_options.c                   |  6 +++---
 net/ipv4/ip_sockglue.c                  |  6 +++---
 net/ipv4/ip_tunnel.c                    |  4 ++--
 net/ipv4/ipmr.c                         |  2 +-
 net/ipv4/netfilter/arp_tables.c         |  8 ++++----
 net/ipv4/netfilter/ip_tables.c          |  8 ++++----
 net/ipv4/route.c                        |  2 +-
 net/ipv4/tcp.c                          |  2 +-
 net/ipv4/tcp_cong.c                     |  2 +-
 net/ipv6/addrconf.c                     |  4 ++--
 net/ipv6/af_inet6.c                     |  4 ++--
 net/ipv6/anycast.c                      |  2 +-
 net/ipv6/datagram.c                     |  6 +++---
 net/ipv6/ip6_flowlabel.c                |  2 +-
 net/ipv6/ip6_gre.c                      |  4 ++--
 net/ipv6/ip6_tunnel.c                   |  4 ++--
 net/ipv6/ip6_vti.c                      |  4 ++--
 net/ipv6/ip6mr.c                        |  2 +-
 net/ipv6/ipv6_sockglue.c                |  8 ++++----
 net/ipv6/netfilter/ip6_tables.c         |  8 ++++----
 net/ipv6/reassembly.c                   |  2 +-
 net/ipv6/route.c                        |  4 ++--
 net/ipv6/sit.c                          |  8 ++++----
 net/key/af_key.c                        |  2 +-
 net/llc/af_llc.c                        |  2 +-
 net/netfilter/ipset/ip_set_core.c       |  2 +-
 net/netfilter/ipvs/ip_vs_ctl.c          |  6 +++---
 net/netfilter/ipvs/ip_vs_lblc.c         |  2 +-
 net/netfilter/ipvs/ip_vs_lblcr.c        |  2 +-
 net/netfilter/nf_conntrack_acct.c       |  2 +-
 net/netfilter/nf_conntrack_ecache.c     |  2 +-
 net/netfilter/nf_conntrack_expect.c     |  4 ++--
 net/netfilter/nf_conntrack_helper.c     |  2 +-
 net/netfilter/nf_conntrack_proto_dccp.c |  2 +-
 net/netfilter/nf_conntrack_standalone.c |  6 +++---
 net/netfilter/nf_conntrack_timestamp.c  |  2 +-
 net/netfilter/nfnetlink_log.c           |  4 ++--
 net/netfilter/x_tables.c                |  4 ++--
 net/netlink/af_netlink.c                |  8 ++++----
 net/netlink/genetlink.c                 |  2 +-
 net/packet/af_packet.c                  |  2 +-
 net/sched/cls_api.c                     |  2 +-
 net/sched/sch_api.c                     |  6 +++---
 net/sctp/socket.c                       |  6 +++---
 net/sysctl_net.c                        |  6 +++---
 net/unix/sysctl_net_unix.c              |  2 +-
 net/xfrm/xfrm_sysctl.c                  |  2 +-
 94 files changed, 197 insertions(+), 196 deletions(-)
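
For readers skimming the diff, the layout change can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not the
kernel code: the structures are trimmed mock-ups, ns_owner() is a
hypothetical helper that does not exist in this series, and the compile-time
checks use C11 _Static_assert with offsetof() instead of the BUILD_BUG_ON()
in the patch.

```c
#include <stddef.h>

/* Userspace mock-ups of the kernel structures; field lists are trimmed. */
struct user_namespace;

struct ns_common {
	struct user_namespace *user_ns;	/* owning user namespace */
	unsigned int inum;
};

/* Any non-user namespace: the owner is now reached through ns.user_ns. */
struct uts_namespace {
	struct ns_common ns;
};

struct user_namespace {
	int level;
	/* ->ns.user_ns and ->parent are synonyms */
	union {
		struct user_namespace *parent;
		struct ns_common ns;
	};
};

/* The aliasing only holds while user_ns is the first member of ns_common. */
_Static_assert(offsetof(struct ns_common, user_ns) == 0,
	       "user_ns must be the first member of ns_common");
_Static_assert(offsetof(struct user_namespace, parent) ==
	       offsetof(struct user_namespace, ns) +
	       offsetof(struct ns_common, user_ns),
	       "parent and ns.user_ns must share storage");

/* Hypothetical helper: fetch the owner of any namespace via its ns_common. */
static struct user_namespace *ns_owner(struct ns_common *ns)
{
	return ns->user_ns;
}
```

With this layout, generic code that only holds a struct ns_common pointer can
find the owning user namespace without knowing the namespace type, and for a
user namespace the same field doubles as the parent pointer.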

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a2afa3b..5ebe22a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3425,7 +3425,7 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd
 
 	net = dev_net(bond_dev);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	slave_dev = __dev_get_by_name(net, ifr->ifr_slave);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e16487c..2730608 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -487,7 +487,7 @@ static inline bool tun_not_capable(struct tun_struct *tun)
 
 	return ((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
 		  (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
-		!ns_capable(net->user_ns, CAP_NET_ADMIN);
+		!ns_capable(net->ns.user_ns, CAP_NET_ADMIN);
 }
 
 static void tun_set_real_num_queues(struct tun_struct *tun)
@@ -1737,7 +1737,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		int queues = ifr->ifr_flags & IFF_MULTI_QUEUE ?
 			     MAX_TAP_QUEUES : 1;
 
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		err = security_tun_dev_create();
 		if (err < 0)
diff --git a/fs/mount.h b/fs/mount.h
index 14db05d..532dd92 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -9,7 +9,6 @@ struct mnt_namespace {
 	struct ns_common	ns;
 	struct mount *	root;
 	struct list_head	list;
-	struct user_namespace	*user_ns;
 	u64			seq;	/* Sequence number to prevent loops */
 	wait_queue_head_t poll;
 	u64 event;
diff --git a/fs/namespace.c b/fs/namespace.c
index 419f746..22b0dbc 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1582,7 +1582,7 @@ out_unlock:
  */
 static inline bool may_mount(void)
 {
-	return ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN);
+	return ns_capable(current->nsproxy->mnt_ns->ns.user_ns, CAP_SYS_ADMIN);
 }
 
 static inline bool may_mandlock(void)
@@ -2187,7 +2187,7 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	if ((mnt->mnt.mnt_flags & MNT_LOCK_NODEV) &&
 	    !(mnt_flags & MNT_NODEV)) {
 		/* Was the nodev implicitly added in mount? */
-		if ((mnt->mnt_ns->user_ns != &init_user_ns) &&
+		if ((mnt->mnt_ns->ns.user_ns != &init_user_ns) &&
 		    !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
 			mnt_flags |= MNT_NODEV;
 		} else {
@@ -2386,7 +2386,7 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 			int mnt_flags, const char *name, void *data)
 {
 	struct file_system_type *type;
-	struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns;
+	struct user_namespace *user_ns = current->nsproxy->mnt_ns->ns.user_ns;
 	struct vfsmount *mnt;
 	int err;
 
@@ -2744,7 +2744,7 @@ dput_out:
 static void free_mnt_ns(struct mnt_namespace *ns)
 {
 	ns_free_inum(&ns->ns);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	kfree(ns);
 }
 
@@ -2777,7 +2777,7 @@ static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns)
 	INIT_LIST_HEAD(&new_ns->list);
 	init_waitqueue_head(&new_ns->poll);
 	new_ns->event = 0;
-	new_ns->user_ns = get_user_ns(user_ns);
+	new_ns->ns.user_ns = get_user_ns(user_ns);
 	return new_ns;
 }
 
@@ -2807,7 +2807,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 	namespace_lock();
 	/* First pass: copy the tree topology */
 	copy_flags = CL_COPY_UNBINDABLE | CL_EXPIRE;
-	if (user_ns != ns->user_ns)
+	if (user_ns != ns->ns.user_ns)
 		copy_flags |= CL_SHARED_TO_SLAVE | CL_UNPRIVILEGED;
 	new = copy_tree(old, old->mnt.mnt_root, copy_flags);
 	if (IS_ERR(new)) {
@@ -3326,7 +3326,7 @@ static int mntns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	struct mnt_namespace *mnt_ns = to_mnt_ns(ns);
 	struct path root;
 
-	if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(mnt_ns->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_CHROOT) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
diff --git a/fs/pnode.c b/fs/pnode.c
index 9989970..e051f11 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -244,7 +244,7 @@ static int propagate_one(struct mount *m)
 	}
 		
 	/* Notice when we are propagating across user namespaces */
-	if (m->mnt_ns->user_ns != user_ns)
+	if (m->mnt_ns->ns.user_ns != user_ns)
 		type |= CL_UNPRIVILEGED;
 	child = copy_tree(last_source, last_source->mnt.mnt_root, type);
 	if (IS_ERR(child))
@@ -286,7 +286,7 @@ int propagate_mnt(struct mount *dest_mnt, struct mountpoint *dest_mp,
 	 * propagate_one(); everything is serialized by namespace_sem,
 	 * so globals will do just fine.
 	 */
-	user_ns = current->nsproxy->mnt_ns->user_ns;
+	user_ns = current->nsproxy->mnt_ns->ns.user_ns;
 	last_dest = dest_mnt;
 	first_source = source_mnt;
 	last_source = source_mnt;
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 0670278..aae5104 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -113,7 +113,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 		options = data;
 
 		/* Does the mounter have privilege over the pid namespace? */
-		if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+		if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN))
 			return ERR_PTR(-EPERM);
 	}
 
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index a20320c..f531cc5 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -619,7 +619,6 @@ static inline void cgroup_sk_free(struct sock_cgroup_data *skcd) {}
 struct cgroup_namespace {
 	atomic_t		count;
 	struct ns_common	ns;
-	struct user_namespace	*user_ns;
 	struct css_set          *root_cset;
 };
 
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 1eee6bc..0f9d806 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -56,9 +56,6 @@ struct ipc_namespace {
 	unsigned int    mq_msg_default;
 	unsigned int    mq_msgsize_default;
 
-	/* user_ns which owns the ipc ns */
-	struct user_namespace *user_ns;
-
 	struct ns_common ns;
 };
 
diff --git a/include/linux/ns_common.h b/include/linux/ns_common.h
index 85a5c8c..af2f30d 100644
--- a/include/linux/ns_common.h
+++ b/include/linux/ns_common.h
@@ -4,6 +4,7 @@
 struct proc_ns_operations;
 
 struct ns_common {
+	struct user_namespace *user_ns; /* Owning user namespace */
 	atomic_long_t stashed;
 	const struct proc_ns_operations *ops;
 	unsigned int inum;
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 918b117..b1802c6 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -39,7 +39,6 @@ struct pid_namespace {
 #ifdef CONFIG_BSD_PROCESS_ACCT
 	struct fs_pin *bacct;
 #endif
-	struct user_namespace *user_ns;
 	struct work_struct proc_work;
 	kgid_t pid_gid;
 	int hide_pid;
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 8297e5b..a941b44 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -27,11 +27,15 @@ struct user_namespace {
 	struct uid_gid_map	gid_map;
 	struct uid_gid_map	projid_map;
 	atomic_t		count;
-	struct user_namespace	*parent;
 	int			level;
 	kuid_t			owner;
 	kgid_t			group;
-	struct ns_common	ns;
+
+	/* ->ns.user_ns and ->parent are synonyms */
+	union {
+		struct user_namespace	*parent;
+		struct ns_common	ns;
+	};
 	unsigned long		flags;
 
 	/* Register of per-UID persistent keyrings for this namespace */
diff --git a/include/linux/utsname.h b/include/linux/utsname.h
index 5093f58..78c9ef8 100644
--- a/include/linux/utsname.h
+++ b/include/linux/utsname.h
@@ -23,7 +23,6 @@ extern struct user_namespace init_user_ns;
 struct uts_namespace {
 	struct kref kref;
 	struct new_utsname name;
-	struct user_namespace *user_ns;
 	struct ns_common ns;
 };
 extern struct uts_namespace init_uts_ns;
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..acb714e 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -59,7 +59,6 @@ struct net {
 	struct list_head	cleanup_list;	/* namespaces on death row */
 	struct list_head	exit_list;	/* Use only net_mutex */
 
-	struct user_namespace   *user_ns;	/* Owning user namespace */
 	spinlock_t		nsid_lock;
 	struct idr		netns_ids;
 
diff --git a/init/version.c b/init/version.c
index fe41a63..51ac701 100644
--- a/init/version.c
+++ b/init/version.c
@@ -34,7 +34,7 @@ struct uts_namespace init_uts_ns = {
 		.machine	= UTS_MACHINE,
 		.domainname	= UTS_DOMAINNAME,
 	},
-	.user_ns = &init_user_ns,
+	.ns.user_ns = &init_user_ns,
 	.ns.inum = PROC_UTS_INIT_INO,
 #ifdef CONFIG_UTS_NS
 	.ns.ops = &utsns_operations,
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index ade739f..378cec6 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -331,7 +331,7 @@ static struct dentry *mqueue_mount(struct file_system_type *fs_type,
 		/* Don't allow mounting unless the caller has CAP_SYS_ADMIN
 		 * over the ipc namespace.
 		 */
-		if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+		if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN))
 			return ERR_PTR(-EPERM);
 
 		data = ns;
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index ed81aaf..b2e570c 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -30,7 +30,7 @@ DEFINE_SPINLOCK(mq_lock);
  */
 struct ipc_namespace init_ipc_ns = {
 	.count		= ATOMIC_INIT(1),
-	.user_ns = &init_user_ns,
+	.ns.user_ns = &init_user_ns,
 	.ns.inum = PROC_IPC_INIT_INO,
 #ifdef CONFIG_IPC_NS
 	.ns.ops = &ipcns_operations,
diff --git a/ipc/namespace.c b/ipc/namespace.c
index 068caf1..d9f663b8 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -46,7 +46,7 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	msg_init_ns(ns);
 	shm_init_ns(ns);
 
-	ns->user_ns = get_user_ns(user_ns);
+	ns->ns.user_ns = get_user_ns(user_ns);
 
 	return ns;
 }
@@ -97,7 +97,7 @@ static void free_ipc_ns(struct ipc_namespace *ns)
 	shm_exit_ns(ns);
 	atomic_dec(&nr_ipc_ns);
 
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	ns_free_inum(&ns->ns);
 	kfree(ns);
 }
@@ -155,7 +155,7 @@ static void ipcns_put(struct ns_common *ns)
 static int ipcns_install(struct nsproxy *nsproxy, struct ns_common *new)
 {
 	struct ipc_namespace *ns = to_ipc_ns(new);
-	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/ipc/shm.c b/ipc/shm.c
index 1328251..20546f1 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1024,7 +1024,7 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
 			goto out_unlock0;
 		}
 
-		if (!ns_capable(ns->user_ns, CAP_IPC_LOCK)) {
+		if (!ns_capable(ns->ns.user_ns, CAP_IPC_LOCK)) {
 			kuid_t euid = current_euid();
 			if (!uid_eq(euid, shp->shm_perm.uid) &&
 			    !uid_eq(euid, shp->shm_perm.cuid)) {
diff --git a/ipc/util.c b/ipc/util.c
index 798cad1..2a1a700 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -491,7 +491,7 @@ int ipcperms(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp, short flag)
 		granted_mode >>= 3;
 	/* is there some bit set in requested_mode but not in granted_mode? */
 	if ((requested_mode & ~granted_mode & 0007) &&
-	    !ns_capable(ns->user_ns, CAP_IPC_OWNER))
+	    !ns_capable(ns->ns.user_ns, CAP_IPC_OWNER))
 		return -1;
 
 	return security_ipc_permission(ipcp, flag);
@@ -700,7 +700,7 @@ struct kern_ipc_perm *ipcctl_pre_down_nolock(struct ipc_namespace *ns,
 
 	euid = current_euid();
 	if (uid_eq(euid, ipcp->cuid) || uid_eq(euid, ipcp->uid)  ||
-	    ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+	    ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN))
 		return ipcp; /* successful lookup */
 err:
 	return ERR_PTR(err);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 75c0ff0..3635600 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -221,7 +221,7 @@ static u16 have_free_callback __read_mostly;
 /* cgroup namespace for init task */
 struct cgroup_namespace init_cgroup_ns = {
 	.count		= { .counter = 2, },
-	.user_ns	= &init_user_ns,
+	.ns.user_ns	= &init_user_ns,
 	.ns.ops		= &cgroupns_operations,
 	.ns.inum	= PROC_CGROUP_INIT_INO,
 	.root_cset	= &init_css_set,
@@ -2094,7 +2094,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 	get_cgroup_ns(ns);
 
 	/* Check if the caller has permission to mount. */
-	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN)) {
+	if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN)) {
 		put_cgroup_ns(ns);
 		return ERR_PTR(-EPERM);
 	}
@@ -5609,7 +5609,7 @@ int __init cgroup_init(void)
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_dfl_base_files));
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_legacy_base_files));
 
-	get_user_ns(init_cgroup_ns.user_ns);
+	get_user_ns(init_cgroup_ns.ns.user_ns);
 
 	mutex_lock(&cgroup_mutex);
 
@@ -6285,7 +6285,7 @@ static struct cgroup_namespace *alloc_cgroup_ns(void)
 void free_cgroup_ns(struct cgroup_namespace *ns)
 {
 	put_css_set(ns->root_cset);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	ns_free_inum(&ns->ns);
 	kfree(ns);
 }
@@ -6324,7 +6324,7 @@ struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
 		return new_ns;
 	}
 
-	new_ns->user_ns = get_user_ns(user_ns);
+	new_ns->ns.user_ns = get_user_ns(user_ns);
 	new_ns->root_cset = cset;
 
 	return new_ns;
@@ -6340,7 +6340,7 @@ static int cgroupns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	struct cgroup_namespace *cgroup_ns = to_cg_ns(ns);
 
 	if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN) ||
-	    !ns_capable(cgroup_ns->user_ns, CAP_SYS_ADMIN))
+	    !ns_capable(cgroup_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	/* Don't need to do anything if we are attaching to our own cgroupns. */
diff --git a/kernel/pid.c b/kernel/pid.c
index f66162f..c63f992d 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -78,7 +78,7 @@ struct pid_namespace init_pid_ns = {
 	.nr_hashed = PIDNS_HASH_ADDING,
 	.level = 0,
 	.child_reaper = &init_task,
-	.user_ns = &init_user_ns,
+	.ns.user_ns = &init_user_ns,
 	.ns.inum = PROC_PID_INIT_INO,
 #ifdef CONFIG_PID_NS
 	.ns.ops = &pidns_operations,
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index a65ba13..3529a03 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -113,7 +113,7 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
 	kref_init(&ns->kref);
 	ns->level = level;
 	ns->parent = get_pid_ns(parent_pid_ns);
-	ns->user_ns = get_user_ns(user_ns);
+	ns->ns.user_ns = get_user_ns(user_ns);
 	ns->nr_hashed = PIDNS_HASH_ADDING;
 	INIT_WORK(&ns->proc_work, proc_cleanup_work);
 
@@ -146,7 +146,7 @@ static void destroy_pid_namespace(struct pid_namespace *ns)
 	ns_free_inum(&ns->ns);
 	for (i = 0; i < PIDMAP_ENTRIES; i++)
 		kfree(ns->pidmap[i].page);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	call_rcu(&ns->rcu, delayed_free_pidns);
 }
 
@@ -276,7 +276,7 @@ static int pid_ns_ctl_handler(struct ctl_table *table, int write,
 	struct pid_namespace *pid_ns = task_active_pid_ns(current);
 	struct ctl_table tmp = *table;
 
-	if (write && !ns_capable(pid_ns->user_ns, CAP_SYS_ADMIN))
+	if (write && !ns_capable(pid_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	/*
@@ -362,7 +362,7 @@ static int pidns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	struct pid_namespace *active = task_active_pid_ns(current);
 	struct pid_namespace *ancestor, *new = to_pid_ns(ns);
 
-	if (!ns_capable(new->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(new->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/kernel/reboot.c b/kernel/reboot.c
index bd30a97..38f81a6 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -285,7 +285,7 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
 	int ret = 0;
 
 	/* We only trust the superuser with rebooting the system. */
-	if (!ns_capable(pid_ns->user_ns, CAP_SYS_BOOT))
+	if (!ns_capable(pid_ns->ns.user_ns, CAP_SYS_BOOT))
 		return -EPERM;
 
 	/* For safety, we require "magic" arguments. */
diff --git a/kernel/sys.c b/kernel/sys.c
index 89d5be4..9db5647 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1217,7 +1217,7 @@ SYSCALL_DEFINE2(sethostname, char __user *, name, int, len)
 	int errno;
 	char tmp[__NEW_UTS_LEN];
 
-	if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
+	if (!ns_capable(current->nsproxy->uts_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	if (len < 0 || len > __NEW_UTS_LEN)
@@ -1268,7 +1268,7 @@ SYSCALL_DEFINE2(setdomainname, char __user *, name, int, len)
 	int errno;
 	char tmp[__NEW_UTS_LEN];
 
-	if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
+	if (!ns_capable(current->nsproxy->uts_ns->ns.user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 	if (len < 0 || len > __NEW_UTS_LEN)
 		return -EINVAL;
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 9bafc21..a5bc78c 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -96,6 +96,10 @@ int create_user_ns(struct cred *new)
 	ns->ns.ops = &userns_operations;
 
 	atomic_set(&ns->count, 1);
+
+	/* ->ns.user_ns and ->parent are synonyms. */
+	BUILD_BUG_ON(&ns->ns.user_ns != &ns->parent);
+
 	/* Leave the new->user_ns reference with the new user namespace. */
 	ns->parent = parent_ns;
 	ns->level = parent_ns->level + 1;
diff --git a/kernel/utsname.c b/kernel/utsname.c
index 831ea71..40a119a 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -52,7 +52,7 @@ static struct uts_namespace *clone_uts_ns(struct user_namespace *user_ns,
 
 	down_read(&uts_sem);
 	memcpy(&ns->name, &old_ns->name, sizeof(ns->name));
-	ns->user_ns = get_user_ns(user_ns);
+	ns->ns.user_ns = get_user_ns(user_ns);
 	up_read(&uts_sem);
 	return ns;
 }
@@ -85,7 +85,7 @@ void free_uts_ns(struct kref *kref)
 	struct uts_namespace *ns;
 
 	ns = container_of(kref, struct uts_namespace, kref);
-	put_user_ns(ns->user_ns);
+	put_user_ns(ns->ns.user_ns);
 	ns_free_inum(&ns->ns);
 	kfree(ns);
 }
@@ -120,7 +120,7 @@ static int utsns_install(struct nsproxy *nsproxy, struct ns_common *new)
 {
 	struct uts_namespace *ns = to_uts_ns(new);
 
-	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(ns->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 82a116b..6c46a80 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -541,7 +541,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 	switch (args.cmd) {
 	case SET_VLAN_INGRESS_PRIORITY_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		vlan_dev_set_ingress_priority(dev,
 					      args.u.skb_priority,
@@ -551,7 +551,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_EGRESS_PRIORITY_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = vlan_dev_set_egress_priority(dev,
 						   args.u.skb_priority,
@@ -560,7 +560,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_FLAG_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = vlan_dev_change_flags(dev,
 					    args.vlan_qos ? args.u.flag : 0,
@@ -569,7 +569,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_NAME_TYPE_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		if ((args.u.name_type >= 0) &&
 		    (args.u.name_type < VLAN_NAME_TYPE_HIGHEST)) {
@@ -585,14 +585,14 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case ADD_VLAN_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = register_vlan_device(dev, args.u.VID);
 		break;
 
 	case DEL_VLAN_CMD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		unregister_vlan_dev(dev, NULL);
 		err = 0;
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index d99b200..2fdea4f 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -90,7 +90,7 @@ static int add_del_if(struct net_bridge *br, int ifindex, int isadd)
 	struct net_device *dev;
 	int ret;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	dev = __dev_get_by_index(net, ifindex);
@@ -182,28 +182,28 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	}
 
 	case BRCTL_SET_BRIDGE_FORWARD_DELAY:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_forward_delay(br, args[1]);
 		break;
 
 	case BRCTL_SET_BRIDGE_HELLO_TIME:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_hello_time(br, args[1]);
 		break;
 
 	case BRCTL_SET_BRIDGE_MAX_AGE:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_max_age(br, args[1]);
 		break;
 
 	case BRCTL_SET_AGEING_TIME:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		ret = br_set_ageing_time(br, args[1]);
@@ -243,7 +243,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	}
 
 	case BRCTL_SET_BRIDGE_STP_STATE:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		br_stp_set_enabled(br, args[1]);
@@ -251,7 +251,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 		break;
 
 	case BRCTL_SET_BRIDGE_PRIORITY:
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		br_stp_set_bridge_priority(br, args[1]);
@@ -260,7 +260,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 
 	case BRCTL_SET_PORT_PRIORITY:
 	{
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		spin_lock_bh(&br->lock);
@@ -274,7 +274,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 
 	case BRCTL_SET_PATH_COST:
 	{
-		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		spin_lock_bh(&br->lock);
@@ -337,7 +337,7 @@ static int old_deviceless(struct net *net, void __user *uarg)
 	{
 		char buf[IFNAMSIZ];
 
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(buf, (void __user *)args[1], IFNAMSIZ))
@@ -367,7 +367,7 @@ int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, void __user *uar
 	{
 		char buf[IFNAMSIZ];
 
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(buf, uarg, IFNAMSIZ))
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index beb4707..06d417e 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -36,7 +36,7 @@ static ssize_t store_bridge_parm(struct device *d,
 	unsigned long val;
 	int err;
 
-	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	val = simple_strtoul(buf, &endp, 0);
@@ -285,7 +285,7 @@ static ssize_t group_addr_store(struct device *d,
 	u8 new_addr[6];
 	int i;
 
-	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (sscanf(buf, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 1e04d4d..e7ceab1 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -241,7 +241,7 @@ static ssize_t brport_store(struct kobject *kobj,
 	char *endp;
 	unsigned long val;
 
-	if (!ns_capable(dev_net(p->dev)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(p->dev)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	val = simple_strtoul(buf, &endp, 0);
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 5a61f35..dab0cc2 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1496,7 +1496,7 @@ static int do_ebt_set_ctl(struct sock *sk,
 	int ret;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1519,7 +1519,7 @@ static int do_ebt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 	struct ebt_table *t;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&tmp, user, sizeof(tmp)))
@@ -2303,7 +2303,7 @@ static int compat_do_ebt_set_ctl(struct sock *sk,
 	int ret;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2327,7 +2327,7 @@ static int compat_do_ebt_get_ctl(struct sock *sk, int cmd,
 	struct ebt_table *t;
 	struct net *net = sock_net(sk);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* try real handler in case userland supplied needed padding */
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index b94b1d2..a705922 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -474,7 +474,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCGMIIPHY:
 	case SIOCGMIIREG:
 	case SIOCSIFNAME:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		dev_load(net, ifr.ifr_name);
 		rtnl_lock();
@@ -522,7 +522,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCBRADDIF:
 	case SIOCBRDELIF:
 	case SIOCSHWTSTAMP:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		/* fall through */
 	case SIOCBONDSLAVEINFOQUERY:
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f403481..27a3085 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2480,7 +2480,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_GTUNABLE:
 		break;
 	default:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 	}
 
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 510cd62..8df69fd 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -3169,7 +3169,7 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
 	}
 
 	/* Don't export sysctls to unprivileged users */
-	if (neigh_parms_net(p)->user_ns != &init_user_ns)
+	if (neigh_parms_net(p)->ns.user_ns != &init_user_ns)
 		t->neigh_vars[0].procname = NULL;
 
 	switch (neigh_parms_family(p)) {
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 7a0b616..eb20bc7 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -85,7 +85,7 @@ static ssize_t netdev_store(struct device *dev, struct device_attribute *attr,
 	unsigned long new;
 	int ret = -EINVAL;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	ret = kstrtoul(buf, 0, &new);
@@ -362,7 +362,7 @@ static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr,
 	size_t count = len;
 	ssize_t ret;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* ignore trailing newline */
@@ -1390,7 +1390,7 @@ static bool net_current_may_mount(void)
 {
 	struct net *net = current->nsproxy->net_ns;
 
-	return ns_capable(net->user_ns, CAP_SYS_ADMIN);
+	return ns_capable(net->ns.user_ns, CAP_SYS_ADMIN);
 }
 
 static void *net_grab_current_ns(void)
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2c2eb1b..3433f0c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -279,7 +279,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	atomic_set(&net->count, 1);
 	atomic_set(&net->passive, 1);
 	net->dev_base_seq = 1;
-	net->user_ns = user_ns;
+	net->ns.user_ns = user_ns;
 	idr_init(&net->netns_ids);
 	spin_lock_init(&net->nsid_lock);
 
@@ -444,7 +444,7 @@ static void cleanup_net(struct work_struct *work)
 	/* Finally it is safe to free my network namespace structure */
 	list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {
 		list_del_init(&net->exit_list);
-		put_user_ns(net->user_ns);
+		put_user_ns(net->ns.user_ns);
 		net_drop_ns(net);
 	}
 }
@@ -987,7 +987,7 @@ static int netns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 {
 	struct net *net = to_net_ns(ns);
 
-	if (!ns_capable(net->user_ns, CAP_SYS_ADMIN) ||
+	if (!ns_capable(net->ns.user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d69c464..ea7ba06 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1785,7 +1785,7 @@ static int do_setlink(const struct sk_buff *skb,
 			err = PTR_ERR(net);
 			goto errout;
 		}
-		if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) {
+		if (!netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN)) {
 			put_net(net);
 			err = -EPERM;
 			goto errout;
@@ -2430,7 +2430,7 @@ replay:
 			return PTR_ERR(dest_net);
 
 		err = -EPERM;
-		if (!netlink_ns_capable(skb, dest_net->user_ns, CAP_NET_ADMIN))
+		if (!netlink_ns_capable(skb, dest_net->ns.user_ns, CAP_NET_ADMIN))
 			goto out;
 
 		if (tb[IFLA_LINK_NETNSID]) {
@@ -2442,7 +2442,7 @@ replay:
 				goto out;
 			}
 			err = -EPERM;
-			if (!netlink_ns_capable(skb, link_net->user_ns, CAP_NET_ADMIN))
+			if (!netlink_ns_capable(skb, link_net->ns.user_ns, CAP_NET_ADMIN))
 				goto out;
 		}
 
diff --git a/net/core/scm.c b/net/core/scm.c
index 2696aef..1a2301a 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -54,7 +54,7 @@ static __inline__ int scm_check_creds(struct ucred *creds)
 		return -EINVAL;
 
 	if ((creds->pid == task_tgid_vnr(current) ||
-	     ns_capable(task_active_pid_ns(current)->user_ns, CAP_SYS_ADMIN)) &&
+	     ns_capable(task_active_pid_ns(current)->ns.user_ns, CAP_SYS_ADMIN)) &&
 	    ((uid_eq(uid, cred->uid)   || uid_eq(uid, cred->euid) ||
 	      uid_eq(uid, cred->suid)) || ns_capable(cred->user_ns, CAP_SETUID)) &&
 	    ((gid_eq(gid, cred->gid)   || gid_eq(gid, cred->egid) ||
diff --git a/net/core/sock.c b/net/core/sock.c
index 08bf97e..321ca3c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -191,7 +191,7 @@ EXPORT_SYMBOL(sk_capable);
  */
 bool sk_net_capable(const struct sock *sk, int cap)
 {
-	return sk_ns_capable(sk, sock_net(sk)->user_ns, cap);
+	return sk_ns_capable(sk, sock_net(sk)->ns.user_ns, cap);
 }
 EXPORT_SYMBOL(sk_net_capable);
 
@@ -534,7 +534,7 @@ static int sock_setbindtodevice(struct sock *sk, char __user *optval,
 
 	/* Sorry... */
 	ret = -EPERM;
-	if (!ns_capable(net->user_ns, CAP_NET_RAW))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		goto out;
 
 	ret = -EINVAL;
@@ -778,7 +778,7 @@ set_rcvbuf:
 
 	case SO_PRIORITY:
 		if ((val >= 0 && val <= 6) ||
-		    ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		    ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			sk->sk_priority = val;
 		else
 			ret = -EPERM;
@@ -945,7 +945,7 @@ set_rcvbuf:
 			clear_bit(SOCK_PASSSEC, &sock->flags);
 		break;
 	case SO_MARK:
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			ret = -EPERM;
 		else
 			sk->sk_mark = val;
@@ -1921,7 +1921,7 @@ int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
 
 	switch (cmsg->cmsg_type) {
 	case SO_MARK:
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
 			return -EINVAL;
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 6b10573..7151b43 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -303,7 +303,7 @@ static int sock_diag_bind(struct net *net, int group)
 
 int sock_diag_destroy(struct sock *sk, int err)
 {
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (!sk->sk_prot->diag_destroy)
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 0df2aa6..6f6749d 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -441,7 +441,7 @@ static __net_init int sysctl_core_net_init(struct net *net)
 		tbl[0].data = &net->core.sysctl_somaxconn;
 
 		/* Don't export any sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns) {
+		if (net->ns.user_ns != &init_user_ns) {
 			tbl[0].procname = NULL;
 		}
 	}
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index 30d875d..9d002f4 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -512,7 +512,7 @@ static int __net_init lowpan_frags_ns_sysctl_register(struct net *net)
 		table[2].data = &ieee802154_lowpan->frags.timeout;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			table[0].procname = NULL;
 	}
 
diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
index e0bd013..6353184 100644
--- a/net/ieee802154/socket.c
+++ b/net/ieee802154/socket.c
@@ -895,8 +895,8 @@ static int dgram_setsockopt(struct sock *sk, int level, int optname,
 		ro->want_ack = !!val;
 		break;
 	case WPAN_SECURITY:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN) &&
-		    !ns_capable(net->user_ns, CAP_NET_RAW)) {
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN) &&
+		    !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 			err = -EPERM;
 			break;
 		}
@@ -919,8 +919,8 @@ static int dgram_setsockopt(struct sock *sk, int level, int optname,
 		}
 		break;
 	case WPAN_SECURITY_LEVEL:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN) &&
-		    !ns_capable(net->user_ns, CAP_NET_RAW)) {
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN) &&
+		    !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 			err = -EPERM;
 			break;
 		}
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index d39e9e4..bec3946 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -309,7 +309,7 @@ lookup_protocol:
 
 	err = -EPERM;
 	if (sock->type == SOCK_RAW && !kern &&
-	    !ns_capable(net->user_ns, CAP_NET_RAW))
+	    !ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		goto out_rcu_unlock;
 
 	sock->ops = answer->ops;
@@ -475,7 +475,7 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	snum = ntohs(addr->sin_port);
 	err = -EACCES;
 	if (snum && snum < PROT_SOCK &&
-	    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
+	    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE))
 		goto out;
 
 	/*      We keep a pair of addresses. rcv_saddr is the one
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 89a8cac4..22517fb 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1140,7 +1140,7 @@ int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCDARP:
 	case SIOCSARP:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 	case SIOCGARP:
 		err = copy_from_user(&r, arg, sizeof(struct arpreq));
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index e333bc8..fc8f1f2 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -961,7 +961,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 
 	case SIOCSIFFLAGS:
 		ret = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto out;
 		break;
 	case SIOCSIFADDR:	/* Set interface address (and family) */
@@ -969,7 +969,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCSIFDSTADDR:	/* Set the destination address */
 	case SIOCSIFNETMASK: 	/* Set the netmask for the interface */
 		ret = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto out;
 		ret = -EINVAL;
 		if (sin->sin_family != AF_INET)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index ef2ebeb..fbc7311 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -581,7 +581,7 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCADDRT:		/* Add a route */
 	case SIOCDELRT:		/* Delete a route */
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(&rt, arg, sizeof(rt)))
diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index 4d158ff..dda262e 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -407,7 +407,7 @@ int ip_options_compile(struct net *net,
 					optptr[2] += 8;
 					break;
 				default:
-					if (!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) {
+					if (!skb && !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 						pp_ptr = optptr + 3;
 						goto error;
 					}
@@ -442,7 +442,7 @@ int ip_options_compile(struct net *net,
 				opt->router_alert = optptr - iph;
 			break;
 		case IPOPT_CIPSO:
-			if ((!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) || opt->cipso) {
+			if ((!skb && !ns_capable(net->ns.user_ns, CAP_NET_RAW)) || opt->cipso) {
 				pp_ptr = optptr;
 				goto error;
 			}
@@ -455,7 +455,7 @@ int ip_options_compile(struct net *net,
 		case IPOPT_SEC:
 		case IPOPT_SID:
 		default:
-			if (!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!skb && !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				pp_ptr = optptr;
 				goto error;
 			}
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 71a52f4d..474af75 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1138,14 +1138,14 @@ mc_msf_out:
 	case IP_IPSEC_POLICY:
 	case IP_XFRM_POLICY:
 		err = -EPERM;
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = xfrm_user_policy(sk, optname, optval, optlen);
 		break;
 
 	case IP_TRANSPARENT:
-		if (!!val && !ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) &&
-		    !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
+		if (!!val && !ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_RAW) &&
+		    !ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN)) {
 			err = -EPERM;
 			break;
 		}
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index d8f5e0a..4ddc520 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -765,7 +765,7 @@ int ip_tunnel_ioctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 		if (p->iph.ttl)
 			p->iph.frag_off |= htons(IP_DF);
@@ -821,7 +821,7 @@ int ip_tunnel_ioctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == itn->fb_tunnel_dev) {
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 5ad48ec..df292fa 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1272,7 +1272,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval,
 	}
 	if (optname != MRT_INIT) {
 		if (sk != rcu_access_pointer(mrt->mroute_sk) &&
-		    !ns_capable(net->user_ns, CAP_NET_ADMIN)) {
+		    !ns_capable(net->ns.user_ns, CAP_NET_ADMIN)) {
 			ret = -EACCES;
 			goto out_unlock;
 		}
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 2033f92..e123093 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -1300,7 +1300,7 @@ static int compat_do_arpt_set_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1434,7 +1434,7 @@ static int compat_do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1455,7 +1455,7 @@ static int do_arpt_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1478,7 +1478,7 @@ static int do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 54906e0..b29238a 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1554,7 +1554,7 @@ compat_do_ipt_set_ctl(struct sock *sk,	int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1656,7 +1656,7 @@ compat_do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1678,7 +1678,7 @@ do_ipt_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1702,7 +1702,7 @@ do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index a1f2830..ddb0003 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2787,7 +2787,7 @@ static __net_init int sysctl_route_net_init(struct net *net)
 			goto err_dup;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			tbl[0].procname = NULL;
 	}
 	tbl[0].extra1 = net;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5c7ed14..467b6cc 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2273,7 +2273,7 @@ EXPORT_SYMBOL(tcp_disconnect);
 
 static inline bool tcp_can_repair_sock(const struct sock *sk)
 {
-	return ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) &&
+	return ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN) &&
 		((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_ESTABLISHED));
 }
 
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 882caa4..385d0f4 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -354,7 +354,7 @@ int tcp_set_congestion_control(struct sock *sk, const char *name)
 	if (!ca)
 		err = -ENOENT;
 	else if (!((ca->flags & TCP_CONG_NON_RESTRICTED) ||
-		   ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)))
+		   ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN)))
 		err = -EPERM;
 	else if (!try_module_get(ca->owner))
 		err = -EBUSY;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 47f837a..9aaabf8 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2781,7 +2781,7 @@ int addrconf_add_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
@@ -2800,7 +2800,7 @@ int addrconf_del_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index bfa86f0..1491cbd 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -161,7 +161,7 @@ lookup_protocol:
 
 	err = -EPERM;
 	if (sock->type == SOCK_RAW && !kern &&
-	    !ns_capable(net->user_ns, CAP_NET_RAW))
+	    !ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		goto out_rcu_unlock;
 
 	sock->ops = answer->ops;
@@ -286,7 +286,7 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		return -EINVAL;
 
 	snum = ntohs(addr->sin6_port);
-	if (snum && snum < PROT_SOCK && !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
+	if (snum && snum < PROT_SOCK && !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
 	lock_sock(sk);
diff --git a/net/ipv6/anycast.c b/net/ipv6/anycast.c
index 514ac25..e168ca3 100644
--- a/net/ipv6/anycast.c
+++ b/net/ipv6/anycast.c
@@ -62,7 +62,7 @@ int ipv6_sock_ac_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
 
 	ASSERT_RTNL();
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (ipv6_addr_is_multicast(addr))
 		return -EINVAL;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 37874e2..92204ba 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -837,7 +837,7 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -857,7 +857,7 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -882,7 +882,7 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
+			if (!ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index b912f0d..c07e37e 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -569,7 +569,7 @@ int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen)
 		rcu_read_unlock_bh();
 
 		if (freq.flr_share == IPV6_FL_S_NONE &&
-		    ns_capable(net->user_ns, CAP_NET_ADMIN)) {
+		    ns_capable(net->ns.user_ns, CAP_NET_ADMIN)) {
 			fl = fl_lookup(net, freq.flr_label);
 			if (fl) {
 				err = fl6_renew(fl, freq.flr_linger, freq.flr_expires);
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 776d145..7f23d34 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -852,7 +852,7 @@ static int ip6gre_tunnel_ioctl(struct net_device *dev,
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -901,7 +901,7 @@ static int ip6gre_tunnel_ioctl(struct net_device *dev,
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == ign->fb_tunnel_dev) {
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7b0481e..fa9443c 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1484,7 +1484,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = -EFAULT;
 		if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
@@ -1520,7 +1520,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		break;
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 
 		if (dev == ip6n->fb_tnl_dev) {
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d90a11f..ece8758 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -743,7 +743,7 @@ vti6_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		err = -EFAULT;
 		if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
@@ -775,7 +775,7 @@ vti6_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		break;
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 
 		if (dev == ip6n->fb_tnl_dev) {
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 487ef3b..87a6a20 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1669,7 +1669,7 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
 		return -ENOENT;
 
 	if (optname != MRT6_INIT) {
-		if (sk != mrt->mroute6_sk && !ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (sk != mrt->mroute6_sk && !ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EACCES;
 	}
 
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index a9895e1..d5dc2aa 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -365,8 +365,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		break;
 
 	case IPV6_TRANSPARENT:
-		if (valbool && !ns_capable(net->user_ns, CAP_NET_ADMIN) &&
-		    !ns_capable(net->user_ns, CAP_NET_RAW)) {
+		if (valbool && !ns_capable(net->ns.user_ns, CAP_NET_ADMIN) &&
+		    !ns_capable(net->ns.user_ns, CAP_NET_RAW)) {
 			retv = -EPERM;
 			break;
 		}
@@ -404,7 +404,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 
 		/* hop-by-hop / destination options are privileged option */
 		retv = -EPERM;
-		if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
+		if (optname != IPV6_RTHDR && !ns_capable(net->ns.user_ns, CAP_NET_RAW))
 			break;
 
 		opt = rcu_dereference_protected(np->opt,
@@ -785,7 +785,7 @@ done:
 	case IPV6_IPSEC_POLICY:
 	case IPV6_XFRM_POLICY:
 		retv = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			break;
 		retv = xfrm_user_policy(sk, optname, optval, optlen);
 		break;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 63e06c3..0f92561 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1573,7 +1573,7 @@ compat_do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1675,7 +1675,7 @@ compat_do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1697,7 +1697,7 @@ do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1721,7 +1721,7 @@ do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 2160d5d..4efbd91 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -645,7 +645,7 @@ static int __net_init ip6_frags_ns_sysctl_register(struct net *net)
 		table[2].data = &net->ipv6.frags.timeout;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			table[0].procname = NULL;
 	}
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 520b788..938a7aa 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2468,7 +2468,7 @@ int ipv6_route_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCADDRT:		/* Add a route */
 	case SIOCDELRT:		/* Delete a route */
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		err = copy_from_user(&rtmsg, arg,
 				     sizeof(struct in6_rtmsg));
@@ -3594,7 +3594,7 @@ struct ctl_table * __net_init ipv6_route_sysctl_init(struct net *net)
 		table[9].data = &net->ipv6.sysctl.ip6_rt_gc_min_interval;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			table[0].procname = NULL;
 	}
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0619ac7..196f476 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1181,7 +1181,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -1229,7 +1229,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == sitn->fb_tunnel_dev) {
@@ -1260,7 +1260,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCDELPRL:
 	case SIOCCHGPRL:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 		err = -EINVAL;
 		if (dev == sitn->fb_tunnel_dev)
@@ -1287,7 +1287,7 @@ ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCCHG6RD:
 	case SIOCDEL6RD:
 		err = -EPERM;
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
diff --git a/net/key/af_key.c b/net/key/af_key.c
index f9c9ecb..47183e9 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -141,7 +141,7 @@ static int pfkey_create(struct net *net, struct socket *sock, int protocol,
 	struct sock *sk;
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 8ae3ed9..41c3da3 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -160,7 +160,7 @@ static int llc_ui_create(struct net *net, struct socket *sock, int protocol,
 	struct sock *sk;
 	int rc = -ESOCKTNOSUPPORT;
 
-	if (!ns_capable(net->user_ns, CAP_NET_RAW))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		return -EPERM;
 
 	if (!net_eq(net, &init_net))
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index a748b0c..46745a7 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1901,7 +1901,7 @@ ip_set_sockfn_get(struct sock *sk, int optval, void __user *user, int *len)
 	struct net *net = sock_net(sk);
 	struct ip_set_net *inst = ip_set_pernet(net);
 
-	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (optval != SO_IP_SET)
 		return -EBADF;
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index c3c809b..a02b3b3 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2360,7 +2360,7 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
 	BUILD_BUG_ON(sizeof(arg) > 255);
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_SET_MAX)
@@ -2678,7 +2678,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 
 	BUG_ON(!net);
 	BUILD_BUG_ON(sizeof(arg) > 255);
-	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_GET_MAX)
@@ -3906,7 +3906,7 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs)
 			return -ENOMEM;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			tbl[0].procname = NULL;
 	} else
 		tbl = vs_vars;
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index cccf4d6..23a3ec3 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -564,7 +564,7 @@ static int __net_init __ip_vs_lblc_init(struct net *net)
 			return -ENOMEM;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			ipvs->lblc_ctl_table[0].procname = NULL;
 
 	} else
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 796d70e..704ad5c 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -750,7 +750,7 @@ static int __net_init __ip_vs_lblcr_init(struct net *net)
 			return -ENOMEM;
 
 		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
+		if (net->ns.user_ns != &init_user_ns)
 			ipvs->lblcr_ctl_table[0].procname = NULL;
 	} else
 		ipvs->lblcr_ctl_table = vs_vars_table;
diff --git a/net/netfilter/nf_conntrack_acct.c b/net/netfilter/nf_conntrack_acct.c
index 45da11a..9303901 100644
--- a/net/netfilter/nf_conntrack_acct.c
+++ b/net/netfilter/nf_conntrack_acct.c
@@ -74,7 +74,7 @@ static int nf_conntrack_acct_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_acct;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.acct_sysctl_header = register_net_sysctl(net, "net/netfilter",
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index d28011b..22411e5 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -358,7 +358,7 @@ static int nf_conntrack_event_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_events;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.event_sysctl_header =
diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index 9e36931..c1e6242 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -618,8 +618,8 @@ static int exp_proc_init(struct net *net)
 	if (!proc)
 		return -ENOMEM;
 
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 	if (uid_valid(root_uid) && gid_valid(root_gid))
 		proc_set_user(proc, root_uid, root_gid);
 #endif /* CONFIG_NF_CONNTRACK_PROCFS */
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 196cb39..4cff85b 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -67,7 +67,7 @@ static int nf_conntrack_helper_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_auto_assign_helper;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.helper_sysctl_header =
diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c
index 399a38f..766dbee 100644
--- a/net/netfilter/nf_conntrack_proto_dccp.c
+++ b/net/netfilter/nf_conntrack_proto_dccp.c
@@ -841,7 +841,7 @@ static int dccp_kmemdup_sysctl_table(struct net *net, struct nf_proto_net *pn,
 	pn->ctl_table[7].data = &dn->dccp_loose;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		pn->ctl_table[0].procname = NULL;
 #endif
 	return 0;
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index c026c47..8796e36 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -397,8 +397,8 @@ static int nf_conntrack_standalone_init_proc(struct net *net)
 	if (!pde)
 		goto out_nf_conntrack;
 
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 	if (uid_valid(root_uid) && gid_valid(root_gid))
 		proc_set_user(pde, root_uid, root_gid);
 
@@ -512,7 +512,7 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
 	table[4].data = &net->ct.sysctl_log_invalid;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.sysctl_header = register_net_sysctl(net, "net/netfilter", table);
diff --git a/net/netfilter/nf_conntrack_timestamp.c b/net/netfilter/nf_conntrack_timestamp.c
index 7a394df..43bd240 100644
--- a/net/netfilter/nf_conntrack_timestamp.c
+++ b/net/netfilter/nf_conntrack_timestamp.c
@@ -52,7 +52,7 @@ static int nf_conntrack_tstamp_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_tstamp;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->ct.tstamp_sysctl_header = register_net_sysctl(net,	"net/netfilter",
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 11f81c8..5428b8e 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -1072,8 +1072,8 @@ static int __net_init nfnl_log_net_init(struct net *net)
 	if (!proc)
 		return -ENOMEM;
 
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 	if (uid_valid(root_uid) && gid_valid(root_gid))
 		proc_set_user(proc, root_uid, root_gid);
 #endif
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 2675d58..d840aa6 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1493,8 +1493,8 @@ int xt_proto_init(struct net *net, u_int8_t af)
 
 
 #ifdef CONFIG_PROC_FS
-	root_uid = make_kuid(net->user_ns, 0);
-	root_gid = make_kgid(net->user_ns, 0);
+	root_uid = make_kuid(net->ns.user_ns, 0);
+	root_gid = make_kgid(net->ns.user_ns, 0);
 
 	strlcpy(buf, xt_prefix[af], sizeof(buf));
 	strlcat(buf, FORMAT_TABLES, sizeof(buf));
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 627f898..070e24d 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -828,14 +828,14 @@ EXPORT_SYMBOL(netlink_capable);
  */
 bool netlink_net_capable(const struct sk_buff *skb, int cap)
 {
-	return netlink_ns_capable(skb, sock_net(skb->sk)->user_ns, cap);
+	return netlink_ns_capable(skb, sock_net(skb->sk)->ns.user_ns, cap);
 }
 EXPORT_SYMBOL(netlink_net_capable);
 
 static inline int netlink_allowed(const struct socket *sock, unsigned int flag)
 {
 	return (nl_table[sock->sk->sk_protocol].flags & flag) ||
-		ns_capable(sock_net(sock->sk)->user_ns, CAP_NET_ADMIN);
+		ns_capable(sock_net(sock->sk)->ns.user_ns, CAP_NET_ADMIN);
 }
 
 static void
@@ -1323,7 +1323,7 @@ static void do_one_broadcast(struct sock *sk,
 		if (!peernet_has_id(sock_net(sk), p->net))
 			return;
 
-		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
+		if (!file_ns_capable(sk->sk_socket->file, p->net->ns.user_ns,
 				     CAP_NET_BROADCAST))
 			return;
 	}
@@ -1586,7 +1586,7 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 		err = 0;
 		break;
 	case NETLINK_LISTEN_ALL_NSID:
-		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_BROADCAST))
+		if (!ns_capable(sock_net(sk)->ns.user_ns, CAP_NET_BROADCAST))
 			return -EPERM;
 
 		if (val)
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index a09132a..831e863 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -561,7 +561,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
 		return -EPERM;
 
 	if ((ops->flags & GENL_UNS_ADMIN_PERM) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) {
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 9f0983f..8172443 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3208,7 +3208,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
 	__be16 proto = (__force __be16)protocol; /* weird, but documented */
 	int err;
 
-	if (!ns_capable(net->user_ns, CAP_NET_RAW))
+	if (!ns_capable(net->ns.user_ns, CAP_NET_RAW))
 		return -EPERM;
 	if (sock->type != SOCK_DGRAM && sock->type != SOCK_RAW &&
 	    sock->type != SOCK_PACKET)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a75864d..249a340 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -140,7 +140,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
 	int tp_created = 0;
 
 	if ((n->nlmsg_type != RTM_GETTFILTER) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 replay:
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ddf047d..783f495 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1123,7 +1123,7 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n)
 	int err;
 
 	if ((n->nlmsg_type != RTM_GETQDISC) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	err = nlmsg_parse(n, sizeof(*tcm), tca, TCA_MAX, NULL);
@@ -1190,7 +1190,7 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n)
 	struct Qdisc *q, *p;
 	int err;
 
-	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	if (!netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 replay:
@@ -1539,7 +1539,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n)
 	int err;
 
 	if ((n->nlmsg_type != RTM_GETTCLASS) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	    !netlink_ns_capable(skb, net->ns.user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	err = nlmsg_parse(n, sizeof(*tcm), tca, TCA_MAX, NULL);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 67154b8..bb65b08 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -361,7 +361,7 @@ static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
 	}
 
 	if (snum && snum < PROT_SOCK &&
-	    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
+	    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
 	/* See if the address matches any of the addresses we may have
@@ -1153,7 +1153,7 @@ static int __sctp_connect(struct sock *sk,
 				 * be permitted to open new associations.
 				 */
 				if (ep->base.bind_addr.port < PROT_SOCK &&
-				    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) {
+				    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE)) {
 					err = -EACCES;
 					goto out_free;
 				}
@@ -1815,7 +1815,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
 			 * associations.
 			 */
 			if (ep->base.bind_addr.port < PROT_SOCK &&
-			    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) {
+			    !ns_capable(net->ns.user_ns, CAP_NET_BIND_SERVICE)) {
 				err = -EACCES;
 				goto out_unlock;
 			}
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index ed98c1f..cb46bc9 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -42,11 +42,11 @@ static int net_ctl_permissions(struct ctl_table_header *head,
 			       struct ctl_table *table)
 {
 	struct net *net = container_of(head->set, struct net, sysctls);
-	kuid_t root_uid = make_kuid(net->user_ns, 0);
-	kgid_t root_gid = make_kgid(net->user_ns, 0);
+	kuid_t root_uid = make_kuid(net->ns.user_ns, 0);
+	kgid_t root_gid = make_kgid(net->ns.user_ns, 0);
 
 	/* Allow network administrator to have same access as root. */
-	if (ns_capable(net->user_ns, CAP_NET_ADMIN) ||
+	if (ns_capable(net->ns.user_ns, CAP_NET_ADMIN) ||
 	    uid_eq(root_uid, current_euid())) {
 		int mode = (table->mode >> 6) & 7;
 		return (mode << 6) | (mode << 3) | mode;
diff --git a/net/unix/sysctl_net_unix.c b/net/unix/sysctl_net_unix.c
index b3d5150..b5aec8a 100644
--- a/net/unix/sysctl_net_unix.c
+++ b/net/unix/sysctl_net_unix.c
@@ -35,7 +35,7 @@ int __net_init unix_sysctl_register(struct net *net)
 		goto err_alloc;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	table[0].data = &net->unx.sysctl_max_dgram_qlen;
diff --git a/net/xfrm/xfrm_sysctl.c b/net/xfrm/xfrm_sysctl.c
index 05a6e3d..8d4b41f 100644
--- a/net/xfrm/xfrm_sysctl.c
+++ b/net/xfrm/xfrm_sysctl.c
@@ -55,7 +55,7 @@ int __net_init xfrm_sysctl_init(struct net *net)
 	table[3].data = &net->xfrm.sysctl_acq_expires;
 
 	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
+	if (net->ns.user_ns != &init_user_ns)
 		table[0].procname = NULL;
 
 	net->xfrm.sysctl_hdr = register_net_sysctl(net, "net/core", table);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH 2/5] kernel: add a helper to get an owning user namespace for a namespace
  2016-07-15  2:12       ` Andrey Vagin
@ 2016-07-15  2:12           ` Andrey Vagin
  -1 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-15  2:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: James Bottomley, Andrey Vagin, Serge Hallyn, linux-api,
	containers, Alexander Viro, criu, Eric W. Biederman,
	linux-fsdevel, Michael Kerrisk (man-pages)

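Add ns_get_owner(), a helper that returns the user namespace owning a
given namespace. It returns -EPERM if the owning user namespace is
outside of the calling process's current user namespace.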
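
The visibility rule above amounts to walking up the parent chain from the
owning user namespace until the caller's user namespace is met. The
following is only a userspace sketch of that walk, not kernel code: the
`toy_userns` type and the `toy_owner_visible()` name are hypothetical and
model nothing but the parent link.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct user_namespace: only the parent link. */
struct toy_userns {
	const struct toy_userns *parent;	/* NULL for the initial namespace */
};

/*
 * Return true if 'owner' is 'current_ns' itself or one of its descendants,
 * i.e. the case in which the helper would hand out the owner rather than
 * fail with -EPERM.
 */
static bool toy_owner_visible(const struct toy_userns *owner,
			      const struct toy_userns *current_ns)
{
	const struct toy_userns *p;

	for (p = owner; p != NULL; p = p->parent) {
		if (p == current_ns)
			return true;
	}
	return false;	/* reached the root without meeting current_ns */
}
```

With a chain root -> child -> grandchild, a caller in `child` sees an owner
of `grandchild` or `child`, but not `root` or a sibling of `child`.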

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 include/linux/user_namespace.h |  7 +++++++
 kernel/user_namespace.c        | 24 ++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index a941b44..e416b76 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -76,6 +76,8 @@ extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t,
 extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *);
 extern int proc_setgroups_show(struct seq_file *m, void *v);
 extern bool userns_may_setgroups(const struct user_namespace *ns);
+
+struct ns_common *ns_get_owner(struct ns_common *ns);
 #else
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -104,6 +106,11 @@ static inline bool userns_may_setgroups(const struct user_namespace *ns)
 {
 	return true;
 }
+
+static inline struct ns_common *ns_get_owner(struct ns_common *ns)
+{
+	return ERR_PTR(-ENOENT);
+}
 #endif
 
 #endif /* _LINUX_USER_H */
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index a5bc78c..6382e5e 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -994,6 +994,30 @@ static int userns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	return commit_creds(cred);
 }
 
+struct ns_common *ns_get_owner(struct ns_common *ns)
+{
+	const struct cred *cred = current_cred();
+	struct user_namespace *user_ns, *p;
+
+	user_ns = p = ns->user_ns;
+	if (user_ns == NULL) { /* ns is init_user_ns */
+		/* Unprivileged user should not know that it's init_user_ns. */
+		if (capable(CAP_SYS_ADMIN))
+			return ERR_PTR(-ENOENT);
+		return ERR_PTR(-EPERM);
+	}
+
+	for (;;) {
+		if (p == cred->user_ns)
+			break;
+		if (p == &init_user_ns)
+			return ERR_PTR(-EPERM);
+		p = p->parent;
+	}
+
+	return &get_user_ns(user_ns)->ns;
+}
+
 const struct proc_ns_operations userns_operations = {
 	.name		= "user",
 	.type		= CLONE_NEWUSER,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH 3/5] nsfs: add ioctl to get an owning user namespace for ns file descriptor
  2016-07-15  2:12       ` Andrey Vagin
@ 2016-07-15  2:12           ` Andrey Vagin
  -1 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-15  2:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: James Bottomley, Andrey Vagin, Serge Hallyn, linux-api,
	containers, Alexander Viro, criu, Eric W. Biederman,
	linux-fsdevel, Michael Kerrisk (man-pages)

Each namespace has an owning user namespace, but there is currently no
way to discover these relationships.

Understanding namespace relationships makes it possible to answer the
question: what capability does process X have to perform operations on
a resource governed by namespace Y?

After a long discussion, Eric W. Biederman proposed using ioctls for
this purpose.

The NS_GET_USERNS ioctl returns a file descriptor that refers to the
owning user namespace. It fails with EPERM if the owning user namespace
is outside of the caller's current user namespace.
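
From userspace, the ioctl would be issued on an open namespace file such
as /proc/<pid>/ns/net. The sketch below is an illustration only: the
`owning_userns_fd()` helper is hypothetical, and the request numbers are
copied from the nsfs.h header added by this patch so that the example
builds without that header installed.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Values copied from the include/uapi/linux/nsfs.h added by this patch. */
#ifndef NSIO
#define NSIO		0xb7
#endif
#ifndef NS_GET_USERNS
#define NS_GET_USERNS	_IO(NSIO, 0x1)
#endif

/*
 * Open the namespace file at 'ns_path' and ask for its owning user
 * namespace. Returns a new file descriptor on success, or -errno on
 * failure (for example -EPERM, or -ENOTTY on a kernel without the ioctl).
 */
static int owning_userns_fd(const char *ns_path)
{
	int ns_fd, userns_fd, err;

	ns_fd = open(ns_path, O_RDONLY);
	if (ns_fd < 0)
		return -errno;

	userns_fd = ioctl(ns_fd, NS_GET_USERNS);
	err = errno;		/* save before close() can overwrite it */
	close(ns_fd);

	return userns_fd < 0 ? -err : userns_fd;
}
```

The caller owns the returned descriptor and should close() it when done.
Comparing the inode numbers that fstat() reports for two such descriptors
is one way to tell whether they refer to the same user namespace.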

Link: https://lkml.org/lkml/2016/7/6/158
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/nsfs.c                 | 94 ++++++++++++++++++++++++++++++++++++++++-------
 include/uapi/linux/nsfs.h |  9 +++++
 2 files changed, 90 insertions(+), 13 deletions(-)
 create mode 100644 include/uapi/linux/nsfs.h

diff --git a/fs/nsfs.c b/fs/nsfs.c
index 8f20d60..1e5d2d0 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -5,11 +5,16 @@
 #include <linux/magic.h>
 #include <linux/ktime.h>
 #include <linux/seq_file.h>
+#include <linux/user_namespace.h>
+#include <linux/nsfs.h>
 
 static struct vfsmount *nsfs_mnt;
 
+static long ns_ioctl(struct file *filp, unsigned int ioctl,
+			unsigned long arg);
 static const struct file_operations ns_file_operations = {
 	.llseek		= no_llseek,
+	.unlocked_ioctl = ns_ioctl,
 };
 
 static char *ns_dname(struct dentry *dentry, char *buffer, int buflen)
@@ -44,22 +49,14 @@ static void nsfs_evict(struct inode *inode)
 	ns->ops->put(ns);
 }
 
-void *ns_get_path(struct path *path, struct task_struct *task,
-			const struct proc_ns_operations *ns_ops)
+static void *__ns_get_path(struct path *path, struct ns_common *ns)
 {
 	struct vfsmount *mnt = mntget(nsfs_mnt);
 	struct qstr qname = { .name = "", };
 	struct dentry *dentry;
 	struct inode *inode;
-	struct ns_common *ns;
 	unsigned long d;
 
-again:
-	ns = ns_ops->get(task);
-	if (!ns) {
-		mntput(mnt);
-		return ERR_PTR(-ENOENT);
-	}
 	rcu_read_lock();
 	d = atomic_long_read(&ns->stashed);
 	if (!d)
@@ -68,7 +65,7 @@ again:
 	if (!lockref_get_not_dead(&dentry->d_lockref))
 		goto slow;
 	rcu_read_unlock();
-	ns_ops->put(ns);
+	ns->ops->put(ns);
 got_it:
 	path->mnt = mnt;
 	path->dentry = dentry;
@@ -77,7 +74,7 @@ slow:
 	rcu_read_unlock();
 	inode = new_inode_pseudo(mnt->mnt_sb);
 	if (!inode) {
-		ns_ops->put(ns);
+		ns->ops->put(ns);
 		mntput(mnt);
 		return ERR_PTR(-ENOMEM);
 	}
@@ -95,17 +92,88 @@ slow:
 		return ERR_PTR(-ENOMEM);
 	}
 	d_instantiate(dentry, inode);
-	dentry->d_fsdata = (void *)ns_ops;
+	dentry->d_fsdata = (void *)ns->ops;
 	d = atomic_long_cmpxchg(&ns->stashed, 0, (unsigned long)dentry);
 	if (d) {
 		d_delete(dentry);	/* make sure ->d_prune() does nothing */
 		dput(dentry);
 		cpu_relax();
-		goto again;
+		return ERR_PTR(-EAGAIN);
 	}
 	goto got_it;
 }
 
+void *ns_get_path(struct path *path, struct task_struct *task,
+			const struct proc_ns_operations *ns_ops)
+{
+	struct ns_common *ns;
+	void *ret;
+
+again:
+	ns = ns_ops->get(task);
+	if (!ns)
+		return ERR_PTR(-ENOENT);
+
+	ret = __ns_get_path(path, ns);
+	if (IS_ERR(ret) && PTR_ERR(ret) == -EAGAIN)
+		goto again;
+	return ret;
+}
+
+int open_related_ns(struct ns_common *ns,
+		   struct ns_common *(*get_ns)(struct ns_common *ns))
+{
+	struct path path = {};
+	struct file *f;
+	void *err;
+	int fd;
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0)
+		return fd;
+
+	while (1) {
+		struct ns_common *parent;
+
+		parent = get_ns(ns);
+		if (IS_ERR(parent)) {
+			put_unused_fd(fd);
+			return PTR_ERR(parent);
+		}
+
+		err = __ns_get_path(&path, parent);
+		if (IS_ERR(err) && PTR_ERR(err) == -EAGAIN)
+			continue;
+		break;
+	}
+	if (IS_ERR(err)) {
+		put_unused_fd(fd);
+		return PTR_ERR(err);
+	}
+
+	f = dentry_open(&path, O_RDONLY, current_cred());
+	path_put(&path);
+	if (IS_ERR(f)) {
+		put_unused_fd(fd);
+		fd = PTR_ERR(f);
+	} else
+		fd_install(fd, f);
+	return fd;
+}
+
+static long ns_ioctl(struct file *filp, unsigned int ioctl,
+			unsigned long arg)
+{
+	struct ns_common *ns = get_proc_ns(file_inode(filp));
+
+	switch (ioctl) {
+	case NS_GET_USERNS:
+		return open_related_ns(ns, ns_get_owner);
+	default:
+		return -ENOTTY;
+	}
+}
+
 int ns_get_name(char *buf, size_t size, struct task_struct *task,
 			const struct proc_ns_operations *ns_ops)
 {
diff --git a/include/uapi/linux/nsfs.h b/include/uapi/linux/nsfs.h
new file mode 100644
index 0000000..7a09ede
--- /dev/null
+++ b/include/uapi/linux/nsfs.h
@@ -0,0 +1,9 @@
+#ifndef __LINUX_NSFS_H
+#define __LINUX_NSFS_H
+
+#include <linux/ioctl.h>
+
+#define NSIO	0xb7
+#define NS_GET_USERNS	_IO(NSIO, 0x1)
+
+#endif /* __LINUX_NSFS_H */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH 4/5] nsfs: add ioctl to get a parent namespace
       [not found]       ` <1468548742-32136-1-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
  2016-07-15  2:12           ` Andrey Vagin
  2016-07-15  2:12           ` Andrey Vagin
@ 2016-07-15  2:12         ` Andrey Vagin
  2016-07-15  2:12         ` [PATCH 5/5] tools/testing: add a test to check nsfs ioctl-s Andrey Vagin
                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-15  2:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: James Bottomley, Andrey Vagin, Serge Hallyn, linux-api,
	containers, Alexander Viro, criu, Eric W. Biederman,
	linux-fsdevel, Michael Kerrisk (man-pages)

Pid and user namespaces are hierarchical. There is currently no way to
discover their parent-child relationships.

In the future we will use this interface to dump and restore nested
namespaces.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/nsfs.c                 |  4 ++++
 include/linux/proc_ns.h   |  1 +
 include/uapi/linux/nsfs.h |  1 +
 kernel/pid_namespace.c    | 26 ++++++++++++++++++++++++++
 kernel/user_namespace.c   |  1 +
 5 files changed, 33 insertions(+)
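
As an illustration of the error paths in the diff below, the following
userspace sketch walks up a hierarchy with NS_GET_PARENT and classifies
the errno that ends the walk. The enum and both helper names are
hypothetical; the errno meanings are read off this patch (ENOENT and
EPERM from pidns_get_parent(), EINVAL and ENOTTY from ns_ioctl()) and
should be treated as a sketch rather than a specification.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Values copied from include/uapi/linux/nsfs.h as extended by this patch. */
#define NSIO		0xb7
#define NS_GET_PARENT	_IO(NSIO, 0x2)

/* Hypothetical classification of a failed NS_GET_PARENT call. */
enum ns_parent_status {
	NS_PARENT_NONE,			/* ENOENT: initial namespace */
	NS_PARENT_HIDDEN,		/* EPERM: parent outside caller's view */
	NS_PARENT_NOT_HIERARCHICAL,	/* EINVAL: ops->get_parent is unset */
	NS_PARENT_NO_IOCTL,		/* ENOTTY: ioctl not recognised */
	NS_PARENT_OTHER,		/* anything else, e.g. EBADF */
};

enum ns_parent_status ns_parent_status(int err)
{
	switch (err) {
	case ENOENT:
		return NS_PARENT_NONE;
	case EPERM:
		return NS_PARENT_HIDDEN;
	case EINVAL:
		return NS_PARENT_NOT_HIERARCHICAL;
	case ENOTTY:
		return NS_PARENT_NO_IOCTL;
	default:
		return NS_PARENT_OTHER;
	}
}

/*
 * Hypothetical helper: count the ancestors of nsfd that the caller can
 * open. nsfd itself is left open; every fd obtained here is closed.
 * *last_err receives the errno that stopped the walk.
 */
int ns_count_ancestors(int nsfd, int *last_err)
{
	int depth = 0;
	int cur = nsfd;

	for (;;) {
		int parent = ioctl(cur, NS_GET_PARENT);
		int err = errno;	/* capture before close() can change it */

		if (cur != nsfd)
			close(cur);	/* only close fds this function opened */
		if (parent < 0) {
			*last_err = err;
			return depth;
		}
		cur = parent;
		depth++;
	}
}
```

Passing the result in *last_err to ns_parent_status() tells the caller
whether the walk reached the top of the hierarchy or was cut short.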

diff --git a/fs/nsfs.c b/fs/nsfs.c
index 1e5d2d0..b607a42 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -169,6 +169,10 @@ static long ns_ioctl(struct file *filp, unsigned int ioctl,
 	switch (ioctl) {
 	case NS_GET_USERNS:
 		return open_related_ns(ns, ns_get_owner);
+	case NS_GET_PARENT:
+		if (!ns->ops->get_parent)
+			return -EINVAL;
+		return open_related_ns(ns, ns->ops->get_parent);
 	default:
 		return -ENOTTY;
 	}
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index de0e771..1c9f720 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -18,6 +18,7 @@ struct proc_ns_operations {
 	struct ns_common *(*get)(struct task_struct *task);
 	void (*put)(struct ns_common *ns);
 	int (*install)(struct nsproxy *nsproxy, struct ns_common *ns);
+	struct ns_common *(*get_parent)(struct ns_common *ns);
 };
 
 extern const struct proc_ns_operations netns_operations;
diff --git a/include/uapi/linux/nsfs.h b/include/uapi/linux/nsfs.h
index 7a09ede..88098ea 100644
--- a/include/uapi/linux/nsfs.h
+++ b/include/uapi/linux/nsfs.h
@@ -5,5 +5,6 @@
 
 #define NSIO	0xb7
 #define NS_GET_USERNS	_IO(NSIO, 0x1)
+#define NS_GET_PARENT	_IO(NSIO, 0x2)
 
 #endif /* __LINUX_NSFS_H */
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 3529a03..a63adfb 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -388,12 +388,38 @@ static int pidns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 	return 0;
 }
 
+static struct ns_common *pidns_get_parent(struct ns_common *ns)
+{
+	struct pid_namespace *active = task_active_pid_ns(current);
+	struct pid_namespace *pid_ns, *p;
+
+	pid_ns = to_pid_ns(ns);
+	if (pid_ns == &init_pid_ns) {
+		if (capable(CAP_SYS_ADMIN))
+			return ERR_PTR(-ENOENT);
+		return ERR_PTR(-EPERM);
+	}
+
+	pid_ns = p = pid_ns->parent;
+
+	for (;;) {
+		if (p == active)
+			break;
+		if (p == &init_pid_ns)
+			return ERR_PTR(-EPERM);
+		p = p->parent;
+	}
+
+	return &get_pid_ns(pid_ns)->ns;
+}
+
 const struct proc_ns_operations pidns_operations = {
 	.name		= "pid",
 	.type		= CLONE_NEWPID,
 	.get		= pidns_get,
 	.put		= pidns_put,
 	.install	= pidns_install,
+	.get_parent	= pidns_get_parent,
 };
 
 static __init int pid_namespaces_init(void)
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 6382e5e..d6ba0b8 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -1024,6 +1024,7 @@ const struct proc_ns_operations userns_operations = {
 	.get		= userns_get,
 	.put		= userns_put,
 	.install	= userns_install,
+	.get_parent	= ns_get_owner,
 };
 
 static __init int user_namespaces_init(void)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH 5/5] tools/testing: add a test to check nsfs ioctl-s
       [not found]       ` <1468548742-32136-1-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
                           ` (2 preceding siblings ...)
  2016-07-15  2:12         ` [PATCH 4/5] nsfs: add ioctl to get a parent namespace Andrey Vagin
@ 2016-07-15  2:12         ` Andrey Vagin
  2016-07-16  8:21           ` kbuild test robot
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-15  2:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: James Bottomley, Andrey Vagin, Serge Hallyn, linux-api,
	containers, Alexander Viro, criu, Eric W. Biederman,
	linux-fsdevel, Michael Kerrisk (man-pages)

There are two new ioctl-s:
One ioctl for the user namespace that owns a file descriptor.
One ioctl for the parent namespace of a namespace file descriptor.

The test checks that these ioctl-s work and that they handle the case
where a target namespace is outside of the current process's namespace.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 tools/testing/selftests/Makefile      |  1 +
 tools/testing/selftests/nsfs/Makefile | 12 +++++
 tools/testing/selftests/nsfs/owner.c  | 91 +++++++++++++++++++++++++++++++++++
 tools/testing/selftests/nsfs/pidns.c  | 74 ++++++++++++++++++++++++++++
 4 files changed, 178 insertions(+)
 create mode 100644 tools/testing/selftests/nsfs/Makefile
 create mode 100644 tools/testing/selftests/nsfs/owner.c
 create mode 100644 tools/testing/selftests/nsfs/pidns.c
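
The tests below decide whether two descriptors name the same namespace
by comparing st_ino. A slightly more defensive variant, sketched here
under the assumption that comparing the device number as well is
desirable, checks both st_dev and st_ino. The helper names same_ns_fd()
and same_ns_path() are hypothetical and are not used by the tests.

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Returns 1 if both descriptors refer to the same inode on the same
 * device, 0 if they differ, and -1 if fstat() fails (errno is set).
 */
int same_ns_fd(int fd1, int fd2)
{
	struct stat st1, st2;

	if (fstat(fd1, &st1) || fstat(fd2, &st2))
		return -1;
	return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}

/* Same comparison for two paths, e.g. two /proc/<pid>/ns/<type> links. */
int same_ns_path(const char *p1, const char *p2)
{
	struct stat st1, st2;

	if (stat(p1, &st1) || stat(p2, &st2))
		return -1;
	return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}
```

With helpers like these, the st1/st2 comparisons in owner.c and pidns.c
could each be expressed as a single call.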

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index ff9e5f2..f770dba 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -15,6 +15,7 @@ TARGETS += memory-hotplug
 TARGETS += mount
 TARGETS += mqueue
 TARGETS += net
+TARGETS += nsfs
 TARGETS += powerpc
 TARGETS += pstore
 TARGETS += ptrace
diff --git a/tools/testing/selftests/nsfs/Makefile b/tools/testing/selftests/nsfs/Makefile
new file mode 100644
index 0000000..2306054
--- /dev/null
+++ b/tools/testing/selftests/nsfs/Makefile
@@ -0,0 +1,12 @@
+TEST_PROGS := owner pidns
+
+CFLAGS := -Wall -Werror
+
+all: owner pidns
+owner: owner.c
+pidns: pidns.c
+
+clean:
+	$(RM) owner pidns
+
+include ../lib.mk
diff --git a/tools/testing/selftests/nsfs/owner.c b/tools/testing/selftests/nsfs/owner.c
new file mode 100644
index 0000000..c97aa50
--- /dev/null
+++ b/tools/testing/selftests/nsfs/owner.c
@@ -0,0 +1,91 @@
+#define _GNU_SOURCE
+#include <sched.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+
+#define NSIO    0xb7
+#define NS_GET_USERNS   _IO(NSIO, 0x1)
+
+#define pr_err(fmt, ...) \
+		({ \
+			fprintf(stderr, "%s:%d:" fmt ": %m\n", \
+				__func__, __LINE__, ##__VA_ARGS__); \
+			1; \
+		})
+
+int main(int argc, char *argv[])
+{
+	int pfd[2], ns, uns, init_uns;
+	struct stat st1, st2;
+	char path[128];
+	pid_t pid;
+	char c;
+
+	if (pipe(pfd))
+		return 1;
+
+	pid = fork();
+	if (pid < 0)
+		return pr_err("fork");
+	if (pid == 0) {
+		prctl(PR_SET_PDEATHSIG, SIGKILL);
+		if (unshare(CLONE_NEWUTS | CLONE_NEWUSER))
+			return pr_err("unshare");
+		close(pfd[0]);
+		close(pfd[1]);
+		while (1)
+			sleep(1);
+		return 0;
+	}
+	close(pfd[1]);
+	if (read(pfd[0], &c, 1) != 0)
+		return pr_err("Unable to read from pipe");
+	close(pfd[0]);
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/uts", pid);
+	ns = open(path, O_RDONLY);
+	if (ns < 0)
+		return pr_err("Unable to open %s", path);
+
+	uns = ioctl(ns, NS_GET_USERNS);
+	if (uns < 0)
+		return pr_err("Unable to get an owning user namespace");
+
+	if (fstat(uns, &st1))
+		return pr_err("fstat");
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/user", pid);
+	if (stat(path, &st2))
+		return pr_err("stat");
+
+	if (st1.st_ino != st2.st_ino)
+		return pr_err("NS_GET_USERNS returned a wrong namespace");
+
+	init_uns = ioctl(uns, NS_GET_USERNS);
+	if (init_uns < 0)
+		return pr_err("Unable to get an owning user namespace");
+
+	if (ioctl(init_uns, NS_GET_USERNS) >= 0 || errno != ENOENT)
+		return pr_err("Didn't get ENOENT");
+
+	if (unshare(CLONE_NEWUSER))
+		return pr_err("unshare");
+
+	if (ioctl(ns, NS_GET_USERNS) >= 0 || errno != EPERM)
+		return pr_err("Didn't get EPERM");
+	if (ioctl(init_uns, NS_GET_USERNS) >= 0 || errno != EPERM)
+		return pr_err("Didn't get EPERM");
+
+	kill(pid, SIGKILL);
+	wait(NULL);
+	return 0;
+}
diff --git a/tools/testing/selftests/nsfs/pidns.c b/tools/testing/selftests/nsfs/pidns.c
new file mode 100644
index 0000000..99b1131
--- /dev/null
+++ b/tools/testing/selftests/nsfs/pidns.c
@@ -0,0 +1,74 @@
+#define _GNU_SOURCE
+#include <sched.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+
+#define pr_err(fmt, ...) \
+		({ \
+			fprintf(stderr, "%s:%d:" fmt ": %m\n", \
+				__func__, __LINE__, ##__VA_ARGS__); \
+			1; \
+		})
+
+#define NSIO	0xb7
+#define NS_GET_USERNS   _IO(NSIO, 0x1)
+#define NS_GET_PARENT   _IO(NSIO, 0x2)
+
+#define __stack_aligned__	__attribute__((aligned(16)))
+struct cr_clone_arg {
+	char stack[128] __stack_aligned__;
+	char stack_ptr[0];
+};
+
+static int child(void *args)
+{
+	prctl(PR_SET_PDEATHSIG, SIGKILL);
+	while (1)
+		sleep(1);
+	exit(0);
+}
+
+int main(int argc, char *argv[])
+{
+	char path[] = "/proc/0123456789/ns/pid";
+	struct cr_clone_arg ca;
+	struct stat st1, st2;
+	int ns, pns;
+	pid_t pid;
+
+	pid = clone(child, ca.stack_ptr, CLONE_NEWPID | SIGCLD, NULL);
+	if (pid < 0)
+		return pr_err("clone");
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/pid", pid);
+	ns = open(path, O_RDONLY);
+	if (ns < 0)
+		return pr_err("Unable to open %s", path);
+
+	pns = ioctl(ns, NS_GET_PARENT);
+	if (pns < 0)
+		return pr_err("Unable to get a parent pidns");
+
+	if (stat("/proc/self/ns/pid", &st2))
+		return pr_err("Unable to stat /proc/self/ns/pid");
+	if (fstat(pns, &st1))
+		return pr_err("Unable to stat the parent pidns");
+	if (st1.st_ino != st2.st_ino)
+		return pr_err("NS_GET_PARENT returned a wrong namespace");
+
+	if (ioctl(pns, NS_GET_PARENT) >= 0 || errno != ENOENT)
+		return pr_err("Didn't get ENOENT");
+
+	kill(pid, SIGKILL);
+	wait(NULL);
+	return 0;
+}
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* Re: [PATCH 1/5] namespaces: move user_ns into ns_common
  2016-07-14 18:20 ` [PATCH 1/5] namespaces: move user_ns into ns_common Andrey Vagin
@ 2016-07-15 12:21       ` kbuild test robot
  0 siblings, 0 replies; 142+ messages in thread
From: kbuild test robot @ 2016-07-15 12:21 UTC (permalink / raw)
  Cc: Andrey Vagin, linux-api, containers, linux-kernel, criu,
	kbuild-all, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 2219 bytes --]

Hi,

[auto build test ERROR on net/master]
[also build test ERROR on v4.7-rc7]
[cannot apply to next-20160715]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Andrey-Vagin/namespaces-move-user_ns-into-ns_common/20160715-181039
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or32-linux-gcc (GCC) 4.5.1-or32-1.0rc1
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=openrisc 

All error/warnings (new ones prefixed by >>):

>> kernel/user.c:53:2: error: unknown field 'ns' specified in initializer
>> kernel/user.c:53:2: warning: missing braces around initializer
   kernel/user.c:53:2: warning: (near initialization for 'init_user_ns.<anonymous>')
>> kernel/user.c:53:2: error: incompatible types when initializing type 'struct user_namespace *' using type 'enum <anonymous>'

vim +53 kernel/user.c

f76d207a Eric W. Biederman 2012-08-30  47  			.count = 4294967295U,
f76d207a Eric W. Biederman 2012-08-30  48  		},
f76d207a Eric W. Biederman 2012-08-30  49  	},
c61a2810 Eric W. Biederman 2012-12-28  50  	.count = ATOMIC_INIT(3),
783291e6 Eric W. Biederman 2011-11-17  51  	.owner = GLOBAL_ROOT_UID,
783291e6 Eric W. Biederman 2011-11-17  52  	.group = GLOBAL_ROOT_GID,
435d5f4b Al Viro           2014-10-31 @53  	.ns.inum = PROC_USER_INIT_INO,
33c42940 Al Viro           2014-11-01  54  #ifdef CONFIG_USER_NS
33c42940 Al Viro           2014-11-01  55  	.ns.ops = &userns_operations,
33c42940 Al Viro           2014-11-01  56  #endif

:::::: The code at line 53 was first introduced by commit
:::::: 435d5f4bb2ccba3b791d9ef61d2590e30b8e806e common object embedded into various struct ....ns

:::::: TO: Al Viro <viro@zeniv.linux.org.uk>
:::::: CC: Al Viro <viro@zeniv.linux.org.uk>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 7572 bytes --]

[-- Attachment #3: Type: text/plain, Size: 205 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 1/5] namespaces: move user_ns into ns_common
  2016-07-15  2:12       ` Andrey Vagin
@ 2016-07-16  8:21           ` kbuild test robot
  -1 siblings, 0 replies; 142+ messages in thread
From: kbuild test robot @ 2016-07-16  8:21 UTC (permalink / raw)
  Cc: James Bottomley, Andrey Vagin, Serge Hallyn, linux-api,
	containers, linux-kernel, Alexander Viro, criu, kbuild-all,
	linux-fsdevel, Michael Kerrisk (man-pages), Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 2517 bytes --]

Hi,

[auto build test WARNING on net/master]
[also build test WARNING on v4.7-rc7]
[cannot apply to next-20160715]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Andrey-Vagin/namespaces-move-user_ns-into-ns_common/20160716-093057
config: openrisc-allyesconfig (attached as .config)
compiler: or32-linux-gcc (GCC) 4.5.1-or32-1.0rc1
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=openrisc 

All warnings (new ones prefixed by >>):

   kernel/user.c:53:2: error: unknown field 'ns' specified in initializer
   kernel/user.c:53:2: warning: missing braces around initializer
   kernel/user.c:53:2: warning: (near initialization for 'init_user_ns.<anonymous>')
   kernel/user.c:53:2: error: incompatible types when initializing type 'struct user_namespace *' using type 'enum <anonymous>'
   kernel/user.c:55:2: error: unknown field 'ns' specified in initializer
>> kernel/user.c:55:2: warning: initialization makes integer from pointer without a cast

vim +55 kernel/user.c

f76d207a Eric W. Biederman 2012-08-30  47  			.count = 4294967295U,
f76d207a Eric W. Biederman 2012-08-30  48  		},
f76d207a Eric W. Biederman 2012-08-30  49  	},
c61a2810 Eric W. Biederman 2012-12-28  50  	.count = ATOMIC_INIT(3),
783291e6 Eric W. Biederman 2011-11-17  51  	.owner = GLOBAL_ROOT_UID,
783291e6 Eric W. Biederman 2011-11-17  52  	.group = GLOBAL_ROOT_GID,
435d5f4b Al Viro           2014-10-31 @53  	.ns.inum = PROC_USER_INIT_INO,
33c42940 Al Viro           2014-11-01  54  #ifdef CONFIG_USER_NS
33c42940 Al Viro           2014-11-01 @55  	.ns.ops = &userns_operations,
33c42940 Al Viro           2014-11-01  56  #endif
9cc46516 Eric W. Biederman 2014-12-02  57  	.flags = USERNS_INIT_FLAGS,
6bd364d8 Xiao Guangrong    2013-12-13  58  #ifdef CONFIG_PERSISTENT_KEYRINGS

:::::: The code at line 55 was first introduced by commit
:::::: 33c429405a2c8d9e42afb9fee88a63cfb2de1e98 copy address of proc_ns_ops into ns_common

:::::: TO: Al Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
:::::: CC: Al Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 37947 bytes --]

[-- Attachment #3: Type: text/plain, Size: 205 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
       [not found] ` <1468520419-28220-1-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
                     ` (5 preceding siblings ...)
  2016-07-14 22:02   ` [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces Andrey Vagin
@ 2016-07-21 14:41   ` Michael Kerrisk (man-pages)
  2016-07-23 21:14     ` W. Trevor King
  2016-08-01 18:20     ` Alban Crequy
  8 siblings, 0 replies; 142+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-07-21 14:41 UTC (permalink / raw)
  To: Andrey Vagin, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: James Bottomley, Serge Hallyn, linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Alexander Viro, criu-GEFAQzZX7r8dnm+yROfE0A,
	mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Eric W. Biederman

Hi Andrey,

On 07/14/2016 08:20 PM, Andrey Vagin wrote:
> Each namespace has an owning user namespace, and currently there is no way
> to discover these relationships.
>
> Pid and user namespaces are hierarchical. There is no way to discover
> parent-child relationships either.
>
> Why might we want to know the relationships between namespaces?
>
> One use would be visualization, in order to understand the running system.
> Another would be to answer the question: what capability does process X have to
> perform operations on a resource governed by namespace Y?
>
> One more use case (which is usually called abnormal) is checkpoint/restart.
> In CRIU we are going to dump and restore nested namespaces.
>
> There was a discussion [1] about which interface to choose to determine
> relationships between namespaces.
>
> Eric suggested adding two ioctl-s [2]:
>> Grumble, Grumble.  I think this may actually be a case for creating ioctls
>> for these two cases.  Now that random nsfs file descriptors are bind
>> mountable the original reason for using proc files is not as pressing.
>>
>> One ioctl for the user namespace that owns a file descriptor.
>> One ioctl for the parent namespace of a namespace file descriptor.
>
> Here is an implementation of these ioctl-s.

Could you add here an explanation of the API in detail: what do these FDs
refer to, and how do you use them to solve the use case? And could you add
that info to the commit messages, please?

Thanks,

Michael


> [1] https://lkml.org/lkml/2016/7/6/158
> [2] https://lkml.org/lkml/2016/7/9/101
>
> Cc: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> Cc: James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: "W. Trevor King" <wking-vJI2gpByivqcqzYg7KEe8g@public.gmane.org>
> Cc: Alexander Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
> Cc: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
>
> --
> 2.5.5
>
>


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-21 14:41   ` Michael Kerrisk (man-pages)
  (?)
@ 2016-07-21 21:06       ` Andrew Vagin
  -1 siblings, 0 replies; 142+ messages in thread
From: Andrew Vagin @ 2016-07-21 21:06 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: James Bottomley, Andrey Vagin, Serge Hallyn,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A,
	Eric W. Biederman, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	Alexander Viro

[-- Attachment #1: Type: text/plain, Size: 2156 bytes --]

On Thu, Jul 21, 2016 at 04:41:12PM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Andrey,
> 
> On 07/14/2016 08:20 PM, Andrey Vagin wrote:

<snip>

> 
> Could you add here an explanation of the API in detail: what do these FDs
> refer to, and how do you use them to solve the use case? And could you add
> that info to the commit messages, please?

Hi Michael,

A patch for man-pages is attached. It adds the following text to
namespaces(7).

Since Linux 4.X, the following ioctl(2) calls are supported for
namespace file descriptors.  The correct syntax is:

      fd = ioctl(ns_fd, ioctl_type);

where ioctl_type is one of the following:

NS_GET_USERNS
      Returns a file descriptor that refers to an owning user
      namespace.

NS_GET_PARENT
      Returns a file descriptor that refers to a parent namespace.
      This ioctl(2) can be used for pid and user namespaces. For user
      namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
      meaning.

In addition to generic ioctl(2) errors, the following specific ones can
occur:

EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.

EPERM  The requested namespace is outside of the current namespace
      scope.

ENOENT ns_fd refers to the init namespace.

Thanks,
Andrew

> 
> Thanks,
> 
> Michael
> 
> 
> > [1] https://lkml.org/lkml/2016/7/6/158
> > [2] https://lkml.org/lkml/2016/7/9/101
> > 
> > Cc: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> > Cc: James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
> > Cc: "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> > Cc: "W. Trevor King" <wking-vJI2gpByivqcqzYg7KEe8g@public.gmane.org>
> > Cc: Alexander Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
> > Cc: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> > 
> > --
> > 2.5.5
> > 
> > 
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

[-- Attachment #2: 0001-namespace.7-descirbe-NS_GET_USERNS-and-NS_GET-PARENT.patch --]
[-- Type: text/plain, Size: 1796 bytes --]

From 4b9194026f901c2247150bb3038c41658700f6dd Mon Sep 17 00:00:00 2001
From: Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
Date: Thu, 21 Jul 2016 13:58:06 -0700
Subject: [PATCH] namespaces.7: describe NS_GET_USERNS and NS_GET_PARENT ioctl-s

Signed-off-by: Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
---
 man7/namespaces.7 | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/man7/namespaces.7 b/man7/namespaces.7
index 98ed3e5..207e4a5 100644
--- a/man7/namespaces.7
+++ b/man7/namespaces.7
@@ -149,6 +149,49 @@ even if all processes in the namespace terminate.
 The file descriptor can be passed to
 .BR setns (2).
 
+Since Linux 4.X, the following
+.BR ioctl (2)
+calls are supported for namespace file descriptors.
+The correct syntax is:
+.PP
+.RS
+.nf
+.IB fd " = ioctl(" ns_fd ", " ioctl_type ");"
+.fi
+.RE
+.PP
+where
+.I ioctl_type
+is one of the following:
+.TP
+.B NS_GET_USERNS
+Returns a file descriptor that refers to an owning user namespace.
+.TP
+.B NS_GET_PARENT
+Returns a file descriptor that refers to a parent namespace. This
+.BR ioctl (2)
+can be used for pid and user namespaces. For user namespaces,
+.B NS_GET_PARENT
+and
+.B NS_GET_USERNS
+have the same meaning.
+.PP
+In addition to generic
+.BR ioctl (2)
+errors, the following specific ones can occur:
+.PP
+.TP
+.B EINVAL
+.B NS_GET_PARENT
+was called for a nonhierarchical namespace.
+.TP
+.B EPERM
+The requested namespace is outside of the current namespace scope.
+.TP
+.B ENOENT
+.IB ns_fd
+refers to the init namespace.
+.PP
 In Linux 3.7 and earlier, these files were visible as hard links.
 Since Linux 3.8, they appear as symbolic links.
 If two processes are in the same namespace, then the inode numbers of their
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
       [not found]       ` <20160721210650.GA10989-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
@ 2016-07-22  6:48         ` Michael Kerrisk (man-pages)
       [not found]           ` <1515f5f2-5a49-fcab-61f4-8b627d3ba3e2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-07-22  6:48 UTC (permalink / raw)
  To: Andrew Vagin
  Cc: James Bottomley, Andrey Vagin, Serge Hallyn,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Alexander Viro,
	criu-GEFAQzZX7r8dnm+yROfE0A, mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Eric W. Biederman

Hi Andrey,

On 07/21/2016 11:06 PM, Andrew Vagin wrote:
> On Thu, Jul 21, 2016 at 04:41:12PM +0200, Michael Kerrisk (man-pages) wrote:
>> Hi Andrey,
>>
>> On 07/14/2016 08:20 PM, Andrey Vagin wrote:
>
> <snip>
>
>>
>> Could you add here an explanation of the API in detail: what do these FDs
>> refer to, and how do you use them to solve the use case? And could you add
>> that info to the commit messages, please?
>
> Hi Michael,
>
> A patch for man-pages is attached. It adds the following text to
> namespaces(7).
>
> Since  Linux 4.X, the following ioctl(2) calls are supported for names‐
> pace file descriptors.  The correct syntax is:
>
>       fd = ioctl(ns_fd, ioctl_type);
>
> where ioctl_type is one of the following:
>
> NS_GET_USERNS
>       Returns a file descriptor that refers to an owning  user  names‐
>       pace.
>
> NS_GET_PARENT
>       Returns  a  file  descriptor  that refers to a parent namespace.
>       This ioctl(2) can be used for pid and user namespaces. For  user
>       namespaces,  NS_GET_PARENT and NS_GET_USERNS have the same mean‐
>       ing.
>
> In addition to generic ioctl(2) errors, the following specific ones can
> occur:
>
> EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
>
> EPERM  The  requested  namespace  is  outside  of the current namespace
>       scope.
>
> ENOENT ns_fd refers to the init namespace.

Thanks for this. But still part of the question remains unanswered.
How do we (in user-space) use the file descriptors to answer any of
the questions that this patch series was designed to solve? (This
info should be in the commit message and the man-pages patch.)

Thanks,

Michael


>>> [1] https://lkml.org/lkml/2016/7/6/158
>>> [2] https://lkml.org/lkml/2016/7/9/101
>>>
>>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>>> Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
>>> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
>>> Cc: "W. Trevor King" <wking@tremily.us>
>>> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
>>> Cc: Serge Hallyn <serge.hallyn@canonical.com>
>>>
>>> --
>>> 2.5.5
>>>
>>>
>>
>>
>> --
>> Michael Kerrisk
>> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
>> Linux/UNIX System Programming Training: http://man7.org/training/


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-22  6:48         ` Michael Kerrisk (man-pages)
@ 2016-07-22 18:25               ` Andrey Vagin
  0 siblings, 0 replies; 142+ messages in thread
From: Andrey Vagin @ 2016-07-22 18:25 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Serge Hallyn, Andrew Vagin, criu-GEFAQzZX7r8dnm+yROfE0A,
	Linux API, Linux Containers, LKML, James Bottomley,
	Alexander Viro, linux-fsdevel, Eric W. Biederman

On Thu, Jul 21, 2016 at 11:48 PM, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> Hi Andrey,
>
>
> On 07/21/2016 11:06 PM, Andrew Vagin wrote:
>>
>> On Thu, Jul 21, 2016 at 04:41:12PM +0200, Michael Kerrisk (man-pages)
>> wrote:
>>>
>>> Hi Andrey,
>>>
>>> On 07/14/2016 08:20 PM, Andrey Vagin wrote:
>>
>>
>> <snip>
>>
>>>
>>> Could you add here an explanation of the API in detail: what do these FDs
>>> refer to, and how do you use them to solve the use case? And could you add
>>> that info to the commit messages, please?
>>
>>
>> Hi Michael,
>>
>> A patch for man-pages is attached. It adds the following text to
>> namespaces(7).
>>
>> Since Linux 4.X, the following ioctl(2) calls are supported for
>> namespace file descriptors.  The correct syntax is:
>>
>>       fd = ioctl(ns_fd, ioctl_type);
>>
>> where ioctl_type is one of the following:
>>
>> NS_GET_USERNS
>>       Returns a file descriptor that refers to an owning user
>>       namespace.
>>
>> NS_GET_PARENT
>>       Returns a file descriptor that refers to a parent namespace.
>>       This ioctl(2) can be used for pid and user namespaces. For user
>>       namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
>>       meaning.
>>
>> In addition to generic ioctl(2) errors, the following specific ones can
>> occur:
>>
>> EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
>>
>> EPERM  The requested namespace is outside of the current namespace
>>       scope.
>>
>> ENOENT ns_fd refers to the init namespace.
>
>
> Thanks for this. But still part of the question remains unanswered.
> How do we (in user-space) use the file descriptors to answer any of
> the questions that this patch series was designed to solve? (This
> info should be in the commit message and the man-pages patch.)

I'm sorry, but I am not sure that I understand what you are asking.

Here are the original questions:
Someone else then asked me a question that led me to wonder about
generally introspecting on the parental relationships between user
namespaces and the association of other namespace types with user
namespaces. One use would be visualization, in order to understand the
running system. Another would be to answer the question I already
mentioned: what capability does process X have to perform operations
on a resource governed by namespace Y?

Here is an example which shows how we can get the inode number of the
owning user namespace by using these ioctls.

$ ls -l /proc/13929/ns/pid
lrwxrwxrwx 1 root root 0 Jul 22 21:03 /proc/13929/ns/pid -> 'pid:[4026532228]'

$ ./nsowner /proc/13929/ns/pid
user:[4026532227]

The owning user namespace for pid:[4026532228] is user:[4026532227].

The nsowner tool is compiled from this code:

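#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/nsfs.h> /* NS_GET_USERNS; header name assumed from the uapi layout */

int main(int argc, char *argv[])
{
        char buf[128], path[] = "/proc/self/fd/0123456789";
        int ns, uns, ret;

        if (argc < 2)
                return 1;

        /* Open the namespace file, e.g. /proc/PID/ns/pid. */
        ns = open(argv[1], O_RDONLY);
        if (ns < 0)
                return 1;

        /* Ask for a descriptor that refers to the owning user namespace. */
        uns = ioctl(ns, NS_GET_USERNS);
        if (uns < 0)
                return 1;

        /* The /proc/self/fd link of that descriptor reads "user:[inode]". */
        snprintf(path, sizeof(path), "/proc/self/fd/%d", uns);
        ret = readlink(path, buf, sizeof(buf) - 1);
        if (ret < 0)
                return 1;
        buf[ret] = 0;

        printf("%s\n", buf);

        return 0;
}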

Does this example answer the original question? If not, could you
elaborate on what you expect to see here?
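
If a caller wants the inode number itself rather than the link text,
the "type:[inode]" string can be split in user space. A minimal sketch,
not part of the patch series; parse_ns_link is a hypothetical helper
name and the 16-byte type buffer is an assumption:

```c
#include <stdio.h>

/*
 * Split a namespace link such as "user:[4026532227]" into its type
 * and inode number. Returns 0 on success, -1 if the text does not
 * have exactly the "type:[inode]" shape.
 */
static int parse_ns_link(const char *link, char type[16],
			 unsigned long long *ino)
{
	int end = -1;

	/* %n records how far we got; it stays -1 if "]" was missing. */
	if (sscanf(link, "%15[^:]:[%llu]%n", type, ino, &end) != 2)
		return -1;
	if (end < 0 || link[end] != '\0')
		return -1;
	return 0;
}
```

Fed the nsowner output above, this would give type "user" and inode
4026532227, which can then be compared with other namespace links.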

And I wrote one more example which shows all relationships between
namespaces. It enumerates all processes in a system, collects all
namespaces and determines the parent and owning namespaces for each of
them, then it constructs a namespace tree and shows it.

Here is the code: https://gist.github.com/avagin/db805f95e15ffb0af7e559dbb8de4418

Here is an example of output for my test system:
[root@fc24 nsfs]# ./nstree
user:[4026531837]
 \__  mnt:[4026532203]
 \__  ipc:[4026531839]
 \__  user:[4026532224]
     \__  user:[4026532226]
         \__  user:[4026532227]
             \__  pid:[4026532228]
     \__  pid:[4026532225]
         \__  pid:[4026532228]
 \__  user:[4026532221]
     \__  pid:[4026532222]
     \__  user:[4026532223]
 \__  mnt:[4026532211]
 \__  uts:[4026531838]
 \__  cgroup:[4026531835]
 \__  pid:[4026531836]
     \__  pid:[4026532225]
         \__  pid:[4026532228]
     \__  pid:[4026532222]
 \__  mnt:[4026531857]
 \__  mnt:[4026531840]
 \__  net:[4026531957]
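
The rendering step of such a tool can be sketched independently of the
ioctls. This is a minimal sketch, assuming each namespace has already
been reduced to a (type, inode, parent index) record; struct ns_node
and ns_render are hypothetical names, not taken from the gist, and a
real tool would track the owner and the parent relationship separately
(which is why pid:[4026532228] shows up twice above):

```c
#include <stdio.h>

/* One namespace, with the array index of the namespace it hangs under. */
struct ns_node {
	const char *type;		/* "user", "pid", "mnt", ... */
	unsigned long long ino;		/* inode number from the ns link */
	int parent;			/* index into the array, -1 for the root */
};

/*
 * Write nodes[idx] and everything below it into buf, one line per
 * namespace. Returns the number of bytes written; a line that does
 * not fit is dropped. The input is assumed to be acyclic.
 */
static size_t ns_render(const struct ns_node *nodes, int n, int idx,
			int depth, char *buf, size_t len)
{
	size_t used;
	int i, w;

	if (depth == 0)
		w = snprintf(buf, len, "%s:[%llu]\n",
			     nodes[idx].type, nodes[idx].ino);
	else
		w = snprintf(buf, len, "%*s\\__  %s:[%llu]\n",
			     1 + 4 * (depth - 1), "",
			     nodes[idx].type, nodes[idx].ino);
	if (w < 0 || (size_t)w >= len) {
		if (len > 0)
			buf[0] = '\0';
		return 0;
	}
	used = (size_t)w;

	/* Children are the nodes whose parent index points back at idx. */
	for (i = 0; i < n; i++)
		if (nodes[i].parent == idx)
			used += ns_render(nodes, n, i, depth + 1,
					  buf + used, len - used);
	return used;
}
```

Calling ns_render(nodes, n, root, 0, buf, sizeof(buf)) on the root user
namespace yields the same indentation as the output above: one leading
space at depth 1 and four more for each further level.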

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-14 18:20 ` Andrey Vagin
@ 2016-07-23 21:14     ` W. Trevor King
  -1 siblings, 0 replies; 142+ messages in thread
From: W. Trevor King @ 2016-07-23 21:14 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: James Bottomley, Serge Hallyn, linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Alexander Viro,
	criu-GEFAQzZX7r8dnm+yROfE0A, Eric W. Biederman,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk (man-pages)

[-- Attachment #1.1: Type: text/plain, Size: 2211 bytes --]

On Thu, Jul 14, 2016 at 11:20:14AM -0700, Andrey Vagin wrote:
> Pid and user namespaces are hierarchical. There is no way to discover
> parent-child relationships either.

It bothers me that network namespaces are not hierarchical too ;).
namespaces(7) and clone(2) both have:

  When a network namespace is freed (i.e., when the last process in
  the namespace terminates), its physical network devices are moved
  back to the initial network namespace (not to the parent of the
  process).

So the initial network namespace (the head of net_namespace_list?) is
special [1].  To understand how physical network devices will be
handled, it seems like we want to treat network devices as a depth-1
tree, with all non-initial net namespaces as children of the initial
net namespace.  Can we extend this series' NS_GET_PARENT to return:

* EPERM for an unprivileged caller (like this series currently does
  for PID namespaces),
* ENOENT when called on net_namespace_list, and
* net_namespace_list when called on any other net namespace.

If that sounds reasonable, I'm happy to stumble my way through a patch
;).

And one benefit of the net_namespace_list approach is that it will be
really easy to walk children if we ever add a parent → children lookup
service to mirror this series' child → parent service.
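
From the caller's side, the proposal above could be consumed roughly as
follows. This is only a sketch under stated assumptions: the error
codes are the ones from the draft namespaces(7) text quoted earlier in
this thread plus the ENOENT proposed here for the initial net
namespace, so a released kernel may behave differently, and the enum
and function names are hypothetical:

```c
#include <errno.h>

/* What a caller can conclude from one NS_GET_PARENT attempt. */
enum ns_parent_result {
	NS_PARENT_FD,		/* ioctl returned a usable descriptor */
	NS_PARENT_IS_ROOT,	/* ENOENT: ns_fd is the initial namespace */
	NS_PARENT_HIDDEN,	/* EPERM: parent is outside the caller's scope */
	NS_PARENT_FLAT,		/* EINVAL: namespace type has no hierarchy */
	NS_PARENT_ERROR		/* anything else (EBADF, ENOTTY, ...) */
};

/* ret is the ioctl return value, err is errno sampled right after it. */
static enum ns_parent_result classify_ns_parent(int ret, int err)
{
	if (ret >= 0)
		return NS_PARENT_FD;

	switch (err) {
	case ENOENT:
		return NS_PARENT_IS_ROOT;
	case EPERM:
		return NS_PARENT_HIDDEN;
	case EINVAL:
		return NS_PARENT_FLAT;
	default:
		return NS_PARENT_ERROR;
	}
}
```

A caller would do fd = ioctl(ns_fd, NS_GET_PARENT) and then pass fd and
errno to classify_ns_parent(); NS_PARENT_IS_ROOT on a net namespace
descriptor would be the "this is the initial net namespace" answer.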

Cheers,
Trevor

[1]: The commit message for 2b035b39 (net: Batch network namespace
  destruction, 2009-11-29) opens with:

    It is fairly common to kill several network namespaces at once.
    Either because they are nested one inside the other or…

  which I'm having trouble understanding if network namespaces aren't
  hierarchical (and they don't seem to be, except for the initial
  network namespace being special).  Maybe nested network namespaces
  were on the table at one point but never materialized?

  net->list looks like a reference to that namespace's entry in
  net_namespace_list, and I didn't see anything else that looked like
  a reference to a parent or list of children.

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-23 21:14     ` W. Trevor King
@ 2016-07-23 21:38         ` James Bottomley
  -1 siblings, 0 replies; 142+ messages in thread
From: James Bottomley @ 2016-07-23 21:38 UTC (permalink / raw)
  To: W. Trevor King, Andrey Vagin
  Cc: Serge Hallyn, linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A,
	Alexander Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	Michael Kerrisk (man-pages),
	Eric W. Biederman


[-- Attachment #1.1: Type: text/plain, Size: 1852 bytes --]

On Sat, 2016-07-23 at 14:14 -0700, W. Trevor King wrote:
> On Thu, Jul 14, 2016 at 11:20:14AM -0700, Andrey Vagin wrote:
> > Pid and user namespaces are hierarchical. There is no way to
> > discover parent-child relationships either.
> 
> It bothers me that network namespaces are not hierarchical too ;).

Well, there's a reason for that: mapping namespaces need to be
hierarchical because the mapping may be remapped; the initial point for
creating a new namespace is the mapped endpoint of the old one.
Label-based namespaces don't really have any need to be.

> namespaces(7) and clone(2) both have:
> 
>   When a network namespace is freed (i.e., when the last process in
>   the namespace terminates), its physical network devices are moved
>   back to the initial network namespace (not to the parent of the
>   process).
> 
> So the initial network namespace (the head of net_namespace_list?) is
> special [1].  To understand how physical network devices will be
> handled, it seems like we want to treat network devices as a depth-1
> tree, with all non-initial net namespaces as children of the initial
> net namespace.  Can we extend this series' NS_GET_PARENT to return:
> 
> * EPERM for an unprivileged caller (like this series currently does
>   for PID namespaces),
> * ENOENT when called on net_namespace_list, and
> * net_namespace_list when called on any other net namespace.

What's the practical application of this?  Independent net namespaces
are managed by the ip netns command.  It pins them by a bind mount in a
flat fashion; if we make them hierarchical the tool would probably need
updating to reflect this, so we're going to need a reason to give the
network people.  Just having the interfaces not go back to root when
you do an ip netns delete doesn't seem very compelling.

James


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
       [not found]           ` <20160723215802.GO24913-q4NCUed9G3sTnwFZoN752g@public.gmane.org>
@ 2016-07-23 21:56             ` Eric W. Biederman
  0 siblings, 0 replies; 142+ messages in thread
From: Eric W. Biederman @ 2016-07-23 21:56 UTC (permalink / raw)
  To: W. Trevor King
  Cc: Serge Hallyn, Andrey Vagin, criu-GEFAQzZX7r8dnm+yROfE0A,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, James Bottomley,
	Alexander Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	Michael Kerrisk (man-pages)

"W. Trevor King" <wking-vJI2gpByivqcqzYg7KEe8g@public.gmane.org> writes:

> On Sat, Jul 23, 2016 at 02:38:56PM -0700, James Bottomley wrote:
>> On Sat, 2016-07-23 at 14:14 -0700, W. Trevor King wrote:
>> > namespaces(7) and clone(2) both have:
>> > 
>> >   When a network namespace is freed (i.e., when the last process
>> >   in the namespace terminates), its physical network devices are
>> >   moved back to the initial network namespace (not to the parent
>> >   of the process).
>> > 
>> > So the initial network namespace (the head of net_namespace_list?)
>> > is special [1].  To understand how physical network devices will
>> > be handled, it seems like we want to treat network devices as a
>> > depth-1 tree, with all non-initial net namespaces as children of
>> > the initial net namespace.  Can we extend this series'
>> > NS_GET_PARENT to return:
>> > 
>> > * EPERM for an unprivileged caller (like this series currently does
>> >   for PID namespaces),
>> > * ENOENT when called on net_namespace_list, and
>> > * net_namespace_list when called on any other net namespace.
>> 
>> What's the practical application of this?  Independent net
>> namespaces are managed by the ip netns command.  It pins them by a
>> bind mount in a flat fashion; if we make them hierarchical the tool
>> would probably need updating to reflect this, so we're going to need
>> a reason to give the network people.  Just having the interfaces not
>> go back to root when you do an ip netns delete doesn't seem very
>> compelling.
>
> I'm not suggesting we add support for deeper nesting, I'm suggesting
> we use NS_GET_PARENT to allow sufficiently privileged users to
> determine if a given net namespace is the initial net namespace.  You
> could do this already with something like:
>
> 1. Create a new net namespace.
> 2. Add a physical network device to that namespace.
> 3. Delete that namespace.
> 4. See if the physical network device shows up in your
>    initial-net-namespace candidate.
> 5. Delete the physical network device (hopefully it ended up somewhere
>    you can find it ;).
>
> But using an NS_GET_PARENT call seems much safer and easier.

Have you had the problem in practice where you can't tell which network
namespace is the initial network namespace?  This all seems like a
theoretical problem rather than a real one.

Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
       [not found]         ` <1469309936.2332.35.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
@ 2016-07-23 21:58           ` W. Trevor King
  0 siblings, 0 replies; 142+ messages in thread
From: W. Trevor King @ 2016-07-23 21:58 UTC (permalink / raw)
  To: James Bottomley
  Cc: Serge Hallyn, Andrey Vagin, linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A,
	Alexander Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	Michael Kerrisk (man-pages),
	Eric W. Biederman


[-- Attachment #1.1: Type: text/plain, Size: 2240 bytes --]

On Sat, Jul 23, 2016 at 02:38:56PM -0700, James Bottomley wrote:
> On Sat, 2016-07-23 at 14:14 -0700, W. Trevor King wrote:
> > namespaces(7) and clone(2) both have:
> > 
> >   When a network namespace is freed (i.e., when the last process
> >   in the namespace terminates), its physical network devices are
> >   moved back to the initial network namespace (not to the parent
> >   of the process).
> > 
> > So the initial network namespace (the head of net_namespace_list?)
> > is special [1].  To understand how physical network devices will
> > be handled, it seems like we want to treat network devices as a
> > depth-1 tree, with all non-initial net namespaces as children of
> > the initial net namespace.  Can we extend this series'
> > NS_GET_PARENT to return:
> > 
> > * EPERM for an unprivileged caller (like this series currently does
> >   for PID namespaces),
> > * ENOENT when called on net_namespace_list, and
> > * net_namespace_list when called on any other net namespace.
> 
> What's the practical application of this?  independent net
> namespaces are managed by the ip netns command.  It pins them by a
> bind mount in a flat fashion; if we make them hierarchical the tool
> would probably need updating to reflect this, so we're going to need
> a reason to give the network people.  Just having the interfaces not
> go back to root when you do an ip netns delete doesn't seem very
> compelling.

I'm not suggesting we add support for deeper nesting, I'm suggesting
we use NS_GET_PARENT to allow sufficiently privileged users to
determine if a given net namespace is the initial net namespace.  You
could do this already with something like:

1. Create a new net namespace.
2. Add a physical network device to that namespace.
3. Delete that namespace.
4. See if the physical network device shows up in your
   initial-net-namespace candidate.
5. Delete the physical network device (hopefully it ended up somewhere
   you can find it ;).

But using an NS_GET_PARENT call seems much safer and easier.
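
A minimal user-space sketch of how such a check could look.  The
ENOENT-means-initial mapping is only the semantics proposed in this
mail, not something a kernel is known to implement for net namespaces.
The enum and both function names are hypothetical, and the fallback
ioctl number is an assumption that mirrors the value later published in
<linux/nsfs.h>.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Fallback definitions (assumption: same values as <linux/nsfs.h>). */
#ifndef NS_GET_PARENT
#define NSIO 0xb7
#define NS_GET_PARENT _IO(NSIO, 0x2)
#endif

/* Hypothetical result codes for the proposed semantics. */
enum ns_parent_status {
	NS_HAS_PARENT,    /* ioctl returned a file descriptor for the parent */
	NS_IS_INITIAL,    /* ENOENT: proposed answer for the initial namespace */
	NS_NOT_PERMITTED, /* EPERM: caller lacks the needed privilege */
	NS_QUERY_FAILED   /* anything else, including kernels without support */
};

/* Map an ioctl() return value and the saved errno to a status. */
enum ns_parent_status classify_parent(int ret, int err)
{
	if (ret >= 0)
		return NS_HAS_PARENT;
	if (err == ENOENT)
		return NS_IS_INITIAL;
	if (err == EPERM)
		return NS_NOT_PERMITTED;
	return NS_QUERY_FAILED;
}

/* Open a namespace file such as /proc/self/ns/net and ask for its parent. */
enum ns_parent_status query_parent(const char *ns_path)
{
	int fd, parent, err;

	fd = open(ns_path, O_RDONLY);
	if (fd < 0)
		return NS_QUERY_FAILED;

	parent = ioctl(fd, NS_GET_PARENT);
	err = errno;              /* only meaningful when parent is -1 */
	if (parent >= 0)
		close(parent);    /* we only wanted to know that it exists */
	close(fd);

	return classify_parent(parent, err);
}
```

With the proposed semantics, query_parent("/proc/self/ns/net") would
report NS_IS_INITIAL only from the initial net namespace, with no need
to move a physical device around.  A kernel that does not implement the
proposal would end up in NS_QUERY_FAILED.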

Cheers,
Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 205 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-23 21:56             ` Eric W. Biederman
@ 2016-07-23 22:34                 ` W. Trevor King
  -1 siblings, 0 replies; 142+ messages in thread
From: W. Trevor King @ 2016-07-23 22:34 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge Hallyn, Andrey Vagin, criu, linux-api, containers,
	linux-kernel, James Bottomley, Alexander Viro, linux-fsdevel,
	Michael Kerrisk (man-pages)


[-- Attachment #1.1: Type: text/plain, Size: 3232 bytes --]

On Sat, Jul 23, 2016 at 04:56:44PM -0500, Eric W. Biederman wrote:
> "W. Trevor King" <wking@tremily.us> writes:
> > On Sat, Jul 23, 2016 at 02:38:56PM -0700, James Bottomley wrote:
> >> On Sat, 2016-07-23 at 14:14 -0700, W. Trevor King wrote:
> >> > namespaces(7) and clone(2) both have:
> >> > 
> >> >   When a network namespace is freed (i.e., when the last
> >> >   process in the namespace terminates), its physical network
> >> >   devices are moved back to the initial network namespace (not
> >> >   to the parent of the process).
> >> > 
> >> > So the initial network namespace (the head of
> >> > net_namespace_list?)  is special [1].  To understand how
> >> > physical network devices will be handled, it seems like we want
> >> > to treat network devices as a depth-1 tree, with all
> >> > non-initial net namespaces as children of the initial net
> >> > namespace.  Can we extend this series' NS_GET_PARENT to return:
> >> > 
> >> > * EPERM for an unprivileged caller (like this series currently
> >> >   does for PID namespaces),
> >> > * ENOENT when called on net_namespace_list, and
> >> > * net_namespace_list when called on any other net namespace.
> >> 
> >> What's the practical application of this?  independent net
> >> namespaces are managed by the ip netns command.  It pins them by
> >> a bind mount in a flat fashion; if we make them hierarchical the
> >> tool would probably need updating to reflect this, so we're going
> >> to need a reason to give the network people.  Just having the
> >> interfaces not go back to root when you do an ip netns delete
> >> doesn't seem very compelling.
> >
> > I'm not suggesting we add support for deeper nesting, I'm suggesting
> > we use NS_GET_PARENT to allow sufficiently privileged users to
> > determine if a given net namespace is the initial net namespace.  You
> > could do this already with something like:
> >
> > 1. Create a new net namespace.
> > 2. Add a physical network device to that namespace.
> > 3. Delete that namespace.
> > 4. See if the physical network device shows up in your
> >    initial-net-namespace candidate.
> > 5. Delete the physical network device (hopefully it ended up
> >    somewhere you can find it ;).
> >
> > But using an NS_GET_PARENT call seems much safer and easier.
> 
> Have you had the problem in practice where you can't tell which
> network namespace is the initial network namespace?  This all seems
> like a theoretical problem rather than a real one.

I haven't had any practical problems here, I'm just trying to wrap my
head around namespace-relationship discovery.  The special physical
network device handling seems a lot like init re-parenting (with no
PR_SET_CHILD_SUBREAPER analog in a 1-deep namespace tree), so calling
the initial network namespace a parent (and all the other namespaces
its direct children) seems natural enough.  If that doesn't sound
convincing, I'm happy to punt this idea until someone runs into a
practical problem ;).

Cheers,
Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 1/5] namespaces: move user_ns into ns_common
  2016-07-15  2:12       ` Andrey Vagin
@ 2016-07-23 23:07           ` kbuild test robot
  -1 siblings, 0 replies; 142+ messages in thread
From: kbuild test robot @ 2016-07-23 23:07 UTC (permalink / raw)
  Cc: James Bottomley, Andrey Vagin, Serge Hallyn, linux-api,
	containers, linux-kernel, Alexander Viro, criu, kbuild-all,
	linux-fsdevel, Michael Kerrisk (man-pages), Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 2197 bytes --]

Hi,

[auto build test ERROR on net/master]
[also build test ERROR on v4.7-rc7]
[cannot apply to next-20160722]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Andrey-Vagin/namespaces-move-user_ns-into-ns_common/20160716-093057
config: x86_64-randconfig-s0-07240634 (attached as .config)
compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> kernel/user.c:53: error: unknown field 'ns' specified in initializer
   kernel/user.c:53: warning: missing braces around initializer
   kernel/user.c:53: warning: (near initialization for 'init_user_ns.<anonymous>')
>> kernel/user.c:53: error: incompatible types when initializing type 'struct user_namespace *' using type 'enum <anonymous>'
   kernel/user.c:55: error: unknown field 'ns' specified in initializer
   kernel/user.c:55: warning: initialization makes integer from pointer without a cast

vim +53 kernel/user.c

f76d207a Eric W. Biederman 2012-08-30  47  			.count = 4294967295U,
f76d207a Eric W. Biederman 2012-08-30  48  		},
f76d207a Eric W. Biederman 2012-08-30  49  	},
c61a2810 Eric W. Biederman 2012-12-28  50  	.count = ATOMIC_INIT(3),
783291e6 Eric W. Biederman 2011-11-17  51  	.owner = GLOBAL_ROOT_UID,
783291e6 Eric W. Biederman 2011-11-17  52  	.group = GLOBAL_ROOT_GID,
435d5f4b Al Viro           2014-10-31 @53  	.ns.inum = PROC_USER_INIT_INO,
33c42940 Al Viro           2014-11-01  54  #ifdef CONFIG_USER_NS
33c42940 Al Viro           2014-11-01  55  	.ns.ops = &userns_operations,
33c42940 Al Viro           2014-11-01  56  #endif

:::::: The code at line 53 was first introduced by commit
:::::: 435d5f4bb2ccba3b791d9ef61d2590e30b8e806e common object embedded into various struct ....ns

:::::: TO: Al Viro <viro@zeniv.linux.org.uk>
:::::: CC: Al Viro <viro@zeniv.linux.org.uk>
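
One plausible reading of these errors, offered as an assumption rather
than something the report states: the patch places `ns` inside an
anonymous union, and GCC releases before 4.6 are known to reject
designated initializers that name members of anonymous structs or
unions.  The sketch below uses hypothetical stand-in types (not the
kernel definitions) to show the failing pattern next to a named-union
layout that older compilers accept.

```c
/* Hypothetical stand-in for struct ns_common; not the kernel definition. */
struct ns_common_sketch {
	unsigned int inum;
};

/* Pattern used by the patch: the union has no name, so its members are
 * initialized as if they belonged to the enclosing struct. */
struct anon_layout {
	int level;
	union {
		struct anon_layout *parent;
		struct ns_common_sketch ns;
	};
};

/* Possible workaround: give the union a name and spell out the path. */
struct named_layout {
	int level;
	union {
		struct named_layout *parent;
		struct ns_common_sketch ns;
	} u;
};

/* Accepted by current compilers; the kind of initializer gcc-4.4 rejects
 * with "unknown field 'ns' specified in initializer". */
struct anon_layout anon_init = {
	.level = 0,
	.ns.inum = 42,
};

/* Same data, reachable through the named union member. */
struct named_layout named_init = {
	.level = 0,
	.u.ns.inum = 42,
};
```

Naming the union is only one option; keeping `ns` as a plain struct
member, as in the pre-patch code, avoids the problem altogether.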

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 22370 bytes --]

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-23 22:34                 ` W. Trevor King
@ 2016-07-24  4:51                     ` Eric W. Biederman
  -1 siblings, 0 replies; 142+ messages in thread
From: Eric W. Biederman @ 2016-07-24  4:51 UTC (permalink / raw)
  To: W. Trevor King
  Cc: Serge Hallyn, Andrey Vagin, criu, linux-api, containers,
	linux-kernel, James Bottomley, Alexander Viro, linux-fsdevel,
	Michael Kerrisk (man-pages)

"W. Trevor King" <wking@tremily.us> writes:

> On Sat, Jul 23, 2016 at 04:56:44PM -0500, Eric W. Biederman wrote:
>> "W. Trevor King" <wking@tremily.us> writes:
>> > On Sat, Jul 23, 2016 at 02:38:56PM -0700, James Bottomley wrote:
>> >> On Sat, 2016-07-23 at 14:14 -0700, W. Trevor King wrote:
>> >> > namespaces(7) and clone(2) both have:
>> >> > 
>> >> >   When a network namespace is freed (i.e., when the last
>> >> >   process in the namespace terminates), its physical network
>> >> >   devices are moved back to the initial network namespace (not
>> >> >   to the parent of the process).
>> >> > 
>> >> > So the initial network namespace (the head of
>> >> > net_namespace_list?)  is special [1].  To understand how
>> >> > physical network devices will be handled, it seems like we want
>> >> > to treat network devices as a depth-1 tree, with all
>> >> > non-initial net namespaces as children of the initial net
>> >> > namespace.  Can we extend this series' NS_GET_PARENT to return:
>> >> > 
>> >> > * EPERM for an unprivileged caller (like this series currently
>> >> >   does for PID namespaces),
>> >> > * ENOENT when called on net_namespace_list, and
>> >> > * net_namespace_list when called on any other net namespace.
>> >> 
>> >> What's the practical application of this?  independent net
>> >> namespaces are managed by the ip netns command.  It pins them by
>> >> a bind mount in a flat fashion; if we make them hierarchical the
>> >> tool would probably need updating to reflect this, so we're going
>> >> to need a reason to give the network people.  Just having the
>> >> interfaces not go back to root when you do an ip netns delete
>> >> doesn't seem very compelling.
>> >
>> > I'm not suggesting we add support for deeper nesting, I'm suggesting
>> > we use NS_GET_PARENT to allow sufficiently privileged users to
>> > determine if a given net namespace is the initial net namespace.  You
>> > could do this already with something like:
>> >
>> > 1. Create a new net namespace.
>> > 2. Add a physical network device to that namespace.
>> > 3. Delete that namespace.
>> > 4. See if the physical network device shows up in your
>> >    initial-net-namespace candidate.
>> > 5. Delete the physical network device (hopefully it ended up
>> >    somewhere you can find it ;).
>> >
>> > But using an NS_GET_PARENT call seems much safer and easier.
>> 
>> Have you had the problem in practice where you can't tell which
>> network namespace is the initial network namespace?  This all seems
>> like a theoretical problem rather than a real one.
>
> I haven't had any practical problems here, I'm just trying to wrap my
> head around namespace-relationship discovery.  The special physical
> network device handling seems a lot like init re-parenting (with no
> PR_SET_CHILD_SUBREAPER analog in a 1-deep namespace tree), so calling
> the initial network namespace a parent (and all the other namespaces
> its direct children) seems natural enough.  If that doesn't sound
> convincing, I'm happy to punt this idea until someone runs into a
> practical problem ;).

Then let's punt this until someone runs into a practical problem.

For scaling and for sanity it is desirable to keep the connections
between namespaces to a minimum.  Further the initial instances of a
namespace always tend to be a little bit special.

Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 1/5] namespaces: move user_ns into ns_common
       [not found]       ` <1468548742-32136-1-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
                           ` (5 preceding siblings ...)
  2016-07-23 23:07           ` kbuild test robot
@ 2016-07-24  5:00         ` Eric W. Biederman
  6 siblings, 0 replies; 142+ messages in thread
From: Eric W. Biederman @ 2016-07-24  5:00 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: Serge Hallyn, criu, linux-api, containers, linux-kernel,
	James Bottomley, Alexander Viro, linux-fsdevel,
	Michael Kerrisk (man-pages)

Andrey Vagin <avagin@openvz.org> writes:

> Every namespace has a pointer to the user namespace where it was created,
> but they're all privately embedded in the individual namespace-specific
> structures.
>
> Now we are going to add a user-space interface to get the owning user
> namespace, so it looks reasonable to move it into ns_common.
>
> Originally this idea was suggested by James Bottomley.

I skimmed through this and I really don't like moving user_ns into
ns_common, if for no other reason than that it seems to have guaranteed
this patchset as written would not apply to my tree.

> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 8297e5b..a941b44 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -27,11 +27,15 @@ struct user_namespace {
>  	struct uid_gid_map	gid_map;
>  	struct uid_gid_map	projid_map;
>  	atomic_t		count;
> -	struct user_namespace	*parent;
>  	int			level;
>  	kuid_t			owner;
>  	kgid_t			group;
> -	struct ns_common	ns;
> +
> +	/* ->ns.user_ns and ->parent are synonyms */
> +	union {
> +		struct user_namespace	*parent;
> +		struct ns_common	ns;
> +	};
>  	unsigned long		flags;
>  
>  	/* Register of per-UID persistent keyrings for this namespace */

This union is unmaintainable.  It is very easy for someone to change
ns_common and accidentally break this.  The C standard does not
allow data to be accessed as either one union member or the other,
which means semantically this code relies on undefined behavior; the
compiler can do anything in this case, and gcc has sometimes been
known to use that allowance.
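
To make the fragility concrete, here is a minimal user-space sketch.  The
demo_* structures are hypothetical stand-ins, not the kernel definitions,
and it assumes the patch places user_ns first in ns_common.  The alias
only holds while that field stays at offset 0, which nothing in the
proposed union enforces; a build-time assertion is one way such an
assumption could at least be made visible:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; names are illustrative. */
struct demo_user_ns;

struct demo_ns_common {
	struct demo_user_ns *user_ns;	/* must stay first for the alias to hold */
	unsigned int inum;
};

struct demo_user_ns {
	int level;
	/* Mirrors the proposed layout: ->parent overlays ->ns.user_ns. */
	union {
		struct demo_user_ns *parent;
		struct demo_ns_common ns;
	};
};

/* Breaks the build if someone inserts a field before user_ns. */
static_assert(offsetof(struct demo_ns_common, user_ns) == 0,
	      "parent/ns.user_ns alias requires user_ns at offset 0");

/* Returns 1 when ->parent and ->ns.user_ns occupy the same bytes. */
int demo_alias_offsets_match(void)
{
	return offsetof(struct demo_user_ns, parent) ==
	       offsetof(struct demo_user_ns, ns) +
	       offsetof(struct demo_ns_common, user_ns);
}
```

Even with such a guard the two names remain separate union members, so
the aliasing concern above is not addressed by it.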

Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 2/5] kernel: add a helper to get an owning user namespace for a namespace
       [not found]           ` <1468548742-32136-2-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2016-07-24  5:03             ` Eric W. Biederman
  2016-07-24 16:54               ` W. Trevor King
  1 sibling, 0 replies; 142+ messages in thread
From: Eric W. Biederman @ 2016-07-24  5:03 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: linux-kernel, James Bottomley, Serge Hallyn, linux-api,
	containers, Alexander Viro, criu, linux-fsdevel,
	Michael Kerrisk (man-pages)

Andrey Vagin <avagin@openvz.org> writes:

> Return -EPERM if an owning user namespace is outside of a process
> current user namespace.
>
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index a5bc78c..6382e5e 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -994,6 +994,30 @@ static int userns_install(struct nsproxy *nsproxy, struct ns_common *ns)
>  	return commit_creds(cred);
>  }
>  
> +struct ns_common *ns_get_owner(struct ns_common *ns)
> +{
> +	const struct cred *cred = current_cred();
> +	struct user_namespace *user_ns, *p;
> +
> +	user_ns = p = ns->user_ns;
> +	if (user_ns == NULL) { /* ns is init_user_ns */
> +		/* Unprivileged user should not know that it's init_user_ns. */
> +		if (capable(CAP_SYS_ADMIN))
> +			return ERR_PTR(-ENOENT);
> +		return ERR_PTR(-EPERM);
> +	}

This permission check is not what I meant to request.  This does not
handle nested user namespaces.

> +	for (;;) {
> +		if (p == cred->user_ns)
> +			break;
> +		if (p == &init_user_ns)
> +			return ERR_PTR(-EPERM);
> +		p = p->parent;
> +	}
> +

The permission check really needs to be down here, and it should be:

	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
		return ERR_PTR(-EPERM);

That cleanly and easily handles more than a depth of a single user
namespace.
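
The difference between the two checks can be sketched with a toy
user-space model.  Everything named demo_* is hypothetical, and the
capability walk is a simplified approximation of how ns_capable() is
understood to behave, not the kernel code.  The loop in the patch only
asks "is the target a descendant of my namespace?", whereas a
capability check also asks whether the caller holds CAP_SYS_ADMIN over
the target:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model; the demo_* names are hypothetical, not kernel API. */
struct demo_user_ns {
	struct demo_user_ns *parent;	/* NULL for the initial namespace */
	unsigned int owner_uid;		/* uid (in the parent) that created it */
};

struct demo_cred {
	struct demo_user_ns *user_ns;	/* namespace the caller lives in */
	unsigned int euid;
	bool cap_sys_admin;		/* capability held inside user_ns */
};

/* Patch-style check: target is the caller's namespace or a descendant. */
bool demo_is_descendant(const struct demo_cred *cred,
			const struct demo_user_ns *target)
{
	for (const struct demo_user_ns *p = target; p; p = p->parent)
		if (p == cred->user_ns)
			return true;
	return false;
}

/* Simplified capability walk for CAP_SYS_ADMIN over target. */
bool demo_ns_capable(const struct demo_cred *cred,
		     const struct demo_user_ns *target)
{
	for (const struct demo_user_ns *ns = target; ns; ns = ns->parent) {
		/* Reached the caller's own namespace: its capability set decides. */
		if (ns == cred->user_ns)
			return cred->cap_sys_admin;
		/* The creator of a child namespace has all capabilities in it. */
		if (ns->parent == cred->user_ns && ns->owner_uid == cred->euid)
			return true;
	}
	return false;
}
```

In this model an unprivileged caller in the initial namespace passes the
descendant test for a namespace nested two levels down, but fails the
capability test unless it created the intermediate namespace.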

> +	return &get_user_ns(user_ns)->ns;
> +}
> +
>  const struct proc_ns_operations userns_operations = {
>  	.name		= "user",
>  	.type		= CLONE_NEWUSER,


Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 4/5] nsfs: add ioctl to get a parent namespace
       [not found]         ` <1468548742-32136-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2016-07-24  5:07           ` Eric W. Biederman
  0 siblings, 0 replies; 142+ messages in thread
From: Eric W. Biederman @ 2016-07-24  5:07 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: linux-kernel, James Bottomley, Serge Hallyn, linux-api,
	containers, Alexander Viro, criu, linux-fsdevel,
	Michael Kerrisk (man-pages)

Andrey Vagin <avagin@openvz.org> writes:

> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 3529a03..a63adfb 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -388,12 +388,38 @@ static int pidns_install(struct nsproxy *nsproxy, struct ns_common *ns)
>  	return 0;
>  }
>  
> +static struct ns_common *pidns_get_parent(struct ns_common *ns)
> +{
> +	struct pid_namespace *active = task_active_pid_ns(current);
> +	struct pid_namespace *pid_ns, *p;
> +
> +	pid_ns = to_pid_ns(ns);
> +	if (pid_ns == &init_pid_ns) {
> +		if (capable(CAP_SYS_ADMIN))
> +			return ERR_PTR(-ENOENT);
> +		return ERR_PTR(-EPERM);
> +	}
> +
> +	pid_ns = p = pid_ns->parent;
> +
> +	for (;;) {
> +		if (p == active)
> +			break;
> +		if (p == &init_pid_ns)
> +			return ERR_PTR(-EPERM);
> +		p = p->parent;
> +	}

Similarly to the user namespace issue, the permission check here needs to
be:
	if (!ns_capable(pid_ns->user_ns, CAP_SYS_ADMIN))
		return ERR_PTR(-EPERM);
> +
> +	return &get_pid_ns(pid_ns)->ns;
> +}
> +
>  const struct proc_ns_operations pidns_operations = {
>  	.name		= "pid",
>  	.type		= CLONE_NEWPID,
>  	.get		= pidns_get,
>  	.put		= pidns_put,
>  	.install	= pidns_install,
> +	.get_parent	= pidns_get_parent,
>  };
>  

Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
       [not found]   ` <CANaxB-xw_xBUq=0uT14ANv-jfg2NsGaPy=jyDO9=yF03_7toSw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2016-07-15  2:12       ` Andrey Vagin
@ 2016-07-24  5:10     ` Eric W. Biederman
  1 sibling, 0 replies; 142+ messages in thread
From: Eric W. Biederman @ 2016-07-24  5:10 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: LKML, James Bottomley, Serge Hallyn, Linux API, Linux Containers,
	Alexander Viro, criu, linux-fsdevel, Michael Kerrisk (man-pages)

Andrey Vagin <avagin@openvz.org> writes:

> Hello,
>
> I forgot to add --cc-cover for git send-email, so everyone who is in
> Cc got only a cover letter. All messages were sent in mail lists.
>
> Sorry for inconvenience.

Mostly the code looked sensible, but I had a couple of issues.
Resend this in September (when the merge window is closed and I am back
from vacation) and I will give this a thorough review and get this
merged.  Or possibly next week if Linus releases another -rc.

> On Thu, Jul 14, 2016 at 11:20 AM, Andrey Vagin <avagin@openvz.org> wrote:
>> Each namespace has an owning user namespace, and currently there is no
>> way to discover these relationships.
>>
>> Pid and user namespaces are hierarchical. There is no way to discover
>> parent-child relationships either.
>>
>> Why might we want to know the relationships between namespaces?
>>
>> One use would be visualization, in order to understand the running system.
>> Another would be to answer the question: what capability does process X have to
>> perform operations on a resource governed by namespace Y?
>>
>> One more use case (which is usually called abnormal) is checkpoint/restart.
>> In CRIU we are going to dump and restore nested namespaces.
>>
>> There was a discussion [1] about which interface to choose to determine
>> relationships between namespaces.
>>
>> Eric suggested adding two ioctls [2]:
>>> Grumble, Grumble.  I think this may actually be a case for creating ioctls
>>> for these two cases.  Now that random nsfs file descriptors are bind
>>> mountable the original reason for using proc files is not as pressing.
>>>
>>> One ioctl for the user namespace that owns a file descriptor.
>>> One ioctl for the parent namespace of a namespace file descriptor.
>>
>> Here is an implementation of these ioctls.
>>
>> [1] https://lkml.org/lkml/2016/7/6/158
>> [2] https://lkml.org/lkml/2016/7/9/101
>>
>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>> Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
>> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
>> Cc: "W. Trevor King" <wking@tremily.us>
>> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
>> Cc: Serge Hallyn <serge.hallyn@canonical.com>
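
For readers who want to see what the two ioctls look like from user
space, here is a hedged sketch.  It assumes the names and request
numbers that later appeared in mainline <linux/nsfs.h> (NS_GET_USERNS
and NS_GET_PARENT); the RFC under discussion may have differed, and
demo_ns_related() is a hypothetical helper, not part of any library:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Fallback definitions, assumed to match mainline <linux/nsfs.h>. */
#ifndef NS_GET_USERNS
#define NSIO		0xb7
#define NS_GET_USERNS	_IO(NSIO, 0x1)	/* owning user namespace */
#define NS_GET_PARENT	_IO(NSIO, 0x2)	/* parent pid or user namespace */
#endif

/*
 * Open a namespace file such as /proc/self/ns/pid and ask the kernel for
 * a related namespace.  Returns a new file descriptor on success, or -1
 * with errno set (for example EPERM when the caller may not see it).
 */
int demo_ns_related(const char *ns_path, unsigned long request)
{
	int fd = open(ns_path, O_RDONLY);
	int related, saved_errno;

	if (fd < 0)
		return -1;

	related = ioctl(fd, request);
	saved_errno = errno;	/* close() must not hide the ioctl error */
	close(fd);
	errno = saved_errno;
	return related;
}
```

A caller would pass NS_GET_USERNS to get the owning user namespace or
NS_GET_PARENT to walk up a pid or user namespace hierarchy, and close
the returned descriptor when done.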


Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 1/5] namespaces: move user_ns into ns_common
  2016-07-24  5:00         ` Eric W. Biederman
@ 2016-07-24  5:54             ` Andrew Vagin
  -1 siblings, 0 replies; 142+ messages in thread
From: Andrew Vagin @ 2016-07-24  5:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrey Vagin, linux-kernel, James Bottomley, Serge Hallyn,
	linux-api, containers, Alexander Viro, criu, linux-fsdevel,
	Michael Kerrisk (man-pages)

On Sun, Jul 24, 2016 at 12:00:13AM -0500, Eric W. Biederman wrote:
> Andrey Vagin <avagin@openvz.org> writes:
> 
> > Every namespace has a pointer to an user namespace where is was created,
> > but they're all privately embedded in the individual namespace specific
> > structures.
> >
> > Now we are going to add an user-space interface to get an owning user
> > namespace, so it looks reasonable to move it into ns_common.
> >
> > Originally this idea was suggested by James Bottomley.
> 
> I skimmed through this and I really don't like moving user_ns into
> ns_common.  If for no other reason than that it seems to have guaranteed
> this patchset as written would not apply to my tree.

I am not insisting on this. In a second version, I will add the
get_owner operation to proc_ns_operations.
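
A rough user-space sketch of that direction, with hypothetical demo_*
types standing in for the kernel structures (the eventual kernel
interface may well look different): each namespace type supplies its
own callback in its operations table, so nothing has to be moved into a
shared structure.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
struct demo_ns_ops;

struct demo_ns {
	const struct demo_ns_ops *ops;
	struct demo_ns *owner;		/* owning user namespace, if any */
};

struct demo_ns_ops {
	const char *name;
	/* Optional per-type callback; NULL when the type does not support it. */
	struct demo_ns *(*get_owner)(struct demo_ns *ns);
};

static struct demo_ns *demo_generic_get_owner(struct demo_ns *ns)
{
	return ns->owner;
}

const struct demo_ns_ops demo_pid_ops = {
	.name		= "pid",
	.get_owner	= demo_generic_get_owner,
};

/* A type that has not been converted simply leaves the callback unset. */
const struct demo_ns_ops demo_legacy_ops = {
	.name		= "legacy",
};

/* Dispatch through the table; returns NULL if the operation is missing. */
struct demo_ns *demo_owner_of(struct demo_ns *ns)
{
	if (!ns->ops->get_owner)
		return NULL;
	return ns->ops->get_owner(ns);
}
```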

Thanks!

> 
> > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> > index 8297e5b..a941b44 100644
> > --- a/include/linux/user_namespace.h
> > +++ b/include/linux/user_namespace.h
> > @@ -27,11 +27,15 @@ struct user_namespace {
> >  	struct uid_gid_map	gid_map;
> >  	struct uid_gid_map	projid_map;
> >  	atomic_t		count;
> > -	struct user_namespace	*parent;
> >  	int			level;
> >  	kuid_t			owner;
> >  	kgid_t			group;
> > -	struct ns_common	ns;
> > +
> > +	/* ->ns.user_ns and ->parent are synonyms */
> > +	union {
> > +		struct user_namespace	*parent;
> > +		struct ns_common	ns;
> > +	};
> >  	unsigned long		flags;
> >  
> >  	/* Register of per-UID persistent keyrings for this namespace */
> 
> This union is unmaintainable.  It is very easy for someone to change
> ns_common and accidentally break this.  The C standard does not
> allow data to be accessed as either one union member or the other,
> which means semantically this code relies on undefined behavior; the
> compiler can do anything in this case, and gcc has sometimes been
> known to use that allowance.
> 
> Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 2/5] kernel: add a helper to get an owning user namespace for a namespace
  2016-07-24  5:03             ` Eric W. Biederman
@ 2016-07-24  6:37                 ` Andrew Vagin
  -1 siblings, 0 replies; 142+ messages in thread
From: Andrew Vagin @ 2016-07-24  6:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge Hallyn, Andrey Vagin, criu-GEFAQzZX7r8dnm+yROfE0A,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, James Bottomley,
	Alexander Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	Michael Kerrisk (man-pages)

On Sun, Jul 24, 2016 at 12:03:49AM -0500, Eric W. Biederman wrote:
> Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> writes:
> 
> > Return -EPERM if an owning user namespace is outside of the process's
> > current user namespace.
> >
> > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> > index a5bc78c..6382e5e 100644
> > --- a/kernel/user_namespace.c
> > +++ b/kernel/user_namespace.c
> > @@ -994,6 +994,30 @@ static int userns_install(struct nsproxy *nsproxy, struct ns_common *ns)
> >  	return commit_creds(cred);
> >  }
> >  
> > +struct ns_common *ns_get_owner(struct ns_common *ns)
> > +{
> > +	const struct cred *cred = current_cred();
> > +	struct user_namespace *user_ns, *p;
> > +
> > +	user_ns = p = ns->user_ns;
> > +	if (user_ns == NULL) { /* ns is init_user_ns */
> > +		/* Unprivileged user should not know that it's init_user_ns. */
> > +		if (capable(CAP_SYS_ADMIN))
> > +			return ERR_PTR(-ENOENT);
> > +		return ERR_PTR(-EPERM);
> > +	}
> 
> This permission check is not what I meant to request.  This does not
> handle nested user namespaces.

Here I handle the case where ns is init_user_ns. init_user_ns doesn't have
a parent, so we need to return an error. We can't return ENOENT in all
cases, because we don't want to expose "that file descriptor is for the
root user namespace" to unprivileged users.
(Trevor suggested adding this check, and it looks reasonable to me too.)
> 
> > +	for (;;) {
> > +		if (p == cred->user_ns)
> > +			break;
> > +		if (p == &init_user_ns)
> > +			return ERR_PTR(-EPERM);
> > +		p = p->parent;
> > +	}
> > +
> 
> The permission check really needs to be down here. And be:
> 
> 	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
> 		return ERR_PTR(-EPERM);
> 
> That cleanly and easily handles more than a depth of a single user
> namespace.
> 
> > +	return &get_user_ns(user_ns)->ns;
> > +}
> > +
> >  const struct proc_ns_operations userns_operations = {
> >  	.name		= "user",
> >  	.type		= CLONE_NEWUSER,
> 
> 
> Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 2/5] kernel: add a helper to get an owning user namespace for a namespace
       [not found]                 ` <20160724063728.GA17810-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
@ 2016-07-24 14:30                   ` Eric W. Biederman
  0 siblings, 0 replies; 142+ messages in thread
From: Eric W. Biederman @ 2016-07-24 14:30 UTC (permalink / raw)
  To: Andrew Vagin
  Cc: Serge Hallyn, Andrey Vagin, criu-GEFAQzZX7r8dnm+yROfE0A,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, James Bottomley,
	Alexander Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	Michael Kerrisk (man-pages)

Andrew Vagin <avagin-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org> writes:

> On Sun, Jul 24, 2016 at 12:03:49AM -0500, Eric W. Biederman wrote:
>> Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> writes:
>> 
>> > Return -EPERM if an owning user namespace is outside of the process's
>> > current user namespace.
>> >
>> > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> > index a5bc78c..6382e5e 100644
>> > --- a/kernel/user_namespace.c
>> > +++ b/kernel/user_namespace.c
>> > @@ -994,6 +994,30 @@ static int userns_install(struct nsproxy *nsproxy, struct ns_common *ns)
>> >  	return commit_creds(cred);
>> >  }
>> >  
>> > +struct ns_common *ns_get_owner(struct ns_common *ns)
>> > +{
>> > +	const struct cred *cred = current_cred();
>> > +	struct user_namespace *user_ns, *p;
>> > +
>> > +	user_ns = p = ns->user_ns;
>> > +	if (user_ns == NULL) { /* ns is init_user_ns */
>> > +		/* Unprivileged user should not know that it's init_user_ns. */
>> > +		if (capable(CAP_SYS_ADMIN))
>> > +			return ERR_PTR(-ENOENT);
>> > +		return ERR_PTR(-EPERM);
>> > +	}
>> 
>> This permission check is not what I meant to request.  This does not
>> handle nested user namespaces.
>
> Here I handle the case where ns is init_user_ns. init_user_ns doesn't have
> a parent, so we need to return an error. We can't return ENOENT in all
> cases, because we don't want to expose "that file descriptor is for the
> root user namespace" to unprivileged users.
> (Trevor suggested adding this check, and it looks reasonable to me
> too.)

Apologies. I was skimming and misread the code.  I mistook that loop for
some useful part of getting the owner.  Looking in more detail...

Your code says:
+struct ns_common *ns_get_owner(struct ns_common *ns)
+{
+	const struct cred *cred = current_cred();
+	struct user_namespace *user_ns, *p;
+
+	user_ns = p = ns->user_ns;
+	if (user_ns == NULL) { /* ns is init_user_ns */
+		/* Unprivileged user should not know that it's init_user_ns. */
+		if (capable(CAP_SYS_ADMIN))
+			return ERR_PTR(-ENOENT);
+		return ERR_PTR(-EPERM);
+	}
+
+	for (;;) {
+		if (p == cred->user_ns)
+			break;
+		if (p == &init_user_ns)
+			return ERR_PTR(-EPERM);
+		p = p->parent;
+	}
+
+	return &get_user_ns(user_ns)->ns;
+}

And all else being equal it could say:

+struct ns_common *ns_get_owner(struct ns_common *ns)
+{
+	struct user_namespace *user_ns = ns->user_ns;
+
+	/* Are we allowed to see the user namespace? */
+	if (!ns_capable(user_ns?user_ns:&init_user_ns, CAP_SYS_ADMIN))
+		return ERR_PTR(-EPERM);
+
+	if (!user_ns)
+		return ERR_PTR(-ENOENT);
+
+	return &get_user_ns(user_ns)->ns;
+}

Which I think is the root of my confusion.  You hand rolled ns_capable,
and I did not recognize it because I was skimming, and just looking to
be certain the permission check was present.

Given that you have to hand roll the pid namespace check that hand
rolling may not be so bad.  But it certainly was confusing the first
time I saw it especially without a comment.

Hmm.

I am not at all certain it makes sense to return -ENOENT.

Without the -ENOENT check the code is much cleaner, and clearer.

I may be blinkered but I don't see the value in letting someone know we
are talking about the initial namespace.  If anything that information
is likely to cause issues with weird corner cases of checkpoint/restart,
as it acts differently depending on whether you are in a container.

When things act differently in a container, that is almost always a source
of a problem somewhere.

So we could simplify the filter in the code to just this.

+struct ns_common *ns_get_owner(struct ns_common *ns)
+{
+	struct user_namespace *my_user_ns = current_user_ns();
+	struct user_namespace *owner, *p;
+
+	/* See if the owner is in the current user namespace */
+	owner = p = ns->user_ns;
+	for (;;) {
+		if (!p)
+			return ERR_PTR(-EPERM);
+		if (p == my_user_ns)
+			break;
+		p = p->parent;
+	}
+
+	return &get_user_ns(owner)->ns;
+}
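
The simplified filter can be exercised outside the kernel with a small mock. The sketch below assumes a hypothetical mock_userns type carrying only a parent pointer, takes the caller's namespace as an explicit argument instead of calling current_user_ns(), and returns NULL where the kernel version would return ERR_PTR(-EPERM). Reference counting is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct user_namespace. */
struct mock_userns {
	struct mock_userns *parent;	/* NULL for the initial namespace */
};

/*
 * Return owner if it is my_ns or a descendant of my_ns,
 * otherwise NULL (standing in for ERR_PTR(-EPERM)).
 */
static struct mock_userns *mock_get_owner(struct mock_userns *owner,
					   struct mock_userns *my_ns)
{
	struct mock_userns *p;

	for (p = owner; p; p = p->parent) {
		if (p == my_ns)
			return owner;
	}
	return NULL;	/* walked past the root, or owner was NULL */
}
```

In this model a NULL owner, which is how the patch represents "ns is init_user_ns", falls through to the same failure as any namespace outside the caller's subtree, so there is no separate -ENOENT case.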

And on reflection I do see the point of not using ns_capable as that
requires having privileges in a namespace while all we want here
is to see if someone is in a visible namespace.
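
The difference between the two checks can be made concrete with a userspace model. Everything below is an assumption for illustration: mock_userns and mock_cred are hypothetical types, mock_ns_capable() is a simplified rendering of the capability walk behind ns_capable() reduced to a single boolean capability, and mock_ns_visible() is the ancestry test discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_userns {
	struct mock_userns *parent;	/* NULL for the initial namespace */
	unsigned int owner;		/* uid of the creator, in the parent */
};

struct mock_cred {
	struct mock_userns *user_ns;	/* never NULL in this model */
	unsigned int euid;
	bool cap_sys_admin;		/* effective in user_ns only */
};

/* Simplified model of the capability walk behind ns_capable(). */
static bool mock_ns_capable(const struct mock_cred *cred,
			    const struct mock_userns *ns)
{
	for (; ns; ns = ns->parent) {
		if (ns == cred->user_ns)
			return cred->cap_sys_admin;
		/* The creator of a child namespace holds all caps in it. */
		if (ns->parent == cred->user_ns && ns->owner == cred->euid)
			return true;
	}
	return false;
}

/* Visibility only: is ns the caller's namespace or a descendant of it? */
static bool mock_ns_visible(const struct mock_cred *cred,
			    const struct mock_userns *ns)
{
	for (; ns; ns = ns->parent) {
		if (ns == cred->user_ns)
			return true;
	}
	return false;
}
```

An unprivileged process can see its own user namespace and namespaces nested inside it without holding CAP_SYS_ADMIN in any of them, so in this model mock_ns_visible() succeeds in cases where mock_ns_capable() fails.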

So please ignore my comments about ns_capable on the pid namespace
parent.

But please simplify the loop and put an appropriate comment on it like I
have above.  The fewer special cases the easier the code is to get
correct, and the easier it is to read.

Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 2/5] kernel: add a helper to get an owning user namespace for a namespace
@ 2016-07-24 14:30                   ` Eric W. Biederman
  0 siblings, 0 replies; 142+ messages in thread
From: Eric W. Biederman @ 2016-07-24 14:30 UTC (permalink / raw)
  To: Andrew Vagin
  Cc: Andrey Vagin, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	James Bottomley, Serge Hallyn, linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Alexander Viro, criu-GEFAQzZX7r8dnm+yROfE0A,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk (man-pages)

Andrew Vagin <avagin-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org> writes:

> On Sun, Jul 24, 2016 at 12:03:49AM -0500, Eric W. Biederman wrote:
>> Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> writes:
>> 
>> > Return -EPERM if an owning user namespace is outside of the process's
>> > current user namespace.
>> >
>> > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> > index a5bc78c..6382e5e 100644
>> > --- a/kernel/user_namespace.c
>> > +++ b/kernel/user_namespace.c
>> > @@ -994,6 +994,30 @@ static int userns_install(struct nsproxy *nsproxy, struct ns_common *ns)
>> >  	return commit_creds(cred);
>> >  }
>> >  
>> > +struct ns_common *ns_get_owner(struct ns_common *ns)
>> > +{
>> > +	const struct cred *cred = current_cred();
>> > +	struct user_namespace *user_ns, *p;
>> > +
>> > +	user_ns = p = ns->user_ns;
>> > +	if (user_ns == NULL) { /* ns is init_user_ns */
>> > +		/* Unprivileged user should not know that it's init_user_ns. */
>> > +		if (capable(CAP_SYS_ADMIN))
>> > +			return ERR_PTR(-ENOENT);
>> > +		return ERR_PTR(-EPERM);
>> > +	}
>> 
>> This permission check is not what I meant to request.  This does not
>> handle nested user namespaces.
>
> Here I handle the case where ns is init_user_ns. init_user_ns doesn't have
> a parent, so we need to return an error. We can't return ENOENT in all
> cases, because we don't want to expose "that file descriptor is for the
> root user namespace" to unprivileged users.
> (Trevor suggested adding this check, and it looks reasonable to me
> too.)

Apologies. I was skimming and misread the code.  I mistook that loop for
some useful part of getting the owner.  Looking in more detail...

Your code says:
+struct ns_common *ns_get_owner(struct ns_common *ns)
+{
+	const struct cred *cred = current_cred();
+	struct user_namespace *user_ns, *p;
+
+	user_ns = p = ns->user_ns;
+	if (user_ns == NULL) { /* ns is init_user_ns */
+		/* Unprivileged user should not know that it's init_user_ns. */
+		if (capable(CAP_SYS_ADMIN))
+			return ERR_PTR(-ENOENT);
+		return ERR_PTR(-EPERM);
+	}
+
+	for (;;) {
+		if (p == cred->user_ns)
+			break;
+		if (p == &init_user_ns)
+			return ERR_PTR(-EPERM);
+		p = p->parent;
+	}
+
+	return &get_user_ns(user_ns)->ns;
+}

And all else being equal it could say:

+struct ns_common *ns_get_owner(struct ns_common *ns)
+{
+	struct user_namespace *user_ns = ns->user_ns;
+
+	/* Are we allowed to see the user namespace? */
+	if (!ns_capable(user_ns?user_ns:&init_user_ns, CAP_SYS_ADMIN))
+		return ERR_PTR(-EPERM);
+
+	if (!user_ns)
+		return ERR_PTR(-ENOENT);
+
+	return &get_user_ns(user_ns)->ns;
+}

Which I think is the root of my confusion.  You hand rolled ns_capable,
and I did not recognize it because I was skimming, and just looking to
be certain the permission check was present.

Given that you have to hand roll the pid namespace check, that hand
rolling may not be so bad.  But it certainly was confusing the first
time I saw it, especially without a comment.

Hmm.

I am not at all certain it makes sense to return -ENOENT.

Without the -ENOENT check the code is much cleaner, and clearer.

I may be blinkered but I don't see the value in letting someone know we
are talking about the initial namespace.  If anything that information
is likely to cause issues with weird corner cases of checkpoint/restart,
as it acts differently depending on whether you are in a container.

When things act differently in a container that almost always is a
source of a problem somewhere.

So we could simplify the filter in the code to just this.

+struct ns_common *ns_get_owner(struct ns_common *ns)
+{
+	struct user_namespace *my_user_ns = current_user_ns();
+	struct user_namespace *owner, *p;
+
+	/* See if the owner is in the current user namespace */
+	owner = p = ns->user_ns;
+	for (;;) {
+		if (!p)
+			return ERR_PTR(-EPERM);
+		if (p == my_user_ns)
+			break;
+		p = p->parent;
+	}
+
+	return &get_user_ns(owner)->ns;
+}

And on reflection I do see the point of not using ns_capable as that
requires having privileges in a namespace while all we want here
is to see if someone is in a visible namespace.

So please ignore my comments about ns_capable on the pid namespace
parent.

But please simplify the loop and put an appropriate comment on it like I
have above.  The fewer special cases the easier the code is to get
correct, and the easier it is to read.
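
For anyone following along, the visibility test in that loop can be
modeled outside the kernel.  This is only a rough userspace sketch:
struct user_ns_model and owner_visible() are made-up names for
illustration, not kernel code, and the model assumes the initial
namespace is the one whose parent pointer is NULL.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical stand-in for the kernel's struct user_namespace. */
struct user_ns_model {
	struct user_ns_model *parent;	/* NULL for the initial namespace */
};

/*
 * Return 0 if 'owner' is 'mine' or a descendant of 'mine', and
 * -EPERM otherwise.  Walk up the parent chain from the owner until
 * we either reach the caller's namespace or run off the top.
 */
int owner_visible(const struct user_ns_model *owner,
		  const struct user_ns_model *mine)
{
	const struct user_ns_model *p;

	for (p = owner; p; p = p->parent) {
		if (p == mine)
			return 0;
	}
	return -EPERM;
}
```

With a chain init -> a -> b, owner_visible(&b, &a) returns 0, while
owner_visible(&init, &a) returns -EPERM because the walk up from init
never reaches a.  A NULL owner falls straight through to -EPERM, which
is the single special case the simplified loop keeps.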

Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 2/5] kernel: add a helper to get an owning user namespace for a namespace
  2016-07-15  2:12           ` Andrey Vagin
@ 2016-07-24 16:54               ` W. Trevor King
  -1 siblings, 0 replies; 142+ messages in thread
From: W. Trevor King @ 2016-07-24 16:54 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: James Bottomley, Serge Hallyn, linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Alexander Viro,
	criu-GEFAQzZX7r8dnm+yROfE0A, Eric W. Biederman,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk (man-pages)


[-- Attachment #1.1: Type: text/plain, Size: 657 bytes --]

On Thu, Jul 14, 2016 at 07:12:19PM -0700, Andrey Vagin wrote:
> +struct ns_common *ns_get_owner(struct ns_common *ns)
> +{
> +	…
> +	return &get_user_ns(user_ns)->ns;
> +}

Is there a reason to return the generic ‘struct ns_common *’ here
instead of ‘struct user_namespace *’?  The current use case doesn't
need access to the additional information, but future ns_get_owner
callers might, and we know the returned namespace (if any) will be a
user namespace.

Cheers,
Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 2/5] kernel: add a helper to get an owning user namespace for a namespace
  2016-07-24 14:30                   ` Eric W. Biederman
@ 2016-07-24 17:05                       ` W. Trevor King
  -1 siblings, 0 replies; 142+ messages in thread
From: W. Trevor King @ 2016-07-24 17:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: James Bottomley, Andrew Vagin, linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A,
	Alexander Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	Michael Kerrisk (man-pages),
	Andrey Vagin


[-- Attachment #1.1: Type: text/plain, Size: 988 bytes --]

On Sun, Jul 24, 2016 at 09:30:03AM -0500, Eric W. Biederman wrote:
> I am not at all certain it makes sense to return -ENOENT.
> 
> Without the -ENOENT check the code is much cleaner, and clearer.

This is fine with me, and makes even more sense for owner (user)
namespaces than it does for net namespaces [1].  At least, I can't
think of a reason why the root user namespace would have special
userspace-visible behavior ;).

Cheers,
Trevor

[1]: http://thread.gmane.org/gmane.linux.kernel.api/20626/focus=30639
     Subject: Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
     Date: Sat, 23 Jul 2016 23:51:07 -0500
     Message-ID: <877fcboczo.fsf@x220.int.ebiederm.org>

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
       [not found]               ` <CANaxB-w8H8Wo8FmtmBBZTpJX-ZDGRQx0rbm9E5c9WbduQ_Ukmw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-07-25 11:47                 ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 142+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-07-25 11:47 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: Serge Hallyn, Eric W. Biederman, Andrew Vagin,
	criu-GEFAQzZX7r8dnm+yROfE0A, Linux API, Linux Containers, LKML,
	James Bottomley, mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
	linux-fsdevel, Alexander Viro

Hi Andrey,

On 07/22/2016 08:25 PM, Andrey Vagin wrote:
> On Thu, Jul 21, 2016 at 11:48 PM, Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>> Hi Andrey,
>>
>>
>> On 07/21/2016 11:06 PM, Andrew Vagin wrote:
>>>
>>> On Thu, Jul 21, 2016 at 04:41:12PM +0200, Michael Kerrisk (man-pages)
>>> wrote:
>>>>
>>>> Hi Andrey,
>>>>
>>>> On 07/14/2016 08:20 PM, Andrey Vagin wrote:
>>>
>>>
>>> <snip>
>>>
>>>>
>>>> Could you add here an explanation of the API in detail: what do
>>>> these FDs refer to, and how do you use them to solve the use case?
>>>> And could you add that info to the commit messages please.
>>>
>>>
>>> Hi Michael,
>>>
>>> A patch for man-pages is attached. It adds the following text to
>>> namespaces(7).
>>>
>>> Since Linux 4.X, the following ioctl(2) calls are supported for
>>> namespace file descriptors.  The correct syntax is:
>>>
>>>       fd = ioctl(ns_fd, ioctl_type);
>>>
>>> where ioctl_type is one of the following:
>>>
>>> NS_GET_USERNS
>>>       Returns a file descriptor that refers to an owning user
>>>       namespace.
>>>
>>> NS_GET_PARENT
>>>       Returns a file descriptor that refers to a parent namespace.
>>>       This ioctl(2) can be used for pid and user namespaces. For user
>>>       namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
>>>       meaning.

For each of the above, I think it is worth mentioning that the
close-on-exec flag is set for the returned file descriptor.
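
A caller that wants to confirm this can query the descriptor flags
with fcntl(2).  A minimal sketch, assuming the descriptor came back
from one of the ioctls quoted above; fd_is_cloexec() is a hypothetical
helper name, not part of the proposed API:

```c
#include <fcntl.h>

/*
 * Return 1 if the close-on-exec flag is set on fd, 0 if it is
 * clear, and -1 if fcntl() fails (for example on a bad descriptor).
 */
int fd_is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

A program that needs the namespace descriptor to survive an execve(2)
would have to clear the flag itself with fcntl(fd, F_SETFD, 0).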

>>>
>>> In addition to generic ioctl(2) errors, the following specific ones can
>>> occur:
>>>
>>> EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
>>>
>>> EPERM  The requested namespace is outside of the current namespace
>>>       scope.

Perhaps add "and the caller does not have CAP_SYS_ADMIN in the initial
user namespace"?

>>>
>>> ENOENT ns_fd refers to the init namespace.
>>
>>
>> Thanks for this. But still part of the question remains unanswered.
>> How do we (in user-space) use the file descriptors to answer any of
>> the questions that this patch series was designed to solve? (This
>> info should be in the commit message and the man-pages patch.)
>
> I'm sorry, but I am not sure that I understand what you ask.
>
> Here are the original questions:
> Someone else then asked me a question that led me to wonder about
> generally introspecting on the parental relationships between user
> namespaces and the association of other namespaces types with user
> namespaces. One use would be visualization, in order to understand the
> running system. Another would be to answer the question I already
> mentioned: what capability does process X have to perform operations
> on a resource governed by namespace Y?
>
> Here is an example which shows how we can get the owning namespace
> inode number by using these ioctl-s.
>
> $ ls -l /proc/13929/ns/pid
> lrwxrwxrwx 1 root root 0 Jul 22 21:03 /proc/13929/ns/pid -> 'pid:[4026532228]'
>
> $ ./nsowner /proc/13929/ns/pid
> user:[4026532227]
>
> The owning user namespace for pid:[4026532228] is user:[4026532227].
>
> The nsowner tool is compiled from this code:
>
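> #include <fcntl.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <sys/ioctl.h>
> #include <linux/nsfs.h>	/* NS_GET_USERNS; assumes the uapi header from this series */
>
> int main(int argc, char *argv[])
> {
>         char buf[128], path[] = "/proc/self/fd/0123456789";
>         int ns, uns, ret;
>
>         if (argc < 2)
>                 return 1;
>
>         /* Open the namespace file, e.g. /proc/PID/ns/pid */
>         ns = open(argv[1], O_RDONLY);
>         if (ns < 0)
>                 return 1;
>
>         /* Get a descriptor for the owning user namespace */
>         uns = ioctl(ns, NS_GET_USERNS);
>         if (uns < 0)
>                 return 1;
>
>         /* The link target names the namespace, e.g. user:[inode] */
>         snprintf(path, sizeof(path), "/proc/self/fd/%d", uns);
>         ret = readlink(path, buf, sizeof(buf) - 1);
>         if (ret < 0)
>                 return 1;
>         buf[ret] = 0;
>
>         printf("%s\n", buf);
>
>         return 0;
> }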

So, from my point of view, the important piece that was missing from
your commit message was the note to use readlink("/proc/self/fd/%d")
on the returned FDs. I think that detail needs to be part of the
commit message (and also the man page text). I think it would even be
helpful to include the above program as part of the commit message:
it helps people more quickly grasp the API.

> Does this example answer the original question?

Yes.

> If it doesn't, could
> you elaborate on what you expect to see here?
>
> And I wrote one more example which shows all relationships between
> namespaces. It enumerates all processes in a system, collects all
> namespaces and determines parent and owning namespaces for each of
> them, then it constructs a namespace tree and shows it.
>
> Here is the code: https://gist.github.com/avagin/db805f95e15ffb0af7e559dbb8de4418

That's great! Thanks!
  
> Here is an example of output for my test system:
> [root@fc24 nsfs]# ./nstree
> user:[4026531837]
>  \__  mnt:[4026532203]
>  \__  ipc:[4026531839]
>  \__  user:[4026532224]
>      \__  user:[4026532226]
>          \__  user:[4026532227]
>              \__  pid:[4026532228]
>      \__  pid:[4026532225]
>          \__  pid:[4026532228]
>  \__  user:[4026532221]
>      \__  pid:[4026532222]
>      \__  user:[4026532223]
>  \__  mnt:[4026532211]
>  \__  uts:[4026531838]
>  \__  cgroup:[4026531835]
>  \__  pid:[4026531836]
>      \__  pid:[4026532225]
>          \__  pid:[4026532228]
>      \__  pid:[4026532222]
>  \__  mnt:[4026531857]
>  \__  mnt:[4026531840]
>  \__  net:[4026531957]

Cheers,

Michael

>>>>> [1] https://lkml.org/lkml/2016/7/6/158
>>>>> [2] https://lkml.org/lkml/2016/7/9/101

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-25 11:47                 ` Michael Kerrisk (man-pages)
@ 2016-07-25 13:18                     ` Eric W. Biederman
  -1 siblings, 0 replies; 142+ messages in thread
From: Eric W. Biederman @ 2016-07-25 13:18 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Serge Hallyn, Andrey Vagin, Linux API, Linux Containers, LKML,
	criu-GEFAQzZX7r8dnm+yROfE0A, Alexander Viro, linux-fsdevel,
	James Bottomley, Andrew Vagin

"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:

> Hi Andrey,
>
> On 07/22/2016 08:25 PM, Andrey Vagin wrote:
>> On Thu, Jul 21, 2016 at 11:48 PM, Michael Kerrisk (man-pages)
>> <mtk.manpages@gmail.com> wrote:
>>> Hi Andrey,
>>>
>>>
>>> On 07/21/2016 11:06 PM, Andrew Vagin wrote:
>>>>
>>>> On Thu, Jul 21, 2016 at 04:41:12PM +0200, Michael Kerrisk (man-pages)
>>>> wrote:
>>>>>
>>>>> Hi Andrey,
>>>>>
>>>>> On 07/14/2016 08:20 PM, Andrey Vagin wrote:
>>>>
>>>>
>>>> <snip>
>>>>
>>>>>
>>>>> Could you add here an explanation of the API in detail: what do
>>>>> these FDs refer to, and how do you use them to solve the use case?
>>>>> And could you add that info to the commit messages please.
>>>>
>>>>
>>>> Hi Michael,
>>>>
>>>> A patch for man-pages is attached. It adds the following text to
>>>> namespaces(7).
>>>>
>>>> Since Linux 4.X, the following ioctl(2) calls are supported for
>>>> namespace file descriptors.  The correct syntax is:
>>>>
>>>>       fd = ioctl(ns_fd, ioctl_type);
>>>>
>>>> where ioctl_type is one of the following:
>>>>
>>>> NS_GET_USERNS
>>>>       Returns a file descriptor that refers to an owning user
>>>>       namespace.
>>>>
>>>> NS_GET_PARENT
>>>>       Returns a file descriptor that refers to a parent namespace.
>>>>       This ioctl(2) can be used for pid and user namespaces. For user
>>>>       namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
>>>>       meaning.
>
> For each of the above, I think it is worth mentioning that the
> close-on-exec flag is set for the returned file descriptor.

Hmm.  That is an odd default.

>>>>
>>>> In addition to generic ioctl(2) errors, the following specific ones can
>>>> occur:
>>>>
>>>> EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
>>>>
>>>> EPERM  The requested namespace is outside of the current namespace
>>>>       scope.
>
> Perhaps add "and the caller does not have CAP_SYS_ADMIN in the initial
> user namespace"?

Having looked at that bit of code I don't think capabilities really
have a role to play.

>>>> ENOENT ns_fd refers to the init namespace.
>>>
>>>
>>> Thanks for this. But still part of the question remains unanswered.
>>> How do we (in user-space) use the file descriptors to answer any of
>>> the questions that this patch series was designed to solve? (This
>>> info should be in the commit message and the man-pages patch.)
>>
>> I'm sorry, but I am not sure that I understand what you ask.
>>
>> Here are the original questions:
>> Someone else then asked me a question that led me to wonder about
>> generally introspecting on the parental relationships between user
>> namespaces and the association of other namespaces types with user
>> namespaces. One use would be visualization, in order to understand the
>> running system. Another would be to answer the question I already
>> mentioned: what capability does process X have to perform operations
>> on a resource governed by namespace Y?
>>
>> Here is an example which shows how we can get the owning namespace
>> inode number by using these ioctl-s.
>>
>> $ ls -l /proc/13929/ns/pid
>> lrwxrwxrwx 1 root root 0 Jul 22 21:03 /proc/13929/ns/pid -> 'pid:[4026532228]'
>>
>> $ ./nsowner /proc/13929/ns/pid
>> user:[4026532227]
>>
>> The owning user namespace for pid:[4026532228] is user:[4026532227].
>>
>> The nsowner tool is compiled from this code:
>>
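>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <unistd.h>
>> #include <sys/ioctl.h>
>> #include <linux/nsfs.h>  /* NS_GET_USERNS (assumed header) */
>>
>> int main(int argc, char *argv[])
>> {
>>         /* The initializer only reserves room for the fd number. */
>>         char buf[128], path[] = "/proc/self/fd/0123456789";
>>         int ns, uns, ret;
>>
>>         if (argc < 2)
>>                 return 1;
>>
>>         ns = open(argv[1], O_RDONLY);
>>         if (ns < 0)
>>                 return 1;
>>
>>         /* Returns a new fd referring to the owning user namespace. */
>>         uns = ioctl(ns, NS_GET_USERNS);
>>         if (uns < 0)
>>                 return 1;
>>
>>         /* The symlink target has the form "user:[inode]". */
>>         snprintf(path, sizeof(path), "/proc/self/fd/%d", uns);
>>         ret = readlink(path, buf, sizeof(buf) - 1);
>>         if (ret < 0)
>>                 return 1;
>>         buf[ret] = 0;
>>
>>         printf("%s\n", buf);
>>
>>         return 0;
>> }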
>
> So, from my point of view, the important piece that was missing from
> your commit message was the note to use readlink("/proc/self/fd/%d")
> on the returned FDs. I think that detail needs to be part of the
> commit message (and also the man page text). I think it would even be
> helpful to include the above program as part of the commit message:
> it helps people more quickly grasp the API.

Please, please make the standard way to compare these things fstat.
That is much less magic than a symlink, and a little more future proof.
Possibly even kcmp.
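
A rough sketch of the fstat() approach, under the assumption that the
kernel headers install <linux/nsfs.h> with NS_GET_USERNS. The helper
name ns_owner_stat() is hypothetical and not part of the patch series.
It fetches the owning user namespace and identifies it with fstat()
rather than by parsing the symlink text:

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/nsfs.h>         /* NS_GET_USERNS; assumed to be installed */

/*
 * Hypothetical helper: fill *st with the fstat() result for the user
 * namespace that owns the namespace file at ns_path.
 * Returns 0 on success, -1 on any failure.
 */
int ns_owner_stat(const char *ns_path, struct stat *st)
{
        int ns, uns, ret;

        ns = open(ns_path, O_RDONLY);
        if (ns < 0)
                return -1;

        /* The ioctl returns a new fd that refers to the owning userns. */
        uns = ioctl(ns, NS_GET_USERNS);
        close(ns);
        if (uns < 0)
                return -1;

        /* st->st_ino is the number shown inside "user:[...]". */
        ret = fstat(uns, st);
        close(uns);
        return ret < 0 ? -1 : 0;
}
```

A caller would pass something like /proc/13929/ns/pid and then print or
compare the fields of the returned struct stat.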

At some point we will care about migrating a sub-container and we
may have to make some minor changes.

Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-25 13:18                     ` Eric W. Biederman
@ 2016-07-25 14:46                         ` Michael Kerrisk (man-pages)
  -1 siblings, 0 replies; 142+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-07-25 14:46 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge Hallyn, Andrew Vagin, Linux API, Linux Containers, LKML,
	Alexander Viro, criu, mtk.manpages, linux-fsdevel,
	James Bottomley, Andrey Vagin

Hi Eric,

On 07/25/2016 03:18 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>
>> Hi Andrey,
>>
>> On 07/22/2016 08:25 PM, Andrey Vagin wrote:
>>> On Thu, Jul 21, 2016 at 11:48 PM, Michael Kerrisk (man-pages)
>>> <mtk.manpages@gmail.com> wrote:
>>>> Hi Andrey,
>>>>
>>>>
>>>> On 07/21/2016 11:06 PM, Andrew Vagin wrote:
>>>>>
>>>>> On Thu, Jul 21, 2016 at 04:41:12PM +0200, Michael Kerrisk (man-pages)
>>>>> wrote:
>>>>>>
>>>>>> Hi Andrey,
>>>>>>
>>>>>> On 07/14/2016 08:20 PM, Andrey Vagin wrote:
>>>>>
>>>>>
>>>>> <snip>
>>>>>
>>>>>>
>>>>>> Could you add here an explanation of the API in detail: what do these
>>>>>> FDs refer to, and how do you use them to solve the use case? And could
>>>>>> you add that info to the commit messages please.
>>>>>
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> A patch for man-pages is attached. It adds the following text to
>>>>> namespaces(7).
>>>>>
>>>>> Since Linux 4.X, the following ioctl(2) calls are supported for
>>>>> namespace file descriptors.  The correct syntax is:
>>>>>
>>>>>       fd = ioctl(ns_fd, ioctl_type);
>>>>>
>>>>> where ioctl_type is one of the following:
>>>>>
>>>>> NS_GET_USERNS
>>>>>       Returns a file descriptor that refers to an owning user
>>>>>       namespace.
>>>>>
>>>>> NS_GET_PARENT
>>>>>       Returns a file descriptor that refers to a parent namespace.
>>>>>       This ioctl(2) can be used for pid and user namespaces. For user
>>>>>       namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
>>>>>       meaning.
>>
>> For each of the above, I think it is worth mentioning that the
>> close-on-exec flag is set for the returned file descriptor.
>
> Hmm.  That is an odd default.

Why do you say that? It's pretty common as the default for various
APIs that create new FDs these days. (There's of course a strong argument
that the original UNIX default was a design blunder...)
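
For illustration, a minimal sketch of how user space could inspect or
undo that default on a returned descriptor. It uses only fcntl(2); the
helper names fd_is_cloexec() and fd_clear_cloexec() are hypothetical:

```c
#include <fcntl.h>

/*
 * Hypothetical helper: report whether close-on-exec is set on fd.
 * Returns 1 if set, 0 if clear, -1 on error.
 */
int fd_is_cloexec(int fd)
{
        int flags = fcntl(fd, F_GETFD);

        if (flags < 0)
                return -1;
        return (flags & FD_CLOEXEC) ? 1 : 0;
}

/*
 * Hypothetical helper: clear close-on-exec so that fd survives execve(),
 * for example to hand a namespace fd to a child program.
 * Returns 0 on success, -1 on error.
 */
int fd_clear_cloexec(int fd)
{
        int flags = fcntl(fd, F_GETFD);

        if (flags < 0)
                return -1;
        return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0 ? -1 : 0;
}
```

With close-on-exec as the default, a program that wants to pass the
descriptor across execve() has to opt in explicitly, which is the safer
direction for a mistake.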

>>>>>
>>>>> In addition to generic ioctl(2) errors, the following specific ones can
>>>>> occur:
>>>>>
>>>>> EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
>>>>>
>>>>> EPERM  The requested namespace is outside of the current namespace
>>>>>       scope.
>>
>> Perhaps add "and the caller does not have CAP_SYS_ADMIN in the initial
>> user namespace"?
>
> Having looked at that bit of code I don't think capabilities really
> have a role to play.

Yes, I caught up with that now. I'll wait to see how this plays out
in the next patch version.

>>>>> ENOENT ns_fd refers to the init namespace.
>>>>
>>>>
>>>> Thanks for this. But still part of the question remains unanswered.
>>>> How do we (in user-space) use the file descriptors to answer any of
>>>> the questions that this patch series was designed to solve? (This
>>>> info should be in the commit message and the man-pages patch.)
>>>
>>> I'm sorry, but I am not sure that I understand what you ask.
>>>
>>> Here are the original questions:
>>> Someone else then asked me a question that led me to wonder about
>>> generally introspecting on the parental relationships between user
>>> namespaces and the association of other namespaces types with user
>>> namespaces. One use would be visualization, in order to understand the
>>> running system. Another would be to answer the question I already
>>> mentioned: what capability does process X have to perform operations
>>> on a resource governed by namespace Y?
>>>
>>> Here is an example which shows how we can get the owning namespace
>>> inode number by using these ioctl-s.
>>>
>>> $ ls -l /proc/13929/ns/pid
>>> lrwxrwxrwx 1 root root 0 Jul 22 21:03 /proc/13929/ns/pid -> 'pid:[4026532228]'
>>>
>>> $ ./nsowner /proc/13929/ns/pid
>>> user:[4026532227]
>>>
>>> The owning user namespace for pid:[4026532228] is user:[4026532227].
>>>
>>> The nsowner tool is compiled from this code:
>>>
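>>> #include <fcntl.h>
>>> #include <stdio.h>
>>> #include <unistd.h>
>>> #include <sys/ioctl.h>
>>> #include <linux/nsfs.h>  /* NS_GET_USERNS (assumed header) */
>>>
>>> int main(int argc, char *argv[])
>>> {
>>>         /* The initializer only reserves room for the fd number. */
>>>         char buf[128], path[] = "/proc/self/fd/0123456789";
>>>         int ns, uns, ret;
>>>
>>>         if (argc < 2)
>>>                 return 1;
>>>
>>>         ns = open(argv[1], O_RDONLY);
>>>         if (ns < 0)
>>>                 return 1;
>>>
>>>         /* Returns a new fd referring to the owning user namespace. */
>>>         uns = ioctl(ns, NS_GET_USERNS);
>>>         if (uns < 0)
>>>                 return 1;
>>>
>>>         /* The symlink target has the form "user:[inode]". */
>>>         snprintf(path, sizeof(path), "/proc/self/fd/%d", uns);
>>>         ret = readlink(path, buf, sizeof(buf) - 1);
>>>         if (ret < 0)
>>>                 return 1;
>>>         buf[ret] = 0;
>>>
>>>         printf("%s\n", buf);
>>>
>>>         return 0;
>>> }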
>>
>> So, from my point of view, the important piece that was missing from
>> your commit message was the note to use readlink("/proc/self/fd/%d")
>> on the returned FDs. I think that detail needs to be part of the
>> commit message (and also the man page text). I think it would even be
>> helpful to include the above program as part of the commit message:
>> it helps people more quickly grasp the API.
>
> Please, please make the standard way to compare these things fstat.
> That is much less magic than a symlink, and a little more future proof.
> Possibly even kcmp.

As in fstat() to get the st_ino field, right?

Cheers,

Michael

> At some point we will care about migrating a sub-container and we
> may have to make some minor changes.
>
> Eric
>


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
       [not found]                         ` <44ca0e41-dc92-45b1-2a6c-c41a048a072d-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2016-07-25 14:54                           ` Serge E. Hallyn
  2016-07-25 14:59                             ` Eric W. Biederman
  1 sibling, 0 replies; 142+ messages in thread
From: Serge E. Hallyn @ 2016-07-25 14:54 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Serge Hallyn, Andrew Vagin, Linux API, Linux Containers, LKML,
	criu, Eric W. Biederman, Andrey Vagin,
	linux-fsdevel, James Bottomley, Alexander Viro

Quoting Michael Kerrisk (man-pages) (mtk.manpages@gmail.com):
> Hi Eric,
> 
> On 07/25/2016 03:18 PM, Eric W. Biederman wrote:
> >"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> >
> >>Hi Andrey,
> >>
> >>On 07/22/2016 08:25 PM, Andrey Vagin wrote:
> >>>On Thu, Jul 21, 2016 at 11:48 PM, Michael Kerrisk (man-pages)
> >>><mtk.manpages@gmail.com> wrote:
> >>>>Hi Andrey,
> >>>>
> >>>>
> >>>>On 07/21/2016 11:06 PM, Andrew Vagin wrote:
> >>>>>
> >>>>>On Thu, Jul 21, 2016 at 04:41:12PM +0200, Michael Kerrisk (man-pages)
> >>>>>wrote:
> >>>>>>
> >>>>>>Hi Andrey,
> >>>>>>
> >>>>>>On 07/14/2016 08:20 PM, Andrey Vagin wrote:
> >>>>>
> >>>>>
> >>>>><snip>
> >>>>>
> >>>>>>
> >>>>>>Could you add here an explanation of the API in detail: what do these
> >>>>>>FDs refer to, and how do you use them to solve the use case? And could
> >>>>>>you add that info to the commit messages please.
> >>>>>
> >>>>>
> >>>>>Hi Michael,
> >>>>>
> >>>>>A patch for man-pages is attached. It adds the following text to
> >>>>>namespaces(7).
> >>>>>
> >>>>>Since Linux 4.X, the following ioctl(2) calls are supported for
> >>>>>namespace file descriptors.  The correct syntax is:
> >>>>>
> >>>>>      fd = ioctl(ns_fd, ioctl_type);
> >>>>>
> >>>>>where ioctl_type is one of the following:
> >>>>>
> >>>>>NS_GET_USERNS
> >>>>>      Returns a file descriptor that refers to an owning user
> >>>>>      namespace.
> >>>>>
> >>>>>NS_GET_PARENT
> >>>>>      Returns a file descriptor that refers to a parent namespace.
> >>>>>      This ioctl(2) can be used for pid and user namespaces. For user
> >>>>>      namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
> >>>>>      meaning.
> >>
> >>For each of the above, I think it is worth mentioning that the
> >>close-on-exec flag is set for the returned file descriptor.
> >
> >Hmm.  That is an odd default.
> 
> Why do you say that? It's pretty common as the default for various
> APIs that create new FDs these days. (There's of course a strong argument
> that the original UNIX default was a design blunder...)
> 
> >>>>>
> >>>>>In addition to generic ioctl(2) errors, the following specific ones can
> >>>>>occur:
> >>>>>
> >>>>>EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
> >>>>>
> >>>>>EPERM  The requested namespace is outside of the current namespace
> >>>>>      scope.
> >>
> >>Perhaps add "and the caller does not have CAP_SYS_ADMIN in the initial
> >>user namespace"?
> >
> >Having looked at that bit of code I don't think capabilities really
> >have a role to play.
> 
> Yes, I caught up with that now. I'll wait to see how this plays out
> in the next patch version.

Thanks - that had caught my eye but I hadn't had time to look into the
justification for this.  Hiding this kind of thing indeed seems wrong to
me, unless there is a really good justification for it, i.e. a way
to use that info in an exploit.

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-25 14:46                         ` Michael Kerrisk (man-pages)
@ 2016-07-25 14:59                             ` Eric W. Biederman
  -1 siblings, 0 replies; 142+ messages in thread
From: Eric W. Biederman @ 2016-07-25 14:59 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Serge Hallyn, Andrey Vagin, Linux API, Linux Containers, LKML,
	criu, Alexander Viro, linux-fsdevel,
	James Bottomley, Andrew Vagin

"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:

> Hi Eric,
>
> On 07/25/2016 03:18 PM, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>>
>>> Hi Andrey,
>>>
>>> On 07/22/2016 08:25 PM, Andrey Vagin wrote:
>>>> On Thu, Jul 21, 2016 at 11:48 PM, Michael Kerrisk (man-pages)
>>>> <mtk.manpages@gmail.com> wrote:
>>>>> Hi Andrey,
>>>>>
>>>>>
>>>>> On 07/21/2016 11:06 PM, Andrew Vagin wrote:
>>>>>>
[snip]
>>>>>> where ioctl_type is one of the following:
>>>>>>
>>>>>> NS_GET_USERNS
>>>>>>       Returns a file descriptor that refers to an owning user
>>>>>>       namespace.
>>>>>>
>>>>>> NS_GET_PARENT
>>>>>>       Returns a file descriptor that refers to a parent namespace.
>>>>>>       This ioctl(2) can be used for pid and user namespaces. For user
>>>>>>       namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
>>>>>>       meaning.
>>>
>>> For each of the above, I think it is worth mentioning that the
>>> close-on-exec flag is set for the returned file descriptor.
>>
>> Hmm.  That is an odd default.
>
> Why do you say that? It's pretty common as the default for various
> APIs that create new FDs these days. (There's of course a strong argument
> that the original UNIX default was a design blunder...)

Interesting.  I haven't kept up on that, but it seems reasonable.

[snip]
>>> So, from my point of view, the important piece that was missing from
>>> your commit message was the note to use readlink("/proc/self/fd/%d")
>>> on the returned FDs. I think that detail needs to be part of the
>>> commit message (and also the man page text). I think it would even be
>>> helpful to include the above program as part of the commit message:
>>> it helps people more quickly grasp the API.
>>
>> Please, please make the standard way to compare these things fstat.
>> That is much less magic than a symlink, and a little more future proof.
>> Possibly even kcmp.
>
> As in fstat() to get the st_ino field, right?

Both the st_ino and st_dev fields.

The most likely change to support checkpoint/restart in the future is to
preserve st_ino across migrations and instantiate a different instance
of nsfs to hold the inode numbers from the previous machine.

We would need to handle the preservation carefully or else there is
a chance that two namespace file descriptors (collected from different
sources) with different st_dev and st_ino fields may actually refer to
the same object.

Which is a long way of saying we have the st_dev field please use it,
it may matter at some point.

Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
       [not found]                           ` <20160725145445.GA19879-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2016-07-25 15:17                             ` Eric W. Biederman
  0 siblings, 0 replies; 142+ messages in thread
From: Eric W. Biederman @ 2016-07-25 15:17 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Michael Kerrisk (man-pages),
	Serge Hallyn, Andrew Vagin, Linux API, Linux Containers, LKML,
	Alexander Viro, criu, linux-fsdevel, James Bottomley,
	Andrey Vagin

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Michael Kerrisk (man-pages) (mtk.manpages@gmail.com):
>> Hi Eric,
>> 
>> On 07/25/2016 03:18 PM, Eric W. Biederman wrote:
>> >"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>> >
>> >>Hi Andrey,
>> >>
>> >>On 07/22/2016 08:25 PM, Andrey Vagin wrote:
>> >>Perhaps add "and the caller does not have CAP_SYS_ADMIN in the initial
>> >>user namespace"?
>> >
>> >Having looked at that bit of code I don't think capabilities really
>> >have a role to play.
>> 
>> Yes, I caught up with that now. I await to see how this plays out
>> in the next patch version.
>
> Thanks - that had caught my eye but I hadn't had time to look into the
> justification for this.  Hiding this kind of thing indeed seems wrong to
> me, unless there is a really good justification for it, i.e. a way
> to use that info in an exploit.

To avoid breaking checkpoint/restart we need to limit information to the
namespaces the caller is a member of for the user and pid namespaces.

This roughly duplicates the parentage checks in ns_capable.

Conceptually this is the same as limiting ".." in a chroot environment.

Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
@ 2016-07-26  2:07         ` Andrew Vagin
  0 siblings, 0 replies; 142+ messages in thread
From: Andrew Vagin @ 2016-07-26  2:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrey Vagin, LKML, James Bottomley, Serge Hallyn, Linux API,
	Linux Containers, Alexander Viro, criu, linux-fsdevel,
	Michael Kerrisk (man-pages)

On Sun, Jul 24, 2016 at 12:10:21AM -0500, Eric W. Biederman wrote:
> Andrey Vagin <avagin@openvz.org> writes:
> 
> > Hello,
> >
> > I forgot to add --cc-cover for git send-email, so everyone who is in
> > Cc got only the cover letter. All messages were sent to the mailing lists.
> >
> > Sorry for the inconvenience.
> 
> Mostly the code looked sensible.  But I had a couple of issues.
> Resend this in September (when the merge window is closed and I am back
> from vacation) and I will give this a thorough review and get this
> merged.  Or possibly next week if Linus releases another -rc

Eric, thank you for the detailed comments. I will rework this series and
send it after the merge window.

> 
> > On Thu, Jul 14, 2016 at 11:20 AM, Andrey Vagin <avagin@openvz.org> wrote:
> >> Each namespace has an owning user namespace, and currently there is no way
> >> to discover these relationships.
> >>
> >> Pid and user namespaces are hierarchical. There is no way to discover
> >> their parent-child relationships either.
> >>
> >> Why might we want to know the relationships between namespaces?
> >>
> >> One use would be visualization, in order to understand the running system.
> >> Another would be to answer the question: what capability does process X have to
> >> perform operations on a resource governed by namespace Y?
> >>
> >> One more use case (which is usually called abnormal) is checkpoint/restart.
> >> In CRIU we are going to dump and restore nested namespaces.
> >>
> >> There was a discussion [1] about which interface to choose to determine
> >> relationships between namespaces.
> >>
> >> Eric suggested adding two ioctl-s [2]:
> >>> Grumble, Grumble.  I think this may actually be a case for creating ioctls
> >>> for these two cases.  Now that random nsfs file descriptors are bind
> >>> mountable the original reason for using proc files is not as pressing.
> >>>
> >>> One ioctl for the user namespace that owns a file descriptor.
> >>> One ioctl for the parent namespace of a namespace file descriptor.
> >>
> >> Here is an implementation of these ioctl-s.
> >>
> >> [1] https://lkml.org/lkml/2016/7/6/158
> >> [2] https://lkml.org/lkml/2016/7/9/101
> >>
> >> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> >> Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
> >> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
> >> Cc: "W. Trevor King" <wking@tremily.us>
> >> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> >> Cc: Serge Hallyn <serge.hallyn@canonical.com>
> 
> 
> Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
@ 2016-07-26  2:54                                 ` Andrew Vagin
  0 siblings, 0 replies; 142+ messages in thread
From: Andrew Vagin @ 2016-07-26  2:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michael Kerrisk (man-pages),
	Andrey Vagin, Serge Hallyn, criu, Linux API, Linux Containers,
	LKML, James Bottomley, linux-fsdevel, Alexander Viro

On Mon, Jul 25, 2016 at 09:59:43AM -0500, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:

[snip]

> [snip]
> >>> So, from my point of view, the important piece that was missing from
> >>> your commit message was the note to use readlink("/proc/self/fd/%d")
> >>> on the returned FDs. I think that detail needs to be part of the
> >>> commit message (and also the man page text). I think it would even be
> >>> helpful to include the above program as part of the commit message:
> >>> it helps people more quickly grasp the API.
> >>
> >> Please, please make the standard way to compare these things fstat.
> >> That is much less magic than a symlink, and a little more future proof.
> >> Possibly even kcmp.

I like the idea of using kcmp to compare namespaces. I am going to add this
functionality to kcmp and describe all of this in the man page.

> >
> > As in fstat() to get the st_ino field, right?
> 
> Both the st_ino and st_dev fields.
> 
> The most likely change to support checkpoint/restart in the future is to
> preserve st_ino across migrations and instantiate a different instance
> of nsfs to hold the inode numbers from the previous machine.

It sounds tricky. BTW, this is not the only place where we have this sort
of problem. For example, mount IDs are currently not preserved when a
container is migrated. The same problem applies to tmpfs, where inode
numbers are not preserved for files.

> 
> We would need to handle the preservation carefully or else there is
> a chance that two namespace file descriptors (collected from different
> sources) with different st_dev and st_ino fields may actually refer to
> the same object.
> 
> Which is a long way of saying we have the st_dev field please use it,
> it may matter at some point.
> 
> Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
       [not found]                                 ` <20160726025455.GC26206-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
@ 2016-07-26  8:03                                   ` Michael Kerrisk (man-pages)
  2016-07-26 19:38                                     ` Eric W. Biederman
  1 sibling, 0 replies; 142+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-07-26  8:03 UTC (permalink / raw)
  To: Andrew Vagin, Eric W. Biederman
  Cc: mtk.manpages, Andrey Vagin, Serge Hallyn, criu, Linux API,
	Linux Containers, LKML, James Bottomley, linux-fsdevel,
	Alexander Viro

On 07/26/2016 04:54 AM, Andrew Vagin wrote:
> On Mon, Jul 25, 2016 at 09:59:43AM -0500, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>
> [snip]
>
>> [snip]
>>>>> So, from my point of view, the important piece that was missing from
>>>>> your commit message was the note to use readlink("/proc/self/fd/%d")
>>>>> on the returned FDs. I think that detail needs to be part of the
>>>>> commit message (and also the man page text). I think it would even be
>>>>> helpful to include the above program as part of the commit message:
>>>>> it helps people more quickly grasp the API.
>>>>
>>>> Please, please make the standard way to compare these things fstat.
>>>> That is much less magic than a symlink, and a little more future proof.
>>>> Possibly even kcmp.
>
> I like the idea of using kcmp to compare namespaces. I am going to add this
> functionality to kcmp and describe all of this in the man page.

Hi Andrey,

Can you briefly sketch out the proposed API and how it would be used?
I'd find it useful to see that even before the implementation.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
@ 2016-07-26 18:25                                       ` Andrew Vagin
  0 siblings, 0 replies; 142+ messages in thread
From: Andrew Vagin @ 2016-07-26 18:25 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Eric W. Biederman, Andrey Vagin, Serge Hallyn, criu, Linux API,
	Linux Containers, LKML, James Bottomley, linux-fsdevel,
	Alexander Viro

On Tue, Jul 26, 2016 at 10:03:25AM +0200, Michael Kerrisk (man-pages) wrote:
> On 07/26/2016 04:54 AM, Andrew Vagin wrote:
> > On Mon, Jul 25, 2016 at 09:59:43AM -0500, Eric W. Biederman wrote:
> > > "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> > 
> > [snip]
> > 
> > > [snip]
> > > > > > So, from my point of view, the important piece that was missing from
> > > > > > your commit message was the note to use readlink("/proc/self/fd/%d")
> > > > > > on the returned FDs. I think that detail needs to be part of the
> > > > > > > commit message (and also the man page text). I think it would even be
> > > > > > helpful to include the above program as part of the commit message:
> > > > > > it helps people more quickly grasp the API.
> > > > > 
> > > > > Please, please make the standard way to compare these things fstat.
> > > > > That is much less magic than a symlink, and a little more future proof.
> > > > > Possibly even kcmp.
> > 
> > I like the idea of using kcmp to compare namespaces. I am going to add this
> > functionality to kcmp and describe all of this in the man page.
> 
> Hi Andrey,
> 
> Can you briefly sketch out the proposed API and how it would be used?
> I'd find it useful to see that even before the implementation.

Sure. If a process wants to compare two namespaces, it needs to get file
descriptors for them (open /proc/PID/ns/XXX, use the new ioctl-s, find a
process which has them), and then it calls
kcmp(pid1, pid2, KCMP_NSFD, ns_fd1, ns_fd2).

For example, to compare the pid namespaces of processes 1 and 2:

pid = getpid();
ns_fd1 = open("/proc/1/ns/pid", O_RDONLY);
ns_fd2 = open("/proc/2/ns/pid", O_RDONLY);

/* KCMP_NSFD is the proposed new type; kcmp() returns 0 if they are equal */
if (!kcmp(pid, pid, KCMP_NSFD, ns_fd1, ns_fd2))
	printf("Both processes live in the same pid namespace\n");

Thanks,
Andrew
> 
> Cheers,
> 
> Michael
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
       [not found]                                       ` <20160726182524.GA328-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
@ 2016-07-26 18:32                                         ` W. Trevor King
  2016-07-26 19:17                                         ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 142+ messages in thread
From: W. Trevor King @ 2016-07-26 18:32 UTC (permalink / raw)
  To: Andrew Vagin
  Cc: Serge Hallyn, Andrey Vagin, Linux API, Linux Containers, LKML,
	Alexander Viro, criu-GEFAQzZX7r8dnm+yROfE0A,
	Michael Kerrisk (man-pages),
	linux-fsdevel, James Bottomley, Eric W. Biederman


[-- Attachment #1.1: Type: text/plain, Size: 569 bytes --]

On Tue, Jul 26, 2016 at 11:25:24AM -0700, Andrew Vagin wrote:
> Sure. If a process wants to compare two namespaces, it needs to get file
> descriptors for them (open /proc/PID/ns/XXX, use new ioctl-s, find a
> process which has them),
> and then it calls kcmp(pid1, pid2, KCMP_NSFD, ns_fd1, ns_fd2)

If you use the new ioctl-s to get ns_fd2, do you walk your local /proc
to find pid2?

Cheers,
Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-26 18:32                                         ` W. Trevor King
@ 2016-07-26 19:11                                             ` Andrew Vagin
  -1 siblings, 0 replies; 142+ messages in thread
From: Andrew Vagin @ 2016-07-26 19:11 UTC (permalink / raw)
  To: W. Trevor King
  Cc: Serge Hallyn, Andrey Vagin, Linux API, Linux Containers, LKML,
	Alexander Viro, criu,
	Michael Kerrisk (man-pages),
	linux-fsdevel, James Bottomley, Eric W. Biederman

On Tue, Jul 26, 2016 at 11:32:25AM -0700, W. Trevor King wrote:
> On Tue, Jul 26, 2016 at 11:25:24AM -0700, Andrew Vagin wrote:
> > Sure. If a process wants to compare two namespaces, it needs to get file
> > descriptors for them (open /proc/PID/ns/XXX, use new ioctl-s, find a
> > process which has them),
> > and then it calls kcmp(pid1, pid2, KCMP_NSFD, ns_fd1, ns_fd2)
> 
> If you use the new ioctl-s to get ns_fd2, do you walk your local /proc
> to find pid2?

If you use the new ioctl-s to get ns_fd2, you will have it in the
current process, so pid2 will be getpid().

pidX identifies the process in whose descriptor table fdX is looked up.

man 2 kcmp:
 The kcmp() system call can be used to check whether the two processes
 identified by pid1 and pid2 share a kernel resource such as virtual
 memory, file descriptors, and so on.
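
To illustrate the pid1/pid2 convention with something that runs on
current kernels, here is a minimal sketch that uses the existing
KCMP_FILE type rather than the proposed KCMP_NSFD. The helper names
sys_kcmp() and same_open_file() are made up for this example. glibc has
no kcmp() wrapper, so the call goes through syscall(2). The call can
fail with ENOSYS if the kernel was built without kcmp, and some
sandboxes may deny it with EPERM.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/kcmp.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: glibc ships no kcmp() wrapper, so use syscall(2). */
static int sys_kcmp(pid_t pid1, pid_t pid2, int type,
		    unsigned long idx1, unsigned long idx2)
{
	return (int)syscall(SYS_kcmp, pid1, pid2, type, idx1, idx2);
}

/*
 * Returns 1 if fd1 and fd2 of the calling process share one open file
 * description, 0 if they do not, and -1 on error (errno is set).
 */
static int same_open_file(int fd1, int fd2)
{
	pid_t self = getpid();	/* both descriptors live in this process */
	int ret = sys_kcmp(self, self, KCMP_FILE, fd1, fd2);

	if (ret < 0)
		return -1;
	return ret == 0;	/* kcmp() returns 0 when the objects match */
}
```

Note that KCMP_FILE compares open file descriptions, so two separate
open() calls on the same /proc/PID/ns/pid file would not match. That is
the gap the proposed KCMP_NSFD type is meant to fill; with it, the call
would presumably look like sys_kcmp(self, self, KCMP_NSFD, ns_fd1,
ns_fd2), but that constant does not exist in current headers.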

> 
> Cheers,
> Trevor
> 
> -- 
> This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
> For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
       [not found]                                       ` <20160726182524.GA328-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
  2016-07-26 18:32                                         ` W. Trevor King
@ 2016-07-26 19:17                                         ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 142+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-07-26 19:17 UTC (permalink / raw)
  To: Andrew Vagin
  Cc: Serge Hallyn, Andrey Vagin, Linux API, Linux Containers, LKML,
	criu, Eric W. Biederman, linux-fsdevel,
	James Bottomley, Alexander Viro

Hello Andrew,

On 26 July 2016 at 20:25, Andrew Vagin <avagin@virtuozzo.com> wrote:
> On Tue, Jul 26, 2016 at 10:03:25AM +0200, Michael Kerrisk (man-pages) wrote:
>> On 07/26/2016 04:54 AM, Andrew Vagin wrote:
>> > On Mon, Jul 25, 2016 at 09:59:43AM -0500, Eric W. Biederman wrote:
>> > > "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>> >
>> > [snip]
>> >
>> > > [snip]
>> > > > > > So, from my point of view, the important piece that was missing from
>> > > > > > your commit message was the note to use readlink("/proc/self/fd/%d")
>> > > > > > on the returned FDs. I think that detail needs to be part of the
>> > > > > > commit message (and also the man page text). I think it even be
>> > > > > > helpful to include the above program as part of the commit message:
>> > > > > > it helps people more quickly grasp the API.
>> > > > >
>> > > > > Please, please make the standard way to compare these things fstat.
>> > > > > That is much less magic than a symlink, and a little more future proof.
>> > > > > Possibly even kcmp.
>> >
>> > I like the idea to use kcmp to compare namespaces. I am going to add this
>> > functionality to kcmp and describe all these in the man page.
>>
>> Hi Andrey,
>>
>> Can you briefly sketch out the proposed API and how it would be used?
>> I'd find it useful to see that even before the implementation.
>
> Sure. If a process wants to compare two namespaces, it needs to get file
> descriptors for them (open /proc/PID/ns/XXX, use new ioctl-s, find a
> process which has them),
> and then it calls kcmp(pid1, pid2, KCMP_NSFD, ns_fd1, ns_fd2)
>
> For example, if we want to compare pid namespaces for 1 and 2 processes:
>

What's the purpose of the following line, and of the use of 'pid' in
the kcmp() call?

> pid = getpid();
> ns_fd1 = open("/proc/1/ns/pid")
> ns_fd2 = open("/proc/2/ns/pid")
>
> if (!kcmp(pid, pid, KCMP_NSFD, ns_fd1, ns_fd2))
>         printf("Both processes live in the same pid namespace\n");


Thanks,

Michael

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-26  2:54                                 ` Andrew Vagin
@ 2016-07-26 19:38                                     ` Eric W. Biederman
  -1 siblings, 0 replies; 142+ messages in thread
From: Eric W. Biederman @ 2016-07-26 19:38 UTC (permalink / raw)
  To: Andrew Vagin
  Cc: Serge Hallyn, Andrey Vagin, Linux API, Linux Containers, LKML,
	criu, Michael Kerrisk (man-pages),
	linux-fsdevel, James Bottomley, Alexander Viro

Andrew Vagin <avagin@virtuozzo.com> writes:

> On Mon, Jul 25, 2016 at 09:59:43AM -0500, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>
> [snip]
>
>> [snip]
>> >>> So, from my point of view, the important piece that was missing from
>> >>> your commit message was the note to use readlink("/proc/self/fd/%d")
>> >>> on the returned FDs. I think that detail needs to be part of the
>> >>> commit message (and also the man page text). I think it even be
>> >>> helpful to include the above program as part of the commit message:
>> >>> it helps people more quickly grasp the API.
>> >>
>> >> Please, please make the standard way to compare these things fstat.
>> >> That is much less magic than a symlink, and a little more future proof.
>> >> Possibly even kcmp.
>
> I like the idea to use kcmp to compare namespaces. I am going to add this
> functionality to kcmp and describe all these in the man page.
>
>> >
>> > As in fstat() to get the st_ino field, right?
>> 
>> Both the st_ino and st_dev fields.
>> 
>> The most likely change to support checkpoint/restart in the future is to
>> preserve st_ino across migrations and instantiate a different instance
>> of nsfs to hold the inode numbers from the previous machine.
>
> It sounds tricky. BTW: Actually this is not the only place where we have
> this sort of problem. For example, mount IDs are currently not preserved
> when a container is migrated. The same problem applies to tmpfs, where
> inode numbers are not preserved for files.

Agreed.

Interesting. Interesting. Interesting.

I am not completely convinced that improving kcmp solves it for
everything but improving kcmp sounds good enough to be very interesting
and enough to solve a practical case (migration in migration).  Plus
improving kcmp is cheap and easy.

I would propose:

KCMP_OBJECT
    Check whether a file descriptor idx1 in the process pid1 refers to
    the same underlying object as file descriptor idx2 in the process
    pid2.

The default case would be checking to see if two file descriptors refer
to the same inode.  But for weird cases (like proc pid directories, or
sysfs files) the comparison could look deeper.
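
For reference, the fstat()-based comparison of st_dev and st_ino
discussed above can already be done from user space today. A minimal
sketch follows; same_inode() and open_ns() are hypothetical helper
names, not part of any proposed kernel interface, and the sketch does
not cover the "look deeper" cases mentioned for a possible KCMP_OBJECT.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if both descriptors refer to the same inode on the same
 * device, 0 if they differ, and -1 if fstat() fails.
 */
static int same_inode(int fd1, int fd2)
{
	struct stat st1, st2;

	if (fstat(fd1, &st1) < 0 || fstat(fd2, &st2) < 0)
		return -1;
	return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}

/* Opens /proc/<pid>/ns/<name>, e.g. name = "pid"; returns -1 on failure. */
static int open_ns(pid_t pid, const char *name)
{
	char path[64];

	snprintf(path, sizeof(path), "/proc/%d/ns/%s", (int)pid, name);
	return open(path, O_RDONLY);
}
```

A caller would compare, say, open_ns(1, "pid") with open_ns(2, "pid")
and treat a result of 1 from same_inode() as "same pid namespace".
Opening another process's namespace files may require privileges.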

Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-26 19:17                                       ` Michael Kerrisk (man-pages)
@ 2016-07-26 20:39                                             ` Andrew Vagin
  0 siblings, 0 replies; 142+ messages in thread
From: Andrew Vagin @ 2016-07-26 20:39 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Serge Hallyn, Andrey Vagin, Linux API, Linux Containers, LKML,
	criu, Eric W. Biederman, linux-fsdevel,
	James Bottomley, Alexander Viro

On Tue, Jul 26, 2016 at 09:17:31PM +0200, Michael Kerrisk (man-pages) wrote:
> Hello Andrew,
> 
> On 26 July 2016 at 20:25, Andrew Vagin <avagin@virtuozzo.com> wrote:
> > On Tue, Jul 26, 2016 at 10:03:25AM +0200, Michael Kerrisk (man-pages) wrote:
> >> On 07/26/2016 04:54 AM, Andrew Vagin wrote:
> >> > On Mon, Jul 25, 2016 at 09:59:43AM -0500, Eric W. Biederman wrote:
> >> > > "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> >> >
> >> > [snip]
> >> >
> >> > > [snip]
> >> > > > > > So, from my point of view, the important piece that was missing from
> >> > > > > > your commit message was the note to use readlink("/proc/self/fd/%d")
> >> > > > > > on the returned FDs. I think that detail needs to be part of the
> >> > > > > > commit message (and also the man page text). I think it even be
> >> > > > > > helpful to include the above program as part of the commit message:
> >> > > > > > it helps people more quickly grasp the API.
> >> > > > >
> >> > > > > Please, please make the standard way to compare these things fstat.
> >> > > > > That is much less magic than a symlink, and a little more future proof.
> >> > > > > Possibly even kcmp.
> >> >
> >> > I like the idea to use kcmp to compare namespaces. I am going to add this
> >> > functionality to kcmp and describe all these in the man page.
> >>
> >> Hi Andrey,
> >>
> >> Can you briefly sketch out the proposed API and how it would be used?
> >> I'd find it useful to see that even before the implementation.
> >
> > Sure. If a process wants to compare two namespaces, it needs to get file
> > descriptors for them (open /proc/PID/ns/XXX, use new ioctl-s, find a
> > process which has them),
> > and then it calls kcmp(pid1, pid2, KCMP_NSFD, ns_fd1, ns_fd2)
> >
> > For example, if we want to compare pid namespaces for 1 and 2 processes:
> >
> 
> What's the purpose of the following line, and the use of 'pid' in the
> kcmp() call?:

It's the existing interface of kcmp. It's used to check whether the
two processes identified by pid1 and pid2 share a kernel resource
such as virtual memory, file descriptors, and so on.

Comparing two file descriptors of the current process is just one of
the cases in which kcmp can be used. We can also call kcmp to compare
two namespaces which are opened in other processes.

Thanks,
Andrew

> 
> > pid = getpid();
> > ns_fd1 = open("/proc/1/ns/pid")
> > ns_fd2 = open("/proc/2/ns/pid")
> >
> > if (!kcmp(pid, pid, KCMP_NSFD, ns_fd1, ns_fd2))
> >         printf("Both processes live in the same pid namespace\n");
> 
> Thanks,
> 
> Michael

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
       [not found]                                             ` <20160726203955.GA9415-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
@ 2016-07-28 10:45                                               ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 142+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-07-28 10:45 UTC (permalink / raw)
  To: Andrew Vagin
  Cc: Serge Hallyn, Andrey Vagin, Linux API, Linux Containers, LKML,
	Alexander Viro, criu,
	mtk.manpages, linux-fsdevel,
	James Bottomley, Eric W. Biederman

On 07/26/2016 10:39 PM, Andrew Vagin wrote:
> On Tue, Jul 26, 2016 at 09:17:31PM +0200, Michael Kerrisk (man-pages) wrote:
>> Hello Andrew,
>>
>> On 26 July 2016 at 20:25, Andrew Vagin <avagin@virtuozzo.com> wrote:
>>> On Tue, Jul 26, 2016 at 10:03:25AM +0200, Michael Kerrisk (man-pages) wrote:
>>>> On 07/26/2016 04:54 AM, Andrew Vagin wrote:
>>>>> On Mon, Jul 25, 2016 at 09:59:43AM -0500, Eric W. Biederman wrote:
>>>>>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>>>>>
>>>>> [snip]
>>>>>
>>>>>> [snip]
>>>>>>>>> So, from my point of view, the important piece that was missing from
>>>>>>>>> your commit message was the note to use readlink("/proc/self/fd/%d")
>>>>>>>>> on the returned FDs. I think that detail needs to be part of the
>>>>>>>>> commit message (and also the man page text). I think it even be
>>>>>>>>> helpful to include the above program as part of the commit message:
>>>>>>>>> it helps people more quickly grasp the API.
>>>>>>>>
>>>>>>>> Please, please make the standard way to compare these things fstat.
>>>>>>>> That is much less magic than a symlink, and a little more future proof.
>>>>>>>> Possibly even kcmp.
>>>>>
>>>>> I like the idea to use kcmp to compare namespaces. I am going to add this
>>>>> functionality to kcmp and describe all these in the man page.
>>>>
>>>> Hi Andrey,
>>>>
>>>> Can you briefly sketch out the proposed API and how it would be used?
>>>> I'd find it useful to see that even before the implementation.
>>>
>>> Sure. If a process wants to compare two namespaces, it needs to get file
>>> descriptors for them (open /proc/PID/ns/XXX, use new ioctl-s, find a
>>> process which has them),
>>> and then it calls kcmp(pid1, pid2, KCMP_NSFD, ns_fd1, ns_fd2)
>>>
>>> For example, if we want to compare pid namespaces for 1 and 2 processes:
>>>
>>
>> What's the purpose of the following line, and the use of 'pid' in the
>> kcmp() call?:
>
> It's the existing interface of kcmp.  It's used to check whether the
> two processes identified  by pid1  and  pid2 share a kernel resource
> such as virtual memory, file descriptors, and so on.


Yes, understood, but it seems a slightly weird use of the interface,
since in general pid1 will be the same as pid2 in this use case,
whereas in the other use cases, pid1 and pid2 are generally not
equal.

> If we want to compare two file descriptors of the current process,
> it is one of cases for which kcmp can be used. We can call kcmp to
> compare two namespaces which are opened in other processes.

Is there really a use case there? I assume we're talking about the
scenario where a process in one namespace opens a /proc/PID/ns/*
file descriptor and passes that FD to another process via a UNIX
domain socket. Is that correct?

So, suppose that we want to build a map of the relationships
between namespaces using the proposed kcmp() API, and there are,
say, N namespaces. Does this mean we make (N * (N-1) / 2) calls
to kcmp()?
Cheers,

Michael

>>> pid = getpid();
>>> ns_fd1 = open("/proc/1/ns/pid")
>>> ns_fd2 = open("/proc/2/ns/pid")
>>>
>>> if (!kcmp(pid, pid, KCMP_NSFD, ns_fd1, ns_fd2))
>>>         printf("Both processes live in the same pid namespace\n");
>>
>> Thanks,
>>
>> Michael
>


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-28 10:45                                               ` Michael Kerrisk (man-pages)
@ 2016-07-28 12:56                                                   ` Eric W. Biederman
  -1 siblings, 0 replies; 142+ messages in thread
From: Eric W. Biederman @ 2016-07-28 12:56 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Serge Hallyn, Andrew Vagin, Linux API, Linux Containers, LKML,
	criu, Alexander Viro, linux-fsdevel,
	James Bottomley, Andrey Vagin

"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:

> On 07/26/2016 10:39 PM, Andrew Vagin wrote:
>> On Tue, Jul 26, 2016 at 09:17:31PM +0200, Michael Kerrisk (man-pages) wrote:

>> If we want to compare two file descriptors of the current process,
>> it is one of the cases for which kcmp can be used. We can call kcmp to
>> compare two namespaces which are opened in other processes.
>
> Is there really a use case there? I assume we're talking about the
> scenario where a process in one namespace opens a /proc/PID/ns/*
> file descriptor and passes that FD to another process via a UNIX
> domain socket. Is that correct?
>
> So, suppose that we want to build a map of the relationships
> between namespaces using the proposed kcmp() API, and there are,
> say, N namespaces. Does this mean we make (N * (N-1) / 2) calls
> to kcmp()?

Potentially.  The numbers are small enough that O(N^2) isn't fatal.

Where kcmp shines is that it allows migration to happen: inode numbers
can change (which they very much will today) and things still work.

We can keep it O(N log(N)) by taking advantage of not just the equality
but the ordering relationship.  Although, ugh.  One disadvantage of
kcmp currently is that, the way the ordering relationship is defined,
the order is not preserved over migration :(
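
To make the complexity argument concrete, here is a minimal sketch of
grouping N handles with a sort instead of comparing every pair. The
struct ns_handle type, its key field, and cmp_handle() are hypothetical
stand-ins for whatever a kcmp-style three-way comparison would report;
the sketch assumes that comparison defines a consistent total order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical handle: `key` stands in for the hidden kernel identity
 * that a kcmp-style call would compare on our behalf. */
struct ns_handle {
	int fd;
	uint64_t key;
};

/* Stand-in for a three-way comparison (an assumption, not a real
 * kernel interface): negative, zero or positive, like strcmp(). */
static int cmp_handle(const void *a, const void *b)
{
	const struct ns_handle *x = a, *y = b;

	return (x->key > y->key) - (x->key < y->key);
}

/* Sort the handles, then count runs of equal neighbours.  A typical
 * qsort() needs on the order of N log N comparisons, instead of the
 * N * (N - 1) / 2 needed when only equality is available. */
static size_t count_distinct(struct ns_handle *h, size_t n)
{
	size_t i, distinct;

	assert(h != NULL || n == 0);
	if (n == 0)
		return 0;

	qsort(h, n, sizeof(*h), cmp_handle);

	distinct = 1;
	for (i = 1; i < n; i++)
		if (cmp_handle(&h[i - 1], &h[i]) != 0)
			distinct++;

	return distinct;
}
```

If the ordering changes across a migration, the sorted result has to be
recomputed afterwards, which is the drawback noted above.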

Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
       [not found]                                                   ` <87popxkjjp.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
@ 2016-07-28 19:00                                                     ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 142+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-07-28 19:00 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: James Bottomley, Andrey Vagin, Andrew Vagin, Linux API,
	Linux Containers, LKML, Alexander Viro,
	criu, mtk.manpages,
	linux-fsdevel

Hi Eric,

On 07/28/2016 02:56 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>
>> On 07/26/2016 10:39 PM, Andrew Vagin wrote:
>>> On Tue, Jul 26, 2016 at 09:17:31PM +0200, Michael Kerrisk (man-pages) wrote:
>
>>> If we want to compare two file descriptors of the current process,
>>> it is one of the cases for which kcmp can be used. We can call kcmp to
>>> compare two namespaces which are opened in other processes.
>>
>> Is there really a use case there? I assume we're talking about the
>> scenario where a process in one namespace opens a /proc/PID/ns/*
>> file descriptor and passes that FD to another process via a UNIX
>> domain socket. Is that correct?
>>
>> So, suppose that we want to build a map of the relationships
>> between namespaces using the proposed kcmp() API, and there are,
>> say, N namespaces. Does this mean we make (N * (N-1) / 2) calls
>> to kcmp()?
>
> Potentially.  The numbers are small enough that O(N^2) isn't fatal.

Define "small", please.

O(N^2) makes me nervous about what other use cases lurk out
there that may get bitten by this.

> Where kcmp shines is that it allows migration to happen: inode numbers
> can change (which they very much will today) and things still work.

> We can keep it O(N log(N)) by taking advantage of not just the equality
> but the ordering relationship.  Although, ugh.

Yes, that sounds pretty ugly...

> One disadvantage of
> kcmp currently is that, the way the ordering relationship is defined,
> the order is not preserved over migration :(

So, does kcmp() fully solve the problem(s) at hand? It sounds like
not, if I understand your last point correctly.


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
       [not found]                                                     ` <40e35f1a-10e6-b7a5-936e-a09f008be0d0-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2016-07-29 18:05                                                       ` Eric W. Biederman
  0 siblings, 0 replies; 142+ messages in thread
From: Eric W. Biederman @ 2016-07-29 18:05 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: James Bottomley, Andrew Vagin, Linux API, Linux Containers, LKML,
	criu, Alexander Viro, Andrey Vagin,
	linux-fsdevel

"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:

> Hi Eric,
>
> On 07/28/2016 02:56 PM, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>>
>>> On 07/26/2016 10:39 PM, Andrew Vagin wrote:
>>>> On Tue, Jul 26, 2016 at 09:17:31PM +0200, Michael Kerrisk (man-pages) wrote:
>>
>>>> If we want to compare two file descriptors of the current process,
>>>> it is one of the cases for which kcmp can be used. We can call kcmp to
>>>> compare two namespaces which are opened in other processes.
>>>
>>> Is there really a use case there? I assume we're talking about the
>>> scenario where a process in one namespace opens a /proc/PID/ns/*
>>> file descriptor and passes that FD to another process via a UNIX
>>> domain socket. Is that correct?
>>>
>>> So, suppose that we want to build a map of the relationships
>>> between namespaces using the proposed kcmp() API, and there are,
>>> say, N namespaces. Does this mean we make (N * (N-1) / 2) calls
>>> to kcmp()?
>>
>> Potentially.  The numbers are small enough that O(N^2) isn't fatal.
>
> Define "small", please.
>
> O(N^2) makes me nervous about what other use cases lurk out
> there that may get bitten by this.

The worst case for N (one namespace per thread) is about 60k.
A typical heavy use case may be 1000 namespaces of any type.
So we are talking about an O(N^2) operation that rarely happens and
should be done in a couple of seconds.
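
As a worked check of those figures, the number of pairwise comparisons
for N objects is N * (N - 1) / 2. A small sketch (pair_count() is a
hypothetical helper name):

```c
#include <assert.h>
#include <stdint.h>

/* Number of unordered pairs among n objects: n * (n - 1) / 2.
 * Restricting n to 32 bits keeps the product within 64 bits. */
static uint64_t pair_count(uint64_t n)
{
	assert(n <= UINT32_MAX);

	return n < 2 ? 0 : n * (n - 1) / 2;
}
```

With the numbers above, pair_count(1000) is 499,500 comparisons and
pair_count(60000) is 1,799,970,000.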

>> Where kcmp shines is that it allows migration to happen: inode numbers
>> can change (which they very much will today) and things still work.
>
>
>> We can keep it O(N log(N)) by taking advantage of not just the equality
>> but the ordering relationship.  Although, ugh.
>
> Yes, that sounds pretty ugly...

Actually, having thought about this a little more: if kcmp returns an
ordering by inode, and migration preserves the relative order of
the inodes (which should just be a creation order), it should be quite
solvable.

Switch from an order by inode number to an order by object creation
time, and guarantee that all creations have an order (which with
tasklist_lock we practically already have), and it should be even easier
to create.  (A 64bit nanosecond resolution timestamp is good for about
584 years of uptime.)  A 64bit number that increments each time an object
is created should have an even better lifespan.
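
A minimal userspace sketch of the creation-counter idea, assuming C11
atomics. The struct seq_obj type and both helpers are hypothetical; the
result codes follow the convention documented in kcmp(2) (0 equal,
1 less than, 2 greater than). A kernel implementation would look
different.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical object carrying a creation sequence number. */
struct seq_obj {
	uint64_t seq;
};

/* Global counter: each new object takes the next value, so creation
 * order is a total order across all objects. */
static atomic_uint_fast64_t next_seq = 1;

static void seq_obj_init(struct seq_obj *o)
{
	assert(o);
	o->seq = atomic_fetch_add(&next_seq, 1);
}

/* Result codes in the style documented for kcmp(2):
 * 0 = equal, 1 = a created before b, 2 = a created after b. */
static int seq_obj_cmp(const struct seq_obj *a, const struct seq_obj *b)
{
	assert(a && b);

	if (a->seq == b->seq)
		return 0;
	return a->seq < b->seq ? 1 : 2;
}
```

Only the relative order matters here, so a restore that recreates the
objects in the same order gets the same comparison results even though
the counter values themselves differ.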

I don't know if we can find a way to give that guarantee for other kcmp
comparisons but it is worth a thought.

>> One disadvantage of
>> kcmp currently is that, the way the ordering relationship is defined,
>> the order is not preserved over migration :(
>
> So, does kcmp() fully solve the problem(s) at hand? It sounds like
> not, if I understand your last point correctly.

I see 3 possibilities for migration within a migration (nested
migration), listed in order of implementation difficulty:
1) Have a clear signal that migration happened and a nested migration
   needs to restart.
2) Use kcmp so that only the relative order needs to be preserved.
3) Preserve the device number and inode numbers.

At a practical level, I think (2) may actually be the simplest on net.
It requires a little more care to implement and you have to opt in,
but it should not require any rolling back of activity (merely careful
ordering of object creation).

I definitely like kcmp knowing how to compare things by inode
(aka st_dev, st_ino) because then, even if you have to restart
the comparisons after a migration, the exact details you are comparing
are hidden, and so it is easier to support and harder to get wrong.
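
For reference, the userspace comparison that such a kcmp mode would hide
looks roughly like the sketch below, which uses fstat() and compares
(st_dev, st_ino). The helper names same_inode() and same_path_inode()
are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* 1 if both descriptors refer to the same inode on the same device,
 * 0 if they differ, -1 if fstat() fails on either descriptor. */
static int same_inode(int fd1, int fd2)
{
	struct stat st1, st2;

	if (fstat(fd1, &st1) < 0 || fstat(fd2, &st2) < 0)
		return -1;

	return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}

/* Convenience wrapper: open both paths read-only and compare them. */
static int same_path_inode(const char *p1, const char *p2)
{
	int fd1, fd2, ret;

	assert(p1 && p2);

	fd1 = open(p1, O_RDONLY);
	if (fd1 < 0)
		return -1;
	fd2 = open(p2, O_RDONLY);
	if (fd2 < 0) {
		close(fd1);
		return -1;
	}

	ret = same_inode(fd1, fd2);
	close(fd1);
	close(fd2);
	return ret;
}
```

Calling same_path_inode("/proc/1/ns/pid", "/proc/2/ns/pid") would answer
the question from earlier in the thread, but it exposes st_dev and st_ino
to the caller, which is exactly what breaks if those numbers change
across a migration.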

I can imagine how to preserve inode numbers by creating a new nsfs
instance and using the old inode numbers upon restore.  I don't
currently see how we could possibly preserve st_dev over migration short of
a device number namespace.

So if we are going to continue making device numbers a legacy
attribute that applications should not care about, we need a way to
compare things without looking at st_dev.  Which brings us back to kcmp.

Hmm.  Unplugging a disk and plugging it back in will likely change the
device number and give the same kind of challenge with st_dev (although
you can't keep a file descriptor open across that kind of event).  So
certainly a hotplug event on a device should be enough to say: don't care
about the device number.

Eric

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-29 18:05                                                       ` Eric W. Biederman
@ 2016-07-31 21:31                                                           ` Michael Kerrisk (man-pages)
  -1 siblings, 0 replies; 142+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-07-31 21:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: James Bottomley, Andrey Vagin, Andrew Vagin, Linux API,
	Linux Containers, LKML, Alexander Viro,
	criu, mtk.manpages,
	linux-fsdevel

Hi Eric,

On 07/29/2016 08:05 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>
>> Hi Eric,
>>
>> On 07/28/2016 02:56 PM, Eric W. Biederman wrote:
>>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>>>
>>>> On 07/26/2016 10:39 PM, Andrew Vagin wrote:
>>>>> On Tue, Jul 26, 2016 at 09:17:31PM +0200, Michael Kerrisk (man-pages) wrote:
>>>
>>>>> If we want to compare two file descriptors of the current process,
>>>>> it is one of the cases for which kcmp can be used. We can call kcmp to
>>>>> compare two namespaces which are opened in other processes.
>>>>
>>>> Is there really a use case there? I assume we're talking about the
>>>> scenario where a process in one namespace opens a /proc/PID/ns/*
>>>> file descriptor and passes that FD to another process via a UNIX
>>>> domain socket. Is that correct?
>>>>
>>>> So, suppose that we want to build a map of the relationships
>>>> between namespaces using the proposed kcmp() API, and there are,
>>>> say, N namespaces. Does this mean we make (N * (N-1) / 2) calls
>>>> to kcmp()?
>>>
>>> Potentially.  The numbers are small enough that O(N^2) isn't fatal.
>>
>> Define "small", please.
>>
>> O(N^2) makes me nervous about what other use cases lurk out
>> there that may get bitten by this.
>
> The worst case for N (one namespace per thread) is about 60k.

I'm getting an education here: where does the 60k number come from?

> A typical heavy use case may be 1000 namespaces of any type.
> So we are talking about an O(N^2) operation that rarely happens and
> should be done in a couple of seconds.

I don't know whether that's acceptable for the migration use case,
but it seems quite bad for the visualization use case.

>>> Where kcmp shines is that it allows migration to happen: inode numbers
>>> can change (which they very much will today) and things still work.
>>
>>
>>> We can keep it O(N log(N)) by taking advantage of not just the equality
>>> but the ordering relationship.  Although, ugh.
>>
>> Yes, that sounds pretty ugly...
>
> Actually, having thought about this a little more: if kcmp returns an
> ordering by inode, and migration preserves the relative order of
> the inodes (which should just be a creation order), it should be quite
> solvable.
>
> Switch from an order by inode number to an order by object creation
> time, and guarantee that all creations have an order (which with
> tasklist_lock we practically already have), and it should be even easier
> to create.  (A 64bit nanosecond resolution timestamp is good for about
> 584 years of uptime.)  A 64bit number that increments each time an object
> is created should have an even better lifespan.
>
> I don't know if we can find a way to give that guarantee for other kcmp
> comparisons but it is worth a thought.

Okay. So, this is at least a pathway to O(N log(N)), then?

>>> One disadvantage of
>>> kcmp currently is that the way the ordering relationship is defined
>>> the order is not preserved over migration :(
>>
>> So, does kcmp() fully solve the problem(s) at hand? It sounds like
>> not, if I understand your last point correctly.
>
> There are 3 possibilities I see for migration within a migration, listed
> in order of implementation difficulty.
> 1) Have a clear signal that migration happened and a nested migration
>    needs to restart.
> 2) Use kcmp so that only the relative order needs to be preserved.
> 3) Preserve the device number and inode numbers.
>
> At a practical level I think (2) may actually in net be the simplest.
> It requires a little more care to implement and you have to opt in,
> but it should not require any rolling back of activity (merely careful
> ordering of object creation).
>
> I definitely like kcmp knowing how to compare things by inode
> (aka st_dev, st_ino) because then even if you have to restart
> the comparisons after a migration the exact details you are comparing
> are hidden and so it is easier to support and harder to get wrong.
>
> I can imagine how to preserve inode numbers by creating a new
> nsfs instance and using the old inode numbers upon restore.  I don't
> currently see how we could possibly preserve st_dev over migration short of
> a device number namespace.
>
> So if we are going to continue with making device numbers a legacy
> attribute that applications should not care about, we need a way to compare
> things without looking at st_dev.  Which brings us back to kcmp.
>
> Hmm.  Unplugging a disk and plugging it back in will likely change the
> device number and give the same kind of challenge with st_dev (although
> you can't keep a file descriptor open across that kind of event).  So
> certainly a hotplug event on a device should be enough to say don't care
> about the device number.

Okay.

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-14 18:20 ` Andrey Vagin
@ 2016-08-01 18:20     ` Alban Crequy
  -1 siblings, 0 replies; 142+ messages in thread
From: Alban Crequy @ 2016-08-01 18:20 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: Serge Hallyn, criu-GEFAQzZX7r8dnm+yROfE0A,
	iago-lYLaGTFnO9sWenYVfaLwtA, Linux API, Linux Containers,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, James Bottomley,
	Alban Crequy, Alexander Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk (man-pages),
	Eric W. Biederman

Hi,

On 14 July 2016 at 20:20, Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> wrote:
> Each namespace has an owning user namespace, and currently there is no way
> to discover these relationships.
>
> Pid and user namespaces are hierarchical. There is no way to discover
> parent-child relationships either.
>
> Why might we want to know the relationships between namespaces?
>
> One use would be visualization, in order to understand the running system.

This looks interesting to me because I am interested in representing
in a graphical way the relationship between different mounts in
different mount namespaces (showing the ID, the parent-children
relationships, mount peer groups, the master-slave relationships etc),
especially for containers. My first idea was to take both
/proc/1/mountinfo and /proc/$OTHER_PID/mountinfo and correlate
the "shared:" and "master:" fields in the mountinfo files.
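
A minimal sketch of that correlation, under stated assumptions: the
function names are hypothetical, only the "shared:" and "master:" optional
fields are read, and the parser also accepts abbreviated lines that omit
the "-" separator and the fields after it.

```python
def parse_propagation(line):
    """Extract the mount ID and propagation tags from one mountinfo line.

    Optional fields sit between the mount options (6th field) and the
    single "-" separator; the separator may be absent in abbreviated lines.
    """
    fields = line.split()
    mount_id, parent_id = int(fields[0]), int(fields[1])
    optional = fields[6:]
    if "-" in optional:
        optional = optional[:optional.index("-")]
    tags = {}
    for field in optional:
        name, sep, value = field.partition(":")
        if sep and name in ("shared", "master"):
            tags[name] = int(value)
    return {"id": mount_id, "parent": parent_id,
            "shared": tags.get("shared"), "master": tags.get("master")}


def link_slaves_to_masters(mounts):
    """Map each slave mount ID to the mount IDs in its master peer group."""
    by_group = {}
    for m in mounts:
        if m["shared"] is not None:
            by_group.setdefault(m["shared"], []).append(m["id"])
    return {m["id"]: by_group.get(m["master"], [])
            for m in mounts if m["master"] is not None}
```

Feeding it the lines of every mountinfo file that can be read gives one
slave-to-master map across namespaces. A slave whose master peer group was
not seen in any readable file maps to an empty list, which is exactly the
gap described next.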

But I cannot read the /proc/$pid/mountinfo of mount namespaces when
there are no processes in those mount namespaces. This happens, for
example, if those mount namespaces stay alive only because they contain
"shared&slave" mounts between master mounts and slave mounts that I
can see in /proc/$pid/mountinfo. Fictional example:

# mntns 1, mountinfo 1 (visible via /proc/1/mountinfo)
61 0 253:1 / / rw shared:1

# mntns 2, mountinfo 2 (not visible via any /proc/$pid/mountinfo)
731 569 0:75 / / rw master:1 shared:42

# mntns 3, mountinfo 3 (not visible via any /proc/${container_pid}/mountinfo)
762 597 0:82 / / rw master:42 shared:76

As far as I understand, I cannot get a reference to the mntns2 fd
because mnt namespaces are not hierarchical, and I cannot get its
/proc/???/mountinfo because no processes live inside.

Is there a way around it? Should this use case be handled together?

Thanks!
Alban

> Another would be to answer the question: what capability does process X have to
> perform operations on a resource governed by namespace Y?
>
> One more use-case (which is usually called abnormal) is checkpoint/restart.
> In CRIU we are going to dump and restore nested namespaces.
>
> There [1] was a discussion about which interface to choose to determine
> relationships between namespaces.
>
> Eric suggested to add two ioctl-s [2]:
>> Grumble, Grumble.  I think this may actually be a case for creating ioctls
>> for these two cases.  Now that random nsfs file descriptors are bind
>> mountable the original reason for using proc files is not as pressing.
>>
>> One ioctl for the user namespace that owns a file descriptor.
>> One ioctl for the parent namespace of a namespace file descriptor.
>
> Here is an implementation of these ioctl-s.
>
> [1] https://lkml.org/lkml/2016/7/6/158
> [2] https://lkml.org/lkml/2016/7/9/101
>
> Cc: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> Cc: James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: "W. Trevor King" <wking-vJI2gpByivqcqzYg7KEe8g@public.gmane.org>
> Cc: Alexander Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
> Cc: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
>
> --
> 2.5.5
>
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-07-29 18:05                                                       ` Eric W. Biederman
@ 2016-08-01 23:01                                                           ` Andrew Vagin
  -1 siblings, 0 replies; 142+ messages in thread
From: Andrew Vagin @ 2016-08-01 23:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: James Bottomley, Andrey Vagin, Linux API, Linux Containers, LKML,
	Alexander Viro, criu-GEFAQzZX7r8dnm+yROfE0A,
	Michael Kerrisk (man-pages),
	linux-fsdevel

On Fri, Jul 29, 2016 at 01:05:48PM -0500, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> 
> > Hi Eric,
> >
> > On 07/28/2016 02:56 PM, Eric W. Biederman wrote:
> >> "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> >>
> >>> On 07/26/2016 10:39 PM, Andrew Vagin wrote:
> >>>> On Tue, Jul 26, 2016 at 09:17:31PM +0200, Michael Kerrisk (man-pages) wrote:
> >>
> >>>> If we want to compare two file descriptors of the current process,
> >>>> it is one of cases for which kcmp can be used. We can call kcmp to
> >>>> compare two namespaces which are opened in other processes.
> >>>
> >>> Is there really a use case there? I assume we're talking about the
> >>> scenario where a process in one namespace opens a /proc/PID/ns/*
> >>> file descriptor and passes that FD to another process via a UNIX
> >>> domain socket. Is that correct?
> >>>
> >>> So, supposing that we want to build a map of the relationships
> >>> between namespaces using the proposed kcmp() API, and there are
> >>> say N namespaces? Does this mean we make (N * (N-1) / 2) calls
> >>> to kcmp()?
> >>
> >> Potentially.  The numbers are small enough O(N^2) isn't fatal.
> >
> > Define "small", please.
> >
> > O(N^2) makes me nervous about what other use cases lurk out
> > there that may get bitten by this.
> 
> Worst case for N (One namespace per thread) is about 60k.
> A typical heavy use case may be 1000 namespaces of any type.
> So we are talking about O(N^2) that rarely happens and should be done in
> a couple of seconds.
> 
> >> Where kcmp shines is that it allows migration to happen, inode numbers
> >> to change (which they very much will today), and still have things work.
> >
> >
> >> We can keep it O(Nlog(N)) by taking advantage of not just the equality
> >> but the ordering relationship.  Although Ugh.
> >
> > Yes, that sounds pretty ugly...
> 
> Actually having thought about this a little more if kcmp returns an
> ordering by inode and migration preserves the relative order of
> the inodes (which should just be a creation order) it should be quite
> solvable.
> 
> Switch from an order by inode number to an order by object creation
> time, and guarantee that all creations have an order (which with
> tasklist_lock we practically already have) and it should be even easier
> to create.  (A 64bit nanosecond resolution timestamp is good for 584
> years of uptime).  A 64bit number that increments each time an object is
> created should have an even better lifespan.
> 
> I don't know if we can find a way to give that guarantee for other kcmp
> comparisons but it is worth a thought.
> 
> >>One disadvantage of
> >> kcmp currently is that the way the ordering relationship is defined
> >> the order is not preserved over migration :(
> >
> > So, does kcmp() fully solve the problem(s) at hand? It sounds like
> > not, if I understand your last point correctly.
> 
> There are 3 possibilities I see for migration within a migration, listed
> in order of implementation difficulty.
> 1) Have a clear signal that migration happened and a nested migration
>    needs to restart.
> 2) Use kcmp so that only the relative order needs to be preserved.
> 3) Preserve the device number and inode numbers.
> 
> At a practical level I think (2) may actually in net be the simplest.
> It requires a little more care to implement and you have to opt in,
> but it should not require any rolling back of activity (merely careful
> ordering of object creation).
> 
> I definitely like kcmp knowing how to compare things by inode
> (aka st_dev, st_ino) because then even if you have to restart
> the comparisons after a migration the exact details you are comparing
> are hidden and so it is easier to support and harder to get wrong.
> 
> I can imagine how to preserve inode numbers by creating a new
> nsfs instance and using the old inode numbers upon restore.  I don't
> currently see how we could possibly preserve st_dev over migration short of
> a device number namespace.

I think we can avoid comparing st_dev if we compare the inode numbers
of the parent user namespaces.

Namespaces look like a tree where user namespaces are directories and
other namespaces are files.

A namespace can be described by a path in this imaginary file system,
which looks like /userns1/userns2/XXXns.

In this case we need to guarantee that names are unique inside each
directory and that they will not change over migration.
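
As a rough illustration of this naming scheme (a sketch only, with
hypothetical names; nothing like this exists in the patches), a path can be
built by walking from a namespace up through its owning user namespaces.
The `owner` mapping is assumed to have been collected by some other means.

```python
def namespace_path(name, owner):
    """Return the /userns1/userns2/XXXns style path for a namespace.

    `owner` maps a namespace name to the name of its owning user
    namespace; the topmost user namespace maps to None or has no entry.
    """
    parts = [name]
    seen = {name}
    current = owner.get(name)
    while current is not None:
        if current in seen:
            # Ownership must form a tree; a loop means the input is broken.
            raise ValueError("ownership cycle at %r" % current)
        seen.add(current)
        parts.append(current)
        current = owner.get(current)
    return "/" + "/".join(reversed(parts))


# Hypothetical example: a network namespace owned by a nested user namespace.
owner = {"userns1": None, "userns2": "userns1", "netns": "userns2"}
example_path = namespace_path("netns", owner)  # "/userns1/userns2/netns"
```

Two namespaces are then the same object exactly when their paths are
equal, provided the per-directory names are unique and stable, which is the
guarantee asked for above.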

> 
> So if we are going to continue with making device numbers a legacy
> attribute that applications should not care about, we need a way to compare
> things without looking at st_dev.  Which brings us back to kcmp.
> 
> Hmm.  Unplugging a disk and plugging it back in will likely change the
> device number and give the same kind of challenge with st_dev (although
> you can't keep a file descriptor open across that kind of event).  So
> certainly a hotplug event on a device should be enough to say don't care
> about the device number.
> 
> Eric
> 

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
  2016-08-01 18:20     ` Alban Crequy
@ 2016-08-01 23:32         ` Andrew Vagin
  -1 siblings, 0 replies; 142+ messages in thread
From: Andrew Vagin @ 2016-08-01 23:32 UTC (permalink / raw)
  To: Alban Crequy
  Cc: Serge Hallyn, Andrey Vagin, criu-GEFAQzZX7r8dnm+yROfE0A,
	iago-lYLaGTFnO9sWenYVfaLwtA, Linux API, Linux Containers,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, James Bottomley,
	Alban Crequy, Alexander Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk (man-pages),
	Eric W. Biederman

On Mon, Aug 01, 2016 at 08:20:27PM +0200, Alban Crequy wrote:
> Hi,
> 
> On 14 July 2016 at 20:20, Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> wrote:
> > Each namespace has an owning user namespace, and currently there is no way
> > to discover these relationships.
> >
> > Pid and user namespaces are hierarchical. There is no way to discover
> > parent-child relationships either.
> >
> > Why might we want to know the relationships between namespaces?
> >
> > One use would be visualization, in order to understand the running system.
> 
> This looks interesting to me because I am interested in representing
> in a graphical way the relationship between different mounts in
> different mount namespaces (showing the ID, the parent-children
> relationships, mount peer groups, the master-slave relationships etc),
> especially for containers. My first idea was to take both
> /proc/1/mountinfo and /proc/$OTHER_PID/mountinfo and correlate
> the "shared:" and "master:" fields in the mountinfo files.
> 
> But I cannot read the /proc/$pid/mountinfo of mount namespaces when
> there are no processes in those mount namespaces. This happens, for
> example, if those mount namespaces stay alive only because they contain
> "shared&slave" mounts between master mounts and slave mounts that I
> can see in /proc/$pid/mountinfo. Fictional example:
> 
> # mntns 1, mountinfo 1 (visible via /proc/1/mountinfo)
> 61 0 253:1 / / rw shared:1
> 
> # mntns 2, mountinfo 2 (not visible via any /proc/$pid/mountinfo)
> 731 569 0:75 / / rw master:1 shared:42
> 
> # mntns 3, mountinfo 3 (not visible via any /proc/${container_pid}/mountinfo)
> 762 597 0:82 / / rw master:42 shared:76
> 
> As far as I understand, I cannot get a reference to the mntns2 fd
> because mnt namespaces are not hierarchical, and I cannot get its
> /proc/???/mountinfo because no processes live inside.

Hi Alban,

A mount namespace is alive only if someone lives in it or if it is
bind-mounted somewhere.

In your case, the kernel destroys mntns2 and adjusts groups for mounts:

[root@fc24 zzz]# nsenter --mount=mnt2 -- cat /proc/self/mountinfo | grep zzz
184 183 0:43 / /tmp/zzz/a rw,relatime shared:72 master:70 - tmpfs a rw
[root@fc24 zzz]# nsenter --mount=mnt3 -- cat /proc/self/mountinfo | grep zzz
162 161 0:43 / /tmp/zzz/a rw,relatime master:72 - tmpfs a rw

[root@fc24 zzz]# umount mnt2
[root@fc24 zzz]# nsenter --mount=mnt3 -- cat /proc/self/mountinfo | grep zzz
162 161 0:43 / /tmp/zzz/a rw,relatime master:70 - tmpfs a rw

Thanks,
Andrew

> 
> Is there a way around it? Should this use case be handled together?
> 
> Thanks!
> Alban
> 
> > Another would be to answer the question: what capability does process X have to
> > perform operations on a resource governed by namespace Y?
> >
> > One more use-case (which is usually called abnormal) is checkpoint/restart.
> > In CRIU we are going to dump and restore nested namespaces.
> >
> > There [1] was a discussion about which interface to choose to determine
> > relationships between namespaces.
> >
> > Eric suggested to add two ioctl-s [2]:
> >> Grumble, Grumble.  I think this may actually be a case for creating ioctls
> >> for these two cases.  Now that random nsfs file descriptors are bind
> >> mountable the original reason for using proc files is not as pressing.
> >>
> >> One ioctl for the user namespace that owns a file descriptor.
> >> One ioctl for the parent namespace of a namespace file descriptor.
> >
> > Here is an implementation of these ioctl-s.
> >
> > [1] https://lkml.org/lkml/2016/7/6/158
> > [2] https://lkml.org/lkml/2016/7/9/101
> >
> > Cc: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> > Cc: James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
> > Cc: "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> > Cc: "W. Trevor King" <wking-vJI2gpByivqcqzYg7KEe8g@public.gmane.org>
> > Cc: Alexander Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
> > Cc: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> >
> > --
> > 2.5.5
> >
> > _______________________________________________
> > Containers mailing list
> > Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> > https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
@ 2016-08-01 23:32         ` Andrew Vagin
  0 siblings, 0 replies; 142+ messages in thread
From: Andrew Vagin @ 2016-08-01 23:32 UTC (permalink / raw)
  To: Alban Crequy
  Cc: Andrey Vagin, linux-kernel, James Bottomley, Serge Hallyn,
	Linux API, Linux Containers, Alexander Viro, criu,
	Eric W. Biederman, linux-fsdevel, Michael Kerrisk (man-pages),
	iago, Alban Crequy

On Mon, Aug 01, 2016 at 08:20:27PM +0200, Alban Crequy wrote:
> Hi,
> 
> On 14 July 2016 at 20:20, Andrey Vagin <avagin@openvz.org> wrote:
> > Each namespace has an owning user namespace, and currently there is no way
> > to discover these relationships.
> >
> > Pid and user namespaces are hierarchical. There is no way to discover
> > parent-child relationships either.
> >
> > Why might we want to know the relationships between namespaces?
> >
> > One use would be visualization, in order to understand the running system.
> 
> This looks interesting to me because I am interested in representing
> in a graphical way the relationship between different mounts in
> different mount namespaces (showing the ID, the parent-children
> relationships, mount peer groups, the master-slave relationships etc),
> especially for containers. The first idea was to take both
> /proc/1/mountinfo and /proc/$OTHER_PID/mountinfo and I can correlate
> the "shared:" and "master:" fields in the mountinfo files.
> 
> But I cannot read the /proc/$pid/mountinfo of mount namespaces when
> there are no processes in those mount namespaces. For example, if
> those mount namespaces stay alive only because they contain
> "shared&slave" mounts between master mounts and slave mounts that I
> can see in /proc/$pid/mountinfo. Fictional example:
> 
> # mntns 1, mountinfo 1 (visible via /proc/1/mountinfo)
> 61 0 253:1 / / rw shared:1
> 
> # mntns 2, mountinfo 2 (not visible via any /proc/$pid/mountinfo)
> 731 569 0:75 / / rw master:1 shared:42
> 
> # mntns 3, mountinfo 3 (not visible via any /proc/${container_pid}/mountinfo)
> 762 597 0:82 / / rw master:42 shared:76
> 
> As far as I understand, I cannot get a reference to the mntns2 fd
> because mnt namespaces are not hierarchical, and I cannot get its
> /proc/???/mountinfo because no processes live inside.

Hi Alban,

A mount namespace is alive only if someone lives in it or if it is
bind-mounted somewhere.

In your case, the kernel destroys mntns2 and adjusts groups for mounts:

[root@fc24 zzz]# nsenter --mount=mnt2 -- cat /proc/self/mountinfo | grep zzz
184 183 0:43 / /tmp/zzz/a rw,relatime shared:72 master:70 - tmpfs a rw
[root@fc24 zzz]# nsenter --mount=mnt3 -- cat /proc/self/mountinfo | grep zzz
162 161 0:43 / /tmp/zzz/a rw,relatime master:72 - tmpfs a rw

[root@fc24 zzz]# umount mnt2
[root@fc24 zzz]# nsenter --mount=mnt3 -- cat /proc/self/mountinfo | grep zzz
162 161 0:43 / /tmp/zzz/a rw,relatime master:70 - tmpfs a rw
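
The group adjustment in the transcript above can also be checked by
parsing the optional fields of each mountinfo line. What follows is only
a minimal sketch and not part of this patch series; the helper name
`propagation` is hypothetical, and it assumes the usual mountinfo layout
in which the optional "shared:N" / "master:N" tags sit between the mount
options and the "-" separator.

```python
def propagation(line):
    """Return (mount_id, shared_group, master_group) for one mountinfo line.

    Hypothetical helper: a group is None when the tag is absent.
    """
    fields = line.split()
    # Optional fields start after the mount options (index 5) and end
    # at the "-" separator that precedes the filesystem type.
    sep = fields.index("-")
    tags = dict(f.split(":", 1) for f in fields[6:sep] if ":" in f)
    shared = int(tags["shared"]) if "shared" in tags else None
    master = int(tags["master"]) if "master" in tags else None
    return int(fields[0]), shared, master


# Lines copied from the transcript above.
mnt2_before = "184 183 0:43 / /tmp/zzz/a rw,relatime shared:72 master:70 - tmpfs a rw"
mnt3_before = "162 161 0:43 / /tmp/zzz/a rw,relatime master:72 - tmpfs a rw"
mnt3_after = "162 161 0:43 / /tmp/zzz/a rw,relatime master:70 - tmpfs a rw"

for label, line in [("mnt2 before", mnt2_before),
                    ("mnt3 before", mnt3_before),
                    ("mnt3 after", mnt3_after)]:
    print(label, propagation(line))
```

Before the umount, the mount in mnt3 is a slave of peer group 72, which
is itself a slave of group 70; once mnt2 is gone, the same mount (ID 162)
reports group 70 as its master directly.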

Thanks,
Andrew

> 
> Is there a way around it? Should this use case be handled together?
> 
> Thanks!
> Alban
> 
> > Another would be to answer the question: what capability does process X have to
> > perform operations on a resource governed by namespace Y?
> >
> > One more use-case (which is usually called abnormal) is checkpoint/restart.
> > In CRIU we are going to dump and restore nested namespaces.
> >
> > There [1] was a discussion about which interface to choose to determine
> > relationships between namespaces.
> >
> > Eric suggested to add two ioctl-s [2]:
> >> Grumble, Grumble.  I think this may actually be a case for creating ioctls
> >> for these two cases.  Now that random nsfs file descriptors are bind
> >> mountable the original reason for using proc files is not as pressing.
> >>
> >> One ioctl for the user namespace that owns a file descriptor.
> >> One ioctl for the parent namespace of a namespace file descriptor.
> >
> > Here is an implementation of these ioctl-s.
> >
> > [1] https://lkml.org/lkml/2016/7/6/158
> > [2] https://lkml.org/lkml/2016/7/9/101
> >
> > Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> > Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
> > Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
> > Cc: "W. Trevor King" <wking@tremily.us>
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Cc: Serge Hallyn <serge.hallyn@canonical.com>
> >
> > --
> > 2.5.5
> >
> > _______________________________________________
> > Containers mailing list
> > Containers@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 142+ messages in thread

end of thread, other threads:[~2016-08-02  9:49 UTC | newest]

Thread overview: 142+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-14 18:20 [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces Andrey Vagin
2016-07-14 18:20 ` Andrey Vagin
2016-07-14 18:20 ` [PATCH 1/5] namespaces: move user_ns into ns_common Andrey Vagin
     [not found]   ` <1468520419-28220-2-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2016-07-15 12:21     ` kbuild test robot
2016-07-15 12:21       ` kbuild test robot
2016-07-14 18:20 ` [PATCH 3/5] nsfs: add ioctl to get an owning user namespace for ns file descriptor Andrey Vagin
2016-07-14 18:48   ` W. Trevor King
2016-07-14 18:48     ` W. Trevor King
     [not found]   ` <1468520419-28220-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2016-07-14 18:48     ` W. Trevor King
2016-07-14 22:02 ` [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces Andrey Vagin
2016-07-14 22:02   ` Andrey Vagin
2016-07-24  5:10   ` Eric W. Biederman
2016-07-24  5:10     ` Eric W. Biederman
     [not found]     ` <87poq3liyq.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-07-26  2:07       ` Andrew Vagin
2016-07-26  2:07         ` Andrew Vagin
     [not found]   ` <CANaxB-xw_xBUq=0uT14ANv-jfg2NsGaPy=jyDO9=yF03_7toSw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-07-15  2:12     ` [PATCH 1/5] namespaces: move user_ns into ns_common Andrey Vagin
2016-07-15  2:12       ` Andrey Vagin
2016-07-15  2:12       ` [PATCH 4/5] nsfs: add ioctl to get a parent namespace Andrey Vagin
2016-07-15  2:12         ` Andrey Vagin
     [not found]         ` <1468548742-32136-4-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2016-07-24  5:07           ` Eric W. Biederman
2016-07-24  5:07         ` Eric W. Biederman
2016-07-24  5:07           ` Eric W. Biederman
2016-07-15  2:12       ` [PATCH 5/5] tools/testing: add a test to check nsfs ioctl-s Andrey Vagin
     [not found]       ` <1468548742-32136-1-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2016-07-15  2:12         ` [PATCH 2/5] kernel: add a helper to get an owning user namespace for a namespace Andrey Vagin
2016-07-15  2:12           ` Andrey Vagin
     [not found]           ` <1468548742-32136-2-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2016-07-24  5:03             ` Eric W. Biederman
2016-07-24 16:54             ` W. Trevor King
2016-07-24 16:54               ` W. Trevor King
2016-07-24  5:03           ` Eric W. Biederman
2016-07-24  5:03             ` Eric W. Biederman
     [not found]             ` <878twrmxu2.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-07-24  6:37               ` Andrew Vagin
2016-07-24  6:37                 ` Andrew Vagin
     [not found]                 ` <20160724063728.GA17810-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
2016-07-24 14:30                   ` Eric W. Biederman
2016-07-24 14:30                 ` Eric W. Biederman
2016-07-24 14:30                   ` Eric W. Biederman
     [not found]                   ` <87shuzglck.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-07-24 17:05                     ` W. Trevor King
2016-07-24 17:05                       ` W. Trevor King
2016-07-15  2:12         ` [PATCH 3/5] nsfs: add ioctl to get an owning user namespace for ns file descriptor Andrey Vagin
2016-07-15  2:12           ` Andrey Vagin
2016-07-15  2:12         ` [PATCH 4/5] nsfs: add ioctl to get a parent namespace Andrey Vagin
2016-07-15  2:12         ` [PATCH 5/5] tools/testing: add a test to check nsfs ioctl-s Andrey Vagin
2016-07-16  8:21         ` [PATCH 1/5] namespaces: move user_ns into ns_common kbuild test robot
2016-07-16  8:21           ` kbuild test robot
2016-07-23 23:07         ` kbuild test robot
2016-07-23 23:07           ` kbuild test robot
2016-07-24  5:00         ` Eric W. Biederman
2016-07-24  5:00       ` Eric W. Biederman
2016-07-24  5:00         ` Eric W. Biederman
2016-07-24  5:54         ` Andrew Vagin
2016-07-24  5:54           ` Andrew Vagin
     [not found]         ` <87k2gbmy02.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-07-24  5:54           ` Andrew Vagin
2016-07-24  5:54             ` Andrew Vagin
2016-07-24  5:54           ` Andrew Vagin
2016-07-24  5:54             ` Andrew Vagin
2016-07-24  5:54         ` Andrew Vagin
2016-07-24  5:54           ` Andrew Vagin
2016-07-24  5:10     ` [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces Eric W. Biederman
2016-07-21 14:41 ` Michael Kerrisk (man-pages)
2016-07-21 14:41   ` Michael Kerrisk (man-pages)
     [not found]   ` <c9bdaf3d-ec93-d754-81ac-9f524a0d0954-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2016-07-21 21:06     ` Andrew Vagin
2016-07-21 21:06       ` Andrew Vagin
2016-07-21 21:06       ` Andrew Vagin
     [not found]       ` <20160721210650.GA10989-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
2016-07-22  6:48         ` Michael Kerrisk (man-pages)
     [not found]           ` <1515f5f2-5a49-fcab-61f4-8b627d3ba3e2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2016-07-22 18:25             ` Andrey Vagin
2016-07-22 18:25               ` Andrey Vagin
2016-07-25 11:47               ` Michael Kerrisk (man-pages)
2016-07-25 11:47                 ` Michael Kerrisk (man-pages)
     [not found]                 ` <e2811bf1-4b86-e115-bcdb-301d6f2546eb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2016-07-25 13:18                   ` Eric W. Biederman
2016-07-25 13:18                     ` Eric W. Biederman
     [not found]                     ` <87lh0pg8jx.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-07-25 14:46                       ` Michael Kerrisk (man-pages)
2016-07-25 14:46                         ` Michael Kerrisk (man-pages)
     [not found]                         ` <44ca0e41-dc92-45b1-2a6c-c41a048a072d-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2016-07-25 14:54                           ` Serge E. Hallyn
2016-07-25 14:59                           ` Eric W. Biederman
2016-07-25 14:59                             ` Eric W. Biederman
     [not found]                             ` <87r3ahepb4.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-07-26  2:54                               ` Andrew Vagin
2016-07-26  2:54                                 ` Andrew Vagin
2016-07-26  8:03                                 ` Michael Kerrisk (man-pages)
2016-07-26  8:03                                   ` Michael Kerrisk (man-pages)
     [not found]                                   ` <3390535b-0660-757f-aeba-c03d936b3485-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2016-07-26 18:25                                     ` Andrew Vagin
2016-07-26 18:25                                       ` Andrew Vagin
2016-07-26 18:32                                       ` W. Trevor King
2016-07-26 18:32                                         ` W. Trevor King
     [not found]                                         ` <20160726183224.GN24913-q4NCUed9G3sTnwFZoN752g@public.gmane.org>
2016-07-26 19:11                                           ` Andrew Vagin
2016-07-26 19:11                                             ` Andrew Vagin
     [not found]                                       ` <20160726182524.GA328-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
2016-07-26 18:32                                         ` W. Trevor King
2016-07-26 19:17                                         ` Michael Kerrisk (man-pages)
2016-07-26 19:17                                       ` Michael Kerrisk (man-pages)
     [not found]                                         ` <CAKgNAkjmOu+vfiMDyeYQkkf7wQBH9PVmJ4nH2CTg43GrN-k7eA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-07-26 20:39                                           ` Andrew Vagin
2016-07-26 20:39                                             ` Andrew Vagin
2016-07-28 10:45                                             ` Michael Kerrisk (man-pages)
2016-07-28 10:45                                               ` Michael Kerrisk (man-pages)
     [not found]                                               ` <ca0787a3-b270-e962-46d1-7e63c9335a55-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2016-07-28 12:56                                                 ` Eric W. Biederman
2016-07-28 12:56                                                   ` Eric W. Biederman
2016-07-28 19:00                                                   ` Michael Kerrisk (man-pages)
2016-07-29 18:05                                                     ` Eric W. Biederman
2016-07-29 18:05                                                       ` Eric W. Biederman
     [not found]                                                       ` <87h9b8e2v7.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-07-31 21:31                                                         ` Michael Kerrisk (man-pages)
2016-07-31 21:31                                                           ` Michael Kerrisk (man-pages)
2016-08-01 23:01                                                         ` Andrew Vagin
2016-08-01 23:01                                                           ` Andrew Vagin
     [not found]                                                     ` <40e35f1a-10e6-b7a5-936e-a09f008be0d0-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2016-07-29 18:05                                                       ` Eric W. Biederman
     [not found]                                                   ` <87popxkjjp.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-07-28 19:00                                                     ` Michael Kerrisk (man-pages)
     [not found]                                             ` <20160726203955.GA9415-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
2016-07-28 10:45                                               ` Michael Kerrisk (man-pages)
     [not found]                                 ` <20160726025455.GC26206-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
2016-07-26  8:03                                   ` Michael Kerrisk (man-pages)
2016-07-26 19:38                                   ` Eric W. Biederman
2016-07-26 19:38                                     ` Eric W. Biederman
2016-07-25 14:54                         ` Serge E. Hallyn
2016-07-25 14:54                           ` Serge E. Hallyn
2016-07-25 15:17                           ` Eric W. Biederman
2016-07-25 15:17                             ` Eric W. Biederman
     [not found]                           ` <20160725145445.GA19879-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2016-07-25 15:17                             ` Eric W. Biederman
     [not found]               ` <CANaxB-w8H8Wo8FmtmBBZTpJX-ZDGRQx0rbm9E5c9WbduQ_Ukmw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-07-25 11:47                 ` Michael Kerrisk (man-pages)
     [not found] ` <1468520419-28220-1-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2016-07-14 18:20   ` [PATCH 1/5] namespaces: move user_ns into ns_common Andrey Vagin
2016-07-14 18:20   ` [PATCH 2/5] kernel: add a helper to get an owning user namespace for a namespace Andrey Vagin
2016-07-14 18:20     ` Andrey Vagin
     [not found]     ` <1468520419-28220-3-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2016-07-14 19:07       ` W. Trevor King
2016-07-14 19:07         ` W. Trevor King
2016-07-14 18:20   ` [PATCH 3/5] nsfs: add ioctl to get an owning user namespace for ns file descriptor Andrey Vagin
2016-07-14 18:20   ` [PATCH 4/5] nsfs: add ioctl to get a parent namespace Andrey Vagin
2016-07-14 18:20     ` Andrey Vagin
2016-07-14 18:20   ` [PATCH 5/5] tools/testing: add a test to check nsfs ioctl-s Andrey Vagin
2016-07-14 18:20     ` Andrey Vagin
2016-07-14 22:02   ` [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces Andrey Vagin
2016-07-21 14:41   ` Michael Kerrisk (man-pages)
2016-07-23 21:14   ` W. Trevor King
2016-07-23 21:14     ` W. Trevor King
     [not found]     ` <20160723211414.GA25371-q4NCUed9G3sTnwFZoN752g@public.gmane.org>
2016-07-23 21:38       ` James Bottomley
2016-07-23 21:38         ` James Bottomley
     [not found]         ` <1469309936.2332.35.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
2016-07-23 21:58           ` W. Trevor King
2016-07-23 21:58         ` W. Trevor King
2016-07-23 21:58           ` W. Trevor King
2016-07-23 21:56           ` Eric W. Biederman
2016-07-23 21:56             ` Eric W. Biederman
     [not found]             ` <87mvl8nhlv.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-07-23 22:34               ` W. Trevor King
2016-07-23 22:34                 ` W. Trevor King
     [not found]                 ` <20160723223448.GP24913-q4NCUed9G3sTnwFZoN752g@public.gmane.org>
2016-07-24  4:51                   ` Eric W. Biederman
2016-07-24  4:51                     ` Eric W. Biederman
     [not found]           ` <20160723215802.GO24913-q4NCUed9G3sTnwFZoN752g@public.gmane.org>
2016-07-23 21:56             ` Eric W. Biederman
2016-08-01 18:20   ` Alban Crequy
2016-08-01 18:20     ` Alban Crequy
     [not found]     ` <CAMXgnP6j+rTeb5XJgoPV20y8puGyVm=9O9gdg9Sah4DuF5qm9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-08-01 23:32       ` Andrew Vagin
2016-08-01 23:32         ` Andrew Vagin
